Experimental Estimation of Number of Clusters Based on Cluster Quality

Volume 12, Issue 4, pp 304-315 http://dx.doi.org/10.22436/jmcs.012.04.06

Authors

G. Hannah Grace - Department of Mathematics, School of Advanced Sciences, VIT University, Chennai 600127, India.
Kalyani Desikan - Department of Mathematics, School of Advanced Sciences, VIT University, Chennai 600127, India.

Abstract

Text clustering is a text mining technique that divides a given set of text documents into significant clusters. It is used to organize a large number of text documents into a well-structured form. Most clustering algorithms require the number of clusters to be specified a priori, which is a drawback of these algorithms. The aim of this paper is to show experimentally how to determine the number of clusters based on cluster quality. Since partitional clustering algorithms are well suited to clustering large document datasets, we have confined our analysis to a partitional clustering algorithm.
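As an illustration of the cluster quality measures named in the keywords, the following is a minimal Python sketch of purity and entropy computed against known class labels. It assumes the commonly used definitions (size-weighted averages over clusters, with entropy normalized by the logarithm of the number of classes); the function name `purity_and_entropy` and the toy data are hypothetical, and this is not necessarily the exact procedure used in the paper.

```python
import math
from collections import Counter


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against known class labels.

    Assumed definitions (not taken from the paper itself):
      purity  = sum_r (n_r / n) * max_i (n_r^i / n_r)
      entropy = sum_r (n_r / n) * ( -1/log q * sum_i p_i log p_i ),  p_i = n_r^i / n_r
    where n_r is the size of cluster r, n_r^i is the number of documents of
    class i in cluster r, and q is the number of classes.
    """
    n = len(cluster_ids)
    q = len(set(class_labels))

    # Group the class labels of the documents by the cluster they fall in.
    clusters = {}
    for cluster, label in zip(cluster_ids, class_labels):
        clusters.setdefault(cluster, []).append(label)

    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        n_r = len(members)
        counts = Counter(members)
        # Purity: share of the cluster taken by its largest class.
        purity += (n_r / n) * (max(counts.values()) / n_r)
        # Entropy: class mix inside the cluster, normalized to [0, 1].
        if q > 1:
            h = -sum((c / n_r) * math.log(c / n_r) for c in counts.values())
            entropy += (n_r / n) * (h / math.log(q))
    return purity, entropy


if __name__ == "__main__":
    # Hypothetical toy data: six documents, two clusters, two classes.
    clusters = [0, 0, 0, 1, 1, 1]
    classes = ["a", "a", "b", "b", "b", "b"]
    print(purity_and_entropy(clusters, classes))
```

Under these definitions, higher purity and lower entropy indicate better agreement with the known classes. One plausible way to use them for estimating the number of clusters is to run the clustering algorithm for several candidate values and compare the resulting scores.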

Share and Cite

ISRP Style

G. Hannah Grace, Kalyani Desikan, Experimental Estimation of Number of Clusters Based on Cluster Quality, Journal of Mathematics and Computer Science, 12 (2014), no. 4, 304-315

AMA Style

Grace G. Hannah, Desikan Kalyani, Experimental Estimation of Number of Clusters Based on Cluster Quality. J Math Comput SCI-JM. (2014); 12(4):304-315

Chicago/Turabian Style

Grace, G. Hannah, Desikan, Kalyani. "Experimental Estimation of Number of Clusters Based on Cluster Quality." Journal of Mathematics and Computer Science, 12, no. 4 (2014): 304-315

Keywords

clusters
cluster quality
CLUTO
entropy
purity

MSC

62H30
68T10
