Experimental Estimation of Number of Clusters Based on Cluster Quality
Authors
G. Hannah Grace
- Department of Mathematics, School of Advanced Sciences, VIT University, Chennai 600127, India.
Kalyani Desikan
- Department of Mathematics, School of Advanced Sciences, VIT University, Chennai 600127, India.
Abstract
Text clustering is a text mining technique that divides a given set of text documents into meaningful clusters. It is used to organize a large number of text documents into a well-structured form. Most clustering algorithms require the number of clusters to be specified a priori, which is a drawback of these algorithms. The aim of this paper is to show experimentally how the number of clusters can be determined from cluster quality. Since partitional clustering algorithms are well suited to clustering large document datasets, we have confined our analysis to a partitional clustering algorithm.
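The idea of judging candidate cluster counts by cluster quality can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: a plain k-means on numeric feature vectors stands in for the partitional algorithm (CLUTO's own algorithms are not reproduced), quality is measured by the size-weighted entropy and purity against known class labels, and the selection rule in the hypothetical `choose_k` (smallest k reaching a target purity) is an illustrative assumption, since the abstract does not state the paper's criterion.

```python
import math
from collections import Counter, defaultdict


def cluster_quality(clusters, classes):
    """Return (entropy, purity) of a clustering against known class labels.

    clusters[j] is the cluster id of document j and classes[j] its true class.
    Per-cluster entropy is normalised by log(q), q = number of classes, and both
    measures are weighted by cluster size. Lower entropy / higher purity is better.
    """
    n = len(clusters)
    q = len(set(classes))
    members = defaultdict(list)
    for c, label in zip(clusters, classes):
        members[c].append(label)
    entropy = 0.0
    purity = 0.0
    for labels in members.values():
        n_r = len(labels)
        counts = Counter(labels)
        e_r = 0.0
        if q > 1:  # with a single class every cluster is trivially pure
            e_r = -sum((m / n_r) * math.log(m / n_r) for m in counts.values()) / math.log(q)
        entropy += (n_r / n) * e_r
        purity += max(counts.values()) / n  # documents of the majority class
    return entropy, purity


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points seed the centroids (assumed, for reproducibility)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            mem = [p for p, a in zip(points, assign) if a == c]
            if mem:
                centroids[c] = [sum(x) / len(mem) for x in zip(*mem)]
    return assign


def choose_k(points, classes, k_values, target_purity=0.9):
    """Hypothetical rule: the smallest k whose clustering reaches target_purity."""
    for k in k_values:
        _, p = cluster_quality(kmeans(points, k), classes)
        if p >= target_purity:
            return k
    return None


if __name__ == "__main__":
    # Toy 2-D "documents" with known classes (illustrative data, not from the paper).
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    classes = ["a", "a", "a", "b", "b", "b"]
    for k in (1, 2, 3):
        e, p = cluster_quality(kmeans(points, k), classes)
        print(f"k={k}: entropy={e:.3f}, purity={p:.3f}")
    print("chosen k:", choose_k(points, classes, [1, 2, 3]))
```

Because purity tends toward 1 as k grows (singleton clusters are always pure), a rule like the one above only makes sense when the scan runs from small k upward and pairs purity with a threshold or with entropy.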
Cite
ISRP Style
G. Hannah Grace, Kalyani Desikan, Experimental Estimation of Number of Clusters Based on Cluster Quality, Journal of Mathematics and Computer Science, 12 (2014), no. 4, 304-315
AMA Style
Grace G. Hannah, Desikan Kalyani, Experimental Estimation of Number of Clusters Based on Cluster Quality. J Math Comput SCI-JM. (2014); 12(4):304-315
Chicago/Turabian Style
Grace, G. Hannah, and Kalyani Desikan. "Experimental Estimation of Number of Clusters Based on Cluster Quality." Journal of Mathematics and Computer Science, 12, no. 4 (2014): 304-315
Keywords
- clusters
- cluster quality
- CLUTO
- entropy
- purity
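Entropy and purity are external quality measures: they compare a clustering with known class labels. A sketch of their commonly used size-weighted definitions (the form reported by the CLUTO toolkit) follows; the paper's exact normalisation is assumed here, not confirmed. The notation is introduced for illustration: n documents, q classes, k clusters S_1, ..., S_k, with n_r = |S_r| and n_r^i the number of class-i documents in S_r.

```latex
\begin{align*}
E(S_r) &= -\frac{1}{\log q} \sum_{i=1}^{q} \frac{n_r^i}{n_r} \log \frac{n_r^i}{n_r},
  & \text{Entropy} &= \sum_{r=1}^{k} \frac{n_r}{n}\, E(S_r), \\
P(S_r) &= \frac{1}{n_r} \max_{i} n_r^i,
  & \text{Purity} &= \sum_{r=1}^{k} \frac{n_r}{n}\, P(S_r) = \frac{1}{n} \sum_{r=1}^{k} \max_{i} n_r^i .
\end{align*}
```

Terms with n_r^i = 0 are taken as 0 in the entropy sum. A perfect clustering has entropy 0 and purity 1; since every singleton cluster is pure, purity alone always favours larger k.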