A Novel Method for Document Clustering Using Ant-fuzzy Algorithm


Authors

Javad Rajaie - Department of Computer Engineering, Mahshahr Branch, Islamic Azad University, Mahshahr, Iran
Babak Fakhar - Department of Computer Engineering, Mahshahr Branch, Islamic Azad University, Mahshahr, Iran


Abstract

The availability of large full-text document collections in electronic form has created a need for tools and techniques that help users organize them. Document clustering is one of the popular methods used for this purpose. Ant-based text clustering is a promising technique that has attracted considerable research attention. This paper attempts to improve the standard ant-based text-clustering algorithm by modifying the ant behavior model to achieve better algorithmic performance. A hybrid approach based on ant clustering and fuzzy clustering is used. First, ant-based clustering creates raw, imprecise clusters; these clusters are then refined by the fuzzy C-Means (FCM) algorithm. For large datasets, a single pass of these two stages does not suffice, and many small homogeneous clusters are formed. More iterations of the two stages are therefore usually required, with the clusters from previous iterations serving as building blocks in the following iterations to form finer and larger clusters. The proposed algorithm is tested on a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that the proposed algorithm performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.
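The refinement stage described above can be illustrated with a minimal sketch. It assumes the ant-based stage has already produced a rough hard label for each document vector, and it applies the textbook fuzzy C-Means updates starting from the means of those rough clusters. The function name `fcm_refine`, the fuzzifier `m = 2`, the Euclidean distance, and the stopping tolerance are illustrative assumptions, not details taken from the paper; the iterative merging of clusters across passes is not shown.

```python
import numpy as np


def fcm_refine(X, rough_labels, m=2.0, n_iter=50, tol=1e-5):
    """Refine rough hard clusters with standard fuzzy C-Means.

    X            : (n, d) array of document vectors.
    rough_labels : (n,) integer labels from a prior coarse stage
                   (assumed here to come from ant-based clustering).
    Returns (centers, U), where U[i, k] is the membership of
    document i in cluster k and each row of U sums to 1.
    """
    X = np.asarray(X, dtype=float)
    rough_labels = np.asarray(rough_labels)
    ids = np.unique(rough_labels)
    # Initial centers: mean vector of each rough cluster.
    centers = np.array([X[rough_labels == k].mean(axis=0) for k in ids])
    U = None
    for _ in range(n_iter):
        # Euclidean distance from every point to every center, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        # Membership update: u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the points weighted by u_ik^m.
        W = U_new ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        converged = U is not None and np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


if __name__ == "__main__":
    # Toy data: two well-separated groups with deliberately noisy rough labels.
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    rough = np.array([0, 0, 1, 1, 1, 0])
    centers, U = fcm_refine(X, rough)
    print(U.argmax(axis=1))
```

Taking the largest membership in each row of `U` gives a hard assignment again, which is one plausible way to pass refined clusters on to a further pass.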

