Tabsum- A New Persian Text Summarizer

Volume 11, Issue 4, pp 330-342 http://dx.doi.org/10.22436/jmcs.011.04.08


Authors

Saeid Masoumi - M.Sc. in Software Engineering at University of Tabriz, Tabriz, Iran.
Mohammad-Reza Feizi-derakhshi - Assistant professor at University of Tabriz, Tabriz, Iran.
Raziyeh Tabatabaei - M.Sc. in Software Engineering at University of Tabriz, Tabriz, Iran.

Abstract

With the rapid growth of online text, tools that help users identify important content have become increasingly important. Automatic text summarization addresses this problem by taking an input text and extracting its most important content. However, determining the salience of information in a text depends on many factors and remains a key problem in automatic text summarization. Some studies in the literature use lexical chains as an indicator of lexical cohesion in the text and as an intermediate representation for summarization. Other studies use genetic algorithms to examine manually generated summaries and learn the patterns in the text that lead to those summaries, by identifying the features most correlated with human-generated summaries. In this study, we combine these two approaches. First, preprocessing operations (normalization, tokenization, stop-word removal, stemming, and POS tagging) are applied to the text, after which each sentence contains only independent semantic words. Sentences are then scored using a set of position, thematic, and coherence features. The final score of each sentence is the combination of these features. Each feature has its own weight, which must be determined to obtain a good summary. For this reason, the system first goes through a learning phase in which a genetic algorithm determines the weight of each feature. The next phase is the testing phase, in which the system receives new documents and uses Persian WordNet and lexical chains to extract deep-level knowledge about the text. This knowledge is combined with the results of other, higher-level analyses. Finally, sentences are scored, sorted, and selected, and the summary is produced. We evaluated the proposed system with two methods: (1) precision/recall and (2) TabEval, a new evaluation tool for Persian text summarizers. We compared our system with two other Persian summarizers, FarsiSum and Ijaz. The results showed that our system outperformed both, achieving a higher average precision/recall and the best average TabEval score.
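The scoring and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the "integration" of features is a linear weighted sum, that precision/recall is computed over selected sentence indices against a reference extract, and that the feature values are already computed. The weights, feature values, and function names are hypothetical placeholders; in the paper the weights are learned by a genetic algorithm.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical weights for illustration; the paper learns them with a genetic algorithm.
WEIGHTS: Dict[str, float] = {"position": 0.5, "thematic": 0.3, "coherence": 0.2}


def score_sentence(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed scoring rule: weighted sum of one sentence's feature values."""
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


def select_summary(feature_rows: List[Dict[str, float]],
                   weights: Dict[str, float], k: int) -> List[int]:
    """Return the indices of the k top-scoring sentences, in document order."""
    ranked = sorted(range(len(feature_rows)),
                    key=lambda i: score_sentence(feature_rows[i], weights),
                    reverse=True)
    return sorted(ranked[:k])


def precision_recall(selected: Set[int], reference: Set[int]) -> Tuple[float, float]:
    """Precision and recall of selected sentence indices against a reference extract."""
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Made-up feature values for a four-sentence document.
rows = [
    {"position": 1.0, "thematic": 0.2, "coherence": 0.1},
    {"position": 0.5, "thematic": 0.9, "coherence": 0.8},
    {"position": 0.2, "thematic": 0.1, "coherence": 0.3},
    {"position": 0.1, "thematic": 0.8, "coherence": 0.9},
]
summary = select_summary(rows, WEIGHTS, k=2)
p, r = precision_recall(set(summary), reference={1, 3})
print(summary, p, r)
```

With these made-up values, sentences 0 and 1 are selected, and one of them appears in the hypothetical reference extract {1, 3}, so precision and recall are both 0.5.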

Share and Cite

ISRP Style

Saeid Masoumi, Mohammad-Reza Feizi-derakhshi, Raziyeh Tabatabaei, Tabsum- A New Persian Text Summarizer, Journal of Mathematics and Computer Science, 11 (2014), no. 4, 330-342

AMA Style

Masoumi Saeid, Feizi-derakhshi Mohammad-Reza, Tabatabaei Raziyeh, Tabsum- A New Persian Text Summarizer. J Math Comput SCI-JM. (2014); 11(4):330-342

Chicago/Turabian Style

Masoumi, Saeid, Feizi-derakhshi, Mohammad-Reza, Tabatabaei, Raziyeh. "Tabsum- A New Persian Text Summarizer." Journal of Mathematics and Computer Science, 11, no. 4 (2014): 330-342

Keywords

Summarization
Text Summarizer
Mono-Document Summarization
Extractive Summarization
Persian Text Summarization

MSC

68Uxx
