A Review of Artificial Intelligence Algorithms in Document Classification

Journal title

International Journal of Electronics and Telecommunications




No 3


Divisions of PAS

Technical Sciences (Nauki Techniczne)


Polish Academy of Sciences Committee of Electronics and Telecommunications




ISSN 2081-8491 (until 2012); eISSN 2300-1933 (since 2013)

