Ensembles of Classifiers for Parallel Categorization of Large Number of Text Documents Expressing Opinions
František Dařena and Jan Zizka
Additional contact information
Jan Zizka: Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemedelska 1, 613 00 Brno, Czech Republic
No 2016-65, MENDELU Working Papers in Business and Economics from Mendel University in Brno, Faculty of Business and Economics
Abstract:
Opinions provided by people who have used services or purchased goods are a rich source of knowledge. Opinion classification, which mostly applies supervised classifiers, is one of the essential tasks. The technological capabilities of computers are still a major obstacle, especially when processing huge volumes of data. This study proposes and experimentally evaluates the application of parallelism to the classification of a very large number of contrary opinions expressed as freely written text reviews. Instead of training a single classifier on the entire data set, an ensemble of classifiers is trained on disjoint subsets of the data, and a group decision is used to classify unlabelled items. The main assessment criteria are computational efficiency and error rate, combined into a single measure so that ensembles of different sizes can be compared. Support vector machines, artificial neural networks, and decision trees, all frequently used classification methods, were examined. The paper demonstrates the viability of the suggested method when the number of text reviews leads to computational complexity beyond the capabilities of a contemporary common PC. Classification accuracy and the values of other classification performance measures (Precision, Recall, F-measure) did not decrease, which is a positive finding.
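The idea of training ensemble members on disjoint subsets and combining their outputs can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the toy word-voting base learner stands in for the support vector machines, neural networks, and decision trees examined in the paper, plain majority voting stands in for the "group decision", and the function names and sample reviews are hypothetical. The thread pool only shows that members can be trained independently; it is not the parallel setup used by the authors.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Review = Tuple[str, str]          # (review text, label)
Classifier = Callable[[str], str]  # maps review text to a predicted label


def split_disjoint(data: Sequence, k: int) -> List[list]:
    """Partition data into k disjoint subsets (round-robin assignment)."""
    return [list(data[i::k]) for i in range(k)]


def train_word_vote(subset: Sequence[Review]) -> Classifier:
    """Toy base learner (assumption): each known word votes for the labels
    it co-occurred with in this subset. Requires a non-empty subset."""
    word_labels: Dict[str, Counter] = {}
    label_counts: Counter = Counter()
    for text, label in subset:
        label_counts[label] += 1
        for word in set(text.lower().split()):
            word_labels.setdefault(word, Counter())[label] += 1
    default = label_counts.most_common(1)[0][0]

    def predict(text: str) -> str:
        votes: Counter = Counter()
        for word in set(text.lower().split()):
            votes.update(word_labels.get(word, {}))
        # Fall back to the subset's majority label if no word is known.
        return votes.most_common(1)[0][0] if votes else default

    return predict


def train_ensemble(data: Sequence[Review], k: int) -> List[Classifier]:
    """Train one member per disjoint subset; members are independent,
    so the training calls can be dispatched concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(train_word_vote, split_disjoint(data, k)))


def classify(ensemble: Sequence[Classifier], text: str) -> str:
    """Group decision (assumed here to be a simple majority vote)."""
    votes = Counter(member(text) for member in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical sample data, not taken from the paper.
reviews: List[Review] = [
    ("great room friendly staff", "pos"),
    ("excellent location great breakfast", "pos"),
    ("clean room excellent service", "pos"),
    ("dirty room rude staff", "neg"),
    ("terrible noise dirty bathroom", "neg"),
    ("rude service terrible breakfast", "neg"),
]

ensemble = train_ensemble(reviews, k=3)
print(classify(ensemble, "great excellent hotel"))
print(classify(ensemble, "dirty and terrible"))
```

An odd number of members avoids tied votes in the two-class case; how the paper resolves ties or weights members is not stated in the abstract.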
Keywords: text documents; natural language; classification; parallel processing; ensembles of classifiers; machine learning
JEL-codes: C38 C89
Pages: 17
Date: 2016-12
New Economics Papers: this item is included in nep-cmp
Downloads: (external link)
http://ftp.mendelu.cz/RePEc/men/wpaper/65_2016.pdf Full text (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:men:wpaper:65_2016
Bibliographic data for series maintained by Luděk Kouba.