Abstract
This paper presents a novel approach to fuzzy hierarchical clustering of short text fragments. Datasets that contain a large, or even huge, number of short text fragments have become quite common. Short messages of various kinds, paper titles and news headlines are examples of such objects. The authors consider a similar object: the dataset of key process indicators of the Strategic Planning System of the Russian Federation.
A fuzzy clustering approach is proposed in order to reveal the structure and thematic variety of such a dataset. A fuzzy graph is chosen as the model because it is the most natural representation of a connected set of words. The result of the clustering is a hierarchy, which is a desirable structure for presenting a large amount of information.
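The idea of obtaining a hierarchy from a fuzzy graph can be illustrated with a minimal sketch. It is not the authors' algorithm; it assumes the classical lambda-cut construction, in which edges whose membership degree falls below a threshold are dropped and the remaining connected components form the clusters at that level. The function names, the toy fragments and the edge weights are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def lambda_cut_clusters(nodes, edges, lam):
    """Connected components of the graph that keeps only edges with membership >= lam."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), mu in edges.items():
        if mu >= lam:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(g) for g in groups.values())


def fuzzy_hierarchy(nodes, edges, levels):
    """Map each threshold (from strict to loose) to its partition; partitions are nested."""
    return {lam: lambda_cut_clusters(nodes, edges, lam)
            for lam in sorted(levels, reverse=True)}


# Hypothetical toy data: short fragments and assumed similarity degrees in [0, 1].
nodes = ["birth rate", "death rate", "road length", "road repairs"]
edges = {
    ("birth rate", "death rate"): 0.8,
    ("road length", "road repairs"): 0.7,
    ("death rate", "road repairs"): 0.2,
}
hierarchy = fuzzy_hierarchy(nodes, edges, [0.75, 0.5, 0.1])
for lam, clusters in hierarchy.items():
    print(lam, clusters)
```

Lowering the threshold can only merge clusters, never split them, so the partitions listed from the highest to the lowest threshold form the levels of a hierarchy.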
Notes
- 1.
Here and below, all examples are translated into English from Russian, so some language-specific features may be lost.
- 2.
For the Russian language and fairly large text corpora, a reasonable value lies in the range [0.4, 0.5].
- 3.
A reasonable value lies in the range [0.001, 0.05].
- 4.
The Python source code is available on GitHub (https://github.com/PavelDudarin/sentence-clustering). It consists of two modules: one for working with RusVectores and one implementing the clustering algorithm itself.
- 5.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Dudarin, P.V., Yarushkina, N.G. (2018). An Approach to Fuzzy Hierarchical Clustering of Short Text Fragments Based on Fuzzy Graph Clustering. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Vasileva, M., Sukhanov, A. (eds) Proceedings of the Second International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’17). IITI 2017. Advances in Intelligent Systems and Computing, vol 679. Springer, Cham. https://doi.org/10.1007/978-3-319-68321-8_30
DOI: https://doi.org/10.1007/978-3-319-68321-8_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68320-1
Online ISBN: 978-3-319-68321-8
eBook Packages: Engineering, Engineering (R0)