An Approach to Fuzzy Hierarchical Clustering of Short Text Fragments Based on Fuzzy Graph Clustering

  • Conference paper
Proceedings of the Second International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’17) (IITI 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 679)

Abstract

This paper presents a novel approach to fuzzy hierarchical clustering of short text fragments. Datasets containing large, even huge, numbers of short text fragments have become common: short messages of various kinds and paper or news headlines are typical examples. The authors consider a similar object, a dataset of key process indicators from the Strategic Planning System of the Russian Federation.

A fuzzy clustering approach is proposed to reveal the structure and thematic variety of the dataset. A fuzzy graph is chosen as the model because it is the most natural representation of a connected set of words. Finally, the clustering yields a hierarchy, which is a desirable structure for presenting a large amount of information.
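
As an illustration of how a hierarchy can arise from a fuzzy graph, the following minimal sketch uses the classic alpha-cut idea: keep only edges whose membership degree is at least a threshold, take connected components as clusters, and repeat for increasing thresholds so that clusters split into nested sub-clusters. This is not the authors' algorithm, which the abstract does not specify. The function names, the example words, and the edge weights are all hypothetical and chosen only for demonstration.

```python
from collections import defaultdict


def alpha_cut_components(nodes, fuzzy_edges, alpha):
    """Clusters = connected components after dropping edges with membership < alpha."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), mu in fuzzy_edges.items():
        if mu >= alpha:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort for a deterministic, readable result.
    return sorted(sorted(g) for g in groups.values())


def hierarchy(nodes, fuzzy_edges, alphas):
    """Map each threshold to its clustering; higher thresholds give finer clusters."""
    return {a: alpha_cut_components(nodes, fuzzy_edges, a) for a in sorted(alphas)}


# Hypothetical words and made-up membership degrees (assumed, not from the paper).
words = ["budget", "revenue", "tax", "school", "teacher"]
edges = {
    ("budget", "revenue"): 0.9,
    ("revenue", "tax"): 0.7,
    ("school", "teacher"): 0.8,
    ("tax", "school"): 0.3,
}

levels = hierarchy(words, edges, [0.2, 0.5, 0.85])
for alpha, clusters in levels.items():
    print(alpha, clusters)
```

Each level refines the one below it, so the thresholds read from lowest to highest give a top-down hierarchy. In practice the edge memberships would come from a word-similarity model rather than hand-set values.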

Notes

  1. Here and below, all examples are translated from Russian into English, so some language-specific features may be lost.

  2. For Russian and fairly large text corpora, a reasonable value lies in the range [0.4, 0.5].

  3. A reasonable value lies in the range [0.001, 0.05].

  4. The Python source code is available on GitHub (https://github.com/PavelDudarin/sentence-clustering). It consists of two modules: one for working with RusVectores and one for the clustering algorithm itself.

  5. http://gasu.gov.ru/

Author information

Corresponding author

Correspondence to Pavel V. Dudarin.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Dudarin, P.V., Yarushkina, N.G. (2018). An Approach to Fuzzy Hierarchical Clustering of Short Text Fragments Based on Fuzzy Graph Clustering. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Vasileva, M., Sukhanov, A. (eds) Proceedings of the Second International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’17). IITI 2017. Advances in Intelligent Systems and Computing, vol 679. Springer, Cham. https://doi.org/10.1007/978-3-319-68321-8_30

  • DOI: https://doi.org/10.1007/978-3-319-68321-8_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68320-1

  • Online ISBN: 978-3-319-68321-8

  • eBook Packages: Engineering, Engineering (R0)
