Content- and Topology-Aware Representation Learning for Scientific Multi-Literature

Kai Zhang, Kaisong Song, Yangyang Kang, Xiaozhong Liu

Abstract

Representation learning forms an essential building block in the development of natural language processing architectures. To date, mainstream approaches focus on learning textual information at the sentence or document level and overlook inter-document connections. This omission weakens downstream applications, particularly in multi-document settings. To address this issue, embeddings that carry both latent semantic information and rich relatedness information are needed. In this paper, we propose SMRC², which extends representation learning to the multi-document level. Our model jointly learns latent semantic information from content and rich relatedness information from topological networks. Unlike previous studies, our work takes multiple documents as input and integrates semantic and relatedness information in a shared space via a language model and a graph structure. Our extensive experiments confirm the superiority and effectiveness of our approach. To encourage further research in scientific multi-literature representation learning, we will release our code and a new dataset from the biomedical domain.
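The idea of combining content with topology can be illustrated with a minimal sketch. This is not the SMRC² model: it is generic neighbour averaging over a toy citation graph, written under stated assumptions. The names `fuse`, `content`, `links`, and `alpha` are hypothetical, and the hand-written vectors stand in for embeddings that a language model would normally produce.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse(content: Dict[str, Vector],
         links: Dict[str, List[str]],
         alpha: float = 0.5) -> Dict[str, Vector]:
    """Blend each document's content vector with the mean of its neighbours.

    Assumption: `alpha` weights the document's own content and
    (1 - alpha) weights its graph context. This is an illustrative
    choice, not something taken from the paper.
    """
    fused: Dict[str, Vector] = {}
    for doc, vec in content.items():
        neighbours = [content[n] for n in links.get(doc, []) if n in content]
        if not neighbours:
            # Isolated documents keep their content-only representation.
            fused[doc] = list(vec)
            continue
        context = mean_vector(neighbours)
        fused[doc] = [alpha * v + (1 - alpha) * c for v, c in zip(vec, context)]
    return fused


# Toy data: three documents with 2-d stand-in content embeddings.
content = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
# Hypothetical citation links (document -> documents it is related to).
links = {"A": ["B", "C"], "B": ["A"], "C": []}

fused = fuse(content, links, alpha=0.5)
```

In this toy setup, document A moves toward the average of B and C, while the unlinked document C keeps its content vector unchanged. A learned model would replace the fixed averaging with trained components.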

Anthology ID: 2023.emnlp-main.465
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 7490–7502
URL: https://aclanthology.org/2023.emnlp-main.465
DOI: 10.18653/v1/2023.emnlp-main.465
Cite (ACL): Kai Zhang, Kaisong Song, Yangyang Kang, and Xiaozhong Liu. 2023. Content- and Topology-Aware Representation Learning for Scientific Multi-Literature. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7490–7502, Singapore. Association for Computational Linguistics.
Cite (Informal): Content- and Topology-Aware Representation Learning for Scientific Multi-Literature (Zhang et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.465.pdf
Video: https://aclanthology.org/2023.emnlp-main.465.mp4
