2015
Clustering Sentences with Density Peaks for Multi-document Summarization
Yang Zhang | Yunqing Xia | Yi Liu | Wenmin Wang
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Tweet Normalization with Syllables
Ke Xu | Yunqing Xia | Chin-Hui Lee
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2014
Clustering tweets using Wikipedia concepts
Guoyu Tang | Yunqing Xia | Weizhi Wang | Raymond Lau | Fang Zheng
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Two challenging issues are notable in tweet clustering. First, the sparse data problem is serious since no tweet can be longer than 140 characters. Second, synonymy and polysemy are rather common because users tend to express a single meaning in many different ways in tweets. Inspired by recent research indicating that Wikipedia is promising for representing text, we exploit Wikipedia concepts to represent tweets as concept vectors. We address the polysemy issue with a Bayesian model, and the synonymy issue by exploiting Wikipedia redirections. To further alleviate the sparse data problem, we make use of three types of out-links in Wikipedia. Evaluation on a Twitter dataset shows that the concept model outperforms the traditional VSM in tweet clustering.
Web Information Mining and Decision Support Platform for the Modern Service Industry
Binyang Li | Lanjun Zhou | Zhongyu Wei | Kam-fai Wong | Ruifeng Xu | Yunqing Xia
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations
2012
Affective Common Sense Knowledge Acquisition for Sentiment Analysis
Erik Cambria | Yunqing Xia | Amir Hussain
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Thanks to the advent of Web 2.0, the potential for opinion sharing today is unmatched in history. Making meaning out of the huge amount of unstructured information available online, however, is extremely difficult as web-contents, despite being perfectly suitable for human consumption, still remain hardly accessible to machines. To bridge the cognitive and affective gap between word-level natural language data and the concept-level sentiments conveyed by them, affective common sense knowledge is needed. In sentic computing, the general common sense knowledge contained in ConceptNet is usually exploited to spread affective information from selected affect seeds to other concepts. In this work, besides exploiting the emotional content of the Open Mind corpus, we also collect new affective common sense knowledge through label sequential rules, crowd sourcing, and games-with-a-purpose techniques. In particular, we develop Open Mind Common Sentics, an emotion-sensitive IUI that serves both as a platform for affective common sense acquisition and as a publicly available NLP tool for extracting the cognitive and affective information associated with short texts.
CLTC: A Chinese-English Cross-lingual Topic Corpus
Yunqing Xia | Guoyu Tang | Peng Jin | Xia Yang
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Cross-lingual topic detection within text is a feasible solution to the language barrier in accessing information. This paper presents a Chinese-English cross-lingual topic corpus (CLTC), in which 90,000 Chinese articles and 90,000 English articles are organized within 150 topics. Compared with TDT corpora, CLTC has three advantages. First, CLTC is bigger in size. This makes it possible to evaluate large-scale cross-lingual text clustering methods. Second, articles are evenly distributed within the topics. Thus it can be used to produce test datasets for different purposes. Third, CLTC can be used as a cross-lingual comparable corpus to develop methods for cross-lingual information access. A preliminary evaluation with the CLTC corpus indicates that the corpus is effective in evaluating cross-lingual topic detection methods.
2011
CLGVSM: Adapting Generalized Vector Space Model to Cross-lingual Document Clustering
Guoyu Tang | Yunqing Xia | Min Zhang | Haizhou Li | Fang Zheng
Proceedings of 5th International Joint Conference on Natural Language Processing
Thread Cleaning and Merging for Microblog Topic Detection
Jianfeng Zhang | Yunqing Xia | Bin Ma | Jianmin Yao | Yu Hong
Proceedings of 5th International Joint Conference on Natural Language Processing
Joint Alignment and Artificial Data Generation: An Empirical Study of Pivot-based Machine Transliteration
Min Zhang | Xiangyu Duan | Ming Liu | Yunqing Xia | Haizhou Li
Proceedings of 5th International Joint Conference on Natural Language Processing
Proceedings of the IJCNLP 2011 System Demonstrations
Kenneth Church | Yunqing Xia
Proceedings of the IJCNLP 2011 System Demonstrations
2008
Lyric-based Song Sentiment Classification with Sentiment Vector Space Model
Yunqing Xia | Linlin Wang | Kam-Fai Wong | Mingxing Xu
Proceedings of ACL-08: HLT, Short Papers
Opinion Annotation in On-line Chinese Product Reviews
Ruifeng Xu | Yunqing Xia | Kam-Fai Wong | Wenjie Li
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper presents the design and construction of a Chinese opinion corpus based on online product reviews. Based on observed characteristics of opinion expression in Chinese online product reviews, which differ considerably from those of formal texts such as news, an annotation framework is proposed to guide the construction of the first Chinese opinion corpus based on online product reviews. The opinionated sentences are manually identified from the review text. Furthermore, for each comment in an opinionated sentence, 13 describing elements are annotated, including the expressions related to the product attributes of interest and user opinions, as well as the polarity and degree of the opinions. Currently, 12,724 comments are annotated in 10,935 sentences from review text. Through statistical analysis of the opinion corpus, some interesting characteristics of Chinese opinion expression are presented. The corpus is shown to be helpful in supporting systematic research on Chinese opinion analysis.
2006
Constructing A Chinese Chat Language Corpus with A Two-Stage Incremental Annotation Approach
Yunqing Xia | Kam-Fai Wong | Wenjie Li
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Chat language refers to the special human language widely used in the community of digital network chat. As chat language holds anomalous characteristics in forming words, phrases, and non-alphabetical characters, conventional natural language processing tools are ineffective in handling chat language text. Previous research shows that knowledge-based methods perform less effectively in processing unseen chat terms. This motivates us to construct a chat language corpus so that corpus-based techniques of chat language text processing can be developed and evaluated. However, creating the corpus merely by hand is difficult. First, this work is manpower-consuming. Second, annotation inconsistency is serious. To minimize manpower and annotation inconsistency, a two-stage incremental annotation approach is proposed in this paper for constructing a chat language corpus. Experiments conducted in this paper show that the performance of corpus annotation can be improved greatly with this approach.
A Phonetic-Based Approach to Chinese Chat Text Normalization
Yunqing Xia | Kam-Fai Wong | Wenjie Li
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
Anomaly Detecting within Dynamic Chinese Chat Text
Yunqing Xia | Kam-Fai Wong
Proceedings of the Workshop on NEW TEXT Wikis and blogs and other dynamic text sources
2005
NIL Is Not Nothing: Recognition of Chinese Network Informal Language Expressions
Yunqing Xia | Kam-Fai Wong | Wei Gao
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
2004
FASIL Email Summarisation System
Angelo Dalli | Yunqing Xia | Yorick Wilks
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics