2024
Search Query Refinement for Japanese Named Entity Recognition in E-commerce Domain
Yuki Nakayama | Ryutaro Tatsushima | Erick Mendieta | Koji Murakami | Keiji Shinzato
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
In the E-commerce domain, search query refinement reformulates malformed queries into canonical forms through preprocessing operations such as "term splitting" and "term merging". Unfortunately, most relevant research is limited to English; in particular, there is a severe lack of work on search query refinement for Japanese. Furthermore, no attempt has been made to apply refinement methods to data improvement for downstream NLP tasks in real-world scenarios. This paper presents a novel query refinement approach for the Japanese language. Experimental results show that our method achieves a significant improvement of 3.5 points over a BERT-CRF baseline. Further experiments measure the beneficial impact of query refinement on named entity recognition (NER) as a downstream task. Evaluations indicate that the proposed query refinement method contributes to better data quality, boosting performance on an E-commerce-specific NER task by 11.7 points compared to search query data preprocessed by MeCab, a widely adopted Japanese tokenizer.
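The two preprocessing operations named in the abstract, term splitting and term merging, can be sketched as a toy vocabulary-driven routine. This is purely illustrative: the vocabulary, function names, and greedy-matching strategy are invented for the sketch, whereas the paper proposes a learned refinement model.

```python
# Hypothetical toy vocabulary of canonical query terms.
VOCAB = {"iphone", "13", "pro", "case", "water", "proof", "waterproof"}

def split_term(token, vocab=VOCAB):
    """Greedy longest-match split of a fused token, e.g. 'iphone13pro'."""
    parts, i = [], 0
    while i < len(token):
        for j in range(len(token), i, -1):  # try the longest substring first
            if token[i:j] in vocab:
                parts.append(token[i:j])
                i = j
                break
        else:
            return [token]  # no full split found; leave the token untouched
    return parts

def merge_terms(tokens, vocab=VOCAB):
    """Merge adjacent tokens whose concatenation is a known term."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] + tokens[i + 1] in vocab:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

print(split_term("iphone13pro"))        # -> ['iphone', '13', 'pro']
print(merge_terms(["water", "proof"]))  # -> ['waterproof']
```

A real system would of course need to handle Japanese tokenization ambiguity, which is exactly why a learned model rather than a fixed vocabulary is used in the paper.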
2022
A Stacking-based Efficient Method for Toxic Language Detection on Live Streaming Chat
Yuto Oikawa | Yuki Nakayama | Koji Murakami
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
In a live streaming chat on a video streaming service, it is crucial to filter out toxic comments with online processing so that users are not exposed to them in real time. However, recent toxic language detection methods rely on deep learning, which may not scale with respect to inference speed. These methods also ignore the computational resource constraints expected of a deployed system (e.g., no GPU resource). This paper presents an efficient method for toxic language detection that is aware of such real-world constraints. Our proposed architecture is based on partial stacking, which forwards only low-confidence initial predictions to a meta-classifier. Experimental results show that our method achieves much faster inference than BERT-based models with comparable performance.
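The partial-stacking idea in the abstract, where a cheap base model handles confident cases and only low-confidence predictions are escalated to a heavier meta-classifier, can be sketched as follows. Both classifiers, the lexicon, and the threshold are hypothetical stand-ins, not the paper's actual models or settings.

```python
TOXIC_WORDS = {"idiot", "trash"}  # toy lexicon for the fast base model

def base_classifier(comment):
    """Fast lexicon check; returns (label, confidence)."""
    hits = sum(w in comment.lower() for w in TOXIC_WORDS)
    if hits:
        return "toxic", 0.9
    return "clean", 0.6  # absence of lexicon hits is only weak evidence

def meta_classifier(comment):
    """Stand-in for a slower, more accurate model (e.g., a BERT-based one)."""
    return ("toxic", 0.95) if "stupid" in comment.lower() else ("clean", 0.9)

def classify(comment, threshold=0.7):
    """Partial stacking: escalate only low-confidence base predictions."""
    label, conf = base_classifier(comment)
    if conf >= threshold:
        return label                     # confident: skip the expensive model
    return meta_classifier(comment)[0]   # low confidence: escalate

print(classify("you idiot"))      # base model is confident -> 'toxic'
print(classify("so stupid lol"))  # escalated to the meta-classifier -> 'toxic'
```

The inference-speed win comes from the fact that, in the typical case, only a fraction of comments ever reach the expensive model.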
A Large-Scale Japanese Dataset for Aspect-based Sentiment Analysis
Yuki Nakayama | Koji Murakami | Gautam Kumar | Sudha Bhingardive | Ikuko Hardaway
Proceedings of the Thirteenth Language Resources and Evaluation Conference
There has been significant progress in the field of sentiment analysis. However, aspect-based sentiment analysis (ABSA) has not been explored in the Japanese language, even though it has broad scope in many natural language processing applications, such as 1) tracking sentiment towards products, movies, politicians, etc., and 2) improving customer relation models. The main reason is that no standard Japanese dataset is available for the ABSA task. In this paper, we present the first standard Japanese dataset, covering the hotel review domain. The proposed dataset contains 53,192 review sentences with seven aspect categories and two polarity labels. We perform experiments on this dataset using popular ABSA approaches and report an error analysis. Our experiments show that contextual models such as BERT work very well for the ABSA task in Japanese, and our error analysis also highlights the need to address other NLP tasks for better performance.
2020
ILP-based Opinion Sentence Extraction from User Reviews for Question DB Construction
Masakatsu Hamashita | Takashi Inui | Koji Murakami | Keiji Shinzato
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
2016
Large-scale Multi-class and Hierarchical Product Categorization for an E-commerce Giant
Ali Cevahir | Koji Murakami
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
In order to organize the large number of products listed on e-commerce sites, each product is usually assigned to one of the multi-level categories in a taxonomy tree. Selecting the proper category from thousands of options is a time-consuming and difficult task for merchants. In this work, we propose an automatic classification tool that predicts the matching category for a given product title and description. We use a combination of two different neural models, deep belief nets and deep autoencoders, for both titles and descriptions. During training, we implement a selective reconstruction approach for the input layer in order to scale out to large, sparse feature vectors, and we utilize GPUs to train the neural networks in a reasonable time. We trained our models on around 150 million products with a taxonomy tree of at most five levels containing 28,338 leaf categories. Tests with millions of products show that our first predictions match 81% of merchants' assignments when "others" categories are excluded.
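The "selective reconstruction" trick for large sparse inputs can be illustrated in miniature: instead of reconstructing the full output layer, compute the reconstruction error only over the active input dimensions plus a small random sample of inactive ones. The vector sizes, sampling rate, and dummy reconstruction below are illustrative assumptions, not the paper's settings.

```python
import random

rng = random.Random(0)

def selected_indices(x, n_negative=4):
    """Active dims of sparse vector x, plus a sample of inactive dims."""
    active = [i for i, v in enumerate(x) if v != 0.0]
    inactive = [i for i, v in enumerate(x) if v == 0.0]
    sampled = rng.sample(inactive, min(n_negative, len(inactive)))
    return active + sampled

def selective_loss(x, x_hat, idx):
    """Mean squared reconstruction error restricted to the selected dims."""
    return sum((x[i] - x_hat[i]) ** 2 for i in idx) / len(idx)

x = [0.0] * 1000
for i in (3, 42, 777):    # a sparse bag-of-words style input
    x[i] = 1.0
x_hat = [0.1] * 1000      # a dummy reconstruction from the decoder
idx = selected_indices(x)
print(len(idx), selective_loss(x, x_hat, idx))
```

The payoff is that the cost per training example scales with the number of active features rather than the full (here 1000-dimensional, in practice much larger) input size.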
2011
Recognizing Confinement in Web Texts
Megumi Ohki | Eric Nichols | Suguru Matsuyoshi | Koji Murakami | Junta Mizuno | Shouko Masuda | Kentaro Inui | Yuji Matsumoto
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)
Safety Information Mining — What can NLP do in a disaster —
Graham Neubig | Yuichiroh Matsubayashi | Masato Hagiwara | Koji Murakami
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
Automatic Classification of Semantic Relations between Facts and Opinions
Koji Murakami | Eric Nichols | Junta Mizuno | Yotaro Watanabe | Hayato Goto | Megumi Ohki | Suguru Matsuyoshi | Kentaro Inui | Yuji Matsumoto
Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010)
Annotating Event Mentions in Text with Modality, Focus, and Source Information
Suguru Matsuyoshi | Megumi Eguchi | Chitose Sao | Koji Murakami | Kentaro Inui | Yuji Matsumoto
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Many natural language processing tasks, including information extraction, question answering and recognizing textual entailment, require analysis of the polarity, focus of polarity, tense, aspect, mood and source of the event mentions in a text, in addition to predicate-argument structure analysis. We refer to modality, polarity and other associated information as extended modality. In this paper, we propose a new annotation scheme for representing the extended modality of event mentions in a sentence. Our extended modality consists of the following seven components: Source, Time, Conditional, Primary modality type, Actuality, Evaluation and Focus. We reviewed the literature on extended modality in Linguistics and Natural Language Processing (NLP) and defined appropriate labels for each component. In the proposed annotation scheme, the extended modality information of an event mention is summarized at its core predicate for immediate use in NLP applications. We also report on the current progress of our manual annotation of a Japanese corpus of about 50,000 event mentions, showing reasonably high inter-annotator agreement.
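The seven-component scheme described in the abstract can be pictured as a simple per-predicate record. The field names below follow the components listed in the abstract, but the example values and label inventories are invented for illustration; the actual label sets are defined in the paper.

```python
from dataclasses import dataclass

@dataclass
class ExtendedModality:
    """Toy record mirroring the seven extended-modality components."""
    source: str        # who the event is attributed to (e.g., "writer")
    time: str          # temporal status of the event (e.g., "past")
    conditional: bool  # whether the event holds only under a condition
    primary_type: str  # primary modality type (e.g., "assertion")
    actuality: str     # whether the event actually occurred
    evaluation: str    # speaker's evaluation of the event
    focus: str         # focus of the polarity/modality

# One hypothetical annotation, attached to the core predicate of an
# event mention as the scheme prescribes.
ann = ExtendedModality(source="writer", time="past", conditional=False,
                       primary_type="assertion", actuality="actual",
                       evaluation="neutral", focus="predicate")
print(ann.primary_type)
```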
2009
Annotating Semantic Relations Combining Facts and Opinions
Koji Murakami | Shouko Masuda | Suguru Matsuyoshi | Eric Nichols | Kentaro Inui | Yuji Matsumoto
Proceedings of the Third Linguistic Annotation Workshop (LAW III)
2002
Evaluation of Direct Speech Translation Method Using Inductive Learning for Conversations in the Travel Domain
Koji Murakami | Makoto Hiroshige | Kenji Araki | Koji Tochinai
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems