[go: up one dir, main page]

Youmi Ma


2024

pdf bib
Building a Japanese Document-Level Relation Extraction Dataset Assisted by Cross-Lingual Transfer
Youmi Ma | An Wang | Naoaki Okazaki
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Document-level Relation Extraction (DocRE) is the task of extracting all semantic relationships from a document. While studies have been conducted on English DocRE, limited attention has been given to DocRE in non-English languages. This work delves into effectively utilizing existing English resources to promote DocRE studies in non-English languages, with Japanese as the representative case. As an initial attempt, we construct a dataset by transferring an English dataset to Japanese. However, models trained on such a dataset are observed to suffer from low recalls. We investigate the error cases and attribute the failure to different surface structures and semantics of documents translated from English and those written by native speakers. We thus switch to explore if the transferred dataset can assist human annotation on Japanese documents. In our proposal, annotators edit relation predictions from a model trained on the transferred dataset. Quantitative analysis shows that relation recommendations suggested by the model help reduce approximately 50% of the human edit steps compared with the previous approach. Experiments quantify the performance of existing DocRE models on our collected dataset, portraying the challenges of Japanese and cross-lingual DocRE.

2023

pdf bib
DREEAM: Guiding Attention with Evidence for Improving Document-Level Relation Extraction
Youmi Ma | An Wang | Naoaki Okazaki
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Document-level relation extraction (DocRE) is the task of identifying all relations between each entity pair in a document. Evidence, defined as sentences containing clues for the relationship between an entity pair, has been shown to help DocRE systems focus on relevant texts, thus improving relation extraction. However, evidence retrieval (ER) in DocRE faces two major issues: high memory consumption and limited availability of annotations. This work aims at addressing these issues to improve the usage of ER in DocRE. First, we propose DREEAM, a memory-efficient approach that adopts evidence information as the supervisory signal, thereby guiding the attention modules of the DocRE system to assign high weights to evidence. Second, we propose a self-training strategy for DREEAM to learn ER from automatically-generated evidence on massive data without evidence annotations. Experimental results reveal that our approach exhibits state-of-the-art performance on the DocRED benchmark for both DocRE and ER. To the best of our knowledge, DREEAM is the first approach to employ ER self-training.

pdf bib
Generative Data Augmentation for Aspect Sentiment Quad Prediction
An Wang | Junfeng Jiang | Youmi Ma | Ao Liu | Naoaki Okazaki
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Aspect sentiment quad prediction (ASQP) analyzes the aspect terms, opinion terms, sentiment polarity, and aspect categories in a text. One challenge in this task is the scarcity of data owing to the high annotation cost. Data augmentation techniques are commonly used to address this issue. However, existing approaches simply rewrite texts in the training data, restricting the semantic diversity of the generated data and impairing the quality due to the inconsistency between text and quads. To address these limitations, we augment quads and train a quads-to-text model to generate corresponding texts. Furthermore, we designed novel strategies to filter out low-quality data and balance the sample difficulty distribution of the augmented dataset. Empirical studies on two ASQP datasets demonstrate that our method outperforms other data augmentation methods and achieves state-of-the-art performance on the benchmarks. The source code will be released upon acceptance.

pdf bib
Improving Cross-Lingual Transfer for Open Information Extraction with Linguistic Feature Projection
Youmi Ma | Bhushan Kotnis | Carolin Lawrence | Goran Glavaš | Naoaki Okazaki
Proceedings of the 3rd Workshop on Multi-lingual Representation Learning (MRL)

2022

pdf bib
Annotating Entity and Causal Relationships on Japanese Vehicle Recall Information
Hsuan-Yu Kuo | Youmi Ma | Naoaki Okazaki
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

pdf bib
Joint Entity and Relation Extraction Based on Table Labeling Using Convolutional Neural Networks
Youmi Ma | Tatsuya Hiraoka | Naoaki Okazaki
Proceedings of the Sixth Workshop on Structured Prediction for NLP

This study introduces a novel approach to the joint extraction of entities and relations by stacking convolutional neural networks (CNNs) on pretrained language models. We adopt table representations to model the entities and relations, casting the entity and relation extraction as a table-labeling problem. Regarding each table as an image and each cell in a table as an image pixel, we apply two-dimensional CNNs to the tables to capture local dependencies and predict the cell labels. The experimental results showed that the performance of the proposed method is comparable to those of current state-of-art systems on the CoNLL04, ACE05, and ADE datasets. Even when freezing pretrained language model parameters, the proposed method showed a stable performance, whereas the compared methods suffered from significant decreases in performance. This observation indicates that the parameters of the pretrained encoder may incorporate dependencies among the entity and relation labels during fine-tuning.

2020

pdf bib
Semi-Supervised Semantic Dependency Parsing Using CRF Autoencoders
Zixia Jia | Youmi Ma | Jiong Cai | Kewei Tu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Semantic dependency parsing, which aims to find rich bi-lexical relationships, allows words to have multiple dependency heads, resulting in graph-structured representations. We propose an approach to semi-supervised learning of semantic dependency parsers based on the CRF autoencoder framework. Our encoder is a discriminative neural semantic dependency parser that predicts the latent parse graph of the input sentence. Our decoder is a generative neural model that reconstructs the input sentence conditioned on the latent parse graph. Our model is arc-factored and therefore parsing and learning are both tractable. Experiments show our model achieves significant and consistent improvement over the supervised baseline.