Key-Sun Choi

Also published as: Key-sun Choi

2023

In this paper, we describe the design of our system and the various approaches we tried for Task B of MEDIQA-Chat 2023. The goal of Task B is to generate full clinical notes from doctor-patient consultation dialogues. This task poses several challenges, such as a lack of training data, long dialogue inputs, and the need to generate semi-structured clinical notes with section headings. To address these issues, we conducted various experiments and analyzed their results. We used the DialogLED model, pre-trained on long dialogue data, to handle long inputs, and we additionally pre-trained on other dialogue datasets to address the lack of training data. We also tried methods such as prompting and contrastive learning to handle sections. This paper provides insights into clinical note generation through an analysis of the experimental methods and results, and it suggests directions for future research.

2022

2020

Unsupervised Fact Checking by Counter-Weighted Positive and Negative Evidential Paths in A Knowledge Graph
Jiseong Kim | Key-sun Choi
Proceedings of the 28th International Conference on Computational Linguistics

Misinformation spreads across media, communities, and knowledge graphs on the Web, propagated not only by human agents but also by information extraction algorithms that extract factual statements from unstructured text to populate existing knowledge graphs. Traditional fact checking by experts or crowds increasingly struggles to keep pace with the volume of newly created misinformation on the Web. It is therefore necessary to enhance the computational ability to determine whether a given factual statement is truthful. We view this problem as a truth scoring task in a knowledge graph. We present a novel rule-based approach that finds positive and negative evidential paths in a knowledge graph for a given factual statement and calculates a truth score for the statement by an unsupervised ensemble of the paths found. For example, we can determine that the factual statement “United States is the birth place of Barack Obama” is truthful if the positive evidential path (Barack Obama, birthPlace, Hawaii) ∧ (Hawaii, country, United States) exists in a knowledge graph. Conversely, we can determine that the factual statement “Canada is the nationality of Barack Obama” is untruthful if the negative evidential path (Barack Obama, nationality, United States) ∧ (United States, ≠, Canada) exists in a knowledge graph. To evaluate our approach in a real-world setting, we constructed an evaluation dataset by labeling as truthful or untruthful the factual statements extracted from Wikipedia texts by a state-of-the-art BERT-based information extraction system. Our evaluation results show that our approach significantly outperforms state-of-the-art unsupervised approaches, by up to 0.12 AUC-ROC, and even outperforms the supervised approach, by up to 0.05 AUC-ROC, not only on our dataset but also on two different standard datasets.
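
The evidential-path idea can be illustrated with a toy sketch in Python. It is not the paper's method: the knowledge graph is a hypothetical in-memory set of triples, positive evidence is limited to the triple itself or a two-hop path whose first hop reuses the statement's predicate, negative evidence is generated only for a hypothetical set `FUNCTIONAL_PREDICATES`, and the simple `(pos - neg) / (pos + neg)` score stands in for the paper's counter-weighted unsupervised ensemble.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Path = Tuple[Triple, ...]

# Hypothetical assumption (not from the paper): predicates with a single value
# per subject, so a different stored value counts as counter-evidence.
FUNCTIONAL_PREDICATES = {"nationality"}


def positive_paths(kg: Set[Triple], s: str, p: str, o: str) -> List[Path]:
    """Supporting evidence: the triple itself, or (s, p, x) ∧ (x, r, o)."""
    paths: List[Path] = []
    if (s, p, o) in kg:
        paths.append(((s, p, o),))
    for (s1, p1, x) in kg:
        if s1 == s and p1 == p:
            for (x2, r, o2) in kg:
                if x2 == x and o2 == o:
                    paths.append(((s, p, x), (x, r, o)))
    return paths


def negative_paths(kg: Set[Triple], s: str, p: str, o: str) -> List[Path]:
    """Counter-evidence: (s, p, y) ∧ (y, ≠, o), only for functional predicates."""
    if p not in FUNCTIONAL_PREDICATES:
        return []
    return [((s, p, y), (y, "≠", o))
            for (s1, p1, y) in kg if s1 == s and p1 == p and y != o]


def truth_score(kg: Set[Triple], s: str, p: str, o: str) -> float:
    """Toy score in [-1, 1]; 0.0 when no evidence is found either way."""
    pos = len(positive_paths(kg, s, p, o))
    neg = len(negative_paths(kg, s, p, o))
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


if __name__ == "__main__":
    kg = {
        ("Barack Obama", "birthPlace", "Hawaii"),
        ("Hawaii", "country", "United States"),
        ("Barack Obama", "nationality", "United States"),
    }
    print(truth_score(kg, "Barack Obama", "birthPlace", "United States"))  # 1.0
    print(truth_score(kg, "Barack Obama", "nationality", "Canada"))  # -1.0
```

Restricting negative evidence to single-valued predicates is what keeps (Barack Obama, birthPlace, Hawaii) from counting against “United States is the birth place of Barack Obama”, since a birthplace can be named at different granularities.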

Information extraction from unstructured text plays a vital role in natural language processing. Although there has been extensive research into each information extraction task (e.g., entity linking, coreference resolution, and relation extraction), no data are available for a continuous and coherent evaluation of all information extraction tasks in a comprehensive framework. Because each task is performed and evaluated on a different dataset, it is impossible to analyze, with a single dataset, the effect of each task on the next throughout the information extraction process. This paper aims to establish a starting point for Korean information extraction and to promote research in this field by presenting crowdsourced data collected for four information extraction tasks from the same corpus, together with the training and evaluation results of a state-of-the-art model for each task. These machine learning data for Korean information extraction are the first of their kind, and we plan to continuously increase the data volume. The test results will serve as baseline results for each Korean information extraction task and are expected to serve as a point of comparison for future studies on Korean information extraction that use the data collected in this study.

Using current methods, the construction of multilingual resources in FrameNet is an expensive and complex task. While crowdsourcing is a viable alternative, it is difficult to include non-native English speakers in such efforts as they often have difficulty with English-based FrameNet tools. In this work, we investigated cross-lingual issues in crowdsourcing approaches for multilingual FrameNets, specifically in the context of the newly constructed Korean FrameNet. To accomplish this, we evaluated the effectiveness of various crowdsourcing settings whereby certain types of information are provided to workers, such as English definitions in FrameNet or translated definitions. We then evaluated whether the crowdsourced results accurately captured the meaning of frames both cross-culturally and cross-linguistically, and found that by allowing the crowd workers to make intuitive choices, they achieved a quality comparable to that of trained FrameNet experts (F1 > 0.75). The outcomes of this work are now publicly available as a new release of Korean FrameNet 1.1.

2018

Distant Supervision for Relation Extraction with Multi-sense Word Embedding
Sangha Nam | Kijong Han | Eun-Kyung Kim | Key-Sun Choi
Proceedings of the 9th Global Wordnet Conference

Distant supervision can automatically generate labeled data by aligning a large-scale corpus with a knowledge base, without human effort. Therefore, many studies have used the distant supervision approach for relation extraction. However, existing studies have a drawback: the word embeddings used as input to the relation extraction model do not reflect homographs. As a result, the relation extraction model learns without accurately grasping the meaning of each word. In this paper, we propose a relation extraction model with multi-sense word embeddings, which we learn using a word sense disambiguation module. In addition, we use convolutional neural network (CNN) and piecewise max-pooling CNN relation extraction models, which efficiently capture key features in sentences. To evaluate the proposed model, we trained two additional word embedding methods for comparison; our method achieved the highest performance among them.

The increased demand for structured knowledge has created considerable interest in extracting knowledge from natural language sentences. This study presents a new Korean knowledge extraction system and web interface for enriching KBox, a knowledge base that expands on the Korean DBpedia. The aim is to provide an endpoint where knowledge can be extracted and added to KBox anytime and anywhere.

Utilizing Graph Measure to Deduce Omitted Entities in Paragraphs
Eun-kyung Kim | Kijong Han | Jiho Kim | Key-Sun Choi
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

This demo addresses the problem of capturing omitted arguments in relation extraction, given a suitable knowledge base for the entities of interest. The paper introduces the concept of a salient entity and uses this information to deduce entities omitted in a paragraph, which improves relation extraction quality. The main idea for computing salient entities is to construct a graph from the given information (identifying the entities without parsing the text), rank it with standard graph measures, and embed it in the context of the sentences.

Semi-automatic Korean FrameNet Annotation over KAIST Treebank
Younggyun Hahm | Jiseong Kim | Sunggoo Kwon | Key-Sun Choi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Unsupervised Korean Word Sense Disambiguation using CoreNet
Kijong Han | Sangha Nam | Jiseong Kim | Younggyun Hahm | Key-Sun Choi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Automatic Wordnet Mapping: from CoreNet to Princeton WordNet
Jiseong Kim | Younggyun Hahm | Sunggoo Kwon | Key-Sun Choi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Incorporating Global Contexts into Sentence Embedding for Relational Extraction at the Paragraph Level with Distant Supervision
Eun-kyung Kim | Key-Sun Choi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)
Key-Sun Choi | Christina Unger | Piek Vossen | Jin-Dong Kim | Noriko Kando | Axel-Cyrille Ngonga Ngomo
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

Filling a Knowledge Graph with a Crowd
GyuHyeon Choi | Sangha Nam | Dongho Choi | Key-Sun Choi
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

SRDF: Extracting Lexical Knowledge Graph for Preserving Sentence Meaning
Sangha Nam | GyuHyeon Choi | Younggyun Hahm | Key-Sun Choi
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

In this paper, we present SRDF, an open information extraction system that generates lexical knowledge graphs from unstructured text. In the Semantic Web, knowledge is expressed in RDF triple form, but natural language text contains multiple relations among arguments. For this reason, we combine open information extraction with reification for full-text extraction, so that our knowledge graph preserves the meaning of sentences. Our knowledge graph is also designed to be compatible with many existing Semantic Web applications. Finally, we present experimental results and a Korean template generation module developed using SRDF.

QAF: Frame Semantics-based Question Interpretation
Younggyun Hahm | Sangha Nam | Key-Sun Choi
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

Natural language questions are interpreted as a sequence of patterns to be matched against instances of patterns in a knowledge base (KB) for answering. A natural language (NL) question answering (QA) system uses meaningful patterns that match the syntactic/lexical features of NL questions to the KB. Most KBs contain only binary relations in triple form, representing a relation between two entities or between an entity and a value using a domain-specific ontology. However, binary relation representation is not sufficient to cover the complex information in questions, and the ontology vocabulary sometimes does not cover the lexical meaning in questions. Complex meaning needs a knowledge representation that links the binary relation-type triples in the KB. In this paper, we propose a frame semantics-based semantic parsing approach as KB-independent question pre-processing. We propose requirements for question interpretation from the KBQA perspective, and a query form representation based on our proposed format, QAF (Question Answering with the Frame Semantics), which is intended to cover these requirements. In QAF, frame semantics serves as a model for representing complex information in questions and for disambiguating the lexical meaning in questions so that it matches the ontology vocabulary. Our system takes a question as input and outputs a QAF query by assigning the semantic information in the question to its corresponding frame semantic structure using semantic parsing rules.

Dedicated Workflow Management for OKBQA Framework
Jiseong Kim | GyuHyeon Choi | Key-Sun Choi
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

Nowadays, question answering (QA) systems are used in various areas such as quiz shows, personal assistants, and home devices. The OKBQA framework supports developing a QA system in intuitive and collaborative ways. To support collaborative development, the framework should be equipped with functions such as flexible system configuration, debugging support, and an intuitive user interface, while accommodating different development groups from different domains. This paper presents the OKBQA controller, a dedicated workflow manager for the OKBQA framework, to boost collaborative development of QA systems.

Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
Koiti Hasida | Kam-Fai Wong | Nicoletta Calzolari | Key-Sun Choi
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

Many newly created documents contain temporal information. Because temporal information is useful for various applications, it has become important to develop systems that extract it from documents. Before developing such a system, it is first necessary to define the structure of temporal information; in other words, it is necessary to design a language that defines how to annotate temporal information. There have been some studies on annotation languages, but most of them were applicable only to a specific target language (e.g., English). Thus, it is necessary to design an individual annotation language for each language. In this paper, we propose a revised version of the Korean Time Mark-up Language (K-TimeML) and introduce a dataset, named Korean TimeBank, that is constructed based on K-TimeML. We believe that the new K-TimeML and Korean TimeBank will support further research on the extraction of temporal information.

The Open Framework for Developing Knowledge Base And Question Answering System
Jiseong Kim | GyuHyeon Choi | Jung-Uk Kim | Eun-Kyung Kim | Key-Sun Choi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Developing a question answering (QA) system involves implementing and integrating modules built on different technologies and evaluating the integrated system as a whole, which inevitably requires collaboration among experts from different domains. To support easy collaboration, this demonstration presents an open framework that aims to support developing a QA system in collaborative and intuitive ways. The demonstration also shows a QA system developed with our framework.

Korean FrameNet Expansion Based on Projection of Japanese FrameNet
Jeong-uk Kim | Younggyun Hahm | Key-Sun Choi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

The FrameNet project began at Berkeley in 1997 and is now supported in several countries, reflecting the characteristics of each language. Korean FrameNet was initially generated by having trained translators convert annotated English sentences into Korean. However, the high cost of frame preservation and error revision was a heavy burden on further expansion of FrameNet. This study exploits the linguistic similarity between Japanese and Korean to expand the Korean FrameNet corpus at low cost. We also suggest adapting PubAnnotation and Korean-friendly valence patterns to FrameNet for increased accessibility.

MAGES: A Multilingual Angle-integrated Grouping-based Entity Summarization System
Eun-kyung Kim | Key-Sun Choi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

This demo presents MAGES (multilingual angle-integrated grouping-based entity summarization), an entity summarization system for a large knowledge base such as DBpedia, based on entity-group-bound ranking in a single integrated entity space across multiple language-specific editions. MAGES offers a multilingual angle-integrated space model, which has the advantage of overcoming missing semantic tags (i.e., categories) caused by biases in different language communities, and can contribute to the creation of entity groups within it that are well-formed and more stable than those in the monolingual setting. MAGES can help people quickly identify the essential points of entities when they search or browse a large volume of entity-centric data. Evaluation results on the same experimental data demonstrate that our system produces better summaries than other representative DBpedia entity summarization methods.

2015

Entity Linking Korean Text: An Unsupervised Learning Approach using Semantic Relations
Youngsik Kim | Key-Sun Choi
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

2014

Named Entity Corpus Construction using Wikipedia and DBpedia Ontology
Younggyun Hahm | Jungyeul Park | Kyungtae Lim | Youngsik Kim | Dosam Hwang | Key-Sun Choi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we propose a novel method for automatically building a named entity (NE) corpus based on the DBpedia ontology. Most named entity recognition (NER) systems require time- and effort-consuming annotation to produce training data, so work on NER has thus far been limited to certain languages, such as English, that are generally resource-abundant. As an alternative, we suggest that the NE corpus generated by our proposed method can be used as training data. Our approach uses Wikipedia as raw text and the DBpedia data set for named entity disambiguation. Our method is language-independent and easy to apply to the many languages for which Wikipedia and DBpedia are available. Throughout the paper, we demonstrate that our NE corpus is of comparable quality even to a manually annotated NE corpus.
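
One way to picture the Wikipedia-plus-DBpedia idea is a toy labeler that treats each internal wiki link as an entity mention and takes its NE tag from the linked article's DBpedia ontology class. This is a sketch under stated assumptions, not the paper's pipeline: the `DBPEDIA_TYPES` and `CLASS_TO_TAG` tables are hypothetical stand-ins for the DBpedia data set, tokenization is plain whitespace splitting, and no disambiguation beyond the link target is attempted.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lookup standing in for DBpedia: Wikipedia article -> ontology class.
DBPEDIA_TYPES: Dict[str, str] = {
    "Barack_Obama": "Person",
    "Honolulu": "City",
    "University_of_Hawaii": "University",
}
# Hypothetical mapping from DBpedia ontology classes to coarse NE tags.
CLASS_TO_TAG: Dict[str, str] = {"Person": "PER", "City": "LOC", "University": "ORG"}

# Matches [[Target]] and [[Target|anchor text]] wiki links.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def label_sentence(wikitext: str) -> List[Tuple[str, str]]:
    """Turn a wiki-linked sentence into (token, BIO tag) pairs."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    for m in LINK.finditer(wikitext):
        # Plain text before the link is tagged O.
        tokens += [(t, "O") for t in wikitext[pos:m.start()].split()]
        target, anchor = m.group(1), m.group(2) or m.group(1)
        # Look up the link target's class; unknown targets yield no tag.
        tag = CLASS_TO_TAG.get(DBPEDIA_TYPES.get(target.replace(" ", "_"), ""))
        for i, t in enumerate(anchor.split()):
            if tag is None:
                tokens.append((t, "O"))
            else:
                tokens.append((t, ("B-" if i == 0 else "I-") + tag))
        pos = m.end()
    tokens += [(t, "O") for t in wikitext[pos:].split()]
    return tokens


if __name__ == "__main__":
    print(label_sentence("[[Barack Obama]] was born in [[Honolulu]]."))
```

On the sample sentence, the sketch tags "Barack" and "Obama" as B-PER and I-PER, "Honolulu" as B-LOC, and every other token as O; a link whose target has no known class is left untagged rather than guessed.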

2009

2006

Compiling large language resources using lexical similarity metrics for domain taxonomy learning
Ronny Melz | Pum-Mo Ryu | Key-Sun Choi
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this contribution, we present a new methodology for compiling large language resources for domain-specific taxonomy learning. We describe the stages necessary to deal with the rich morphology of an agglutinative language, namely Korean, and present a second-order machine learning algorithm that uncovers term similarity in a given raw text corpus. The language resource compilation described here is part of a fully automatic top-down approach to constructing taxonomies without the human effort usually required.

Taxonomy Learning using Term Specificity and Similarity
Pum-Mo Ryu | Key-Sun Choi
Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge

2005

Automatic Partial Parsing Rule Acquisition Using Decision Tree Induction
Myung-Seok Choi | Chul Su Lim | Key-Sun Choi
Second International Joint Conference on Natural Language Processing: Full Papers

Classifying Chinese Texts in Two Steps
Xinghua Fan | Maosong Sun | Key-sun Choi | Qin Zhang
Second International Joint Conference on Natural Language Processing: Full Papers

An Ensemble of Grapheme and Phoneme for Machine Transliteration
Jong-Hoon Oh | Key-Sun Choi
Second International Joint Conference on Natural Language Processing: Full Papers

Automatic Extraction of English-Korean Translations for Constituents of Technical Terms
Jong-Hoon Oh | Key-Sun Choi
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

2004

Automatic clustering of collocation for detecting practical sense boundary
Saim Shin | Key-Sun Choi
Proceedings of the ACL Interactive Poster and Demonstration Sessions

A Statistical Model for Hangeul-Hanja Conversion in Terminology Domain
Jin-Xia Huang | Sun-Mee Bae | Key-sun Choi
Proceedings of the Third SIGHAN Workshop on Chinese Language Processing

Determining the Specificity of Terms based on Information Theoretic Measures
Pum-Mo Ryu | Key-Sun Choi
Proceedings of CompuTerm 2004: 3rd International Workshop on Computational Terminology

Semiautomatic Extension of CoreNet using a Bootstrapping Mechanism on Corpus-based Co-occurrences
Chris Biemann | Sa-Im Shin | Key-Sun Choi
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Lexical Analysis of Agglutinative Languages Using a Dictionary of Lemmas and Lexical Transducers
Sun-Mee Bae | Key-Sun Choi
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This paper presents a simple method for performing lexical analysis of agglutinative languages with rich morphology, such as Korean. In particular, for nouns and adverbs with regular morphological modifications and/or high productivity, we do not need to build huge dictionaries listing all inflected forms of lemmas. To construct a dictionary of lemmas and lexical transducers, we first automatically construct a dictionary of all inflected forms from the KAIST POS-Tagged Corpus. Second, we separate each form into its lemma part and its sequence of inflectional suffixes. Third, we describe lexical transducers (i.e., morphological rules) that recognize all inflected forms of lemmas for nouns and adverbs according to the combinatorial restrictions between lemmas and their inflectional suffixes. Finally, we evaluate the advantages of this method.
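
The lemma-plus-suffix design can be sketched as a small finite-state analyzer in Python. Everything here is hypothetical and far smaller than the paper's resources: `LEMMAS` is a three-entry toy dictionary, and `TRANSITIONS` is a hand-written transducer over a few Korean particles (에서 'at', 에 'to/at', 는 topic, 도 'also') that encodes one combinatorial restriction, namely that a case particle may precede an auxiliary particle but not follow it. Phonologically conditioned allomorphs, which real Korean suffix sequences involve, are ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lemma dictionary (lemma -> POS); a real system would derive
# lemmas from a POS-tagged corpus such as the KAIST POS-Tagged Corpus.
LEMMAS: Dict[str, str] = {"학교": "NOUN", "사람": "NOUN", "빨리": "ADV"}

# Hypothetical suffix transducer: (state, suffix) -> next state.
# A case particle (에서, 에) may be followed by an auxiliary particle (는, 도),
# but nothing may follow an auxiliary particle.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("NOUN", "에서"): "CASE", ("NOUN", "에"): "CASE",
    ("NOUN", "는"): "AUX", ("NOUN", "도"): "AUX",
    ("CASE", "는"): "AUX", ("CASE", "도"): "AUX",
    ("ADV", "는"): "AUX", ("ADV", "도"): "AUX",
}
FINAL_STATES = {"NOUN", "ADV", "CASE", "AUX"}


def analyze(word: str) -> List[Tuple[str, List[str]]]:
    """Return every (lemma, suffix sequence) split accepted by the transducer."""
    results: List[Tuple[str, List[str]]] = []
    for i in range(1, len(word) + 1):
        lemma = word[:i]
        if lemma in LEMMAS:
            for suffixes in _match(LEMMAS[lemma], word[i:]):
                results.append((lemma, suffixes))
    return results


def _match(state: str, rest: str) -> List[List[str]]:
    """Consume `rest` as a suffix sequence, starting from `state`."""
    if not rest:
        return [[]] if state in FINAL_STATES else []
    out: List[List[str]] = []
    for (src, suffix), dst in TRANSITIONS.items():
        if src == state and rest.startswith(suffix):
            for tail in _match(dst, rest[len(suffix):]):
                out.append([suffix] + tail)
    return out


if __name__ == "__main__":
    print(analyze("학교에서는"))  # [('학교', ['에서', '는'])]
    print(analyze("학교는에서"))  # [] (auxiliary particle before case particle)
```

Storing only lemmas and suffix transitions keeps the dictionary small: every noun in `LEMMAS` automatically accepts all suffix sequences the transducer allows, instead of each inflected form being listed separately.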

2003

Virtual Linked Lexical Knowledge Base for Causality Reasoning
Key-Sun Choi
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

Question-Answering Based on Virtually Integrated Lexical Knowledge Base
Key-Sun Choi | Jae-Ho Kim | Masaru Miyazaki | Jun Goto | Yeun-Bae Kim
Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages

2002

Implicit Ambiguity Resolution Using Incremental Clustering in Korean-to-English Cross-Language Information Retrieval
Kyung-Soon Lee | Kyo Kageura | Key-Sun Choi
COLING 2002: The 19th International Conference on Computational Linguistics

Unsupervised Named Entity Classification Models and their Ensembles
Jae-Ho Kim | In-Ho Kang | Key-Sun Choi
COLING 2002: The 19th International Conference on Computational Linguistics

Word Sense Disambiguation using Static and Dynamic Sense Vectors
Jong-Hoon Oh | Key-Sun Choi
COLING 2002: The 19th International Conference on Computational Linguistics

An English-Korean Transliteration Model Using Pronunciation and Contextual Rules
Jong-Hoon Oh | Key-Sun Choi
COLING 2002: The 19th International Conference on Computational Linguistics

Word Sense Disambiguation with Information Retrieval Technique
Jong-Hoon Oh | Saim Shin | Yong-Seok Choi | Key-Sun Choi
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

Proceedings of the 16th Pacific Asia Conference on Language, Information and Computation
Ik-Hwan Lee | Yong-Beom Kim | Key-Sun Choi | Minhaeng Lee
Proceedings of the 16th Pacific Asia Conference on Language, Information and Computation

A Korean Noun Semantic Hierarchy (Wordnet) Construction
Juho Lee | Koaunghi Un | Hee-Sook Bae | Key-Sun Choi
Proceedings of the 16th Pacific Asia Conference on Language, Information and Computation

A test suite for evaluation of English-to-Korean machine translation systems
Sungryong Koh | Jinee Maeng | Ji-Young Lee | Young-Sook Chae | Key-Sun Choi
Proceedings of Machine Translation Summit VIII

This paper describes KORTERM’s test suite and its practicability. The test sets have been constructed on the basis of a fine-grained classification of linguistic phenomena in order to evaluate the technical status of English-to-Korean MT systems systematically. The suite consists of about 5,000 test sets and is growing. Each test set contains an English sentence, a model Korean translation, a linguistic phenomenon category, and a yes/no question about the linguistic phenomenon. Two commercial systems were evaluated with the prepared yes/no questions. The overall accuracy rates of the two systems differed (50% vs. 66%). In addition, a comprehension test was carried out, which showed that one system was more comprehensible than the other. These results suggest that our test suite is practicable.

2000

Using Bilingual Semantic Information in Chinese-Korean Word Alignment
Jin-Xia Huang | Key-Sun Choi
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation

Automatic Transliteration and Back-transliteration by Decision Tree Learning
Byung-Ju Kang | Key-Sun Choi
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Design and Construction of Knowledge base for Verb using MRD and Tagged Corpus
Young-Soog Chae | Key-Sun Choi
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Introduction of KIBS (Korean Information Base System) Project
Young-Soog Chae | Key-Sun Choi
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Terminology in Korea: KORTERM
Key-Sun Choi | Young-Soog Chae
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Phrase-Pattern-based Korean to English Machine Translation using Two Level Translation Pattern Selection
Jung-jae Kim | Key-Sun Choi | Young-Soog Chae
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

Chinese-Korean Word Alignment Based on Linguistic Comparison
Jin-Xia Huang | Key-Sun Choi
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

Term Recognition Using Technical Dictionary Hierarchy
Jong-Hoon Oh | KyungSoon Lee | Key-Sun Choi
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

Dimension-Reduced Estimation of Word Co-occurrence Probability
Kilyoun Kim | Key-Sun Choi
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1999

Corpus-Based Approach for Nominal Compound Analysis for Korean Based on Linguistic and Statistical Information
Juntae Yoon | Key-Sun Choi | Mansuk Song
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

A pipelined multi-engine approach to Chinese-to-Korean machine translation: MATES/CK
Min Zhang | Key-Sun Choi
Proceedings of Machine Translation Summit VII

This paper presents MATES/CK, a Chinese-to-Korean machine translation system. We describe the design philosophy, component modules, implementation, and other aspects of the MATES/CK system.

Pipelined multi-engine Machine Translation: accomplishment of MATES/CK system
Min Zhang | Key-Sun Choi
Proceedings of the 8th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages