2024
Learning to Maximize Mutual Information for Chain-of-Thought Distillation
Xin Chen | Hanxian Huang | Yanjun Gao | Yi Wang | Jishen Zhao | Ke Ding
Findings of the Association for Computational Linguistics: ACL 2024
Knowledge distillation, the technique of transferring knowledge from large, complex models to smaller ones, marks a pivotal step towards efficient AI deployment. Distilling Step-by-Step (DSS), a novel method utilizing chain-of-thought (CoT) distillation, has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts. In DSS, the distilled model acquires the ability to generate rationales and predict labels concurrently through a multi-task learning framework. However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction. To this end, we investigate the mutual relationship of the two tasks from an Information Bottleneck perspective and formulate it as maximizing the mutual information between the representation features of the two tasks. We propose a variational approach to solve this optimization problem using a learning-based method. Our experimental results across four datasets demonstrate that our method outperforms the state-of-the-art DSS. Our findings offer insightful guidance for future research on language model distillation as well as applications involving CoT. Code is available at https://github.com/xinchen9/cot_distillation_ACL2024.
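As an illustration of the mutual-information objective described in this abstract, the sketch below uses an InfoNCE-style variational lower bound between the representation features of the two tasks; the tensor names z_rationale and z_label, the temperature, and the weight lambda_mi are illustrative assumptions, and the paper's actual variational formulation may differ.

import torch
import torch.nn.functional as F

def infonce_mi_lower_bound(z_rationale: torch.Tensor,
                           z_label: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style variational lower bound on the mutual information
    between two batches of task representations of shape [batch, dim]."""
    z_r = F.normalize(z_rationale, dim=-1)
    z_l = F.normalize(z_label, dim=-1)
    logits = z_r @ z_l.t() / temperature                     # pairwise similarities
    targets = torch.arange(z_r.size(0), device=z_r.device)   # matched pairs lie on the diagonal
    return -F.cross_entropy(logits, targets)                 # higher value = more shared information

# One plausible way to use the bound: subtract it (weighted by lambda_mi) from the
# multi-task distillation loss so that training maximizes it.
# loss = loss_label + loss_rationale - lambda_mi * infonce_mi_lower_bound(z_r, z_l)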
2022
AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees
Rong Liang | Tiehua Zhang | Yujie Lu | Yuze Liu | Zhen Huang | Xin Chen
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)
Using pre-trained language models to understand source code has attracted increasing attention from financial institutions owing to its great potential to uncover financial risks. However, there are several challenges in applying these language models directly to programming-language problems. For instance, the domain shift between natural language (NL) and programming language (PL) requires understanding semantic and syntactic information from different perspectives. To this end, we propose AstBERT, a pre-trained PL model aiming to better understand financial code using the abstract syntax tree (AST). Specifically, we collect a large volume of source code (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model with the help of code parsers, through which the AST information of the source code can be interpreted and integrated. We evaluate the proposed model on three tasks: code question answering, code clone detection, and code refinement. Experimental results show that AstBERT achieves promising performance on all three downstream tasks.
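As a toy illustration of injecting syntax-tree information into a code model (not AstBERT's actual encoding, which relies on its own Java and Python parsers), the snippet below linearizes a Python fragment into AST node-type tokens with the standard ast module; such tokens could be appended to the code tokens fed to the model.

import ast

def linearize_ast(source: str) -> str:
    """Parse Python source and return a flat sequence of AST node-type names."""
    tree = ast.parse(source)
    return " ".join(type(node).__name__ for node in ast.walk(tree))

snippet = "def pay(amount):\n    return amount * 1.05\n"
print(linearize_ast(snippet))  # e.g. Module FunctionDef arguments arg Return BinOp ...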
2021
基于人物特征增强的拟人句要素抽取方法研究(Research on Element Extraction of Personified Sentences Based on Enhanced Characters)
Jing Li (李婧) | Suge Wang (王素格) | Xin Chen (陈鑫) | Dian Wang (王典)
Proceedings of the 20th Chinese National Conference on Computational Linguistics
In appreciation-type questions of prose reading comprehension, the analysis of personified sentences is examined frequently. Existing work identifies and extracts only the tenor elements of a personified sentence, so element extraction remains incomplete; in particular, when a sentence contains multiple tenors, the correspondence between the personifying word and each tenor has to be determined. To address these problems, this paper proposes an element extraction method for personified sentences based on enhanced character features. The method uses domain-specific features to enrich the vector representation of a sentence and then applies a conditional random field model to identify the tenor and personifying-word elements. On this basis, a self-attention mechanism detects the relations between elements, and an element-synchronization mechanism and a relation-synchronization mechanism exchange information to update the inputs for element identification and relation detection. Comparative experiments on <tenor, personifying word> extraction on a self-built personification dataset show that the proposed model outperforms the other comparison models.
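A minimal sketch of the CRF tagging step described above, assuming the third-party pytorch-crf package and a hypothetical BIO tag set for tenors and personifying words; the feature-enhanced encoder and the synchronization mechanisms from the paper are omitted.

import torch
from torchcrf import CRF  # pip install pytorch-crf

tags = ["O", "B-TENOR", "I-TENOR", "B-PERS", "I-PERS"]   # assumed BIO scheme
crf = CRF(num_tags=len(tags), batch_first=True)

emissions = torch.randn(1, 6, len(tags))    # encoder outputs: [batch, seq_len, num_tags]
gold = torch.tensor([[1, 2, 0, 3, 4, 0]])   # gold tag ids for one sentence
loss = -crf(emissions, gold)                # negative log-likelihood for training
best_path = crf.decode(emissions)           # Viterbi decoding at inference time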
Jointly Identifying Rhetoric and Implicit Emotions via Multi-Task Learning
Xin Chen | Zhen Hai | Deyu Li | Suge Wang | Dian Wang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
2014
Automatic Assessment of the Speech of Young English Learners
Jian Cheng | Yuan Zhao D’Antilio | Xin Chen | Jared Bernstein
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications