Cross-modal retrieval is an important yet challenging task due to the semantic discrepancy between visual content and language. To measure the correlation between images and text, most existing research focuses on learning global or local correspondence, failing to explore fine-grained local-global alignment. To infer more accurate similarity scores, we introduce a novel High Order Semantic Alignment (HOSA) model that provides complementary and comprehensive semantic clues. Specifically, to jointly learn global and local alignment and emphasize local-global interaction, we employ the tensor-product (t-product) operation to reconstruct one modality's representation from the other modality's information in a common semantic space. This cross-modal reconstruction strategy significantly enhances inter-modal correlation learning in a fine-grained manner. Extensive experiments on two benchmark datasets validate that our model significantly outperforms several state-of-the-art baselines, especially in retrieving the most relevant results.
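The abstract does not spell out how the t-product is instantiated inside HOSA; the sketch below assumes the standard Fourier-domain definition of the t-product and uses purely illustrative region/word feature tensors and a placeholder pooling step, simply to show the shape of a cross-modal reconstruction built on this operation.

```python
import numpy as np

def t_product(A, B):
    """Tensor-tensor product (t-product) of 3-way tensors.

    A: (n1, n2, n3), B: (n2, n4, n3) -> C: (n1, n4, n3).
    Computed slice-wise in the Fourier domain along the third mode,
    which is equivalent to block-circulant matrix multiplication.
    """
    assert A.shape[1] == B.shape[0] and A.shape[2] == B.shape[2]
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum('ikn,kjn->ijn', A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

# Hypothetical cross-modal reconstruction: combine image-side and text-side
# features in a shared space via the t-product, then pool the interaction
# tensor into a similarity score (the actual HOSA scoring is not shown here).
img = np.random.randn(36, 64, 8)   # e.g., 36 image regions, 64-d, 8 channels
txt = np.random.randn(64, 20, 8)   # e.g., 64-d, 20 words, 8 channels
recon = t_product(img, txt)        # (36, 20, 8) region-word interaction tensor
score = recon.mean()               # placeholder pooling into one similarity score
print(recon.shape, float(score))
```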
Metaphor identification is usually formulated as a sequence labeling or a syntactically related word-pair classification problem. In this paper, we propose a novel formulation of metaphor identification as a relation extraction problem. We introduce metaphorical relations, links between a target span and a source-related span that are realized in sentences. Operating on spans lets us use more flexible and precise text units than single words for capturing the properties of the target and the source. We create a dataset for Chinese metaphorical relation extraction, with more than 4,200 sentences annotated with metaphorical relations, the corresponding target/source-related spans, and fine-grained span types. We develop a span-based end-to-end model for metaphorical relation extraction and demonstrate its effectiveness. We expect that metaphorical relation extraction can serve as a bridge connecting linguistic and conceptual metaphor processing. The dataset is at https://github.com/cnunlp/CMRE.
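The abstract leaves the model architecture unspecified; the following is a hypothetical sketch of a span-pair relation scorer in the general spirit of span-based extraction models. All layer names, sizes, and label sets are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class SpanPairScorer(nn.Module):
    """Minimal sketch of a span-pair relation scorer.

    Candidate spans are represented by their boundary token states;
    every (candidate, candidate) pair is scored for a metaphorical
    relation between a target span and a source-related span.
    """
    def __init__(self, hidden=256, num_labels=2):
        super().__init__()
        self.span_ffn = nn.Linear(2 * hidden, hidden)    # [start; end] -> span vector
        self.pair_ffn = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_labels)                # relation vs. no relation
        )

    def forward(self, token_states, spans):
        # token_states: (seq_len, hidden); spans: list of (start, end) token indices
        reps = torch.stack([
            self.span_ffn(torch.cat([token_states[s], token_states[e]]))
            for s, e in spans
        ])                                               # (num_spans, hidden)
        n = reps.size(0)
        pairs = torch.cat([reps.unsqueeze(1).expand(n, n, -1),
                           reps.unsqueeze(0).expand(n, n, -1)], dim=-1)
        return self.pair_ffn(pairs)                      # (num_spans, num_spans, num_labels)

# Usage with dummy encoder output for a 12-token sentence
scorer = SpanPairScorer()
logits = scorer(torch.randn(12, 256), [(0, 2), (4, 4), (7, 10)])
print(logits.shape)  # torch.Size([3, 3, 2])
```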
This paper proposes to adapt self-attention to the discourse level for modeling discourse elements in argumentative student essays. Specifically, we focus on two issues. First, we propose structural sentence positional encodings to explicitly represent sentence positions. Second, we propose to use inter-sentence attentions to capture sentence interactions and enhance sentence representation. We conduct experiments on two datasets: a Chinese dataset and an English dataset. We find that (i) sentence positional encoding leads to a large improvement in identifying discourse elements; (ii) a structural relative positional encoding of sentences proves to be the most effective; (iii) inter-sentence attention vectors are useful as a kind of sentence representation for identifying discourse elements.
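As a rough illustration of the two ingredients named above, the sketch below adds learned sentence-position embeddings to sentence vectors and applies inter-sentence multi-head attention. The paper's structural and relative encodings are not reproduced here, and all dimensions and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DiscourseSelfAttention(nn.Module):
    """Sketch of sentence-level self-attention with positional encodings.

    Sentence vectors (e.g., from a sentence encoder) are augmented with
    learned embeddings over sentence indices, then passed through
    multi-head attention so each sentence attends to all others.
    """
    def __init__(self, hidden=300, max_sents=60, heads=6):
        super().__init__()
        self.pos_emb = nn.Embedding(max_sents, hidden)       # absolute sentence positions
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)

    def forward(self, sent_vecs):
        # sent_vecs: (batch, num_sents, hidden)
        pos = torch.arange(sent_vecs.size(1), device=sent_vecs.device)
        x = sent_vecs + self.pos_emb(pos)                    # inject sentence positions
        out, weights = self.attn(x, x, x)                    # inter-sentence attention
        return out, weights                                  # enhanced sentence representations

# Two essays of 15 sentences each, encoded into 300-d vectors (dummy values)
model = DiscourseSelfAttention()
reps, attn = model(torch.randn(2, 15, 300))
print(reps.shape, attn.shape)  # (2, 15, 300) (2, 15, 15)
```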
This paper proposes a pre-training based automated Chinese essay scoring method. The method involves three components: weakly supervised pre-training, supervised cross-prompt fine-tuning, and supervised target-prompt fine-tuning. An essay scorer is first pre-trained on a large essay dataset covering diverse topics with coarse ratings, i.e., good and poor, which serve as a kind of weak supervision. The pre-trained essay scorer is then further fine-tuned on previously rated essays from existing prompts, which have the same score range as the target prompt and provide extra supervision. Finally, the scorer is fine-tuned on the target-prompt training data. The evaluation on four prompts shows that this method improves a state-of-the-art neural essay scorer in terms of effectiveness and domain adaptation ability, while in-depth analysis also reveals its limitations.
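Below is a minimal sketch of the three-stage schedule described above, with a placeholder scorer, toy feature tensors, and assumed loss choices (binary cross-entropy for the coarse good/poor labels, mean squared error for numeric scores); the actual system would use a neural essay encoder and real essay/score data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_stage(scorer, loader, loss_fn, epochs=2, lr=1e-3):
    """One training stage: fit the scorer on whatever supervision is available."""
    opt = torch.optim.Adam(scorer.parameters(), lr=lr)
    for _ in range(epochs):
        for feats, labels in loader:
            opt.zero_grad()
            loss = loss_fn(scorer(feats).squeeze(-1), labels)
            loss.backward()
            opt.step()
    return scorer

# Placeholder scorer and data; sizes and score ranges are illustrative only.
scorer = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
weak   = DataLoader(TensorDataset(torch.randn(512, 128),
                                  torch.randint(0, 2, (512,)).float()), batch_size=32)
cross  = DataLoader(TensorDataset(torch.randn(256, 128),
                                  torch.rand(256) * 10), batch_size=32)
target = DataLoader(TensorDataset(torch.randn(64, 128),
                                  torch.rand(64) * 10), batch_size=16)

# Stage 1: weakly supervised pre-training on coarse good/poor labels.
scorer = train_stage(scorer, weak, nn.BCEWithLogitsLoss())
# Stage 2: supervised cross-prompt fine-tuning on essays sharing the score range.
scorer = train_stage(scorer, cross, nn.MSELoss())
# Stage 3: supervised target-prompt fine-tuning.
scorer = train_stage(scorer, target, nn.MSELoss())
```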