User profiles matching "Taiqiang Wu"

Taiqiang Wu

University of Hong Kong | Tsinghua University
Verified email at connect.hku.hk
Cited 119 times

Riformer: Keep your vision backbone effective but removing token mixer

J Wang, S Zhang, Y Liu, T Wu, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are …

Rethinking kullback-leibler divergence in knowledge distillation for large language models

T Wu, C Tao, J Wang, R Yang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress
Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler …

Tencentpretrain: A scalable and flexible toolkit for pre-training models of different modalities

…, N Sun, H Liu, W Mao, H Guo, W Guo, T Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, the success of pre-training in the text domain has been fully extended to vision, audio,
and cross-modal scenarios. The proposed pre-training models of different modalities are …

Adapting llama decoder to vision transformer

J Wang, W Shao, M Chen, C Wu, Y Liu, T Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
This work examines whether decoder-only Transformers such as LLaMA, which were originally
designed for large language models (LLMs), can be adapted to the computer vision field. …

Modeling fine-grained information via knowledge-aware hierarchical graph for zero-shot entity retrieval

T Wu, X Bai, W Guo, W Liu, S Li, Y Yang - Proceedings of the Sixteenth …, 2023 - dl.acm.org
Zero-shot entity retrieval, aiming to link mentions to candidate entities under the zero-shot
setting, is vital for many tasks in Natural Language Processing. Most existing methods …

Syngen: A syntactic plug-and-play module for generative aspect-based sentiment analysis

C Yu, T Wu, J Li, X Bai, Y Yang - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Aspect-based Sentiment Analysis (ABSA) is a fine-grained sentiment analysis task.
Recently, generative frameworks have attracted increasing attention in ABSA due to their …

Edge-free but structure-aware: Prototype-guided knowledge distillation from gnns to mlps

T Wu, Z Zhao, J Wang, X Bai, L Wang, N Wong… - arXiv preprint arXiv …, 2023 - arxiv.org
Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer
perceptrons (MLPs) on graph tasks has become a hot research topic. However, MLPs rely …

Weight-inherited distillation for task-agnostic bert compression

T Wu, C Hou, S Lao, J Li, N Wong, Z Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge Distillation (KD) is a predominant approach for BERT compression. Previous KD-based
methods focus on designing extra alignment losses for the student model to mimic …

A survey on the honesty of large language models

S Li, C Yang, T Wu, C Shi, Y Zhang, X Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Honesty is a fundamental principle for aligning large language models (LLMs) with human
values, requiring these models to recognize what they know and don't know and be able to …

Riformer: Keep your vision backbone effective while removing token mixer

J Wang, S Zhang, Y Liu, T Wu, Y Yang, X Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are …