User profiles matching "Taiqiang Wu"

Taiqiang Wu

University of Hong Kong | Tsinghua University
Verified email at connect.hku.hk
Cited 119 times

Riformer: Keep your vision backbone effective but removing token mixer

J Wang, S Zhang, Y Liu, T Wu, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are …

Rethinking kullback-leibler divergence in knowledge distillation for large language models

T Wu, C Tao, J Wang, R Yang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress
Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler …

Tencentpretrain: A scalable and flexible toolkit for pre-training models of different modalities

…, N Sun, H Liu, W Mao, H Guo, W Guo, T Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, the success of pre-training in the text domain has been fully extended to vision, audio,
and cross-modal scenarios. The proposed pre-training models of different modalities are …

Adapting llama decoder to vision transformer

J Wang, W Shao, M Chen, C Wu, Y Liu, T Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
This work examines whether decoder-only Transformers such as LLaMA, which were originally
designed for large language models (LLMs), can be adapted to the computer vision field. …

Modeling fine-grained information via knowledge-aware hierarchical graph for zero-shot entity retrieval

T Wu, X Bai, W Guo, W Liu, S Li, Y Yang - Proceedings of the Sixteenth …, 2023 - dl.acm.org
Zero-shot entity retrieval, aiming to link mentions to candidate entities under the zero-shot
setting, is vital for many tasks in Natural Language Processing. Most existing methods …

Syngen: A syntactic plug-and-play module for generative aspect-based sentiment analysis

C Yu, T Wu, J Li, X Bai, Y Yang - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Aspect-based Sentiment Analysis (ABSA) is a fine-grained sentiment analysis task.
Recently, generative frameworks have attracted increasing attention in ABSA due to their …

Edge-free but structure-aware: Prototype-guided knowledge distillation from gnns to mlps

T Wu, Z Zhao, J Wang, X Bai, L Wang, N Wong… - arXiv preprint arXiv …, 2023 - arxiv.org
Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer
perceptrons (MLPs) on graph tasks has become a hot research topic. However, MLPs rely …

Weight-inherited distillation for task-agnostic bert compression

T Wu, C Hou, S Lao, J Li, N Wong, Z Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge Distillation (KD) is a predominant approach for BERT compression. Previous KD-based
methods focus on designing extra alignment losses for the student model to mimic …

A survey on the honesty of large language models

S Li, C Yang, T Wu, C Shi, Y Zhang, X Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Honesty is a fundamental principle for aligning large language models (LLMs) with human
values, requiring these models to recognize what they know and don't know and be able to …

Riformer: Keep your vision backbone effective while removing token mixer

J Wang, S Zhang, Y Liu, T Wu, Y Yang, X Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are …