User profiles matching "Taiqiang Wu"
Taiqiang Wu, University of Hong Kong | Tsinghua University. Verified email at connect.hku.hk. Cited 119 times.
Riformer: Keep your vision backbone effective but removing token mixer
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are …
Rethinking kullback-leibler divergence in knowledge distillation for large language models
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress
Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler …
Tencentpretrain: A scalable and flexible toolkit for pre-training models of different modalities
Recently, the success of pre-training in the text domain has been fully extended to vision, audio,
and cross-modal scenarios. The proposed pre-training models of different modalities are …
Adapting llama decoder to vision transformer
This work examines whether decoder-only Transformers such as LLaMA, which were originally
designed for large language models (LLMs), can be adapted to the computer vision field. …
Modeling fine-grained information via knowledge-aware hierarchical graph for zero-shot entity retrieval
Zero-shot entity retrieval, aiming to link mentions to candidate entities under the zero-shot
setting, is vital for many tasks in Natural Language Processing. Most existing methods …
Syngen: A syntactic plug-and-play module for generative aspect-based sentiment analysis
Aspect-based Sentiment Analysis (ABSA) is a fine-grained sentiment analysis task.
Recently, generative frameworks have attracted increasing attention in ABSA due to their …
Edge-free but structure-aware: Prototype-guided knowledge distillation from gnns to mlps
Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer
perceptrons (MLPs) on graph tasks has become a hot research topic. However, MLPs rely …
Weight-inherited distillation for task-agnostic bert compression
Knowledge Distillation (KD) is a predominant approach for BERT compression. Previous KD-based
methods focus on designing extra alignment losses for the student model to mimic …
A survey on the honesty of large language models
Honesty is a fundamental principle for aligning large language models (LLMs) with human
values, requiring these models to recognize what they know and don't know and be able to …