Starred repositories
[ICML 2024] Official PyTorch implementation of "SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization"
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
A high-throughput and memory-efficient inference and serving engine for LLMs
[TMLR 2024] Efficient Large Language Models: A Survey
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
LightSeq: A High Performance Library for Sequence Processing and Generation
The PyTorch implementation of Learned Step Size Quantization (LSQ) from ICLR 2020 (unofficial)
Official inference library for Mistral models
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
The official implementation of the ICML 2023 paper OFQ-ViT
A fast and user-friendly runtime for Transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
Transformer-related optimization, including BERT and GPT
Modeling, training, eval, and inference code for OLMo
Implementation of "DeepShift: Towards Multiplication-Less Neural Networks" https://arxiv.org/abs/1905.13298
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
[EMNLP 2022 main] Code for "Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders"
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
QLoRA: Efficient Finetuning of Quantized LLMs