📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
A nearly-live implementation of OpenAI's Whisper.
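For context, streaming wrappers like this typically build on the vanilla openai-whisper API, which only does whole-file transcription. A minimal sketch (assumes `pip install openai-whisper` and a local `audio.wav`, both illustrative):

```python
# Baseline (non-streaming) openai-whisper usage that near-live
# wrappers chunk and pipeline under the hood.
import whisper

model = whisper.load_model("base")       # download/load the "base" checkpoint
result = model.transcribe("audio.wav")   # hypothetical local audio file
print(result["text"])
```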
An optimized speech-to-text pipeline for the Whisper model, supporting multiple inference engines.
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers, and Sentence-Transformers, with full support for Optimum's hardware optimizations & quantization schemes.
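As a rough illustration, optimum-benchmark exposes a Python API along these lines (a sketch following its README; class names and fields may differ across versions):

```python
# Benchmark gpt2 latency/memory on the PyTorch backend with
# optimum-benchmark. Names follow the project README and may
# vary between releases.
from optimum_benchmark import (
    Benchmark, BenchmarkConfig, InferenceConfig, ProcessConfig, PyTorchConfig,
)

if __name__ == "__main__":
    config = BenchmarkConfig(
        name="pytorch_gpt2",
        launcher=ProcessConfig(),                           # run in an isolated process
        scenario=InferenceConfig(latency=True, memory=True),
        backend=PyTorchConfig(model="gpt2", device="cuda"),
    )
    report = Benchmark.launch(config)   # report holds latency/memory stats
    print(report)
```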
An OpenAI-compatible API for the TensorRT-LLM Triton backend.
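An OpenAI-compatible endpoint means standard clients work unchanged. A minimal sketch (host, port, and model name are assumptions, not taken from the project):

```python
# Query an OpenAI-compatible TensorRT-LLM endpoint with the
# official openai client. base_url and model are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
resp = client.chat.completions.create(
    model="llama-3-8b-instruct",   # whichever model the server has loaded
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```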
[Deep learning model serving framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching, streaming mode, and both Python and C++ APIs; rate-limitable, extensible, and high-performance. Helps users quickly deploy models to production and serve them via HTTP/RPC interfaces, as sketched below.
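A rough sketch of calling a model served over HTTP by such a framework (the port, route, and payload shape here are hypothetical; check the framework's docs for the real interface):

```python
# Hypothetical HTTP inference call against a deployed model.
import requests

resp = requests.post(
    "http://localhost:7080/grps/v1/infer/predict",  # assumed endpoint
    json={"str_data": "hello"},                     # assumed payload schema
    timeout=30,
)
print(resp.json())
```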
[grps integration with trtllm] A pure C++ high-performance OpenAI-style LLM service built with grps + TensorRT-LLM + Tokenizers.cpp, supporting chat and function-call modes, AI agents, distributed multi-GPU inference, multimodal input, and a Gradio chat UI.
This repository is AI Bootcamp material that consists of a workflow for LLMs.
Chat With RTX Python API
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.
An add-in for the new Outlook that adds LLM features (composition, summarization, Q&A). It uses a local LLM via NVIDIA TensorRT-LLM.
Accelerating LLM inference frameworks to make LLMs fly.
A TensorRT-LLM server with structured outputs (JSON), built with Rust.
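Structured-output servers commonly follow OpenAI's `response_format` convention, constraining generation to a JSON schema. A sketch of such a request (endpoint, model name, and schema are illustrative assumptions, not this project's documented API):

```python
# Request JSON-schema-constrained output from an OpenAI-style
# chat completions endpoint. All names here are assumptions.
import requests

payload = {
    "model": "llama-3-8b-instruct",
    "messages": [{"role": "user", "content": "Extract: 'Alice is 30.'"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
                "required": ["name", "age"],
            },
        },
    },
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])  # guaranteed-valid JSON string
```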
Whisper in TensorRT-LLM
Getting started with TensorRT-LLM using BLOOM as a case study
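For orientation, TensorRT-LLM's high-level Python API reduces this flow to a few lines (a sketch assuming a recent tensorrt_llm release; the tutorial itself may use the lower-level convert/build/run scripts instead):

```python
# Sketch of TensorRT-LLM's high-level LLM API applied to BLOOM.
# Model name and sampling values are illustrative; the engine is
# built on first load.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="bigscience/bloom-560m")  # builds a TensorRT engine under the hood
params = SamplingParams(temperature=0.8, max_tokens=64)
for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)
```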
LLM tutorial materials including, but not limited to, NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and NeMo Guardrails.
Whisper optimization for real-time applications.