Stars
This Python code performs efficient speech reverberation, starting from a dataset of close-talking speech signals and a collection of acoustic impulse responses.
A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurIPS 2024]
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Generation scripts for EARS-WHAM and EARS-Reverb
Speech enhancement, speech separation, and sound source localization
Papers, code, and resources for speech language models and end-to-end speech dialogue systems.
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
A curated list of audio-visual learning methods and datasets.
Facestar dataset. High quality audio-visual recordings of human conversational speech.
[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
This is the official implementation of the SEMamba paper. (Accepted to IEEE SLT 2024)
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
[CVPR 2024] SinSR: Diffusion-Based Image Super-Resolution in a Single Step
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
Official data preparation scripts for the URGENT 2024 Challenge
ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS@2023 Spotlight, TPAMI@2024)
This repo hosts the code and models of "Masked Autoencoders that Listen".
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
AudioLDM: Generate speech, sound effects, music and beyond, with text.
High-speed downloads from a mirror site using HuggingFace's official download tool.
Code for "A Large-scale Dataset for Audio-Language Representation Learning"
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps