
This Python code performs efficient speech reverberation starting from a dataset of close-talking speech signals and a collection of acoustic impulse responses.

Python 95 25 Updated May 30, 2020

A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NIPS 2024]

Python 93 11 Updated Oct 12, 2024

More relighting!

Python 5,506 361 Updated Oct 27, 2024

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python 787 43 Updated Oct 23, 2024

Generation scripts for EARS-WHAM and EARS-Reverb

Python 23 3 Updated Sep 16, 2024

Speech enhancement, speech separation, sound source localization

1,048 221 Updated Nov 14, 2023

Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.

106 7 Updated Nov 10, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,209 112 Updated Jul 11, 2024

Audio Editor

C 12,606 2,264 Updated Nov 17, 2024
Python 15 1 Updated Jul 15, 2024

Code for calculating DNS_MOS.

Python 31 10 Updated Dec 18, 2022

A curated list of audio-visual learning methods and datasets.

231 17 Updated Oct 28, 2024

Facestar dataset. High quality audio-visual recordings of human conversational speech.

Python 104 6 Updated Mar 29, 2022

[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)

Python 403 36 Updated Feb 29, 2024

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

Python 316 45 Updated Oct 28, 2024

This is the official implementation of the SEMamba paper. (Accepted to IEEE SLT 2024)

Python 143 13 Updated Sep 9, 2024

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.

Python 1,110 412 Updated Jul 25, 2024

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 82 4 Updated Sep 19, 2024

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python 238 8 Updated Nov 12, 2024

[CVPR 2024] SinSR: Diffusion-Based Image Super-Resolution in a Single Step

Python 330 19 Updated Sep 12, 2024

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 97 10 Updated Nov 1, 2024

Official data preparation scripts for the URGENT 2024 Challenge

Python 66 5 Updated Aug 12, 2024

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS@2023 Spotlight, TPAMI@2024)

Python 944 50 Updated Sep 14, 2024

This repo hosts the code and models of "Masked Autoencoders that Listen".

Python 544 45 Updated Apr 5, 2024

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Python 6,175 756 Updated Sep 20, 2024

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Python 2,451 222 Updated Oct 14, 2024

Text-to-Audio/Music Generation

Python 2,305 180 Updated Sep 29, 2024

High-speed downloads from a mirror site using HuggingFace's official download tool.

Python 827 77 Updated Oct 12, 2024

Code for "A Large-scale Dataset for Audio-Language Representation Learning"

C 10 Updated Sep 18, 2024

🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps

Python 144 18 Updated Apr 23, 2024