Stars
A one-stop repository for generative AI research updates, interview resources, notebooks and much more!
Community list of startups working with AI in audio and music technology
A powerful Python library for getting rich data from the Vietnam Stock Market using just a few lines of code
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
FlashSpeech: Efficient Zero-Shot Speech Synthesis
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
[ECCV 2022] AutoTransition: Learning to Recommend Video Transition Effects
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Unofficial PyTorch Implementation of UnivNet Vocoder (https://arxiv.org/abs/2106.07889)
Codebase for a BirdCLEF 2023 solution
Code for training and inference of the bark-voicecloning model.
Foundational model for human-like, expressive TTS
Versatile audio super resolution (any -> 48kHz) with AudioSR.
Inference and training library for high-quality TTS models.
Implementation of a TTS model based on the NVIDIA P-Flow TTS paper
Awesome speech/audio LLMs, representation learning, and codec models
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
[Findings of NAACL 2024] Source code of paper CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Instrument your FastAPI with Prometheus metrics.
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Zero-Shot Speech Editing and Text-to-Speech in the Wild
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
This repository provides some useful snippets that you may need in some situations.
Efficient few-shot learning with Sentence Transformers
Xrehman / StyleTTS
Forked from yl4579/StyleTTS
Official Implementation of StyleTTS
vits2 backbone with multilingual-bert