Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Updated Oct 16, 2025
Very fast, accurate speaker diarization
A collection of no-cost AI websites offering models such as Claude 4 Sonnet/Opus, Grok 4, o3 Pro, and Gemini 2.5 Pro, and much more...
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"
text-to-audio-latent-diffusion
Code to train a custom time-domain autoencoder to dereverberate audio
A deep learning-based Speech Emotion Recognition (SER) model trained primarily on Indian languages. Designed for applications in call centers, sentiment analysis, and accessibility tools.
Safe, production-ready starter for voice cloning via SV2TTS (RTVC wrapper). CLI, tests, Docker, CI, pre-commit. No model weights included.
Guide to deploying neural networks in VST plugins, with a specific focus on embedded devices using the Elk Audio OS
🗣️ Audio AI: Your Audio & Video Transcription Powerhouse!
Whether it’s text or a link, it can be turned into a podcast!
⚡ Accelerate speaker diarization with Senko, which processes 1 hour of audio in just 5 seconds on powerful hardware.
PodcastAgent uses advanced text-to-speech technology to create natural-sounding multi-speaker podcasts from any written content.
A conceptual voice-to-prompt pipeline that attempts to separate instructions from the provided context for better results