This system leverages local Ollama LLMs and state-of-the-art TTS models for voice cloning.
| Agent | File | Purpose |
|---|---|---|
| Orchestrator | `orchestrator.py` | Master agent for pipeline coordination |
| Zero-Shot Cloning | `zero_shot_cloning.py` | Instant voice cloning with VoxCPM/F5-TTS |
| Quality Agent | `quality_agent.py` | Audio quality assessment & comparison |
| Ensemble Agent | `ensemble_agent.py` | Multi-model synthesis & selection |
| Colab Agent | `colab_agent.py` | Google Colab GPU orchestration |
| GCP Agent | `gcp_agent.py` | Google Cloud Platform deployment |
| RunPod Agent | `runpod_agent.py` | RunPod GPU pod management |
| NVIDIA API Agent | `nvidia_api_agent.py` | NVIDIA API integration |

| Model | Size | Best For |
|---|---|---|
| `qwen3:8b` | 5.2 GB | Reasoning & Planning |
| `deepseek-r1:7b` | 4.7 GB | Code & Technical |
| `qwen2.5:3b` | 1.9 GB | Fast, Simple Tasks |
| `phi3:14b` | 7.9 GB | Best Quality |
| `llama3.2:3b` | 2.0 GB | Balanced |
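
The model table can be read as a lookup from task type to Ollama model. The sketch below is one hypothetical way to encode it; the task keys (`"reasoning"`, `"code"`, and so on), the `pick_model` helper, and the "largest model that fits" fallback are assumptions made for illustration, not part of the project's code.

```python
# Names and sizes are copied from the model table. The task keys are
# hypothetical labels chosen for this sketch.
MODELS = {
    "qwen3:8b": {"size_gb": 5.2, "best_for": "reasoning"},
    "deepseek-r1:7b": {"size_gb": 4.7, "best_for": "code"},
    "qwen2.5:3b": {"size_gb": 1.9, "best_for": "fast"},
    "phi3:14b": {"size_gb": 7.9, "best_for": "quality"},
    "llama3.2:3b": {"size_gb": 2.0, "best_for": "balanced"},
}


def pick_model(task, max_size_gb=None):
    """Return the model tagged for `task` if it fits the size budget.

    Otherwise fall back to the largest model that fits (an assumption:
    larger is treated as better when no exact match is available).
    """
    def fits(name):
        return max_size_gb is None or MODELS[name]["size_gb"] <= max_size_gb

    for name, info in MODELS.items():
        if info["best_for"] == task and fits(name):
            return name

    candidates = [name for name in MODELS if fits(name)]
    if not candidates:
        raise ValueError(f"no model fits within {max_size_gb} GB")
    return max(candidates, key=lambda name: MODELS[name]["size_gb"])
```

For example, `pick_model("reasoning")` returns `qwen3:8b`, while `pick_model("quality", max_size_gb=5.0)` falls back to `deepseek-r1:7b` because `phi3:14b` exceeds the budget.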

```bash
python starconnect.py analyze --dataset ./StarConnect

python starconnect.py profile --dataset ./StarConnect --name MyVoice

python starconnect.py clone \
  --text "Bonjour, je suis votre assistant vocal." \
  --reference ./StarConnect/segment_001.wav \
  --ref-text "Bonjour, c'est Mani, producteur et manager."

python starconnect.py ensemble \
  --text "Bonjour, je suis votre assistant vocal." \
  --reference ./StarConnect/segment_001.wav \
  --language fr

python starconnect.py assess --audio ./output.wav

python starconnect.py assess \
  --audio ./cloned.wav \
  --compare ./original.wav
```

```
starconnect.py <command> [options]

Commands:
  analyze   - Analyze voice dataset with LLM
  select    - Select best reference samples
  profile   - Create voice profile
  clone     - Clone voice (single text)
  ensemble  - Use multi-model ensemble
  assess    - Assess audio quality
  batch     - Batch process multiple texts
  models    - List available TTS models
  ollama    - Check Ollama status
```
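
A command list like this maps naturally onto `argparse` subcommands. The following is a minimal sketch of that layout, not the actual `starconnect.py` source: the command names and help strings come from the list above, the `clone` and `analyze` options come from the usage examples, and everything else (function name, defaults, which options are optional) is assumed.

```python
import argparse

# Command names and descriptions taken from the usage text.
COMMANDS = {
    "analyze": "Analyze voice dataset with LLM",
    "select": "Select best reference samples",
    "profile": "Create voice profile",
    "clone": "Clone voice (single text)",
    "ensemble": "Use multi-model ensemble",
    "assess": "Assess audio quality",
    "batch": "Batch process multiple texts",
    "models": "List available TTS models",
    "ollama": "Check Ollama status",
}


def build_parser():
    """Build a parser with one subcommand per entry in COMMANDS."""
    parser = argparse.ArgumentParser(prog="starconnect.py")
    sub = parser.add_subparsers(dest="command", required=True)
    parsers = {name: sub.add_parser(name, help=text)
               for name, text in COMMANDS.items()}

    # Only options that appear in the usage examples are declared here;
    # the real CLI may accept more, and may mark some as required.
    parsers["analyze"].add_argument("--dataset")
    parsers["clone"].add_argument("--text")
    parsers["clone"].add_argument("--reference")
    parsers["clone"].add_argument("--ref-text")
    parsers["clone"].add_argument("--emotion")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"would run: {args.command}")
```

Keeping the command table in one dictionary means the help output and the dispatch logic cannot drift apart.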

```
┌─────────────────────────────────────────────────────────┐
│                      USER REQUEST                       │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                   ORCHESTRATOR AGENT                    │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Ollama LLM (qwen3:8b)                             │  │
│  │ - Task planning                                   │  │
│  │ - Model selection                                 │  │
│  │ - Quality reasoning                               │  │
│  └───────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────┘
                             │
           ┌─────────────────┼─────────────────┐
           ▼                 ▼                 ▼
   ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
   │ Zero-Shot     │ │ Ensemble      │ │ Quality       │
   │ Cloning       │ │ Agent         │ │ Agent         │
   │               │ │               │ │               │
   │ - VoxCPM      │ │ - VoxCPM      │ │ - SNR Analysis│
   │ - F5-TTS      │ │ - F5-TTS      │ │ - Similarity  │
   │ - Emotional   │ │ - XTTS        │ │ - LLM Report  │
   └───────┬───────┘ └───────┬───────┘ └───────┬───────┘
           │                 │                 │
           └─────────────────┼─────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      OUTPUT AUDIO                       │
│  - Cloned voice .wav                                    │
│  - Quality report                                       │
│  - LLM analysis                                         │
└─────────────────────────────────────────────────────────┘
```
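
The flow in the diagram (request, orchestrator, synthesis agent, quality agent, output) could be wired together roughly as follows. This is a sketch under stated assumptions: the `Orchestrator` and `PipelineResult` classes, their fields, and the callable signatures are hypothetical stand-ins, and the real agents in `orchestrator.py` may be structured quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PipelineResult:
    """The three outputs listed in the diagram."""
    audio_path: str
    quality_report: Dict[str, float]
    llm_analysis: str


@dataclass
class Orchestrator:
    # Hypothetical callables standing in for the real agents.
    # synthesizers: mode name -> function(text, reference) -> audio path
    synthesizers: Dict[str, Callable[[str, str], str]]
    # assess: function(audio path, reference) -> metric name -> value
    assess: Callable[[str, str], Dict[str, float]]
    # explain: function(report) -> LLM-written summary
    explain: Callable[[Dict[str, float]], str]

    def run(self, text: str, reference: str, mode: str = "zero_shot") -> PipelineResult:
        if mode not in self.synthesizers:
            raise ValueError(f"unknown synthesis mode: {mode}")
        audio_path = self.synthesizers[mode](text, reference)
        report = self.assess(audio_path, reference)
        return PipelineResult(audio_path, report, self.explain(report))
```

Passing the agents in as plain callables keeps the sketch testable with stubs, for example `Orchestrator({"zero_shot": lambda t, r: "out.wav"}, lambda a, r: {"snr_db": 30.0}, lambda rep: "ok")`.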
- Total Segments: 703 audio files
- Total Duration: ~64 minutes
- Language: French
- Speaker: Single speaker
- Format: WAV + JSON transcriptions
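
At roughly 64 minutes across 703 files, the average segment is about 5.5 seconds long (3840 s / 703). Figures like these can be re-derived from the dataset with the standard-library `wave` module. The helper below is an illustrative sketch; the function name is made up, and it assumes the segments are PCM WAV files that `wave` can open.

```python
import wave
from pathlib import Path


def dataset_duration_seconds(dataset_dir):
    """Return (file_count, total_seconds) for all .wav files under dataset_dir."""
    count, total = 0, 0.0
    for path in sorted(Path(dataset_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration = number of frames / frames per second.
            total += wav.getnframes() / wav.getframerate()
        count += 1
    return count, total


if __name__ == "__main__":
    count, total = dataset_duration_seconds("./StarConnect")
    if count:
        print(f"{count} files, {total / 60:.1f} min, {total / count:.2f} s average")
```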
Generate speech with automatic emotion detection:

```bash
python starconnect.py clone \
  --text "Je suis tellement content de te voir!" \
  --reference ./StarConnect/segment_001.wav \
  --emotion auto
```

Supported emotions: `neutral`, `happy`, `sad`, `angry`, `surprised`, `fearful`, `disgusted`, `professional`, `excited`

- Upload the dataset to Google Drive
- Open `notebooks/F5_TTS_Colab_Training.ipynb`
- Run the polling cell
- Trigger training from your local machine:

```bash
python agents/colab_agent.py trigger \
  --dataset /content/drive/MyDrive/f5_tts_datasets/starconnect_f5tts \
  --epochs 200
```

The quality agent calculates:
- SNR (Signal-to-Noise Ratio)
- Dynamic Range
- Clipping Detection
- Silence Ratio
- Intelligibility Score
- Naturalness Score
- Similarity Score (vs reference)
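
To make a few of these metrics concrete, here is a rough pure-Python sketch of clipping detection, silence ratio, and a crude SNR estimate. It is not the code in `quality_agent.py`: the function names, frame length, thresholds, and the "quietest frames are noise" heuristic are all assumptions, and samples are assumed to be floats normalized to the range -1.0 to 1.0.

```python
import math


def frame_rms(samples, frame_len=400):
    """Split samples into non-overlapping frames and return each frame's RMS."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames if f]


def clipping_ratio(samples, threshold=0.999):
    """Fraction of samples whose magnitude reaches the clipping threshold."""
    if not samples:
        return 0.0
    return sum(1 for x in samples if abs(x) >= threshold) / len(samples)


def silence_ratio(samples, frame_len=400, silence_db=-40.0):
    """Fraction of frames whose RMS level is below silence_db (dB full scale)."""
    rms = frame_rms(samples, frame_len)
    if not rms:
        return 0.0
    limit = 10 ** (silence_db / 20)
    return sum(1 for r in rms if r < limit) / len(rms)


def estimate_snr_db(samples, frame_len=400, noise_fraction=0.1):
    """Crude SNR estimate: the quietest frames are treated as the noise floor."""
    rms = sorted(frame_rms(samples, frame_len))
    if len(rms) < 2:
        raise ValueError("need at least two frames to estimate SNR")
    n_noise = max(1, int(len(rms) * noise_fraction))
    noise_power = sum(r * r for r in rms[:n_noise]) / n_noise
    signal_power = sum(r * r for r in rms[n_noise:]) / (len(rms) - n_noise)
    if noise_power == 0:
        return float("inf")
    return 10 * math.log10(signal_power / noise_power)
```

Intelligibility, naturalness, and similarity scores usually need a model (for example a speech recognizer or speaker-embedding network), so they are left out of this sketch.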

Related documentation:

- `STARCONNECT_TRAINING_GUIDE.md` - Colab training guide
- `F5_TTS_INTEGRATION.md` - F5-TTS documentation
- `GPU_ORCHESTRATION_GUIDE.md` - Cloud GPU options
- `NVIDIA_FREE_OPTIONS.md` - Free NVIDIA resources

- Orchestrator agent with LLM reasoning
- Zero-shot cloning agent
- Emotional TTS agent
- Audio quality assessment
- Multi-model ensemble
- Unified CLI
- Colab orchestration
- 12 Ollama models available
- 703 StarConnect segments processed

Built with cutting-edge 2026 AI techniques