Converts course MDX/Markdown content into podcast-style conversational audio using a two-stage pipeline:
- Script Generation: Claude Haiku 4.5 generates engaging dialog (via Claude Code CLI)
- Audio Synthesis: Gemini 2.5 Flash TTS converts scripts to audio
- Two-speaker dialogue: Converts technical documentation into natural conversations between Alex (instructor) and Sam (senior engineer)
- Optimized for senior engineers: Professional, engaging, argument-driven content based on educational podcast best practices
- Separated concerns: Generate scripts first, then audio, so scripts can be edited manually and version-controlled before synthesis
- Multi-speaker TTS: Natural voice synthesis with distinct speaker voices (Kore/Charon)
- Automatic processing: Scans all course content and processes systematically
- Dual manifests: Scripts manifest + audio manifest for tracking
- Node.js 20+
- Claude Code CLI installed and authenticated (npm install -g @anthropic-ai/claude-code)
- Google Gemini API key for TTS
- Course content in the website/docs/ directory
npm install -g @anthropic-ai/claude-code
claude # Follow authentication prompts
export GOOGLE_API_KEY="your-api-key-here"
# OR
export GEMINI_API_KEY="your-api-key-here"
# OR
export GCP_API_KEY="your-api-key-here"
cd scripts
npm install
cd scripts
npm run generate-podcast
This runs both stages sequentially.
cd scripts
npm run generate-podcast-scripts
Output: Markdown scripts in scripts/output/podcasts/
- Version-controllable
- Manually editable
- Contains frontmatter with metadata
cd scripts
npm run generate-podcast-audio
Output: WAV files in website/static/audio/
- Reads saved scripts
- Multi-speaker synthesis
- Updates audio manifest
cd scripts
npm run generate-podcast-legacy
Runs the original single-stage script, which generates dialog inline without saving it.
Location: scripts/output/podcasts/
Structure:
output/podcasts/
├── manifest.json
├── intro.md
├── fundamentals/
│   ├── lesson-1-how-llms-work.md
│   └── lesson-2-how-agents-work.md
└── methodology/
    └── lesson-3-high-level-methodology.md
Script Format:
---
source: fundamentals/lesson-1-how-llms-work.md
speakers:
  - name: Alex
    role: Instructor
    voice: Kore
  - name: Sam
    role: Senior Engineer
    voice: Charon
generatedAt: 2025-11-01T12:34:56.789Z
model: claude-haiku-4.5
tokenCount: 5234
---
Alex: Let's dive into AI coding agents...
Sam: I've been using them for a few months now...

Location: website/static/audio/
Structure: Mirrors script directory structure
Manifest: website/static/audio/manifest.json
{
"fundamentals/lesson-1-how-llms-work.md": {
"audioUrl": "/audio/fundamentals/lesson-1-how-llms-work.wav",
"size": 1234567,
"format": "audio/wav",
"tokenCount": 5234,
"generatedAt": "2025-11-01T12:34:56.789Z",
"scriptSource": "fundamentals/lesson-1-how-llms-work.md"
}
}

- Content Discovery: Scans website/docs/ for .md/.mdx files
- Content Parsing: Strips frontmatter, JSX, code blocks
- Prompt Engineering: Builds optimized prompt for Haiku 4.5
- Dialog Generation: Calls Claude Code CLI in headless mode
- Script Output: Saves markdown with frontmatter to output/podcasts/
- Manifest Update: Updates script manifest
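
The content-parsing step above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in generate-podcast-script.js: the function name `stripForDialog` and the specific regular expressions are hypothetical, and a regex-based approach will also drop any prose line that happens to begin with `import` or `export`.

```javascript
// Hypothetical helper: reduce an MDX/Markdown document to plain prose.
// The fence marker is built at runtime so this sample nests cleanly in Markdown.
const FENCE = "`".repeat(3);
const CODE_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");

function stripForDialog(source) {
  return source
    // Drop a leading YAML frontmatter block delimited by --- lines.
    .replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "")
    // Drop fenced code blocks.
    .replace(CODE_BLOCK, "")
    // Drop MDX import/export statements that sit on their own line.
    .replace(/^(import|export)\s.*$/gm, "")
    // Drop JSX/HTML-style tags but keep the text between them.
    .replace(/<\/?[A-Za-z][^>]*>/g, "")
    // Collapse the runs of blank lines left behind.
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

A real implementation would more likely use a Markdown/MDX parser than regular expressions; the sketch only shows which parts of the source are discarded before prompting.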

- Script Discovery: Scans output/podcasts/ for markdown files
- Script Parsing: Extracts frontmatter and dialog
- Token Validation: Ensures dialog fits TTS limits
- Audio Synthesis: Calls Gemini 2.5 Flash TTS with multi-speaker config
- WAV Creation: Adds proper headers to PCM data
- Audio Output: Saves to website/static/audio/
- Manifest Update: Updates audio manifest
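
As an illustration of the script-parsing step, the sketch below splits a saved script into its frontmatter block and a list of speaker turns. The function name `parseScript`, the returned shape, and the handling of continuation lines are assumptions for this example; the actual parser in generate-podcast-audio.js may differ.

```javascript
// Hypothetical parser for a saved podcast script (frontmatter + "Name: text" lines).
function parseScript(text, speakers = ["Alex", "Sam"]) {
  // Separate the leading --- frontmatter block from the dialog body.
  const match = text.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/);
  const header = match ? match[1] : "";
  const body = match ? match[2] : text;

  const turns = [];
  for (const line of body.split(/\r?\n/)) {
    const m = line.match(/^([A-Za-z]+):\s*(.*)$/);
    if (m && speakers.includes(m[1])) {
      // A new turn starts whenever a line begins with a known speaker name.
      turns.push({ speaker: m[1], text: m[2] });
    } else if (line.trim() && turns.length > 0) {
      // Any other non-empty line continues the previous turn.
      turns[turns.length - 1].text += " " + line.trim();
    }
  }
  return { header, turns };
}
```

Restricting turns to known speaker names keeps sentences such as "Note: ..." inside a turn from being mistaken for a new speaker.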
- Dialog Generation: Claude Haiku 4.5 (via Claude Code CLI)
- TTS: gemini-2.5-flash-preview-tts (audio synthesis)
- Alex: "Kore" voice (firm, professional instructor)
- Sam: "Charon" voice (neutral, professional engineer)
- Script concurrency: 3 files at a time (Claude CLI calls)
- Audio concurrency: 3 files at a time (API rate limits)
- Token limits: 6,000-7,500 tokens per dialog (TTS API constraint)
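
To make the model and voice settings above concrete, here is a sketch of a multi-speaker request object in the shape documented for the Gemini API's speech generation. Whether the scripts use the @google/genai SDK, and the names `SPEAKERS` and `buildTtsConfig`, are assumptions for illustration; the sample dialog text is made up.

```javascript
// Speaker-to-voice mapping taken from the voices listed above.
const SPEAKERS = [
  { name: "Alex", voice: "Kore" },
  { name: "Sam", voice: "Charon" },
];

// Build the `config` part of a multi-speaker TTS request (hypothetical helper).
function buildTtsConfig(speakers) {
  return {
    responseModalities: ["AUDIO"],
    speechConfig: {
      multiSpeakerVoiceConfig: {
        speakerVoiceConfigs: speakers.map(({ name, voice }) => ({
          speaker: name, // must match the "Name:" prefixes used in the dialog
          voiceConfig: { prebuiltVoiceConfig: { voiceName: voice } },
        })),
      },
    },
  };
}

// Request object only; nothing is sent over the network in this sketch.
const request = {
  model: "gemini-2.5-flash-preview-tts",
  contents: [{ parts: [{ text: "Alex: Welcome back.\nSam: Glad to be here." }] }],
  config: buildTtsConfig(SPEAKERS),
};
```

With the @google/genai SDK, an object like this would typically be passed to `ai.models.generateContent(request)`, and the audio would come back as base64-encoded PCM data in the response.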
- Input: ~$0.25 per 1M tokens
- Output: ~$1.25 per 1M tokens
- Estimated per lesson: ~$0.01-0.05 (depends on content length)
- Audio output: $10.00 per 1M tokens
- Estimated per lesson: ~$0.05-0.10 (6k-7k tokens avg)
- Script generation: ~$0.50-1.00 total
- Audio synthesis: ~$0.60-1.20 total
- Combined: ~$1.10-2.20 total
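
The per-lesson audio estimate follows directly from the listed rate. The short calculation below is a sketch that counts only audio output tokens and ignores the much smaller cost of the input text, which is an assumption; the function name `audioCostUsd` is hypothetical.

```javascript
// Rate taken from the list above: $10.00 per 1M audio output tokens.
const AUDIO_USD_PER_MILLION_TOKENS = 10.0;

// Cost in USD for a given number of audio output tokens.
function audioCostUsd(tokens) {
  return (tokens / 1_000_000) * AUDIO_USD_PER_MILLION_TOKENS;
}

// Print the cost at the typical and maximum dialog sizes.
for (const tokens of [6000, 7000, 7500]) {
  console.log(`${tokens} tokens -> $${audioCostUsd(tokens).toFixed(3)}`);
}
```

At 6,000 to 7,500 tokens per dialog this lands at about $0.06 to $0.075 per lesson, inside the quoted $0.05-0.10 range. The $0.60-1.20 total would be consistent with roughly twelve lessons, although the lesson count is not stated here.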
Benefits of split pipeline:
- Regenerate audio without re-prompting LLM (saves script gen costs)
- Edit scripts manually before audio synthesis (reduces audio regeneration)
Repairs corrupted WAV files (adds missing headers to raw PCM data):
cd scripts
node fix-wav-files.js
This creates .bak backups and adds proper RIFF/WAV headers to headerless PCM files.
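
For reference, adding a header to raw PCM data amounts to prepending a 44-byte RIFF/WAVE header. The sketch below is not the code in fix-wav-files.js; the function names are hypothetical, and the default format (24 kHz, mono, 16-bit) is an assumption that should be checked against the actual TTS output.

```javascript
// True if the buffer already starts with a RIFF/WAVE header.
function hasWavHeader(buf) {
  return (
    buf.length >= 12 &&
    buf.toString("ascii", 0, 4) === "RIFF" &&
    buf.toString("ascii", 8, 12) === "WAVE"
  );
}

// Wrap raw PCM samples in a canonical 44-byte WAV header.
// Defaults are assumptions about the PCM format, not confirmed values.
function pcmToWav(pcm, { sampleRate = 24000, channels = 1, bitsPerSample = 16 } = {}) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);

  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);             // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);              // format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);     // size of the sample data

  return Buffer.concat([header, pcm]);
}
```

A repair pass could call `hasWavHeader` first and only wrap files that fail the check, which keeps the operation safe to run more than once.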
Set one of the environment variables: GOOGLE_API_KEY, GEMINI_API_KEY, or GCP_API_KEY
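
A lookup across the three variable names might look like the sketch below. The precedence order (first match in the order listed) and the function name `resolveApiKey` are assumptions; the actual script may check the variables differently.

```javascript
// Variable names from the setup instructions; the order here is an assumed precedence.
const KEY_NAMES = ["GOOGLE_API_KEY", "GEMINI_API_KEY", "GCP_API_KEY"];

// Return the first non-empty key found, or fail with an actionable message.
function resolveApiKey(env = process.env) {
  for (const name of KEY_NAMES) {
    const value = env[name];
    if (value && value.trim()) {
      return { name, value: value.trim() };
    }
  }
  throw new Error(`No API key found. Set one of: ${KEY_NAMES.join(", ")}`);
}
```

Returning the variable name alongside the value makes it easy to log which variable supplied the key without printing the key itself.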
- Ensure Claude Code CLI is installed: npm install -g @anthropic-ai/claude-code
- Verify it's in PATH: which claude (should return a path)
- Authenticate: Run claude and follow prompts
- Check permissions: Script uses --dangerously-skip-permissions flag
Run npm install in the scripts directory
Run npm run generate-podcast-scripts first before generating audio
The script was generated with too much content. Regenerate with stricter constraints or manually edit the script file to reduce length.
The audio generation script automatically adds proper WAV headers. If you have old files from the legacy script:
node fix-wav-files.js
The Gemini 2.5 Flash TTS model is in preview and may have some background noise in long generations (known issue as of October 2025).
- Script generation: Reduce concurrency in generate-podcast-script.js (currently 3)
- Audio generation: Reduce concurrency in generate-podcast-audio.js (currently 3)
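
For context on what the concurrency setting controls, the sketch below shows one common way to process files a few at a time. It is an illustrative pattern, not the code from either script; `mapWithConcurrency` and its parameters are hypothetical names.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the input items.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner repeatedly claims the next item until none remain.
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}
```

Lowering `limit` from 3 to 1 or 2 reduces the number of simultaneous CLI or API calls, which is the effect the suggestion above is after. In this sketch a rejected worker call rejects the overall promise, so a production version would likely add retry or per-item error handling.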
Script generation:
// In generate-podcast-script.js main(), after finding files:
const files = findMarkdownFiles(DOCS_DIR).slice(0, 1); // Test first file only

Audio generation:
// In generate-podcast-audio.js main(), after finding files:
const files = findScriptFiles(SCRIPT_INPUT_DIR).slice(0, 1); // Test first file only

# 1. Generate single script
cd scripts
# Edit generate-podcast-script.js to slice(0, 1)
npm run generate-podcast-scripts
# 2. Review output
cat output/podcasts/intro.md
# 3. Generate audio from that script
# Edit generate-podcast-audio.js to slice(0, 1)
npm run generate-podcast-audio
# 4. Test audio playback
open ../website/static/audio/intro.wav

- Scripts automatically skip CLAUDE.md files (project instructions)
- Requires .md or .mdx extension
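
The two filtering rules above can be expressed as a small predicate. This is a sketch under assumptions: the name `isCourseContentFile` is hypothetical, the CLAUDE.md match is treated as exact and case-sensitive, and the extension check is case-insensitive.

```javascript
// Decide whether a path should be treated as course content.
function isCourseContentFile(filePath) {
  // Take the final path segment, accepting both / and \ separators.
  const base = filePath.split(/[\\/]/).pop();
  // Project instruction files are skipped regardless of location.
  if (base === "CLAUDE.md") return false;
  // Only Markdown and MDX sources qualify.
  return /\.mdx?$/i.test(base);
}
```

A directory scan could apply this predicate to every path it finds, for example `paths.filter(isCourseContentFile)`.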
Scripts (output/podcasts/): Consider version-controlling these for:
- Manual editing capability
- Tracking prompt quality improvements
- Rollback if regeneration produces worse results
Audio files (website/static/audio/): Typically excluded from git due to size:
# Add to .gitignore if needed
website/static/audio/*.wav