An AI-powered voice chatbot that coaches salespeople using behavioral psychology principles from Cialdini, Voss, and Kahneman.
Features • Architecture • Quick Start • Documentation
The Behavioral Psychology Sales Coach listens to sales conversations in real-time, detects customer situations, and provides audio responses backed by evidence-based psychological principles. It combines:
- Voice-to-Voice AI powered by LFM2.5-Audio
- Semantic Situation Detection using Pinecone vector search
- 80+ Psychology Principles from influential sales and psychology books
- Real-time Coaching with explainable AI decisions
- Record customer audio from microphone or file upload
- Transcribe using LFM2.5-Audio ASR
- Detect the sales situation using semantic similarity
- Select the best psychological principle using multi-factor scoring
- Generate natural voice response with coaching explanation
- Display structured coaching output explaining why this principle was chosen
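The six steps above can be sketched as a single orchestration function. This is a minimal illustration with stub functions standing in for the real modules in `src/`; the names and return shapes are assumptions, not the project's actual API:

```python
# Minimal sketch of the coaching pipeline; all function names here are
# illustrative stand-ins, not the project's real module interfaces.
def coach_turn(audio_bytes: bytes) -> dict:
    transcript = transcribe(audio_bytes)           # LFM2.5-Audio ASR
    situation = detect_situation(transcript)       # semantic similarity
    principle = select_principle(situation)        # multi-factor scoring
    reply_audio, reply_text = generate_response(transcript, principle)
    return {
        "transcript": transcript,
        "situation": situation,
        "principle": principle,
        "coaching_text": reply_text,
        "audio": reply_audio,
    }

# Trivial stubs so the sketch runs end to end.
def transcribe(audio: bytes) -> str:
    return "that's too expensive"

def detect_situation(transcript: str) -> str:
    return "price_shock_in_store"

def select_principle(situation: str) -> str:
    return "anchoring"

def generate_response(transcript: str, principle: str):
    return b"...", f"Try {principle}: reframe the price."
```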
- ✅ Real-time audio recording with silence detection
- ✅ Voice transcription using LFM2.5-Audio
- ✅ Situation detection (keyword matching)
- ✅ Principle selection from 80+ psychology principles
- ✅ Voice response generation
- ✅ Structured coaching output (YAML)
- ✅ Modal GPU deployment with model caching
- ✅ Semantic Detection: Pinecone vector search replaces keyword matching
- ✅ Multi-Factor Scoring: Combines semantic relevance, recency penalty, stage fit, and randomization
- ✅ Warm Pool: Modal containers stay warm for sub-6s response times
- ✅ Streamlit UI: Web interface with microphone recording and file upload
- ✅ Debug Panel: Visualize situation detection and principle selection scores
- ✅ Conversation Context: Tracks turns, recent principles, and sales stage
- 🚧 Real-time coaching tips (~1.3s instead of ~6s)
- 🚧 Deep context tracking (customer profiles, stage progression)
- 🚧 Local Whisper for faster transcription (~0.5s)
```mermaid
graph TB
    A[User Record Audio] --> B[Upload to Modal Volume]
    B --> C[Modal GPU Server]
    C --> D[Transcribe with LFM2.5-Audio]
    D --> E[Embed Transcript]
    E --> F[Query Pinecone for Situations]
    F --> G[Detect Situation]
    G --> H[Score Principles]
    H --> I[Select Best Principle]
    I --> J[Generate Voice Response]
    J --> K[Return Audio + Coaching]
    K --> L[Display Coaching Output]
    K --> M[Play Audio Response]
```
```mermaid
graph LR
    subgraph "Local Client"
        A1[Audio Recorder]
        A2[Streamlit UI]
        A3[File Manager]
    end
    subgraph "Modal Cloud GPU"
        B1[Server]
        B2[LFM2.5-Audio Model]
        B3[Embedding Model]
    end
    subgraph "External Services"
        C1[Pinecone<br/>Vector DB]
        C2[HuggingFace<br/>Model Hub]
    end
    A1 --> A2
    A2 --> A3
    A3 --> B1
    B1 --> B2
    B1 --> B3
    B3 --> C1
    B2 --> C2
    B1 --> A3
```
```mermaid
sequenceDiagram
    participant U as User
    participant UI as Streamlit UI
    participant M as Modal Server
    participant P as Pinecone
    participant HF as HuggingFace
    U->>UI: Record/Upload Audio
    UI->>M: Upload audio.wav
    M->>M: Load LFM2.5-Audio Model
    M->>HF: Transcribe Audio (ASR)
    HF-->>M: Transcript
    M->>M: Embed Transcript (BGE-small)
    M->>P: Query Situations Namespace
    P-->>M: Top Situations + Scores
    M->>M: Detect Best Situation
    M->>P: Query Principles Namespace
    P-->>M: Candidate Principles
    M->>M: Score Principles<br/>(semantic + recency + stage)
    M->>M: Select Best Principle
    M->>HF: Generate Voice Response
    HF-->>M: Audio + Text
    M->>UI: Return Result
    UI->>U: Display Coaching + Play Audio
```
```mermaid
graph TD
    A[principles.json<br/>80+ Principles] --> B[Embed with BGE-small]
    C[situations.json<br/>50+ Situations] --> B
    B --> D[Pinecone Index]
    E[Customer Audio] --> F[Transcribe]
    F --> G[Embed Transcript]
    G --> H[Vector Search]
    D --> H
    H --> I[Detected Situation]
    I --> J[Scored Principles]
    J --> K[Selected Principle]
    K --> L[Voice Response]
```
- Python 3.11+
- Modal Account (Sign up - free tier includes $30/month)
- HuggingFace Account (Sign up)
- Pinecone Account (Sign up - free tier available)
```bash
git clone <repository-url>
cd liquid-audio-model

# Create virtual environment
python -m venv venv
source venv/bin/activate   # On macOS/Linux
# or: venv\Scripts\activate   # On Windows

# Install package
pip install -e .

# Or with uv (faster)
uv sync
```

- Get a token from HuggingFace Settings
- Accept model terms: LFM2.5-Audio-1.5B
- Create a Modal secret:

```bash
modal secret create huggingface-secret HF_TOKEN=hf_your_token_here
```

- Create an API key at the Pinecone Console
- Create a `.env` file:

```bash
cp .env.example .env
# Edit .env and add:
PINECONE_API_KEY=your_key_here
PINECONE_INDEX_NAME=sales-coach-embeddings
```

```bash
pip install modal
modal token new  # Opens browser for authentication
```

```bash
python scripts/populate_pinecone.py
```

This embeds all situations and principles and uploads them to Pinecone (~2 minutes).

```bash
modal deploy src/server.py
```

```bash
streamlit run streamlit_app/app.py
```

Open http://localhost:8501 in your browser.
```
liquid-audio-model/
├── README.md                    # This file
├── PROJECT_PLAN.md              # Master project plan with all phases
├── PHASE1_IMPLEMENTATION.md     # Phase 1 implementation details
├── PHASE2_IMPLEMENTATION.md     # Phase 2 implementation details
├── PHASE3_IMPLEMENTATION.md     # Phase 3 (current focus)
│
├── pyproject.toml               # Python dependencies
├── .env.example                 # Environment variables template
├── .gitignore                   # Git ignore rules
│
├── principles.json              # 80+ psychology principles
├── situations.json              # 50+ sales situations
│
├── src/                         # Source code
│   ├── __init__.py
│   │
│   ├── # Core Logic
│   ├── detector.py              # Situation detection (semantic + keyword)
│   ├── selector.py              # Principle selection
│   ├── formatter.py             # Coaching output formatting
│   ├── context.py               # Conversation context tracking
│   ├── principle_scorer.py      # Multi-factor scoring
│   │
│   ├── # Semantic Matching
│   ├── embeddings.py            # BGE-small-en-v1.5 embeddings
│   ├── pinecone_client.py       # Pinecone vector operations
│   │
│   ├── # Audio Processing
│   ├── audio_recorder.py        # Microphone recording
│   ├── audio_player.py          # Audio playback
│   ├── file_manager.py          # Modal volume operations
│   │
│   ├── # Infrastructure
│   ├── modal_app.py             # Modal configuration
│   ├── server.py                # Modal server (GPU)
│   └── client.py                # CLI client (optional)
│
├── streamlit_app/               # Web UI
│   ├── app.py                   # Main Streamlit app
│   └── components/
│       └── debug_panel.py       # Debug visualization
│
└── scripts/
    └── populate_pinecone.py     # Pinecone index population
```
Phase I: Simple keyword matching against situations.json
Phase II: Semantic similarity search using Pinecone:
- Embed customer transcript with BGE-small-en-v1.5
- Query the Pinecone `situations` namespace
- Return top matching situations with confidence scores

```python
from src.detector import detect_situation_semantic

situation = detect_situation_semantic(
    transcript="That's too expensive, I saw it cheaper on Amazon",
    pinecone_client=pc_client,
    embedding_model=embed_model
)
# Returns: DetectedSituation with situation_id, confidence_score, etc.
```

Phase I: First-match selection from applicable principles
Phase II: Multi-factor scoring:
- Semantic Relevance (40%): Cosine similarity to transcript
- Recency Penalty (30%): Avoids repeating recently used principles
- Stage Fit (20%): Bonus for principles matching current sales stage
- Random Variation (10%): Prevents deterministic selection
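Under these weights, selection can be sketched as a small scoring function. This is a hedged illustration of how the four factors might combine; the real logic lives in `src/principle_scorer.py`, and its exact signature and normalization may differ:

```python
import random

# Weights mirror the percentages listed above.
WEIGHTS = {"semantic": 0.4, "recency": 0.3, "stage": 0.2, "random": 0.1}

def score_principle(semantic_sim: float, recently_used: bool,
                    stage_match: bool, rng: random.Random) -> float:
    """Combine the four factors into one selection score."""
    score = WEIGHTS["semantic"] * semantic_sim
    score -= WEIGHTS["recency"] * (1.0 if recently_used else 0.0)  # penalty
    score += WEIGHTS["stage"] * (1.0 if stage_match else 0.0)      # bonus
    score += WEIGHTS["random"] * rng.random()                      # variation
    return score

rng = random.Random(42)
fresh = score_principle(0.9, recently_used=False, stage_match=True, rng=rng)
stale = score_principle(0.9, recently_used=True, stage_match=True, rng=rng)
# A recently used principle always scores lower than a fresh one here,
# since the 0.3 penalty outweighs the 0.1 random variation.
```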
```python
from src.selector import select_principle_semantic

principle = select_principle_semantic(
    situation=situation,
    context=conversation_context,
    pinecone_client=pc_client,
    embedding_model=embed_model,
    principles_dict=principles
)
# Returns: SelectedPrinciple with selection_score breakdown
```

Uses LFM2.5-Audio with the principle details in the system prompt:
```python
system_prompt = f"""
You are a helpful sales assistant. Respond using:
PRINCIPLE: {principle.name}
DEFINITION: {principle.definition}
APPROACH: {principle.intervention}
EXAMPLE: {principle.example_response}
Respond naturally and conversationally (2-3 sentences).
Respond with interleaved text and audio.
"""
```

| Phase | Status | Key Features | Time to Coaching |
|---|---|---|---|
| Phase I | ✅ Complete | Keyword detection, first-match selection, CLI | ~6s |
| Phase II | ✅ Complete | Semantic detection, multi-factor scoring, Streamlit UI | ~6s |
| Phase III | 🚧 In Progress | Real-time tips (~1.3s), deep context, local Whisper | ~1.3s (goal) |
See PROJECT_PLAN.md for detailed phase breakdown.
- Start the app: `streamlit run streamlit_app/app.py`
- Record audio: Click the microphone button and speak
- Or upload file: Use file uploader for pre-recorded audio
- View coaching output with principle explanation
- Listen to voice response
- Check debug panel for detection scores
```bash
modal run src/client.py
```

Interactive conversation loop:
- Records from microphone
- Uploads to Modal
- Displays coaching YAML
- Plays audio response
Warm Pool Configuration (in `src/server.py`):

```python
@app.cls(
    image=image,
    gpu="L40S",
    min_containers=1,      # Keep 1 container warm
    buffer_containers=1,   # Extra buffer when active
    scaledown_window=300,  # 5 min idle before scale-down
)
```

Cost: ~$1.50-2.00/hour for a warm L40S container.
Adjust in `src/principle_scorer.py`:

```python
WEIGHTS = {
    "semantic": 0.4,  # Cosine similarity
    "recency": 0.3,   # Negative weight for recent use
    "stage": 0.2,     # Bonus for stage match
    "random": 0.1,    # Variation factor
}
```

Index Configuration:
- Dimension: 384 (BGE-small-en-v1.5)
- Namespaces: `situations`, `principles`
- Metric: Cosine similarity
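Since the index metric is cosine similarity, a query's score against a stored embedding is the normalized dot product of the two vectors. A dependency-free illustration of that computation (Pinecone performs this server-side; this is just the math):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score 1.0; orthogonal ones score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```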
- macOS: System Preferences > Security & Privacy > Privacy > Microphone
- Grant access to Terminal/VS Code/Python
```bash
modal token new  # Re-authenticate
```

- Accept model terms: LFM2.5-Audio-1.5B
- Verify token has "Read" access
- Recreate the Modal secret:

```bash
modal secret create huggingface-secret HF_TOKEN=hf_new_token
```
```bash
python scripts/populate_pinecone.py  # Re-populate index
```

- Check audio quality
- Ensure microphone is working
- Try speaking louder or closer to mic
- This is normal: the model loads on the first request (~15-30s)
- Subsequent requests use warm pool and are faster (~3-6s)
80+ behavioral psychology principles from:
- Cialdini's "Influence: The Psychology of Persuasion"
- Voss's "Never Split the Difference"
- Kahneman's "Thinking, Fast and Slow"
Each principle includes:
- Definition and mechanism
- Intervention strategy
- Example response
- Source citation (book, chapter, page)
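A hypothetical entry illustrating this structure. The field names and values below are assumptions for illustration only; the real schema lives in `principles.json`:

```python
# Hypothetical principle entry; see principles.json for the actual schema.
example_principle = {
    "id": "anchoring",
    "name": "Anchoring",
    "definition": "The first number mentioned biases all later judgments.",
    "mechanism": "Judgments adjust insufficiently from an initial reference point.",
    "intervention": "Introduce a higher reference price before quoting yours.",
    "example_response": "Our flagship model runs $2,400; this one is $1,100.",
    "source": {"book": "Thinking, Fast and Slow", "chapter": 11},
}
```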
50+ sales situations with:
- Signals (what customer says)
- Contra-signals (opposite indicators)
- Applicable principles
- Typical sales stage
- Priority level
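Similarly, a hypothetical situation entry matching the fields above (names assumed; the real schema lives in `situations.json`):

```python
# Hypothetical situation entry; see situations.json for the actual schema.
example_situation = {
    "id": "price_shock_in_store",
    "signals": ["that's too expensive", "I saw it cheaper online"],
    "contra_signals": ["price doesn't matter", "I'll take it"],
    "applicable_principles": ["anchoring", "loss_aversion"],
    "typical_stage": "objection_handling",
    "priority": "high",
}
```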
Examples:
`price_shock_in_store`, `online_price_checking`, `just_browsing`, `need_to_check_with_family`, `fear_of_wrong_choice`
- Real-time coaching tips (~1.3s)
- Quick tip lookup from situations
- Server-Sent Events (SSE) streaming
- Deep context tracking
- Customer profile extraction
- Stage progression detection
- Local Whisper integration (~0.5s transcription)
- Voice tone analysis (frustration, excitement)
- Streaming audio playback
- A/B testing different principles
- Outcome tracking (did tip help close?)
- Team analytics dashboard
- Multi-language support
This is a research project. Contributions welcome! Areas of interest:
- New Principles: Add psychology principles from additional sources
- New Situations: Expand situation detection coverage
- Better Scoring: Improve principle selection algorithms
- Performance: Optimize for faster response times
- UI/UX: Enhance Streamlit interface
MIT License - see LICENSE file for details
- Liquid AI for LFM2.5-Audio model
- Modal for serverless GPU infrastructure
- Pinecone for vector database
- HuggingFace for model hosting and sentence transformers
- Issues: Open an issue on GitHub
- Documentation: See `PROJECT_PLAN.md` for detailed architecture
- Phase Details: Check `PHASE1_IMPLEMENTATION.md` and `PHASE2_IMPLEMENTATION.md`
Built with ❤️ using behavioral psychology and AI