Lydia · ElevenLabs Voice Architecture

Your approach (3 separate services to manage)

  1. Advisor speaks: microphone captures audio
  2. STT transcription: separate service (e.g. Whisper)
  3. Lydia (LLM): Claude generates response
  4. ElevenLabs TTS API: text → voice (separate call)
  5. Audio returned: advisor hears response

  Seams: STT seam, LLM seam, TTS seam
  Total latency: ~700–900ms
  Error handling: 3 failure points
  Turn detection: manual

Recommended approach (1 platform, native session mgmt; analytics + monitoring included)

  1. Advisor speaks: microphone captures audio
  2. ElevenLabs conversational AI
       Scribe v2 (STT): ~150ms transcription
       Lydia (LLM): your LLM, plugged in
       Flash v2.5 (TTS): ~75ms voice generation
  3. Audio returned: advisor hears response

  Total latency: ~500ms (optimal)
  Error handling: 1 platform
  Turn detection: built-in

In both approaches, Lydia's intelligence (prompts, knowledge base, Claude) stays entirely yours. ElevenLabs only handles the voice loop.
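
The latency figures for the recommended approach can be sanity-checked with a small budget calculation. The sketch below is illustrative only: it assumes the stages run one after another and that their latencies simply add up, which the diagram does not state. The `Stage` class and `remaining_budget_ms` helper are hypothetical names, not part of any ElevenLabs SDK. Under those assumptions, it shows how much of the ~500ms target is left for the LLM response and network overhead once STT and TTS are accounted for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the voice loop with its approximate latency."""
    name: str
    latency_ms: int


# Figures quoted in the diagram for the recommended approach.
STT = Stage("Scribe v2 (STT)", 150)
TTS = Stage("Flash v2.5 (TTS)", 75)
TOTAL_TARGET_MS = 500  # "~500ms (optimal)"


def remaining_budget_ms(total_ms: int, stages: list[Stage]) -> int:
    """Return what is left of total_ms after the given stages.

    Assumption: stages are sequential and their latencies are additive.
    """
    return total_ms - sum(stage.latency_ms for stage in stages)


# Time left for the LLM (Lydia) plus any network overhead.
llm_and_network_ms = remaining_budget_ms(TOTAL_TARGET_MS, [STT, TTS])
print(llm_and_network_ms)  # 275
```

In other words, if the quoted STT and TTS figures hold, roughly 275ms remains for Lydia's first response and transport before the ~500ms target is exceeded, so LLM response time is likely the largest single term to watch.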