Call Transcription
The dual-path transcription pipeline: CallRail native transcripts (fast path) and Deepgram Nova-2 (fallback) with speaker diarization and voicemail detection.
TL;DR: Despora uses the CallRail transcript when one is available (the fastest path). When none is available, it downloads the recording from CallRail and sends it to Deepgram Nova-2 for transcription with automatic speaker diarization. Voicemails with no caller message are detected and skipped, so no credit is consumed.
Dual-Path Transcription Strategy
Path 1: CallRail Native Transcripts (Fast)
If the CallRail account has Conversation Intelligence enabled, CallRail provides its own transcript. Despora uses this transcript directly, with no additional API call or processing. This is the fastest path.
Path 2: Deepgram Nova-2 (Fallback)
When no CallRail transcript is available, Despora downloads the call recording from CallRail and sends the audio to Deepgram's Nova-2 model. Deepgram returns a transcript with automatic speaker diarization: each word is tagged with a speaker index, which lets Despora separate “Agent” speech from “Customer” speech.
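The choice between the two paths can be sketched as follows. This is a minimal illustration, not Despora's actual code: the Call record, its field names, and choose_transcript are hypothetical, and the Deepgram call itself is passed in as a function so the sketch stays free of network access. The URL uses Deepgram's pre-recorded endpoint with the model, diarize, and punctuate query parameters.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


@dataclass
class Call:
    # Hypothetical record; field names are illustrative, not Despora's schema.
    call_id: str
    callrail_transcript: Optional[str]
    recording_url: Optional[str]


def deepgram_request_url() -> str:
    """Build the pre-recorded transcription URL with Nova-2 and diarization on."""
    params = {"model": "nova-2", "diarize": "true", "punctuate": "true"}
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def choose_transcript(
    call: Call, transcribe_with_deepgram: Callable[[str], str]
) -> Tuple[str, str]:
    """Return (path, transcript), preferring the CallRail transcript."""
    # Fast path: a non-empty CallRail transcript is used as-is.
    if call.callrail_transcript and call.callrail_transcript.strip():
        return "callrail", call.callrail_transcript
    # Fallback: the recording is required to transcribe with Deepgram.
    if not call.recording_url:
        raise ValueError(f"call {call.call_id} has no transcript and no recording")
    return "deepgram", transcribe_with_deepgram(call.recording_url)
```

In a real integration, transcribe_with_deepgram would download the audio and POST it to deepgram_request_url() with an "Authorization: Token <key>" header. That part is omitted here.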
Speaker Diarization
Diarization is critical for accurate AI analysis. Despora uses Gemini to refine speaker labels when Deepgram's automatic diarization is ambiguous. The AI identifies which speaker is the business representative and which is the caller, enabling accurate agent performance scoring.
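To illustrate what labeled diarization output looks like, the sketch below groups diarized words into speaker turns. It assumes word objects shaped like Deepgram's diarized response (a "word", an optional "punctuated_word", and an integer "speaker"). The labels mapping, which says which speaker index is the agent, is a hypothetical input here. In Despora that decision comes from the Gemini refinement step, which this sketch does not reproduce.

```python
from typing import Dict, List, Tuple


def group_turns(words: List[dict], labels: Dict[int, str]) -> List[Tuple[str, str]]:
    """Merge consecutive words from the same speaker into (label, text) turns."""
    turns: List[Tuple[str, str]] = []
    for w in words:
        # Fall back to a generic label if the speaker index is unmapped.
        label = labels.get(w["speaker"], f"Speaker {w['speaker']}")
        text = w.get("punctuated_word", w["word"])
        if turns and turns[-1][0] == label:
            # Same speaker as the previous word: extend the current turn.
            turns[-1] = (label, turns[-1][1] + " " + text)
        else:
            turns.append((label, text))
    return turns


# Illustrative data only, not a real call.
sample_words = [
    {"word": "thanks", "punctuated_word": "Thanks", "speaker": 0},
    {"word": "for", "punctuated_word": "for", "speaker": 0},
    {"word": "calling", "punctuated_word": "calling.", "speaker": 0},
    {"word": "hi", "punctuated_word": "Hi,", "speaker": 1},
    {"word": "there", "punctuated_word": "there.", "speaker": 1},
]
sample_turns = group_turns(sample_words, {0: "Agent", 1: "Customer"})
```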
Voicemail Detection
Before sending any transcript to the AI for scoring, Despora runs voicemail detection:
- If the audio is only an answering machine greeting with no caller message → the call is flagged as a voicemail and skipped
- If Deepgram detects machine-only speech but CallRail has a caller transcript → Despora uses the CallRail transcript
- A voicemail with no caller message consumes no credit
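The rules above can be sketched as a small decision function. The names (Decision, decide, machine_only) are hypothetical, and how the machine-only signal is derived is not described in this document, so it is taken as an input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str  # "skip" (voicemail, no credit consumed) or "score"
    transcript: Optional[str]


def decide(
    machine_only: bool,
    callrail_caller_transcript: Optional[str],
    deepgram_transcript: Optional[str],
) -> Decision:
    """Apply the voicemail rules before any transcript is sent for scoring."""
    has_caller_text = bool(
        callrail_caller_transcript and callrail_caller_transcript.strip()
    )
    if machine_only and has_caller_text:
        # Deepgram heard only the machine, but CallRail captured the caller.
        return Decision("score", callrail_caller_transcript)
    if machine_only:
        # Greeting only, no caller message: skip, so no credit is consumed.
        return Decision("skip", None)
    # Normal conversation: score the transcript that was produced.
    return Decision("score", deepgram_transcript)
```

Keeping the skip decision ahead of the scoring call is what makes the no-credit rule easy to guarantee in this sketch: a skipped call never reaches the scoring step.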
Transcript Accuracy
Deepgram Nova-2 is built to keep word error rates low across accents and audio quality levels. For phone-quality audio (8 kHz), typical accuracy ranges from 85% to 95%, depending on background noise and speaker clarity. CallRail's native transcripts have similar accuracy.
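For reference, word error rate (WER) is the number of word substitutions, insertions, and deletions needed to turn the transcript into a reference transcript, divided by the number of words in the reference. If the accuracy figures above are read as 1 - WER (an assumption, since the measurement method is not stated here), 85% to 95% accuracy corresponds to a WER of roughly 5% to 15%. A minimal sketch of the standard calculation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref word missing from hyp)
                curr[j - 1] + 1,     # insertion (extra word in hyp)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, a hypothesis that replaces one word of a four-word reference has a WER of 0.25, which corresponds to 75% word accuracy.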
