Speech to Text API
Accurate transcription in 22+ Indian languages via a single API endpoint.
Overview
CallMissed's Speech to Text API converts audio recordings and real-time audio streams into accurate text transcriptions. Powered by Sarvam AI's saaras:v3 model — the most accurate automatic speech recognition (ASR) system purpose-built for Indian languages — our API handles the linguistic diversity, accents, and code-switching patterns that global STT providers struggle with.
The API is OpenAI-compatible. If you've used OpenAI's Whisper API, you can switch to CallMissed by changing two lines of code: the base URL and the API key. The endpoint path, request format, and response schema stay the same; the mode field listed in the API Reference is a CallMissed-specific addition.
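As a rough sketch of that two-line switch, the snippet below points the official openai Python package at CallMissed. The base URL is an assumption (the documented endpoint with the /audio/transcriptions path removed, since the SDK appends that path itself). The CALLMISSED_API_KEY environment variable and the transcribe helper are hypothetical names used only for illustration.

```python
import os

# Assumption: documented endpoint minus the /audio/transcriptions suffix,
# which the OpenAI SDK adds on its own.
CALLMISSED_BASE_URL = "https://api.callmissed.com/v1"


def transcribe(path: str, language: str = "hi-IN") -> str:
    """Send one audio file to CallMissed through the OpenAI SDK."""
    from openai import OpenAI  # pip install openai

    # The two lines that differ from a stock OpenAI setup:
    client = OpenAI(
        base_url=CALLMISSED_BASE_URL,
        api_key=os.environ["CALLMISSED_API_KEY"],  # hypothetical variable name
    )
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="saaras:v3",
            file=audio,
            language=language,
        )
    return result.text
```

Calling transcribe("call.wav") with a real key and audio file should return the transcript string, provided the service behaves as described above.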
Supported Languages
Our STT API supports 22+ languages with native ASR models (not translation-based):
- Major Indian languages: Hindi (hi-IN), Tamil (ta-IN), Telugu (te-IN), Bengali (bn-IN), Marathi (mr-IN), Kannada (kn-IN), Malayalam (ml-IN), Gujarati (gu-IN), Punjabi (pa-IN)
- Additional languages: Odia, Assamese, Urdu, Sindhi, Konkani, Dogri, Maithili, Bodo, Santhali, Kashmiri, Nepali, Manipuri, Sanskrit
- Auto-detection: Set language to unknown and the model automatically identifies the spoken language
- English: Full support for Indian English (en-IN) accents and code-mixed English-Hindi speech
Output Modes
Beyond plain transcription, CallMissed provides five distinct output modes:
- Transcribe: Standard transcription in the original language script (Devanagari for Hindi, Tamil script for Tamil, etc.)
- Translate: Transcribe and translate to English in a single API call
- Verbatim: Word-for-word transcription preserving fillers, hesitations, and repetitions — useful for legal, medical, and compliance use cases
- Transliterate: Convert spoken language to Roman script (e.g., spoken Hindi → "Namaste, kaise hain aap?")
- Code-mix: Handle mixed-language speech (Hinglish, Tanglish, etc.) with accurate transcription of both languages
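The five modes map onto the values accepted by the mode request field (transcribe, translate, verbatim, translit, codemix, as listed in the API Reference). A minimal sketch of keeping that mapping in one place follows; OutputMode and form_fields are hypothetical helper names, not part of any CallMissed SDK.

```python
from enum import Enum


class OutputMode(str, Enum):
    """Output modes keyed by readable name, valued by the API string."""
    TRANSCRIBE = "transcribe"
    TRANSLATE = "translate"
    VERBATIM = "verbatim"
    TRANSLITERATE = "translit"   # API value is shorter than the mode name
    CODE_MIX = "codemix"


def form_fields(language: str, mode: OutputMode) -> dict[str, str]:
    """Build the non-file form fields for a transcription request."""
    # OutputMode(mode) raises ValueError for anything outside the five modes.
    return {
        "model": "saaras:v3",
        "language": language,
        "mode": OutputMode(mode).value,
    }


print(form_fields("hi-IN", OutputMode.TRANSLITERATE))
# {'model': 'saaras:v3', 'language': 'hi-IN', 'mode': 'translit'}
```

Validating the mode on the client side catches typos such as "transliterate" before a request is sent.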
API Reference
Endpoint
POST https://api.callmissed.com/v1/audio/transcriptions
Authentication
Send your API key as a bearer token in the Authorization header:
Authorization: Bearer cm_your_api_key
Request (multipart/form-data)
- file (required): Audio file (WAV, MP3, AAC, OGG, FLAC, WebM)
- model: saaras:v3 (default)
- language: BCP-47 code (e.g., hi-IN) or unknown for auto-detection
- mode: transcribe | translate | verbatim | translit | codemix
- response_format: json | text | verbose_json
Response
OpenAI-compatible JSON: {"text": "transcribed content here"}
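The request and response described above can be sketched with only the Python standard library. This is an illustration, not an official client: encode_multipart, build_request, and transcribe are hypothetical helpers, and sending the audio part as application/octet-stream is an assumption, since the page does not say which content type the server expects.

```python
import json
import os
import urllib.request
import uuid

ENDPOINT = "https://api.callmissed.com/v1/audio/transcriptions"


def encode_multipart(fields: dict, filename: str, audio: bytes) -> tuple:
    """Encode text fields plus one "file" part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    chunks.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
    )
    chunks.append(audio)
    chunks.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_request(api_key: str, filename: str, audio: bytes,
                  language: str = "unknown", mode: str = "transcribe"):
    """Build (but do not send) a POST request for the transcription endpoint."""
    fields = {"model": "saaras:v3", "language": language,
              "mode": mode, "response_format": "json"}
    body, content_type = encode_multipart(fields, filename, audio)
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": content_type}
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")


def transcribe(api_key: str, path: str, **options) -> str:
    """Upload one audio file and return the "text" field of the JSON reply."""
    with open(path, "rb") as f:
        request = build_request(api_key, os.path.basename(path), f.read(), **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

With a valid key, transcribe(api_key, "call.wav", language="hi-IN") would perform the upload; build_request alone is useful for inspecting the headers and body without touching the network.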
Use Cases
Call Center Analytics
Transcribe recorded customer support calls for quality assurance, sentiment analysis, and agent training. Our batch STT with diarization separates speakers automatically, so you know exactly what the customer said vs. what the agent said. Process hours of audio in minutes.
Voice Search and Commands
Add voice input to your mobile app or website. Users speak in their native language and your app receives accurate text for search, form filling, or command processing. Auto language detection means users don't need to select their language first.
Meeting Transcription
Transcribe business meetings, webinars, and conference calls. Multi-language support handles meetings where participants switch between English and regional languages. Verbatim mode captures every word for legal and compliance documentation.
Media and Content
Generate subtitles and captions for video content in Indian languages. Our translate mode produces English subtitles from Hindi/Tamil/Telugu audio in a single API call — no separate translation step needed.
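Turning a translated transcript into a subtitle file needs timestamps, which the plain {"text": ...} response does not carry. The sketch below assumes that response_format=verbose_json returns OpenAI-style segments (a list of objects with start and end in seconds plus text); this page does not document that schema, so treat it as an assumption to verify. The helper names and the sample segments are made up for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list) -> str:
    """Convert timestamped segments into the text of an .srt subtitle file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT cues are separated by one blank line.
    return "\n".join(blocks)


# Made-up sample segments standing in for a translate-mode response.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello, how are you? "},
    {"start": 2.5, "end": 4.0, "text": "I am fine."},
]
print(segments_to_srt(sample))
```

If the actual verbose_json payload uses different field names, only the three dictionary lookups in segments_to_srt need to change.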
Healthcare Documentation
Transcribe doctor-patient conversations for electronic health records. Verbatim mode preserves every word as spoken, and our code-mix handling captures the natural way Indian doctors mix medical English with regional-language explanations.
Pricing
- Sarvam saaras:v3: $0.35 per hour of audio
- With speaker diarization: $0.53 per hour of audio
- Free tier: 50 STT calls per month
Every new account gets $5 in free credits. Start transcribing now.
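To get a feel for these rates, the following sketch estimates the cost of a batch of audio. It assumes charges are prorated by audio duration, which this page does not state (it only gives hourly rates), and it ignores the free tier. estimate_cost is a hypothetical helper.

```python
from decimal import Decimal

RATE_PER_HOUR = Decimal("0.35")           # saaras:v3
RATE_PER_HOUR_DIARIZED = Decimal("0.53")  # with speaker diarization
FREE_CREDIT = Decimal("5")                # credits on a new account


def estimate_cost(audio_seconds: int, diarization: bool = False) -> Decimal:
    """Estimate the USD cost of transcribing audio of the given length."""
    rate = RATE_PER_HOUR_DIARIZED if diarization else RATE_PER_HOUR
    # Assumption: billing is proportional to duration, rounded to the cent.
    return (Decimal(audio_seconds) / 3600 * rate).quantize(Decimal("0.01"))


print(estimate_cost(360_000))                    # 100 hours -> 35.00
print(estimate_cost(360_000, diarization=True))  # 100 hours -> 53.00
print(int(FREE_CREDIT / RATE_PER_HOUR))          # whole hours covered by $5 -> 14
```

Under these assumptions, the $5 credit covers about 14 hours of standard transcription, or about 9 hours with diarization.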