Speech to Text API

Accurate transcription in 22+ Indian languages via a single API endpoint.

Overview

CallMissed's Speech to Text API converts audio recordings and real-time audio streams into text transcriptions. It is powered by Sarvam AI's saaras:v3 model, an automatic speech recognition (ASR) system purpose-built for Indian languages, and it handles the linguistic diversity, accents, and code-switching patterns that general-purpose STT providers often struggle with.

The API is OpenAI-compatible. If you've used OpenAI's Whisper API, you can switch to CallMissed by changing two lines of code: the base URL and the API key. The endpoint path, request format, and response schema stay the same; the mode parameter listed in the API Reference is a CallMissed-specific addition.
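As a rough illustration of the two-line switch, the sketch below points the official openai Python package at CallMissed. The base URL is derived from the endpoint in the API Reference by dropping the /audio/transcriptions path; the helper names callmissed_client_kwargs and transcribe_file are hypothetical, and the openai package is a third-party dependency that is only imported when a transcription is actually requested.

```python
CALLMISSED_BASE_URL = "https://api.callmissed.com/v1"


def callmissed_client_kwargs(api_key: str) -> dict:
    """Return the two settings that differ from a stock OpenAI client."""
    return {"base_url": CALLMISSED_BASE_URL, "api_key": api_key}


def transcribe_file(path: str, api_key: str, language: str = "hi-IN") -> str:
    """Transcribe one audio file via the OpenAI SDK pointed at CallMissed.

    Assumes the `openai` package (v1 or later) is installed.
    """
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI(**callmissed_client_kwargs(api_key))
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="saaras:v3",
            file=audio,
            language=language,
        )
    return result.text
```

Calling transcribe_file("call.wav", "cm_your_api_key") should return the transcript as a string, provided the account and file are valid.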

Supported Languages

Our STT API supports 22+ languages with native ASR models (not translation-based):

Major Indian languages: Hindi (hi-IN), Tamil (ta-IN), Telugu (te-IN), Bengali (bn-IN), Marathi (mr-IN), Kannada (kn-IN), Malayalam (ml-IN), Gujarati (gu-IN), Punjabi (pa-IN)
Additional languages: Odia, Assamese, Urdu, Sindhi, Konkani, Dogri, Maithili, Bodo, Santhali, Kashmiri, Nepali, Manipuri, Sanskrit
Auto-detection: Set language to unknown and the model automatically identifies the spoken language
English: Full support for Indian English (en-IN) accents and code-mixed English-Hindi speech
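One way to turn a user-facing language name into the language parameter is a small lookup table. The sketch below includes only the codes spelled out in the list above; codes for the additional languages are not given here, so this hypothetical helper falls back to unknown (auto-detection) for anything it does not recognize rather than guessing a code.

```python
from typing import Optional

# Codes copied from the language list above; nothing else is assumed.
LANGUAGE_CODES = {
    "hindi": "hi-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
    "bengali": "bn-IN",
    "marathi": "mr-IN",
    "kannada": "kn-IN",
    "malayalam": "ml-IN",
    "gujarati": "gu-IN",
    "punjabi": "pa-IN",
    "english": "en-IN",
}
AUTO_DETECT = "unknown"


def language_param(name: Optional[str]) -> str:
    """Map a language name to its code, or to auto-detection if unlisted."""
    if not name:
        return AUTO_DETECT
    return LANGUAGE_CODES.get(name.strip().lower(), AUTO_DETECT)
```

Falling back to auto-detection keeps the request valid when a user picks a language whose code the client does not know.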

Output Modes

Beyond plain transcription, CallMissed provides five output modes, selected with the mode request parameter:

Transcribe: Standard transcription in the original language script (Devanagari for Hindi, Tamil script for Tamil, etc.)
Translate: Transcribe and translate to English in a single API call
Verbatim: Word-for-word transcription preserving fillers, hesitations, and repetitions — useful for legal, medical, and compliance use cases
Transliterate (mode value: translit): Convert spoken language to Roman script (e.g., spoken Hindi → "Namaste, kaise hain aap?")
Code-mix (mode value: codemix): Handle mixed-language speech (Hinglish, Tanglish, etc.) with accurate transcription of both languages

API Reference

Endpoint

POST https://api.callmissed.com/v1/audio/transcriptions

Authentication

Bearer token: Authorization: Bearer cm_your_api_key

Request (multipart/form-data)

file (required) — Audio file (WAV, MP3, AAC, OGG, FLAC, WebM)
model — saaras:v3 (default)
language — BCP-47 code (e.g., hi-IN) or unknown for auto-detection
mode — transcribe | translate | verbatim | translit | codemix
response_format — json | text | verbose_json
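The request fields above can be assembled into a multipart/form-data POST using only the Python standard library. This is a minimal sketch, not an official client: the function name build_request is hypothetical, and sending the file part as application/octet-stream is an assumption about what the server accepts. The function only builds the request, so it can be inspected before anything is sent.

```python
import urllib.request
import uuid

ENDPOINT = "https://api.callmissed.com/v1/audio/transcriptions"
MODES = {"transcribe", "translate", "verbatim", "translit", "codemix"}


def _field(boundary: str, name: str, value: str) -> bytes:
    """Encode one plain text form field."""
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
    ).encode("utf-8")


def build_request(api_key, filename, audio_bytes, language="unknown",
                  mode="transcribe", response_format="json"):
    """Build (but do not send) a transcription request."""
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    boundary = uuid.uuid4().hex
    fields = {
        "model": "saaras:v3",
        "language": language,
        "mode": mode,
        "response_format": response_format,
    }
    body = b"".join(_field(boundary, k, v) for k, v in fields.items())
    # File part; the content type here is an assumption, not documented.
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + audio_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the result to urllib.request.urlopen and read the response body, for example: with urllib.request.urlopen(build_request(key, "call.wav", data)) as resp: body = resp.read().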

Response

OpenAI-compatible JSON: {"text": "transcribed content here"}
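Reading the transcript out of a response might look like the sketch below. Only the {"text": ...} shape is documented here; treating the text format as a plain-text body and expecting verbose_json to carry the same top-level text field are assumptions based on OpenAI's conventions, and the helper name extract_text is hypothetical.

```python
import json


def extract_text(body: bytes, response_format: str = "json") -> str:
    """Return the transcript from a raw response body."""
    decoded = body.decode("utf-8")
    if response_format == "text":
        # Assumed: the body is the bare transcript, not JSON.
        return decoded.strip()
    payload = json.loads(decoded)
    # Documented for json; assumed to hold for verbose_json as well.
    return payload["text"]
```

Decoding as UTF-8 matters here because transcripts in native scripts such as Devanagari or Tamil are not ASCII.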

Use Cases

Call Center Analytics

Transcribe recorded customer support calls for quality assurance, sentiment analysis, and agent training. Our batch STT with diarization separates speakers automatically, so you know exactly what the customer said vs. what the agent said. Process hours of audio in minutes.

Voice Search and Commands

Add voice input to your mobile app or website. Users speak in their native language and your app receives accurate text for search, form filling, or command processing. Auto language detection means users don't need to select their language first.

Meeting Transcription

Transcribe business meetings, webinars, and conference calls. Multi-language support handles meetings where participants switch between English and regional languages. Verbatim mode captures every word for legal and compliance documentation.

Media and Content

Generate subtitles and captions for video content in Indian languages. Our translate mode produces English subtitles from Hindi/Tamil/Telugu audio in a single API call — no separate translation step needed.

Healthcare Documentation

Transcribe doctor-patient conversations for electronic health records. Verbatim mode preserves exactly what was said, including hesitations and repetitions, and our code-mix handling captures the natural way Indian doctors mix medical English with regional language explanations.

Pricing

Sarvam saaras:v3: $0.35 per hour of audio
With speaker diarization: $0.53 per hour of audio
Free tier: 50 STT calls per month
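For budgeting, the hourly rates above translate into a simple estimate. The sketch below assumes usage is pro-rated by the second and rounded to the nearest cent; the billing granularity is not stated here, so treat the result as an approximation. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_HOUR = Decimal("0.35")           # saaras:v3
RATE_PER_HOUR_DIARIZED = Decimal("0.53")  # with speaker diarization


def estimate_cost(audio_seconds: int, diarization: bool = False) -> Decimal:
    """Estimate the USD cost of transcribing `audio_seconds` of audio.

    Assumes per-second pro-rating, which is not confirmed by the price list.
    """
    rate = RATE_PER_HOUR_DIARIZED if diarization else RATE_PER_HOUR
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Under these assumptions, ten hours of diarized call recordings (36,000 seconds) comes to $5.30, and one hour of plain transcription to $0.35.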

Every new account gets $5 in free credits. Start transcribing now.