faster-whisper · WhisperX · pyannote — open-source under the hood
Drop in audio. Get a transcript with speakers.
One API call. Word-level timestamps. SRT and VTT included. Cheaper per minute than OpenAI Whisper.
One call. Everything you need.
Multipart upload or file URL. Stream back transcript, segments, words, and subtitles.
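For Python callers, here is a minimal sketch of consuming the returned JSON once a request has succeeded. It relies only on the fields that appear in the sample response on this page (`status`, `data.segments`, `data.subtitles`). The helper names `save_subtitles` and `speaker_lines` are hypothetical, and the handling of any status other than `"succeeded"` and of segments without a `speaker` field is an assumption, not documented behavior.

```python
from pathlib import Path

# Trimmed copy of the sample response shown on this page.
sample = {
    "id": "req_01HX...",
    "status": "succeeded",
    "data": {
        "text": "Hello and welcome to the call.",
        "segments": [
            {"start": 0.0, "end": 2.4, "text": "Hello and welcome.", "speaker": "SPEAKER_00"}
        ],
        "subtitles": {
            "srt": "1\n00:00:00,000 --> 00:00:02,400\n[SPEAKER_00] Hello and welcome.\n",
            "vtt": "WEBVTT\n\n00:00:00.000 --> 00:00:02.400\n<v SPEAKER_00>Hello and welcome.\n",
        },
    },
}


def save_subtitles(response: dict, stem: str, out_dir: str = ".") -> list[Path]:
    """Write each subtitle string to <out_dir>/<stem>.<format> (hypothetical helper)."""
    # Assumption: anything other than "succeeded" means there is nothing to save yet.
    if response.get("status") != "succeeded":
        raise ValueError(f"unexpected status: {response.get('status')!r}")
    written = []
    for ext, body in response["data"]["subtitles"].items():
        path = Path(out_dir) / f"{stem}.{ext}"
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written


def speaker_lines(response: dict) -> list[str]:
    """Render each segment as 'SPEAKER [start-end] text' (hypothetical helper)."""
    return [
        # Assumption: the speaker key may be missing when diarization is off.
        f"{seg.get('speaker', 'UNKNOWN')} [{seg['start']:.1f}-{seg['end']:.1f}] {seg['text']}"
        for seg in response["data"]["segments"]
    ]
```

Calling `save_subtitles(sample, "meeting")` writes `meeting.srt` and `meeting.vtt` to the current directory, and `speaker_lines(sample)` gives a quick speaker-labelled view of the segments.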
Request
curl -X POST https://api.whisperly.dev/v1/transcribe \
  -H "Authorization: Bearer $WHISPERLY_KEY" \
  -F "file=@meeting.mp3" \
  -F "diarization=true" \
  -F "word_timestamps=true"
Response
{
  "id": "req_01HX...",
  "status": "succeeded",
  "data": {
    "text": "Hello and welcome to the call.",
    "segments": [
      { "start": 0.0, "end": 2.4, "text": "Hello and welcome.", "speaker": "SPEAKER_00" }
    ],
    "subtitles": {
      "srt": "1\n00:00:00,000 --> 00:00:02,400\n[SPEAKER_00] Hello and welcome.\n",
      "vtt": "WEBVTT\n\n00:00:00.000 --> 00:00:02.400\n<v SPEAKER_00>Hello and welcome.\n"
    },
    "metadata": {
      "duration_seconds": 1842.0,
      "language_detected": "en",
      "speakers_detected": 3,
      "model_used": "small"
    }
  },
  "usage": { "units": 30.7, "unit_type": "minute" }
}
Better than OpenAI's. Cheaper than AssemblyAI.
Numbers from public pricing pages, May 2026.
| | OpenAI Whisper | AssemblyAI | Deepgram | Whisperly |
|---|---|---|---|---|
| Per-min price | $0.006 | $0.012 | $0.012–$0.04 | $0.005 |
| Diarization | — | yes | yes | yes |
| Word timestamps | flaky | yes | yes | yes |
| SRT/VTT export | — | partial | partial | yes |
| Self-serve checkout | yes | yes | enterprise only | yes |
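To make the per-minute prices concrete, the sketch below costs out the 1,842-second file from the sample response at each rate in the table above. It assumes billing is linear in audio duration with no minimums or rounding to whole minutes. The sample `usage` field (30.7 minute units) suggests this for Whisperly, but for the other providers it is purely an assumption. The names `PRICE_PER_MIN` and `cost_range` are hypothetical.

```python
# Per-minute prices in USD, copied from the comparison table above.
# Deepgram is listed as a range, so every entry is stored as (low, high).
PRICE_PER_MIN = {
    "OpenAI Whisper": (0.006, 0.006),
    "AssemblyAI": (0.012, 0.012),
    "Deepgram": (0.012, 0.04),
    "Whisperly": (0.005, 0.005),
}


def cost_range(duration_seconds: float, provider: str) -> tuple[float, float]:
    """Return the (low, high) cost in USD, assuming simple linear per-minute billing."""
    minutes = duration_seconds / 60
    low, high = PRICE_PER_MIN[provider]
    return round(minutes * low, 4), round(minutes * high, 4)


if __name__ == "__main__":
    # 1842.0 seconds is the duration_seconds value in the sample response.
    for name in PRICE_PER_MIN:
        low, high = cost_range(1842.0, name)
        label = f"${low:.4f}" if low == high else f"${low:.4f} to ${high:.4f}"
        print(f"{name:15s} {label}")
```

Under these assumptions the half-hour sample file costs about $0.15 on Whisperly and about $0.18 on OpenAI Whisper, so the gap only becomes material at volume.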
Simple, usage-based pricing.
Start free. Upgrade when you need more.
Translation, summaries, real-time.
Phase 2 is shipping translation and summarization endpoints. Drop your email and we'll tell you the day they go live — nothing else.