🤖 Auto Vietnamese Captions — AI Whisper + Claude for Video
Auto-generate SRT/VTT captions for Vietnamese videos: Whisper large-v3 transcribes, then Claude Haiku fixes diacritics, names, and slang. Standard SRT for TikTok/YouTube. Free, no signup.
Drag & drop file here or click to choose
Max 500MB. Supports MP4, MOV, WebM, AVI, MKV · Max 60 min
Self-hosted Whisper (CPU): a 10-min video takes ~5-10 min. Claude then fixes Vietnamese diacritics and proper names automatically.
Why use this tool
Whisper transcribes → Claude Haiku fixes diacritic errors, proper names (Hà Nội, Sài Gòn), Gen-Z slang → clean SRT.
Unlike paywalled tools such as Submagic, this tool is free: it runs self-hosted Whisper on the server's CPU.
SRT for TikTok/YouTube, VTT for HTML5 web players, plain TXT for copy-paste.
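The three export formats carry the same cues in different wrappers. As a rough illustration (not the tool's actual exporter), the sketch below derives VTT and TXT from SRT text. The function names `srt_to_vtt` and `srt_to_txt` are hypothetical, and the input is assumed to be well-formed SRT.

```python
import re

# Matches an SRT timestamp such as 00:00:01,500 and captures the parts
# on either side of the comma so it can be swapped for a dot.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header, use '.' for milliseconds."""
    lines = srt_text.replace("\r\n", "\n").strip().split("\n")
    out = ["WEBVTT", ""]
    for line in lines:
        if "-->" in line:  # only timing lines are rewritten
            line = _SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"


def srt_to_txt(srt_text: str) -> str:
    """Drop cue numbers and timing lines, keeping one text line per cue."""
    kept = []
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        rows = block.split("\n")
        # rows[0] is the cue number and rows[1] the timing line
        kept.append(" ".join(r.strip() for r in rows[2:]))
    return "\n".join(kept)
```

The cue numbers are left in the VTT output, where they are valid as cue identifiers.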
How to use
- 1. Upload a Vietnamese video (max 60 min).
- 2. Wait for Whisper to transcribe (5-10 min for a 10-min video).
- 3. Claude auto-fixes diacritics and names.
- 4. Download SRT/VTT/TXT.
How AI Vietnamese captions work
The tool runs Whisper large-v3, OpenAI's open-source multilingual speech model, self-hosted on a home server. For Vietnamese, accuracy is ~85-90% on clear speech and drops to ~70% with heavy regional accents or loud background music.
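For readers curious what the transcription step might look like, here is a minimal sketch. It assumes the `openai-whisper` Python package (which needs ffmpeg installed); the page does not say which Whisper runtime the tool actually uses. The helper names `format_timestamp`, `segments_to_srt`, and `transcribe_to_srt` are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end' and 'text' keys."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def transcribe_to_srt(video_path: str, language=None) -> str:
    """Transcribe a file with Whisper large-v3 on CPU and return raw SRT.

    language=None lets Whisper detect the language; pass "vi" to force
    Vietnamese.
    """
    import whisper  # assumption: the openai-whisper package

    model = whisper.load_model("large-v3", device="cpu")
    # fp16=False because half precision is not supported on CPU
    result = model.transcribe(video_path, language=language, fp16=False)
    return segments_to_srt(result["segments"])
```

The import sits inside `transcribe_to_srt` so the two formatting helpers can be used without Whisper installed.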
After Whisper, the tool sends the raw SRT through Claude Haiku 4.5 with the prompt: 'Fix Vietnamese tone marks, proper names (Hà Nội instead of ha noi), Gen-Z slang (chill/flex/sus), preserve timestamps'. The output is a clean, ready-to-use SRT.
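A prompt can ask a model to preserve timestamps, but it cannot guarantee it. One possible safeguard, shown here as a sketch and not described by the tool itself, is to take only the caption text from the corrected SRT and re-attach the original timings. The names `parse_srt` and `merge_corrections` are hypothetical.

```python
import re


def parse_srt(srt_text: str):
    """Split SRT text into (timing_line, caption_text) pairs."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        rows = block.split("\n")
        if len(rows) >= 2 and "-->" in rows[1]:
            cues.append((rows[1].strip(), "\n".join(rows[2:]).strip()))
    return cues


def merge_corrections(raw_srt: str, corrected_srt: str) -> str:
    """Combine original timings with corrected text.

    If the corrected SRT has a different number of cues, the raw SRT is
    returned unchanged, since cues can no longer be matched one to one.
    """
    raw, fixed = parse_srt(raw_srt), parse_srt(corrected_srt)
    if len(raw) != len(fixed):
        return raw_srt
    out = []
    for index, ((timing, old_text), (_, new_text)) in enumerate(zip(raw, fixed), start=1):
        # Fall back to the original text if the corrected cue came back empty
        out.append(f"{index}\n{timing}\n{new_text or old_text}\n")
    return "\n".join(out)
```

With this approach a timing line the model altered by accident is discarded in favour of the one Whisper produced.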
Caveat: Whisper on CPU is slow, taking roughly 0.5-1x the video's length. A 10-min video takes ~5-10 min to process. A progress bar is shown while you wait.
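The 10-min example implies a processing-to-duration ratio of about 0.5 to 1.0. The sketch below applies that ratio to other lengths. Scaling linearly is an assumption, since real timing depends on server load and audio content, and `estimate_wait_minutes` is a hypothetical helper.

```python
def estimate_wait_minutes(video_minutes: float,
                          low_ratio: float = 0.5,
                          high_ratio: float = 1.0):
    """Return a (low, high) processing-time range in minutes.

    The default ratios come from the 10-min example (5-10 min of
    processing); they are not measured benchmarks.
    """
    return (video_minutes * low_ratio, video_minutes * high_ratio)
```

Under that assumption, a video at the 60-min limit would take roughly 30-60 min to process.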
- ✓ Whisper large-v3, self-hosted
- ✓ Claude Haiku fixes Vietnamese
- ✓ SRT/VTT/TXT export
- ✓ Files auto-deleted after 60 min
- ✓ Free, no signup
- ✓ Max 60-min video
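Auto-deleting uploads after 60 minutes could be done with a periodic cleanup job. The sketch below is one way to do it, assuming uploads sit as plain files in a single directory and that file age is judged by modification time. The tool's real storage layout is not described, and `delete_expired` is a hypothetical name.

```python
import os
import time

MAX_AGE_SECONDS = 60 * 60  # files older than 60 minutes are removed


def delete_expired(upload_dir, now=None):
    """Delete regular files in upload_dir last modified over 60 minutes ago.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialise the listing first so deletion does not disturb iteration
    for entry in list(os.scandir(upload_dir)):
        if entry.is_file() and now - entry.stat().st_mtime > MAX_AGE_SECONDS:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

A scheduler such as cron could call this every few minutes.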
FAQ
Is Whisper 100% accurate?
No. Expect ~85-90% on clear Vietnamese speech. Always proofread for professional video work.
Do English videos work too?
Yes. Whisper is multilingual, and the tool auto-detects the language. English accuracy is higher, at ~92-95%.
Why the 5-10 min wait?
Self-hosted Whisper runs on CPU (no GPU). Faster processing needs a paid API, which is planned for Phase 2.