Download YouTube closed captions (CC) as SRT, VTT, or plain text. Includes speaker IDs, music notations, and sound effects when present in the source caption track.
Real transcript + AI summary, ready in seconds.
“So today I want to talk about the three biggest mistakes people make when trying to grow on TikTok. And I see this constantly — creators spending hours on production value when what actually drives growth is the hook. The first fifteen seconds. That’s it.”
“If you don’t have them in the first fifteen seconds, they’re gone. So let me walk you through exactly what I changed — and how it took my average view duration from twenty-two percent all the way up to sixty-eight...”
Captions and subtitles look identical in a YouTube player, but they are editorially different products. Subtitles translate the dialogue from the video's source language into another language for hearing viewers; they leave out sound effects and speaker IDs because the viewer can hear those. Closed captions describe everything audible in the video for deaf and hard-of-hearing viewers: dialogue plus speaker identification ('JOHN:'), sound effects ('[door slams]'), and music notations ('[ominous music]').
YouTube exposes both kinds of track: creator-uploaded caption tracks (often closed-caption-formatted, with sound effects and speaker IDs included) and auto-generated tracks (dialogue only). When you use the YouTube caption downloader, the tool fetches whichever caption track is available and prefers the creator-uploaded version when present because it is editorially richer. The output preserves whatever sound effect and speaker ID notations exist in the source.
SRT (.srt) is the most widely supported format for video editors and most platforms; WebVTT (.vtt) is the format HTML5 web video uses for text tracks; plain text (.txt) strips timestamps for reading. Inline accessibility notations like '[laughter]' and 'JOHN:' are ordinary caption text in all three formats, and the SRT and VTT outputs preserve them in their original positions.
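To make the relationship between the three formats concrete, here is a minimal Python sketch that turns SRT text into WebVTT and into plain text. It is an illustration under stated assumptions, not the downloader's actual code: the function names (`parse_srt`, `srt_to_vtt`, `srt_to_text`) and the sample captions are hypothetical, and the parser only handles simple, well-formed SRT.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples from simple SRT text."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.match(line.strip())
            if m:
                start = f"{m.group(1)},{m.group(2)}"
                end = f"{m.group(3)},{m.group(4)}"
                # Everything after the timing line is caption text,
                # including notations like [door slams] or JOHN:.
                cues.append((start, end, "\n".join(lines[i + 1:])))
                break
    return cues


def srt_to_vtt(srt_text):
    """WebVTT adds a 'WEBVTT' header and uses '.' before the milliseconds."""
    out = ["WEBVTT", ""]
    for start, end, text in parse_srt(srt_text):
        out.append(f"{start.replace(',', '.')} --> {end.replace(',', '.')}")
        out.append(text)
        out.append("")
    return "\n".join(out)


def srt_to_text(srt_text):
    """Drop indices and timestamps, keeping one line of text per cue."""
    return "\n".join(
        text.replace("\n", " ") for _, _, text in parse_srt(srt_text)
    )


# Hypothetical sample captions, for illustration only.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
JOHN: Did you hear that?

2
00:00:03,600 --> 00:00:05,000
[door slams]
"""

if __name__ == "__main__":
    print(srt_to_vtt(SAMPLE))
    print(srt_to_text(SAMPLE))
```

Because the notations are plain caption text, neither conversion needs to treat them specially; they pass through unchanged.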
The tools are largely the same; the output differs based on what's in the source track. Captions include speaker IDs and sound effect descriptions (JOHN:, [music]). Subtitles include only the dialogue. YouTube serves whichever tracks exist for the video, which depends on what the creator uploaded. This caption downloader prefers the creator's CC track when available; subtitle downloaders typically prefer the auto-generated track.
Yes — when the source caption track includes them. Creator-uploaded closed-caption tracks often include sound effects in brackets ([laughter], [door slams]) and speaker IDs in caps with colons. The tool preserves these in the SRT and VTT downloads. Auto-generated subtitle tracks typically don't include sound effects, so those won't appear in the output.
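If you want to work with these notations after downloading, a short Python sketch like the one below can separate them from the dialogue. It assumes the conventions described above (bracketed sound effects, an all-caps speaker label followed by a colon at the start of a line); the function name `split_caption_line` is hypothetical, and real caption tracks vary, so treat the patterns as a heuristic.

```python
import re

# Bracketed notation such as [laughter] or [door slams].
BRACKETED = re.compile(r"\[([^\[\]]+)\]")
# All-caps speaker label at the start of a line, followed by a colon.
SPEAKER = re.compile(r"^\s*([A-Z][A-Z0-9 .'-]*):\s*")


def split_caption_line(line):
    """Return (speaker, notations, dialogue) for one caption line.

    speaker is None when the line has no leading all-caps label.
    """
    speaker = None
    m = SPEAKER.match(line)
    if m:
        speaker = m.group(1).strip()
        line = line[m.end():]
    notations = BRACKETED.findall(line)
    # Remove the bracketed parts and tidy the leftover whitespace.
    dialogue = re.sub(r"\s+", " ", BRACKETED.sub("", line)).strip()
    return speaker, notations, dialogue


if __name__ == "__main__":
    for example in (
        "JOHN: Wait. [door slams] Who's there?",
        "[ominous music]",
        "so today I want to talk about hooks",
    ):
        print(split_caption_line(example))
```

A track where no line yields a speaker or a notation is likely an auto-generated, dialogue-only track.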
Creator-uploaded captions are usually the better starting point for ADA / WCAG compliance because a person wrote or reviewed them. Auto-generated captions reach about 90-95% accuracy on clear English speech but typically miss proper nouns, technical terms, and sound effects. For formal compliance, download the creator's captions if available; if not, put the auto-captions through a human review pass before publishing.
Yes. SRT and VTT are plain-text formats that any reviewer can open, so the downloaded file can serve as supporting documentation in ADA Title III, Section 508, and WCAG 2.1 audits. It is a record of the captions present at the time of download.
Live captions on currently-streaming videos can't be downloaded mid-stream. After the live stream ends and the recording is processed (typically 10-60 minutes), captions become available and can be downloaded normally.
The video probably uses YouTube's auto-generated captions (which don't include speaker IDs) rather than creator-uploaded closed captions. To get speaker IDs, the original video must have been captioned by the creator with speaker labels included. Most amateur YouTube content has auto-only captions; professional broadcast and educational content typically has creator captions.
Closely related but optimized for caption-style content. The underlying URL→file flow is identical. Use the caption downloader when you specifically want sound effects, speaker IDs, and accessibility metadata; use the subtitle downloader for dialogue-only translation work.
When YouTube has multiple caption tracks for a video, the tool fetches the primary track (usually the original language). For other languages, take the SRT and translate via DeepL, Google Translate, or ChatGPT — then format as a new SRT with the same timestamps.
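The "same timestamps" step can be sketched in Python as follows. This is a minimal illustration, assuming you already have one translated string per cue, in order, from whichever translation service you used; the function names (`extract_cues`, `rebuild_srt`) are hypothetical, and no translation API is called here.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def extract_cues(srt_text):
    """Return a list of (timing_line, text) pairs from simple SRT text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            if TIMING.match(line.strip()):
                cues.append((line.strip(), "\n".join(lines[i + 1:])))
                break
    return cues


def rebuild_srt(srt_text, translated_texts):
    """Pair each original timing line with its translated text.

    translated_texts must hold exactly one string per cue, in cue order.
    """
    cues = extract_cues(srt_text)
    if len(cues) != len(translated_texts):
        raise ValueError(
            f"expected {len(cues)} translated cues, got {len(translated_texts)}"
        )
    blocks = []
    for n, ((timing, _), new_text) in enumerate(
        zip(cues, translated_texts), start=1
    ):
        blocks.append(f"{n}\n{timing}\n{new_text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    source = (
        "1\n00:00:01,000 --> 00:00:03,500\nHello there.\n\n"
        "2\n00:00:03,600 --> 00:00:05,000\n[door slams]\n"
    )
    # Hypothetical translations, supplied by hand for illustration.
    print(rebuild_srt(source, ["Hola.", "[portazo]"]))
```

Translating cue by cue, rather than pasting the whole file into a translator, keeps the cue count stable so the length check above can catch merged or dropped lines.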