What Is Verbatim Transcription?
Verbatim transcription captures every spoken word and sound exactly as it occurs, including "um," "uh," stutters, false starts, laughter, coughs, and pauses. It's the most accurate possible written record of what was said, and it is used in court reporting, qualitative research, market research, medical transcription, and any other context where exactly how something was said matters as much as what was said.
Definition and the three transcription styles
Verbatim transcription is one of three standard transcription styles, distinguished by how much of the original speech the transcript preserves:
- Strict verbatim (also called "true verbatim") preserves every utterance: all filler words, false starts, repetitions, and non-speech sounds like laughter or background noise.
- Intelligent verbatim (sometimes called "clean verbatim") removes filler words and false starts but preserves the speaker's vocabulary, sentence structure, and meaning.
- Clean read (or "edited transcription") goes further: it removes fillers, fixes grammar, and may even paraphrase for clarity.
The three styles serve different purposes. Court reporters and qualitative researchers need strict verbatim because every utterance carries meaning. Marketers and journalists usually want intelligent verbatim: readable transcripts that still feel like the speaker. Authors and content creators repurposing audio into text usually want clean read: output that reads like written prose.
The same audio yields three different transcripts depending on which style is requested. Modern AI transcription tools default to intelligent verbatim because it's more readable; getting strict verbatim usually requires explicitly asking the tool to preserve fillers.
Side-by-side: the three styles transcribing the same audio
To make the differences concrete, here is the same 12-second audio clip rendered in all three styles:
Original audio: A speaker pauses, restarts, and answers a question.
Strict verbatim
Interviewer: So, um, can you tell me about, uh, the project?
Subject: Yeah, yeah, sure. So I, I started — well, we started, um — [coughs] excuse me — we started the project back in, um, January? Yeah, January. And it's been, you know, it's been a journey. [laughs] Yeah.
Intelligent verbatim
Interviewer: Can you tell me about the project?
Subject: Sure. We started the project back in January. It's been a journey.
Clean read
Interviewer: Tell me about the project.
Subject: We launched the project in January, and it has been an interesting journey since then.
All three are accurate to what was said — they just preserve different amounts of how it was said. Choose the style based on what you'll do with the transcript.
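As a rough illustration of how strict verbatim relates to intelligent verbatim, the Python sketch below strips a small, assumed list of fillers and any bracketed sound tags from one transcript line. The filler list and the function name `strip_fillers` are hypothetical, not taken from any tool. The sketch leaves false starts and repetitions in place, since removing those takes human or model judgment.

```python
import re

# Assumed, deliberately small filler list; real style guides differ.
FILLERS = ("um", "uh", "er", "ah", "you know")

def strip_fillers(text: str) -> str:
    """Remove bracketed sounds and standalone fillers from one transcript line."""
    # Drop bracketed non-speech tags such as [coughs] or [laughs].
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Drop each filler together with an optional comma on either side.
    pattern = r",?\s*\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?"
    text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse leftover whitespace and remove spaces before punctuation.
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"\s+([,.?!])", r"\1", text)
    return text.strip()

print(strip_fillers("So, um, can you tell me about, uh, the project?"))
# So can you tell me about the project?
```

Note that the operation only goes one way: once the fillers are dropped, nothing in the output records where they were.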
When to use strict verbatim
Strict verbatim is required (or strongly preferred) in these contexts:
- Court reporting and legal proceedings. Depositions, trial transcripts, and other legal records must capture exactly what was said. Filler words, hesitations, and tone of voice may matter to a case. Strict verbatim is the legal standard.
- Qualitative research (PhDs, sociology, ethnography). When studying how people think and speak, the filler words, hesitations, and self-corrections are data. Researchers analysing speech patterns need every utterance.
- Medical transcription. Specifically, mental health and behavioural assessments where the manner of speech is part of the diagnosis.
- Linguistics and conversation analysis. Studying turn-taking, interruptions, repair sequences, and other conversational structures.
- Insurance investigations. Where exactly how a claimant phrased something can matter.
- Focus group research. Where the dynamics of how people respond — including filler and hesitation — are part of the analysis.
- Forensic linguistics. Authorship attribution, threat analysis, and stylometric work all depend on capturing every speech feature.
If you're producing a transcript for any of these purposes, request strict verbatim explicitly when ordering from a service or configuring an AI transcription tool. Most AI tools default to intelligent verbatim and remove filler words automatically, so you have to opt out.
When to use intelligent verbatim
Intelligent verbatim is the right choice for most general business and content use cases:
- Journalism interviews. Reporters need accurate quotes that preserve the speaker's voice without making them look inarticulate by including every "um."
- Podcast show notes. Capture what was said in readable form without forcing readers to wade through hesitations.
- Customer feedback analysis. Understand what customers said in research interviews without getting distracted by speech artifacts.
- Sales call review. Reps want to study what they said and how prospects responded — readability matters more than capturing every "right" and "yeah."
- Conference talks and keynotes. Distributing a written version of a talk for accessibility or content reuse.
- Internal meeting notes. Capturing decisions and discussion without the noise of speech disfluencies.
- Educational content. Lectures and tutorials transcribed for student reference.
This is the default style produced by most AI transcription services in 2026. TranscribeVideo.ai, Otter, and Rev's auto-transcription all output intelligent verbatim by default; Whisper mostly does too, although its default output keeps more disfluencies than the others.
When to use clean read
Clean read transcription is appropriate when the final output will be read as prose, not as a record of speech:
- Repurposing audio into blog posts. Convert a podcast episode into a written article — clean read removes spoken-language patterns that don't read well.
- Book content from interviews. Authors who interview subjects often want clean-read transcripts they can edit further into prose.
- Marketing case studies. Customer interview content turned into branded marketing material reads more polished in clean read style.
- Email newsletter content from podcasts. The conversational style of audio doesn't translate to skimmable email writing.
- Academic papers citing interviews. When the meaning matters more than the exact phrasing, clean read makes citations more readable.
Clean read is rarely the output of an AI transcription tool directly — it usually requires a second pass through an editor (human or AI) that paraphrases and restructures. Tools like Descript and Otter offer "clean up" features that approximate clean read.
How to get strict verbatim from AI tools
By default, modern AI transcription tools clean up speech disfluencies. To get strict verbatim, you have to request it explicitly:
Whisper (OpenAI)
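Pass the prompt explicitly: --initial_prompt "Include all filler words like um, uh, and false starts. Do not clean up speech." in the CLI, or supply the same text in the prompt parameter when calling the API. Whisper's default output already preserves more disfluencies than other models, and the prompt pushes it further toward strict verbatim. Whisper tends to treat the prompt as preceding text whose style it imitates rather than as a command it obeys, so a prompt that itself contains fillers (for example, "Um, well, I, uh, I think so.") can work better than an instruction, and neither approach guarantees that every disfluency survives.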
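The CLI call described here can be assembled in Python as follows. This is a minimal sketch, assuming the open-source `whisper` command-line tool is installed and on the PATH. The helper name `build_whisper_command`, the file name `interview.mp3`, and the `small` model choice are illustrative assumptions. The script only builds and prints the command; the line that would run it is left commented out.

```python
import shlex

# Prompt text quoted from this section; it biases the output toward keeping
# fillers but does not guarantee it, so the transcript still needs review.
VERBATIM_PROMPT = (
    "Include all filler words like um, uh, and false starts. "
    "Do not clean up speech."
)

def build_whisper_command(audio_path: str, prompt: str, model: str = "small") -> list[str]:
    """Return the argument list for a Whisper CLI run with an initial prompt."""
    return [
        "whisper", audio_path,
        "--model", model,
        "--initial_prompt", prompt,
    ]

command = build_whisper_command("interview.mp3", VERBATIM_PROMPT)
print(shlex.join(command))  # shell-safe preview of the full command

# To run it for real (requires the whisper CLI):
# import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with the spaces and punctuation in the prompt.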
Otter.ai
In Otter, the default cleans up filler words. There is no native "verbatim mode" — you'd need to use the raw transcript before Otter's smart-summary processing, which is exposed via the API but not via the standard UI export.
Rev.com
Rev's human transcription service offers an explicit verbatim option as a paid add-on. AI Rev (auto-transcription) does not currently have a verbatim toggle.
Trint
Offers a "show fillers" toggle in the editor. Turn it on after upload to see filler words; turn it off to get the cleaned version.
The honest tradeoff
AI transcription is significantly less accurate at strict verbatim than at intelligent verbatim. The models were trained on cleaned-up text, so reproducing exact disfluencies requires breaking the model's natural inclination. For high-stakes strict verbatim work (legal, research), human transcription is still the standard. AI is reliable for intelligent verbatim and adequate for clean read.
How to time and format a verbatim transcript
Beyond the words, verbatim transcripts have formatting conventions:
- Speaker IDs in caps with colon. JOHN: or INTERVIEWER:. Some research conventions use [P1], [P2] for participants or pseudonyms.
- Bracketed non-speech sounds. [laughter], [cough], [pause - 3 sec], [phone ringing], [overlapping speech].
- Filler words written out. "um," "uh," "ah," "er," "mm-hmm," "uh-huh." Convention varies on apostrophes.
- False starts marked with em-dash. "I was going to — well, actually, let me start over."
- Pauses marked. Short pauses with ellipsis or (...). Long pauses with a duration: [pause - 5 sec].
- Inaudible markings. [inaudible] or [inaudible 00:23:14] when a section can't be transcribed.
- Overlapping speech. [crosstalk] or square brackets around the overlapping sections to show simultaneity.
- Timestamps. Inserted at speaker changes or at fixed intervals (every 30 seconds, every 2 minutes) for navigation. Format: [00:14:25].
For research transcripts especially, follow the conventions used by your field's standards. Sociolinguistics has different conventions from clinical interviewing.
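A few of these conventions are mechanical enough to script. The sketch below, with hypothetical helper names, renders the [HH:MM:SS] timestamp format, the capitalised speaker ID with colon, and the [pause - N sec] marker described above. It assumes times are supplied as whole seconds from the start of the recording.

```python
def format_timestamp(seconds: int) -> str:
    """Render a second count as a bracketed [HH:MM:SS] timestamp."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"

def format_pause(duration_sec: int) -> str:
    """Render a long pause with its duration, e.g. [pause - 5 sec]."""
    return f"[pause - {duration_sec} sec]"

def format_turn(speaker: str, text: str, start_seconds: int) -> str:
    """Prefix one speaker turn with a timestamp and an upper-case speaker ID."""
    return f"{format_timestamp(start_seconds)} {speaker.upper()}: {text}"

print(format_turn("interviewer", "So, um, can you tell me about, uh, the project?", 865))
# [00:14:25] INTERVIEWER: So, um, can you tell me about, uh, the project?
```

Fields with their own conventions (for example [P1] participant labels) would swap out `format_turn` rather than the timestamp logic.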
Verbatim vs intelligent verbatim vs clean read — quick reference
| Feature | Strict Verbatim | Intelligent Verbatim | Clean Read |
|---|---|---|---|
| Filler words (um, uh) | Included | Removed | Removed |
| False starts | Included | Removed | Removed |
| Repetitions | Included | Removed | Removed |
| Non-speech sounds | Bracketed | Sometimes bracketed | Removed |
| Grammar correction | No | No | Yes |
| Paraphrasing | No | No | Yes |
| Speaker's vocabulary preserved | Yes (exact) | Yes (exact) | Sometimes |
| Time to produce (relative) | 1.5× | 1.0× (baseline) | 1.3× |
| Cost via human service | $2-4 per audio min | $1-2 per audio min | $3-5 per audio min |
| AI default | Rare | Default | Requires post-processing |
| Best for | Legal, research, linguistics | Journalism, business, podcasts | Repurposing into prose |
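To turn the per-minute human-service prices in the table into a budget range, a small calculation is enough. The sketch below copies the table's rates into a dictionary; the function name `estimate_cost` and the style keys are illustrative assumptions, and actual vendor pricing will vary.

```python
# Per-audio-minute price ranges in USD, copied from the table above.
HUMAN_RATES = {
    "strict": (2.0, 4.0),
    "intelligent": (1.0, 2.0),
    "clean_read": (3.0, 5.0),
}

def estimate_cost(style: str, audio_minutes: float) -> tuple[float, float]:
    """Return the (low, high) cost estimate in USD for a human transcript."""
    low, high = HUMAN_RATES[style]
    return (low * audio_minutes, high * audio_minutes)

low, high = estimate_cost("strict", 45)
print(f"45 min strict verbatim: ${low:.0f} to ${high:.0f}")
# 45 min strict verbatim: $90 to $180
```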
Common questions about verbatim transcription
Is verbatim transcription required by law?
For court reporting and certified legal transcripts: yes, strict verbatim is required. For accessibility-compliant captioning: no. Captions are typically intelligent verbatim plus sound effect descriptions, which is what makes them captions rather than subtitles. For research subject to IRB approval: it depends on the methodology and the approved protocol, but many qualitative research designs call for strict verbatim transcripts of recorded interviews.
How accurate is AI verbatim transcription?
For intelligent verbatim, modern AI achieves 95%+ word accuracy on clear single-speaker audio. For strict verbatim — preserving every filler and disfluency — accuracy drops because models tend to clean as they go. Whisper is the most reliable AI for strict verbatim; even so, expect 80-90% accuracy compared to human transcription.
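One common way to put a number on transcript accuracy is word error rate (WER): the word-level edit distance between a trusted reference transcript and the AI output, divided by the reference length. Word accuracy is then roughly 1 minus WER. Whether the figures quoted here were measured this way is not stated, so treat the sketch below as one possible method. It assumes lowercase, whitespace-separated words with punctuation already removed, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)

strict_reference = "um we started the project in uh january"
ai_output = "we started the project in january"
print(word_error_rate(strict_reference, ai_output))  # 0.25
```

In this example the AI output is a perfectly good intelligent verbatim transcript, yet it scores 25% WER against a strict verbatim reference because both dropped fillers count as errors.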
How long does verbatim transcription take?
Human strict verbatim: roughly 4-6 hours per hour of audio for an experienced transcriptionist. Human intelligent verbatim: 3-4 hours per hour of audio. AI: minutes per hour of audio, regardless of style. The choice between human and AI usually comes down to whether the use case can tolerate AI-level accuracy or requires the higher human standard.
Can I convert AI intelligent verbatim back to strict verbatim?
Not reliably. AI cleaning is lossy — once "um" and "uh" are removed, they can't be reconstructed without re-running the transcription on the original audio with explicit instructions to preserve them. If you anticipate needing strict verbatim, configure for it from the start.
Feature Comparison
| Feature | Strict Verbatim | Intelligent Verbatim | Clean Read |
|---|---|---|---|
| Includes fillers (um, uh) | Yes | No | No |
| Includes false starts | Yes | No | No |
| Grammar corrected | No | No | Yes |
| Speaker vocabulary preserved | Exact | Exact | Approximate |
| Reading flow | Choppy | Natural | Polished |
| AI default | No | Yes | No (post-processing) |
| Best for | Legal, research | Business, journalism | Content repurposing |
How It Works
1. Identify your use case: legal/research/linguistics → strict verbatim. Business/journalism/podcasts → intelligent verbatim. Content repurposing → clean read.
2. Pick the right tool. AI tools default to intelligent verbatim. For strict verbatim, use Whisper with an explicit prompt or human transcription services. For clean read, use AI followed by a manual or AI editing pass.
3. Configure the tool: Whisper accepts an initial_prompt parameter; Trint has a fillers toggle; Otter doesn't natively support strict verbatim.
4. Review the output: even AI intelligent verbatim usually needs human review for proper nouns and technical terms. Strict verbatim from AI almost always needs human refinement.
5. Format using your field's conventions: speaker IDs, bracketed non-speech sounds, em-dashes for false starts, timestamps for navigation.
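The first step above amounts to a lookup from use case to style. A minimal sketch, with the mapping copied from that step; the dictionary and function names are hypothetical:

```python
# Use-case to style mapping, taken from step 1 above.
STYLE_BY_USE_CASE = {
    "legal": "strict verbatim",
    "research": "strict verbatim",
    "linguistics": "strict verbatim",
    "business": "intelligent verbatim",
    "journalism": "intelligent verbatim",
    "podcasts": "intelligent verbatim",
    "content repurposing": "clean read",
}

def recommend_style(use_case: str) -> str:
    """Return the suggested transcription style for a known use case."""
    key = use_case.strip().lower()
    if key not in STYLE_BY_USE_CASE:
        raise KeyError(f"no recommendation for use case: {use_case!r}")
    return STYLE_BY_USE_CASE[key]

print(recommend_style("Journalism"))  # intelligent verbatim
```

Raising an error for unknown use cases, rather than silently falling back to a default, keeps a high-stakes job from being transcribed in the wrong style.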
Why Use This Tool?
- Required by law for court reporting and certified legal transcripts
- Required by most qualitative research methodologies (PhD, sociology, ethnography)
- Captures speaker hesitation and self-correction as data, not noise
- Preserves cultural and demographic speech patterns for linguistic analysis
- Maintains the original meaning when how something was said matters
- Provides admissible record in legal and forensic contexts
Use Cases
- Transcribing depositions, trials, and other legal proceedings to certified standards
- Qualitative interviews for PhD theses and academic research papers
- Focus group transcripts for market research firms
- Forensic linguistic analysis for authorship attribution or threat assessment
- Behavioral health and psychiatric assessments where speech manner is diagnostic
- Conversation analysis research in linguistics and sociology
Frequently Asked Questions
What is verbatim transcription?
Verbatim transcription captures every spoken word and sound exactly as it occurred, including filler words (um, uh), false starts, repetitions, and non-speech sounds like laughter or coughing. It's the most accurate written record possible, and it is used in legal, research, and forensic contexts where every utterance carries meaning.
What's the difference between verbatim and intelligent verbatim?
Strict verbatim preserves every utterance including fillers and false starts. Intelligent verbatim (also called clean verbatim) removes fillers, false starts, and repetitions but preserves the speaker's actual vocabulary and sentence structure. Intelligent verbatim is more readable; strict verbatim is more accurate to the original speech.
What's the difference between verbatim and clean read?
Verbatim transcription preserves what was said exactly (or with light cleaning for intelligent verbatim). Clean read goes further: it corrects grammar, may paraphrase for clarity, and produces text that reads like written prose rather than transcribed speech. Clean read is the right choice for content repurposing; verbatim is right for legal and research contexts.
Is AI transcription verbatim by default?
No. Most AI transcription tools default to intelligent verbatim: they remove filler words, false starts, and repetitions automatically. To get strict verbatim from AI, you usually have to configure the tool explicitly (for example, Whisper accepts an initial_prompt that nudges it to preserve disfluencies).
When should I use verbatim transcription?
Use strict verbatim for legal proceedings, qualitative research, linguistics, forensic analysis, and any context where exactly how something was said matters. Use intelligent verbatim for journalism, business meetings, podcast show notes, and most general transcription. Use clean read when repurposing audio into written prose.
How accurate is verbatim transcription?
Human strict verbatim achieves 99%+ accuracy. AI strict verbatim typically achieves 80-90% — the AI tends to clean up filler words even when instructed not to. AI intelligent verbatim achieves 95%+ on clear single-speaker audio. Multi-speaker, accented, or technical audio reduces all accuracy figures.
How much does verbatim transcription cost?
Human strict verbatim: $2-4 per audio minute via professional services. Human intelligent verbatim: $1-2 per audio minute. AI transcription: free to under $1 per audio minute regardless of style. The choice usually comes down to whether your use case can tolerate AI-level accuracy.