Overview of the common subtitle formats
This article analyzes three widely used plain text subtitle formats found in production, web delivery, archives, and conversion pipelines: SRT (SubRip), WebVTT (VTT), and SBV (SubViewer / YouTube SBV). It covers formal syntax, timing rules, parser behavior, de facto conventions, interoperability issues, and practical conversion decisions.
SRT vs WebVTT vs SBV: format overview and decision summary
SRT, WebVTT, and SBV are all text based timed subtitle formats, but they differ in goals and parser expectations. SRT is the de facto universal exchange format for video players and editing tools. WebVTT is the standards based web subtitle format designed for HTML media text tracks and supports richer semantics and cue layout controls. SBV is a lightweight subtitle format strongly associated with YouTube workflows and simple caption exchange.
If your goal is maximum compatibility, SRT is usually the safest export target. If your goal is browser native playback with cue positioning and web track semantics, WebVTT is the correct format. If your source data comes from YouTube subtitle export or legacy simple caption workflows, you may encounter SBV and need conversion.
Quick selection guide for technical teams
- Use SRT for universal interchange, quick editing, and broad software compatibility.
- Use WebVTT for HTML5 video tracks, web caption styling hooks, chapters, and metadata tracks.
- Use SBV only when required by a specific source workflow (most commonly YouTube related export/import scenarios).
Why subtitle format differences matter in real systems
In small projects, subtitle conversion can look like a simple timestamp replacement task. In production systems, the details matter: encoding mismatches break non ASCII characters, parser leniency hides malformed files until a stricter platform rejects them, and timing rounding can introduce overlaps or zero length cues that break importers.
The biggest implementation risks usually come from:
- timestamp punctuation differences (comma vs dot)
- optional vs required headers
- presence or absence of cue numbers
- line break handling and block separation
- encoding assumptions (UTF-8 vs legacy encodings)
- layout and styling features that do not map cleanly between formats
- platform specific parser tolerance that differs from formal syntax
SRT format (SubRip) deep technical analysis
What SRT is and why it remains dominant
SRT (SubRip Subtitle format) is a plain text sidecar subtitle format that originated from the SubRip software ecosystem. It became widely adopted because the structure is simple, human editable, and supported by many players, editors, and platforms. In practice, SRT is often treated as the default subtitle exchange format even though it is not governed by a single modern standards body in the same way as WebVTT.
Canonical SRT cue structure
A typical SRT cue block contains four logical parts:
- Sequential cue number
- Timing line in HH:MM:SS,mmm --> HH:MM:SS,mmm format
- One or more text lines
- A blank line that terminates the block
1
00:00:01,500 --> 00:00:04,000
Hello, world!

2
00:00:05,000 --> 00:00:08,500
This is a subtitle.
SRT timestamp specifics
The canonical SRT timestamp uses:
- hours, minutes, seconds as zero padded two digit fields
- milliseconds as a three digit field
- a comma as the fractional separator
- an arrow token --> between start and end times
This comma separator is one of the most important practical differences between SRT and both WebVTT and SBV, which commonly use a dot for fractional seconds.
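As a minimal sketch of the canonical form described above, the following Python converts between SRT timestamps and integer milliseconds. The function names `srt_to_ms` and `ms_to_srt` are illustrative, and the pattern is deliberately strict (comma only, three millisecond digits), which is an assumption suited to output validation rather than lenient import.

```python
import re

# Canonical SRT timestamp: HH:MM:SS,mmm (comma before the milliseconds).
SRT_TS = re.compile(r"(\d{2,}):([0-5]\d):([0-5]\d),(\d{3})")


def srt_to_ms(ts):
    """Convert a canonical SRT timestamp to integer milliseconds."""
    m = SRT_TS.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f"not a canonical SRT timestamp: {ts!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms


def ms_to_srt(total_ms):
    """Render integer milliseconds as HH:MM:SS,mmm."""
    if total_ms < 0:
        raise ValueError("negative time")
    s, ms = divmod(total_ms, 1000)
    mi, s = divmod(s, 60)
    h, mi = divmod(mi, 60)
    return f"{h:02d}:{mi:02d}:{s:02d},{ms:03d}"
```

Working in integer milliseconds avoids floating point drift and makes the round trip between the two functions exact.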
SRT encoding reality and interoperability risk
SRT is plain text, but the format itself historically does not enforce a single universal encoding. This is one of the most common causes of broken accented characters or mojibake when moving subtitle files between operating systems, editors, and players. Many modern pipelines normalize SRT to UTF-8 for reliability.
Platform rules can be stricter than generic SRT expectations. For example, some platforms explicitly require UTF-8 and ignore basic markup even when certain desktop players may render it.
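One way to normalize encoding on import is to try UTF-8 first and fall back to a legacy code page only when decoding fails. The sketch below is illustrative: the function name `decode_subtitle_bytes` and the default fallback `cp1252` are assumptions, and a real pipeline would pick fallbacks based on the languages it expects.

```python
def decode_subtitle_bytes(raw, fallbacks=("cp1252",)):
    """Return (text, encoding_used) for raw subtitle file bytes.

    UTF-8 is tried first; "utf-8-sig" also removes a leading byte order mark.
    The fallback list is an assumption and should match your source material.
    """
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        pass
    for enc in fallbacks:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last step never fails,
    # but the result may still be wrong for the actual source language.
    return raw.decode("latin-1"), "latin-1"
```

Returning the encoding that was actually used lets the caller log a warning whenever a fallback was needed.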
SRT markup and formatting support in practice
SRT is often described as plain text, but in practice many tools tolerate or render a small subset of HTML like tags such as <b>, <i>, <u>, and sometimes color via <font>. This behavior is not consistent across platforms: some players render these tags, some strip them, and some display them as literal text.
SRT parser leniency and de facto behavior
Real world SRT files often deviate from canonical formatting and still work because importers are permissive. Common tolerated deviations include:
- missing or non sequential cue numbers
- extra spaces around the arrow token
- dot milliseconds instead of comma milliseconds
- inconsistent blank line usage between cues
- mixed line endings (LF, CRLF)
- legacy encodings instead of UTF-8
A technical pipeline should not assume that files labeled .srt are syntactically clean.
Robust importers should parse loosely, then normalize.
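The following Python is a rough sketch of a loose SRT parser that tolerates the deviations listed above. The name `parse_srt_loose` and the tuple layout are illustrative assumptions. It relies on one heuristic worth stating plainly: a digits-only line directly before a timing line is treated as a cue number, so a subtitle whose last line is purely numeric can lose that line when the next cue has no number of its own.

```python
import re

# Lenient timing line: comma or dot fractions, 1 to 3 fraction digits,
# one or two digit fields, and any spacing around the arrow.
TIMING = re.compile(
    r"(\d+):(\d{1,2}):(\d{1,2})[,.](\d{1,3})\s*-->\s*"
    r"(\d+):(\d{1,2}):(\d{1,2})[,.](\d{1,3})"
)


def _ms(h, m, s, frac):
    # Pad the fraction on the right so that ".5" means 500 ms.
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(frac.ljust(3, "0"))


def parse_srt_loose(text):
    """Return a list of (start_ms, end_ms, [text lines]) tuples."""
    text = text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for line in text.split("\n"):
        stripped = line.strip()
        m = TIMING.search(stripped)
        if m:
            # Heuristic: a digits-only line right before a timing line is the
            # new cue's number, not text belonging to the previous cue.
            if cues and cues[-1][2] and cues[-1][2][-1].isdigit():
                cues[-1][2].pop()
            g = m.groups()
            cues.append((_ms(*g[:4]), _ms(*g[4:]), []))
        elif stripped and cues:
            cues[-1][2].append(stripped)
    return cues
```

Because cue boundaries are detected from timing lines rather than blank lines, files with missing or doubled blank lines still parse into the same cue list.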
SRT strengths for engineering workflows
- Very easy to generate and debug manually
- Excellent interoperability as an export and archive sidecar format
- Simple parsing model for scripts and batch processing
- Works well for localization handoff when rich styling is not required
SRT limitations for advanced caption workflows
- No standardized native cue positioning model comparable to WebVTT cue settings
- No standard region model
- No standard metadata or chapter cue types
- Encoding ambiguity unless the workflow enforces UTF-8
- Markup behavior varies by player and platform
WebVTT (VTT) format deep technical analysis
What WebVTT is designed to solve
WebVTT (Web Video Text Tracks) is a W3C standardized timed text format created for web media text tracks. It is designed for use with HTML media and supports subtitles, captions, chapters, and metadata style track uses. Compared to SRT, WebVTT keeps the plain text feel but adds a formal syntax, cue settings, and richer cue text semantics.
WebVTT file structure at a high level
A WebVTT file starts with a required header line:
WEBVTT
and then contains a sequence of blocks such as cues, comments, styles, and region definitions.
WEBVTT

00:01.000 --> 00:04.000
Never drink liquid nitrogen.
WebVTT cue structure
A WebVTT cue block can contain:
- Optional cue identifier
- Timing line with start and end timestamps
- Optional cue settings on the timing line
- Cue payload text
- Blank line terminator
intro-1
00:00:22.230 --> 00:00:24.606 align:start line:90%
Hello from a WebVTT cue.
WebVTT timestamp syntax and strictness
WebVTT timestamps use a dot as the fractional separator and take the form [hh:]mm:ss.mmm. The hours field can be omitted when it is zero. This is a key difference from SRT, where the canonical form always includes hours and uses a comma before the milliseconds.
For conversion code, WebVTT timestamp parsing should be more strict than SRT if you want standards compliance, but many ingest systems still accept slightly malformed VTT in practice.
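A strict parser for the [hh:]mm:ss.mmm form can be sketched as a single anchored pattern. The name `vtt_to_ms` is illustrative; the pattern assumes hours have at least two digits when present and that minutes and seconds stay in the range 00 to 59.

```python
import re

# Strict WebVTT timestamp: optional hours (two or more digits), then
# two digit minutes and seconds, a dot, and exactly three fraction digits.
VTT_TS = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def vtt_to_ms(ts):
    """Convert a WebVTT timestamp to integer milliseconds, or raise ValueError."""
    m = VTT_TS.fullmatch(ts)
    if m is None:
        raise ValueError(f"not a valid WebVTT timestamp: {ts!r}")
    h, mi, s, ms = m.groups()
    return ((int(h or 0) * 60 + int(mi)) * 60 + int(s)) * 1000 + int(ms)
```

A lenient importer could catch the `ValueError`, retry with a looser pattern, and record a warning, which keeps strict validation and tolerant ingest as separate decisions.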
Cue settings in WebVTT
One of the biggest practical advantages of WebVTT is cue settings, which let authors control cue placement and orientation. Common settings include:
- vertical
- line
- position
- size
- align
These settings appear on the same line as the cue timing and are separated by spaces. The web platform and player implementation determine the actual rendering behavior.
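To illustrate how settings sit on the timing line, the sketch below splits a timing line into its timestamps and a settings map. The function name `split_timing_line` is hypothetical. Two choices here are assumptions rather than spec behavior: `region` is included in the known names alongside the five listed above, and when a setting is repeated the first value is kept and the repeat is reported.

```python
# Setting names recognized by this sketch (an assumption, adjust as needed).
KNOWN_SETTINGS = {"vertical", "line", "position", "size", "align", "region"}


def split_timing_line(line):
    """Split 'start --> end name:value ...' into its parts.

    Returns (start, end, settings, rejected), where rejected holds items that
    are unknown, malformed, or repeated. Timestamps are returned as strings.
    """
    left, arrow, right = line.partition("-->")
    parts = right.split()
    if not arrow or not left.strip() or not parts:
        raise ValueError(f"not a timing line: {line!r}")
    end, raw_settings = parts[0], parts[1:]
    settings, rejected = {}, []
    for item in raw_settings:
        name, sep, value = item.partition(":")
        if sep and value and name in KNOWN_SETTINGS and name not in settings:
            settings[name] = value
        else:
            rejected.append(item)
    return left.strip(), end, settings, rejected
```

Keeping the rejected items instead of silently dropping them makes it possible to report feature loss later in the pipeline.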
Cue text semantics and markup in WebVTT
WebVTT supports a richer cue text model than SRT. It allows a small subtitle oriented tag vocabulary and semantic spans such as:
- <b>, <i>, <u>
- class spans <c.classname>
- voice spans <v Speaker>
- language spans <lang xx>
- ruby annotations
- internal timestamps inside cue payloads
This makes WebVTT more suitable for web captioning, speaker labeling, and some advanced authoring needs, but many downstream tools only support a subset.
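When a downstream tool supports none of these tags, one option is to flatten the payload to plain text. The sketch below is a simplified approach, not a full cue text parser: the name `vtt_payload_to_plain` is illustrative, turning voice spans into a "Speaker: " prefix is a design assumption, and ruby annotation text is left inline next to its base text.

```python
import html
import re

# <v Speaker> or <v.class Speaker>: capture the speaker name.
VOICE = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>")
# Any remaining tag, including closing tags and internal timestamps.
TAG = re.compile(r"<[^>]*>")


def vtt_payload_to_plain(text, keep_speaker=True):
    """Flatten WebVTT cue text to plain text for simpler formats."""
    if keep_speaker:
        text = VOICE.sub(lambda m: m.group(1).strip() + ": ", text)
    text = TAG.sub("", text)
    # Unescape last, so an escaped "&lt;" is not mistaken for a tag.
    return html.unescape(text)
```

For example, a payload of `<v Alice>Hi <i>there</i></v>` becomes `Alice: Hi there`, which preserves the speaker label that SRT and SBV cannot express as markup.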
Comments, style blocks, and region definitions
WebVTT includes block types that SRT and SBV do not have:
- NOTE comments
- STYLE blocks
- REGION definitions
This expands WebVTT from a simple subtitle file into a more general timed text container syntax. It also increases the chance of conversion loss when exporting to SRT or SBV.
WebVTT strengths for web video pipelines
- Standards based syntax (W3C)
- Native alignment with HTML text tracks
- Cue settings for layout and positioning
- Semantic cue text support
- Supports multiple track uses beyond subtitles (chapters, metadata)
WebVTT limitations in mixed software ecosystems
- Some desktop tools and NLE importers only partially support cue settings or semantics
- Conversion to SRT can lose regions, styles, and metadata semantics
- Different renderers interpret the same VTT settings differently
SBV (SubViewer / YouTube SBV) format deep technical analysis
What SBV is in practice
SBV is a lightweight plain text subtitle format commonly associated with YouTube subtitle workflows and SubViewer style timing. In modern production practice, SBV is most often encountered as a YouTube oriented caption file rather than as a general purpose delivery format.
Unlike WebVTT, SBV does not have a widely referenced modern standards document maintained by a standards body. Most technical teams treat SBV behavior as platform defined and tool defined, with YouTube examples and de facto converter behavior serving as reference.
Canonical SBV block structure used in YouTube examples
SBV uses a very simple cue block:
- One timing line containing start and end timestamps separated by a comma
- One or more subtitle text lines
- Blank line separator between cues
0:00:00.599,0:00:04.160
>> ALICE: Hi, my name is Alice Miller and this is John Brown

0:00:04.160,0:00:06.770
>> JOHN: and we're the owners of Miller Bakery.
SBV timing syntax and differences from SRT and VTT
The core SBV timing line differs from both SRT and WebVTT:
- SBV: start,end on one line (comma separator between timestamps)
- SRT: start --> end on one line with arrow separator
- WebVTT: start --> end with optional cue settings and required file header
SBV timestamps typically use a dot for fractional seconds, for example 0:00:01.000.
The hour field is commonly not zero padded to two digits in examples, which is another source of format variation during conversion.
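A small sketch of parsing the SBV timing line, assuming the H:MM:SS.mmm shape shown in the example above. The name `parse_sbv_timing` is illustrative, and the pattern accepts an hour field of any width so that both padded and unpadded files parse.

```python
import re

# One SBV timestamp: hours of any width, then MM:SS.mmm.
_TS = r"(\d+):([0-5]\d):([0-5]\d)\.(\d{3})"
SBV_LINE = re.compile(rf"\s*{_TS}\s*,\s*{_TS}\s*")


def parse_sbv_timing(line):
    """Return (start_ms, end_ms) for an SBV timing line, or raise ValueError."""
    m = SBV_LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"not an SBV timing line: {line!r}")
    g = [int(x) for x in m.groups()]

    def to_ms(h, mi, s, ms):
        return ((h * 60 + mi) * 60 + s) * 1000 + ms

    return to_ms(*g[:4]), to_ms(*g[4:])
```

Once both values are integers, rendering them in SRT or WebVTT form is only a formatting step, so the unpadded hour field stops being a concern.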
SBV and styling support expectations
In practical YouTube usage, SBV is treated as a basic text format: files are expected to carry timing and text only, and style markup is not the main use case. Teams should treat SBV as a plain transport format, not a styling format.
SBV parser behavior and de facto quirks
Because SBV is often handled by converters and platform uploaders rather than advanced subtitle authoring tools, the most common de facto behaviors are:
- files are expected to be simple and clean, with blank lines between cues
- timestamps usually use dot milliseconds
- there are no cue sequence numbers
- there is no global header like WEBVTT
- the format is frequently converted to SRT for broader editing or playback compatibility
Why SBV still matters in technical workflows
Even if SBV is not the preferred archive or distribution format for many teams, it still matters because:
- YouTube and related tools may export captions in SBV
- legacy caption automation scripts may produce or consume SBV
- simple timestamp plus text structure makes SBV easy to parse and convert
- conversion to SRT or VTT is common in localization and editing workflows
SBV limitations
- No cue numbering
- No formal web track semantics like WebVTT
- No rich layout settings on cue timing lines
- Weak standardization compared to WebVTT
- Lower compatibility than SRT outside YouTube focused workflows
De facto behavior in software and platforms (important for engineering)
Formal syntax vs parser tolerance
One of the most important engineering realities is that subtitle software often accepts files that are not fully standard. This means a file may appear valid in one editor, fail in another, and import with altered timing in a third. For robust tooling, treat parsing and rendering as separate steps: parse liberally, normalize internally, render conservatively.
Common de facto SRT behaviors
- Missing cue numbers may still be accepted and reconstructed on export
- Dot milliseconds may be accepted even though canonical SRT uses comma
- Multiple blank lines or inconsistent blank lines may still parse
- Basic HTML like tags may render in some players and be ignored in others
- Legacy encodings are common in older subtitle files
Common de facto WebVTT behaviors
- Some tools accept comma fractions during import and rewrite to dot fractions on export
- Only a subset of cue settings is honored by some players
- STYLE and REGION blocks may be ignored by non browser tools
- Cue text tags may be partially supported or stripped during conversion
- Identifier uniqueness rules may not be strictly enforced by every parser
Common de facto SBV behaviors
- SBV is frequently treated as a YouTube export/import format rather than a final delivery format
- Converters often tolerate loosely formatted input and normalize blocks automatically
- Timing syntax is simple enough that malformed spacing is often fixable
- SBV files are commonly converted to SRT before editing in mainstream tools
Platform specific constraints can override format capability
A key practical point for technical teams is that a platform may support a file extension but only a limited feature subset. For example, a platform can accept SRT or SBV as upload formats but ignore style markup. Similarly, a WebVTT consumer may accept the file but ignore advanced cues, regions, or styling. Always validate against the target platform, not only the format specification.
Conversion engineering notes and edge cases (SRT, VTT, SBV)
Lossless vs lossy conversion expectations
Not all subtitle conversions are fully lossless. Timing values can usually be preserved exactly, but structure and semantics often cannot.
- SBV to SRT: usually close to lossless for basic subtitle text and timing
- SRT to VTT: usually close to lossless for text and timing, with syntax normalization
- VTT to SRT: potentially lossy because cue settings, regions, and advanced cue text semantics may not map
- VTT to SBV: highly lossy because SBV is simpler and lacks a header and a settings model
Key syntax mappings
| Feature | SRT | WebVTT | SBV |
|---|---|---|---|
| Header | None | WEBVTT required | None |
| Cue numbering | Canonical yes | No (optional cue identifier) | No |
| Timing separator | --> | --> | comma between start and end |
| Millisecond separator | comma | dot | dot (typical) |
| Cue settings on time line | No standard support | Yes | No |
Timestamp conversion pitfalls
- Comma vs dot fraction conversion must be exact and reversible where possible
- Optional WebVTT hours may need expansion to canonical SRT hour format
- SBV often uses single digit hour field, which should be normalized if exporting to stricter formats
- Rounding or truncation can create overlaps after conversion if not handled carefully
- Zero length cues may be accepted by some systems and rejected by others
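The rounding pitfalls above can be made concrete with a sketch that applies one fixed rounding rule to every value and then reports suspicious cues. The names `seconds_to_ms` and `timing_problems` are illustrative, and the choice of half-up rounding is an assumption; what matters is that start and end times use the same rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def seconds_to_ms(seconds):
    """Round a decimal seconds string to integer ms with one fixed rule."""
    ms = (Decimal(seconds) * 1000).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(ms)


def timing_problems(cues):
    """Check (start_ms, end_ms) pairs in file order.

    Returns a list of (cue_index, problem) tuples and does not modify timing.
    """
    problems = []
    for i, (start, end) in enumerate(cues):
        if end < start:
            problems.append((i, "end before start"))
        elif end == start:
            problems.append((i, "zero length"))
        if i > 0 and start < cues[i - 1][1]:
            problems.append((i, "overlaps previous cue"))
    return problems
```

Parsing through `Decimal` rather than `float` keeps values such as 4.1605 seconds from picking up binary representation error before rounding. The checker only reports, because whether to trim, extend, or reject a cue depends on the target platform.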
Text and markup conversion pitfalls
- WebVTT class, voice, language, ruby, and internal timestamps do not map cleanly to SRT or SBV
- SRT inline tags may be stripped by platforms even when a player displays them
- Line break tokens used in one tool may need conversion to literal line breaks in another
- Escaped entities may need normalization depending on target parser expectations
Encoding normalization is not optional in serious pipelines
If your subtitle conversion tool is intended for multilingual production use, normalize text to UTF-8 on output. This is especially important when importing SRT from legacy sources and when targeting platforms that require plain UTF-8 uploads.
Recommended internal representation for converters
A robust converter should parse all subtitle formats into a single internal model before rendering. A practical internal cue model usually includes:
- start and end times in integer milliseconds
- payload text as a list of lines
- optional cue identifier
- optional layout settings map
- warning list for parser repairs
This approach makes it easier to handle malformed input and produce deterministic normalized output.
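The model listed above could be expressed as a small dataclass. This is one possible shape, not a prescribed one: the class name `Cue`, the field names, and the `duration_ms` helper are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cue:
    """Format-neutral cue used between parsing and rendering."""
    start_ms: int                     # start time in integer milliseconds
    end_ms: int                       # end time in integer milliseconds
    lines: list[str]                  # payload text, one entry per line
    identifier: Optional[str] = None  # cue identifier, when the source had one
    settings: dict[str, str] = field(default_factory=dict)  # layout settings
    warnings: list[str] = field(default_factory=list)       # parser repairs

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms
```

Using `default_factory` gives each cue its own settings map and warning list, so a repair recorded on one cue never leaks into another.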
Validation and normalization strategies for production subtitle pipelines
Parse loosely, render strictly
This is the most reliable design strategy for subtitle tooling. Accept common malformed inputs to reduce failure rates during import, but always emit a stricter normalized output format.
Suggested normalization rules for SRT output
- Renumber cues sequentially starting at 1
- Render timestamps as HH:MM:SS,mmm
- Use exactly one timing arrow format: -->
- Use one blank line between cues
- Normalize line endings to LF or CRLF consistently
- Output UTF-8 text
- Strip unsupported advanced tags if target platform ignores them
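A strict SRT renderer following these rules might look like the sketch below. The name `render_srt` and the `(start_ms, end_ms, lines)` tuple layout are assumptions for illustration; tag stripping is left out because it depends on the target platform.

```python
def render_srt(cues, newline="\n"):
    """Render (start_ms, end_ms, [lines]) tuples as normalized SRT text."""

    def ts(total_ms):
        s, ms = divmod(total_ms, 1000)
        m, s = divmod(s, 60)
        h, m = divmod(m, 60)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    # Renumber from 1, ignoring whatever numbering the source file used.
    for number, (start, end, lines) in enumerate(cues, start=1):
        block = [str(number), f"{ts(start)} --> {ts(end)}", *lines]
        blocks.append(newline.join(block))
    if not blocks:
        return ""
    # Exactly one blank line between cues, one line ending at end of file.
    return (newline * 2).join(blocks) + newline
```

When writing the result to disk in Python, opening the file with `encoding="utf-8"` and `newline=""` keeps the chosen line ending from being translated by the platform.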
Suggested normalization rules for WebVTT output
- Add the WEBVTT header
- Render timestamps with dot fractions
- Validate cue settings and remove invalid duplicates
- Separate blocks clearly with blank lines
- Preserve cue identifiers only when useful
- Decide whether STYLE and REGION blocks are kept or omitted based on target player support
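The corresponding WebVTT renderer can be sketched as follows. The name `render_vtt` and the four-element tuple layout are assumptions. The sketch always writes the hours field, emits no STYLE or REGION blocks, and assumes payload lines are already valid cue text, so escaping of `&` and `<` is left to the caller.

```python
def render_vtt(cues):
    """Render (start_ms, end_ms, [lines], identifier_or_None) tuples as WebVTT."""

    def ts(total_ms):
        s, ms = divmod(total_ms, 1000)
        m, s = divmod(s, 60)
        h, m = divmod(m, 60)
        return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

    blocks = ["WEBVTT"]  # required header comes first
    for start, end, lines, identifier in cues:
        block = []
        if identifier:
            # An identifier containing the arrow would be read as a timing line.
            if "-->" in identifier or "\n" in identifier:
                raise ValueError(f"invalid cue identifier: {identifier!r}")
            block.append(identifier)
        block.append(f"{ts(start)} --> {ts(end)}")
        block.extend(lines)
        blocks.append("\n".join(block))
    # Blank line after the header and between all blocks.
    return "\n\n".join(blocks) + "\n"
```

Writing hours unconditionally is a conservative choice: the short form is valid, but the long form is less likely to surprise tools that were written with SRT in mind.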
Suggested normalization rules for SBV output
- Use one timing line per cue with start,end
- Use dot fractions in timestamps
- No cue numbering
- No header
- Keep payload plain and simple
- Separate cues with a single blank line
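For SBV output, a sketch along the same lines also strips HTML like tags to keep the payload plain. The name `render_sbv` is illustrative, and the unpadded hour field follows the YouTube style example earlier in this article rather than a formal specification.

```python
import re

# HTML like tags such as <i> or </font>; text such as ">> ALICE" is untouched.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def _sbv_ts(total_ms):
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h}:{m:02d}:{s:02d}.{ms:03d}"


def render_sbv(cues):
    """Render (start_ms, end_ms, [lines]) tuples as plain SBV text."""
    blocks = []
    for start, end, lines in cues:
        plain = [TAG.sub("", line).strip() for line in lines]
        plain = [line for line in plain if line]  # drop lines emptied by stripping
        blocks.append("\n".join([f"{_sbv_ts(start)},{_sbv_ts(end)}", *plain]))
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Because stripping tags is lossy, a production converter would normally record a warning for each cue where markup was removed.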
Error classes worth reporting in logs
- Malformed timestamp
- End time before start time
- Cue overlap created or found
- Missing blank line recovered heuristically
- Invalid encoding fallback used
- Dropped unsupported VTT settings during SRT or SBV export
- Removed unsupported markup
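One way to keep these error classes consistent across parsers and renderers is a shared enumeration with a small collector. Everything here is a hypothetical design: the names `Issue` and `IssueLog`, the logger name `subtitles`, and the message layout are assumptions.

```python
import logging
from collections import Counter
from enum import Enum


class Issue(Enum):
    MALFORMED_TIMESTAMP = "malformed timestamp"
    END_BEFORE_START = "end time before start time"
    CUE_OVERLAP = "cue overlap created or found"
    BLANK_LINE_RECOVERED = "missing blank line recovered heuristically"
    ENCODING_FALLBACK = "invalid encoding fallback used"
    DROPPED_VTT_SETTINGS = "dropped unsupported VTT settings"
    REMOVED_MARKUP = "removed unsupported markup"


class IssueLog:
    """Counts issues per class and forwards each one to a standard logger."""

    def __init__(self, logger=None):
        self.counts = Counter()
        self.logger = logger or logging.getLogger("subtitles")

    def report(self, issue, cue_index, detail=""):
        self.counts[issue] += 1
        self.logger.warning("cue %d: %s %s", cue_index, issue.value, detail)

    def summary(self):
        # Map readable labels to counts, e.g. for a per-file report line.
        return {issue.value: n for issue, n in self.counts.items()}
```

Per-class counts make it easy to spot a batch where, for example, most files needed an encoding fallback, which usually points at a problem upstream.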
SRT vs WebVTT vs SBV comparison matrix for technical users
| Category | SRT | WebVTT (VTT) | SBV |
|---|---|---|---|
| Primary use | General subtitle interchange | Web media text tracks | YouTube style simple subtitles |
| Formal standard | No single modern canonical spec in common use | W3C standard | No widely used modern standards body spec |
| Header required | No | Yes | No |
| Cue numbers | Canonical yes | No | No |
| Milliseconds separator | Comma | Dot | Dot (typical) |
| Layout settings | Not standardized | Yes (cue settings) | No |
| Comments / metadata blocks | No standardized block types | Yes (NOTE, STYLE, REGION, more structured semantics) | No |
| Encoding certainty | Historically variable in practice | UTF-8 oriented standard usage | Platform expectations often plain UTF-8 in modern usage |
| Typical compatibility | Highest across players/editors | Best on web and modern track consumers | Narrower, often converted first |
Recommendations by workflow (engineering and content operations)
For website video playback with HTML track elements
Use WebVTT as the delivery format. Keep a normalized SRT export for editing and fallback. If your authoring source is SRT, convert to WebVTT late in the pipeline and validate in the target browsers.
For video editing and cross platform subtitle exchange
Use SRT as the main exchange format unless you specifically need WebVTT semantics. Enforce UTF-8, canonical timestamps, and normalized line endings in your pipeline.
For YouTube subtitle import/export workflows
Be prepared to handle SBV and SRT. If subtitles need to move into NLEs, archives, or other platforms, convert SBV to normalized SRT first. Keep the original SBV as source evidence if timing provenance matters.
For subtitle conversion software developers
Implement:
- format detection based on header and timing heuristics
- loose parser with explicit warning reporting
- strict renderer for deterministic outputs
- UTF-8 output normalization
- feature loss reporting when converting from VTT to SRT or SBV
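Format detection from the header and timing lines can be sketched as follows. The name `detect_format` and the return values are illustrative, and the heuristics are assumptions that only look at the file content, not the extension.

```python
import re

# Any timestamp followed by an arrow (comma or dot fractions both accepted).
ARROW_TIMING = re.compile(r"^\s*\d+:\d{1,2}:\d{1,2}[,.]\d{1,3}\s*-->", re.M)
# A whole line made of two dot-fraction timestamps separated by a comma.
SBV_TIMING = re.compile(
    r"^\s*\d+:\d{2}:\d{2}\.\d{3}\s*,\s*\d+:\d{2}:\d{2}\.\d{3}\s*$", re.M
)


def detect_format(text):
    """Guess 'vtt', 'sbv', or 'srt' from file content; return None if unsure."""
    text = text.lstrip("\ufeff")
    first_line = text.split("\n", 1)[0].rstrip("\r")
    if first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t")):
        return "vtt"
    if SBV_TIMING.search(text):
        return "sbv"
    if ARROW_TIMING.search(text):
        return "srt"
    return None
```

The header check runs first because a WebVTT file without its header is closer to a dot-fraction SRT file than to valid WebVTT, so treating it as SRT and emitting a warning is a reasonable fallback.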
Technical FAQ: SRT, VTT, and SBV subtitle formats
Is SRT formally standardized like WebVTT?
Not in the same way. SRT is widely used and well understood, but practical interoperability relies heavily on de facto conventions and parser tolerance. WebVTT has a formal W3C specification.
Why does SRT use commas while VTT uses dots for milliseconds?
This is a format syntax difference with historical roots. It is one of the most common causes of failed imports during naive subtitle conversion. Converters should explicitly normalize fractional separators.
Can I safely convert WebVTT to SRT without data loss?
Only if the WebVTT file uses basic cues without advanced settings, regions, or semantic cue text features. Otherwise, timing and text may convert, but layout and semantic information may be lost.
Is SBV obsolete?
SBV is not the best universal format, but it is still relevant because YouTube related workflows and legacy subtitle exports can produce it. It remains important in conversion and ingestion pipelines.
What should a technical team archive?
Archive the original source subtitle file plus at least one normalized interchange copy, typically UTF-8 SRT. If web playback is a primary target, also archive a validated WebVTT derivative.
Conclusion
SRT, WebVTT, and SBV all solve the same basic problem of timed text, but they do so with different assumptions. SRT wins on universal compatibility, WebVTT wins on standards based web features, and SBV remains useful as a simple YouTube centered source format. For robust subtitle engineering, treat subtitle conversion as a parsing and normalization problem, not just a timestamp string replacement task.