Convert subtitle formats for free: SRT, VTT, SBV, and text

Overview of the common subtitle formats

This article analyzes three widely used plain text subtitle formats that appear in production, web delivery, archives, and conversion pipelines: SRT (SubRip), WebVTT (VTT), and SBV (SubViewer / YouTube SBV). It covers formal syntax, timing rules, parser behavior, de facto conventions, interoperability issues, and practical conversion decisions.

SRT vs WebVTT vs SBV: format overview and decision summary

SRT, WebVTT, and SBV are all text based timed subtitle formats, but they differ in goals and parser expectations. SRT is the de facto universal exchange format for video players and editing tools. WebVTT is the standards based web subtitle format designed for HTML media text tracks and supports richer semantics and cue layout controls. SBV is a lightweight subtitle format strongly associated with YouTube workflows and simple caption exchange.

If your goal is maximum compatibility, SRT is usually the safest export target. If your goal is browser native playback with cue positioning and web track semantics, WebVTT is the correct format. If your source data comes from YouTube subtitle export or legacy simple caption workflows, you may encounter SBV and need conversion.

Quick selection guide for technical teams

Use SRT for universal interchange, quick editing, and broad software compatibility.
Use WebVTT for HTML5 video tracks, web caption styling hooks, chapters, and metadata tracks.
Use SBV only when required by a specific source workflow (most commonly YouTube related export/import scenarios).

Why subtitle format differences matter in real systems

In small projects, subtitle conversion can look like a simple timestamp replacement task. In production systems, the details matter: encoding mismatches break non ASCII characters, parser leniency hides malformed files until a stricter platform rejects them, and timing rounding can introduce overlaps or zero length cues that break importers.

The biggest implementation risks usually come from:

timestamp punctuation differences (comma vs dot)
optional vs required headers
presence or absence of cue numbers
line break handling and block separation
encoding assumptions (UTF-8 vs legacy encodings)
layout and styling features that do not map cleanly between formats
platform specific parser tolerance that differs from formal syntax

SRT format (SubRip) deep technical analysis

What SRT is and why it remains dominant

SRT (SubRip Subtitle format) is a plain text sidecar subtitle format that originated from the SubRip software ecosystem. It became widely adopted because the structure is simple, human editable, and supported by many players, editors, and platforms. In practice, SRT is often treated as the default subtitle exchange format even though it is not governed by a single modern standards body in the same way as WebVTT.

Canonical SRT cue structure

A typical SRT cue block contains four logical parts:

Sequential cue number
Timing line in HH:MM:SS,mmm --> HH:MM:SS,mmm format
One or more text lines
A blank line that terminates the block

1
00:00:01,500 --> 00:00:04,000
Hello, world!

2
00:00:05,000 --> 00:00:08,500
This is a subtitle.

SRT timestamp specifics

The canonical SRT timestamp uses:

hours, minutes, seconds as zero padded two digit fields
milliseconds as a three digit field
a comma as the fractional separator
an arrow token --> between start and end times

This comma separator is one of the most important practical differences between SRT and both WebVTT and SBV, which commonly use a dot for fractional seconds.
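As a minimal sketch of the timestamp rules above, the following Python function converts a canonical SRT timestamp into integer milliseconds. The name srt_timestamp_to_ms is hypothetical, and the pattern deliberately accepts only the canonical comma form, so lenient inputs would need a looser pattern.

```python
import re

# Canonical SRT timestamp: HH:MM:SS,mmm with a comma before the milliseconds.
_SRT_TIMESTAMP = re.compile(r"(\d{2,}):([0-5]\d):([0-5]\d),(\d{3})")


def srt_timestamp_to_ms(value: str) -> int:
    """Convert a canonical SRT timestamp to integer milliseconds."""
    match = _SRT_TIMESTAMP.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a canonical SRT timestamp: {value!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


print(srt_timestamp_to_ms("00:00:04,000"))  # 4000
```

Working in integer milliseconds avoids floating point drift when the value is later rendered in another format.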

SRT encoding reality and interoperability risk

SRT is plain text, but the format itself historically does not enforce a single universal encoding. This is one of the most common causes of broken accented characters or mojibake when moving subtitle files between operating systems, editors, and players. Many modern pipelines normalize SRT to UTF-8 for reliability.

Platform rules can be stricter than generic SRT expectations. For example, some platforms explicitly require UTF-8 and ignore basic markup even when certain desktop players may render it.
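One way to approach the encoding problem is to try UTF-8 first and fall back to a legacy encoding only when decoding fails. The sketch below is an assumption about a reasonable policy, not a detection algorithm: the fallback list (cp1252, then latin-1) is a guess that suits Western European files, and the function name decode_subtitle_bytes is hypothetical.

```python
def decode_subtitle_bytes(raw: bytes, fallbacks=("cp1252", "latin-1")) -> tuple[str, str]:
    """Return (text, encoding_used). Tries UTF-8 first, then the given fallbacks."""
    # "utf-8-sig" decodes plain UTF-8 and also strips a leading byte order mark.
    for encoding in ("utf-8-sig", *fallbacks):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reached when the caller supplies fallbacks that can all fail.
    return raw.decode("utf-8", errors="replace"), "utf-8-replace"


text, used = decode_subtitle_bytes("café".encode("cp1252"))
print(text, used)
```

Returning the encoding that was actually used lets the caller log a warning whenever a fallback was needed, since a wrong guess produces readable but incorrect characters rather than an error.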

SRT markup and formatting support in practice

SRT is often described as plain text, but in practice many tools tolerate or render a small subset of HTML like tags such as <b>, <i>, <u>, and sometimes color via a <font> tag with a color attribute. This behavior is not consistent across platforms: some players render these tags, some strip them, and some display them as literal text.

SRT parser leniency and de facto behavior

Real world SRT files often deviate from canonical formatting and still work because importers are permissive. Common tolerated deviations include:

missing or non sequential cue numbers
extra spaces around the arrow token
dot milliseconds instead of comma milliseconds
inconsistent blank line usage between cues
mixed line endings (LF, CRLF)
legacy encodings instead of UTF-8

A technical pipeline should not assume that files labeled .srt are syntactically clean. Robust importers should parse loosely, then normalize.
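The following sketch illustrates one possible loose SRT parser that tolerates several of the deviations listed above: missing cue numbers, dot or comma fractions, extra spaces around the arrow, repeated blank lines, and mixed line endings. It assumes cues are still separated by at least one blank line, and the names parse_srt_loose and _to_ms are hypothetical.

```python
import re

# Accept comma or dot fractions, 1 to 3 fraction digits, and any spacing around the arrow.
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{1,3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{1,3})"
)


def _to_ms(hours, minutes, seconds, fraction):
    # Right-pad the fraction so "5" is read as 500 ms, not 5 ms.
    millis = int(fraction.ljust(3, "0"))
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + millis


def parse_srt_loose(text):
    """Return a list of (start_ms, end_ms, lines) tuples from possibly messy SRT text."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                groups = match.groups()
                # Lines before the timing line (cue number, BOM) are ignored.
                payload = [item.strip() for item in lines[index + 1:] if item.strip()]
                cues.append((_to_ms(*groups[:4]), _to_ms(*groups[4:]), payload))
                break
        # Blocks without a timing line are skipped; a real tool should log them.
    return cues
```

The parser returns normalized data rather than text, so the strictness decisions are left to the renderer.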

SRT strengths for engineering workflows

Very easy to generate and debug manually
Excellent interoperability as an export and archive sidecar format
Simple parsing model for scripts and batch processing
Works well for localization handoff when rich styling is not required

SRT limitations for advanced caption workflows

No standardized native cue positioning model comparable to WebVTT cue settings
No standard region model
No standard metadata or chapter cue types
Encoding ambiguity unless the workflow enforces UTF-8
Markup behavior varies by player and platform

WebVTT (VTT) format deep technical analysis

What WebVTT is designed to solve

WebVTT (Web Video Text Tracks) is a timed text format specified by the W3C and created for web media text tracks. It is designed for use with HTML media and supports subtitles, captions, chapters, and metadata style track uses. Compared to SRT, WebVTT keeps the plain text feel but adds a formal syntax, cue settings, and richer cue text semantics.

WebVTT file structure at a high level

A WebVTT file starts with a required header line, WEBVTT, followed by a sequence of blocks such as cues, comments, styles, and region definitions.

WEBVTT

00:01.000 --> 00:04.000
Never drink liquid nitrogen.

WebVTT cue structure

A WebVTT cue block can contain:

Optional cue identifier
Timing line with start and end timestamps
Optional cue settings on the timing line
Cue payload text
Blank line terminator

intro-1
00:00:22.230 --> 00:00:24.606 align:start line:90%
Hello from a WebVTT cue.

WebVTT timestamp syntax and strictness

WebVTT timestamps use a dot as the fractional separator and take the form [hh:]mm:ss.mmm. The hours field is optional and can be omitted when it is zero. This is a key difference from SRT, where the canonical form always includes hours and uses a comma before the milliseconds.

For conversion code, WebVTT timestamp parsing should be more strict than SRT if you want standards compliance, but many ingest systems still accept slightly malformed VTT in practice.
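A strict WebVTT timestamp parser along these lines might look like the sketch below. It follows the [hh:]mm:ss.mmm form described above, requires a dot, and rejects everything else; the function name vtt_timestamp_to_ms is hypothetical.

```python
import re

# Optional hours (two or more digits), then mm:ss.ttt with a dot before the fraction.
_VTT_TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def vtt_timestamp_to_ms(value: str) -> int:
    """Convert a WebVTT timestamp to integer milliseconds, rejecting malformed input."""
    match = _VTT_TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid WebVTT timestamp: {value!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


print(vtt_timestamp_to_ms("00:01.000"))     # 1000
print(vtt_timestamp_to_ms("00:00:22.230"))  # 22230
```

An ingest path that needs to accept slightly malformed files could catch the ValueError and retry with a looser pattern while recording a warning.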

Cue settings in WebVTT

One of the biggest practical advantages of WebVTT is cue settings, which let authors control cue placement and orientation. Common settings include:

vertical
line
position
size
align

These settings appear on the same line as the cue timing and are separated by spaces. The web platform and player implementation determine the actual rendering behavior.
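To make the timing line layout concrete, here is a small sketch that splits a WebVTT timing line into its start time, end time, and cue settings. The function name split_vtt_timing_line is hypothetical, the known setting names are the five listed above plus region, and letting a repeated setting overwrite the earlier one is a simplifying assumption.

```python
KNOWN_SETTINGS = {"vertical", "line", "position", "size", "align", "region"}


def split_vtt_timing_line(line: str):
    """Return (start, end, settings, unknown) for one WebVTT timing line."""
    start, arrow, rest = line.partition("-->")
    tokens = rest.split()
    if not arrow or not start.strip() or not tokens:
        raise ValueError(f"not a timing line: {line!r}")
    settings, unknown = {}, []
    for token in tokens[1:]:
        name, colon, value = token.partition(":")
        if colon and value and name in KNOWN_SETTINGS:
            settings[name] = value  # a repeated setting overwrites the earlier one
        else:
            unknown.append(token)  # kept so the caller can report it
    return start.strip(), tokens[0], settings, unknown


print(split_vtt_timing_line("00:00:22.230 --> 00:00:24.606 align:start line:90%"))
```

Keeping unrecognized tokens in a separate list makes it possible to report dropped settings instead of discarding them silently.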

Cue text semantics and markup in WebVTT

WebVTT supports a richer cue text model than SRT. It allows a small subtitle oriented tag vocabulary and semantic spans such as:

<b>, <i>, <u>
class spans <c.classname>
voice spans <v Speaker>
language spans <lang xx>
ruby annotations
internal timestamps inside cue payloads

This makes WebVTT more suitable for web captioning, speaker labeling, and some advanced authoring needs, but many downstream tools only support a subset.

Comments, style blocks, and region definitions

WebVTT includes block types that SRT and SBV do not have:

NOTE comments
STYLE blocks
REGION definitions

This expands WebVTT from a simple subtitle file into a more general timed text container syntax. It also increases the chance of conversion loss when exporting to SRT or SBV.

WebVTT strengths for web video pipelines

Standards based syntax (W3C)
Native alignment with HTML text tracks
Cue settings for layout and positioning
Semantic cue text support
Supports multiple track uses beyond subtitles (chapters, metadata)

WebVTT limitations in mixed software ecosystems

Some desktop tools and NLE importers only partially support cue settings or semantics
Conversion to SRT can lose regions, styles, and metadata semantics
Different renderers interpret the same VTT settings differently

SBV (SubViewer / YouTube SBV) format deep technical analysis

What SBV is in practice

SBV is a lightweight plain text subtitle format commonly associated with YouTube subtitle workflows and SubViewer style timing. In modern production practice, SBV is most often encountered as a YouTube oriented caption file rather than as a general purpose delivery format.

Unlike WebVTT, SBV does not have a widely referenced modern standards document maintained by a standards body. Most technical teams treat SBV behavior as platform defined and tool defined, with YouTube examples and de facto converter behavior serving as reference.

Canonical SBV block structure used in YouTube examples

SBV uses a very simple cue block:

One timing line containing start and end timestamps separated by a comma
One or more subtitle text lines
Blank line separator between cues

0:00:00.599,0:00:04.160
>> ALICE: Hi, my name is Alice Miller and this is John Brown

0:00:04.160,0:00:06.770
>> JOHN: and we're the owners of Miller Bakery.

SBV timing syntax and differences from SRT and VTT

The core SBV timing line differs from both SRT and WebVTT:

SBV: start,end on one line (comma separator between timestamps)
SRT: start --> end on one line with arrow separator
WebVTT: start --> end with optional cue settings and required file header

SBV timestamps typically use a dot for fractional seconds, for example 0:00:01.000. The hour field is commonly not zero padded to two digits in examples, which is another source of format variation during conversion.
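A sketch of an SBV timing line parser, assuming the shape shown in the YouTube style examples above (unpadded hours, dot fractions, comma between start and end). The name parse_sbv_timing is hypothetical.

```python
import re

# One SBV timestamp: H:MM:SS.mmm, where the hour field may be a single digit.
_SBV_TS = r"(\d+):([0-5]\d):([0-5]\d)\.(\d{3})"
_SBV_TIMING = re.compile(rf"{_SBV_TS},{_SBV_TS}")


def parse_sbv_timing(line: str) -> tuple[int, int]:
    """Return (start_ms, end_ms) for one SBV timing line."""
    match = _SBV_TIMING.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an SBV timing line: {line!r}")
    values = [int(part) for part in match.groups()]

    def to_ms(hours, minutes, seconds, millis):
        return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis

    return to_ms(*values[:4]), to_ms(*values[4:])


print(parse_sbv_timing("0:00:00.599,0:00:04.160"))  # (599, 4160)
```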

SBV and styling support expectations

In practical YouTube usage, SBV is treated as a basic text format that carries timing and text only; style markup is not part of its main use case. Teams should treat SBV as a plain transport format, not a styling format.

SBV parser behavior and de facto quirks

Because SBV is often handled by converters and platform uploaders rather than advanced subtitle authoring tools, the most common de facto behaviors are:

files are expected to be simple and clean, with blank lines between cues
timestamps usually use dot milliseconds
there are no cue sequence numbers
there is no global header like WEBVTT
format is frequently converted to SRT for broader editing or playback compatibility

Why SBV still matters in technical workflows

Even if SBV is not the preferred archive or distribution format for many teams, it still matters because:

YouTube and related tools may export captions in SBV
legacy caption automation scripts may produce or consume SBV
simple timestamp plus text structure makes SBV easy to parse and convert
conversion to SRT or VTT is common in localization and editing workflows

SBV limitations

No cue numbering
No formal web track semantics like WebVTT
No rich layout settings on cue timing lines
Weak standardization compared to WebVTT
Lower compatibility than SRT outside YouTube focused workflows

De facto behavior in software and platforms (important for engineering)

Formal syntax vs parser tolerance

One of the most important engineering realities is that subtitle software often accepts files that are not fully standard. This means a file may appear valid in one editor, fail in another, and import with altered timing in a third. For robust tooling, treat parsing and rendering as separate steps: parse liberally, normalize internally, render conservatively.

Common de facto SRT behaviors

Missing cue numbers may still be accepted and reconstructed on export
Dot milliseconds may be accepted even though canonical SRT uses comma
Multiple blank lines or inconsistent blank lines may still parse
Basic HTML like tags may render in some players and be ignored in others
Legacy encodings are common in older subtitle files

Common de facto WebVTT behaviors

Some tools accept comma fractions during import and rewrite to dot fractions on export
Only a subset of cue settings is honored by some players
STYLE and REGION blocks may be ignored by non browser tools
Cue text tags may be partially supported or stripped during conversion
Identifier uniqueness rules may not be strictly enforced by every parser

Common de facto SBV behaviors

SBV is frequently treated as a YouTube export/import format rather than a final delivery format
Converters often repair loosely formatted input and normalize blocks automatically
Timing syntax is simple enough that malformed spacing is often fixable
SBV files are commonly converted to SRT before editing in mainstream tools

Platform specific constraints can override format capability

A key practical point for technical teams is that a platform may support a file extension but only a limited feature subset. For example, a platform can accept SRT or SBV as upload formats but ignore style markup. Similarly, a WebVTT consumer may accept the file but ignore advanced cues, regions, or styling. Always validate against the target platform, not only the format specification.

Conversion engineering notes and edge cases (SRT, VTT, SBV)

Lossless vs lossy conversion expectations

Not all subtitle conversions are fully lossless. Timing values can usually be preserved exactly, but structure and semantics often cannot.

SBV to SRT: usually close to lossless for basic subtitle text and timing
SRT to VTT: usually close to lossless for text and timing, with syntax normalization
VTT to SRT: potentially lossy because cue settings, regions, and advanced cue text semantics may not map
VTT to SBV: highly lossy because SBV is simpler and lacks a header and a settings model
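Before a VTT to SRT export, a converter can scan the input for features that will not survive. The sketch below is a rough heuristic built on regular expressions, not a WebVTT parser; it assumes LF line endings, and the names _CHECKS and vtt_features_lost_in_srt are hypothetical.

```python
import re

# Each pattern is a heuristic signal that the feature is present somewhere in the file.
_CHECKS = {
    "STYLE block": re.compile(r"^STYLE[ \t]*$", re.MULTILINE),
    "REGION block": re.compile(r"^REGION[ \t]*$", re.MULTILINE),
    "NOTE comment": re.compile(r"^NOTE(?:[ \t].*)?$", re.MULTILINE),
    "cue settings": re.compile(r"-->[ \t]*[\d:.]+[ \t]+\S+:\S+"),
    "voice span": re.compile(r"<v[ .]"),
    "class span": re.compile(r"<c[.>]"),
    "language span": re.compile(r"<lang[ .]"),
    "ruby annotation": re.compile(r"<ruby[.>]"),
    "internal timestamp": re.compile(r"<\d+:\d{2}[:.]\d"),
}


def vtt_features_lost_in_srt(vtt_text: str) -> list[str]:
    """List WebVTT features found in the text that have no SRT equivalent."""
    return [name for name, pattern in _CHECKS.items() if pattern.search(vtt_text)]


sample = (
    "WEBVTT\n\nNOTE draft\n\nintro-1\n"
    "00:00:22.230 --> 00:00:24.606 align:start line:90%\n"
    "<v Alice>Hello from a WebVTT cue.\n"
)
print(vtt_features_lost_in_srt(sample))
```

An empty result suggests the conversion is close to lossless for that file; a non-empty result is a reasonable trigger for a warning in the conversion log.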

Key syntax mappings

Feature	SRT	WebVTT	SBV
Header	None	`WEBVTT` required	None
Cue numbering	Yes (canonical)	No (optional cue identifier)	No
Timing separator	`-->`	`-->`	comma between start and end
Millisecond separator	comma	dot	dot (typical)
Cue settings on time line	No standard support	Yes	No
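The mappings in the table can be sketched as a single formatter that renders the same millisecond values in each syntax. The function names and the "srt", "vtt", and "sbv" keys are hypothetical; the WebVTT branch always writes the hours field, which is valid but not required.

```python
def format_timestamp(ms: int, fmt: str) -> str:
    """Render integer milliseconds as an SRT, WebVTT, or SBV timestamp."""
    if ms < 0:
        raise ValueError("timestamps cannot be negative")
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    if fmt == "srt":
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
    if fmt == "vtt":
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
    if fmt == "sbv":
        return f"{hours:d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
    raise ValueError(f"unknown format: {fmt!r}")


def format_timing_line(start_ms: int, end_ms: int, fmt: str) -> str:
    """Join start and end with the separator the target format expects."""
    separator = "," if fmt == "sbv" else " --> "
    return f"{format_timestamp(start_ms, fmt)}{separator}{format_timestamp(end_ms, fmt)}"


for name in ("srt", "vtt", "sbv"):
    print(format_timing_line(1500, 4000, name))
```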

Timestamp conversion pitfalls

Comma vs dot fraction conversion must be exact and reversible where possible
Optional WebVTT hours may need expansion to canonical SRT hour format
SBV often uses single digit hour field, which should be normalized if exporting to stricter formats
Rounding or truncation can create overlaps after conversion if not handled carefully
Zero length cues may be accepted by some systems and rejected by others
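Two of these pitfalls can be illustrated with a short sketch: rounding instead of truncating when a source supplies fractional seconds, and checking for overlaps and zero length cues after conversion. The function names are hypothetical, and the check only reports problems; how to repair them is a policy decision left to the pipeline.

```python
def seconds_to_ms(seconds: float) -> int:
    """Round to the nearest millisecond instead of truncating."""
    return int(round(seconds * 1000))


def find_timing_problems(cues):
    """cues: list of (start_ms, end_ms) in file order. Returns (index, problem) pairs."""
    problems = []
    for index, (start, end) in enumerate(cues):
        if end < start:
            problems.append((index, "end before start"))
        elif end == start:
            problems.append((index, "zero length"))
        if index > 0 and start < cues[index - 1][1]:
            problems.append((index, "overlaps previous cue"))
    return problems


# Truncation loses a millisecond that rounding keeps.
print(int(4.1599999 * 1000), seconds_to_ms(4.1599999))  # 4159 4160
print(find_timing_problems([(0, 1000), (900, 900), (2000, 1500)]))
```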

Text and markup conversion pitfalls

WebVTT class, voice, language, ruby, and internal timestamps do not map cleanly to SRT or SBV
SRT inline tags may be stripped by platforms even when a player displays them
Line break tokens used in one tool may need conversion to literal line breaks in another
Escaped entities may need normalization depending on target parser expectations
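As one possible way to flatten WebVTT cue text for SRT or SBV output, the sketch below turns voice spans into a "Speaker: " prefix, removes the remaining tags and internal timestamps, and decodes escaped entities. Converting voice spans to a prefix is just one convention, the tag pattern is a heuristic, and the name vtt_payload_to_plain is hypothetical.

```python
import html
import re

# <v Speaker> or <v.class Speaker>; the speaker name is captured.
_VOICE = re.compile(r"<v(?:\.[^\s>]*)?[ \t]+([^>]+)>")
# Any other start or end tag, including internal timestamps such as <00:00:05.000>.
_TAG = re.compile(r"</?[A-Za-z0-9][^<>]*>")


def vtt_payload_to_plain(payload: str) -> str:
    """Flatten WebVTT cue text to plain text suitable for SRT or SBV."""
    labelled = _VOICE.sub(lambda m: f"{m.group(1).strip()}: ", payload)
    without_tags = _TAG.sub("", labelled)
    # Decode entities last so an escaped "&lt;i&gt;" stays literal text.
    return html.unescape(without_tags)


print(vtt_payload_to_plain("<v Alice>Hi &amp; <i>welcome</i></v>"))  # Alice: Hi & welcome
```

Class, language, and ruby information is simply discarded here, which is the kind of loss a converter should report.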

Encoding normalization is not optional in serious pipelines

If your subtitle conversion tool is intended for multilingual production use, normalize text to UTF-8 on output. This is especially important when importing SRT from legacy sources and when targeting platforms that require plain UTF-8 uploads.

Recommended internal representation for converters

A robust converter should parse all subtitle formats into a single internal model before rendering. A practical internal cue model usually includes:

start and end times in integer milliseconds
payload text as a list of lines
optional cue identifier
optional layout settings map
warning list for parser repairs

This approach makes it easier to handle malformed input and produce deterministic normalized output.
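The cue model described above could be expressed as a small Python dataclass. This is a sketch of one possible shape, with hypothetical field names that mirror the list; it is not a reference schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cue:
    """Format-neutral cue used between parsing and rendering."""
    start_ms: int                                   # integer milliseconds
    end_ms: int
    lines: list[str] = field(default_factory=list)  # payload text, one entry per line
    identifier: Optional[str] = None                # SRT number or WebVTT cue identifier
    settings: dict[str, str] = field(default_factory=dict)  # e.g. {"align": "start"}
    warnings: list[str] = field(default_factory=list)       # notes about parser repairs

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


cue = Cue(1500, 4000, ["Hello, world!"])
cue.warnings.append("dot fraction accepted in SRT input")
print(cue.duration_ms, cue.warnings)
```

Using default_factory gives every cue its own list and dictionary, so a repair note added to one cue never appears on another.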

Validation and normalization strategies for production subtitle pipelines

Parse loosely, render strictly

This is the most reliable design strategy for subtitle tooling. Accept common malformed inputs to reduce failure rates during import, but always emit a stricter normalized output format.

Suggested normalization rules for SRT output

Renumber cues sequentially starting at 1
Render timestamps as HH:MM:SS,mmm
Use exactly one timing arrow form: --> with a single space on each side
Use one blank line between cues
Normalize line endings to LF or CRLF consistently
Output UTF-8 text
Strip unsupported advanced tags if target platform ignores them
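These rules translate into a short strict renderer. The sketch below assumes cues arrive as (start_ms, end_ms, lines) tuples; the names render_srt and _srt_timestamp are hypothetical, and tag stripping is left out because it depends on the target platform.

```python
def _srt_timestamp(ms: int) -> str:
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def render_srt(cues, newline: str = "\n") -> str:
    """Render (start_ms, end_ms, lines) tuples as normalized SRT text."""
    blocks = []
    for number, (start, end, lines) in enumerate(cues, start=1):
        # Drop empty payload lines: a blank line would end the cue early.
        text = [line.strip() for line in lines if line.strip()]
        timing = f"{_srt_timestamp(start)} --> {_srt_timestamp(end)}"
        blocks.append(newline.join([str(number), timing, *text]))
    return (newline * 2).join(blocks) + newline


srt_text = render_srt([(1500, 4000, ["Hello, world!"]), (5000, 8500, ["This is a subtitle."])])
print(srt_text)
```

When writing the result to disk, opening the file with encoding="utf-8" and newline="" keeps Python from translating the chosen line endings.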

Suggested normalization rules for WebVTT output

Add WEBVTT header
Render timestamps with dot fractions
Validate cue settings and drop invalid or duplicate entries
Separate blocks clearly with blank lines
Preserve cue identifiers only when useful
Decide whether STYLE and REGION blocks are kept or omitted based on target player support
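A matching sketch for WebVTT output, assuming cues arrive as (start_ms, end_ms, lines, settings) tuples with already validated settings. The names render_vtt and _vtt_timestamp are hypothetical, and the sketch omits identifiers, STYLE, and REGION blocks entirely.

```python
def _vtt_timestamp(ms: int) -> str:
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def render_vtt(cues) -> str:
    """Render (start_ms, end_ms, lines, settings) tuples as WebVTT text."""
    blocks = ["WEBVTT"]  # required header, followed by a blank line
    for start, end, lines, settings in cues:
        timing = f"{_vtt_timestamp(start)} --> {_vtt_timestamp(end)}"
        if settings:
            timing += " " + " ".join(f"{name}:{value}" for name, value in settings.items())
        # Empty payload lines are dropped because a blank line ends the cue.
        text = [line for line in lines if line.strip()]
        blocks.append("\n".join([timing, *text]))
    return "\n\n".join(blocks) + "\n"


vtt_text = render_vtt([
    (22230, 24606, ["Hello from a WebVTT cue."], {"align": "start", "line": "90%"}),
])
print(vtt_text)
```

Payload lines that contain the arrow token would need escaping or rejection before rendering, which this sketch does not attempt.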

Suggested normalization rules for SBV output

Use one timing line per cue with start,end
Use dot fractions in timestamps
No cue numbering
No header
Keep payload plain and simple
Separate cues with a single blank line

Error classes worth reporting in logs

Malformed timestamp
End time before start time
Cue overlap created or found
Missing blank line recovered heuristically
Invalid encoding fallback used
Dropped unsupported VTT settings during SRT or SBV export
Removed unsupported markup

SRT vs WebVTT vs SBV comparison matrix for technical users

Category	SRT	WebVTT (VTT)	SBV
Primary use	General subtitle interchange	Web media text tracks	YouTube style simple subtitles
Formal standard	No single modern canonical spec in common use	W3C standard	No widely used modern standards body spec
Header required	No	Yes	No
Cue numbers	Yes (canonical)	No (optional cue identifier)	No
Milliseconds separator	Comma	Dot	Dot (typical)
Layout settings	Not standardized	Yes (cue settings)	No
Comments / metadata blocks	No standardized block types	Yes (NOTE, STYLE, REGION, more structured semantics)	No
Encoding certainty	Historically variable in practice	UTF-8 oriented standard usage	Platform expectations often plain UTF-8 in modern usage
Typical compatibility	Highest across players/editors	Best on web and modern track consumers	Narrower, often converted first

Recommendations by workflow (engineering and content operations)

For website video playback with HTML track elements

Use WebVTT as the delivery format. Keep a normalized SRT export for editing and fallback. If your authoring source is SRT, convert to WebVTT late in the pipeline and validate in the target browsers.

For video editing and cross platform subtitle exchange

Use SRT as the main exchange format unless you specifically need WebVTT semantics. Enforce UTF-8, canonical timestamps, and normalized line endings in your pipeline.

For YouTube subtitle import/export workflows

Be prepared to handle SBV and SRT. If subtitles need to move into NLEs, archives, or other platforms, convert SBV to normalized SRT first. Keep the original SBV as source evidence if timing provenance matters.
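The SBV to SRT step can be sketched end to end in a few lines. The example assumes clean SBV input with one timing line at the top of each block and blank lines between cues; the names sbv_to_srt and _srt_ts are hypothetical, and blocks that do not start with a timing line are skipped rather than repaired.

```python
import re

_SBV_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})"
)


def _srt_ts(hours, minutes, seconds, millis):
    # Pad the hour field to two digits and switch the fraction separator to a comma.
    return f"{int(hours):02d}:{minutes}:{seconds},{millis}"


def sbv_to_srt(sbv_text: str) -> str:
    """Convert simple SBV text to numbered SRT text."""
    normalized = sbv_text.replace("\r\n", "\n").replace("\r", "\n").strip()
    blocks = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        match = _SBV_TIMING.fullmatch(lines[0].strip())
        if match is None:
            continue  # skipped silently here; a real tool should log a warning
        groups = match.groups()
        timing = f"{_srt_ts(*groups[:4])} --> {_srt_ts(*groups[4:])}"
        blocks.append("\n".join([str(len(blocks) + 1), timing, *lines[1:]]))
    return "\n\n".join(blocks) + "\n"


sbv = (
    "0:00:00.599,0:00:04.160\n>> ALICE: Hi, my name is Alice Miller and this is John Brown\n\n"
    "0:00:04.160,0:00:06.770\n>> JOHN: and we're the owners of Miller Bakery.\n"
)
print(sbv_to_srt(sbv))
```

Because the timestamp digits are copied rather than recomputed, the timing values in the SRT output match the SBV source exactly.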

For subtitle conversion software developers

Implement:

format detection based on header and timing heuristics
loose parser with explicit warning reporting
strict renderer for deterministic outputs
UTF-8 output normalization
feature loss reporting when converting from VTT to SRT or SBV
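The first item, format detection, could start from a sketch like the one below: check for the WEBVTT header, then look for an SBV style timing line, then for an arrow style timing line. The function name and the returned labels are hypothetical, and the heuristics will misclassify unusual files, so the result is best treated as a hint.

```python
import re

# Arrow timing with hours, comma or dot fraction (covers canonical and lenient SRT).
_SRT_TIMING = re.compile(
    r"\d+:\d{2}:\d{2}[,.]\d{1,3}\s*-->\s*\d+:\d{2}:\d{2}[,.]\d{1,3}"
)
# A whole line of the form start,end with dot fractions.
_SBV_TIMING = re.compile(
    r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}\s*$", re.MULTILINE
)


def detect_subtitle_format(text: str) -> str:
    """Return "vtt", "sbv", "srt", or "unknown" based on header and timing heuristics."""
    body = text.lstrip("\ufeff")  # ignore a leading byte order mark
    first_line = body.split("\n", 1)[0].rstrip("\r")
    if first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t")):
        return "vtt"
    if _SBV_TIMING.search(body):
        return "sbv"
    if _SRT_TIMING.search(body):
        return "srt"
    return "unknown"


print(detect_subtitle_format("1\n00:00:01,500 --> 00:00:04,000\nHello, world!\n"))  # srt
```

Detecting from content rather than from the file extension matters because, as noted earlier, files labeled .srt are not always SRT.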

Technical FAQ: SRT, VTT, and SBV subtitle formats

Is SRT formally standardized like WebVTT?

Not in the same way. SRT is widely used and well understood, but practical interoperability relies heavily on de facto conventions and parser tolerance. WebVTT has a formal W3C specification.

Why does SRT use commas while VTT uses dots for milliseconds?

This is a format syntax difference with historical roots. It is one of the most common causes of failed imports during naive subtitle conversion. Converters should explicitly normalize fractional separators.

Can I safely convert WebVTT to SRT without data loss?

Only if the WebVTT file uses basic cues without advanced settings, regions, or semantic cue text features. Otherwise, timing and text may convert, but layout and semantic information may be lost.

Is SBV obsolete?

SBV is not the best universal format, but it is still relevant because YouTube related workflows and legacy subtitle exports can produce it. It remains important in conversion and ingestion pipelines.

What should a technical team archive?

Archive the original source subtitle file plus at least one normalized interchange copy, typically UTF-8 SRT. If web playback is a primary target, also archive a validated WebVTT derivative.

Conclusion

SRT, WebVTT, and SBV all solve the same basic problem of timed text, but they do so with different assumptions. SRT wins on universal compatibility, WebVTT wins on standards based web features, and SBV remains useful as a simple YouTube centered source format. For robust subtitle engineering, treat subtitle conversion as a parsing and normalization problem, not just a timestamp string replacement task.