Caption Overview

Taku Semba
3 min read · Mar 10, 2020

Captions tell viewers what is happening in a video when they cannot hear the audio or do not understand the spoken language. Many streaming services, such as Netflix, YouTube, and Hulu, support captions to improve accessibility.

Open Captions vs Closed Captions

There are two main ways to support captions: open captions and closed captions.

Open Captions

Open captions, also known as burned-in, baked, burnt, or hard-coded captions, are rendered as part of the video frames, which means they cannot be turned on or off. While they offer no flexibility, they are very easy to use because you do not need to keep and maintain additional caption files.

Closed Captions

Closed captions, on the other hand, keep the caption data separate from the video, which gives viewers the flexibility to switch captions on and off and to select captions for a specific language. Because of this flexibility, closed captions are more widely used than open captions. However, unlike open captions, closed captions can be delivered in multiple formats, and a media player has to support a given format to display the captions.

Options for Closed Captions

As described above, there are many ways to carry caption data, and a media player has to support the caption format in use to display closed captions. In this post, I will use ExoPlayer and cover the main caption formats it supports.

You can check which caption formats ExoPlayer supports below.

TTML

TTML (Timed Text Markup Language) is an XML-based caption format developed by the World Wide Web Consortium (W3C). It supports many features, such as positioning, alignment, and styling.
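
To make the XML structure concrete, here is a minimal sketch in Python that reads cues from a tiny TTML document using only the standard library. The sample document and the function name read_ttml_cues are made up for illustration, and the sketch ignores styling, regions, and the other timing syntaxes that TTML allows. It is not how ExoPlayer parses TTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-cue TTML document, written for this example only.
SAMPLE_TTML = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.000">Hello, world.</p>
      <p begin="00:00:04.000" end="00:00:06.500">This is a caption.</p>
    </div>
  </body>
</tt>
"""

# TTML elements live in this XML namespace.
TTML_NS = {"tt": "http://www.w3.org/ns/ttml"}


def read_ttml_cues(ttml_text):
    """Return a list of (begin, end, text) tuples, one per <p> element."""
    root = ET.fromstring(ttml_text)
    cues = []
    for p in root.iterfind(".//tt:p", TTML_NS):
        # itertext() also collects text inside nested elements such as <span>.
        text = "".join(p.itertext()).strip()
        cues.append((p.get("begin"), p.get("end"), text))
    return cues


for begin, end, text in read_ttml_cues(SAMPLE_TTML):
    print(begin, end, text)
```

Each caption is a p element whose begin and end attributes say when the text should be on screen.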

Once you have the TTML file, you can show the captions using ExoPlayer.

If you are interested in how and where ExoPlayer parses a TTML file, you can check

WebVTT

WebVTT (Web Video Text Tracks Format) was also developed by the W3C, but it is simpler than TTML.
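
To show how simple the format is, here is a minimal sketch in Python that reads cues from a tiny WebVTT file. The sample text and the names to_seconds and read_webvtt_cues are made up for illustration. The sketch assumes "\n" line endings and blocks separated by a single empty line, and it skips cue settings, so treat it as a rough illustration rather than a conforming parser.

```python
# Hypothetical two-cue WebVTT file, written for this example only.
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:03.000
Hello, world.

00:00:04.000 --> 00:00:06.500
This is a caption.
"""


def to_seconds(timestamp):
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    parts = timestamp.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def read_webvtt_cues(vtt_text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    blocks = vtt_text.strip().split("\n\n")
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")
    cues = []
    for block in blocks[1:]:
        lines = block.splitlines()
        # A cue may start with an optional identifier line, so look for
        # the line that holds the '-->' timing arrow.
        timing_index = next(
            (i for i, line in enumerate(lines) if "-->" in line), None
        )
        if timing_index is None:
            continue  # e.g. a NOTE or STYLE block
        start, _, rest = lines[timing_index].partition("-->")
        end = rest.split()[0]  # drop any cue settings after the end time
        text = "\n".join(lines[timing_index + 1:])
        cues.append((to_seconds(start.strip()), to_seconds(end), text))
    return cues


for cue in read_webvtt_cues(SAMPLE_VTT):
    print(cue)
```

Compared with TTML, there is no markup to walk through: a header line, then blocks made of a timing line followed by the caption text.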

Once you have the WebVTT file, you can show the captions using ExoPlayer.

If you are interested in how and where ExoPlayer parses a WebVTT file, you can check

TTML/WebVTT into Box

If you use MP4 file-based streaming, you can put the caption data into a box. ISO/IEC 14496-30 specifies how the caption data is carried in a box.
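
As a rough illustration of what "a box" means, the sketch below walks ISO BMFF boxes (a 4-byte big-endian size followed by a 4-character type) and pulls the cue text out of a WebVTT sample. It assumes, to the best of my understanding of ISO/IEC 14496-30, that a WebVTT cue is stored as a vttc box containing a payl box with the text. The helper names make_box, iter_boxes, and cue_texts and the sample bytes are made up for this example.

```python
import struct


def make_box(box_type, payload):
    """Build a box: 4-byte big-endian size, 4-byte type, then the payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_boxes(data):
    """Yield (type, payload) for each box found back to back in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_size = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                raise ValueError("truncated box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_size = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < header_size or offset + size > len(data):
            raise ValueError("malformed box")
        yield box_type.decode("latin-1"), data[offset + header_size:offset + size]
        offset += size


def cue_texts(sample):
    """Collect the text of every payl box nested inside a vttc box."""
    texts = []
    for box_type, payload in iter_boxes(sample):
        if box_type == "vttc":
            for inner_type, inner in iter_boxes(payload):
                if inner_type == "payl":
                    texts.append(inner.decode("utf-8"))
    return texts


# Hypothetical sample holding a single cue.
sample = make_box(b"vttc", make_box(b"payl", b"Hello, world."))
print(cue_texts(sample))
```

In a real file these samples sit inside the media data, and the track is described elsewhere in the file, so a player has to walk several levels of boxes before it reaches the caption text.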

CEA-608/CEA-708

CEA-608 was developed first, for broadcast television, and CEA-708 was introduced later to support more languages, appearances, and positioning options. To use CEA-608/708 in ExoPlayer, the caption data has to be embedded in SEI messages in the fMP4 video stream. In other words, instead of being fetched as a separate caption file, the caption data is read from the SEI NAL units.
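
To give a feel for where that data lives, here is a minimal sketch in Python that picks the SEI NAL units out of one video sample. It assumes an H.264 stream, where the NAL unit type is the low 5 bits of the first byte and type 6 means SEI, and it assumes each NAL unit is prefixed with a 4-byte big-endian length, which is common in MP4 but is actually declared in the track's configuration. The function names and the sample bytes are made up for illustration, and decoding the caption bytes inside the SEI payload is not shown.

```python
H264_NAL_TYPE_SEI = 6  # nal_unit_type value for SEI in H.264


def iter_nal_units(sample, length_size=4):
    """Yield each length-prefixed NAL unit in an MP4 video sample."""
    offset = 0
    while offset + length_size <= len(sample):
        nal_length = int.from_bytes(sample[offset:offset + length_size], "big")
        offset += length_size
        nal = sample[offset:offset + nal_length]
        if nal_length == 0 or len(nal) < nal_length:
            raise ValueError("truncated or empty NAL unit")
        yield nal
        offset += nal_length


def sei_nal_units(sample, length_size=4):
    """Return only the NAL units whose type is SEI."""
    return [
        nal
        for nal in iter_nal_units(sample, length_size)
        if nal[0] & 0x1F == H264_NAL_TYPE_SEI
    ]


def with_length_prefix(nal, length_size=4):
    """Helper for the example: prepend the big-endian length."""
    return len(nal).to_bytes(length_size, "big") + nal


# Hypothetical sample: one SEI NAL unit (first byte 0x06) followed by one
# slice NAL unit (first byte 0x65). The payload bytes are placeholders,
# not real caption data.
sample = with_length_prefix(b"\x06\xaa\xbb") + with_length_prefix(b"\x65\x01\x02\x03")
print(sei_nal_units(sample))
```

A player then has to look inside each SEI message for the caption bytes and feed them to a CEA-608 or CEA-708 decoder.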

You can play a media file with CEA-608 embedded from this link.

Conclusion

In this post, I explained the differences between open captions and closed captions and looked at some of the closed caption formats: TTML, WebVTT, and CEA-608/CEA-708. I hope this helps someone understand how captions work.