Updated: 15 March 2019. This tutorial shows how easy it is to create caption/subtitle files in WebVTT, SRT and DFXP format for video and audio. Depending on the media player you use, you can even provide subtitles in different languages for the same media. Subtitle files are plain text files that you can write in Notepad (Win) or SimpleText (Mac).

Don’t use a text editor that permits formatting (changing the appearance of text), because the files will contain hidden information that prevents the subtitles from appearing in your video/audio. The file has to be saved in the UTF-8 format, which both text editors mentioned above provide.

For Mac users: although you can format text in SimpleText, don’t do it!
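
If you want to double-check a saved file, the following is a minimal Python sketch, not part of the original workflow. It reads the file and reports two problems: formatted text saved as RTF (such files start with `{\rtf`) and bytes that are not valid UTF-8. The helper name `find_encoding_problems` is made up for this illustration.

```python
from pathlib import Path


def find_encoding_problems(path):
    """Return a list of human-readable problems found in a subtitle file."""
    data = Path(path).read_bytes()
    problems = []
    # Formatted documents saved as RTF begin with this signature.
    if data.startswith(b"{\\rtf"):
        problems.append("file is RTF (formatted text), not plain text")
    # A subtitle file must decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")
    return problems
```

An empty list means neither problem was found. It does not guarantee that a given player accepts the file.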

How do we start?

The ideal would be to start from a script in which the spoken text is already written. Then you just need to extract the spoken text, place it in your plain text editor and afterward adapt it to the format of your choice.

However, many people create videos in which they simply talk in a natural way about a subject. In this case, you will have to play your video or audio and write down everything that is said, preferably dividing it into short sentences, because you want to avoid many lines of text. In this first pass, you do not need to worry about the timing; we cover that further down.

Here is an example of text in its initial state:

Lost Corners consists of charcoal paintings with pastel on paper and canvas.

The series shows landmarks, places and objects which we are so used to that we do not really see them anymore.

We would only notice if they would disappear, when it is too late.

Most of the charcoal paintings have a desolate atmosphere, enhanced by the limited range of color, consisting mostly of tinted grays, with the deep black of charcoal.

While showing rather inactive scenery with no humans in sight, there is nevertheless a suggestion of life.

Somehow these places look as if action can start any minute.

Soon a car might drive by…

a crane starts working…

a railway bridge may tumble… and click into place.

However, some of the artwork do contain people

Etc…

When you have created a list with spoken text like that, it is time to get the time for each line of text.

Timing and text

It will be beneficial to study how subtitling is implemented on DVDs. In this way, you get a feel for how to cut your text into small chunks that can be read quickly as the video or audio progresses. Writing a subtitle file is easy, but getting the timing right requires trial and error. After all, subtitling is a profession in its own right. TV stations and movie producers hire these professionals to do the work, so you can best learn from them by studying actual cases.

For instance, line 4 of my example above is rather long. It requires 3 lines of text superimposed on the video. Not only does it take time to read, it also occupies a lot of video space.

Ideally, it would be better to cut this up into five parts, like this:

Most of the charcoal paintings have

a desolate atmosphere,

enhanced by the limited range of color,

consisting mostly of tinted grays,

with the deep black of charcoal.

But this is not always possible. Whether you have time to cut the text up depends on the speed of the speaker. After all, you cannot replace a line of text with another before the viewer has had a chance to read it. In practice it is often a trade-off between the length of a line of text and the time you can leave it on screen: the longer the line, the longer it has to remain visible.

According to DCMP, lines shouldn’t go beyond 32 characters, but in practice this is often impossible without shortening the text, i.e. making it different from what is actually said. There is in principle nothing wrong with that, as long as you don’t change the sense of what is being said. Quite a few people take the liberty of summarizing long phrases, which makes the subtitles more agreeable to follow.
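
To see how a sentence breaks down under that limit, here is a small Python sketch using the standard `textwrap` module. The 32-character width is the DCMP figure quoted above. The helper names `wrap_caption` and `too_long` are invented for this example. It only wraps at word boundaries and knows nothing about natural pauses in speech, so treat the result as a starting point.

```python
import textwrap

MAX_CHARS = 32  # the DCMP line limit quoted in the text


def wrap_caption(text, width=MAX_CHARS):
    """Split a sentence into lines of at most `width` characters."""
    return textwrap.wrap(text, width=width)


def too_long(lines, width=MAX_CHARS):
    """Return the lines that exceed the limit, so you can shorten them by hand."""
    return [line for line in lines if len(line) > width]


if __name__ == "__main__":
    sentence = ("Most of the charcoal paintings have a desolate atmosphere, "
                "enhanced by the limited range of color.")
    for line in wrap_caption(sentence):
        print(len(line), line)
```

You would still regroup the wrapped lines into captions of one to three lines and adjust the breaks where the speaker pauses.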

When you have cut up your text into readable chunks, it is time to write down the times. Any subtitling format, whether it is WebVTT, SRT or DFXP, requires two settings:

The time a text line (caption) needs to appear

The time it ends showing up

To find out what needs to show up and when, play your video or audio and watch the clock in the control bar of your video player. Most video players have this feature. If you do not have a video player like that, you may want to download QuickTime from Apple. It is available for Windows and Mac.

Play your media and jot down the times at which each text has to appear and disappear. You should end up with something like this:

00:14 – 00:20.5

Lost Corners consists of charcoal paintings

with pastel on paper and canvas.

00:21.2 – 00:26.5

The series shows landmarks, places and

objects which we are so used to

00:26.5 – 00:27.5

that we do not really see them anymore.

00:27.8 – 00:31

We would only notice if they would disappear,

when it is too late.

Note that you can divide seconds into milliseconds. How this is implemented in the subtitle file depends on the format you select. Some formats allow for milliseconds, others for 1/10 of a second. Below you will find how to translate your list into the three formats, using the example above:

WebVTT example

This format allows for time notation in hours, minutes, seconds and milliseconds, written like this:

00:00:00.000 where the 3 zeros at the end are the milliseconds.

WEBVTT

00:00:14.000 --> 00:00:20.500
Lost Corners consists of charcoal paintings
with pastel on paper and canvas.

00:00:21.000 --> 00:00:26.500
The series shows landmarks, places and objects
which we are so used to

00:00:26.600 --> 00:00:27.500
that we do not really see them anymore.

00:00:27.800 --> 00:00:31.250
We would only notice if they would disappear,
when it is too late.

When you are finished, save the file with a .vtt extension, like mycaption.vtt
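
If you would rather not retype every time by hand, the conversion from jotted notes to WebVTT can be sketched in Python. This is only an illustration under some assumptions: times are noted as `MM:SS` or `HH:MM:SS` with an optional fraction, and cues are given as (start, end, text) tuples. The function names `to_timestamp` and `build_webvtt` are made up for this example.

```python
def to_timestamp(note, sep="."):
    """Convert a jotted time such as '00:21.2' to 'HH:MM:SS.mmm'."""
    parts = [float(p) for p in note.split(":")]
    while len(parts) < 3:          # allow MM:SS notes by assuming 0 hours
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    total_ms = round((hours * 3600 + minutes * 60 + seconds) * 1000)
    h, rest = divmod(total_ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def build_webvtt(cues):
    """Build the text of a WebVTT file from (start, end, text) tuples."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        timing = f"{to_timestamp(start)} --> {to_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Cues are separated by one blank line; no blank lines inside a cue.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    notes = [
        ("00:14", "00:20.5",
         "Lost Corners consists of charcoal paintings\nwith pastel on paper and canvas."),
        ("00:27.8", "00:31",
         "We would only notice if they would disappear,\nwhen it is too late."),
    ]
    print(build_webvtt(notes))
```

Write the returned text to a file such as mycaption.vtt, encoded as UTF-8.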

SRT example

Basically the same setup as for WebVTT, except that there is no WEBVTT header line, you need to add a number before each caption, and the milliseconds are separated by a comma instead of a dot:

1
00:00:14,000 --> 00:00:20,500
Lost Corners consists of charcoal paintings
with pastel on paper and canvas.

2
00:00:21,000 --> 00:00:26,500
The series shows landmarks, places and objects
which we are so used to

3
00:00:26,600 --> 00:00:27,500
that we do not really see them anymore.

4
00:00:27,800 --> 00:00:31,250
We would only notice if they would disappear,
when it is too late.

When you are finished, save the file with a .srt extension, like mycaption.srt
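
Since the two formats are so close, an existing WebVTT file can be turned into SRT mechanically. The Python sketch below is a rough illustration that assumes a simple file like the one in this tutorial: a WEBVTT header, cues separated by blank lines, and no cue identifiers or positioning settings. The function name `vtt_to_srt` is invented for this example.

```python
import re

# A WebVTT timing line with full HH:MM:SS.mmm timestamps.
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})"
)


def vtt_to_srt(vtt_text):
    """Convert simple WebVTT text to SRT text."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", vtt_text.strip())]
    # Keep only blocks that contain a timing line; this drops the header.
    cues = [b for b in blocks if TIMING.search(b)]
    out = []
    for number, cue in enumerate(cues, start=1):
        # SRT uses a comma before the milliseconds and numbers each caption.
        srt_cue = TIMING.sub(r"\1,\2 --> \3,\4", cue, count=1)
        out.append(f"{number}\n{srt_cue}")
    return "\n\n".join(out) + "\n"
```

Files with extra WebVTT features would need more careful handling than this sketch provides.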

DFXP example

This format is more complicated than WebVTT and SRT. It requires the type declaration of the file, but you can simply copy the example below and adapt it.

This format allows for time notation in hours, minutes, seconds and 1/10 of a second, written like this:

<p begin="00:14.1" end="00:20.5"> where the single digit after the dot represents tenths of a second.

<tt xmlns="http://www.w3.org/2006/10/ttaf1">
<body>
<div>
<p begin="00:14" end="00:20.5">Lost Corners consists of charcoal paintings with pastel on paper and canvas.</p>
<p begin="00:21.2" end="00:27.5">The series shows landmarks, places and objects which we are so used to that we do not really see them anymore.</p>
<p begin="00:27.8" end="00:31.3">We would only notice if they would disappear, when it is too late.</p>
<p begin="00:35" end="00:45">Most of the charcoal paintings have a desolate atmosphere, enhanced by</p>
</div>
</body>
</tt>

When you are finished, save the file with a .dfxp extension, like mycaption.dfxp. You may also use mycaption.xml, since this is basically an XML file.
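
Because DFXP is XML, characters such as & and < in the spoken text must be escaped, which is easy to forget when typing by hand. One way around that, sketched here in Python with the standard `xml.etree.ElementTree` module, is to generate the file from a list of cues. The structure and namespace are copied from the example above. The function name `build_dfxp` and the (begin, end, text) tuple layout are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/2006/10/ttaf1"  # namespace from the example above


def build_dfxp(cues):
    """Build DFXP text from (begin, end, text) tuples."""
    # Register the namespace as the default one so tags are written without a prefix.
    ET.register_namespace("", TT_NS)
    tt = ET.Element(f"{{{TT_NS}}}tt")
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")
    for begin, end, text in cues:
        p = ET.SubElement(div, f"{{{TT_NS}}}p", {"begin": begin, "end": end})
        p.text = text  # special characters are escaped when serialized
    return ET.tostring(tt, encoding="unicode")


if __name__ == "__main__":
    print(build_dfxp([
        ("00:14", "00:20.5",
         "Lost Corners consists of charcoal paintings with pastel on paper and canvas."),
        ("00:27.8", "00:31.3",
         "We would only notice if they would disappear, when it is too late."),
    ]))
```

The output is written on a single line. That is still valid XML, though less pleasant to edit by hand afterward.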



Test your captions/subtitles

It will take some tweaking to get those times right, so you need to test. If you do not have a media player for your site yet, you may want to download FlowPlayer or JW Player. For audio, it is best to try JW Player, as it allows setting the height of the player to leave room for captions. See this tutorial on how to use JW Player:

Embedding an audio with poster image, watermark and subtitles, using JW Player 5.10

NOTE: From JW Player 7.4 onward, you need to use the .VTT format because of the iPad. SRT gives unexpected results in full screen.

An alternative, if you have a video, is YouTube. When you upload your video to YouTube, you have the chance to add a subtitle file. That way you can check whether the times are correct or not. If it needs tweaking, and trust me, it will, adapt the file and test again until you have it right.

Be patient. In the beginning it takes a lot of time to get used to how it works, but after a while you get a feeling for timing and then it becomes easier.
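
Some timing mistakes can be caught before you even load the file into a player. The Python sketch below scans WebVTT or SRT text for timing lines and flags captions that end before they start or that begin before the previous one has ended. It is a rough check under the assumption that timestamps use the full HH:MM:SS form shown in this tutorial. The function name `find_timing_problems` is made up, and the check cannot tell you whether a caption matches the speech.

```python
import re

# Matches WebVTT (dot) and SRT (comma) timing lines.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[.,](\d{3}) --> (\d+):(\d{2}):(\d{2})[.,](\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert timestamp fields (as strings) to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def find_timing_problems(subtitle_text):
    """Return a list of messages describing suspicious cue timings."""
    problems = []
    previous_end = 0
    for number, match in enumerate(TIMING.finditer(subtitle_text), start=1):
        fields = match.groups()
        start, end = to_ms(*fields[:4]), to_ms(*fields[4:])
        if end <= start:
            problems.append(f"cue {number}: ends before it starts")
        if start < previous_end:
            problems.append(f"cue {number}: starts before the previous cue ends")
        previous_end = max(previous_end, end)
    return problems
```

An empty list only means the times are consistent with each other. Whether they match what is being said still has to be checked in a player.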

See also How to fix subtitle problems in foreign languages to troubleshoot typical display errors.