Automatic retrieval

CaptionPal uses your video's filename to find and download the right subtitle. No more browsing the web.

Always in sync

CaptionPal uses a Machine Learning model to detect human speech. It synchronizes the subtitle by finding the delay and framerate that give the best match between audio and subtitle.

How it works

CaptionPal integrates a Machine Learning model capable of detecting human speech inside an audio track, with an accuracy of ~82%. The final synchronization is nonetheless highly accurate, because individual detection errors are compensated over the length of the video.
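As a rough illustration, the detection step could look like the sketch below: the audio is split into short frames and a trained classifier labels each frame as speech or non-speech. The frame size, the `speech_mask` helper, and the `model.predict` call are assumptions made for illustration, not CaptionPal's actual code.

```python
# Minimal sketch of frame-level speech detection, assuming a trained binary
# classifier (speech vs. non-speech) is available. Names and frame size are
# illustrative assumptions.
import numpy as np

FRAME_SECONDS = 0.5  # classify audio in half-second windows (assumption)

def speech_mask(samples: np.ndarray, sample_rate: int, model) -> np.ndarray:
    """Return a boolean array with one entry per frame, True where speech is detected."""
    frame_len = int(FRAME_SECONDS * sample_rate)
    n_frames = len(samples) // frame_len
    mask = np.zeros(n_frames, dtype=bool)
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        # `model.predict` is assumed to return the probability that the frame
        # contains human speech; each frame is classified correctly ~82% of the time.
        mask[i] = model.predict(frame) >= 0.5
    return mask
```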

The model was trained on approximately 3 hours of English audio from two television series, with the dataset balanced between speech and non-speech segments.

Thanks to this model, CaptionPal can estimate, for any video, when human speech is present and when it is not. Although the model was trained on English audio, it is likely to work in other languages, assuming that human speech shares similar acoustic characteristics regardless of language. This remains to be verified.

Synchronization is then done by aligning the detected speech with the subtitle. This is performed using a quick brute-force search to find the best combination of subtitle delay and framerate.
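A minimal sketch of what such a brute-force search could look like, assuming the detected speech has been reduced to a per-frame boolean mask (as in the earlier sketch) and the subtitle has been parsed into a list of (start, end) cue times in seconds. The candidate delay range and framerate ratios below are illustrative guesses, not CaptionPal's actual values.

```python
# Minimal sketch of the brute-force alignment search. `mask` is the per-frame
# speech mask, `cues` is a list of (start, end) subtitle times in seconds.
import numpy as np

FRAME_SECONDS = 0.5  # must match the speech-detection frame size

def cue_mask(cues, delay, fps_ratio, n_frames):
    """Rasterize subtitle cues, shifted by `delay` and scaled by `fps_ratio`,
    onto the same time grid as the speech mask."""
    out = np.zeros(n_frames, dtype=bool)
    for start, end in cues:
        a = int((start * fps_ratio + delay) / FRAME_SECONDS)
        b = int((end * fps_ratio + delay) / FRAME_SECONDS) + 1
        out[max(a, 0):min(b, n_frames)] = True
    return out

def best_alignment(mask, cues):
    """Try every (delay, framerate ratio) pair and keep the one whose cue mask
    agrees with the detected speech on the largest number of frames."""
    delays = np.arange(-30.0, 30.0, 0.1)                        # seconds (assumed range)
    ratios = [1.0, 25 / 23.976, 23.976 / 25, 25 / 24, 24 / 25]  # common framerate mismatches
    best, best_score = (0.0, 1.0), -1
    for ratio in ratios:
        for delay in delays:
            score = np.sum(cue_mask(cues, delay, ratio, len(mask)) == mask)
            if score > best_score:
                best_score, best = score, (delay, ratio)
    return best  # (delay in seconds, framerate ratio) to apply to the subtitle
```

Because the score is accumulated over the whole video, scattered misclassifications from the speech detector have little effect on which (delay, framerate) candidate wins.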