Get the perfect subtitle, in a single click. Thanks, Machine Learning.

version 0.12.0

Always in sync

CaptionPal uses a Machine Learning model to detect human speech. It synchronizes the subtitle by finding the delay and framerate that give the best match between audio and subtitle.

Automatic retrieval

CaptionPal uses your video's filename to find and download the right subtitle. No more browsing the web.


This application is free and entirely open-source, released under the GNU GPL v3.

How it works

CaptionPal integrates a Machine Learning model capable of detecting human speech inside an audio track (with an accuracy of ~ 82%). Final synchronization is nonetheless highly accurate, because individual detection errors average out over the length of the video.

The model has been trained with approximately 3 hours of English audio from two television series. The dataset is properly balanced between speech and non-speech sequences.

Thanks to this model, CaptionPal knows approximately when there is human speech in a video and when there is not. Although the model was trained on English audio, it is likely to work in other languages as well, assuming that human speech shares similar characteristics regardless of the language. This remains to be verified.
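As a rough illustration of what "knowing when there is speech" can look like in practice (this is an assumed sketch, not CaptionPal's actual code): the model classifies short, fixed-length audio windows as speech or non-speech, and consecutive speech-positive windows are then merged into time intervals.

```python
# Illustrative sketch only; the window length and function names are
# hypothetical, not taken from CaptionPal's codebase.

WINDOW = 0.5  # assumed window length in seconds


def predictions_to_intervals(preds, window=WINDOW):
    """Merge consecutive speech-positive windows into (start, end) intervals."""
    intervals, start = [], None
    for i, is_speech in enumerate(preds):
        if is_speech and start is None:
            start = i * window          # speech begins
        elif not is_speech and start is not None:
            intervals.append((start, i * window))  # speech ends
            start = None
    if start is not None:               # speech runs to the end of the track
        intervals.append((start, len(preds) * window))
    return intervals


# 0 = silence, 1 = speech, one flag per half-second window
preds = [0, 1, 1, 1, 0, 0, 1, 1, 0]
print(predictions_to_intervals(preds))  # [(0.5, 2.0), (3.0, 4.0)]
```

The resulting intervals are what the alignment step can compare against the subtitle's display times.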

Synchronization is then done by aligning the detected speech with the subtitle.
This is performed using a quick brute-force search to find the best combination of subtitle delay and framerate.
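The brute-force search described above can be sketched as follows. This is a minimal, assumed implementation (interval representation, parameter grids, and function names are all illustrative): score every candidate (delay, framerate ratio) pair by how much the shifted subtitle intervals overlap the detected speech intervals, and keep the best.

```python
# Hypothetical sketch of the alignment step; not CaptionPal's actual API.

def overlap(a, b):
    """Total overlap in seconds between two sorted lists of (start, end) intervals."""
    total, i, j = 0.0, 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
        if a[i][1] < b[j][1]:  # advance the interval that ends first
            i += 1
        else:
            j += 1
    return total


def best_alignment(speech, subs, delays, ratios):
    """Try every (delay, framerate ratio) pair; keep the best-scoring one."""
    best = (0.0, 0.0, 1.0)  # (score, delay, ratio)
    for ratio in ratios:
        for delay in delays:
            shifted = [(s * ratio + delay, e * ratio + delay) for s, e in subs]
            score = overlap(speech, shifted)
            if score > best[0]:
                best = (score, delay, ratio)
    return best


# Toy example: the subtitle lags the detected speech by 2 seconds.
speech = [(10.0, 12.0), (20.0, 23.0)]
subs = [(8.0, 10.0), (18.0, 21.0)]
score, delay, ratio = best_alignment(
    speech, subs,
    delays=[d / 10 for d in range(-50, 51)],  # -5 s .. +5 s in 0.1 s steps
    ratios=[1.0, 24 / 23.976, 25 / 23.976],   # common framerate conversions
)
```

Because the search space is small (a coarse grid of delays times a handful of framerate ratios), an exhaustive scan like this stays fast even for feature-length videos.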


Several improvements are planned on the roadmap.


This application was inspired by a few sources; credit where credit is due.

Show your love


This application would not be possible without the constant hard work of the people who write the subtitles.
CaptionPal fetches TV-series subtitles from an external provider; feel free to consider making a donation to them.
If you want to make a donation to CaptionPal, consider donating to a charity instead, or star the project on GitLab.
Or, if you can code, contributions are welcome!