The vast majority of Chinese video content is subtitled, but most of those subtitles are hard subtitles, meaning the Chinese characters are burned into the video frames. The alternative is soft (digital) subtitles, which are much more useful for studying because they can be copied and pasted from a transcript and they work with popup dictionaries such as the Zhongwen Chrome extension. If any of that is unfamiliar, you may want to check out this blog post about must-have tools for studying Chinese on YouTube.
Now that we’ve covered the basics, in this post I’m going to discuss one way to convert hard subtitles into soft subtitles. I should say up front that this method is definitely not bulletproof, and it presents certain challenges that I’ll discuss. This post is more of a case study of my own experience trying to do this than a “How To” for a fully fledged methodology. So let’s get right to it!
If you’ve read the post I mentioned above, you’ll be familiar with the CopyFish Chrome extension. This extension performs Optical Character Recognition (OCR) on screenshots you take of Chinese subtitles. You define the area of the screen where the subtitles appear, and you can then easily recapture the subtitles from that same area as the video plays. However, you have to click the button manually for every subtitle you want to convert. My goal was to turn this manual process into an automated one that converts all the subtitles for a video ahead of time, so that you have a transcript to review while you watch it later. I also thought I could upload the transcript into LingQ so you can read along with the video and LingQ words as you watch. I discuss this in the blog post I mentioned as well.
For the automation, I’m using a handy application called Auto Mouse Click. It’s free and works on both Windows and Mac. It lets you automate repetitive actions on your computer and set up a list of actions to loop either a specified number of times or for a maximum length of time. These actions include clicking, moving the mouse, copying, pasting and a lot more.
For this project, I had Chrome, Auto Mouse Click, and an Excel spreadsheet all open on my screen. I won’t go into a ton of detail here, but I worked out every action required to convert a subtitle, paste it into Excel, and then restart the loop, ready to receive the next output. I had to pause the video briefly after each conversion because the OCR takes longer to process than the time between subtitles. I also found that I could record the playback timestamp for each subtitle by right-clicking the video and selecting “Copy Video URL at Current Time”, and I experimented with pasting this into my Excel spreadsheet as well. I imagine this could eventually be used to generate an actual SRT subtitle file as an output of the automation.
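To make the SRT idea a bit more concrete, here is a minimal Python sketch of what that last step might look like. It assumes you have exported the spreadsheet as a list of (URL, subtitle text) pairs in playback order, and that the copied URL carries the position as a `t` query parameter in whole seconds (e.g. `?t=83`); the function names, the `rows` format and the 3-second duration for the final subtitle are all hypothetical choices of mine, not part of the workflow above. Each subtitle simply ends when the next one starts.

```python
from urllib.parse import urlparse, parse_qs


def seconds_from_url(url: str) -> int:
    """Read the playback position from a copied video URL.

    Assumes the position is stored as ?t=<seconds> (optionally with a
    trailing "s", e.g. t=83s); returns 0 if there is no t parameter.
    """
    query = parse_qs(urlparse(url).query)
    value = query.get("t", ["0"])[0]
    return int(value.rstrip("s"))


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def build_srt(rows, last_duration: float = 3.0) -> str:
    """Build SRT text from (url, text) pairs in playback order.

    Each cue ends where the next cue starts; the final cue lasts
    last_duration seconds (a hypothetical default).
    """
    starts = [seconds_from_url(url) for url, _ in rows]
    blocks = []
    for i, (start, (_, text)) in enumerate(zip(starts, rows)):
        end = starts[i + 1] if i + 1 < len(starts) else start + last_duration
        blocks.append(f"{i + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    rows = [
        ("https://youtu.be/abc123?t=5", "你好"),
        ("https://youtu.be/abc123?t=8", "欢迎"),
    ]
    print(build_srt(rows))
```

Because the copied timestamps are only accurate to the second, the resulting subtitle timing would be approximate, but probably good enough for reading along.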
The advantage of using an Excel spreadsheet instead of a Word document is that I inevitably ended up with a lot of duplicate subtitles whenever the automation captured the screen before the video had moved on to the next subtitle. After the automation finished, I could simply select all the rows of data and click “Remove Duplicates” to clean them up.
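One thing to keep in mind is that Excel’s “Remove Duplicates” removes every repeat in the selection, so a short line that genuinely appears twice in a video (say, “好的”) would also disappear. As an alternative, here is a small Python sketch, under my own assumption that the captured lines are in playback order, that only drops a line when it repeats the line directly before it, which is what the timing problem actually produces. It also skips blank captures. The function name is hypothetical.

```python
def drop_consecutive_duplicates(lines):
    """Keep a line only if it differs from the previous kept line.

    Leading/trailing whitespace is stripped and blank lines are skipped.
    Non-adjacent repeats (a line legitimately said twice) are preserved.
    """
    result = []
    for line in lines:
        text = line.strip()
        if text and (not result or text != result[-1]):
            result.append(text)
    return result


if __name__ == "__main__":
    captured = ["你好", "你好", "好的", "谢谢", "好的", "好的"]
    print(drop_consecutive_duplicates(captured))
```

Here the second “好的” survives because something else was said in between, whereas a select-all “Remove Duplicates” would keep only the first one.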
I’d recommend checking out the video version of this post so you can see what the automation looks like while it’s running. I set the loop’s maximum run time to three times the video’s runtime, to allow for the pause I had built in after each subtitle.
In the future, I’d like to find a way to share the Auto Mouse Click file with you so you can try this yourself. But it’s a bit tricky to explain how to use the file, so I don’t want to put in the effort if there isn’t enough interest. So if you really want the file, go to the video at the top of this post and hit the Like button, because I will share the file and a tutorial for using it if that video gets more than 200 likes.
As for the challenges I mentioned earlier, the OCR output you get this way is not perfect, and it can be thrown off if the background is too similar in color to the subtitles. Sometimes the output is wrong even when the background has good contrast. YouTube will also occasionally place an ad popup on the screen exactly where the subtitles are. This is easy to miss if you walk away while the automation runs on a long video: you would come back to your computer and find that all the output after the ad appeared was wrong, because these popups don’t go away until you close them. Some videos also have mid-roll ads, which might throw off your transcript if the ad has subtitles as well. But despite all these challenges, I think this technique can definitely help with studying, and I’m sure I can improve the method with better error checking in the future.
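As one possible starting point for that error checking, here is a rough Python heuristic I’m sketching purely as an illustration; it is not something I’ve run against real CopyFish output. It assumes a valid subtitle is mostly Chinese characters (it only counts the basic CJK Unified Ideographs range, U+4E00 to U+9FFF), flags lines where that share falls below a hypothetical 50% threshold, and reports where a run of several flagged lines begins, which is roughly what you’d expect to see once an ad popup covers the subtitle area. The names and thresholds are my own assumptions.

```python
import re

# Basic CJK Unified Ideographs block only; rarer extension blocks are ignored.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def looks_suspicious(text: str, min_cjk_ratio: float = 0.5) -> bool:
    """Flag OCR output that doesn't look like a Chinese subtitle.

    A line is suspicious if it is empty (ignoring whitespace) or if fewer
    than min_cjk_ratio of its non-whitespace characters are Chinese.
    """
    stripped = re.sub(r"\s", "", text)
    if not stripped:
        return True
    return len(CJK_CHAR.findall(stripped)) / len(stripped) < min_cjk_ratio


def first_bad_run(lines, run_length: int = 5):
    """Return the index where run_length suspicious lines in a row begin,
    or None if no such run exists (e.g. a popup blocking the subtitles)."""
    run = 0
    for i, line in enumerate(lines):
        run = run + 1 if looks_suspicious(line) else 0
        if run >= run_length:
            return i - run_length + 1
    return None


if __name__ == "__main__":
    output = ["你好，世界", "我们开始吧"] + ["Skip Ads >"] * 5
    print(first_bad_run(output))
```

A check like this wouldn’t catch a line where the OCR returned the wrong Chinese characters, but it could at least tell you where a long unattended run went off the rails.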
Good luck using this method yourself. If you’ve used other ways to tackle this problem, we’d love to hear about it. Leave a comment below!