
Post #697676

Author
CatBus
Parent topic
Project Threepio (Star Wars OOT subtitles)
Link to post in topic
https://originaltrilogy.com/post/id/697676/action/topic#697676
Date created
27-Mar-2014, 6:26 PM

CatBus said:

Feallan said:

I tried OCRing Thai with their trial, it was pretty legit. About half of the lines had a mistake or two, but it could be done.

Damn, their demo can do up to 100 pages during the trial period. I bet I could combine the 4,000 or so images of individual lines of Thai text into fewer pages than that, or at least fit one film (~1,200 images) within the limit.

This is completely doable. I'll be combining the subtitle images into pseudo-documents, each simulating a page of A4 paper scanned at 300 dpi, with font sizes and margins in the normal range. Each film should come out to around 30 "pages" of subtitles per language, which can then be fed into FineReader to produce actual text!
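A minimal sketch of the page-layout step, assuming each subtitle line has already been extracted as its own image and the only question is where each one goes on a page. A4 at 300 dpi works out to 2480 x 3508 pixels; the 25 mm margin, 20 px line gap and 60 px line height used here are illustrative assumptions, and `plan_pages` is a hypothetical helper, not part of FineReader or any other tool. Actually pasting the images onto a blank page (for example with Pillow's `Image.paste`) is left out.

```python
from dataclasses import dataclass

DPI = 300
MM_PER_INCH = 25.4


def mm_to_px(mm: float, dpi: int = DPI) -> int:
    """Convert a physical length in millimetres to pixels at the given dpi."""
    return round(mm / MM_PER_INCH * dpi)


PAGE_W = mm_to_px(210)  # A4 width: 2480 px at 300 dpi
PAGE_H = mm_to_px(297)  # A4 height: 3508 px at 300 dpi
MARGIN = mm_to_px(25)   # assumed ~1 inch margin (295 px)
LINE_GAP = 20           # assumed vertical gap between line images, in px


@dataclass
class Placement:
    page: int   # 0-based page number
    index: int  # which subtitle-line image this is
    x: int      # left edge on the page, in px
    y: int      # top edge on the page, in px


def plan_pages(line_sizes, margin=MARGIN, gap=LINE_GAP):
    """Greedily stack (width, height) line images top to bottom onto A4 pages.

    Order is preserved, so the Nth OCR'd line should map back to image N,
    assuming the OCR engine reads each page top to bottom.
    """
    usable_w = PAGE_W - 2 * margin
    usable_h = PAGE_H - 2 * margin
    placements = []
    page, y = 0, margin
    for i, (w, h) in enumerate(line_sizes):
        if w > usable_w or h > usable_h:
            raise ValueError(f"line image {i} ({w}x{h}) does not fit the printable area")
        if y + h > PAGE_H - margin:  # no room left on this page: start a new one
            page += 1
            y = margin
        placements.append(Placement(page, i, margin, y))
        y += h + gap
    return placements


if __name__ == "__main__":
    # Hypothetical film: 1,200 line images, each 1500 x 60 px.
    plan = plan_pages([(1500, 60)] * 1200)
    print(plan[-1].page + 1)  # 34 pages
```

With these assumed sizes, each page holds 36 lines and a 1,200-line film fills 34 pages, which sits near the ~30-page estimate and well under the trial's 100-page cap. Larger line images or bigger gaps push the page count up, so the margins and spacing are the knobs to tune.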

Then, of course, comes a lengthy process of manual correction and converting the subs back into SRT format.
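Once the OCR text is corrected, writing it back out as SRT is mostly bookkeeping. A minimal sketch, assuming each subtitle image still carries its original start and end times (in milliseconds) from the rip, and that the corrected text lines up one-to-one with those images in page order; `Cue`, `fmt_ts` and `to_srt` are hypothetical names. Each SRT cue is a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int  # assumed: start time carried over from the original subtitle image
    end_ms: int    # assumed: end time carried over from the original subtitle image
    text: str      # corrected OCR text for that image


def fmt_ts(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, e.g. 01:02:03,004."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Render cues as SRT text: numbered blocks separated by blank lines."""
    blocks = []
    for n, cue in enumerate(cues, start=1):
        blocks.append(f"{n}\n{fmt_ts(cue.start_ms)} --> {fmt_ts(cue.end_ms)}\n{cue.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([Cue(1000, 3500, "Hello"), Cue(4000, 6000, "World")]))
```

SRT counters start at 1, hence `enumerate(..., start=1)`. Multi-line subtitles would need their line images grouped back into one cue before this step.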

I'll start with Thai, then create these "pages" for Cantonese, Mandarin/Traditional, and, if it might help Sadako, Japanese. I'm honestly not sure whether the Chinese ones will end up being used, but since Cantonese has no existing text version at all, maybe I can include it as a convenience.