
CatBus

User Group
Members
Join date
18-Aug-2011
Last activity
20-Dec-2025
Posts
5,985

Post History

Post
#700082
Topic
Harmy's STAR WARS Despecialized Edition HD - V2.7 - MKV (Released)

Harmy said:

The question is what actually happens when it gets converted to 720p60? My guess is that it simply shows some frames 3 times and some 2 times, which is something that happens on a 60Hz monitor or TV anyway, and on a 120Hz or higher display the frames are simply going to be shown that many more times, right?

Precisely.  And it's honestly not that big of a deal for the most part, except you notice the lack of smooth scrolling in the credits, etc.
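
For anyone who wants to see the cadence spelled out, here is a rough Perl sketch. It assumes round numbers (24 source frames per second shown at 60 or 120 per second; the real 23.976/59.94 rates have the same ratio), and the function name repeat_counts is just something made up for this illustration.

```perl
use strict;
use warnings;

# Return how many times each source frame is shown when $src_fps
# material is displayed at $dst_fps, over one second.
sub repeat_counts {
  my ($src_fps, $dst_fps) = @_;
  my @counts = (0) x $src_fps;
  for my $i (0 .. $dst_fps - 1) {
    # Each output frame shows the most recent source frame at that instant.
    my $src = int($i * $src_fps / $dst_fps);
    $counts[$src]++;
  }
  return @counts;
}

my @c60 = repeat_counts(24, 60);
print "24 -> 60:  @c60[0..5] ...\n";
my @c120 = repeat_counts(24, 120);
print "24 -> 120: @c120[0..5] ...\n";
```

At 60 the counts alternate 3 2 3 2, which is the uneven cadence that shows up as judder in scrolling credits. At 120 every frame is held exactly 5 times, because 120 is a whole multiple of 24.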

Post
#698303
Topic
Project Threepio (Star Wars OOT subtitles)

Gather 'round, kiddos! It's story-time with CatBus.

So a conversation with Sadako started me thinking about all of the interesting and unexpected things a person learns when diving into a new language (such as: "jedi" means "to eat" in Croatian, which leads to some translation issues), and I thought I'd share one of the ones that I thought was neat.

So, first off, we have this unverified Simplified Mandarin fansub.  I suspect it's actually very good because I could tell whoever did it was very thorough and loved Star Wars, but I just can't say for certain if the Chinese was very good.

One of the interesting things about these Chinese subtitles is how they incorporate foreign words, such as Jawa.  There are, as you probably know, thousands of Chinese characters, and quite a lot of them can be pronounced as some variation on "ja" or "wa".  So a translator finds two characters that make those sounds, and if they're good, they choose two characters that actually can describe the thing in question.

"Wa" is easy. The character for "baby" is pronounced "wa", Jawas are small people and kinda cute, so that's that. "Ja" on the other hand... well, this translator chose "claw". So Jawas are "claw babies"...

...which is pretty accurate actually, but it somehow makes me think about an alternate version of Star Wars made by David Cronenberg, where claw babies would fit right in.  Also, it makes some lines a little funny, like Luke asking, "Why would the Empire want to slaughter claw babies?" Gee, I dunno, Luke.  Self-defense? Because they're an abomination? To kill them before they grow and multiply? I can think of plenty of reasons.

Anyway, that's just a language tidbit I thought was interesting and a little funny. There are plenty of others, I'm sure, if I think about it.

Post
#698015
Topic
Project Threepio (Star Wars OOT subtitles)

BDSup2Sub, Perl, and ImageMagick--I'm a one-trick pony in that respect.  BDSup2Sub to extract individual image files, Perl to script everything, and ImageMagick to handle the image compositing.

What I've got only really works with white subtitles with a black border.  Yellow subtitles and such would require different code. Also if your subs aren't 720p, you may need to do some resizing. Can't guarantee it's bug-free, but even if there end up being problems, it still saves a lot of time.

Perl code below:

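#!/usr/bin/perl
# assemble.pl: stack subtitle images (white text, black border) onto
# white A4-at-300dpi "pages" that can be fed to OCR software.
# Needs ImageMagick's convert, identify and composite on the PATH.
use strict;
use warnings;

if (!@ARGV) {
  print "Usage: perl assemble.pl <filename...>\n";
  exit 1;
}

# Build the file list, expanding wildcards ourselves for shells
# that do not expand them.
my @filelist;
for my $sourcefile (@ARGV) {
  if (-e $sourcefile) {
    push(@filelist, $sourcefile);
  } else {
    my @tmplist = glob($sourcefile);
    die "Error: Could not find $sourcefile\n" unless @tmplist;
    push(@filelist, @tmplist);
  }
}

# Page geometry in pixels (A4 at 300dpi). Margins are rounded down to
# whole pixels so the -geometry offsets are always integers.
my $pagenum = 1;
my $pagewidth = 2480;
my $pageheight = 3508;
my $tmpfile = "tmp1.png";
my $leftmargin = int($pagewidth / 20);
my $topmargin = int($pageheight / 20);
my $bottommargin = int($pageheight / 18);
my $linegap = 5;
my $currentmargin = $topmargin;
my $currentpage;
my $newpage = 1;

for my $sourcefile (@filelist) {
  if ($newpage) {
    # Always start from a blank page, even if a page file with this
    # name is left over from an earlier run.
    $currentpage = sprintf("page%02d.png", $pagenum);
    run("convert", "-size", "${pagewidth}x${pageheight}", "xc:white", $currentpage);
    $currentmargin = $topmargin;
    $newpage = 0;
  }
  # Invert white-on-black subtitle text to black-on-white.
  run("convert", $sourcefile, "-negate", $tmpfile);
  # Read the image size so the next line can be placed below this one.
  my $dims = `identify -format %wx%h $tmpfile` // "";
  my ($imagewidth, $imageheight) = $dims =~ /(\d+)x(\d+)/
    or die "Error: Could not read dimensions of $sourcefile\n";
  # Paste the line onto the page at the current vertical position.
  run("composite", "-compose", "atop", "-geometry",
      "+$leftmargin+$currentmargin", $tmpfile, $currentpage, "_$currentpage");
  unlink($tmpfile);
  rename("_$currentpage", $currentpage)
    or die "Error: Could not update $currentpage: $!\n";
  $currentmargin += $imageheight + $linegap;
  # Once the bottom margin is reached, the next image starts a new page.
  if ($currentmargin >= $pageheight - $bottommargin) {
    $pagenum++;
    $newpage = 1;
  }
}
print "Done.\n";

# Run an external command without going through the shell (so file
# names with spaces are safe) and stop if it fails.
sub run {
  system(@_) == 0 or die "Error: '@_' failed\n";
}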

Post
#697676
Topic
Project Threepio (Star Wars OOT subtitles)

CatBus said:

Feallan said:

I tried OCRing Thai with their trial, it was pretty legit. About half of the lines had a mistake or two, but it could be done.

Damn, their demo can do up to 100 pages during the trial period. I bet I could combine the 4000 or so images of individual lines of Thai text into less than that. Or at least one film at ~1200 images.

This is completely doable. I'll be combining the subtitles into pseudo-documents, simulating a page of A4 paper scanned at 300dpi with font sizes and margins within the normal range.  Each film should get around 30-ish "pages" of subtitles per language, which can then be fed into FineReader to produce actual text!
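
The arithmetic behind those numbers, as a small Perl sketch. A4 is 210mm x 297mm and an inch is 25.4mm. The figure of 40 lines per page is only an assumption that falls out of roughly 1200 images spread over roughly 30 pages; the helper names mm_to_px and pages_needed are made up for this example.

```perl
use strict;
use warnings;

# Pixel size of a paper dimension given in millimetres at a given dpi.
sub mm_to_px {
  my ($mm, $dpi) = @_;
  return int($mm / 25.4 * $dpi + 0.5);   # round to the nearest pixel
}

# Number of pages needed for $images lines at $per_page lines per page.
sub pages_needed {
  my ($images, $per_page) = @_;
  return int(($images + $per_page - 1) / $per_page);   # ceiling division
}

printf "A4 at 300dpi: %dx%d pixels\n", mm_to_px(210, 300), mm_to_px(297, 300);
printf "1200 lines at 40 per page: %d pages\n", pages_needed(1200, 40);
```

This gives 2480x3508 pixels for the page and 30 pages for one film, comfortably inside a 100-page trial limit.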

Then, of course, there will be a lengthy process of manual correction and moving the subs back into SRT format.
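
A minimal sketch of that last step, writing corrected text back out as SRT. It assumes the start and end time of each subtitle image (in milliseconds) has been kept alongside the OCR'd text; how those timings are obtained is outside this sketch. The function names, timings and text below are all placeholders.

```perl
use strict;
use warnings;

# Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
sub srt_time {
  my ($ms) = @_;
  my $h = int($ms / 3600000);
  my $m = int($ms / 60000) % 60;
  my $s = int($ms / 1000) % 60;
  return sprintf("%02d:%02d:%02d,%03d", $h, $m, $s, $ms % 1000);
}

# Build SRT text from a list of [start_ms, end_ms, text] entries.
sub to_srt {
  my @cues = @_;
  my $out = "";
  my $n = 1;
  for my $cue (@cues) {
    my ($start, $end, $text) = @$cue;
    # Each entry: sequence number, time range, text, blank line.
    $out .= sprintf("%d\n%s --> %s\n%s\n\n",
                    $n++, srt_time($start), srt_time($end), $text);
  }
  return $out;
}

# Hypothetical timings and placeholder text, for illustration only.
print to_srt(
  [ 61500,   64250,   "First corrected line" ],
  [ 3725004, 3727000, "Second corrected line" ],
);
```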

I'll start with Thai, but will then create these "pages" for Cantonese, Mandarin/Traditional, and, if it might help Sadako, Japanese.  I'm honestly not sure if the Chinese ones will end up being used, but considering Cantonese has no text equivalent at all, maybe I can include it as a convenience.

Post
#697627
Topic
Project Threepio (Star Wars OOT subtitles)

The only real benefits of text-derived subs are that: 1) they can look better, 2) they can be easily modified and corrected, and 3) they can be used with old-school MKV players that don't support BD-SUP files. That's a short list of nice-to-haves, but none of them are critical.

So if there's even a chance of the odd OCR mistake, I'd rather stick with the graphical subs, because they are actually still pretty decent, after all. That's why Thai OCR is the only one I'm willing to go for, even if the Chinese OCR looks okay to me.

Post
#697575
Topic
Project Threepio (Star Wars OOT subtitles)

I suppose I should add this regarding OCR--while I wouldn't trust any OCR without manual correction, and I am unqualified to do this manual correction for Japanese and Chinese text, I do feel confident that I could manage it (albeit very slowly) for Thai, because it uses an alphabet and diacritics with a much more manageable number of permutations.  I've even managed to transcribe a few lines myself entirely by hand before it got way too frustrating (my process for combining diacritics was pretty clunky).
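
To illustrate what "an alphabet and diacritics" means at the text level, here is a small Perl sketch. In Unicode, the Thai vowels and tone marks written above or below a consonant are stored as separate combining marks after the base letter, so a transcription has to get both the marks and their order right. The describe function is made up for this example and is not part of the project's tools.

```perl
use strict;
use warnings;

# Label each code point in a string as a base character or a
# combining mark (Unicode general category Mn, nonspacing mark).
sub describe {
  my ($text) = @_;
  my @parts;
  for my $ch (split //, $text) {
    my $kind = $ch =~ /\p{Mn}/ ? "mark" : "base";
    push @parts, sprintf("U+%04X %s", ord($ch), $kind);
  }
  return @parts;
}

# One Thai syllable: consonant THO THAHAN, vowel SARA II above it,
# then tone mark MAI EK above that.
my $syllable = "\x{0E17}\x{0E35}\x{0E48}";
print "$_\n" for describe($syllable);
```

The syllable takes up one column on screen but is three code points: one base letter followed by two marks.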

So if anyone's sitting on a copy of FineReader, I'm sure we could make some use of it for this project.

Post
#697294
Topic
Project Threepio (Star Wars OOT subtitles)

Yeah, that was a bit of an oversimplification--everything that exists requires some fairly significant post-correction--and unlike OCR'ing, say, Swedish--you actually have to be fairly familiar with the language to do the manual correction properly on Japanese (or Chinese). Plus, application Unicode support for furigana is pretty much shit, so you'll have to rework those bits anyway. And as you said, Japanese DVD subtitles are like a lo-fi 8-bit approximation of a character shape, and even good OCR software may fail with it.

In my estimation, for text as short as this, the manual correction would take just as long as the manual transcription. Whether that's true or not, I don't know, but that's what we're doing--and the results are good, and we're 1/3 of the way through!

Post
#697208
Topic
Project Threepio (Star Wars OOT subtitles)

Project files have been updated to version 7.0 (original post has been updated as well). This release includes a huge number of changes to the underlying utilities and procedures for creating subtitles, and a moderate number of other changes, specified below. Please PM me for the temporary download links until the files are available in a more permanent location.

Rough summary of text changes:
- Added Ukrainian subtitles (verified, thanks to lexsanor)
- Japanese subtitles converted from graphical-only to text and graphical (Star Wars only for now, huge thanks to Sadako--OCR doesn't really work for Japanese, so this work was all manual!!)
- Further improvements to Polish subs (thanks to Feallan)
- American Spanish subtitles promoted from unverified to verified (thanks to Leoj)
- Russian subtitles promoted from unverified to verified (thanks to lexsanor)
- Indonesian subtitles are now available in both text and graphical form for the entire trilogy, verified. They haven't all been shortened as effectively as the ones for Star Wars, so there may still be some improvements to come for these
- Improvement to Croatian subtitles for Empire, but I don't yet consider these to be verified (thanks to Feallan's international connections)
- Minor improvement to English subtitles for Empire, including the addition of text-only subtitles for the 16mm mono mix

Rough summary of behind-the-scenes changes:
- Every text-based subtitle has been re-rendered to correct an issue where a few pixels of semitransparent drop shadow were shaved off in earlier versions (this issue was barely visible; otherwise the new subtitles should look identical). The drop shadow was removed from SDH subtitles, where the black background made a drop shadow redundant
- Non-SDH DVD-resolution subtitles were re-rendered with a slightly thicker black border (easier to read on a standard definition CRT)
- Project Threepio now includes 1080p SUP files (upscaled from 720p).  I'm really hoping these might be useful to someone someday ;)
- Further improvements to cross-platform support (still requires Windows for the initial render of text subtitles)
- This project is finally weaned off an old, outdated version of BDSup2Sub. The upside is no longer having to work around its bugs; the downside is that completely different command syntax is required for batch processing on Windows and non-Windows platforms, so the instructions have become more convoluted
- The Perl scripts have been moved out of the project's root folder to avoid confusing people
- Many more processes have been fully automated (for example, adjusting subtitle positions), which I hope will reduce errors and improve quality, and also save my wrists from certain ruin

I'm sure I've forgotten something, but those are the big changes.

Post
#697054
Topic
Harmy's STAR WARS Despecialized Edition HD - V2.7 - MKV (Released)

michaelkirschner said:

CatBus said:

Harmy-- when you're about ready to start authoring the BD, let me know. I've got another major release of Project Threepio in the works.

Have you ever considered doing subs for alternate audio tracks like maybe the commentary tracks?

Very briefly, but no. I have a hard enough time preventing my project from sprawling totally completely out of control--I have decided against doing subtitles for commentary tracks and supplemental features. I barely accommodate alternate audio mixes as it is (I primarily do original multichannel mixes, but provide additional source materials for original mono mixes, but don't do the 85 or 93 audio revisions, for example). I have to draw a line somewhere, for my own sanity.

Project Threepio is designed as more of a subtitling toolkit, however, so people interested in making subs may find some value in using some of my tools, etc, if not the actual subtitles.