CatBus

User Group
Members
Join date
18-Aug-2011
Last activity
9-Jul-2025
Posts
5,997

Post History

Post
#697575
Topic
Project Threepio (Star Wars OOT subtitles)
Time

I suppose I should add this regarding OCR--while I wouldn't trust any OCR without manual correction, and I am unqualified to do this manual correction for Japanese and Chinese text, I do feel confident that I could manage it (albeit very slowly) for Thai, because it uses an alphabet and diacritics with a much more manageable number of permutations.  I've even managed to transcribe a few lines myself entirely by hand before it got way too frustrating (my process for combining diacritics was pretty clunky).

So if anyone's sitting on a copy of FineReader, I'm sure we could make some use of it for this project.
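
For anyone wondering why hand-transcribing Thai gets clunky: each above/below vowel and tone mark is its own Unicode code point typed after the base consonant, so one visible character can be several code points. Here is a minimal sketch in Python (my own illustration, not part of the project's tooling) that groups each base character with the marks that follow it. The `split_clusters` name is made up for this example, and it only handles nonspacing marks, not every Thai edge case.

```python
import unicodedata

def split_clusters(text):
    """Group each base character with the nonspacing marks that follow it."""
    clusters = []
    for ch in text:
        # Category "Mn" = nonspacing mark (Thai above/below vowels, tone marks).
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# KO KAI + SARA I + MAI THO: three code points, one visible character cell.
syllable = "\u0e01\u0e34\u0e49"
clusters = split_clusters(syllable)
print(len(syllable), "code points ->", len(clusters), "cluster")
```

Checking a transcription cluster by cluster like this is one way to spot a missing or doubled diacritic.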

Post
#697294
Topic
Project Threepio (Star Wars OOT subtitles)
Time

Yeah, that was a bit of an oversimplification. Every OCR tool that exists requires some fairly significant post-correction, and unlike OCR'ing, say, Swedish, you actually have to be fairly familiar with the language to do the manual correction properly on Japanese (or Chinese). Plus, application Unicode support for furigana is pretty much shit, so you'll have to rework those bits anyway. And as you said, Japanese DVD subtitles are like a lo-fi 8-bit approximation of a character shape, and even good OCR software may fail on them.

In my estimation, for text as short as this, the manual correction would take just as long as the manual transcription. Whether that's true or not, I don't know, but that's what we're doing--and the results are good, and we're 1/3 of the way through!

Post
#697208
Topic
Project Threepio (Star Wars OOT subtitles)
Time

Project files have been updated to version 7.0 (original post has been updated as well). This release includes a huge number of changes to the underlying utilities and procedures for creating subtitles, and a moderate number of other changes, specified below. Please PM me for the temporary download links until the files are available in a more permanent location.

Rough summary of text changes:
- Added Ukrainian subtitles (verified, thanks to lexsanor)
- Japanese subtitles converted from graphical-only to text and graphical (Star Wars only for now, huge thanks to Sadako--OCR doesn't really work for Japanese, so this work was all manual!!)
- Further improvements to Polish subs (thanks to Feallan)
- American Spanish subtitles promoted from unverified to verified (thanks to Leoj)
- Russian subtitles promoted from unverified to verified (thanks to lexsanor)
- Indonesian subtitles are now available in both text and graphical form for the entire trilogy, verified. They haven't all been shortened as effectively as the ones for Star Wars, so there may still be some improvements to come for these
- Improvement to Croatian subtitles for Empire, but I don't yet consider these to be verified (thanks to Feallan's international connections)
- Minor improvement to English subtitles for Empire, including the addition of text-only subtitles for the 16mm mono mix

Rough summary of behind-the-scenes changes:
- Every text-based subtitle has been re-rendered to correct an issue where a few pixels of semitransparent drop shadow were shaved off in earlier versions (this issue was barely visible; otherwise the new subtitles should look identical). The drop shadow was removed from SDH subtitles, where the black background made a drop shadow redundant
- Non-SDH DVD-resolution subtitles were re-rendered with a slightly thicker black border (easier to read on a standard definition CRT)
- Project Threepio now includes 1080p SUP files (upscaled from 720p).  I'm really hoping these might be useful to someone someday ;)
- Further improvements to cross-platform support (still requires Windows for the initial render of text subtitles)
- This project is finally weaned off an old, outdated version of BDSup2Sub. The upside is no longer having to work around its bugs; the downside is that completely different command syntax is required for batch processing on Windows and non-Windows platforms, so the instructions have become more convoluted
- The Perl scripts have been moved out of the project's root folder to avoid confusing people
- Many more processes have been fully automated (for example, adjusting subtitle positions), which I hope will reduce errors and improve quality, and also save my wrists from certain ruin

I'm sure I've forgotten something, but those are the big changes.
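
As a rough illustration of what the 720p-to-1080p upscale means for subtitle placement: every coordinate and dimension scales by 1080/720 = 1.5. The sketch below is a hypothetical Python helper, not one of the project's actual scripts, and the `scale_box` name and the sample box are made up for the example.

```python
from fractions import Fraction

def scale_box(x, y, w, h, src_height=720, dst_height=1080):
    """Scale a subtitle bounding box from one frame height to another."""
    factor = Fraction(dst_height, src_height)  # 3/2 for 720p -> 1080p
    return tuple(round(v * factor) for v in (x, y, w, h))

# A made-up 640x60 subtitle box near the bottom of a 1280x720 frame.
print(scale_box(320, 600, 640, 60))
```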

Post
#697054
Topic
Harmy's STAR WARS Despecialized Edition HD - V2.7 - MKV (Released)
Time

michaelkirschner said:

CatBus said:

Harmy-- when you're about ready to start authoring the BD, let me know. I've got another major release of Project Threepio in the works.

 Have you ever considered doing subs for alternate audio tracks like maybe the commentary tracks?

Very briefly, but no. I have a hard enough time preventing my project from sprawling completely out of control, so I have decided against doing subtitles for commentary tracks and supplemental features. I barely accommodate alternate audio mixes as it is (I primarily do the original multichannel mixes and provide additional source materials for the original mono mixes, but I don't do the '85 or '93 audio revisions, for example). I have to draw a line somewhere, for my own sanity.

Project Threepio is designed as more of a subtitling toolkit, however, so people interested in making subs may find some value in using some of my tools, etc, if not the actual subtitles.

Post
#695558
Topic
Project Threepio (Star Wars OOT subtitles)
Time

PM sent.

BTW, here's a report on the differences in the (16mm) mono mix for Empire:

- Threepio says: "Oh, this is suicide! There's nowhere to go!"

- Threepio says: "Oh, dear. What now? I don't like the look of this. If only you'd attached my legs..."

- Threepio says "Hello?" only once when looking for the R2 unit in Cloud City.

Lots of little mixing differences, occasionally making dialogue easier to hear, as was the case with the Star Wars mono mix. For example, when Lando says "So you see, since we're a small operation...", the "So" is much more clear in the mono mix.

I plan on including SRT files for the Empire mono mix in the next release, but no pre-rendered graphics, similar to what I've done for the Star Wars mono mix.

Post
#695494
Topic
Project Threepio (Star Wars OOT subtitles)
Time

FWIW, I'm going back and doing another thorough pass on English dialogue in Empire (mostly to see if there are any other mono mix differences), and I've actually managed to find a few minor issues in the regular non-mono subs.  The biggest one I've found is where Han says a line in a strange ungrammatical fashion, and your brain (or at least mine) tends to automatically correct what he says.

When Leia says "I happen to like nice men", Han actually responds "I'm nice men." The current version of Project Threepio has this as "I'm a nice man." He also says "hydrospanners" instead of "hydrospanner".  These are pretty clear when you listen for them.

Post
#695348
Topic
Project Threepio (Star Wars OOT subtitles)
Time

We've also been able to find that (to my surprise) our Arabic subtitles are pretty good, for the most part. They are still missing some lines and will stay in "unverified" until someone can do a more thorough correction, but that means the only subtitles I now even suspect of being a complete trainwreck are the Croatian subs for Empire.

Thanks yet again to Feallan and his many international connections!

Post
#694699
Topic
Lightscribe
Time

I use an Epson R320 (which I got used, so I don't know about pricing), and the print quality is great, but I can't say I recommend them because the ink clogs up so much that you waste half your ink cleaning the heads (a common Epson complaint).  I'd probably go for an HP for my next purchase, just to avoid the clogging issue.  It's a real printer though--there may be some sort of DVD-only printing device that's cheaper.

Post
#694594
Topic
Lightscribe
Time

Yeah, I am a huge fan of Lightscribe, but that technology is dead dead dead. The way to handle labeling from now on is to buy inkjet-printable discs and a compatible printer. Sorry, durable tech lost out to flashy fading colors.

More details: These days you can only get single-layer Lightscribe DVD media.  Dual-layer Lightscribe DVD media was available for a while, and I suppose if you're lucky you can find it used, but it's rare at best. Lightscribe BD-R never happened, and never will.

There was a Lightscribe edition of SureThing, which, combined with the basic Lightscribe interface software, is what I used to make discs for a long time. But inkjet-printable is the future, like it or not.

Do not under any circumstances use those peel-and-stick DVD labels. The glue can lose its stickiness when the disc gets warm, and that can lead to bad things.

Post
#694195
Topic
Info: Back to the Future - without DNR & EE
Time

Depends on your definition of absurd.  I think using BD50's for anything is absurd.

But I am an optical media fan, so I re-encoded the video down to 21GiB, and put it on a BD25. The source was so beautiful it even looks good after a re-encode.

EDIT: My opposition to BD50's is entirely because I'm a cheap bastard, not because I don't see the value in high bitrate video.
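
For a sense of scale, here is the back-of-the-envelope arithmetic for a 21GiB encode as a small Python sketch. The 116-minute runtime and the nominal 25 GB (25 x 10^9 bytes) single-layer capacity are assumptions I'm plugging in for illustration, and the result is the average for everything in the file (video, audio, and overhead), not the video bitrate alone.

```python
GIB = 2**30                      # bytes in one GiB
BD25_NOMINAL_BYTES = 25 * 10**9  # assumed nominal single-layer capacity

def average_bitrate_mbps(size_gib, runtime_minutes):
    """Average total bitrate in megabits per second for a file of this size."""
    size_bits = size_gib * GIB * 8
    return size_bits / (runtime_minutes * 60) / 1_000_000

def fits_on_bd25(size_gib):
    """True if the file is smaller than the assumed nominal BD25 capacity."""
    return size_gib * GIB < BD25_NOMINAL_BYTES

# Assumed runtime of 116 minutes.
print(f"{average_bitrate_mbps(21, 116):.1f} Mbps, fits: {fits_on_bd25(21)}")
```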

Post
#694053
Topic
Project Threepio (Star Wars OOT subtitles)
Time

Bulgarian subtitles have been promoted from unverified to verified.

Greek was found to be decent, but with spelling issues. I'm going to leave these marked as unverified until we can get those corrected.

Croatian was found to be decent for SW and ROTJ, but dreadful for ESB.  Again, I'm leaving these marked as unverified until we can get improvements.

No changes or new release (yet).

Thanks again to Feallan for the legwork on this.

Post
#693729
Topic
Project Threepio (Star Wars OOT subtitles)
Time

Turkish subtitles have been promoted from unverified to verified. No changes or new release (yet).

Thanks to Feallan for this verification, BTW. He's actively going out and finding native speakers who can verify if our remaining unverified subtitles are correct. Certainly beats my strategy of just waiting for someone who speaks one of these languages to come over to my house for dinner...

And FWIW: American Spanish, Russian, and Ukrainian will be verified in the next release, with updated translations.

Post
#693591
Topic
Info: Back to the Future - without DNR & EE
Time

zee944 said:

Jonno said:

zee944 said:

Does it mean it's a genuine remix?

 There's an oxymoron for you.

By genuine I mean it doesn't introduce new sound effects and places the old ones where they should be in the space, judging by the movie. So a good remix which is faithful to the movie's original intention. But I suppose it's clear how I meant it anyway.

Well, according to IMDB, there was a theatrical 6-channel mix. I suppose it's possible that this new mix is simply that 6-channel mix mapped to a 5.1 arrangement, and the reduction in dynamics is just the result of whatever home near-field mix logic they currently employ for their audio. But I don't think that's very likely--I suspect it's a whole new mix, that, AFAICT, doesn't muck up anything but the dynamics--and even there it's not horribly flat, just not as good as the Laserdisc, that's all.

Also interested in the true video source if this isn't DCP-sourced after all.

Post
#693586
Topic
Project Threepio (Star Wars OOT subtitles)
Time

This is an interesting discovery. Well, interesting if you're into subtitle arcana, I guess. So, subtitles for HD formats are 8-bit images with full alpha transparency. Subtitles for DVD can have 3 colors and 1 fully transparent color, and no alpha channel. This is why subtitles for DVD look like, well, crap.

So in the process of upscaling my 720p subtitles to 1080p, I've been doing some processing to make them look better (just sharpening, really), but the result is always 8 bits with full transparency so I figure I'm good, right?

Except I'm not. Some software reads the subtitles based on the images just fine, but some other software simply doesn't like some of the images. It turns out that if I look at my known good (monochrome) subtitles, they always use the exact same palette of 125 colors, including transparency. If I remap one of my bad images to this new palette, the compatibility problem goes away (and with 124 shades of gray available, it doesn't look any different at all). Could this be the subtitle equivalent of a web-safe palette? (also, the palette may very well be larger than 125 colors if you include non-grayscale colors)
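
The remapping step can be sketched roughly like this in Python. To be clear about the assumptions: the real known-good palette values aren't listed here, so `make_gray_palette` just builds a hypothetical set of 124 evenly spaced grays, transparency is modeled as a single `None` entry, and pixels are plain gray levels rather than the color-plus-alpha entries a real SUP file uses. It only shows the nearest-entry idea, not the actual tool I used.

```python
def make_gray_palette(levels=124):
    """Hypothetical palette: evenly spaced grays from 0 to 255."""
    return [round(i * 255 / (levels - 1)) for i in range(levels)]

def nearest(value, palette):
    """Return the palette entry closest to a gray value."""
    return min(palette, key=lambda p: abs(p - value))

def remap(rows, palette):
    """Snap every gray pixel to the palette; None (transparent) stays None."""
    return [
        [None if px is None else nearest(px, palette) for px in row]
        for row in rows
    ]

palette = make_gray_palette()
# A tiny made-up image: transparent edges around an anti-aliased glyph edge.
image = [[None, 17, 130, 254, None]]
print(remap(image, palette))
```

With 124 grays the worst-case error per pixel is a level or two out of 255, which is why the remapped image doesn't look any different.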

Who knows, but at least I've solved this weird bug. But it also exposes that, as far as I can tell, HD subtitles can't necessarily use their full 8 bits of color: not even quite 7 bits, really, and only using very specific values at that. But as it turns out, that's still quite enough for subtitles IMO.
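
The "not even quite 7 bits" remark is just a base-2 logarithm: 7 bits would give 128 values, and 125 falls slightly short of that. A quick Python check (the `bits_for` helper is only for illustration):

```python
import math

def bits_for(colors):
    """Return (exact log2, whole bits needed to index) for a palette size."""
    exact = math.log2(colors)
    return exact, math.ceil(exact)

exact, whole = bits_for(125)
print(f"125 colors = {exact:.2f} bits of information, indexed with {whole} bits")
print(f"share of a full 8-bit palette used: {125 / 2**8:.0%}")
```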