Project files have been updated to version 4.0 (original post has been updated as well). The new download links are NOT the same as before, so please PM me for the links.
Rough summary of changes:
Added six new languages: Simplified Chinese, Russian, Arabic, Turkish, Dutch, and Czech. All scripts and utilities updated to better support Unicode and RTL languages. All SRT files changed to UTF-8 encoding. Minor text/typo changes in many languages, including English, German, and Italian.
This release marks the biggest change yet in Project Threepio. Now with improved Unicode support, it can theoretically accommodate any language. Also, after some hand-wringing, I decided that although I still prefer official or otherwise verified subs, I was okay using fan-created subtitles if there were no alternatives. Dutch is the only language in this group based on the official subs--all others are from essentially unverified Internet sources.
Chinese, Russian, Arabic, and Turkish subs were hand-resynced from SE subtitles, so they've had the extra dialogue removed, but altered dialogue may still be present. I used Google Translate to more-or-less verify where I was in each film, which yielded occasionally hilarious results (my favorite: "The enemy is in your behind!"). There are also dropped lines here and there, but nothing too critical.
I went back to the English subs and changed a few subs that were needlessly taking up two lines to one-liners (e.g. "I find your lack of faith disturbing."). Since my subs are CIH-safe, covering less vertical area is important. I also fixed some minor errors in ESB and ROTJ that nobody had caught yet. Also, ROTJ subs for all languages were adjusted to better match video derived from PAL sources (Harmy and DJ/U2).
A word of warning for those who use my SRT files directly: First, they are now UTF-8 encoded, and that may cause problems with software that expects an older encoding. If so, it's pretty trivial to change encodings with something like Notepad++. Second, watch out for Arabic. If your software doesn't handle RTL languages correctly, the text can get badly scrambled, with punctuation moving to the wrong side of the line, etc. I had to do a lot of work to get this working with my software. Last, use a real Unicode font for Chinese. If you try to render it with a typical font, the renderer will find maybe half of the characters in that font, then fall back to another font for the other half, and the two will look terribly inconsistent. I used Arial Unicode and thought the results looked pretty nice.
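For anyone who would rather script the encoding change than open each file in an editor, here is a minimal Python sketch. It is not one of the project's utilities; the function name `reencode_srt`, the file names in the usage line, and the `cp1252` default are placeholders for illustration, so swap in whatever legacy code page your player expects. It assumes the input is UTF-8, with or without a byte-order mark.

```python
from pathlib import Path


def reencode_srt(src, dst, src_encoding="utf-8-sig", dst_encoding="cp1252"):
    """Copy a subtitle file from src to dst, changing its text encoding.

    Hypothetical helper for illustration only. "utf-8-sig" reads UTF-8
    whether or not the file starts with a byte-order mark.
    """
    # Work on raw bytes so the original line endings (CRLF or LF) are kept.
    text = Path(src).read_bytes().decode(src_encoding)
    # encode() is strict by default: it raises UnicodeEncodeError if the
    # target code page cannot represent a character, rather than silently
    # replacing it with "?".
    Path(dst).write_bytes(text.encode(dst_encoding))
    return dst


# Example usage (file names are placeholders):
# reencode_srt("movie.eng.srt", "movie.eng.cp1252.srt")
```

Note that a single-byte code page only covers a limited set of scripts. Converting the Chinese or Arabic files to `cp1252` will fail with a `UnicodeEncodeError`, so for those languages either pick a matching legacy encoding or keep them in UTF-8.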
Project Threepio now covers a very respectable percentage of the planet. The problem with this is that it makes the gaps even more apparent, and the biggest gaps might not change any time soon. Hindi/Urdu and Bengali are a complete black hole--I can literally find nothing for these languages. For Japanese and Korean, I can't find subtitles for the complete trilogy (someone with a copy of the Japanese GOUT would be my BFF). And Indonesian appears to have the same issue as the Nordic languages--the words are so long that a direct translation would fill up the whole screen, so some sort of trimming/summarization is required. It's certainly the most doable of the bunch, but it's still problematic. The only languages that seem really feasible are more European languages, and I may get to those--after a long break.