
CatBus

User Group
Members
Join date
18-Aug-2011
Last activity
24-Sep-2025
Posts
5,979

Post History

Post
#1508103
Topic
Project Threepio (Star Wars OOT subtitles)
Time

Muyfa666 said:

Hmm… I can’t seem to find a non-SDH ENG full in the archive. Am I missing something?

Under most circumstances, you’ll want to use the localized English subtitles, not the full English subtitles. Most people won’t want English full subtitles, so those are not rendered into PGS subtitles by default. You can still get them in SRT format, or use the provided instructions to create your own PGS subtitles.

“Full” subtitles include Greedo’s and Jabba’s lines; “localized” subtitles do not. Since most Star Wars preservations include burnt-in English subtitles for those lines, you should use “localized” for English and “full” for non-English. The names and varieties of the subtitles are a little complicated because I’m trying to stay compatible with subtitle-free preservations, foreign-language preservations, etc. Check the README section “Which file should I use?”

Post
#1502516
Topic
International Audio (including Voice-Over Translations)
Time

A couple of things to note. First, I’m keeping both versions of the Polish dub indefinitely: the standard dub and the voiceover-style dub. Polish audiences generally prefer voiceovers, but there are certainly some who dislike them just as strongly.

The other thing is that I’ve now upgraded all of my Mandarin dubs to good-quality stereo dubs (despecialized).

Other new and exciting languages are also in the works.

Post
#1496620
Topic
Project Threepio (Star Wars OOT subtitles)
Time

Building on some things I mentioned earlier, adding Nastaliq scripts (Urdu and Afghan Persian) was a little more difficult than your average new script. Other scripts have varying levels of conformity to a consistent baseline and character height. Chinese characters (at least in Noto Sans CJK) are the best at this. Given even a fairly small sample of characters, it’s pretty easy to gauge exactly where they should be placed, and know exactly how much vertical space they will take. Latin characters are slightly more complicated, with descenders going below the baseline, and accents sometimes exceeding the normal max height. Until now, the scripts that played around most in that arena were Burmese and Thai – but still, pretty tame, relatively speaking.

Nastaliq threw all that out the window. There are huge vertical variances, with a baseline that’s more theoretical than ever before. Not only does it take up quite a lot of vertical space, it also threw a wrench into a lot of my old processes.

For example: dual subtitles. If you want dual English and Chinese subtitles, no problem. The Chinese subtitles get shifted down into the black bars, the English subtitles stay where they are, and everything just works. But if you wanted to make dual English and Afghan Persian subtitles, the Nastaliq subtitles take up so much vertical space that, even pushed down as far into the black bars as possible, they will still overlap the English subtitles. So, in that case, I shift the English subtitles up into the top black bars, and then you can have two sets of subtitles going with no overlap.
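
The decision described above can be sketched roughly as follows. This is only an illustration, not Project Threepio's actual code: the function name `place_dual`, the pixel values, and the `margin` parameter are all hypothetical, and positions are simple top/height bands measured in pixels from the top of the frame.

```python
def place_dual(frame_h, primary_top, primary_h, secondary_h, margin=10):
    """Decide where two subtitle tracks go (all names and numbers are hypothetical).

    The secondary track is pushed as far down into the bottom black bar as
    the margin allows. If it still overlaps the primary track's default
    band, the primary track is moved to the top of the frame instead.
    """
    secondary_top = frame_h - margin - secondary_h
    primary_bottom = primary_top + primary_h
    if secondary_top >= primary_bottom:
        # No overlap: the primary track stays where it is.
        return {"primary_top": primary_top,
                "secondary_top": secondary_top,
                "primary_moved": False}
    # Overlap: shift the primary track up into the top black bar.
    return {"primary_top": margin,
            "secondary_top": secondary_top,
            "primary_moved": True}


# A compact script fits below the primary track.
print(place_dual(1080, 840, 100, 120))
# A very tall script forces the primary track to the top.
print(place_dual(1080, 840, 100, 250))
```

A real implementation would also have to handle a secondary track so tall that it reaches the relocated primary track; this sketch ignores that case.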

Another case was “unshifted” subtitles. If, for example, you’re dealing with a Star Wars preservation with no burnt-in subtitles, you no longer want the subtitles for the alien dialog shifted to the top of the screen. “Unshifting” them requires calculating where they should now be placed, and the existing formulas broke hard for Nastaliq. To be honest, they were also a little broken for Thai and Burmese. Now they all work great.
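
The post does not say what the placement formulas actually are. One way to avoid depending on a baseline at all, sketched here purely as an assumption, is to measure where the rendered pixels really are and align the lowest inked row to a target line. The helper names `ink_rows` and `unshifted_top` are hypothetical, and the image is modelled as a plain list of rows of alpha values rather than a real rendered subtitle.

```python
def ink_rows(alpha_rows):
    """Return (first, last) indices of rows with any non-transparent pixel, or None."""
    rows = [i for i, row in enumerate(alpha_rows) if any(row)]
    if not rows:
        return None
    return rows[0], rows[-1]


def unshifted_top(image_top, alpha_rows, target_bottom):
    """Return the y position for the image's top edge so that its lowest
    inked row lands exactly on target_bottom (hypothetical helper)."""
    bounds = ink_rows(alpha_rows)
    if bounds is None:
        # Nothing is drawn, so leave the image where it was.
        return image_top
    _, last = bounds
    return target_bottom - last


# Tiny 4-row "image": ink only on rows 1 and 2.
rows = [[0, 0], [0, 255], [255, 0], [0, 0]]
print(unshifted_top(50, rows, 900))  # 898
```

Measuring the ink directly sidesteps the question of where a script's baseline is, which is the part that becomes unreliable for scripts with large vertical variance.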

Anyway, that’s just an example of how adding a new language isn’t always a simple matter of plugging in a new SRT file and loading a new font. Localization is hard!

Post
#1496557
Topic
Project Threepio (Star Wars OOT subtitles)
Time

Project files have been updated to version 13.0 (codename: “Endurance”), and the first post has been updated. Please PM me for temporary download links until the files are available at some more permanent locations.

Project Threepio has now been providing subtitles for the Star Wars Original Unaltered Trilogy for more than ten years!

Rough summary of changes from 12.3 to 13.0:

  • Added new language: Afghan Persian (Dari)
  • Changed language code for Iranian Persian from fas to pes, to better distinguish between various forms of Persian
  • Added Canadian French subtitles for ROTJ (thanks to schorman13). Under normal circumstances, I don’t add subtitles that aren’t trilogy-complete to Project Threepio, but it appears at the moment that there are official Canadian French subtitles available only for ROTJ, and I prefer for there not to be any reason for anyone to feel obligated to ever watch a Special Edition, if I can help it.
  • Improved Dutch subtitles (thanks to frater)
  • Added Urdu and Polish titles-only subtitles to accompany dubs (the Urdu subtitles are to accompany the Hindi dubs)
  • Modified Indonesian titles-only subtitles to accompany complete dubs
  • PGS subtitles now have limited support for cropped video (i.e. video where the black bars have been removed). Some background: graphical subtitles are designed to work only with a 1.78:1 video frame, and their behavior with video cropped to a different aspect ratio is not defined. Playback software can deal with the aspect ratio mismatch in different ways, potentially resulting in a variety of non-optimal results. Project Threepio’s scripts now allow you to alter subtitles so that they look normal during playback with cropped video, but that process works only for specific playback software. The best solution for cropped video is still to re-encode the video with black bars, expanding the aspect ratio to the expected 1.78:1.
  • Hypothetical 24.000fps preservations are now fully supported in all included scripts and documentation, although all included subtitles are 23.976fps, as always
  • The render script can now use Python’s Pillow module for improved speed (but it is still extremely slow by any reasonable measure)
  • Improved cross-platform compatibility for all scripts (this was so badly broken before that it could be considered a new feature)
  • Fixed some bugs in the render script, and re-rendered all subtitles
  • Improved the algorithm that estimates the position of “unshifted” subtitles in various scripts
  • Reorganized some files (most noticeably, the sup folder has been renamed the pgs folder, and native subtitles have been renamed localized subtitles)
  • Pre-rendered PGS subtitles are now included for all pre-1997 film variations except the 70mm cut of Empire (if needed, 70mm subtitles can be created using the provided instructions). Be warned, this means the total size of Project Threepio is much larger than earlier versions.
  • Pre-rendered 720p subtitles are no longer included with the project files. 720p subtitles are still supported and can be created using the provided instructions.
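
As a rough illustration of the cropped-video item above, the black bars needed to expand a cropped frame back to 1.78:1 (16:9) can be worked out with simple arithmetic. This is a sketch, not one of the project's scripts; the function name `letterbox_padding` and the example frame size are assumptions.

```python
def letterbox_padding(width, height):
    """Return (top, bottom) black-bar heights that pad a cropped frame to 16:9."""
    target_height = width * 9 // 16  # 16:9 is the "1.78:1" frame
    if height > target_height:
        raise ValueError("frame is taller than 16:9; letterboxing does not apply")
    extra = target_height - height
    top = extra // 2
    # Any odd leftover pixel goes to the bottom bar.
    return top, extra - top


print(letterbox_padding(1920, 816))  # (132, 132)
```

The resulting values are the amounts a re-encode would need to add above and below the picture to restore the frame size that graphical subtitles expect.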

Post
#1492660
Topic
Project Threepio (Star Wars OOT subtitles)
Time

It’s been a while since I’ve put a “Here’s an interesting thing I learned today” post out here. It started with this article (from 2013, so the situation may have greatly improved since then):

https://medium.com/@eteraz/the-death-of-the-urdu-script-9ce935435d90

First off – Urdu subtitles are still not really an option unless I find a new source. The Urdu subtitles floating around out there on the usual Subscene-type sites are clearly machine translations with major problems that are obvious even to me. But machine translations may be good enough for creating titles-only subtitles to accompany the Hindi dub. Quick background: Hindi and Urdu, when spoken, are very closely related dialects of a language sometimes called “Hindustani”, with only a few minor vocabulary differences. When written, however, they use entirely different scripts, and readers of one generally cannot read the other. So an Urdu-speaking viewer watching a Hindi dub is quite a likely scenario, and Urdu titles-only subtitles for the Greedo/Jabba scenes would help them.

Back to the different families of Perso-Arabic scripts: if it’s hard for people to imagine how a font can make such a difference, Latin-based languages had a similar schism a long time ago. There was a form of writing called Blackletter which was once very common, but it was eventually supplanted by Roman-style text. Today, Blackletter is only really used when people want to make something extra fancy or medieval-looking. If you had to read a whole book in this script, you could still do it, but it would be very slow-going (and you’d probably also think it’d be worthwhile to find the same book in a different font).

And that’s where people seem to sit on the Naskh/Nastaliq divide, except that both styles remain in active use. The Arabic world has gone with Naskh, parts of South and Central Asia have gone with Nastaliq, and Iran sits in the middle, where Naskh is used for everyday usage, and Nastaliq is used pretty much exclusively for poetry and decorative script. Using the wrong font in the wrong place ends up with people feeling that they’re reading some strange, foreign and unnecessarily difficult-to-read text (you may see old Ottoman-era signs with Nastaliq script in Naskh-using areas, but that’s much like Blackletter today – it does still exist, but it’s very limited in usage). But Naskh clearly rules the digital world, and Nastaliq areas just suffer with it. It’s so bad that, according to the article, in Nastaliq areas, people hate using Naskh on their smartphones so much that it has given rise to Urdu transliterated into Latin – a whole new informal, unstandardized writing system created for lack of a font!

So, long story short, when I create these Urdu titles-only subtitles, they will be Nastaliq. But I will likely also create Afghan Persian (Dari) subtitles. Afghan Persian is basically the same as Iranian Persian with a few vocabulary and pronunciation differences, and in written form they’re very, very close, except that Iranian Persian (for non-poetry) is typically Naskh and Afghan Persian is typically Nastaliq. This is not entirely new territory – Simplified and Traditional Chinese have a similar split, where they’re mutually legible but just feel wrong when used for the wrong audience. Serbian is available in both Latin and Cyrillic scripts, and so on.

And after working with this for a bit, I can also understand why Nastaliq is having such difficulty in a digital world. The script is a horizontally-written (RTL) script, but it doesn’t sit neatly on a straight baseline like most other scripts. Instead, it sort of hangs in the air and words tend to form these beautiful-looking cascading diagonals (which likely present all sorts of issues to the font renderer/shaper). But if you’re pressed for vertical space, that sort of layout can be a problem – and subtitles are definitely pressed for vertical space. Nevertheless, I think I’ll be able to make it work. Nastaliq subtitles will definitely cover more of the screen vertically than their Naskh counterparts, but it seems workable. In my opinion, if you have only a couple seconds to read text before it disappears again, it should be in the easiest-to-read format possible!