I have absolutely no idea if this would work or not, but it seems to me that theoretically it could.
My thought on this project is based on the "My Neighbor Totoro" original English dub - I'd love to hear the vocal talent from that release grafted onto the higher-quality soundtrack backing the Japanese and Disney dubs.
Of course, this only even theoretically works if the audio is in perfect sync.
Usually when you combine audio or video, you use a regular mean average. Sometimes, when you're capturing video and want to remove artifacts, you use a median average, which discards outliers.
What you don't see a lot of is MODE averaging, where the most commonly occurring pixel or sample value is used.
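To make the distinction concrete, here's a toy illustration (with made-up sample values) of how the three averages behave when one track out of five contains an outlier - say, a line of dialogue on top of otherwise-identical background audio:

```python
from collections import Counter

# Five hypothetical sample values at the same time position, one per
# track (made-up numbers). Four tracks agree; one has an outlier.
samples = [0.20, 0.21, 0.20, 0.95, 0.20]

mean = sum(samples) / len(samples)            # pulled upward by the outlier
median = sorted(samples)[len(samples) // 2]   # ignores a single outlier
mode = Counter(samples).most_common(1)[0][0]  # most frequently occurring value

print(mean, median, mode)  # → 0.352 0.2 0.2
```

The mean gets dragged toward the outlier, while the median and mode both land on the value most tracks agree on - the mode being the one that explicitly "votes" for the most common sample.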
I'm proposing a potential breakthrough, for dubbed movies at least: use a MODE-average process to compare and combine audio tracks on a per-sample basis (or per-millisecond basis... just something very small).
In the case of Totoro, my thought is this - use the following audio tracks:
1 - Japanese Blu-ray track
2 - Disney Blu-ray track
3 - Original dub
4 - Original dub duplicate
5 - French Blu-ray track (or any other foreign-language track that appears to use the same soundscape as the other Blu-ray tracks)
In this scenario, hopefully the three Blu-ray-sourced tracks would provide the most commonly occurring samples for the music and background noises/effects. Wherever there's talking, the three different Blu-ray languages would be too different from one another, letting the two identical dub tracks become the most commonly occurring instead.
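A rough sketch of what that per-sample mode combine might look like - note this is a toy, not a working audio tool: real tracks would never match bit-exactly, so I quantize samples before comparing (the `quant` tolerance is my own assumption, as are all the names here), and real tracks would also need sample-accurate alignment and level matching first:

```python
import numpy as np

def mode_combine(tracks, quant=1e-3):
    """Per-sample MODE combine: at each sample position, pick the value
    the most tracks agree on. Samples are quantized to `quant` before
    comparison, since real-world tracks never match bit-exactly.
    `tracks` is a list/array of equal-length, aligned audio tracks."""
    tracks = np.asarray(tracks)
    q = np.round(tracks / quant).astype(int)   # quantize for comparison
    out = np.empty(tracks.shape[1])
    for i in range(tracks.shape[1]):
        vals, counts = np.unique(q[:, i], return_counts=True)
        out[i] = vals[np.argmax(counts)] * quant  # most common value wins
    return out

# Toy data standing in for the Totoro scenario: three "Blu-ray" tracks
# share the same background audio, two "dub" tracks share different
# audio - the three-track group outvotes the two-track group.
blu = np.array([0.1, 0.2, 0.3, 0.4])
dub = np.array([0.5, 0.6, 0.7, 0.8])
tracks = [blu, blu, blu, dub, dub]
print(mode_combine(tracks))  # → [0.1 0.2 0.3 0.4]
```

In this toy case the three-vote Blu-ray group wins every sample, which is the behavior the scenario above is counting on for music and effects; during dialogue, the two identical dub tracks would be the only group that agrees at all.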
I'm unsure exactly how to implement this, but I'd imagine phase inversion and/or difference extraction could then be used with a track like this to isolate the dialogue in the other tracks as well. While not important for the project I'm interested in here, I imagine something like that would be very helpful for Star Wars edits.
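The phase-inversion idea can be sketched digitally as a plain subtraction - mixing in an inverted copy of a signal is the same as subtracting it. Assuming (and it's a big assumption) that the mode-combined background is perfectly synced and level-matched with a full dub track, the difference would be the dialogue:

```python
import numpy as np

# Difference extraction: if `full` = dialogue + background, and
# `background` is the mode-combined common track, then adding a
# phase-inverted copy of the background (i.e. subtracting it) should
# leave only the dialogue. Toy numbers; real audio would need exact
# sync and level matching for this to work at all.
background = np.array([0.1, 0.2, 0.3, 0.4])
dialogue   = np.array([0.0, 0.5, 0.0, -0.2])
full = background + dialogue        # what a dub track would contain

isolated = full + (-background)     # phase-invert and mix = subtract
print(isolated)                     # → [ 0.   0.5  0.  -0.2]
```

In practice even tiny sync or encoding differences between releases would leave audible residue, which is why this part is the most speculative.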
Is any of this possible? Could anyone explain to me how to do it? Or should I go to sleep because it's 2 in the morning and I'm going to have no clue what I even meant here in the morning?