Shenmue Audio Restoration Project



What is it?



The Shenmue Audio Restoration Project aims to restore the game's voices to their original quality. On PC, the voices sound substandard at best, which leaves much to be desired from this port.





What do you mean "restore the voices?"



When D3T ported this game, they converted all of the audio from the Yamaha AICA ADPCM stream format to the xWMA format. For this use, xWMA is wholly unsuitable, and it completely destroys any semblance of audio quality. On top of that, the xWMA encoder they used (more on how I know which specific encoder in a minute) encodes audio at 48kbps by default, but D3T opted to encode most of the voice files in the game at 20kbps. In Shenmue's case this is especially bad, because the voice audio was already pretty poor to begin with (understandable for the time, but still poor). The aim of this project is to restore the original voice files as close to 1:1 as possible.





So if xWMA files are bad, and they're what the game uses, how are you doing this?





Well, it just so happens that the game also accepts .WAV files, as long as you name them exactly the way D3T named theirs. Oh yeah, they also changed the filenames from something descriptive like 01AYUA008 (yes, that's actually descriptive) to generic names like file_1, file_2, file_3, etc. That wouldn't be an issue on its own, except the two sets don't line up 1:1: the first file (alphabetically) in the Dreamcast archive isn't necessarily the first file in the PC archive.





That seems like a problem; how did you find the correct filenames?



Well, this is where things get interesting. As I said, D3T changed the filenames to generic ones like file_1, but what's really interesting are the file extensions: the files in the PC release's archives end in .wav.xma. The Shenmue Translation Pack converts the Dreamcast release's .str files into .wav, so I took a wild guess that D3T used the Shenmue Translation Pack when making this port. Noticing that extension is how I figured out how to load .wav files into the game.

It took me a few more days to really figure out what was going on. At first, I thought I was going to have to listen to each file, transcribe it, do the same for the other release, and then compare the two by hand. Desperate to find an easier way, I tried encoding some xWMA files myself from my .WAV sources, to see if I could compare them by file size. During that experimentation I noticed two files that weren't just close in size. They were identical in size, so of course I checked their MD5 hashes. It would be absolutely absurd, but super useful, if I could generate identical files... and that's exactly what happened: the MD5 hashes matched. I checked again and again and again. They matched.

From there, everything just sort of spiraled, and I got my good friend Davidokuro to whip up some scripts that compare my XMAs against theirs and rename the .wav files wherever there's a match. It's one of the dumbest ideas I've ever had, and it worked perfectly. Many revisions of the script later, here we are.
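The matching trick boils down to three steps. First, hash every file unpacked from the PC archives. Second, hash every file you encoded yourself from the Dreamcast WAVs. Third, pair up identical digests. Here's a minimal Python sketch of that idea. It is not the actual pcdcall code. It assumes the PC files have already been unpacked into one folder, and that your own encodes sit in another folder, each named after its source WAV. The folder layout and the function names `md5_of` and `match_by_md5` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_by_md5(pc_dir: Path, encoded_dir: Path) -> dict[str, str]:
    """Map each self-encoded file's stem (a Dreamcast name like 01AYUA008)
    to the PC file with byte-identical contents (e.g. file_2.wav.xma).

    Assumes both folders contain only the files to be compared.
    """
    # Index the PC files by digest: identical encodes give identical bytes.
    pc_by_hash: dict[str, str] = {}
    for pc_file in sorted(pc_dir.iterdir()):
        if pc_file.is_file():
            pc_by_hash.setdefault(md5_of(pc_file), pc_file.name)

    # Look up each of our own encodes; unmatched files are simply skipped.
    matches: dict[str, str] = {}
    for encoded in sorted(encoded_dir.iterdir()):
        if encoded.is_file():
            pc_name = pc_by_hash.get(md5_of(encoded))
            if pc_name is not None:
                matches[encoded.stem] = pc_name
    return matches
```

Any encode that finds no byte-identical partner is left out of the result. This makes the stragglers easy to list separately.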



Wait, so... where are we?



Well, we're in a few places.



First off, we're right here on the Nexus. You can download my .zip here, unzip the Scene folder inside it into your Mods folder (the Forklift mod loader is required), and you're good to go.



We're also on GitHub. Remember how I said Davidokuro whipped up a few scripts for me? Well, we put them on GitHub, so anyone can go through the same process I did: extracting the WAVs, extracting the XWMs, and converting the WAVs to XWMs. That conversion is done twice, once at 20kbps and once at 48kbps, which matches about 99.9% of the filenames. Only around 10-15 files in the whole game still don't match up, because they're encoded at 32kbps. This script is called pcdcall, and it can be found on his GitHub, which I will link at the bottom of this post.
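The two encoding passes could be driven from a small script. Here's a minimal sketch of one way to do it. It assumes `xWMAEncode` (the DirectX SDK encoder) is on your PATH, and that it takes `-b <bitrate in bits per second>` followed by the input and output paths. Check the usage text of your copy before relying on that, especially since the encoder version reportedly matters. The function names and the one-subfolder-per-bitrate layout are hypothetical, not how pcdcall actually does it.

```python
import subprocess
from pathlib import Path

# The two bitrates that cover nearly every PC voice file; the handful of
# 32 kbps files would need a third pass (32000).
BITRATES = (20000, 48000)


def encode_commands(wav: Path, out_dir: Path,
                    bitrates: tuple[int, ...] = BITRATES) -> list[list[str]]:
    """Build one xWMAEncode command line per bitrate for a single WAV.

    Each pass writes to its own subfolder (e.g. out/20k/01AYUA008.xwm), so
    the 20 kbps and 48 kbps encodes never overwrite each other.
    """
    commands = []
    for bitrate in bitrates:
        out = out_dir / f"{bitrate // 1000}k" / f"{wav.stem}.xwm"
        # Assumed syntax: xWMAEncode -b <bitrate> <input.wav> <output.xwm>
        commands.append(["xWMAEncode", "-b", str(bitrate), str(wav), str(out)])
    return commands


def run_all(wav_dir: Path, out_dir: Path,
            bitrates: tuple[int, ...] = BITRATES) -> None:
    """Encode every WAV in wav_dir at each bitrate (needs xWMAEncode installed)."""
    for bitrate in bitrates:
        (out_dir / f"{bitrate // 1000}k").mkdir(parents=True, exist_ok=True)
    for wav in sorted(wav_dir.glob("*.wav")):
        for cmd in encode_commands(wav, out_dir, bitrates):
            subprocess.run(cmd, check=True)
```

Keeping each bitrate in its own folder means each folder can be hashed and compared against the PC files on its own. A file matched at either bitrate counts as found.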



There's one more thing on that GitHub as well. We realized that the whole filename-matching process only has to be done once. As long as we log the results, we never have to do it again, and we can write a script that just renames the .WAV files. Then you only have to unpack the PC archives, replace the .xwm files with our fake .xwm files, and re-pack the archives. Unfortunately, you still need to know the hashes of the PC AFS files... luckily, those are all documented on the GitHub too.
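Once the matches are logged, reusing them is just a rename pass. Here's a minimal sketch of that pass. It assumes a CSV log with hypothetical `dreamcast_name` and `pc_name` columns. The real CSVs on GitHub may well be laid out differently. The sketch copies each restored WAV under its PC filename, so the copies can be dropped into the unpacked PC archive before re-packing. The function name `apply_mapping` is also hypothetical.

```python
import csv
import shutil
from pathlib import Path


def apply_mapping(csv_path: Path, wav_dir: Path, out_dir: Path) -> list[str]:
    """Copy <dreamcast_name>.wav to out_dir/<pc_name> for every logged match.

    Returns the Dreamcast names whose WAV was missing, so gaps are visible.
    Assumes a header row with 'dreamcast_name' and 'pc_name' columns.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            src = wav_dir / f"{row['dreamcast_name']}.wav"
            if not src.is_file():
                missing.append(row["dreamcast_name"])
                continue
            # The contents stay WAV; only the filename changes to the PC one.
            shutil.copyfile(src, out_dir / row["pc_name"])
    return missing
```

Copying rather than renaming in place leaves the Dreamcast-named WAVs untouched, so the pass can be re-run safely.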



So what's next?



Well, Shenmue is now done, and it's time to get to work on Shenmue II. I think I have most of the groundwork laid out for it, but it's going to be a more involved process (there are some things I just can't script, unfortunately), and there are some things I haven't been able to figure out yet. Those should be solved soon, though. I've changed the order I planned to do things in, and I'm now doing the English dub of Shenmue II first, since that's the one where problems will be easiest for me to figure out.



After that comes the Shenmue II Japanese dub, and then the European dub (or vice versa; I still don't know which is "correct" for this rerelease).



Alright, I downloaded your mod and I think I found an issue



Well, reach out to me here, on Reddit (/u/Jianmin_Tao), or on the Shenmue Dojo. I can't guarantee any support, because tracking down individual files in this is a nightmare, but I'll take a look into it. I will not be supporting the scripts on GitHub, partly because that's Davidokuro's job, and partly because they're adequate for my usage and I haven't run into any bugs.



I also want to note that there are two known issues that I'm not entirely sure how to handle going forward, so keep in mind that these may or may not be fixed in an update.



- I did not convert the BGM01/BGM02/BGM03 archives. These contain things such as the BGM for the intro and ending cutscenes, the item collection sound(s), and the notebook music. They also may or may not be present in the .CSV files on the GitHub (I legitimately just don't remember). I didn't convert them because they're all music or SFX, and xWMA isn't too horrible a choice for that use case (I still wouldn't recommend it, though), so I can stomach the audio. I also ran into a bug with BGM01 where the second part of the intro cutscene's BGM would not load. Rather than have some bastardized, half-restored file, and considering that it doesn't sound TOO bad, I left them all as-is.



- Some voice clips were actually edited before being inserted into the game. I did not redo those edits, so when those lines get triggered, something funky may happen; I don't know. I've done almost no testing whatsoever on this pack, and all I know is that everything should work. These files ARE noted in the CSVs on the GitHub, so you don't have to worry about them unless you're doing the whole conversion start-to-finish like I did, which nobody should ever ever ever ever have to do again.



- This isn't necessarily a bug, just something to watch out for if you're doing all the conversions start-to-finish: it seems the version of xWMAEncode you use matters. I don't know which version I have (the one that works); I just know the one Davidokuro got the other day didn't work.



Instructions:

1) Install the Forklift Mod Loader

2) Move the Scene folder from the zip you downloaded into your Mods folder

3) Play the game without your ears wanting to murder you





Special Thanks



Raymonf - For doing so much work on capturing/reversing hashes. I couldn't have even begun this project without Wulinshu and your unpacking tools.



SHENTRAD Team - I cannot overstate just how instrumental your translation pack has been to the rereleases of these games.



Davidokuro - Lead Programmer on this project, and just a damn good friend.



Lemonhaze - Too much to list here, dude's been going crazy tearing apart the executables for these games, and wrote a MUCH appreciated script that automated AHX conversion for Shenmue II.



Bluemue - For being my guinea pig, as well as a repository of knowledge on the original releases, you really helped a lot dude.



I'd also like to thank anyone that's been following along with this project. It's been a hell of a ride and I can't believe that we've gotten this far in under a month.