Decompiling Tenchu: Stealth Assassins part 10: potpourri status update

Previously in this series of articles, we’ve completed the rescue of the debugging data that was nearly lost to the mists of time onto a more tangible vessel. A bunch of stuff has happened since then, but nothing worthy of an article by itself, so I guess I’m going for a potpourri status update now.

The version tracking trail

A while ago, I found a SYM file containing debugging symbols for an early, lost build of the game. That file is a treasure trove of symbol names, data types and function prototypes, far too precious to pass over, which is why I’ve spent a significant amount of time and effort to rescue it.

However, since I eventually want to work on the latest release of the game, this data needs to be version tracked all the way across the entire span of the game’s release timeline to use it there, a multi-step journey all by itself:

From PSX.SYM to PSX.SYM.elf, a placeholder program in the shape of the lost build to hold that debugging data inside Ghidra ;
From PSX.SYM.elf to PSX.EXE, the earliest, non-working artifact we have of the game that is a close match ;
From PSX.EXE to Rittai Ninja Katsugeki Tenchu’s GAME.EXE, the last version of the game that is similarly architectured into a single monolithic executable ;
…

This third step has been completed, but unlike the previous ones it was just a normal version tracking session that didn’t require any dark magic to pull off, so I didn’t bother writing a dedicated article about it.

This debugging data doesn’t quite cover the entire GAME.EXE file, but I have a mostly-annotated version of the game I can work with for now. Therefore, while the objective is still to version track that data all the way to Rittai Ninja Katsugeki Tenchu: Shinobi Gaisen, I’m putting that goal aside for the time being.

The Tenchu modding community

While I was underway with the latest version tracking session, I joined the Tenchu Speedrunning Discord server that serves as the hub of the Tenchu community and announced my reverse-engineering project there.

The Tenchu series hasn’t seen a new release in over 10 years and is a bit of a forgotten cult classic at this point. Also, the speedrunning activity for the original game in particular is at a low ebb compared to a couple of years ago, so that Discord server is a somewhat quiet place these days.

That being said, I did get in touch there with a couple of Tenchu modders who are very interested in the contents of the Ghidra database I’ve built as part of this project. In particular, some light has been shed on the animation and move set systems of the game (during vocal chat sessions late at night), which is key to creating fully playable versions of enemy and boss characters.

Sharing that data with the Tenchu community enables modders to attempt modifications more ambitious in scope than ever before, at least until the decompilation project itself is completed.

It will take a while before we see the results (the modders are still digesting all that new information and experimenting with it as I’m writing this), but at some point some new videos are likely to appear on Teslafane’s YouTube channel, among other places.

The delinking debacle

I have made several improvements to my Ghidra extension, but I have come to realize that delinking MIPS code in an automated fashion is an absolute, unmitigated nightmare to pull off for several reasons.

While Ghidra has a concept of references, it does not model the relocations I require for delinking. I’m working around this problem with custom-built analyzers that try to identify relocation spots based on the references, but Ghidra’s own automatic analyzers have a tendency to annotate references whenever and however they want, in a manner that confuses and frustrates my tooling and me.

Even with my latest improvements in pattern matching, a fair amount of manual cleanup of references is still required in order for my extension to successfully delink code, especially within large functions or with complex memory access patterns.

The real hair-pulling part, however, comes in with the specifics of the MIPS architecture. In particular, a couple of quirks and features of this instruction set interact together into an absolute mess to pick apart:

HI16/LO16 relocations are split across two instructions and two relocation entries; as such, they can be placed quite far apart from each other ;
The RISC-flavored, register-and-immediate load/store model means that complex addressing patterns require a lot of instructions to synthesize… and the two HI16/LO16 instructions will be located somewhere in that chain ;
The branch delay slot can lead to some peculiar instruction scheduling if the toolchain doesn’t put a no-op in there.
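
To make the first point concrete, here is a minimal Python sketch of how a 32-bit address gets split into its HI16 and LO16 halves and recombined at run time. The function names are made up for illustration, and the two sample addresses are simply borrowed from the listing further down. The important detail is that the low half is sign-extended by the instruction consuming it, so the high half has to absorb a carry whenever bit 15 of the low half is set.

```python
def split_hi_lo(address):
    """Split a 32-bit address into the lui immediate and the low 16 bits."""
    lo = address & 0xFFFF
    # The low half is sign-extended by addiu/lb/lw and friends, so the high
    # half is rounded up whenever bit 15 of the low half is set.
    hi = ((address + 0x8000) >> 16) & 0xFFFF
    return hi, lo


def join_hi_lo(hi, lo):
    """Recombine both halves the way the processor does at run time."""
    signed_lo = lo - 0x10000 if lo & 0x8000 else lo
    return ((hi << 16) + signed_lo) & 0xFFFFFFFF


print([hex(v) for v in split_hi_lo(0x80082E18)])  # ['0x8008', '0x2e18']
print([hex(v) for v in split_hi_lo(0x8008BCA8)])  # ['0x8009', '0xbca8']
print(hex(join_hi_lo(0x8009, 0xBCA8)))            # 0x8008bca8
```

Because of that carry, neither immediate can be interpreted on its own: the lui value only makes sense once the matching low half has been found, which is exactly why the two instructions have to be paired up.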

I’ve accumulated some nasty test cases inside my test suite, gleaned from the game’s code, that stem from these interactions. This is the kind of gnarly stuff I’m dealing with:

#include <asm/reg.h>

.text
.set	noreorder
test:
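	beq	$a0,$zero,1f			/* if a0 == 0, go load from HELLO_WORLD */
	lui	$v0,%hi(HELLO_WORLD)		/* delay slot: runs whether or not the branch is taken */
	lui	$v0,%hi(GOODBYE_WORLD)		/* fallthrough only: overwrites v0 */
	j	2f
	lb	$v0,%lo(GOODBYE_WORLD)($v0)	/* delay slot of j: pairs with the second lui */
1:
	lb	$v0,%lo(HELLO_WORLD)($v0)	/* pairs with the lui in the beq delay slot */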
2:
	jr	$ra
	nop
.data
HELLO_WORLD:
.asciiz	"Hello, world!"
GOODBYE_WORLD:
.asciiz	"Goodbye, world!"

It might look obvious with the relocation annotations in the assembly code, but remember that all we have to work with inside Ghidra are references on the two lb instructions and a bunch of immediate integer constants.

For this particular case, I had to integrate code block flow analysis into my analyzers in order to untangle the usage of the v0 register.
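
To give an idea of what such a flow analysis involves, here is a toy reaching-definitions pass in Python over the test case above. It is only a sketch under heavy assumptions, not the code of the Ghidra extension: the block names, the tuple layout and the single-register model are all invented for illustration, and the delay slot instructions have been folded by hand into the block they execute in.

```python
from collections import deque

# Each block lists (label, reads_v0, writes_v0) in execution order.
BLOCKS = {
    "entry": [("lui_hello", False, True)],
    "fallthrough": [("lui_goodbye", False, True), ("lb_goodbye", True, True)],
    "label_1": [("lb_hello", True, True)],
    "label_2": [],
}
SUCCESSORS = {
    "entry": ["fallthrough", "label_1"],
    "fallthrough": ["label_2"],
    "label_1": ["label_2"],
    "label_2": [],
}


def reaching_definitions(blocks, successors, entry):
    """Map each instruction reading v0 to the instructions that may have written it."""
    block_in = {name: set() for name in blocks}
    uses = {}
    visited = set()
    worklist = deque([entry])
    while worklist:
        name = worklist.popleft()
        visited.add(name)
        live = set(block_in[name])
        for label, reads, writes in blocks[name]:
            if reads:
                uses[label] = set(live)
            if writes:
                live = {label}  # a write kills every earlier definition
        for succ in successors[name]:
            merged = block_in[succ] | live
            if merged != block_in[succ] or succ not in visited:
                block_in[succ] = merged
                worklist.append(succ)
    return uses


result = reaching_definitions(BLOCKS, SUCCESSORS, "entry")
print({use: sorted(defs) for use, defs in sorted(result.items())})
# {'lb_goodbye': ['lui_goodbye'], 'lb_hello': ['lui_hello']}
```

Each lb ends up associated with exactly one lui here, which is the well-behaved case. A use reached by several definitions is where things get ugly.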

Unfortunately, that’s not the most messed up example I have. One particularly… creative… instruction pattern present in the game that my Ghidra extension doesn’t handle at the moment looks like this:

                             LAB_80044468                                    XREF[1]:     8004443c(j)  
        80044468 05 00 9e 14     bne        a0,s8,LAB_80044480
        8004446c 08 80 03 3c     _lui       v1,0x8008
        80044470 01 00 09 24     li         t1,0x1
        80044474 b8 00 a9 a7     sh         t1,local_68(sp)
                             LAB_80044478                                    XREF[1]:     80044460(j)  
        80044478 74 07 80 a7     sh         zero,0x774(gp)=>DAT_8008bca8
                             LAB_8004447c                                    XREF[2]:     800443e4(j), 80044458(j)  
        8004447c 08 80 03 3c     lui        v1,0x8008
                             LAB_80044480                                    XREF[1]:     80044468(j)  
        80044480 18 2e 63 24     addiu      v1,v1,0x2e18
        80044484 01 80 02 3c     lui        v0,0x8001
        80044488 02 00 42 90     lbu        v0,offset PersistentState.field0_0x0+2(v0)
        8004448c b0 00 a9 97     lhu        t1,local_70(sp)
        80044490 40 10 02 00     sll        v0,v0,0x1
        80044494 21 10 43 00     addu       v0,v0,v1
        80044498 00 00 42 94     lhu        v0,0x0(v0)=>DAT_80082e18                         = 4Dh    M

It’s a bit hard to follow, but basically as far as I can tell the LO16 relocation located inside the addiu instruction at 80044480 has two HI16 counterparts in this snippet:

The first one is the lui at 8004446c, in the delay slot of the previous bne branch instruction ;
The second one is the lui at 8004447c, right before the target of that same branch instruction.

That’s right, the lui instruction targeted by a HI16 relocation has been duplicated.
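
This can be double-checked by decoding the raw bytes from the listing. The Python sketch below (the helper names are hypothetical) decodes the two lui instructions and the addiu as little-endian MIPS I-type instructions, then combines each lui with the shared addiu.

```python
import struct

LUI, ADDIU = 0x0F, 0x09

# Raw bytes copied from the listing, keyed by address.
LISTING = {
    0x8004446C: bytes.fromhex("0880033c"),  # lui in the bne delay slot
    0x8004447C: bytes.fromhex("0880033c"),  # lui on the fallthrough path
    0x80044480: bytes.fromhex("182e6324"),  # addiu reached from both paths
}


def decode_itype(raw):
    """Decode a little-endian MIPS I-type instruction into (opcode, rs, rt, imm)."""
    (word,) = struct.unpack("<I", raw)
    return word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF


def resolve_targets(lui_addresses, addiu_address):
    """Combine each lui with the shared addiu and return the resulting addresses."""
    opcode, rs, _, lo = decode_itype(LISTING[addiu_address])
    assert opcode == ADDIU
    signed_lo = lo - 0x10000 if lo & 0x8000 else lo
    targets = []
    for address in lui_addresses:
        opcode_hi, _, rt_hi, hi = decode_itype(LISTING[address])
        # The lui must write the register that the addiu reads.
        assert opcode_hi == LUI and rt_hi == rs
        targets.append(((hi << 16) + signed_lo) & 0xFFFFFFFF)
    return targets


print([hex(t) for t in resolve_targets([0x8004446C, 0x8004447C], 0x80044480)])
# ['0x80082e18', '0x80082e18']
```

Both paths resolve to 80082e18, the address Ghidra shows as DAT_80082e18 on the final lhu of the listing.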

My best guess is that the assembler figured out it could shave off one instruction from the execution flow by copying the HI16 instruction inside the branch delay slots of the branch instructions targeting it, then shifting the branch targets to one instruction later to skip the original HI16 instruction, which remains for the fallthrough case.

Sep 24, 2024: when I originally wrote this article, I thought that this pattern couldn’t be represented by the R_MIPS_HI16/R_MIPS_LO16 MIPS relocation pair of the ELF file format, which is the object file format I was exporting to. It turns out that there is a GNU extension for this, as reported by mono21400 on the decomp.me Discord server.
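
As far as I understand that extension, GNU binutils accepts any number of R_MIPS_HI16 entries ahead of the R_MIPS_LO16 entry they share, instead of requiring strictly adjacent pairs. Here is a simplified Python sketch of that pairing rule. The relocation type numbers are the real ELF values, but the function, the tuple layout and the section offsets are hypothetical, and the sketch ignores symbol matching entirely.

```python
R_MIPS_HI16 = 5
R_MIPS_LO16 = 6


def pair_hi_lo(relocations):
    """Attach every pending HI16 entry to the next LO16 entry in table order."""
    pending = []
    pairs = []
    for offset, kind in relocations:
        if kind == R_MIPS_HI16:
            pending.append(offset)
        elif kind == R_MIPS_LO16:
            pairs.append((pending, offset))
            pending = []
    # Leftover HI16 entries have no LO16 to complete them.
    return pairs, pending


# Hypothetical section offsets standing in for the two lui and the addiu.
table = [(0x6C, R_MIPS_HI16), (0x7C, R_MIPS_HI16), (0x80, R_MIPS_LO16)]
pairs, orphans = pair_hi_lo(table)
print(pairs, orphans)  # [([108, 124], 128)] []
```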

It’s not mentioned inside the SYSTEM V APPLICATION BINARY INTERFACE MIPS® RISC Processor Supplement 3rd Edition document that I was leaning on. The MIPS psABI documentation in general is quite derelict, too.

Furthermore, my data model for synthesized relocations inside my Ghidra extension, which in theory is agnostic of any particular object file format (but in practice is heavily inspired by ELF because that’s what I’m familiar with), also can’t represent this.

A particularly cunning reverse-engineer might observe that patching the bne branch instruction to target one instruction earlier reduces the HI16/LO16 pattern down to one manageable pair: the first lui inside the delay slot becomes irrelevant because the second lui then gets executed in all cases.
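
For illustration, here is what that patch amounts to at the byte level, as a small Python sketch with made-up helper names. MIPS branch offsets are counted in instructions relative to the address of the delay slot, so retargeting the bne from 80044480 to 8004447c means decrementing its 16-bit offset field from 5 to 4.

```python
import struct


def branch_target(raw, address):
    """Compute the target of a little-endian MIPS branch instruction."""
    (word,) = struct.unpack("<I", raw)
    offset = word & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000
    # The offset is relative to the delay slot, in units of instructions.
    return address + 4 + (offset << 2)


def retarget_branch(raw, address, new_target):
    """Rewrite the 16-bit offset field so that the branch lands on new_target."""
    (word,) = struct.unpack("<I", raw)
    delta = new_target - (address + 4)
    assert delta % 4 == 0 and -0x20000 <= delta < 0x20000
    return struct.pack("<I", (word & 0xFFFF0000) | ((delta >> 2) & 0xFFFF))


bne = bytes.fromhex("05009e14")  # bne a0,s8,LAB_80044480 at 80044468
print(hex(branch_target(bne, 0x80044468)))  # 0x80044480
patched = retarget_branch(bne, 0x80044468, 0x8004447C)
print(patched.hex())                            # 04009e14
print(hex(branch_target(patched, 0x80044468)))  # 0x8004447c
```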

It seems to work, but altering the code flow by hand to undo this optimization and normalize the HI16/LO16 relocation pairs is a rather fiddly, intrusive, extreme and error-prone solution… Looks like I’ll have to improvise another crazy contraption to deal with this.

While each improvement of my MIPS analyzers reduces the quantity of unhandled relocation patterns, the remaining ones become harder and harder to address. If there’s a limit to how much bullshit I’m willing to put up with in order to dial in my delinker, this latest issue is getting really, really close to it.

The linking lead

I’ve already repurposed some bits and pieces of the game to run on Linux before, but while I can generate an ELF object file of the whole game, it doesn’t mean that it actually works. Or that I can actually link it as a Linux program, for that matter.

On the Linux front, while PsyCross is an obvious, seemingly drop-in replacement for the original Psy-Q SDK, it is not currently suitable in practice due to multiple missing functions and even entire libraries used by Tenchu. Even if I were to fill in the blanks and get it to link, splattering lots of PlayStation code inside a Linux MIPS process is bound to create all kinds of havoc, on top of the experimental and temperamental nature of my delinker tooling.

Therefore, a more reasonable stepping stone would be to first delink and relink the game’s code as a PlayStation program using the original Psy-Q SDK. If the game still works with the game’s code and data shuffled around, then the delinking process will be a success and I can go back to running over ABIs like a madman…

But of course it’s not quite that simple.

I do not want to use the original Sony toolchain because it’s thoroughly obsolete and I don’t have an exporter to its proprietary object file format. To work around this, I’m cobbling together an ELF conversion of the Psy-Q SDK out of various tools. This introduces new sources of issues into the mix, but I’m fixing the bugs as I encounter them.

At the moment, I have enough game code delinked and relinked successfully that I can play the briefing screen of a mission correctly and without crashing on a PlayStation emulator, but not much more. Another achievement that’s not worth an article by itself, at least not until I can get the inventory selection screen running properly.

Conclusion

So progress is being made behind the scenes on a bunch of topics; it’s just that nothing’s ready for a write-up at the moment. Having run out of articles in my buffer for over three weeks now, I figured I might as well write something to document what’s happening.

If it’s not obvious by now, the publication schedule of this blog is definitely not regular anymore. Use the RSS feed if you want to be notified whenever I get around to publishing something.
