Previously in this series of articles, I’ve provided a potpourri-style update on this project due to a lack of demonstrable progress. Since then, the wretched demon of optimizing MIPS assemblers vacuuming up copies of instructions with HI16 relocations inside branch delay slots has been tamed, so we’re back on our irregularly-scheduled program.

In this part, I’ve managed to pull off yet another affront to computer science: a rather extreme and unconventional take on binary patching. Specifically, we’ll leverage the delinking technique heralded in this blog to chunk a program into giblets, swap out the parts we want and make the linker mend it all back together in one piece.

This blog post mutilates the desiccated remains of a computer program from 1998 in a most gruesome and unnatural way, all in the name of science. Its contents may be offensive to linker developers, software supply chain specialists or CS101 teachers.

As such, this is your one and only warning to look away before I start the chainsaw.

Previously, on Decompiling Tenchu: Stealth Assassins

To understand where this comes from, a quick summary of the history of this project is needed.

As part of my ongoing project to decompile Tenchu: Stealth Assassins, I’ve managed to build a decently annotated Ghidra database of Rittai Ninja Katsugeki Tenchu’s GAME.EXE, the executable for the original JP release. It’s already a very useful resource for modders of the game, but by itself is only a repository of knowledge, with no practical applications.

In parallel, over the past two years I’ve developed and fine-tuned a rather peculiar reverse-engineering technique known as delinking. What started as a bunch of scrappy Jython scripts eventually evolved into a Ghidra extension that enables one to export a program selection as a working, relocatable object file in two mouse clicks.

These two things combined mean that I can turn GAME.EXE into one or more reusable ELF object files. In turn, I can relink them to create a new executable and still get something that manages to go in-game without crashing… but I needed something a little extra to write an article on this latest breakthrough.

There are still a bunch of problems to fix inside the Ghidra database and probably also in my Ghidra extension that results in various bugs and crashes, if you tickle the rearranged artifact in the wrong spot.

Nevertheless, I’m still bewildered that this abomination runs at all, let alone that you can stroll around the level, use items and fight enemies as usual.

Frankenstein’s monster, computer program edition

So, we can delink GAME.EXE back into object files, put these back together with a linker and it mostly works. This is neat by itself, but it’s not very useful to craft versions of the game with functions and variables just shuffled around in memory. However, while sitting in a voice chat on the Tenchu Speedrunning Discord server, observing a live stream where bytes were being meticulously patched by hand using Cheat Engine, I got an idea…

What happens if we start changing some of the pieces?

This is something I’ve done previously, but only as a proof-of-concept on a toy program. This time, I’m doing it on a closed-source executable that’s orders of magnitude bigger.

In other words, this is for real.

The butcher’s tools, dripping wet with blood binary code

We can’t just stuff a C source code file at the end of the linker’s invocation alongside the rest of the original object files, as we would run into multiple symbol definition conflicts between the original game’s code and our additional source code. We could manually leave out the parts from the program that we are replacing during object file exportation, but that’s tedious, error-prone and worst of all boring.

What we need is a way to use the original game’s code as-is, but easily override the parts we want with our own code. So after tinkering a bit to find a practical workflow, I ended up with this innocent-looking Makefile:

#!make

-include .env.mak
include settings.mak

define makelibs
$(foreach lib,$(1),-l$(lib))
endef

define makedefsyms
$(foreach sym,$(1),--defsym=$(sym))
endef

define makesrcobjs
$(subst $(1),$(3),$(subst $(2),.o,$(wildcard $(1)/*$(2))))
endef

define makeoverridableobjs
$(filter-out $(sort $(patsubst %,%.o,$(subst $(2)/,$(1)/,$(basename $(wildcard $(2)/*))))),$(wildcard $(1)/*))
endef

all: build/$(EXECUTABLE).exe

clean:
	rm -rf build/*.exe build/*.elf build/*.o

build/%.o: src/%.c
	$(CROSS_TOOLCHAIN)gcc -fno-pic -mno-abicalls -Wa,-mno-pdr -march=r3000 -mfp32 -I include/ $(CFLAGS) -c $< -o $@

build/%.o: src/%.S
	$(CROSS_TOOLCHAIN)as -march=r3000 -mfp32 -mno-pdr -I include/ $(AFLAGS) $< -o $@

build/%.weakened.o: obj/%.o
	$(CROSS_TOOLCHAIN)objcopy --weaken-symbols=weak-symbols.txt $< $@

build/$(EXECUTABLE).elf: $(call makesrcobjs,src,.c,build) $(call makesrcobjs,src,.S,build) $(patsubst obj/%.o,build/%.weakened.o,$(call makeoverridableobjs,obj,src))
	$(CROSS_TOOLCHAIN)ld -nostdlib -no-pie -static -melf32ltsmip -T $(PSn00bSDK)/libpsn00b/ldscripts/exe.ld -Ttext=$(TEXT_START) $(LDFLAGS) $(call makedefsyms,$(DEFSYMS)) $^ $(call makelibs,$(LIBS)) -o $@

build/$(EXECUTABLE).exe: build/$(EXECUTABLE).elf
	ps1-packer $< -o $@

What this Makefile does is create a program from both object files and source code files. To override the original game’s code with our own, we can either provide a source code file that automatically overrides an object file with the same name, or weaken specific symbols from the delinked object files so that our new code is prioritized by the linker.

The layout is as follows:

  • The object files delinked from the original game are stored inside obj/ ;
  • The declaration headers for the game’s code are placed inside include/ ;
  • The source code newly written by us is placed inside src/ ;
  • The build artifacts are outputted to build/.

The jigs and fixtures of our modding factory are set, all we need to do now is push code through it.

Let the chimeras roam free

Say we want to modify the debug menu of the game, located within the DoInfoViewProc() function. With binary patching, we’d have to patch or detour the function, find a place to put our code and data if they don’t fit inline, mend anything that we have moved around or modified, hoping that we didn’t miss anything that would cause crashes or glitches at run-time…

… or we could just write our code, as if nothing is out of the ordinary:

#include "adt.h"
#include "camera.h"
#include "human.h"
#include "infoview.h"
#include "semng.h"
#include "uncategorized.h"

#define NULL (0)

void DoOptions() {
    struct TAdtSelect menu[3] = {
        { "Yes", 0 },
        { "No", 1 },
        { NULL, 2 },
    };

    switch (AdtSelect("Are you cheating son?", menu, 0)) {
        case 0: AdtMessageBox("Naughty boy!"); break;
        case 1: AdtMessageBox("Good boy!"); break;
        default: break;
    }
}

void DoSelectItem(unsigned short padTrig) {
    int i = SelectedItem;
    int direction;
    
    if ((padTrig & 0x2) == 0) {
        if ((padTrig & 0x1) == 0) {
            return;
        }

        direction = -1;
    } else {
        direction = 1;
    }

    do {
        i += direction;

        if (i > 0x18) {
            i = 0;
        } else if (i < 0) {
            i = 0x18;
        }
    } while (((CamState.Owner)->item[i] == 0) && (i != SelectedItem));

    SelectedItem = i;
    SoundEx(NULL, 0xb);
}

void DoInfoViewProc() {
    unsigned short nowPad = GetPad(0);
    unsigned short padData = ((CamState.Owner)->pad).data;
    unsigned short padTrig = ((CamState.Owner)->pad).trig;

    if (fInitialize == 0) {
        InitializeInfoView();
    }

    if ((SystemFlag & 2) && nowPad == 0x0003) {
        DoOptions();
    }

    if ((padData & 0x10) == 0) {
        DoSelectItem(padTrig);
    }

    PutItemList();
    PutLifeBar(-0x6e, 0x61, (int)(CamState.Owner)->life, (int)(CamState.Owner)->lifemax);
    PutLifeBarS();
    PutStrain();

    if (((nowPad & 0x0100) == 0) || (SystemFlag & 5)) {
        PutMapMode = 0;
    } else {
        PutMap();
    }

    PauseProc();
}

Being a PlayStation game from 1998, developed by a newly-created studio staffed with talented but inexperienced people, DoInfoViewProc() does a bunch of things that probably should’ve been divided up into multiple, separate functions.

Since we detour it, our replacement needs to replicate its functionality, hence all the extraneous boilerplate. Keep in mind that this is a cleaned-up version of the function, the original DoInfoViewProc() is quite a bit more messy and convoluted than this.

Then, we add DoInfoViewProc into weak-symbols.txt, plop object files delinked from GAME.EXE inside obj/, stash our source code inside src/ and let the linker do the work for us:

$ make
mipsel-linux-gnu-gcc -fno-pic -mno-abicalls -Wa,-mno-pdr -march=r3000 -mfp32 -I include/ -std=c11 -Os -Wall -Werror -nostdinc -isystem /home/boricj/Documents/PsyQ/elf/include -c src/cheats.c -o build/cheats.o
mipsel-linux-gnu-gcc -fno-pic -mno-abicalls -Wa,-mno-pdr -march=r3000 -mfp32 -I include/ -std=c11 -Os -Wall -Werror -nostdinc -isystem /home/boricj/Documents/PsyQ/elf/include -c src/libsn.c -o build/libsn.o
mipsel-linux-gnu-gcc -fno-pic -mno-abicalls -Wa,-mno-pdr -march=r3000 -mfp32 -I include/ -std=c11 -Os -Wall -Werror -nostdinc -isystem /home/boricj/Documents/PsyQ/elf/include -c src/__main.c -o build/__main.o
mipsel-linux-gnu-gcc -fno-pic -mno-abicalls -Wa,-mno-pdr -march=r3000 -mfp32 -I include/ -std=c11 -Os -Wall -Werror -nostdinc -isystem /home/boricj/Documents/PsyQ/elf/include -c src/valloc.c -o build/valloc.o
mipsel-linux-gnu-as -march=r3000 -mfp32 -mno-pdr -I include/ -I /home/boricj/Documents/PsyQ/elf/include src/start.S -o build/start.o
mipsel-linux-gnu-objcopy --weaken-symbols=weak-symbols.txt obj/game.o build/game.weakened.o
mipsel-linux-gnu-objcopy --weaken-symbols=weak-symbols.txt obj/psyq.o build/psyq.weakened.o
mipsel-linux-gnu-ld -nostdlib -no-pie -static -melf32ltsmip -T /home/boricj/Documents/PSn00bSDK/libpsn00b/ldscripts/exe.ld -Ttext=0x80010100 -L /home/boricj/Documents/PsyQ/elf/lib --defsym=Scratchpad=0x1f800000 --defsym=PersistentState=0x80010000 --defsym=MemoryPool=0x80200000 --defsym=MemoryDiskSentinel=0x807f0000 build/cheats.o build/libsn.o build/__main.o build/valloc.o build/start.o build/game.weakened.o build/psyq.weakened.o -lcd -lgs -lgpu -lgte -lpad -lmcrd -lcard -lsnd -lspu -letc -lc -lapi -o build/game.elf
ps1-packer build/game.elf -o build/game.exe

ps1-packer by Nicolas "Pixel" Noble
https://github.com/grumpycoders/pcsx-redux/tree/main/tools/ps1-packer/

Input file: build/game.elf
pc: 0x800106b0  gp: 0x00000000  sp: 0x00000000
file size: 700444 -> 194560

File build/game.exe created. All done.

We end up with both an ELF executable file, straight out of our modern toolchain, as well as a PS-EXE file native to the PlayStation. I could repack the ISO of the game with it, but I want fast iterations so we’ll just start the game, pause execution the moment GAME.EXE gets loaded and inject our version instead:

Figure 2: Unmodified GAME.EXE with original debug menu
Figure 3: Modified GAME.EXE with custom debug menu

There are some visible bugs in the modified executable (and it will crash when entering a cutscene), but it mostly works. That program has gone through a meat grinder and got chunked into object files, some bits were taken out and some bits were added… and the linker then stitched it all back together into something that somehow runs.

I feel like a hacksaw-wielding maniac that just got away with mutilating a program in plain sight.

CS101 teachers assume that toolchains are a strict progression of source code through compilers, assemblers and linkers to executable, but actually from a non-linear, unacademic viewpoint it’s more like a big ball of wibbly wobbly, bits-y wimey… stuff.

This is why I say that this modding framework is powered by their tears. If I were pulling this kind of stunts back in university, I would’ve been branded a heretic… or at the very least gave the teaching staff huge migraines.

I feel the need… The need for heap.

This is but a single symbol being overridden. What if we wanted to make more ambitious modifications?

Say we want to inject a lot of extra code and data into the game. The original game’s heap occupies the address range 0x800dc000-0x801fc000, leaving only 16 KiB for the stack at the end of the RAM. Delinking it will work, but there’s only so much wiggle room left in the program’s memory layout before we collide with the stack.

One way to address it would be to leverage a PlayStation dev kit and its 8 MiB of RAM, so that we can stuff the heap there by telling the linker that the MemoryPool symbol is at address 0x80200000. This frees 1152 KiB for our modifications, but if our modifications requires a bunch of dynamic allocations then we’re still limited by the original memory allocator’s heap size (and whatever the game left us to play with).

Instead, let’s do something radical and rip it out entirely. We can do so by writing a reimplementation where we can control the heap size, here set to 2 MiB:

#include "valloc.h"

#define NULL (0)

void *memcpy(void *dst, const void *src, unsigned long len);
void *memset(void *dst, int c, unsigned long len);
int sprintf(char *buf, const char *fmt, ...);

#include "adt.h"

extern unsigned char MemoryPool[];
const int MemoryPoolSize = 0x200000;

#define IS_FREE(node) ((node->size & 0x80000000L) == 0)
#define IS_USED(node) ((node->size & 0x80000000L) != 0)

#define MARK_FREE(node) (node->size &= 0x7fffffffL)
#define MARK_USED(node) (node->size |= 0x80000000L)

#define TO_NODE(ptr) (((struct VMhead *)ptr) - 1)
#define FROM_NODE(node) ((void *)(node + 1))

#define GET_NODE_SIZE(node) (node->size & 0x7fffffffL)
#define SET_NODE_SIZE(node, newsize) (node->size = (node->size & 0x80000000L) | newsize)

struct VMhead *virtual_memory_pool;

static void merge_free_nodes(struct VMhead *node) {
    while (node->next != NULL && IS_FREE(node->next)) {
        node->size += GET_NODE_SIZE(node->next) + sizeof(struct VMhead);
        node->next = node->next->next;
    }
}

static void split_node(struct VMhead *node, int offset) {
    if (GET_NODE_SIZE(node) > (offset + sizeof(struct VMhead))) {
        struct VMhead *rest = node->next;

        struct VMhead *newNode = (struct VMhead *)(FROM_NODE(node) + offset);
        newNode->size = GET_NODE_SIZE(node) - offset - sizeof(struct VMhead);
        newNode->next = rest;

        SET_NODE_SIZE(node, offset);
        node->next = newNode;

        merge_free_nodes(node->next);
    }
}

void vinit(void *adr, unsigned long size) {
    if (adr == NULL) {
        adr = &MemoryPool;
    }

    if (size == 0) {
        size = MemoryPoolSize;
    }

    virtual_memory_pool = adr;
    virtual_memory_pool->size = size - sizeof(struct VMhead);
    virtual_memory_pool->next = NULL;
}

unsigned long vgetmaxsize(void) {
    struct VMhead *node = virtual_memory_pool;
    unsigned long maxSize = 0;

    while (node != NULL) {
        if (IS_FREE(node) && (maxSize < node->size)) {
            maxSize = node->size;
        }

        node = node->next;
    }

    return maxSize;
}

unsigned long vgetfreesize(void) {
    struct VMhead *node = virtual_memory_pool;
    unsigned long freeSize = 0;

    while (node != NULL) {
        if (IS_FREE(node)) {
            freeSize += node->size;
        }

        node = node->next;
    }

    return freeSize;
}

unsigned long vsize(void *ptr) {
    return TO_NODE(ptr)->size & 0x7fffffffL;
}

void *valloc(unsigned long size) {
    if (virtual_memory_pool == NULL) {
        vinit(NULL, 0);
    }

    size = (size + 3) & -4;

    struct VMhead *node = virtual_memory_pool;
    while (node != NULL) {
        if (IS_FREE(node) && node->size >= size) {
            split_node(node, size);
            MARK_USED(node);

            return FROM_NODE(node);
        }

        node = node->next;
    }

    char buffer[64];
    int maxSize = vgetmaxsize();
    int freeSize = vgetfreesize();
    sprintf(buffer, "OUT OF MEMORY\nREQUEST=%d\nFREE=%d(%d)\n", (int)size, maxSize, freeSize);
    SystemOut(buffer);
}

void *vcalloc(unsigned long size, char c) {
    void *ptr = valloc(size);
    memset(ptr, c, size);
    return ptr;
}

void *vrealloc(void *ptr, unsigned int size) {
    // FIXME: optimize realloc()
    if (size == 0) {
        vfree(ptr);
        return NULL;
    }

    void *newptr = valloc(size);
    if (ptr != NULL) {
        int oldsize = vsize(ptr);
        memcpy(newptr, ptr, oldsize < size ? oldsize : size);
        vfree(ptr);
    }

    return newptr;
}

void *vmemoryGC(void *ptr) {
    // FIXME: implement vmemoryGC()
    return ptr;
}

void vfree(void *ptr) {
    if (ptr != NULL) {
        struct VMhead *node = TO_NODE(ptr);
        if (IS_FREE(node)) {
            SystemOut("DOUBLE MEMORY FREE");
        }

        MARK_FREE(node);

        merge_free_nodes(node);
    }
}

__attribute((noreturn)) void SystemOut(const char *string) {
    AdtMessageBox("*** SYSTEM OUT ***\n\n%s", string);
    while (1);
}

This crappy, singly-linked list of memory blocks that wouldn’t look out of place in the 1980s follows the design of the game’s heap allocator implementation, but we could have retrofitted a more modern one, with advanced data structures and better fragmentation characteristics if we wanted.

With our fancy Makefile, if we put this code inside src/valloc.c and the original delinked code inside obj/valloc.o then the source code will be picked over the object file automatically.

There’s no video this time because there are no user-visible changes here, but we’ve just swapped out an entire source file from the original program (that we do not have the source code for by the way) with another, ran make to invoke a couple of standard toolchain programs and it spat out a modified executable that runs.

If this was happening inside a movie, I’d be rolling my eyes at that Hollywood hacking scene.

Conclusion

We’ve leveraged the delinking technique to produce modified versions of an executable, by slicing it into object files and then stuffing some additional source code written by us, before relinking it as a whole again. This proof-of-concept is an interesting alternative to the typical kind of binary patching, where the reverse-engineer needs to contort its modifications within the existing layout of a program.

While the entry ticket is higher because it requires a curated Ghidra database for my delinking tooling to produce usable object files, it allows patching of programs at a far more intrusive scale than traditional techniques can realistically achieve. Constraints from the original program no longer apply when it is dismantled section by section and rebuilt from the ground up.