Previously in this series of articles, we’ve uncovered how the game switches between its multiple executables. It’s time to start tearing the game’s code apart with the power of delinking.

As with part 5 on my series of articles about the Atari Jaguar SDK, this is the part where I start performing dark magic.

This is your last chance to look away before I start poking Cthulhu in the rear.

Hey kids, you want some heap?

Fortunately for me, Tenchu: Stealth Assassins is far from being an optimized game. The game contains a huge amount of debugging strings, some for diagnostics and others for the debug menu.

Out of the 750 or so candidate strings that Ghidra finds inside MAIN.EXE using Search > For Strings..., these two are very noteworthy:

  • "OUT OF MEMORY\nREQUEST=%d\nFREE=%d(%d)\n"
  • "DOUBLE MEMORY RELEASE"

The first string in particular is a printf-style format string, helpfully preparing a report during a memory allocation failure including the requested size, largest free block and total free bytes. It is then passed as a parameter to a panic function that never returns, as the game can’t recover from this.

Unfortunately, said panic function is nothing but an infinite no-op loop, which the player would interpret as a hang. It’s a fairly safe bet that during development another function would’ve been used to actually print out the error or otherwise help with debugging.

Unmasking the heap dealer

After pulling the thread from these two strings, we uncover a dynamic memory heap allocator. It uses a single 288 KiB memory chunk located at 0x800dc000, well outside the sections of this executable.

This chunk overlaps the address range for RUN.EXE as described in part 4, reinforcing the notion that lifetimes is as important as layout in memory maps.

Memory blocks are managed through a singly-linked list, whose nodes hold the size of the block (with bit 31 set if the block is allocated) and a pointer to the next node. Allocations are served on a first-find basis and it panics if an allocation request can’t be satisfied.

The allocator offers the traditional malloc(), realloc() and free() primitives, as well as basic usage metrics and the ability to get the size of a block, used twice outside of the allocator itself. This means that the standard C memory allocation functions aren’t quite a drop-in replacement due to these extra features.

In-game, run-time memory usage metrics can be viewed by holding the Select button after turning on the debug print in the debug menu.

A critique of unopinionated memory allocation policy

To add some perspective, right after starting the first level with layout A the heap has around 128 KiB of free memory, or around 45% of the total heap. Then, spawning a cat through the debug menu incurs a CD hit (presumably to load the model) and reduces free memory by 10 KiB ; spawning additional cats cost about 2 KiB each.

Overall, this is a very basic, general-purpose heap allocator and quite a lot of functions inside MAIN.EXE call it. These functions include dealing with level data, characters, loading files whole from the CD and so on. It works fine within the context of the game, but it clearly shows that few optimization efforts have been put in that area.

Memory is tight on the PlayStation and more opinionated memory allocations can help both memory usage and memory fragmentation in the soft real-time system that is a video game. Some of the techniques that could be applied are:

  • Taking advantage of lifetime properties (per-level, per-frame…) ;
  • Bump allocators ;
  • Object pools ;

The avid reader can check out this blog post for a practical example with a heavily-refactored WipeOut.

On the plus side, the allocator gets reinitialized whenever the game swaps to another executable, so memory leaks and memory fragmentation wouldn’t linger around across levels. Also, not all memory is dynamically allocated (there are plenty of static variables), but further analysis would require drilling down specific subsystems.

A memory allocator that fell off a truck

Enough dilly-dallying, let’s do some delinking with my Ghidra extension. We can determine that the memory allocator’s code and data span the following address ranges:

Section Address range
.rdata 0x80011024 - 0x80011063
.text 0x8001656c - 0x80016e8b
.sdata 0x800976a0 - 0x800976a3
RAM 0x800dc000 - 0x80123fff

After exporting this selection of addresses as an ELF object file, we end up with this:

$ readelf --wide --file-header memory.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           MIPS R3000
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          52 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         10
  Section header string table index: 9
$ readelf --wide --section-headers --string-dump=.comment memory.o
There are 10 section headers, starting at offset 0x34:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          00000000 0001c4 000185 00      0   0  1
  [ 2] .symtab           SYMTAB          00000000 00034c 0001c0 10      1  13  4
  [ 3] .rdata            PROGBITS        00000000 000510 000040 00   A  0   0 16
  [ 4] .text             PROGBITS        00000000 000550 000920 00  AX  0   0 16
  [ 5] .sdata            PROGBITS        00000000 000e70 000004 00  WA  0   0 16
  [ 6] RAM               NOBITS          00000000 000000 048000 00 WAX  0   0 16
  [ 7] .rel.text         REL             00000000 000e74 000188 08   I  2   4  4
  [ 8] .comment          PROGBITS        00000000 000ffc 000024 01  MS  0   0  1
  [ 9] .shstrtab         STRTAB          00000000 001020 000046 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)

String dump of section '.comment':
  [     0]  ghidra-delinker-extension v0.3.0


$ readelf --wide --symbols memory.o

Symbol table '.symtab' contains 28 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS memory.o
     2: 00000000     0 SECTION LOCAL  DEFAULT    3 
     3: 00000000     0 SECTION LOCAL  DEFAULT    4 
     4: 00000000     0 SECTION LOCAL  DEFAULT    5 
     5: 00000000     0 SECTION LOCAL  DEFAULT    6 
     6: 00000000    38 OBJECT  LOCAL  DEFAULT    3 s_OUT_OF_MEMORY_REQUEST=%d_FREE=%d_80011024
     7: 00000028    22 OBJECT  LOCAL  DEFAULT    3 s_DOUBLE_MEMORY_RELEASE_8001104c
     8: 00000100     4 OBJECT  LOCAL  DEFAULT    4 LAB_8001666c
     9: 000003b4     4 OBJECT  LOCAL  DEFAULT    4 LAB_80016920
    10: 000003b8     4 OBJECT  LOCAL  DEFAULT    4 LAB_80016924
    11: 000006b4     4 OBJECT  LOCAL  DEFAULT    4 LAB_80016c20
    12: 0000070c     4 OBJECT  LOCAL  DEFAULT    4 LAB_80016c78
    13: 00000000   460 FUNC    GLOBAL DEFAULT    4 MemoryAllocate
    14: 000001cc   516 FUNC    GLOBAL DEFAULT    4 MemoryResize
    15: 000003d0   780 FUNC    GLOBAL DEFAULT    4 FUN_8001693c
    16: 000006dc    84 FUNC    GLOBAL DEFAULT    4 MemoryInitialize
    17: 00000730    76 FUNC    GLOBAL DEFAULT    4 MemoryGetLargestAvailableBlockBytes
    18: 0000077c    68 FUNC    GLOBAL DEFAULT    4 MemoryGetAvailableBytes
    19: 000007c0    80 FUNC    GLOBAL DEFAULT    4 MemoryAllocateCleared
    20: 00000810   260 FUNC    GLOBAL DEFAULT    4 MemoryFree
    21: 00000914    12 FUNC    GLOBAL DEFAULT    4 MemoryGetBlockSize
    22: 00000000     4 OBJECT  GLOBAL DEFAULT    5 s_memoryBlockHead
    23: 00000000 0x48000 OBJECT  GLOBAL DEFAULT    6 s_memoryBlockDefault
    24: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND Panic
    25: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND memcpy
    26: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND memset
    27: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND sprintf
$ readelf --wide --relocs memory.o

Relocation section '.rel.text' at offset 0xe74 contains 49 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00000004  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
0000001c  00001705 R_MIPS_HI16            00000000   s_memoryBlockDefault
00000020  00001706 R_MIPS_LO16            00000000   s_memoryBlockDefault
0000002c  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
00000058  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
000000a8  00000804 R_MIPS_26              00000100   LAB_8001666c
000000e8  00000804 R_MIPS_26              00000100   LAB_8001666c
00000108  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
00000150  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
0000019c  00000605 R_MIPS_HI16            00000000   s_OUT_OF_MEMORY_REQUEST=%d_FREE=%d_80011024
000001a0  00000606 R_MIPS_LO16            00000000   s_OUT_OF_MEMORY_REQUEST=%d_FREE=%d_80011024
000001a4  00001b04 R_MIPS_26              00000000   sprintf
000001ac  00001804 R_MIPS_26              00000000   Panic
000001f0  00000d04 R_MIPS_26              00000000   MemoryAllocate
000001f8  00000a04 R_MIPS_26              000003b8   LAB_80016924
000002c0  00000a04 R_MIPS_26              000003b8   LAB_80016924
0000032c  00000904 R_MIPS_26              000003b4   LAB_80016920
0000036c  00000a04 R_MIPS_26              000003b8   LAB_80016924
00000374  00000d04 R_MIPS_26              00000000   MemoryAllocate
00000384  00001504 R_MIPS_26              00000914   MemoryGetBlockSize
000003a4  00001904 R_MIPS_26              00000000   memcpy
000003ac  00001404 R_MIPS_26              00000810   MemoryFree
0000040c  00000d04 R_MIPS_26              00000000   MemoryAllocate
0000042c  00001904 R_MIPS_26              00000000   memcpy
0000044c  00000705 R_MIPS_HI16            00000028   s_DOUBLE_MEMORY_RELEASE_8001104c
00000450  00001804 R_MIPS_26              00000000   Panic
00000454  00000706 R_MIPS_LO16            00000028   s_DOUBLE_MEMORY_RELEASE_8001104c
000004a4  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
00000504  00000b04 R_MIPS_26              000006b4   LAB_80016c20
00000524  00000705 R_MIPS_HI16            00000028   s_DOUBLE_MEMORY_RELEASE_8001104c
00000528  00001804 R_MIPS_26              00000000   Panic
0000052c  00000706 R_MIPS_LO16            00000028   s_DOUBLE_MEMORY_RELEASE_8001104c
0000057c  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
000005dc  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
00000650  00001904 R_MIPS_26              00000000   memcpy
000006dc  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
000006e8  00001705 R_MIPS_HI16            00000000   s_memoryBlockDefault
000006ec  00001706 R_MIPS_LO16            00000000   s_memoryBlockDefault
000006f0  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
000006fc  00000c04 R_MIPS_26              0000070c   LAB_80016c78
00000710  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
00000730  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
0000077c  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead
000007d8  00000d04 R_MIPS_26              00000000   MemoryAllocate
000007ec  00001a04 R_MIPS_26              00000000   memset
00000840  00000705 R_MIPS_HI16            00000028   s_DOUBLE_MEMORY_RELEASE_8001104c
00000844  00001804 R_MIPS_26              00000000   Panic
00000848  00000706 R_MIPS_LO16            00000028   s_DOUBLE_MEMORY_RELEASE_8001104c
0000089c  00001607 R_MIPS_GPREL16         00000000   s_memoryBlockHead

The memory allocator has very few dependencies with the rest of the game code and there are only a couple of undefined symbols (Panic, memcpy, memset, sprintf) in this object file.

This is also PlayStation code that was originally compiled and linked with a COFF toolchain being disemboweled and squeezed into an ELF object file. Don’t overthink it, it’s one of the lesser sins I’m about to commit here.

Since it’s fairly self-contained, we can try and make use of this object file. C compilers require symbols to be declared in order to use them, but they can’t use object files directly: compilers process source code, linkers process object files. Sharing declarations across source code files is traditionally done with header files and we can write one for memory.o by hand:

#pragma once

void* MemoryAllocate(int length);
void* MemoryResize(void* ptr, int length);
void MemoryInitialize(void* blockAddress, int blockLength);
int MemoryGetLargestAvailableBlockBytes();
int MemoryGetAvailableBytes();
void* MemoryAllocateCleared(int length, char pattern);
void MemoryFree(void* ptr);
int MemoryGetBlockSize(void *ptr);

With this we can use the heap allocator exported as an object file in our own code, despite the fact that we do not have the source code for it. Heck, we didn’t even decompile it first!

Necrobotics for the software engineer

We have an object file, but we don’t know if it actually works. Let’s put it to the test with a simple test program:

#include <stdio.h>

#include "memory.h"

void dumpstats() {
    int freeBytes = MemoryGetAvailableBytes();
    int largestFreeBlockBytes = MemoryGetLargestAvailableBlockBytes();

    printf("Free bytes: %7d, largest free block: %7d\n", freeBytes, largestFreeBlockBytes);
}

void* allocate_block(int length) {
    void* block = MemoryAllocate(length);

    printf("Allocating %d bytes, got address %p\n", length, block);
    dumpstats();

    return block;
}

void free_block(void *block) {
    int length = MemoryGetBlockSize(block);

    MemoryFree(block);
    printf("Freeing %p (%d bytes)\n", block);
    dumpstats();
}

int main() {
    dumpstats();

    void* block1 = allocate_block(1000);
    void* block2 = allocate_block(1000);
    free_block(block1);

    return 0;
}

Yes, I’m about to shove bits of code ripped from a PlayStation game into a Linux executable. Trust me, I’m an engineer!

Since memory.o is a MIPS 32-bit little-endian object file, we need the right toolchain and QEMU’s user-mode emulator in order to build and run this test program:

$ sudo apt-get install \
    binutils-mipsel-linux-gnu \
    gcc-mipsel-linux-gnu \
    qemu-user

The ABIs are slightly mismatched, but with enough flags the toolchain can be persuaded to paper over the cracks:

$ mipsel-linux-gnu-gcc -static -fno-pic -no-pie -o test-memory.elf test-memory.c memory.o panic.o
/usr/lib/gcc-cross/mipsel-linux-gnu/10/../../../../mipsel-linux-gnu/bin/ld: memory.o: warning: linking abicalls files with non-abicalls files
/usr/lib/gcc-cross/mipsel-linux-gnu/10/../../../../mipsel-linux-gnu/bin/ld: panic.o: warning: linking abicalls files with non-abicalls files

We have our test executable, let’s give it a try:

$ qemu-mipsel ./test-memory.elf
Free bytes:       0, largest free block:       0
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

Hmm… It crashes during the first allocation.

Why did it have to be casts?

After debugging the test program, the issue can be traced back to these two instructions from MAIN.EXE:

        80016588 0d 80 03 3c     lui        v1,0x800d
        8001658c 00 c0 63 34     ori        v1,v1,0xc000

This sequence of instructions puts the value 0x800dc000 (the address of the start of the heap) into the register v1. This is common on MIPS since this architecture lacks the ability to load an arbitrary 32-bit integer constant in one instruction. There’s even a relocation pair pattern (R_MIPS_HI16 and R_MIPS_LO16) to fix up symbol addresses laid out like this.

The problem here is the second instruction (ori), whose 16-bit immediate is zero-extended. The R_MIPS_LO16 relocation type expects a sign-extended 16-bit immediate, used in instructions such as lw, sw, addiu

Somebody back in 1997 appears to have casted a raw integer constant into a pointer and called it a day.

Since this code is no longer running on a PlayStation but on Linux, that pointer is no longer valid and we end up with a segmentation fault when trying to use it.

Putting a patch on the cast

We can’t express this pattern as a set of System V ABI relocations. We can’t let this pointer alone by mapping some memory there either, since it’s outside the range of userspace addresses for the Linux kernel (from 0x00000000 to 0x7fffffff).

Sigh.

This leaves us with only one option, binary patching:

        80016588 0e 80 03 3c     lui        v1,0x800e
        8001658c 00 c0 63 24     addiu      v1,v1,-0x4000

We change the ori instruction into addiu and we add one to the lui constant, to account for the carry from the negative addiu constant. It is functionally identical to the original code, but now we can add a reference to address 0x800dc000 inside the addiu instruction and my relocation synthesizer analyzer will be able to create the relocations needed here.

After going through the object file exportation ritual again, we end up with this:

$ qemu-mipsel ./test-memory.elf
Free bytes:       0, largest free block:       0
Allocating 1000 bytes, got address 0x496f28
Free bytes: 1178632, largest free block: 1178632
Allocating 1000 bytes, got address 0x497318
Free bytes: 1177624, largest free block: 1177624
Freeing 0x496f28 (0 bytes)
Free bytes: 1178624, largest free block: 1177624

This is the edge I’ve been talking about back in the introduction. The delinking technique coupled with my own tooling allows me to turn executables into what are effectively LEGO® bricks.

The files for this part can be found here: tenchu1.tar.gz

Conclusion

We have identified a heap allocator within the MAIN.EXE executable of Rittai Ninja Katsugeki Tenchu: Shinobi Gaisen and extracted it as an object file using the delinking technique. We’ve also got it to run on Linux with a test program, despite the fact that this is straight up PlayStation code ripped out of a PS-EXE executable.