Decompiling Tenchu: Stealth Assassins part 5: memory mismanagement
Previously in this series of articles, we’ve uncovered how the game switches between its multiple executables. It’s time to start tearing the game’s code apart with the power of delinking.
As with part 5 on my series of articles about the Atari Jaguar SDK, this is the part where I start performing dark magic.
This is your last chance to look away before I start poking Cthulhu in the rear.
Hey kids, you want some heap?
Fortunately for me, Tenchu: Stealth Assassins is far from being an optimized game. The game contains a huge amount of debugging strings, some for diagnostics and others for the debug menu.
Out of the 750 or so candidate strings that Ghidra finds inside MAIN.EXE
using Search > For Strings...
, these two are very noteworthy:
"OUT OF MEMORY\nREQUEST=%d\nFREE=%d(%d)\n"
"DOUBLE MEMORY RELEASE"
The first string in particular is a printf-style format string, helpfully preparing a report during a memory allocation failure including the requested size, largest free block and total free bytes. It is then passed as a parameter to a panic function that never returns, as the game can’t recover from this.
Unfortunately, said panic function is nothing but an infinite no-op loop, which the player would interpret as a hang. It’s a fairly safe bet that during development another function would’ve been used to actually print out the error or otherwise help with debugging.
Unmasking the heap dealer
After pulling the thread from these two strings, we uncover a dynamic memory heap allocator.
It uses a single 288 KiB memory chunk located at 0x800dc000
, well outside the sections of this executable.
This chunk overlaps the address range for RUN.EXE
as described in part 4, reinforcing the notion that lifetimes is as important as layout in memory maps.
Memory blocks are managed through a singly-linked list, whose nodes hold the size of the block (with bit 31 set if the block is allocated) and a pointer to the next node. Allocations are served on a first-find basis and it panics if an allocation request can’t be satisfied.
The allocator offers the traditional malloc()
, realloc()
and free()
primitives, as well as basic usage metrics and the ability to get the size of a block, used twice outside of the allocator itself.
This means that the standard C memory allocation functions aren’t quite a drop-in replacement due to these extra features.
In-game, run-time memory usage metrics can be viewed by holding the Select
button after turning on the debug print in the debug menu.
A critique of unopinionated memory allocation policy
To add some perspective, right after starting the first level with layout A the heap has around 128 KiB of free memory, or around 45% of the total heap. Then, spawning a cat through the debug menu incurs a CD hit (presumably to load the model) and reduces free memory by 10 KiB ; spawning additional cats cost about 2 KiB each.
Overall, this is a very basic, general-purpose heap allocator and quite a lot of functions inside MAIN.EXE
call it.
These functions include dealing with level data, characters, loading files whole from the CD and so on.
It works fine within the context of the game, but it clearly shows that few optimization efforts have been put in that area.
Memory is tight on the PlayStation and more opinionated memory allocations can help both memory usage and memory fragmentation in the soft real-time system that is a video game. Some of the techniques that could be applied are:
- Taking advantage of lifetime properties (per-level, per-frame…) ;
- Bump allocators ;
- Object pools ;
- …
The avid reader can check out this blog post for a practical example with a heavily-refactored WipeOut.
On the plus side, the allocator gets reinitialized whenever the game swaps to another executable, so memory leaks and memory fragmentation wouldn’t linger around across levels. Also, not all memory is dynamically allocated (there are plenty of static variables), but further analysis would require drilling down specific subsystems.
A memory allocator that fell off a truck
Enough dilly-dallying, let’s do some delinking with my Ghidra extension. We can determine that the memory allocator’s code and data span the following address ranges:
Section | Address range |
---|---|
.rdata |
0x80011024 - 0x80011063 |
.text |
0x8001656c - 0x80016e8b |
.sdata |
0x800976a0 - 0x800976a3 |
RAM |
0x800dc000 - 0x80123fff |
After exporting this selection of addresses as an ELF object file, we end up with this:
$ readelf --wide --file-header memory.o
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: MIPS R3000
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 52 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 10
Section header string table index: 9
$ readelf --wide --section-headers --string-dump=.comment memory.o
There are 10 section headers, starting at offset 0x34:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 00000000 0001c4 000185 00 0 0 1
[ 2] .symtab SYMTAB 00000000 00034c 0001c0 10 1 13 4
[ 3] .rdata PROGBITS 00000000 000510 000040 00 A 0 0 16
[ 4] .text PROGBITS 00000000 000550 000920 00 AX 0 0 16
[ 5] .sdata PROGBITS 00000000 000e70 000004 00 WA 0 0 16
[ 6] RAM NOBITS 00000000 000000 048000 00 WAX 0 0 16
[ 7] .rel.text REL 00000000 000e74 000188 08 I 2 4 4
[ 8] .comment PROGBITS 00000000 000ffc 000024 01 MS 0 0 1
[ 9] .shstrtab STRTAB 00000000 001020 000046 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
String dump of section '.comment':
[ 0] ghidra-delinker-extension v0.3.0
$ readelf --wide --symbols memory.o
Symbol table '.symtab' contains 28 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS memory.o
2: 00000000 0 SECTION LOCAL DEFAULT 3
3: 00000000 0 SECTION LOCAL DEFAULT 4
4: 00000000 0 SECTION LOCAL DEFAULT 5
5: 00000000 0 SECTION LOCAL DEFAULT 6
6: 00000000 38 OBJECT LOCAL DEFAULT 3 s_OUT_OF_MEMORY_REQUEST=%d_FREE=%d_80011024
7: 00000028 22 OBJECT LOCAL DEFAULT 3 s_DOUBLE_MEMORY_RELEASE_8001104c
8: 00000100 4 OBJECT LOCAL DEFAULT 4 LAB_8001666c
9: 000003b4 4 OBJECT LOCAL DEFAULT 4 LAB_80016920
10: 000003b8 4 OBJECT LOCAL DEFAULT 4 LAB_80016924
11: 000006b4 4 OBJECT LOCAL DEFAULT 4 LAB_80016c20
12: 0000070c 4 OBJECT LOCAL DEFAULT 4 LAB_80016c78
13: 00000000 460 FUNC GLOBAL DEFAULT 4 MemoryAllocate
14: 000001cc 516 FUNC GLOBAL DEFAULT 4 MemoryResize
15: 000003d0 780 FUNC GLOBAL DEFAULT 4 FUN_8001693c
16: 000006dc 84 FUNC GLOBAL DEFAULT 4 MemoryInitialize
17: 00000730 76 FUNC GLOBAL DEFAULT 4 MemoryGetLargestAvailableBlockBytes
18: 0000077c 68 FUNC GLOBAL DEFAULT 4 MemoryGetAvailableBytes
19: 000007c0 80 FUNC GLOBAL DEFAULT 4 MemoryAllocateCleared
20: 00000810 260 FUNC GLOBAL DEFAULT 4 MemoryFree
21: 00000914 12 FUNC GLOBAL DEFAULT 4 MemoryGetBlockSize
22: 00000000 4 OBJECT GLOBAL DEFAULT 5 s_memoryBlockHead
23: 00000000 0x48000 OBJECT GLOBAL DEFAULT 6 s_memoryBlockDefault
24: 00000000 0 NOTYPE GLOBAL DEFAULT UND Panic
25: 00000000 0 NOTYPE GLOBAL DEFAULT UND memcpy
26: 00000000 0 NOTYPE GLOBAL DEFAULT UND memset
27: 00000000 0 NOTYPE GLOBAL DEFAULT UND sprintf
$ readelf --wide --relocs memory.o
Relocation section '.rel.text' at offset 0xe74 contains 49 entries:
Offset Info Type Sym. Value Symbol's Name
00000004 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
0000001c 00001705 R_MIPS_HI16 00000000 s_memoryBlockDefault
00000020 00001706 R_MIPS_LO16 00000000 s_memoryBlockDefault
0000002c 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
00000058 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
000000a8 00000804 R_MIPS_26 00000100 LAB_8001666c
000000e8 00000804 R_MIPS_26 00000100 LAB_8001666c
00000108 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
00000150 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
0000019c 00000605 R_MIPS_HI16 00000000 s_OUT_OF_MEMORY_REQUEST=%d_FREE=%d_80011024
000001a0 00000606 R_MIPS_LO16 00000000 s_OUT_OF_MEMORY_REQUEST=%d_FREE=%d_80011024
000001a4 00001b04 R_MIPS_26 00000000 sprintf
000001ac 00001804 R_MIPS_26 00000000 Panic
000001f0 00000d04 R_MIPS_26 00000000 MemoryAllocate
000001f8 00000a04 R_MIPS_26 000003b8 LAB_80016924
000002c0 00000a04 R_MIPS_26 000003b8 LAB_80016924
0000032c 00000904 R_MIPS_26 000003b4 LAB_80016920
0000036c 00000a04 R_MIPS_26 000003b8 LAB_80016924
00000374 00000d04 R_MIPS_26 00000000 MemoryAllocate
00000384 00001504 R_MIPS_26 00000914 MemoryGetBlockSize
000003a4 00001904 R_MIPS_26 00000000 memcpy
000003ac 00001404 R_MIPS_26 00000810 MemoryFree
0000040c 00000d04 R_MIPS_26 00000000 MemoryAllocate
0000042c 00001904 R_MIPS_26 00000000 memcpy
0000044c 00000705 R_MIPS_HI16 00000028 s_DOUBLE_MEMORY_RELEASE_8001104c
00000450 00001804 R_MIPS_26 00000000 Panic
00000454 00000706 R_MIPS_LO16 00000028 s_DOUBLE_MEMORY_RELEASE_8001104c
000004a4 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
00000504 00000b04 R_MIPS_26 000006b4 LAB_80016c20
00000524 00000705 R_MIPS_HI16 00000028 s_DOUBLE_MEMORY_RELEASE_8001104c
00000528 00001804 R_MIPS_26 00000000 Panic
0000052c 00000706 R_MIPS_LO16 00000028 s_DOUBLE_MEMORY_RELEASE_8001104c
0000057c 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
000005dc 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
00000650 00001904 R_MIPS_26 00000000 memcpy
000006dc 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
000006e8 00001705 R_MIPS_HI16 00000000 s_memoryBlockDefault
000006ec 00001706 R_MIPS_LO16 00000000 s_memoryBlockDefault
000006f0 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
000006fc 00000c04 R_MIPS_26 0000070c LAB_80016c78
00000710 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
00000730 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
0000077c 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
000007d8 00000d04 R_MIPS_26 00000000 MemoryAllocate
000007ec 00001a04 R_MIPS_26 00000000 memset
00000840 00000705 R_MIPS_HI16 00000028 s_DOUBLE_MEMORY_RELEASE_8001104c
00000844 00001804 R_MIPS_26 00000000 Panic
00000848 00000706 R_MIPS_LO16 00000028 s_DOUBLE_MEMORY_RELEASE_8001104c
0000089c 00001607 R_MIPS_GPREL16 00000000 s_memoryBlockHead
The memory allocator has very few dependencies with the rest of the game code and there are only a couple of undefined symbols (Panic
, memcpy
, memset
, sprintf
) in this object file.
This is also PlayStation code that was originally compiled and linked with a COFF toolchain being disemboweled and squeezed into an ELF object file. Don’t overthink it, it’s one of the lesser sins I’m about to commit here.
Since it’s fairly self-contained, we can try and make use of this object file.
C compilers require symbols to be declared in order to use them, but they can’t use object files directly: compilers process source code, linkers process object files.
Sharing declarations across source code files is traditionally done with header files and we can write one for memory.o
by hand:
#pragma once
void* MemoryAllocate(int length);
void* MemoryResize(void* ptr, int length);
void MemoryInitialize(void* blockAddress, int blockLength);
int MemoryGetLargestAvailableBlockBytes();
int MemoryGetAvailableBytes();
void* MemoryAllocateCleared(int length, char pattern);
void MemoryFree(void* ptr);
int MemoryGetBlockSize(void *ptr);
With this we can use the heap allocator exported as an object file in our own code, despite the fact that we do not have the source code for it. Heck, we didn’t even decompile it first!
Necrobotics for the software engineer
We have an object file, but we don’t know if it actually works. Let’s put it to the test with a simple test program:
#include <stdio.h>
#include "memory.h"
void dumpstats() {
int freeBytes = MemoryGetAvailableBytes();
int largestFreeBlockBytes = MemoryGetLargestAvailableBlockBytes();
printf("Free bytes: %7d, largest free block: %7d\n", freeBytes, largestFreeBlockBytes);
}
void* allocate_block(int length) {
void* block = MemoryAllocate(length);
printf("Allocating %d bytes, got address %p\n", length, block);
dumpstats();
return block;
}
void free_block(void *block) {
int length = MemoryGetBlockSize(block);
MemoryFree(block);
printf("Freeing %p (%d bytes)\n", block);
dumpstats();
}
int main() {
dumpstats();
void* block1 = allocate_block(1000);
void* block2 = allocate_block(1000);
free_block(block1);
return 0;
}
Yes, I’m about to shove bits of code ripped from a PlayStation game into a Linux executable. Trust me, I’m an engineer!
Since memory.o
is a MIPS 32-bit little-endian object file, we need the right toolchain and QEMU’s user-mode emulator in order to build and run this test program:
$ sudo apt-get install \
binutils-mipsel-linux-gnu \
gcc-mipsel-linux-gnu \
qemu-user
The ABIs are slightly mismatched, but with enough flags the toolchain can be persuaded to paper over the cracks:
$ mipsel-linux-gnu-gcc -static -fno-pic -no-pie -o test-memory.elf test-memory.c memory.o panic.o
/usr/lib/gcc-cross/mipsel-linux-gnu/10/../../../../mipsel-linux-gnu/bin/ld: memory.o: warning: linking abicalls files with non-abicalls files
/usr/lib/gcc-cross/mipsel-linux-gnu/10/../../../../mipsel-linux-gnu/bin/ld: panic.o: warning: linking abicalls files with non-abicalls files
We have our test executable, let’s give it a try:
$ qemu-mipsel ./test-memory.elf
Free bytes: 0, largest free block: 0
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)
Hmm… It crashes during the first allocation.
Why did it have to be casts?
After debugging the test program, the issue can be traced back to these two instructions from MAIN.EXE
:
80016588 0d 80 03 3c lui v1,0x800d
8001658c 00 c0 63 34 ori v1,v1,0xc000
This sequence of instructions puts the value 0x800dc000
(the address of the start of the heap) into the register v1
.
This is common on MIPS since this architecture lacks the ability to load an arbitrary 32-bit integer constant in one instruction.
There’s even a relocation pair pattern (R_MIPS_HI16
and R_MIPS_LO16
) to fix up symbol addresses laid out like this.
The problem here is the second instruction (ori
), whose 16-bit immediate is zero-extended.
The R_MIPS_LO16
relocation type expects a sign-extended 16-bit immediate, used in instructions such as lw
, sw
, addiu
…
Somebody back in 1997 appears to have casted a raw integer constant into a pointer and called it a day.
Since this code is no longer running on a PlayStation but on Linux, that pointer is no longer valid and we end up with a segmentation fault when trying to use it.
Putting a patch on the cast
We can’t express this pattern as a set of System V ABI relocations.
We can’t let this pointer alone by mapping some memory there either, since it’s outside the range of userspace addresses for the Linux kernel (from 0x00000000
to 0x7fffffff
).
Sigh.
This leaves us with only one option, binary patching:
80016588 0e 80 03 3c lui v1,0x800e
8001658c 00 c0 63 24 addiu v1,v1,-0x4000
We change the ori
instruction into addiu
and we add one to the lui
constant, to account for the carry from the negative addiu
constant.
It is functionally identical to the original code, but now we can add a reference to address 0x800dc000
inside the addiu
instruction and my relocation synthesizer analyzer will be able to create the relocations needed here.
After going through the object file exportation ritual again, we end up with this:
$ qemu-mipsel ./test-memory.elf
Free bytes: 0, largest free block: 0
Allocating 1000 bytes, got address 0x496f28
Free bytes: 1178632, largest free block: 1178632
Allocating 1000 bytes, got address 0x497318
Free bytes: 1177624, largest free block: 1177624
Freeing 0x496f28 (0 bytes)
Free bytes: 1178624, largest free block: 1177624
This is the edge I’ve been talking about back in the introduction. The delinking technique coupled with my own tooling allows me to turn executables into what are effectively LEGO® bricks.
The files for this part can be found here: tenchu1.tar.gz
Conclusion
We have identified a heap allocator within the MAIN.EXE
executable of Rittai Ninja Katsugeki Tenchu: Shinobi Gaisen and extracted it as an object file using the delinking technique.
We’ve also got it to run on Linux with a test program, despite the fact that this is straight up PlayStation code ripped out of a PS-EXE executable.