Previously in this series of articles, we’ve analyzed some leftovers hidden inside the AFS archive of the demo version of Tenchu. This time, we’ll continue with the main course: PSX.EXE and PSX.SYM, an early build of Tenchu’s main executable with its accompanying debugging information file.

An artifact lost in time

Found hidden inside the AFS archive of the demo version of Rittai Ninja Katsugeki Tenchu, PSX.EXE raises a number of questions, particularly about when it was created. For reference, the timeline looks like this:

  • Acquire Corp. was founded on December 6th, 1994 ;
  • The K:\WORK\CDIMAGE\CD.CCS file inside the demo’s AFS archive contains the date September 5th, 1997 ;
  • The demo release date for Rittai Ninja Katsugeki Tenchu was October 26th, 1997 ;
  • The original release date for Rittai Ninja Katsugeki Tenchu was February 26th, 1998.

Do not quote me on these dates; I did not rely on authoritative sources for them.

Looking at the artifact itself, PSX.EXE does contain strings for all eight levels plus the training level, as well as an SCCS string for bios.c dated March 28th, 1997. This means it was built at some point in the window of approximately half a year between that date and the demo release date. The build seems to fit the way the original JP releases were structured, with one big executable instead of the four separate executables found from the NA release onward.

From the NA release of Tenchu: Stealth Assassins onwards, a 4-byte binary-coded decimal date is stored at address 0x80010000 during initialization, but unfortunately that doesn’t apply to earlier versions.
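For readers unfamiliar with binary-coded decimal, here is a minimal sketch of how such a 4-byte date could be decoded. The byte layout (century, year, month, day, read as a single 0xYYYYMMDD value) is my assumption for illustration, not something verified against the game, and the sample value is simply the original release date from the timeline above rather than a value read from any build.

```python
import datetime
import struct

def bcd_to_int(byte):
    """Decode one packed BCD byte (two decimal digits) into an integer."""
    high, low = byte >> 4, byte & 0x0F
    if high > 9 or low > 9:
        raise ValueError("not a BCD byte: 0x%02x" % byte)
    return high * 10 + low

def decode_bcd_date(word):
    """Decode a 32-bit value laid out as 0xYYYYMMDD (assumed layout) into a date."""
    raw = struct.pack(">I", word)  # split into bytes, most significant first
    century, year, month, day = (bcd_to_int(b) for b in bytearray(raw))
    return datetime.date(century * 100 + year, month, day)

# Sample input only: this value was not read from any executable.
print(decode_bcd_date(0x19980226))
```

If the value is stored little-endian in memory, reading it as a 32-bit word on the PlayStation would already yield the 0xYYYYMMDD form used here; if the bytes are stored in another order, only the `struct.pack` line would need to change.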

It’s likely the last executable built before the development team switched to the name GAME.EXE, and it was simply left there, forgotten. Merely replacing either GAME.EXE or PAPX_900.29 in the demo version of Rittai Ninja Katsugeki Tenchu with PSX.EXE doesn’t seem to yield a working game. At any rate, the really juicy bit isn’t PSX.EXE itself, but PSX.SYM.

Challenges of long-obsolete proprietary file formats

PSX.SYM contains debugging information in a proprietary MND format, specific to the Psy-Q toolchain. There’s some reverse-engineered documentation, a decompilation of DUMPSYM.EXE and a few tools for manipulating SYM files.

The ghidra_psx_ldr extension I’ve been using for importing PS-EXE files used to support SYM files, but that feature was removed in 2020.

Anyway, since I have my own ideas about my workflow, I started writing a Java script (note the space) for Ghidra to process this file and get all that sweet metadata inside the program database, but it soon became apparent that things weren’t that simple…

No plan survives first contact with the enemy

Unfortunately, the addresses inside PSX.SYM do not match the contents of PSX.EXE. This means that this SYM file was for another, most likely earlier version of PSX.EXE, which we don’t have.

There are still tons of information in there, like file names, data types, function definitions and symbol names; we just can’t directly apply it to any artifact on hand. However, the order of symbols inside PSX.SYM does appear to match PSX.EXE, at least to some extent, so it can’t be that far off.

That’s way too valuable to pass up. Therefore, the new plan currently looks something like this:

  1. Somehow load PSX.SYM and all its information as a program into Ghidra ;
  2. Version track from PSX.SYM to PSX.EXE, to get that data onto an executable we have ;
  3. Version track again from PSX.EXE to Rittai Ninja Katsugeki Tenchu’s GAME.EXE.

After this, the valuable debugging data from PSX.SYM will have been transferred to an executable that we can actually play with, both literally and figuratively. From there, assuming there wasn’t too much transmission loss during all these migrations, I can use the annotated GAME.EXE from Rittai Ninja Katsugeki Tenchu as a new starting point for my reverse-engineering effort.

But how do we load the debugging symbols into Ghidra in the first place if we do not have a program to begin with?

Make-believe executables

Okay, so we have a SYM file but not its PS-EXE executable. That’s like having an egg’s shell and membrane but not its yolk and albumen. Ghidra being a hungry beast, it can’t feast on an egg (debugging symbols) devoid of its substance (executable)… but maybe if we pad the inside with tofu it won’t know any better?

Sketchy metaphors aside, without a program to load we can’t process debugging information inside Ghidra. However, if we can scaffold a placeholder program, one that matches the shape if not the contents of the missing executable, then we can load it and then lay that debugging information on top of this fake program.

Mapping out sections

For our placeholder executable, we’ll need a header, a section table and the sections’ bytes. We’ll pad the bytes with dummy data, but we still need the sections themselves. Fortunately, the SYM file contains symbols that mark the start, end and size of each section:

$ dumpsym ~/Games/'Rittai Ninja Katsugeki - Tenchu (Japan) (Demo)'/K:/WORK/CDIMAGE/PSX.SYM | grep -E '__[a-z]+(_org|_size)' | cut -f2,4 -d' ' | sort
$00000000 __ctors_size
$00000000 __dtors_size
$0000018e __sbss_size
$0000069c __sdata_size
$00003762 __rdata_size
$0000ca22 __data_size
$000777b0 __text_size
$0015aad8 __bss_size
$80010100 __ctors_org
$80010100 __ctors_orgend
$80010100 __dtors_org
$80010100 __dtors_orgend
$80010100 __rdata_org
$80013862 __rdata_orgend
$8001387c __text_org
$8008b02c __data_org
$8008b02c __text_orgend
$80097a4e __data_orgend
$80097a50 __sdata_org
$800980ec __sdata_orgend
$80098118 __sbss_org
$800982a6 __sbss_orgend
$800982c8 __bss_org
$801f2da0 __bss_orgend

With this, we can recreate the memory map for the sections of our placeholder:

Section  Start address  End address  Size      Padded size
.rdata   0x80010100     0x80013862   0x3762    0x377c
.text    0x8001387c     0x8008b02c   0x777b0   0x777b0
.data    0x8008b02c     0x80097a4e   0xca22    0xca24
.sdata   0x80097a50     0x800980ec   0x69c     0x6c8
.sbss    0x80098118     0x800982a6   0x18e     0x1b0
.bss     0x800982c8     0x801f2da0   0x15aad8  0x15aad8

The size of the sections will be padded to the start of the next section because there are symbols wedged in between the sections, at least according to this file.
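The padding rule is simple arithmetic: each section is stretched up to the start of the next one, and the last section keeps its own end. Here is a small sketch that recomputes the sizes from the start and end addresses reported by the SYM file. The function and variable names are mine, chosen for illustration.

```python
# (section, start, end) triples copied from the __*_org / __*_orgend symbols.
SECTIONS = [
    (".rdata", 0x80010100, 0x80013862),
    (".text",  0x8001387c, 0x8008b02c),
    (".data",  0x8008b02c, 0x80097a4e),
    (".sdata", 0x80097a50, 0x800980ec),
    (".sbss",  0x80098118, 0x800982a6),
    (".bss",   0x800982c8, 0x801f2da0),
]

def padded_sizes(sections):
    """Return (name, size, padded size) for sections sorted by start address."""
    rows = []
    for index, (name, start, end) in enumerate(sections):
        # Pad up to the next section's start; the last section keeps its own end.
        limit = sections[index + 1][1] if index + 1 < len(sections) else end
        rows.append((name, end - start, limit - start))
    return rows

for name, size, padded in padded_sizes(SECTIONS):
    print("%-7s size=0x%x padded=0x%x" % (name, size, padded))
```

The unpadded sizes it computes also agree with the __*_size symbols, which is a cheap sanity check that the listing was read correctly.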

Once more, the toolchains in the 90s didn’t give a damn

A whole lot of nothing

One extra bit of information we can observe is that the start symbol is located at address 0x80098098, which will be our entry point. After that, we can then repurpose some Jython code from my reverse-engineering series of articles to hand-craft the placeholder executable. Ghidra embeds a Jython standalone interpreter at Ghidra/Features/Python/lib/jython-standalone-2.7.3.jar, which we can use to run this script:

import struct

ELFCLASS32 = 1
ELFDATA2LSB = 1
EV_CURRENT = 1
ELFOSABI_SYSV = 0
ELF_ET_REL = 1
ELF_ET_EXEC = 2
ELF_EM_MIPS = 8
ELF_EV_CURRENT = 1

ELF32LE_HDR = "<4c12BHHIIIIIHHHHHH"
ELF32LE_HDR_SIZE = struct.calcsize(ELF32LE_HDR)

SHT_NULL = 0
SHT_PROGBITS = 1
SHT_SYMTAB = 2
SHT_STRTAB = 3
SHT_NOBITS = 8

SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4

ELF32LE_SHDR = "<IIIIIIIIII"
ELF32LE_SHDR_SIZE = struct.calcsize(ELF32LE_SHDR)

# Craft string tables
def craft_string_table(strings):
    data = ""
    offsets = dict()
    for string in strings:
        offsets[string] = len(data)
        data += (string + "\0").encode("ascii")
    return data, offsets

_shstrtab_bytes, shstrtab = craft_string_table(["", ".shstrtab", ".rdata", ".text", ".data", ".sdata", ".sbss", ".bss"])

#   Name            Type                Flags                       Address         Bytes               Link    Info    Alignment   Entry size
sections = [
    ("",            SHT_NULL,           0,                          0,              "",                 0,      0,      0,          0),
    (".shstrtab",   SHT_STRTAB,         0,                          0,              _shstrtab_bytes,    0,      0,      0,          0),
    (".rdata",      SHT_NOBITS,         SHF_ALLOC,                  0x80010100,     0x377c,             0,      0,      0,          0),
    (".text",       SHT_NOBITS,         SHF_ALLOC|SHF_EXECINSTR,    0x8001387c,     0x777b0,            0,      0,      0,          0),
    (".data",       SHT_NOBITS,         SHF_WRITE|SHF_ALLOC,        0x8008b02c,     0xca24,             0,      0,      0,          0),
    (".sdata",      SHT_NOBITS,         SHF_WRITE|SHF_ALLOC,        0x80097a50,     0x6c8,              0,      0,      0,          0),
    (".sbss",       SHT_NOBITS,         SHF_WRITE|SHF_ALLOC,        0x80098118,     0x1b0,              0,      0,      0,          0),
    (".bss",        SHT_NOBITS,         SHF_WRITE|SHF_ALLOC,        0x800982c8,     0x15aad8,           0,      0,      0,          0),
]

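# Section contents are laid out right after the ELF header and the section header table.
file_offset = ELF32LE_HDR_SIZE + ELF32LE_SHDR_SIZE * len(sections)

elf_section_headers_bytes = ""
for section in sections:
    # An integer in the "Bytes" column is the size of a SHT_NOBITS section, which occupies no space in the file.
    length = section[4] if type(section[4]) == int else len(section[4])
    elf_section_headers_bytes += struct.pack(ELF32LE_SHDR,
        shstrtab[section[0]], section[1], section[2], section[3], file_offset, length, section[5], section[6], section[7], section[8]
    )
    # Only sections with actual bytes advance the file offset.
    file_offset += len(section[4]) if type(section[4]) != int else 0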

elf_header_bytes = struct.pack(ELF32LE_HDR,
    '\x7f', 'E', 'L', 'F', ELFCLASS32, ELFDATA2LSB, EV_CURRENT, ELFOSABI_SYSV, 0, 0, 0, 0, 0, 0, 0, 0,
    ELF_ET_EXEC, ELF_EM_MIPS, ELF_EV_CURRENT, 0x80098098, 0, ELF32LE_HDR_SIZE, 0x1000, ELF32LE_HDR_SIZE, 0, 0, ELF32LE_SHDR_SIZE, len(sections), 1
)
with open("PSX.SYM.elf", "wb") as fp:
    fp.write(elf_header_bytes)
    fp.write(elf_section_headers_bytes)
    for section in sections:
        if type(section[4]) != int:
            fp.write(bytearray(section[4]))

Since PSX.SYM.elf is a real ELF file that happens to be a fake executable, we can inspect it with the usual tools:

$ readelf --wide --file-header PSX.SYM.elf
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           MIPS R3000
  Version:                           0x1
  Entry point address:               0x80098098
  Start of program headers:          0 (bytes into file)
  Start of section headers:          52 (bytes into file)
  Flags:                             0x1000, o32, mips1
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         8
  Section header string table index: 1
$ readelf --wide --section-headers PSX.SYM.elf
There are 8 section headers, starting at offset 0x34:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000174 000000 00      0   0  0
  [ 1] .shstrtab         STRTAB          00000000 000174 000030 00      0   0  0
  [ 2] .rdata            NOBITS          80010100 0001a4 003762 00   A  0   0  0
  [ 3] .text             NOBITS          8001387c 0001a4 0777b0 00  AX  0   0  0
  [ 4] .data             NOBITS          8008b02c 0001a4 00ca22 00  WA  0   0  0
  [ 5] .sdata            NOBITS          80097a50 0001a4 00069c 00  WA  0   0  0
  [ 6] .sbss             NOBITS          80098118 0001a4 00018e 00  WA  0   0  0
  [ 7] .bss              NOBITS          800982c8 0001a4 15aad8 00  WA  0   0  0
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)

It’s a whole lot of nothing: this executable doesn’t have segments for the loader to process or even a single byte of initialized memory to its name. QEMU won’t even let us try to run it:

$ chmod +x PSX.SYM.elf 
$ qemu-mipsel PSX.SYM.elf 
qemu-mipsel: PSX.SYM.elf: Invalid ELF image for this architecture

Despite all of that, Ghidra will happily import this so-called executable. After all, all we need it to do is to act as a stand-in, so that the debugging data has a place to live.
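The "no segments" part can also be checked without readelf by unpacking the header with the same struct format string as the generator script. The following is only a sketch: the function name and the returned dictionary are mine, and the sample header is rebuilt in memory from the values shown in the readelf output instead of being read from PSX.SYM.elf.

```python
import struct

# Same layout as the ELF32LE_HDR format string used by the generator script.
ELF32LE_HDR = "<4c12BHHIIIIIHHHHHH"

def summarize_elf_header(data):
    """Unpack a 32-bit little-endian ELF header and return a few key fields."""
    fields = struct.unpack_from(ELF32LE_HDR, data)
    if b"".join(fields[0:4]) != b"\x7fELF":
        raise ValueError("missing ELF magic")
    # Fields 0-15 are e_ident; the remaining 13 follow the ELF header order.
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx) = fields[16:]
    return {"entry": e_entry, "phnum": e_phnum, "shnum": e_shnum}

# Sample header rebuilt from the values readelf reported above.
ident_tail = [1, 1, 1] + [0] * 9  # class, data, version, OS/ABI, padding
rest = [2, 8, 1, 0x80098098, 0, 52, 0x1000, 52, 0, 0, 40, 8, 1]
sample = struct.pack(ELF32LE_HDR, b"\x7f", b"E", b"L", b"F", *(ident_tail + rest))

summary = summarize_elf_header(sample)
print("entry=0x%x program headers=%d section headers=%d"
      % (summary["entry"], summary["phnum"], summary["shnum"]))
```

A program header count of zero means there is nothing for a loader to map into memory, while the section headers are all an importer needs to create matching memory blocks.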

Dressing up the mannequin

Now that we have a placeholder executable, all we need to do is to layer that debugging data on top of this scaffolding. It’s easier said than done because this is a deceptively tricky task:

  • The SYM file format is poorly documented and somewhat brain-dead by modern standards ;
  • Mapping the data types to Ghidra’s own representation isn’t a straightforward conversion ;
  • Types can be forward-declared or use recursion (like a linked list node for example).

I also have some additional requirements that pile another level of difficulty on top of this:

  • I want the script to reuse predefined data types if available from the PlayStation executable loader ;
  • I want the script to work even on a placeholder program ;
  • I want the script to be idempotent.
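To give an idea of why recursion and idempotency interact, here is a minimal two-pass sketch in plain Python. It is not the actual import script and uses neither Ghidra's API nor the real SYM record layout: the record format, the Struct class and import_types are all hypothetical stand-ins. The first pass registers an empty placeholder per type (reusing any existing one), the second fills in the fields once every name can be resolved.

```python
# Hypothetical, simplified type records: name -> list of (field name, type name).
# A trailing "*" marks a pointer. This is not the SYM format nor Ghidra's API.
RECORDS = {
    "Node": [("value", "int"), ("next", "Node*")],
    "List": [("head", "Node*"), ("count", "int")],
}

class Struct(object):
    def __init__(self, name):
        self.name = name
        self.fields = []  # filled in during the second pass

def import_types(records, registry):
    """Create or update structs in `registry`; safe to call repeatedly."""
    # Pass 1: make sure a placeholder exists for every struct name,
    # reusing whatever is already registered.
    for name in records:
        registry.setdefault(name, Struct(name))
    # Pass 2: fill in the fields. Every referenced struct now exists, even
    # if it refers to itself or is declared later.
    for name, fields in records.items():
        resolved = []
        for field_name, type_name in fields:
            base = type_name.rstrip("*")
            target = registry.get(base, base)  # primitives stay plain names
            resolved.append((field_name, target, type_name.endswith("*")))
        registry[name].fields = resolved  # overwrite, never append
    return registry

registry = import_types(RECORDS, {})
import_types(RECORDS, registry)  # a second run leaves the registry unchanged
print(registry["Node"].fields[1][1] is registry["Node"])
```

Overwriting the field list instead of appending to it is what keeps a second run harmless, and looking types up by name before creating them is the same idea as reusing the data types a loader has already defined.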

After banging my head on the problem for quite a while, I’ve managed to write a battle-scarred but working Ghidra import script for SYM files, which has since been upstreamed to the ghidra_psx_ldr extension.

We have symbols, we have data, and we even have functions.

Sure, we haven’t got a byte for any of this stuff: PSX.SYM.elf is just a placeholder after all. However, if we can port all that metadata onto an executable we do have, then we’ll have a fully annotated artifact with real bytes… but that is a story for another part.

Conclusion

We’ve discovered that the debugging symbols file PSX.SYM doesn’t match PSX.EXE, meaning that we have a debugging symbols file and no program to use it on. Undaunted, we’ve created a fake program PSX.SYM.elf that matches the outline of PSX.SYM and then written a script to import all this information into Ghidra, as if we had the real program on hand.