After laying out the context for this series of articles, let’s begin our journey into porting programs that have no source code. Our case study will be aln, the Atari linker for the Atari Jaguar; specifically, its original Linux port, a 32-bit x86 executable in the a.out file format.

Running aln

Before trying to reverse-engineer the artifact head-on, let’s first run it as-is on a modern x86_64 Linux system. After downloading the aln artifact, we’ll use Kees Cook’s a.out loader to execute it. Let’s download and build the loader:

$ wget https://raw.githubusercontent.com/kees/kernel-tools/trunk/a.out/aout.c
$ i686-linux-gnu-gcc aout.c -o aout -static

Then, we can use the loader to run aln:

$ ./aout ./aln -v
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
No object files to link.
Link aborted.

a.out QMAGIC executables are based very low in the virtual address space (at address 4096), which is likely below the minimum mmap() address configured on a modern Linux system. If the loader fails for that reason, either lower that minimum address with sysctl -w vm.mmap_min_addr=4096 (as root) or use QEMU’s user-mode emulation to run aout.
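To check ahead of time whether the image can be mapped at address 4096, one can compare that base against the current value of vm.mmap_min_addr. The following is a minimal sketch in C, not part of the loader: the names read_mmap_min_addr and base_is_mappable are made up for illustration, and it assumes the sysctl is exposed as a decimal number in /proc/sys/vm/mmap_min_addr, as on mainstream Linux systems.

```c
#include <stdio.h>

/* Base address of an a.out QMAGIC image, as stated in the text. */
#define AOUT_QMAGIC_BASE 4096UL

/* Hypothetical helper: parse the decimal value stored in a
 * mmap_min_addr-style file. Returns 0 on success, -1 on failure. */
int read_mmap_min_addr(const char *path, unsigned long *out)
{
	FILE *f = fopen(path, "r");
	if (f == NULL)
		return -1;
	int ok = (fscanf(f, "%lu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* An unprivileged process may only map addresses that are not below
 * the configured minimum. */
int base_is_mappable(unsigned long min_addr, unsigned long base)
{
	return base >= min_addr;
}
```

A caller would pass "/proc/sys/vm/mmap_min_addr" to read_mmap_min_addr and hand the result to base_is_mappable together with AOUT_QMAGIC_BASE; a zero result means one of the two workarounds above is needed.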

cubanismo has packaged a complete Jaguar SDK that is ready to use on modern systems. It offers the choice between the original tools and their modern replacements, where those exist. After setting it up, we can build the included sample with the original aln executable like so:

$ make LINK=aln V=1
rmac -fb -g2 -rd -v +o0 +o1 +o2 startup.s
startup.s 81: Warning: RISC code generated with no origin defined
[Writing object file: startup.o]
TEXT segment: 624 bytes
DATA segment: 0 bytes
BSS  segment: 64080 bytes
Total       : 64704 bytes
TextRel size: 248 bytes
DataRel size: 0 bytes
m68k-aout-gcc -DJAGUAR -I/home/boricj/Documents/jaguar-sdk/jaguar/include -O2 -c -o jag.o jag.c
aln -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA54   1C00

Link complete.
$ file jaghello.cof
jaghello.cof: mc68k COFF object not stripped
$ sha256sum jaghello.cof
f9c8269cdc998de01c0ac7a3e815c16b7ced106e25f10f92a7078c722a220dbb  jaghello.cof

We’ll consider our future ports of aln successful if they can reproduce jaghello.cof exactly. This doesn’t completely prove that the ports are fit for purpose, since a single sample is hardly an exhaustive stress test of the linker, but it will demonstrate at least a basic level of functionality.
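Comparing SHA-256 digests is one way to check this; a byte-for-byte comparison of the reference file and a port’s output is an equivalent, dependency-free alternative. The sketch below is only an illustration of that check, and the function name files_identical is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if both files have identical content,
 * 0 if they differ, and -1 if either file can't be opened. */
int files_identical(const char *path_a, const char *path_b)
{
	FILE *a = fopen(path_a, "rb");
	FILE *b = fopen(path_b, "rb");
	int result = 1;

	if (a == NULL || b == NULL) {
		result = -1;
	} else {
		int ca, cb;
		/* Walk both files in lockstep; a length mismatch shows up
		 * as EOF on one side only. */
		do {
			ca = fgetc(a);
			cb = fgetc(b);
			if (ca != cb)
				result = 0;
		} while (result == 1 && ca != EOF);
	}

	if (a != NULL)
		fclose(a);
	if (b != NULL)
		fclose(b);
	return result;
}
```

For instance, files_identical("jaghello.cof", "jaghello-port.cof") would compare the reference output against the output of a port, where the second file name is a placeholder.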

Loading aln into Ghidra

The Linux port of aln is a statically linked QMAGIC a.out executable. At the time of writing, vanilla Ghidra doesn’t have an a.out file loader. There is a pull request that aims to implement one, but here we’ll demonstrate how to load aln into Ghidra by hand.

For now, we’ll keep things simple and assume the a.out file format is composed of just four parts:

  1. A header;
  2. A .text segment;
  3. A .data segment;
  4. A .bss segment.

The header has the following structure:

#include <stdint.h>

struct aout_header
{
	uint32_t a_info;    /* machine type, magic, etc */
	uint32_t a_text;    /* text size */
	uint32_t a_data;    /* data size */
	uint32_t a_bss;     /* desired bss size */
	uint32_t a_syms;    /* symbol table size */
	uint32_t a_entry;   /* entry address */
	uint32_t a_trsize;  /* text relocation size */
	uint32_t a_drsize;  /* data relocation size */
};

When running an a.out QMAGIC executable, the Linux kernel will load it as follows:

  • The header and .text segment are mapped read-execute at address 4096 (0x1000);
  • The .data segment is mapped read-write-execute immediately after the .text segment;
  • The .bss segment is mapped read-write immediately after the .data segment.

First, let’s analyze the header:

$ hexdump -C aln | head -n 2
00000000  cc 00 64 00 00 c0 01 00  00 10 00 00 94 09 00 00  |..d.............|
00000010  00 00 00 00 20 10 00 00  00 00 00 00 00 00 00 00  |.... ...........|

Using the data structure above (and remembering that the fields are stored in little-endian byte order), we can read the following values of interest:

Field    Offset  Value
a_text   4       0x0001c000
a_data   8       0x00001000
a_bss    12      0x00000994
a_entry  20      0x00001020
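The same decoding can be done programmatically. The sketch below reads each field as a little-endian 32-bit value from the first 32 bytes of the file, so it works regardless of the host’s endianness. The helper names (read_le32, decode_aout_header, aout_magic, aout_machtype) are made up for illustration; the QMAGIC and M_386 constants and the split of a_info into magic and machine type are taken from the Linux a.out definitions, not from this article. The structure is repeated so that the block compiles on its own.

```c
#include <stdint.h>

/* Repeated from the article so that this sketch is self-contained. */
struct aout_header
{
	uint32_t a_info, a_text, a_data, a_bss;
	uint32_t a_syms, a_entry, a_trsize, a_drsize;
};

/* Values from the Linux a.out definitions. */
#define AOUT_QMAGIC 0314u /* octal, i.e. 0xcc */
#define AOUT_M_386  100u  /* i.e. 0x64 */

/* Read one little-endian 32-bit value byte by byte. */
uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 32-byte header found at the very start of the file. */
void decode_aout_header(const uint8_t raw[32], struct aout_header *h)
{
	h->a_info   = read_le32(raw + 0);
	h->a_text   = read_le32(raw + 4);
	h->a_data   = read_le32(raw + 8);
	h->a_bss    = read_le32(raw + 12);
	h->a_syms   = read_le32(raw + 16);
	h->a_entry  = read_le32(raw + 20);
	h->a_trsize = read_le32(raw + 24);
	h->a_drsize = read_le32(raw + 28);
}

/* The low 16 bits of a_info hold the magic number. */
uint32_t aout_magic(const struct aout_header *h)
{
	return h->a_info & 0xffffu;
}

/* Bits 16 to 23 of a_info hold the machine type. */
uint32_t aout_machtype(const struct aout_header *h)
{
	return (h->a_info >> 16) & 0xffu;
}
```

Fed with the 32 bytes from the hexdump above, this yields the values in the table, and the first word 0x006400cc splits into the QMAGIC magic number (0xcc) and the M_386 machine type (0x64).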

Now that we have all the information we need from the header, we’ll import this executable into Ghidra as a raw binary with the following settings:

  • Language: x86:LE:32:default:gcc;
  • Block name: .text;
  • Base address: 0x00001000;
  • Length: 0x0001d000 (a_text + a_data).
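The base address and length above follow from the header values and the loading rules described earlier. As a rough sketch, the arithmetic can be written down as below. The names aout_layout and compute_layout are hypothetical, and the code assumes, as the numbers in this article suggest, that for QMAGIC the a_text size already covers the header, so that .text starts right at the base address.

```c
#include <stdint.h>

/* Address at which a QMAGIC image is mapped, as stated in the text. */
#define AOUT_QMAGIC_BASE 0x1000u

/* Hypothetical summary of where each part ends up in memory. */
struct aout_layout
{
	uint32_t text_start;  /* header + .text */
	uint32_t data_start;  /* .data follows .text */
	uint32_t bss_start;   /* .bss follows .data */
	uint32_t bss_end;     /* exclusive end of .bss */
	uint32_t file_length; /* initialized bytes to import from the file */
};

struct aout_layout compute_layout(uint32_t a_text, uint32_t a_data,
                                  uint32_t a_bss)
{
	struct aout_layout l;

	l.text_start  = AOUT_QMAGIC_BASE;
	l.data_start  = l.text_start + a_text;
	l.bss_start   = l.data_start + a_data;
	l.bss_end     = l.bss_start + a_bss;
	/* .bss is not stored in the file, so only .text and .data count. */
	l.file_length = a_text + a_data;
	return l;
}
```

With a_text = 0x1c000, a_data = 0x1000 and a_bss = 0x994, this gives a file length of 0x1d000, with .data starting at 0x1d000 and .bss starting at 0x1e000.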

Open the file and skip auto-analysis for now. Click on Window > Memory Map to display the memory map.

Ghidra has loaded the whole of aln as one .text section (as instructed), but this isn’t how this a.out file is actually structured. We need to fix up the sections by hand so that Ghidra’s memory map matches the memory map of the a.out file.

First, we’ll split off .data from .text. Select the .text section and click on the orange “🟰” button on the top-right corner with the tooltip Split a block. Fill in the dialog box as follows:

  • Block name: .data;
  • Block length: 0x00001000.

The rest of the fields will automatically adjust. Click on OK to split the memory block.

Every initialized section is now correctly set up, but .bss is still missing. It is an uninitialized section, meaning its bytes aren’t actually stored inside the a.out file. Click on the green “➕” button on the top-right corner with the tooltip Add a new block to memory. Fill in the dialog box as follows:

  • Block name: .bss;
  • Start address: ram:0x0001e000;
  • Block length: 0x00000994;
  • Permissions: read, write;
  • Uninitialized.

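Click on OK to add the memory block. The memory map should now list three blocks: .text from 0x00001000 to 0x0001cfff, .data from 0x0001d000 to 0x0001dfff, and .bss from 0x0001e000 to 0x0001e993.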

The artifact is now properly loaded, but there is one last piece of information from the header that we can apply: the entry point (a_entry). Select the address 0x00001020 in the listing view and hit F (or right-click and select Create Function). Then, hit the L key, rename the function to _start and mark it as an entry point.

The files for this case study can be found here: case-study.tar.gz

Conclusion

We have loaded the aln executable artifact into Ghidra by hand. Next time, we’ll begin reverse-engineering this artifact as a first step toward porting aln to new environments.