Porting the Atari Jaguar SDK part 3: makin' elves
Previously in this series of articles, we used Ghidra’s Version Tracking tool to annotate parts of aln with a closely matching C static library from Slackware 2.3.
In this article, we will start making software ports of aln, despite not having the source code for it.
What are we porting to anyways?
Currently, aln is a statically-linked Linux a.out executable for Linux.
For our first port, we’ll aim for something small: a statically-linked Linux ELF executable for Linux.
Since I want this part to be meaningful, and not just slap an ELF header on it and call it a day, I’ll add another restriction: it must work as-is on modern Linux systems.
Recall from part one that to run aln nowadays, we must first run sysctl -w vm.mmap_min_addr=4096.
This is because modern Linux systems have a minimum virtual address for user processes configured by default to 0x10000, whereas aln is based at the lower address 0x1000.
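A quick way to see whether a given base address falls foul of this limit is to compare it against the current value of vm.mmap_min_addr, which Linux exposes in /proc/sys/vm/mmap_min_addr. The following Python sketch does just that; the helper names are made up for illustration, and it deliberately ignores exceptions such as processes holding CAP_SYS_RAWIO or security-module policies.

```python
from pathlib import Path

# Linux exposes the vm.mmap_min_addr sysctl through procfs (decimal value).
MMAP_MIN_ADDR_PATH = Path("/proc/sys/vm/mmap_min_addr")


def read_mmap_min_addr(path=MMAP_MIN_ADDR_PATH):
    """Return the current vm.mmap_min_addr value, or None if it can't be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def base_is_mappable(base, mmap_min_addr):
    """True if vm.mmap_min_addr alone would not reject a mapping at `base`.

    Deliberately ignores CAP_SYS_RAWIO and security-module exceptions.
    """
    return base >= mmap_min_addr


if __name__ == "__main__":
    min_addr = read_mmap_min_addr()
    if min_addr is None:
        print("could not read vm.mmap_min_addr")
    else:
        for label, base in (("aln (a.out)", 0x1000), ("hypothetical higher base", 0x10000)):
            verdict = "ok" if base_is_mappable(base, min_addr) else "too low"
            print(f"{label}: {base:#x} vs mmap_min_addr {min_addr:#x}: {verdict}")
```

With the default of 0x10000, aln’s base of 0x1000 is reported as too low, which is exactly why the sysctl tweak was needed.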
To fulfill the stated restriction for the ELF port, it must be based at a higher address than aln currently is.
Simply moving the bits of aln around in the address space would end in disaster: references to absolute addresses embedded within aln would not shift around with the bits, leading to incorrect memory accesses and most likely crashes.
Porting aln with the power of delinking
To work around the problem of absolute references, we’ll use a technique described in my previous series of articles on reverse-engineering: delinking aln back into an object file so that we can then relink it however we want.
This solves the issue because references to absolute addresses will be converted to relocations as part of this process and the linker is going to stitch everything back together, as if nothing happened.
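To make the idea concrete, here is a toy Python model, not how Ghidra or a real linker represents things: a 16-byte "image" containing one 4-byte absolute pointer. Copying the bytes to a new base leaves the pointer stale, whereas recording it as an R_386_32-style relocation (value = symbol + addend, with the addend stored in the field itself as in ELF REL sections) lets the relink step recompute it for any base. The function names and the image layout are invented for illustration.

```python
import struct

OLD_BASE = 0x1000  # hypothetical original load address of the toy image


def make_image():
    """Build a 16-byte toy image whose word at offset 4 points at offset 12."""
    image = bytearray(16)
    struct.pack_into("<I", image, 4, OLD_BASE + 12)  # absolute address baked in
    return image


def naive_move(image):
    """Just copy the bytes: the embedded absolute address is not updated."""
    return bytearray(image)


def delink(image):
    """Turn the absolute pointer into an implicit addend relative to the image start.

    Returns the "object file" bytes and the offsets of R_386_32-like fields,
    all of which are relative to a single symbol: the start of the image.
    """
    target = struct.unpack_from("<I", image, 4)[0]
    obj = bytearray(image)
    struct.pack_into("<I", obj, 4, target - OLD_BASE)  # REL-style: addend stays in place
    return obj, [4]


def relink(obj, relocs, new_base):
    """Apply S + A for each relocation, with S = new_base."""
    out = bytearray(obj)
    for offset in relocs:
        addend = struct.unpack_from("<I", out, offset)[0]
        struct.pack_into("<I", out, offset, (new_base + addend) & 0xFFFFFFFF)
    return out


if __name__ == "__main__":
    moved = naive_move(make_image())
    print(hex(struct.unpack_from("<I", moved, 4)[0]))     # 0x100c: still the old location
    relinked = relink(*delink(make_image()), 0x08049000)
    print(hex(struct.unpack_from("<I", relinked, 4)[0]))  # 0x804900c
```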
Making an object file out of a program
Continuing from the previous part, we have a partially analyzed artifact. After some further work on it (annotating things, fixing up incorrectly identified references…), we are ready for our first attempt at delinking it with the help of my Ghidra extension.
Select the address range from 0x1020 to 0x1e993 (the entire aln program without the a.out header), then click on Analysis > One shot > Relocation Table Analyzer.
Click on Window > Relocation Table (synthesized) to view the reconstructed relocations.
We have all the data we need to make an object file out of aln.
Click on File > Export Program... (or hit the O key), and fill in the wizard as follows:
- Format: ELF relocatable object;
- Output File: aln.whole.o;
- Selection Only: checked.
Furthermore, click on Options... and ensure that Include dynamic symbols and Strip leading underscore are checked.
Click on OK and the object file will be written to disk.
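Besides readelf, it can be handy to sanity-check the exported file programmatically. The sketch below parses just enough of the 52-byte ELF32 header with Python’s struct module to confirm we got a 32-bit little-endian relocatable object for Intel 80386 (e_type 1 is ET_REL and e_machine 3 is EM_386 in the ELF specification). The helper name is hypothetical, and readelf remains the authoritative tool.

```python
import struct

# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx (52 bytes)
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")

ET_NAMES = {1: "REL", 2: "EXEC", 3: "DYN"}
EM_386 = 3


def parse_elf32_header(data):
    """Return a few ELF32 header fields as a dict, or raise ValueError."""
    if len(data) < ELF32_EHDR.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1 or data[5] != 1:  # EI_CLASS == ELFCLASS32, EI_DATA == ELFDATA2LSB
        raise ValueError("not a 32-bit little-endian ELF file")
    (_ident, e_type, e_machine, _version, e_entry, _phoff, e_shoff,
     _flags, _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     e_shstrndx) = ELF32_EHDR.unpack_from(data)
    return {
        "type": ET_NAMES.get(e_type, e_type),
        "machine": e_machine,
        "entry": e_entry,
        "shoff": e_shoff,
        "phnum": e_phnum,
        "shnum": e_shnum,
        "shstrndx": e_shstrndx,
    }

# Usage (assuming aln.whole.o is in the current directory):
#   with open("aln.whole.o", "rb") as f:
#       print(parse_elf32_header(f.read(ELF32_EHDR.size)))
```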
We can inspect that object file using our toolchain:
$ readelf --wide --file-header aln.whole.o
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Intel 80386
Version: 0x0
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 52 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 9
Section header string table index: 8
$ readelf --wide --section-headers aln.whole.o
There are 9 section headers, starting at offset 0x34:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 00000000 00019c 00e9cd 00 0 0 1
[ 2] .symtab SYMTAB 00000000 00eb6c 00eeb0 10 1 155 4
[ 3] .text PROGBITS 00000000 01da20 01bfe0 00 WAX 0 0 16
[ 4] .data PROGBITS 00000000 039a00 001000 00 WA 0 0 16
[ 5] .bss NOBITS 00000000 000000 000994 00 WA 0 0 16
[ 6] .rel.text REL 00000000 03aa00 005c10 08 I 2 3 4
[ 7] .rel.data REL 00000000 040610 000268 08 I 2 4 4
[ 8] .shstrtab STRTAB 00000000 040878 000040 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
$ readelf --wide --symbols aln.whole.o
Symbol table '.symtab' contains 3819 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS aln.whole.o
2: 00000000 0 SECTION LOCAL DEFAULT 3
3: 00000000 0 SECTION LOCAL DEFAULT 4
4: 00000000 0 SECTION LOCAL DEFAULT 5
5: 000018cf 7 OBJECT LOCAL DEFAULT 3 switchD_000028ef::switchD
6: 000018e4 4 OBJECT LOCAL DEFAULT 3 switchD_000028ef::switchdataD_00002904
7: 00001984 3 OBJECT LOCAL DEFAULT 3 switchD_000028ef::caseD_3f
8: 000019a4 3 OBJECT LOCAL DEFAULT 3 switchD_000028ef::caseD_5a
9: 000019d4 3 OBJECT LOCAL DEFAULT 3 switchD_000028ef::caseD_41
10: 00001cf4 3 OBJECT LOCAL DEFAULT 3 switchD_000028ef::caseD_42
...
3808: 000008d8 1 OBJECT GLOBAL DEFAULT 5 DAT_0001e8d8
3809: 000008e0 1 OBJECT GLOBAL DEFAULT 5 __malloc_initialized
3810: 000008e4 1 OBJECT GLOBAL DEFAULT 5 DAT_0001e8e4
3811: 000008e8 4 OBJECT GLOBAL DEFAULT 5 DAT_0001e8e8
3812: 000008f0 1 OBJECT GLOBAL DEFAULT 5 DAT_0001e8f0
3813: 00000920 96 OBJECT GLOBAL DEFAULT 5 _fraghead
3814: 00000928 4 OBJECT GLOBAL DEFAULT 5 _fraghead[1].next
3815: 00000980 4 OBJECT GLOBAL DEFAULT 5 __malloc_hook
3816: 00000984 4 OBJECT GLOBAL DEFAULT 5 DAT_0001e984
3817: 00000988 4 OBJECT GLOBAL DEFAULT 5 DAT_0001e988
3818: 00000990 4 OBJECT GLOBAL DEFAULT 5 DAT_0001e990
$ readelf --wide --relocs aln.whole.o
Relocation section '.rel.text' at offset 0x3aa00 contains 2946 entries:
Offset Info Type Sym. Value Symbol's Name
00000012 000e3401 R_386_32 00000064 DAT_0001d064
0000001b 000e3501 R_386_32 00000068 DAT_0001d068
00000022 000e3601 R_386_32 0000006c DAT_0001d06c
00000568 000e1e01 R_386_32 00000004 PTR_s_aln_0001d004
0000056e 00009f01 R_386_32 0000005b s_Usage:_%s_[-options]_<files|-x_f_0000107b
00000578 0000a001 R_386_32 00000097 s_Where_options_are:_000010b7
00000582 0000a101 R_386_32 000000aa s_?:_print_this_000010ca
...
It has sections, symbols and relocations, like any other object file produced through more conventional means.
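The Info column of the relocation listing packs two fields together: for ELF32, the symbol table index lives in the upper 24 bits and the relocation type in the low 8 bits (ELF32_R_SYM(i) = i >> 8 and ELF32_R_TYPE(i) = i & 0xff in the ELF specification). Each Elf32_Rel entry is 8 bytes, matching the ES column of .rel.text. A small Python sketch with hypothetical helper names decodes raw entries; for the first relocation above, 0x000e3401 yields symbol index 0xe34 (3636) and type 1, which is R_386_32. Only a few i386 relocation types are named here.

```python
import struct

R_386_NAMES = {0: "R_386_NONE", 1: "R_386_32", 2: "R_386_PC32"}  # partial list


def elf32_r_sym(info):
    """Symbol table index: upper 24 bits of r_info."""
    return info >> 8


def elf32_r_type(info):
    """Relocation type: lower 8 bits of r_info."""
    return info & 0xFF


def iter_rel_entries(data):
    """Yield (offset, symbol index, type) from a raw little-endian SHT_REL section body."""
    for r_offset, r_info in struct.iter_unpack("<II", data):
        rtype = elf32_r_type(r_info)
        yield r_offset, elf32_r_sym(r_info), R_386_NAMES.get(rtype, rtype)
```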
Making a program out of an object file out of a program
Let’s try making an executable out of aln.whole.o.
Since this object file is a whole program, we need to build it statically with no extra libraries whatsoever:
$ i686-linux-gnu-gcc -nostdlib -nostartfiles -e_entry -static -o aln.elf aln.whole.o
Again, we can inspect the executable with the toolchain:
$ readelf --wide --file-header aln.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x8049000
Start of program headers: 52 (bytes into file)
Start of section headers: 243928 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 4
Size of section headers: 40 (bytes)
Number of section headers: 8
Section header string table index: 7
$ readelf --wide --section-headers aln.elf
There are 8 section headers, starting at offset 0x3b8d8:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .note.gnu.build-id NOTE 080480b4 0000b4 000024 00 A 0 0 4
[ 2] .text PROGBITS 08049000 001000 01bfe0 00 AX 0 0 16
[ 3] .data PROGBITS 08065000 01d000 001000 00 WA 0 0 16
[ 4] .bss NOBITS 08066000 01e000 000994 00 WA 0 0 16
[ 5] .symtab SYMTAB 00000000 01e000 00eef0 10 6 156 4
[ 6] .strtab STRTAB 00000000 02cef0 00e9a6 00 0 0 1
[ 7] .shstrtab STRTAB 00000000 03b896 00003f 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
$ readelf --wide --symbols aln.elf | head -n 20
Symbol table '.symtab' contains 3823 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 080480b4 0 SECTION LOCAL DEFAULT 1
2: 08049000 0 SECTION LOCAL DEFAULT 2
3: 08065000 0 SECTION LOCAL DEFAULT 3
4: 08066000 0 SECTION LOCAL DEFAULT 4
5: 00000000 0 FILE LOCAL DEFAULT ABS aln.whole.o
6: 0804a8cf 7 OBJECT LOCAL DEFAULT 2 switchD_000028ef::switchD
7: 0804a8e4 4 OBJECT LOCAL DEFAULT 2 switchD_000028ef::switchdataD_00002904
8: 0804a984 3 OBJECT LOCAL DEFAULT 2 switchD_000028ef::caseD_3f
9: 0804a9a4 3 OBJECT LOCAL DEFAULT 2 switchD_000028ef::caseD_5a
10: 0804a9d4 3 OBJECT LOCAL DEFAULT 2 switchD_000028ef::caseD_41
...
3812: 080518d8 10 OBJECT GLOBAL DEFAULT 2 LAB_000098f8
3813: 08052808 3 OBJECT GLOBAL DEFAULT 2 LAB_0000a828
3814: 0804d330 3 OBJECT GLOBAL DEFAULT 2 LAB_00005350
3815: 080642e4 1 OBJECT GLOBAL DEFAULT 2 LAB_0001c304
3816: 0806400c 3 OBJECT GLOBAL DEFAULT 2 LAB_0001c02c
3817: 08063a0a 7 OBJECT GLOBAL DEFAULT 2 LAB_0001ba2a
3818: 080566ee 2 OBJECT GLOBAL DEFAULT 2 LAB_0000e70e
3819: 0805c375 2 OBJECT GLOBAL DEFAULT 2 LAB_00014395
3820: 08055b2c 179 FUNC GLOBAL DEFAULT 2 FUN_0000db4c
3821: 080593a7 3 OBJECT GLOBAL DEFAULT 2 LAB_000113c7
3822: 08055e98 1 OBJECT GLOBAL DEFAULT 2 LAB_0000deb8
$ readelf --wide --relocs aln.elf
There are no relocations in this file.
We can observe that its code now starts at 0x8049000 (an address chosen automatically by the toolchain), fulfilling our requirement of a minimum virtual address of 0x10000 or higher.
We can also see that the symbols, whose names include the original addresses from the a.out layout, are shifted from their initial locations.
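Because aln was relinked in one piece, its .text moved by a constant offset: the start of the selection (0x1020) now lives at 0x08049000, so switchD_000028ef lands at 0x0804a8cf. Assuming this single linear offset, which only holds for .text (.data and .bss were placed at page-aligned addresses and shifted by 0x08048000 rather than 0x08047fe0), a short Python sketch can translate addresses between the two layouts and recover the original address that Ghidra embeds at the end of its auto-generated names. The helper names are hypothetical.

```python
import re

TEXT_OLD = 0x1020      # start of .text in the original a.out image
TEXT_NEW = 0x08049000  # start of .text in aln.elf
TEXT_SIZE = 0x1bfe0    # size of .text, as reported by readelf

# Ghidra auto-generated names such as LAB_000098f8 end with the original address.
GHIDRA_NAME = re.compile(r"_([0-9a-fA-F]{8})$")


def old_to_new(addr):
    """Map an original a.out .text address to its aln.elf address."""
    if not TEXT_OLD <= addr < TEXT_OLD + TEXT_SIZE:
        raise ValueError(f"{addr:#x} is outside .text")
    return addr - TEXT_OLD + TEXT_NEW


def new_to_old(addr):
    """Inverse of old_to_new."""
    if not TEXT_NEW <= addr < TEXT_NEW + TEXT_SIZE:
        raise ValueError(f"{addr:#x} is outside .text")
    return addr - TEXT_NEW + TEXT_OLD


def original_address(symbol_name):
    """Extract the a.out address encoded in a Ghidra name, or None if there is none."""
    match = GHIDRA_NAME.search(symbol_name)
    return int(match.group(1), 16) if match else None
```

For instance, original_address("LAB_000098f8") gives 0x98f8, and old_to_new(0x98f8) gives 0x080518d8, the value readelf reports for that symbol.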
Debugging aln because of delinking
Let’s take aln.elf for a spin:
$ rm -f jaghello.cof && make LINK=aln.elf V=1
aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
make: *** [Makefile:12: jaghello.cof] Segmentation fault (core dumped)
… and it crashed immediately.
Let’s run aln.elf under GDB:
$ gdb --args aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from aln.elf...
(No debugging symbols found in aln.elf)
(gdb) r
Starting program: /home/boricj/Documents/atari-sdk-elf/aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
Program received signal SIGSEGV, Segmentation fault.
0x08058f6b in strncmp ()
(gdb) bt
#0 0x08058f6b in strncmp ()
#1 0x08058db2 in getenv ()
#2 0x0804c199 in main ()
(gdb)
The program crashed while doing some sort of processing with environment variables. The fun part of getting this port to run begins now.
Entrypoint shenanigans
Linux initializes the process differently for a.out and ELF executables: the latter is set up according to the ELF i386 psABI specification. The relevant difference is the stack layout: for a.out executables, the Linux kernel used to set it up as if the entrypoint were a C function with this signature that had just been called:
[[noreturn]] void _start(int argc, char** argv, char** environ);
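Needless to say, the ELF i386 psABI specification mandates something completely different, which is why aln.elf crashes due to the ABI mismatch. For instance, the stack slot where the a.out entrypoint expects the environ pointer holds argv[1] under the ELF layout, which would explain getenv() tripping over garbage.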
The ELF entrypoint therefore needs to shuffle things around before it can jump into the a.out entrypoint.
The following assembly snippet is one way to implement it:
.text
.global _start
_start:
xor %ebp, %ebp # Terminate the call stack chain.
mov %esp, %eax # Save the old stack pointer for later.
and $-16, %esp # Align the stack to 16 bytes 'cause we can.
mov (%eax), %ebx # Get argc from the old stack.
lea 4(%eax), %esi # Get argv from the old stack.
lea 8(%eax, %ebx, 4), %edi # Get environ from the old stack.
push %edi # Push arguments in reverse order,
push %esi # per the System-V i386 calling
push %ebx # convention.
jmp _entry # Jump into the a.out entrypoint.
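To double-check the address arithmetic in that snippet, here is a small Python model of the initial ELF i386 stack: argc, then the argv pointers and their NULL terminator, then the environ pointers and their NULL terminator (the auxiliary vector that follows on a real system is omitted). It computes argv and environ the same way the two lea instructions do, namely esp + 4 and esp + 8 + 4 * argc. The stack address and string pointers are made-up values, and the function names are hypothetical.

```python
WORD = 4  # size of a pointer on i386


def build_initial_stack(esp, argv_ptrs, env_ptrs):
    """Lay out argc, argv[], NULL, envp[], NULL as 32-bit words starting at esp.

    Returns a dict mapping address -> word; string pointers are just fake ints.
    """
    words = [len(argv_ptrs), *argv_ptrs, 0, *env_ptrs, 0]
    return {esp + i * WORD: w for i, w in enumerate(words)}


def entry_args(stack, esp):
    """Mirror the crt0 snippet: compute (argc, argv, environ) for the a.out entrypoint."""
    argc = stack[esp]                           # mov (%eax), %ebx
    argv = esp + WORD                           # lea 4(%eax), %esi
    environ = esp + 2 * WORD + WORD * argc      # lea 8(%eax, %ebx, 4), %edi
    return argc, argv, environ
```

The extra 8 bytes in the environ computation skip over argc itself and over the NULL pointer that terminates argv.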
Now, let’s fast-forward through the process of fixing aln.elf crashes until it no longer crashes:
$ i686-linux-gnu-gcc -nostdlib -nostartfiles -e_start -static -o aln.elf aln.whole.o crt0.S
$ rm -f jaghello.cof && make LINK=aln.elf V=1
aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
* ATARI LINKER (Mar 17 1995) *
* Adds from Atari version 1.11 *
* and PC/DOS&Linux ports *
* Copyright 1993-95 Brainstorm. *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof
/home/boricj/Documents/jaguar-sdk/jaguar/examples/jaghello/startup.s(1): Unresolved reference: in BSD object startup.o, symbol ___main
(@T+0X106): Unresolved reference: in BSD object jag.o, symbol _vidmem
Sizes: Text Data Bss Syms
(hex) 3C0 410 FA50 1C0E
Link complete.
$ file jaghello.cof
jaghello.cof: mc68k COFF object not stripped
$ sha256sum jaghello.cof
dfe2d010a3b526bc3d9e573016b614d0bfd0b382bca004a4a42f7e8a89a22c29 jaghello.cof
This time aln.elf didn’t crash, but its observable behavior differs from the original: it reports unresolved references, and the SHA-256 hash of jaghello.cof doesn’t match the one from part 1…
Since aln.elf didn’t crash, we don’t have an obvious spot to analyze with a debugger.
Fortunately, we do have the original aln that we can run and cross-check our port against.
All we need to do is to compare two different programs as they execute and find out where they start behaving differently and why.
When all you have is a hammer…
One little-known feature of GDB is the ability to do time-travel debugging, in other words to step backwards. To do so, GDB records the side effects of instructions as they execute: stepping backwards can then be achieved by undoing those side effects in reverse. We’re not interested in time-travel debugging here, but rather in multi-dimensional diffing.
We’ll use GDB to make and save the recordings.
For this to work, we need to replicate the same conditions between the two runs as closely as possible, like command lines, environment variables…
The recordings are also started in both cases from _entry onwards, the original entrypoint of the a.out executable:
$ rm -f jaghello.cof && gdb-multiarch --args aout-loader aln -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from aout-loader...
(No debugging symbols found in aout-loader)
(gdb) starti
Starting program: /home/boricj/Documents/jaguar-sdk/tools/bin/aout-loader aln -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
Program stopped.
0xf7fe4490 in ?? () from /lib/ld-linux.so.2
(gdb) hb *0x1020
Hardware assisted breakpoint 1 at 0x1020
(gdb) c
Continuing.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, 0x00001020 in ?? ()
(gdb) set record full insn-max-instructions unlimited
Undefined set record full command: "insn-max-instructions unlimited". Try "help set record full".
(gdb) set record full insn-number-max unlimited
(gdb) record full
(gdb) c
Continuing.
***********************************
* ATARI LINKER (Mar 17 1995) *
* Adds from Atari version 1.11 *
* and PC/DOS&Linux ports *
* Copyright 1993-95 Brainstorm. *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof
Sizes: Text Data Bss Syms
(hex) 3C0 410 FA54 1C00
Link complete.
The next instruction is syscall exit. It will make the program exit. Do you want to stop the program?([y] or n) y
Process record: inferior program stopped.
Program stopped.
0x00011e64 in ?? ()
(gdb) record save aln.record
warning: target file /proc/11340/cmdline contained unexpected null characters
Saved core file aln.record with execution log.
$ rm -f jaghello.cof && gdb-multiarch --args aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from aln.elf...
(No debugging symbols found in aln.elf)
(gdb) b _entry
Breakpoint 1 at 0x8049000
(gdb) r
Starting program: /home/boricj/Documents/jaguar-sdk/jaguar/bin/linux/aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
Breakpoint 1, 0x08049000 in _entry ()
(gdb) set record full insn-number-max unlimited
(gdb) record full
(gdb) c
Continuing.
***********************************
* ATARI LINKER (Mar 17 1995) *
* Adds from Atari version 1.11 *
* and PC/DOS&Linux ports *
* Copyright 1993-95 Brainstorm. *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof
/home/boricj/Documents/jaguar-sdk/jaguar/examples/jaghello/startup.s(1): Unresolved reference: in BSD object startup.o, symbol ___main
(@T+0X106): Unresolved reference: in BSD object jag.o, symbol _vidmem
Sizes: Text Data Bss Syms
(hex) 3C0 410 FA50 1C0E
Link complete.
The next instruction is syscall exit. It will make the program exit. Do you want to stop the program?([y] or n) y
Process record: inferior program stopped.
Program stopped.
0x08059e44 in _exit ()
(gdb) record save aln.elf.record
warning: target file /proc/17807/cmdline contained unexpected null characters
Saved core file aln.elf.record with execution log.
We have two recordings (weighing over 30 MiB each), but we don’t have a tool to compare them. It is time to improvise: we’ll hack support for the precord format produced by GDB into pyelftools, a Python package for manipulating ELF files, then write a script to make the comparison:
diff --git a/elftools/elf/elffile.py b/elftools/elf/elffile.py
index 446d970..5074cb1 100644
--- a/elftools/elf/elffile.py
+++ b/elftools/elf/elffile.py
@@ -30,7 +30,8 @@ from .structs import ELFStructs
from .sections import (
Section, StringTableSection, SymbolTableSection,
SymbolTableIndexSection, SUNWSyminfoTableSection, NullSection,
- NoteSection, StabSection, ARMAttributesSection, RISCVAttributesSection)
+ NoteSection, StabSection, ARMAttributesSection, RISCVAttributesSection,
+ PrecordSection)
from .dynamic import DynamicSection, DynamicSegment
from .relocation import (RelocationSection, RelocationHandler,
RelrRelocationSection)
@@ -661,6 +662,8 @@ class ELFFile(object):
return NoteSection(section_header, name, self)
elif sectype == 'SHT_PROGBITS' and name == '.stab':
return StabSection(section_header, name, self)
+ elif sectype == 'SHT_PROGBITS' and name == 'precord':
+ return PrecordSection(section_header, name, self)
elif sectype == 'SHT_ARM_ATTRIBUTES':
return ARMAttributesSection(section_header, name, self)
elif sectype == 'SHT_RISCV_ATTRIBUTES':
diff --git a/elftools/elf/sections.py b/elftools/elf/sections.py
index 3805962..16ac706 100644
--- a/elftools/elf/sections.py
+++ b/elftools/elf/sections.py
@@ -6,12 +6,13 @@
# Eli Bendersky (eliben@gmail.com)
# This code is in the public domain
#-------------------------------------------------------------------------------
-from ..common.exceptions import ELFCompressionError
+from ..common.exceptions import ELFCompressionError, ELFParseError
from ..common.utils import struct_parse, elf_assert, parse_cstring_from_stream
from collections import defaultdict
from .constants import SH_FLAGS
from .notes import iter_notes
+import struct
import zlib
@@ -278,6 +279,79 @@ class NoteSection(Section):
"""
return iter_notes(self.elffile, self['sh_offset'], self['sh_size'])
+class PrecordEnd(object):
+    """ PrecordEnd object - representing the end of a precord list.
+    """
+    def __init__(self, signal, count):
+        self.signal = signal
+        self.count = count
+
+    def __repr__(self):
+        s = '<%s: signal=0x%08x count=%d>' % \
+            (self.__class__.__name__, self.signal, self.count)
+        return s
+
+class PrecordReg(object):
+    """ PrecordReg object - representing a register inside a precord list.
+    """
+    def __init__(self, number, name, value, length):
+        self.number = number
+        self.name = name
+        self.value = value
+        self.length = length
+
+    def __repr__(self):
+        s = '<%s (%s): 0x%s>' % \
+            (self.__class__.__name__, self.name, hex(self.value)[2:].zfill(self.length * 2))
+        return s
+
+
+class PrecordMem(object):
+    """ PrecordMem object - representing a memory write inside a precord list.
+    """
+    def __init__(self, memlen, memaddr, memval):
+        self.memlen = memlen
+        self.memaddr = memaddr
+        self.memval = memval
+
+    def __repr__(self):
+        s = '<%s (%016x, %d): 0x%s>' % \
+            (self.__class__.__name__, self.memaddr, self.memlen, hex(self.memval)[2:].zfill(self.memlen * 2))
+        return s
+
+class PrecordSection(Section):
+    """ GDB program record section.
+    """
+    def iter_precords(self, parse_reg, byteorder):
+        """ Yield all precord entries. Result types are precord objects.
+        """
+        offset = self['sh_offset']
+        size = self['sh_size']
+        end = offset + size
+
+        self.stream.seek(offset)
+        magic = struct.unpack('!I', self.stream.read(4))[0]
+
+        if magic != 0x20091016:
+            raise ELFParseError("Unknown precord magic %08x" % magic)
+
+        while self.stream.tell() < end:
+            record_type = struct.unpack('!B', self.stream.read(1))[0]
+
+            if record_type == 0x00:
+                signal, count = struct.unpack('!II', self.stream.read(8))
+                yield PrecordEnd(signal, count)
+            elif record_type == 0x01:
+                regnum = struct.unpack('!I', self.stream.read(4))[0]
+                regname, regval, reglength = parse_reg(regnum, self.stream)
+                yield PrecordReg(regnum, regname, int.from_bytes(regval, byteorder), reglength)
+            elif record_type == 0x02:
+                memlen, memaddr = struct.unpack('!IQ', self.stream.read(12))
+                memval = self.stream.read(memlen)
+                yield PrecordMem(memlen, memaddr, int.from_bytes(memval, byteorder))
+            else:
+                raise ELFParseError("Unknown precord type %02x" % record_type)
+
class StabSection(Section):
""" ELF stab section.
#!/usr/bin/env python
import argparse

from elftools.elf.elffile import ELFFile
from elftools.elf.sections import PrecordReg, PrecordEnd

PRECORD_NAME = 'precord'

def parse_reg_i386(regnum, stream):
    # Taken from binutils-gdb/gdb/features/i386/32bit-core.xml
    regnames=[
        "eax", "ecx", "edx", "ebx",
        "esp", "ebp", "esi", "edi",
        "eip", "eflags", "cs", "ss",
        "ds", "es", "fs", "gs",
        "st0", "st1", "st2", "st3",
        "st4", "st5", "st6", "st7",
        "fctrl", "fstat", "ftag", "fiseg",
        "fioff", "foseg", "fooff", "fop"
    ]
    regsizes=[
        4, 4, 4, 4,
        4, 4, 4, 4,
        4, 4, 4, 4,
        4, 4, 4, 4,
        10, 10, 10, 10,
        10, 10, 10, 10,
        4, 4, 4, 4,
        4, 4, 4, 4
    ]
    return regnames[regnum], stream.read(regsizes[regnum]), regsizes[regnum]

class AddressMapping:
    def __init__(self, spec):
        parts = spec.split(':')
        self.source = int(parts[0], 16)
        self.destination = int(parts[1], 16)
        self.length = int(parts[2], 16)

def map_address(addr, mappings):
    for mapping in mappings:
        if addr >= mapping.source and addr < (mapping.source + mapping.length):
            return addr - mapping.source + mapping.destination
    return addr

def main():
    argparser = argparse.ArgumentParser()
    argparser.add_argument('precord', nargs=2, default=None, help='Record files to compare')
    argparser.add_argument('-m', '--mapping', nargs='*', type=AddressMapping, dest='mappings', default=[])
    argparser.add_argument('-r', '--check-register-mappings', nargs='*', type=str, dest='registers_mappings', default=[])
    args = argparser.parse_args()

    with open(args.precord[0], 'rb') as file1:
        with open(args.precord[1], 'rb') as file2:
            elffile1 = ELFFile(file1)
            elffile2 = ELFFile(file2)

            section1 = elffile1.get_section_by_name(PRECORD_NAME)
            iter_precords1 = section1.iter_precords(parse_reg_i386, 'little')
            section2 = elffile2.get_section_by_name(PRECORD_NAME)
            iter_precords2 = section2.iter_precords(parse_reg_i386, 'little')

            registers1 = dict()
            registers2 = dict()

            for record1, record2 in zip(iter_precords1, iter_precords2):
                print("%-79s %s" % (record1, record2))

                if type(record1) != type(record2):
                    print("Record type divergence")
                    break
                elif type(record1) == PrecordEnd:
                    if record1.count != record2.count:
                        print("End record count divergence")
                        break
                elif type(record1) == PrecordReg:
                    if record1.name != record2.name:
                        print("Register name divergence")
                        break

                    registers1[record1.name] = record1.value
                    registers2[record2.name] = record2.value

                    if record1.name in args.registers_mappings:
                        mapped_value = map_address(record1.value, args.mappings)
                        if mapped_value != record2.value:
                            print("Register value divergence (0x%016x -> 0x%016x != 0x%016x)" % (record1.value, mapped_value, record2.value))
                            break

if __name__ == '__main__':
    main()
Maybe there are easier, off-the-shelf ways to do this kind of behavioral difference analysis between dissimilar programs, but I don’t know of one.
Let’s try it out:
$ PYTHONPATH=${HOME}/Documents/pyelftools python3 ./precordcompare.py ${JAGHELLO}/aln.record ${JAGHELLO}/aln.elf.record --check-register-mappings eip
<PrecordReg (esp): 0xf7d8bf24> <PrecordReg (esp): 0xffffce90>
<PrecordMem (00000000f7d8bf24, 4): 0x00001025> <PrecordMem (00000000ffffce90, 4): 0x08049005>
<PrecordReg (eip): 0x0000fd4c> <PrecordReg (eip): 0x08057d2c>
Register value divergence (0x000000000000fd4c -> 0x000000000000fd4c != 0x0000000008057d2c)
…and it diverged immediately.
Another tricky point is that we’re trying to compare two execution traces across two different programs, which don’t have the same memory layout.
aln was delinked and relinked as a whole, so the meat of it is essentially the same but moved to a different virtual address.
We’ll account for this by adding mappings from one address space to the other:
$ PYTHONPATH=${HOME}/Documents/pyelftools python3 ./precordcompare.py ${JAGHELLO}/aln.record ${JAGHELLO}/aln.elf.record --mapping 0x1020:0x08049000:0x01e973 --check-register-mappings eip
<PrecordReg (esp): 0xf7d8bf24> <PrecordReg (esp): 0xffffce90>
<PrecordMem (00000000f7d8bf24, 4): 0x00001025> <PrecordMem (00000000ffffce90, 4): 0x08049005>
<PrecordReg (eip): 0x0000fd4c> <PrecordReg (eip): 0x08057d2c>
<PrecordEnd: signal=0x00000000 count=1> <PrecordEnd: signal=0x00000000 count=1>
<PrecordReg (esp): 0xf7d8bf20> <PrecordReg (esp): 0xffffce8c>
<PrecordMem (00000000f7d8bf20, 4): 0xffffcdd8> <PrecordMem (00000000ffffce8c, 4): 0x00000000>
<PrecordReg (eip): 0x0000fd4d> <PrecordReg (eip): 0x08057d2d>
<PrecordEnd: signal=0x00000000 count=2> <PrecordEnd: signal=0x00000000 count=2>
...
<PrecordReg (esp): 0xf7d8be0c> <PrecordReg (esp): 0xffffcd78>
<PrecordReg (eflags): 0x00000286> <PrecordReg (eflags): 0x00000286>
<PrecordReg (eip): 0x00009da3> <PrecordReg (eip): 0x08051d83>
<PrecordEnd: signal=0x00000000 count=215180> <PrecordEnd: signal=0x00000000 count=215180>
<PrecordReg (eax): 0x5656a200> <PrecordReg (eax): 0x0807a200>
<PrecordReg (eip): 0x00009da7> <PrecordReg (eip): 0x08051d87>
<PrecordEnd: signal=0x00000000 count=215181> <PrecordEnd: signal=0x00000000 count=215181>
<PrecordReg (eax): 0x56562000> <PrecordReg (eax): 0x08070000>
<PrecordReg (eflags): 0x00000206> <PrecordReg (eflags): 0x00000246>
<PrecordReg (eip): 0x00009dab> <PrecordReg (eip): 0x08051d8b>
<PrecordEnd: signal=0x00000000 count=215182> <PrecordEnd: signal=0x00000000 count=215182>
<PrecordReg (eflags): 0x00000246> <PrecordReg (eflags): 0x00000287>
<PrecordReg (eip): 0x00009daf> <PrecordReg (eip): 0x08051d8f>
<PrecordEnd: signal=0x00000000 count=215183> <PrecordEnd: signal=0x00000000 count=215183>
<PrecordReg (eip): 0x00009e28> <PrecordReg (eip): 0x08051d91>
Register value divergence (0x0000000000009e28 -> 0x0000000008051e08 != 0x0000000008051d91)
Aha!
We have an instruction pointer divergence.
Looking backwards at the previous instruction pointer values (0x9e28, 0x9daf, 0x9dab, 0x9da7), we stumble upon the smoking gun: a bad reference.
Finally, by cross-referencing the raw disassembly of the artifacts at hand we can piece together what happened in detail:
$ i686-linux-gnu-objdump -D -b binary -m i386 --adjust-vma=0x1000 aln
...
9da0: 83 c4 10 add $0x10,%esp
9da3: 66 8b 45 fe mov -0x2(%ebp),%ax
9da7: 66 25 00 28 and $0x2800,%ax
9dab: 66 3d 00 20 cmp $0x2000,%ax
9daf: 74 77 je 0x9e28
...
$ i686-linux-gnu-objdump -Dr aln.whole.o
...
8d80: 83 c4 10 add $0x10,%esp
8d83: 66 8b 45 fe mov -0x2(%ebp),%ax
8d87: 66 25 06 00 and $0x6,%ax
8d89: R_386_NONE s_-x:_not_enough_arguments_000027fa
8d8b: 66 3d 00 20 cmp $0x2000,%ax
8d8f: 74 77 je 8e08 <LAB_00009e28>
...
$ i686-linux-gnu-objdump -D aln.elf
...
8051d80: 83 c4 10 add $0x10,%esp
8051d83: 66 8b 45 fe mov -0x2(%ebp),%ax
8051d87: 66 25 06 00 and $0x6,%ax
8051d8b: 66 3d 00 20 cmp $0x2000,%ax
8051d8f: 74 77 je 8051e08 <LAB_00009e28>
...
- The integer constant 0x2800 of the instruction and ax, 0x2800 was misidentified as an address during Ghidra’s static analysis;
- This led to the introduction of a fake reference to the symbol s_t_enough_arguments_000027fa at this location in the Ghidra database;
- That reference caused the relocation synthesizer analyzer to emit a bogus relocation for this integer constant;
- This bogus relocation led to the corruption of the integer constant during the export of aln.whole.o and the linking of aln.elf;
- The instruction with the altered integer constant ultimately led to the observable change in behavior of aln.elf.
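The arithmetic behind this chain can be replayed in a few lines of Python, using values taken from the traces and disassembly above: ax held 0xa200 in both runs. Under REL semantics the addend is stored in the patched field itself, so treating 0x2800 as "symbol 0x27fa plus addend" wrote 0x2800 - 0x27fa = 0x6 into the instruction, and since the relocation was emitted as R_386_NONE, the linker left that 6 untouched. The names below are purely illustrative.

```python
ORIGINAL_MASK = 0x2800   # immediate of `and $0x2800,%ax` in aln
BOGUS_SYMBOL = 0x27fa    # s_-x:_not_enough_arguments_000027fa, the misidentified target
CMP_VALUE = 0x2000       # immediate of `cmp $0x2000,%ax`


def delinked_field(constant, symbol):
    """Implicit REL addend left in the field when a constant is treated as symbol + addend."""
    return constant - symbol


def branch_taken(ax, mask):
    """Model `and mask,%ax; cmp $0x2000,%ax; je ...`: True when the je is taken."""
    return (ax & mask & 0xFFFF) == CMP_VALUE


ax = 0xa200
corrupted_mask = delinked_field(ORIGINAL_MASK, BOGUS_SYMBOL)  # R_386_NONE leaves it as-is
print(hex(corrupted_mask))               # 0x6
print(branch_taken(ax, ORIGINAL_MASK))   # True: aln jumps to 0x9e28
print(branch_taken(ax, corrupted_mask))  # False: aln.elf falls through
```

Since ax & 0x6 can never equal 0x2000, the corrupted branch is never taken in aln.elf, whatever the input.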
Whew!
That was quite the journey to track this down.
The fix here is to remove the bogus reference by clicking on it and hitting the Delete key (or right-clicking it and selecting Delete Memory References), so that it doesn’t end up corrupting the exported object file.
Same software, different case
After another round of debugging aln.elf, we finally end up with the following:
$ rm -f jaghello.cof && make LINK=aln.elf V=1
aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
* ATARI LINKER (Mar 17 1995) *
* Adds from Atari version 1.11 *
* and PC/DOS&Linux ports *
* Copyright 1993-95 Brainstorm. *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof
Sizes: Text Data Bss Syms
(hex) 3C0 410 FA54 1C00
Link complete.
$ file jaghello.cof
jaghello.cof: mc68k COFF object not stripped
$ sha256sum jaghello.cof
f9c8269cdc998de01c0ac7a3e815c16b7ced106e25f10f92a7078c722a220dbb jaghello.cof
We have reproduced jaghello.cof identically with our newly-minted aln.elf port, as per the requirements laid out in part 1.
The files for this case study can be found here: case-study.tar.gz
Conclusion
We have successfully ported the whole of aln from a Linux a.out executable to a modern Linux ELF executable.
Next time, we’ll rip out the old C standard library embedded within aln and twist it further away from its roots.