<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="https://boricj.net/feed.xml" rel="self" type="application/atom+xml" /><link href="https://boricj.net/" rel="alternate" type="text/html" /><updated>2024-02-23T06:00:06+01:00</updated><id>https://boricj.net/feed.xml</id><title type="html">boricj’s entropy-increasing blog</title><subtitle>Increasing entropy inside the universe.</subtitle><author><name>Jean-Baptiste Boric</name><email>jean.baptiste.boric@gmail.com</email></author><entry><title type="html">Decompiling Tenchu: Stealth Assassins part 2: grabbing the files</title><link href="https://boricj.net/tenchu1/2024/02/19/part-2.html" rel="alternate" type="text/html" title="Decompiling Tenchu: Stealth Assassins part 2: grabbing the files" /><published>2024-02-19T01:00:00+01:00</published><updated>2024-02-19T01:00:00+01:00</updated><id>https://boricj.net/tenchu1/2024/02/19/part-2</id><content type="html" xml:base="https://boricj.net/tenchu1/2024/02/19/part-2.html"><![CDATA[<p><a href="/tenchu1/2024/02/12/part-1.html">Previously</a> in this <a href="/tenchu1/2024/02/05/introduction.html">series of articles</a>, we’ve answered some philosophical questions about reverse-engineering and practical ones about <em>Tenchu: Stealth Assassins</em>.
It’s time to make some actual progress on this project by acquiring the files to reverse-engineer, starting with the games themselves.</p>

<h2 id="acquiring-artifacts">Acquiring artifacts</h2>

<p>First order of business is to acquire the original media in a format we can work with.
There are a couple of options available here:</p>
<ul>
  <li>buying then downloading it from a digital store if available ;</li>
  <li>acquiring a physical copy (most likely on the second hand market in this case) and using a CD drive to rip it to an image file.</li>
</ul>

<div class="box box-warning">
  <p>Other ways of “acquiring” old video games may have legal issues surrounding them depending on your jurisdiction.
Such issues will not be covered here.</p>
</div>

<p>That being said, let’s assume that we do have a complete set of ISOs for the various versions of the game.
We <em>could</em> go straight for the reverse-engineering part, but where’s the fun in that?</p>

<h2 id="running-the-artifacts">Running the artifacts</h2>

<p>To run <em>Tenchu: Stealth Assassins</em> in a controlled environment, I’ll use DuckStation, a modern PlayStation 1 emulator.
The main reason is that I <a href="https://github.com/stenzek/duckstation/pull/1244">contributed a GDB stub</a> a couple of years ago, which will be useful later on to introspect the game at runtime through a debugger.</p>

<p>Getting it to run on a Debian 11 system is a bit of a chore:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt-get install \
    build-essential \
    cmake \
    ninja-build \
    libsdl2-dev \
    libcurl4-openssl-dev \
    libqt6-base-dev \
    qt6-base-dev \
    qt6-tools-dev \
    qt6-base-private-dev
</code></pre></div></div>

<p>It takes some patching to get it to build on this system:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh">diff --git a/CMakeLists.txt b/CMakeLists.txt
index 47d00879..6d5fe750 100644
</span><span class="gd">--- a/CMakeLists.txt
</span><span class="gi">+++ b/CMakeLists.txt
</span><span class="p">@@ -47,7 +47,7 @@</span> endif()
 
 # Required libraries.
 if(ENABLE_SDL2)
<span class="gd">-  find_package(SDL2 2.28.5 REQUIRED)
</span><span class="gi">+  find_package(SDL2 2.0 REQUIRED)
</span> endif()
 if(NOT WIN32 AND NOT ANDROID)
   find_package(CURL REQUIRED)
<span class="p">@@ -59,7 +59,7 @@</span> if(NOT WIN32 AND NOT ANDROID)
   endif()
 endif()
 if(BUILD_QT_FRONTEND)
<span class="gd">-  find_package(Qt6 6.5.3 COMPONENTS Core Gui Widgets Network LinguistTools REQUIRED)
</span><span class="gi">+  find_package(Qt6 6.4.0 COMPONENTS Core Gui Widgets Network LinguistTools REQUIRED)
</span> endif()
 
 
<span class="gh">diff --git a/src/duckstation-qt/logwindow.cpp b/src/duckstation-qt/logwindow.cpp
index 64b95616..7d4cdcbf 100644
</span><span class="gd">--- a/src/duckstation-qt/logwindow.cpp
</span><span class="gi">+++ b/src/duckstation-qt/logwindow.cpp
</span><span class="p">@@ -274,7 +274,7 @@</span> void LogWindow::logCallback(void* pUserParam, const char* channelName, const cha
 
   QString qmessage;
   qmessage.reserve(message.length() + 1);
<span class="gd">-  qmessage.append(QUtf8StringView(message.data(), message.length()));
</span><span class="gi">+  qmessage.append(message.data());
</span>   qmessage.append(QChar('\n'));
 
   const QLatin1StringView qchannel((level &lt;= LOGLEVEL_PERF) ? functionName : channelName);
<span class="gh">diff --git a/src/util/sdl_input_source.cpp b/src/util/sdl_input_source.cpp
index 4db534bb..a07a3209 100644
</span><span class="gd">--- a/src/util/sdl_input_source.cpp
</span><span class="gi">+++ b/src/util/sdl_input_source.cpp
</span><span class="p">@@ -255,7 +255,7 @@</span> void SDLInputSource::SetHints()
   }
 
   SDL_SetHint(SDL_HINT_JOYSTICK_HIDAPI_PS4_RUMBLE, m_controller_enhanced_mode ? "1" : "0");
<span class="gd">-  SDL_SetHint(SDL_HINT_JOYSTICK_HIDAPI_PS5_RUMBLE, m_controller_enhanced_mode ? "1" : "0");
</span><span class="gi">+  //SDL_SetHint(SDL_HINT_JOYSTICK_HIDAPI_PS5_RUMBLE, m_controller_enhanced_mode ? "1" : "0");
</span>   // Enable Wii U Pro Controller support
   // New as of SDL 2.26, so use string
   SDL_SetHint("SDL_JOYSTICK_HIDAPI_WII", "1");
</code></pre></div></div>

<p>But after that, it builds and runs without <em>too</em> many issues:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cmake -B build/ -G Ninja .
$ ninja -C build/
$ ln -s ~/Documents/duckstation/build/bin/duckstation-qt ~/.local/bin/duckstation-qt
</code></pre></div></div>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/tenchu1/assets/part-2/duckstation.png" />
                <figcaption>Figure 1: DuckStation</figcaption>
            </figure>
        </div>
    </label>
</div>

<div class="box box-information">
  <p>The main reason for this janky setup instead of using the official prebuilt artifacts is to enable local development of DuckStation.
In particular, I anticipate that the bare-bones GDB stub I wrote several years ago will probably need some improvements to support this decompilation project.</p>
</div>

<h2 id="extracting-a-playstation-iso">Extracting a PlayStation ISO</h2>

<p>So, we have an image of a video game and we even checked that it works.
At the moment it’s just a big blob of bytes, but we can start tearing it apart with <a href="https://github.com/m35/jpsxdec">jPSXdec</a>, a tool to extract and convert files from PlayStation titles, free for <a href="https://github.com/m35/jpsxdec/blob/readme/.github/LICENSE.md">non-commercial uses</a>.</p>

<p>After grabbing the <a href="https://github.com/m35/jpsxdec/releases">latest release</a>, we can run the tool on the first track of <em>Rittai Ninja Katsugeki Tenchu: Shinobi Gaisen</em> and extract the files:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/tenchu1/assets/part-2/jpsxdec.png" />
                <figcaption>Figure 2: jPSXdec</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>We get a whole bunch of files, which we can quickly triage as follows:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">SYSTEM.CNF</code>: this text file contains <a href="https://problemkaputt.de/psxspx-cdrom-file-playstation-exe-and-system-cnf.htm">instructions</a> for the BIOS on how to launch the game ;</li>
  <li><code class="language-plaintext highlighter-rouge">SLPS_019.01</code>: this is the PSX-EXE executable to launch as specified by <code class="language-plaintext highlighter-rouge">SYSTEM.CNF</code> ;</li>
  <li><code class="language-plaintext highlighter-rouge">TENCHU/*.EXE</code>: these are executable files ;</li>
  <li><code class="language-plaintext highlighter-rouge">TENCHU/MOVIE/*</code>: these are full motion videos ;</li>
  <li><code class="language-plaintext highlighter-rouge">TENCHU/XA/*</code>: these are audio files used for music and cutscenes ;</li>
  <li><code class="language-plaintext highlighter-rouge">TENCHU/DATA.VOL</code>: this seems to be an archive file in an unknown format, as jPSXdec detects over 880 images contained inside of it.</li>
</ul>
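
<p>As a rough illustration of what the BIOS reads from <code class="language-plaintext highlighter-rouge">SYSTEM.CNF</code>, here is a minimal Python sketch that parses its <code class="language-plaintext highlighter-rouge">KEY = VALUE</code> lines and pulls out the name of the boot executable.
The sample contents are an assumption modelled on the documented format, not a dump of the actual file on this disc, and the helper names are made up for this example.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def parse_system_cnf(text):
    """Parse the KEY = VALUE lines of a SYSTEM.CNF file into a dict."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        settings[key.strip().upper()] = value.strip()
    return settings


def boot_executable(settings):
    """Return the bare file name from a BOOT entry like cdrom:\\NAME;1."""
    path = settings["BOOT"]
    name = path.split(":", 1)[-1]   # drop the "cdrom:" device prefix
    name = name.lstrip("\\/")       # drop leading path separators
    return name.split(";", 1)[0]    # drop the ISO 9660 ";1" version suffix


# Hypothetical sample, assumed to resemble the file on the disc.
SAMPLE = "BOOT = cdrom:\\SLPS_019.01;1\r\nTCB = 4\r\nEVENT = 16\r\nSTACK = 801fff00\r\n"

settings = parse_system_cnf(SAMPLE)
print(boot_executable(settings))
</code></pre></div></div>

<p>With a real disc, the text would come from the extracted <code class="language-plaintext highlighter-rouge">SYSTEM.CNF</code> file instead of the hard-coded sample.</p>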

<div class="box box-information">
  <p>Most, but not all versions of <em>Tenchu: Stealth Assassins</em> follow this naming pattern.</p>
</div>

<h2 id="whats-that-file">What’s that file?</h2>

<p>There is one file that jPSXdec can’t process: <code class="language-plaintext highlighter-rouge">TENCHU/CD.CA</code>.
This file is located inside the second track and therefore jPSXdec can’t extract it when opening the first track.
It also fails to recognize any file or data when we open the second track.</p>

<p>Hmmm…
What does the cue file for this CD tell us?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat 'Rittai Ninja Katsugeki - Tenchu - Shinobi Gaisen (Japan).cue'
FILE "Rittai Ninja Katsugeki - Tenchu - Shinobi Gaisen (Japan) (Track 1).bin" BINARY
  TRACK 01 MODE2/2352
    INDEX 01 00:00:00
FILE "Rittai Ninja Katsugeki - Tenchu - Shinobi Gaisen (Japan) (Track 2).bin" BINARY
  TRACK 02 AUDIO
    INDEX 00 00:00:00
    INDEX 01 00:02:00
</code></pre></div></div>
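
<p>A cue sheet like this one is simple enough to read mechanically.
The following Python sketch lists each track with its mode and backing file; the file names are shortened placeholders and <code class="language-plaintext highlighter-rouge">parse_cue</code> is a hypothetical helper that only understands the <code class="language-plaintext highlighter-rouge">FILE</code> and <code class="language-plaintext highlighter-rouge">TRACK</code> commands seen above, not the full cue sheet syntax.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Same layout as the cue sheet above, with shortened placeholder file names.
CUE = '''FILE "Game (Track 1).bin" BINARY
  TRACK 01 MODE2/2352
    INDEX 01 00:00:00
FILE "Game (Track 2).bin" BINARY
  TRACK 02 AUDIO
    INDEX 00 00:00:00
    INDEX 01 00:02:00
'''


def parse_cue(text):
    """Return a list of (track number, track mode, file name) tuples."""
    tracks = []
    current_file = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "FILE":
            # The file name is the quoted part of the line.
            current_file = re.search(r'"(.*)"', line).group(1)
        elif words[0] == "TRACK":
            tracks.append((int(words[1]), words[2], current_file))
    return tracks


for number, mode, name in parse_cue(CUE):
    kind = "audio" if mode == "AUDIO" else "data"
    print(number, kind, name)
</code></pre></div></div>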

<p>It’s most likely a CD audio track.
Assuming that the track file contains raw CD audio data, we can play it using the <code class="language-plaintext highlighter-rouge">play</code> tool from <code class="language-plaintext highlighter-rouge">sox</code> with the right parameters:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt-get install sox
$ play --type raw --channels 2 --rate 44100 --encoding signed --bits 16 --endian little 'Rittai Ninja Katsugeki - Tenchu - Shinobi Gaisen (Japan) (Track 2).bin'
play WARN alsa: can't encode 0-bit Unknown or not applicable

Rittai Ninja Katsugeki - Tenchu - Shinobi Gaisen (Japan) (Track 2).bin:

 File Size: 7.81M     Bit Rate: 1.41M
  Encoding: Signed PCM    
  Channels: 2 @ 16-bit   
Samplerate: 44100Hz      
Replaygain: off         
  Duration: 00:00:44.29
In:100%  00:00:44.29 [00:00:00.00] Out:1.95M [      |      ] Hd:0.1 Clip:0    
Done.
</code></pre></div></div>
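
<p>The parameters given to <code class="language-plaintext highlighter-rouge">play</code> describe standard CD audio: 44100 samples per second, two channels and 16 bits per sample, which works out to 176400 bytes per second.
As a sketch under those assumptions, this Python snippet computes the playback time of a raw track from its size and wraps raw samples in a WAV container using the standard <code class="language-plaintext highlighter-rouge">wave</code> module; a second of silence stands in for the actual track data.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import io
import wave

SAMPLE_RATE = 44100    # samples per second
CHANNELS = 2           # stereo
SAMPLE_WIDTH = 2       # bytes per sample (16-bit)
BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH  # 176400


def duration_seconds(byte_count):
    """Playback time of raw CD audio data of the given size."""
    return byte_count / BYTES_PER_SECOND


def raw_to_wav(pcm_bytes, destination):
    """Wrap raw little-endian PCM data in a WAV container."""
    with wave.open(destination, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_bytes)


# One second of silence stands in for the contents of the track file.
silence = bytes(BYTES_PER_SECOND)
buffer = io.BytesIO()
raw_to_wav(silence, buffer)
print(duration_seconds(len(silence)))
</code></pre></div></div>

<p>Plugging in the file size reported by <code class="language-plaintext highlighter-rouge">sox</code>, roughly 7.81 million bytes divided by 176400 bytes per second comes out at about 44 seconds, in line with the duration shown above.</p>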

<div class="box box-warning">
  <p>Be <strong>very</strong> careful when playing raw files as audio, especially when using headphones.
A wrong guess can produce ear-shattering static noise.</p>
</div>

<p>It’s a voice recording of a conversation between Rikimaru, Ayame and Princess Kiku in Japanese.
I do not speak this language and therefore can’t understand what is being said, but I can hear the words “game disc”, “PlayStation” and “CD player”.
This is most likely a message played when the disc is inserted in a CD player, telling the user to put it in a PlayStation system.</p>

<div class="box box-information">
  <p>This was a common type of easter egg for CD-ROM video games in the 1990s.
Some well-known examples are <a href="https://www.youtube.com/watch?v=CQLkslOsAiM">Skies of Arcadia</a> for the Dreamcast or <a href="https://www.youtube.com/watch?v=slwmJLaVrdE">Lego Island</a> for the PC.</p>
</div>

<h2 id="conclusion">Conclusion</h2>

<p>We have the game, we have extracted its files and we have uncovered an audio easter egg when triaging them.
Next time, we’ll set up a reverse-engineering environment with Ghidra so that we can start figuring out how this game actually runs.</p>

<div style="display: grid; grid-template-columns: 1fr 1fr; margin-top: 1ch; margin-bottom: 1ch;">
<div style="padding-right: 15px;">


<a class="prev" href="/tenchu1/2024/02/12/part-1.html">&laquo; Decompiling Tenchu: Stealth Assassins part 1: what's decompiling?</a>

</div>
<div style="padding-left: 15px; text-align: right;">

</div>
</div>]]></content><author><name>Jean-Baptiste Boric</name></author><category term="tenchu1" /><summary type="html"><![CDATA[Previously in this series of articles, we’ve answered some philosophical questions about reverse-engineering and practical ones about Tenchu: Stealth Assassins. It’s time to start progress on this project by acquiring files to reverse-engineer, starting with the games themselves.]]></summary></entry><entry><title type="html">Decompiling Tenchu: Stealth Assassins part 1: what’s decompiling?</title><link href="https://boricj.net/tenchu1/2024/02/12/part-1.html" rel="alternate" type="text/html" title="Decompiling Tenchu: Stealth Assassins part 1: what’s decompiling?" /><published>2024-02-12T13:00:00+01:00</published><updated>2024-02-12T13:00:00+01:00</updated><id>https://boricj.net/tenchu1/2024/02/12/part-1</id><content type="html" xml:base="https://boricj.net/tenchu1/2024/02/12/part-1.html"><![CDATA[<p>After <a href="/tenchu1/2024/02/05/introduction.html">laying some context for this project</a>, it’s time to <strong>not</strong> start the decompilation of <em>Tenchu: Stealth Assassins</em> just yet.
We need to do some recon first before jumping head-first into this project and answer some important questions.</p>

<div class="box box-information">
  <p>This is my own personal take on a topic I’ve never attempted before.
It is not intended to be an authoritative document on the subject but rather a brain-dump on how I approached this problem.</p>
</div>

<h2 id="what-is-decompilation">What is decompilation?</h2>

<p>Let’s take this diagram from <a href="/reverse-engineering/2023/05/15/part-2.html">part 2</a> of my series of articles about reverse-engineering:</p>
<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/reverse-engineering/assets/diagram-case-study-build-workflow.svg" />
                <figcaption>Figure 1: Diagram of the ASCII table case study build workflow.</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>When a developer built a program, the <em>original</em> source code of the program went through a number of steps (compiler, assembler, linker…) before the artifact we’re interested in was produced.
Each step is lossy, with less and less information retained as it goes down the chain, leaving us with just the end result.</p>

<p>Decompilation is the act of taking a binary artifact (on the right) and creating some sort of source code out of it (on the left).
Depending on the goal and effort spent, it can go from a loosely inspired reimplementation using the original artifact as a reference, all the way to a source tree and build procedure that can reproduce byte-for-byte the original artifact.</p>

<div class="box box-information">
  <p>In this article, I use reimplementation (clean-room or otherwise) and decompilation interchangeably.
While technically different from a pure decompilation effort, the end result is essentially the same for the end-user.</p>
</div>

<p>Even perfect decompilation can’t recover things like comments or (if debugging symbols are missing) variable names, due to the lossy nature of toolchains.
It’s a recreation of <strong>one</strong> <em>possible</em> source tree, not a recovery of <strong>the</strong> <em>original</em> source code.</p>

<h2 id="so-i-want-to-decompile-a-video-game">So I want to decompile a video game?</h2>

<p>At this point, it’s time to start asking questions and gather as much intel as possible.</p>

<div class="box box-warning">
  <p>Performing acts of reverse-engineering or decompilation may have legal issues surrounding them depending on your jurisdiction.
Such issues will not be covered here.</p>
</div>

<h3 id="can-i-decompile">Can I decompile?</h3>

<p>Reverse-engineering generally calls for skills that are rarely exercised in everyday software development work, which is itself a prerequisite in my opinion (it’s hard to pick apart a program if you don’t know how programs are made in the first place).
That being said, there are plenty of resources online like <a href="https://www.youtube.com/@nathanbaggs">videos</a>, <a href="https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/">write-ups</a> or <a href="https://neuviemeporte.github.io/category/f15-se2">other decompilation projects</a> that show the ropes.</p>

<p>But like any skill it also takes practice to learn. You might want to start with smaller projects or <a href="/reverse-engineering/2023/05/01/introduction.html">case studies</a> if you lack experience before taking on a big project.
Additionally, prior knowledge of things like the instruction set architecture or technical details about the target platform will be very helpful, as programs do not execute in a vacuum.</p>

<p>Ultimately, there is only one way to find out if you can indeed decompile: give it a try.
At least you’re bound to learn something on the way, regardless of the outcome.</p>

<h3 id="what-to-decompile">What to decompile?</h3>

<p>Before jumping head first into the thick of it, have a look around:</p>

<ul>
  <li>Has someone already done, or is someone currently working on, a decompilation or reimplementation project?</li>
  <li>Has the source code been open-sourced or leaked at some point?</li>
  <li>Are there multiple versions, builds or ports of the video game, including demos or betas?</li>
  <li>Does the game use third-party libraries or SDKs?</li>
  <li>Is there a build with <a href="https://www.retroreversing.com/ps1-debug-symbols">debugging symbols or a linker map file</a>?</li>
  <li>Is there a build with a debug menu or <a href="https://tcrf.net/Tenchu:_Stealth_Assassins">leftover data</a>?</li>
  <li>Are there any editors, mods, hacks or fan translations?</li>
  <li>Are there any GameShark codes or RetroAchievements?</li>
  <li>…</li>
</ul>

<p>Looking around might help uncover important artifacts or helpful knowledge that you might miss otherwise.
Needless to say, it is also necessary to acquire artifacts before one can start reverse-engineering or decompiling them.</p>

<p>Here, we have four main releases, not distinguishing between various revisions and demos:</p>
<ul>
  <li><em>Rittai Ninja Katsugeki Tenchu</em>, the original JP release with eight levels ;</li>
  <li><em>Tenchu: Stealth Assassins</em>, released for NA and EU with two additional levels and localizations ;</li>
  <li><em>Rittai Ninja Katsugeki Tenchu: Shinobi Gaisen</em>, an updated re-release for JP with a level editor ;</li>
  <li><em>Rittai Ninja Katsugeki Tenchu: Shinobi Hyakusen</em>, a standalone level pack released for JP.</li>
</ul>

<p>As mentioned before, the game is exclusive to the PlayStation, a <a href="https://www.copetti.org/writings/consoles/playstation/">well-documented</a> video game console with lots of resources about it online.
It also uses <a href="https://www.psxdev.net/help/psyq_install.html">PSY-Q</a>, the official PlayStation SDK used back in the day to make commercial games.
I did not find any debugging symbols or map files, but there are lots of <a href="https://www.cheatcc.com/articles/tenchu-stealth-assassins-cheats-codes-cheat-codes-for-playstation-psx-psx-codes/">GameShark codes</a>, <a href="https://retroachievements.org/game/11300">RetroAchievements</a> and an extensive <a href="https://gamefaqs.gamespot.com/ps/198911-tenchu-stealth-assassins/faqs/4279">debug menu</a>.</p>

<h3 id="how-to-decompile">How to decompile?</h3>

<p>Unless you enjoy staring at raw hexadecimal dumps, you probably want to use tools to help the reverse-engineering and decompilation project.
There are <em>lots and lots</em> of tools out there for various purposes (disassemblers, decompilers, asset extractors, level editors, …), to the point where it would be futile to try and make a list here.</p>

<p>Be on the lookout for anything that could be useful in case someone already covered that use-case.
Check out other similar decompilation projects for inspiration and guidance.
You might also need to write some tools yourself at some point.</p>

<p>I will introduce tools as they are used, but most of the reverse-engineering work will likely be done with <a href="https://ghidra-sre.org/">Ghidra</a>, an open-source software reverse engineering framework.</p>

<h3 id="where-to-decompile">Where to decompile?</h3>

<p>Simply put, what is the end goal?</p>
<ul>
  <li>Is it a pristine preservation effort of the original game, like for <a href="https://github.com/n64decomp/sm64">Super Mario 64</a>?</li>
  <li>Is it a port of the vanilla experience but with bug fixes and improvements, like for <a href="https://github.com/OpenDriver2/REDRIVER2">REDRIVER2</a>?</li>
  <li>Is it an effort to recreate and modernize a classic game, like for <a href="https://www.openra.net/">Command &amp; Conquer</a>?</li>
  <li>…</li>
</ul>

<p>For this project, I’m aiming for something roughly similar in spirit to REDRIVER2.
Most of the effort is expected to be centered around <em>Rittai Ninja Katsugeki Tenchu: Shinobi Gaisen</em>, which is the latest release of this game.
However, I’ll also take a look at other editions as I see fit, like looting the title screen from <em>Tenchu: Stealth Assassins</em> for example.</p>

<h2 id="conclusion">Conclusion</h2>

<p>We went through most of the interrogative words regarding this project and came up with some answers.
Next time, we’ll actually start reverse-engineering activities by extracting assets from artifacts.</p>

<div style="display: grid; grid-template-columns: 1fr 1fr; margin-top: 1ch; margin-bottom: 1ch;">
<div style="padding-right: 15px;">


<a class="prev" href="/tenchu1/2024/02/05/introduction.html">&laquo; Decompiling Tenchu: Stealth Assassins: introduction</a>

</div>
<div style="padding-left: 15px; text-align: right;">


<a href="/tenchu1/2024/02/19/part-2.html">Decompiling Tenchu: Stealth Assassins part 2: grabbing the files &raquo;</a>

</div>
</div>]]></content><author><name>Jean-Baptiste Boric</name></author><category term="tenchu1" /><summary type="html"><![CDATA[After laying some context for this project, it’s time to not start the decompilation of Tenchu: Stealth Assassins just yet. We need to do some recon first before jumping head-first into this project and answer some important questions.]]></summary></entry><entry><title type="html">Decompiling Tenchu: Stealth Assassins: introduction</title><link href="https://boricj.net/tenchu1/2024/02/05/introduction.html" rel="alternate" type="text/html" title="Decompiling Tenchu: Stealth Assassins: introduction" /><published>2024-02-05T13:00:00+01:00</published><updated>2024-02-05T13:00:00+01:00</updated><id>https://boricj.net/tenchu1/2024/02/05/introduction</id><content type="html" xml:base="https://boricj.net/tenchu1/2024/02/05/introduction.html"><![CDATA[<p><em>Tenchu: Stealth Assassins</em> is a PlayStation stealth action video game from 1998.</p>

<p>I could go on a poetic throwback to my childhood and describe in detail a feeling of nostalgia for a 25-year-old entertainment product, displayed on a 1980s CRT television and punctuated by the noises of a PlayStation optical drive (something that emulators fail to recreate), but I’ll spare you the brunt of it.</p>

<p>What I will say is that this is inspired by video game decompilation projects of the past few years, like <a href="https://github.com/n64decomp/sm64">Super Mario 64</a> and <a href="https://github.com/OpenDriver2/REDRIVER2">Driver 2</a>.
However, I do not have it in me to make a perfect, byte-for-byte decompilation like the Super Mario 64 one.
Neither do I want to stare at a Ghidra window for an eternity, trying to blindly reimplement a source tree for a game <a href="https://youtu.be/caKC1s2nhyg?si=qbG0v28C3yUn09mb&amp;t=1665">held together with glue and duct tape</a>.</p>

<p>These considerations led me on a lengthy side-quest to try and find a solution to this conundrum…
But now, after spending <a href="https://github.com/boricj/ghidra-unlinker-scripts">nearly</a> <a href="https://github.com/boricj/ghidra/tree/feature/elfrelocatebleobjectexporter">two</a> <a href="https://github.com/boricj/ghidra-delinker-extension">years</a> prototyping tools and writing <a href="/reverse-engineering/2023/05/01/introduction.html">case</a> <a href="/atari-jaguar-sdk/2023/11/27/introduction.html">studies</a>, I think I have found the edge I’ve been looking for.</p>

<div class="box box-warning">
  <p>My objective is to create a port of this game by <em>delinking</em> it back into object files and then crafting a working Linux MIPS executable out of the pieces.
From there, I intend to rewrite each object file <em>Ship of Theseus</em>-style until I have a complete and portable source code tree.</p>
</div>

<p>As far as I can tell this is unlike any other decompilation project out there, so this is completely uncharted territory.
Hopefully, this approach will allow me to divide-and-conquer this effort into manageable chunks and make iterative progress possible, in order to keep myself motivated.</p>

<div class="box box-information">
  <p><em>Lord Gohda expects much of you.
The enemy will be skillful and ruthless.
You must be prepared, physically and spiritually.
You must focus, you must train hard and train well.</em></p>
</div>

<ul>
  
    <li>
      <a href="/tenchu1/2024/02/05/introduction.html">Decompiling Tenchu: Stealth Assassins: introduction</a>
    </li>
  
    <li>
      <a href="/tenchu1/2024/02/12/part-1.html">Decompiling Tenchu: Stealth Assassins part 1: what's decompiling?</a>
    </li>
  
    <li>
      <a href="/tenchu1/2024/02/19/part-2.html">Decompiling Tenchu: Stealth Assassins part 2: grabbing the files</a>
    </li>
  
</ul>

<div style="display: grid; grid-template-columns: 1fr 1fr; margin-top: 1ch; margin-bottom: 1ch;">
<div style="padding-right: 15px;">

</div>
<div style="padding-left: 15px; text-align: right;">


<a href="/tenchu1/2024/02/12/part-1.html">Decompiling Tenchu: Stealth Assassins part 1: what's decompiling? &raquo;</a>

</div>
</div>]]></content><author><name>Jean-Baptiste Boric</name></author><category term="tenchu1" /><summary type="html"><![CDATA[Tenchu: Stealth Assassins is a PlayStation stealth action video game from 1998.]]></summary></entry><entry><title type="html">Porting the Atari Jaguar SDK part 5: I have a feeling we’re not on Linux anymore</title><link href="https://boricj.net/atari-jaguar-sdk/2024/01/02/part-5.html" rel="alternate" type="text/html" title="Porting the Atari Jaguar SDK part 5: I have a feeling we’re not on Linux anymore" /><published>2024-01-02T13:00:00+01:00</published><updated>2024-01-02T13:00:00+01:00</updated><id>https://boricj.net/atari-jaguar-sdk/2024/01/02/part-5</id><content type="html" xml:base="https://boricj.net/atari-jaguar-sdk/2024/01/02/part-5.html"><![CDATA[<p><a href="/atari-jaguar-sdk/2024/01/01/part-4.html">Previously</a> in this <a href="/atari-jaguar-sdk/2023/11/27/introduction.html">series of articles</a>, we turned <code class="language-plaintext highlighter-rouge">aln</code>, a statically-linked Linux a.out program, into a dynamically-linked Linux ELF program with the power of delinking.
In this part, we’ll make a native port of <code class="language-plaintext highlighter-rouge">aln</code> to Windows, despite not having access to its source code.</p>

<div class="box box-warning">
  <p>This blog post demonstrates a complete disregard towards ABIs, computer engineering conventions and common sense in the name of science.
Its contents may be dangerous to junior programmers, academics or the faint of heart.</p>

  <p>As such, this is your one and only warning to look away before I start butcherin’.</p>
</div>

<h2 id="windows-whats-that">Windows? What’s that?</h2>

<p>Normally, porting software requires access to its source code in order to build it for a new platform.
Here, we do not have such luxuries and must make do with just a binary executable artifact for Linux.
From the previous parts we have created <code class="language-plaintext highlighter-rouge">aln.o</code>, a “normal” Linux ELF object file of a complete program created with <a href="/reverse-engineering/2023/08/28/part-10.html">the power of delinking</a> from the carcass of <code class="language-plaintext highlighter-rouge">aln</code>, an actual Linux program.</p>

<p>Here comes our first problem: how do we get from an ELF object file for Linux to a Windows executable?</p>

<h3 id="for-want-of-a-coff-file">For want of a COFF file</h3>

<p>Traditionally, Windows toolchains use the COFF file format for objects.
<a href="https://github.com/boricj/ghidra-delinker-extension">My Ghidra extension</a> currently only has an object file exporter for the ELF file format.
In theory, I would have to implement a full-blown COFF object file exporter in order to produce an object file that Windows toolchains can grok, but that’s a <em>lot</em> of work.</p>
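
<p>To make the difference between these formats a bit more concrete, here is a small Python sketch that guesses the format of a file from its first few bytes.
It is only an illustration: <code class="language-plaintext highlighter-rouge">identify_object_format</code> is a hypothetical helper, it only knows about 32-bit x86 COFF objects, and the byte strings are hand-written stand-ins rather than headers taken from real files.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def identify_object_format(header):
    """Guess a file format from the first bytes of a file."""
    if header[:4] == b"\x7fELF":
        return "ELF"
    if header[:2] == b"MZ":
        return "PE image"
    # COFF objects have no magic string; they start with a machine type,
    # 0x014c for 32-bit x86, stored as a little-endian 16-bit value.
    if int.from_bytes(header[:2], "little") == 0x014C:
        return "COFF object (i386)"
    return "unknown"


# Hand-written header prefixes, used here instead of real files.
print(identify_object_format(b"\x7fELF\x01\x01\x01"))
print(identify_object_format(b"\x4c\x01\x03\x00"))
</code></pre></div></div>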

<p>Fortunately for me, I’ve forsaken tradition in this series of articles and there is one toolchain targeting Windows out there that can ingest ELF object files: MinGW-w64.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt-get install mingw-w64
</code></pre></div></div>

<div class="box box-information">
  <p>Don’t worry, the final artifact will be a <em>bona fide</em> PE executable for Windows, but I’ll take whatever shortcuts I can in order to finish this in a reasonable amount of time.</p>

  <p>This is also why I’ll run <code class="language-plaintext highlighter-rouge">aln.exe</code> with WINE on Linux at first: I’m not going to bother dealing with Windows until I have to.</p>
</div>

<p>Now, we’ll take <code class="language-plaintext highlighter-rouge">aln.o</code> and the couple of stubs from the last part and just let’er rip:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ i686-w64-mingw32-gcc -g -no-pie -fno-pic -o aln.exe aln.o ctype.o FUN_0000fba0.c DAT_0001de20.c
/usr/bin/i686-w64-mingw32-ld: aln.o: in function `FUN_00001584':
aln.o:(.text+0x51f): undefined reference to `IO_printf'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x529): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x533): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x53d): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x547): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x551): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:aln.o:(.text+0x55b): more undefined references to `puts' follow
...
/usr/bin/i686-w64-mingw32-ld: aln.o: in function `FUN_0000fa3c':
aln.o:(.text+0xe9b3): undefined reference to `malloc'
/usr/bin/i686-w64-mingw32-ld: /usr/lib/gcc/i686-w64-mingw32/10-win32/../../../../i686-w64-mingw32/lib/../lib/libmingw32.a(lib32_libmingw32_a-crt0_c.o): in function `main':
./build/i686-w64-mingw32-i686-w64-mingw32-crt/./mingw-w64-crt/crt/crt0_c.c:18: undefined reference to `WinMain@16'
</code></pre></div></div>

<p>Well, basically none of the undefined symbols inside <code class="language-plaintext highlighter-rouge">aln.o</code> matched anything inside MinGW-w64 and no, it’s not just because I’m using a Linux object file on a Windows toolchain.</p>

<p>When I originally exported <code class="language-plaintext highlighter-rouge">aln.o</code>, I actually stripped the leading underscore from all symbols.
That’s useful when going from glibc 1.xx to glibc 2.xx since modern Linux toolchains don’t use an underscore prefix for C symbol names.
Unfortunately, that’s not the case for MinGW-w64, which does prefix C symbols with an underscore.</p>
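
<p>GCC-based toolchains expose the prefix they put on C symbols through the predefined <code class="language-plaintext highlighter-rouge">__USER_LABEL_PREFIX__</code> macro, so the mismatch can be checked from C itself. This is only a minimal sketch; the helper name <code class="language-plaintext highlighter-rouge">c_symbol_prefix()</code> is made up for illustration and isn’t part of the case study files:</p>

```c
/* Two-step stringification, so that the macro argument is expanded
 * before it is turned into a string literal. */
#define STRINGIFY_EXPANDED(x) #x
#define STRINGIFY(x) STRINGIFY_EXPANDED(x)

/* Hypothetical helper: returns the prefix that the compiler prepends
 * to C symbol names, as reported by __USER_LABEL_PREFIX__. */
const char *c_symbol_prefix(void)
{
    return STRINGIFY(__USER_LABEL_PREFIX__);
}
```

<p>Built with an ELF toolchain such as <code class="language-plaintext highlighter-rouge">i686-linux-gnu-gcc</code> this should return an empty string, whereas <code class="language-plaintext highlighter-rouge">i686-w64-mingw32-gcc</code> should return a single underscore.</p>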

<p>Rather than re-exporting <code class="language-plaintext highlighter-rouge">aln.o</code> with the original symbol names left intact, I’ll just use <code class="language-plaintext highlighter-rouge">objcopy</code> to put the leading underscore back on the undefined symbols.
Since I’ll have to <a href="/atari-jaguar-sdk/2024/01/01/part-4.html">make adjustments anyways</a>, I might as well be lazy about it.</p>

<div class="box box-warning">
  <p>In theory, I’m trying to mix two different things that aren’t ABI compatible.
In practice, the calling convention happens to be the same and <code class="language-plaintext highlighter-rouge">aln</code> uses a subset of POSIX that MinGW-w64 does provide, so I’ll start by aliasing equivalent symbols together and hope for the best.</p>

  <p>Remember, the goal of this series is to trick <code class="language-plaintext highlighter-rouge">aln</code> into linking a sample program successfully on new environments.
<em>Anything</em> goes as long as it works, no matter how wrong it is.</p>
</div>

<p>We’ll skip anything that’s just an underscore prefix away and go through the remaining troublemakers.</p>

<h3 id="fun_0000fba0-and-dat_0001de20">FUN_0000fba0() and DAT_0001de20</h3>

<p>The linker doesn’t seem to find the stubs from the previous parts, despite the fact that I do provide these source files on the command line.</p>

<p>I’m not sure why, possibly a side-effect of mixing ELF and COFF object files, but I don’t care: I just want to smash the bits together.
Let’s just cheese it by prefixing these symbols with an underscore:</p>

<table>
  <thead>
    <tr>
      <th>Old symbol name</th>
      <th>New name (MinGW-w64)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">DAT_0001de20</code></td>
      <td><code class="language-plaintext highlighter-rouge">_DAT_0001de20</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">FUN_0000fba0</code></td>
      <td><code class="language-plaintext highlighter-rouge">_FUN_0000fba0</code></td>
    </tr>
  </tbody>
</table>

<h3 id="index--rindex">index() &amp; rindex()</h3>

<p>MinGW-w64 doesn’t appear to provide these two functions:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">char</span> <span class="o">*</span><span class="nf">index</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">,</span> <span class="kt">int</span> <span class="n">c</span><span class="p">);</span>
<span class="kt">char</span> <span class="o">*</span><span class="nf">rindex</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">,</span> <span class="kt">int</span> <span class="n">c</span><span class="p">);</span>
</code></pre></div></div>

<p>These functions search for a character in an ASCIIZ string and come from 4.3BSD.
They were marked legacy in POSIX.1-2001 and removed in POSIX.1-2008, with a recommendation to migrate to <code class="language-plaintext highlighter-rouge">strchr()</code> and <code class="language-plaintext highlighter-rouge">strrchr()</code> respectively:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">char</span> <span class="o">*</span><span class="nf">strchr</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">,</span> <span class="kt">int</span> <span class="n">c</span><span class="p">);</span>
<span class="kt">char</span> <span class="o">*</span><span class="nf">strrchr</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">,</span> <span class="kt">int</span> <span class="n">c</span><span class="p">);</span>
</code></pre></div></div>

<p>Luckily for us, these are drop-in compatible substitutions and MinGW-w64 provides these replacements.
Let’s rename the symbols accordingly:</p>

<table>
  <thead>
    <tr>
      <th>Old symbol name</th>
      <th>New name (MinGW-w64)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">index</code></td>
      <td><code class="language-plaintext highlighter-rouge">_strchr</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">rindex</code></td>
      <td><code class="language-plaintext highlighter-rouge">_strrchr</code></td>
    </tr>
  </tbody>
</table>
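
<p>The renaming works because both pairs of functions share the same signature and semantics. At the source level, the equivalent would be a pair of trivial wrappers, sketched below; the <code class="language-plaintext highlighter-rouge">_compat</code> names are hypothetical and only chosen to avoid clashing with a C library that still ships the originals:</p>

```c
#include <string.h>

/* Hypothetical stand-in for the 4.3BSD index(): returns a pointer to
 * the first occurrence of c in s, or NULL if there is none. */
char *index_compat(const char *s, int c)
{
    return strchr(s, c);
}

/* Hypothetical stand-in for the 4.3BSD rindex(): returns a pointer to
 * the last occurrence of c in s, or NULL if there is none. */
char *rindex_compat(const char *s, int c)
{
    return strrchr(s, c);
}
```

<p>Since the wrappers add nothing on top of the standard functions, aliasing the symbols at link time gets the same result without compiling any extra code.</p>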

<h3 id="stdout-fprintf--co">stdout, fprintf() &amp; Co.</h3>

<p>How do the C standard I/O functions work under MinGW-w64?
I could go spelunking the source code to find out, but I can just ask the toolchain directly by compiling a sample program and analyzing the artifact:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">fprintf</span><span class="p">(</span><span class="n">stdout</span><span class="p">,</span> <span class="s">"Hello, world!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Let’s build it with MinGW-w64 and disassemble the object file with <code class="language-plaintext highlighter-rouge">objdump</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ i686-w64-mingw32-gcc hello-world.c -c -o hello-world.o
$ i686-w64-mingw32-objdump --reloc --disassemble hello-world.o

hello-world.o:     file format pe-i386


Disassembly of section .text:

00000000 &lt;_fprintf&gt;:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 28                sub    $0x28,%esp
   6:   8d 45 10                lea    0x10(%ebp),%eax
   9:   89 45 f0                mov    %eax,-0x10(%ebp)
   c:   8b 45 f0                mov    -0x10(%ebp),%eax
   f:   89 44 24 08             mov    %eax,0x8(%esp)
  13:   8b 45 0c                mov    0xc(%ebp),%eax
  16:   89 44 24 04             mov    %eax,0x4(%esp)
  1a:   8b 45 08                mov    0x8(%ebp),%eax
  1d:   89 04 24                mov    %eax,(%esp)
  20:   e8 00 00 00 00          call   25 &lt;_fprintf+0x25&gt;
                        21: DISP32      ___mingw_vfprintf
  25:   89 45 f4                mov    %eax,-0xc(%ebp)
  28:   8b 45 f4                mov    -0xc(%ebp),%eax
  2b:   c9                      leave  
  2c:   c3                      ret    

0000002d &lt;_main&gt;:
  2d:   55                      push   %ebp
  2e:   89 e5                   mov    %esp,%ebp
  30:   83 e4 f0                and    $0xfffffff0,%esp
  33:   83 ec 10                sub    $0x10,%esp
  36:   e8 00 00 00 00          call   3b &lt;_main+0xe&gt;
                        37: DISP32      ___main
  3b:   c7 04 24 01 00 00 00    movl   $0x1,(%esp)
  42:   a1 00 00 00 00          mov    0x0,%eax
                        43: dir32       __imp____acrt_iob_func
  47:   ff d0                   call   *%eax
  49:   c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)
  50:   00 
                        4d: dir32       .rdata
  51:   89 04 24                mov    %eax,(%esp)
  54:   e8 a7 ff ff ff          call   0 &lt;_fprintf&gt;
  59:   b8 00 00 00 00          mov    $0x0,%eax
  5e:   c9                      leave  
  5f:   c3                      ret
</code></pre></div></div>

<p>In MinGW-w64 land, <code class="language-plaintext highlighter-rouge">stdout</code> appears to be a macro that expands to <code class="language-plaintext highlighter-rouge">__acrt_iob_func(1)</code>, called through the <code class="language-plaintext highlighter-rouge">__imp____acrt_iob_func</code> import pointer.
The standard output stream is retrieved through a function call, whereas <code class="language-plaintext highlighter-rouge">aln.o</code> expects a plain data symbol that it can reference directly.</p>

<p>Well, that’s not how glibc does things <em>at all</em>.
No mere symbol renaming can get us out of this mess, which means it’s time to <strong>shim</strong>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;fcntl.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdarg.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">stdin_placeholder</span><span class="p">;</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">stdout_placeholder</span><span class="p">;</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">stderr_placeholder</span><span class="p">;</span>

<span class="k">static</span> <span class="kt">FILE</span><span class="o">*</span> <span class="nf">normalize_file</span><span class="p">(</span><span class="kt">FILE</span><span class="o">*</span> <span class="n">fp</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">fp</span> <span class="o">==</span> <span class="o">&amp;</span><span class="n">stdin_placeholder</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fp</span> <span class="o">=</span> <span class="n">stdin</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">fp</span> <span class="o">==</span> <span class="o">&amp;</span><span class="n">stdout_placeholder</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fp</span> <span class="o">=</span> <span class="n">stdout</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">fp</span> <span class="o">==</span> <span class="o">&amp;</span><span class="n">stderr_placeholder</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fp</span> <span class="o">=</span> <span class="n">stderr</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">fp</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="nf">fprintf_shim</span><span class="p">(</span><span class="kt">FILE</span><span class="o">*</span> <span class="n">fp</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">fmt</span><span class="p">,</span> <span class="p">...)</span>
<span class="p">{</span>
    <span class="kt">va_list</span> <span class="n">argptr</span><span class="p">;</span>
    <span class="n">va_start</span><span class="p">(</span><span class="n">argptr</span><span class="p">,</span> <span class="n">fmt</span><span class="p">);</span>

    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">vfprintf</span><span class="p">(</span><span class="n">normalize_file</span><span class="p">(</span><span class="n">fp</span><span class="p">),</span> <span class="n">fmt</span><span class="p">,</span> <span class="n">argptr</span><span class="p">);</span>
    <span class="n">va_end</span><span class="p">(</span><span class="n">argptr</span><span class="p">);</span>

    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="nf">fflush_shim</span><span class="p">(</span><span class="kt">FILE</span><span class="o">*</span> <span class="n">fp</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">fflush</span><span class="p">(</span><span class="n">normalize_file</span><span class="p">(</span><span class="n">fp</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Instead of aliasing a symbol directly to a native MinGW-w64 one, we’ll redirect them to <em>ad hoc</em> implementations.
With functions in particular, this allows us to bridge the gap between two incompatible ABIs, for example by replacing placeholder values with the real ones before forwarding to the underlying implementation:</p>

<table>
  <thead>
    <tr>
      <th>Old symbol name</th>
      <th>New name (MinGW-w64)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">stdin</code></td>
      <td><code class="language-plaintext highlighter-rouge">_stdin_placeholder</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">stdout</code></td>
      <td><code class="language-plaintext highlighter-rouge">_stdout_placeholder</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">stderr</code></td>
      <td><code class="language-plaintext highlighter-rouge">_stderr_placeholder</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">_IO_fflush</code></td>
      <td><code class="language-plaintext highlighter-rouge">_fflush_shim</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">_IO_fprintf</code></td>
      <td><code class="language-plaintext highlighter-rouge">_fprintf_shim</code></td>
    </tr>
  </tbody>
</table>
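
<p>To make the placeholder trick more concrete, here is a reduced sketch of what a call site inside the old code amounts to once it has been relinked. All of the names below (<code class="language-plaintext highlighter-rouge">legacy_stdout</code>, <code class="language-plaintext highlighter-rouge">resolve_stream()</code>, <code class="language-plaintext highlighter-rouge">legacy_call_site()</code>) are invented for this illustration and don’t appear in the actual shims:</p>

```c
#include <stdio.h>

/* Stand-in for the data symbol that the old code references as its
 * "stdout"; only its address matters, never its contents. */
void *legacy_stdout;

/* Map the placeholder address to the host's real stream; any other
 * pointer (e.g. one returned by fopen()) passes through unchanged. */
FILE *resolve_stream(FILE *fp)
{
    return ((void *)fp == (void *)&legacy_stdout) ? stdout : fp;
}

/* What a call site in the old code boils down to: it passes the
 * address of the placeholder as if it were a FILE object. */
int legacy_call_site(void)
{
    return fputs("Hello from the old ABI\n",
                 resolve_stream((FILE *)&legacy_stdout));
}
```

<p>The placeholder is never dereferenced, so its type and size don’t matter as long as its address is unique.</p>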

<h3 id="open">open()</h3>

<p>At this point, the MinGW-w64 toolchain manages to link a Windows program:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ file aln.o 
aln.o: ELF 32-bit LSB relocatable, Intel 80386, invalid version (SYSV), not stripped
$ i686-linux-gnu-objcopy --redefine-syms=glibc-1xx-to-mingw32.syms aln.o aln.mingw32.o
$ i686-w64-mingw32-gcc -g -no-pie -fno-pic -o aln.exe aln.mingw32.o ctype.o FUN_0000fba0.c DAT_0001de20.c mingw32-shims.c
$ file aln.exe 
aln.exe: PE32 executable (console) Intel 80386, for MS Windows
</code></pre></div></div>

<p>However, <code class="language-plaintext highlighter-rouge">aln.exe</code> can’t actually link the sample program from the Atari Jaguar community SDK:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm -f jaghello.cof &amp;&amp; make LINK="wine aln.exe" V=1
wine aln.exe -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof
File read error on startup.o
Link aborted.
make: *** [Makefile:12: jaghello.cof] Error 1
</code></pre></div></div>

<p>After digging a bit, this is a triple case of subtle incompatibilities:</p>
<ul>
  <li>The Windows C runtime has an <code class="language-plaintext highlighter-rouge">O_BINARY</code> flag for <code class="language-plaintext highlighter-rouge">open()</code> to request binary mode; it isn’t part of POSIX and doesn’t exist on Linux, so <code class="language-plaintext highlighter-rouge">aln</code> doesn’t set it.
On Linux there is no difference between text mode and binary mode, but that’s not the case on Windows (in text mode line endings get translated, so the file isn’t read correctly).</li>
  <li>Even if we account for this, the <em>values</em> of the flags for the second argument of <code class="language-plaintext highlighter-rouge">open()</code> are different on <a href="https://elixir.bootlin.com/linux/v6.6.8/source/include/uapi/asm-generic/fcntl.h#L19">Linux</a> and <a href="https://github.com/mingw-w64/mingw-w64/blob/95a2cf4b7260200c4a60f4f1d1127e72799a3541/mingw-w64-headers/crt/fcntl.h#L13-L33">MinGW-w64</a>, leading to further problems down the road.
POSIX mandates the behavior of these constants, but does not actually mandate their values.</li>
  <li>Similarly, the values for the third argument (permissions) are different.</li>
</ul>

<p>Therefore, even though the function signatures of <code class="language-plaintext highlighter-rouge">open()</code> happen to be compatible between the two platforms, we still need to shim it to deal with these mismatches:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">open_shim</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="kt">int</span> <span class="n">oflag</span><span class="p">,</span> <span class="kt">int</span> <span class="n">pmode</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">realoflag</span> <span class="o">=</span> <span class="p">(</span><span class="n">oflag</span> <span class="o">&amp;</span> <span class="mo">00000003</span><span class="p">)</span> <span class="o">|</span> <span class="n">O_BINARY</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">oflag</span> <span class="o">&amp;</span> <span class="mo">00000100</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">realoflag</span> <span class="o">|=</span> <span class="n">O_CREAT</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">oflag</span> <span class="o">&amp;</span> <span class="mo">00001000</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">realoflag</span> <span class="o">|=</span> <span class="n">O_TRUNC</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">oflag</span> <span class="o">&amp;</span> <span class="mo">00002000</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">realoflag</span> <span class="o">|=</span> <span class="n">O_APPEND</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">realoflag</span><span class="p">,</span> <span class="n">S_IREAD</span> <span class="o">|</span> <span class="n">S_IWRITE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<table>
  <thead>
    <tr>
      <th>Old symbol name</th>
      <th>New name (MinGW-w64)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">open</code></td>
      <td><code class="language-plaintext highlighter-rouge">_open_shim</code></td>
    </tr>
  </tbody>
</table>

<p>Besides converting the bitflags, we’ll also fix the first issue by forcing <code class="language-plaintext highlighter-rouge">O_BINARY</code> in our shim.
Technically this is a bug in <code class="language-plaintext highlighter-rouge">aln</code> and this should be fixed at the source (or rather binary patched at the artifact in this case), but I’m not going to bother touching <code class="language-plaintext highlighter-rouge">aln.o</code> unless I have to.</p>
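
<p>The flag conversion on its own can also be written with named constants and a lookup table, which makes it easier to extend should <code class="language-plaintext highlighter-rouge">aln</code> turn out to use more flags. This is a sketch rather than the code from the case study: the <code class="language-plaintext highlighter-rouge">LINUX_O_*</code> names and <code class="language-plaintext highlighter-rouge">translate_linux_oflag()</code> are made up here, and the <code class="language-plaintext highlighter-rouge">O_BINARY</code> fallback only exists so that it also builds on Linux:</p>

```c
#include <fcntl.h>

/* O_BINARY only exists on Windows-style C runtimes; treat it as a
 * no-op elsewhere so that this sketch builds on Linux too. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Flag values as seen by the old Linux binary (asm-generic/fcntl.h). */
enum {
    LINUX_O_ACCMODE = 00000003,
    LINUX_O_CREAT   = 00000100,
    LINUX_O_TRUNC   = 00001000,
    LINUX_O_APPEND  = 00002000
};

/* Hypothetical helper: convert Linux open() flags to the host's. */
int translate_linux_oflag(int oflag)
{
    static const struct { int from; int to; } map[] = {
        { LINUX_O_CREAT,  O_CREAT  },
        { LINUX_O_TRUNC,  O_TRUNC  },
        { LINUX_O_APPEND, O_APPEND },
    };
    /* Like the shim, assume that the access mode bits (read-only,
     * write-only, read-write) have the same values on both sides. */
    int result = (oflag & LINUX_O_ACCMODE) | O_BINARY;

    for (unsigned i = 0; i < sizeof map / sizeof map[0]; i++) {
        if (oflag & map[i].from) {
            result |= map[i].to;
        }
    }
    return result;
}
```

<p>Any Linux flag that isn’t in the table is silently dropped, which is fine here only because <code class="language-plaintext highlighter-rouge">aln</code> appears to stick to this handful.</p>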

<h2 id="throwing-aln-at-the-windows">Throwing aln at the Windows</h2>

<p>After the last round of mutilations, we finally have our chimera up and running:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm -f jaghello.cof &amp;&amp; make LINK="wine aln.exe" V=1
wine aln.exe -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA54   1C00

Link complete.
$ sha256sum jaghello.cof
f9c8269cdc998de01c0ac7a3e815c16b7ced106e25f10f92a7078c722a220dbb  jaghello.cof
</code></pre></div></div>

<p>Who needs source code to make software ports anyways?
Just throw the bits at the virtual address space and apply glue until everything sticks.</p>

<p>Yes, it does work on Windows too:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-5/aln-windows-11.png" />
                <figcaption>Figure 1: aln.exe running on Windows 11</figcaption>
            </figure>
        </div>
    </label>
</div>

<div class="box box-information">
  <p>The files for this case study can be found here: <a href="/atari-jaguar-sdk/assets/case-study.tar.gz" download="">case-study.tar.gz</a></p>
</div>

<h2 id="conclusion">Conclusion</h2>

<p>We have ported <code class="language-plaintext highlighter-rouge">aln</code> from a Linux a.out executable to a native Windows PE executable.
This concludes this series of articles about how to leverage the delinking technique to make software ports without having access to the original source code.</p>

<p>Together with my <a href="/reverse-engineering/2023/05/01/introduction.html">series of articles on reverse-engineering</a>, by now hopefully I have demonstrated that delinking is a powerful and practical technique for reverse-engineering.
What was an obscure ritual <a href="https://news.ycombinator.com/item?id=35729232&amp;p=3#35740761">mastered by few</a> is now within reach of the everyday reverse-engineer, thanks to <a href="https://github.com/boricj/ghidra-delinker-extension">my Ghidra extension</a> which automates the process of crafting object files from bits of programs down to a couple of mouse clicks.</p>

<div style="display: grid; grid-template-columns: 1fr 1fr; margin-top: 1ch; margin-bottom: 1ch;">
<div style="padding-right: 15px;">


<a class="prev" href="/atari-jaguar-sdk/2024/01/01/part-4.html">&laquo; Porting the Atari Jaguar SDK part 4: where we're going, we don't need the C standard library</a>

</div>
<div style="padding-left: 15px; text-align: right;">

</div>
</div>]]></content><author><name>Jean-Baptiste Boric</name></author><category term="atari-jaguar-sdk" /><summary type="html"><![CDATA[Previously in this series of articles, we turned aln, a statically-linked Linux a.out program, into a dynamically-linked Linux ELF program with the power of delinking. In this part, we’ll make a native port of aln to Windows, despite not having access to its source code.]]></summary></entry><entry><title type="html">Porting the Atari Jaguar SDK part 4: where we’re going, we don’t need the C standard library</title><link href="https://boricj.net/atari-jaguar-sdk/2024/01/01/part-4.html" rel="alternate" type="text/html" title="Porting the Atari Jaguar SDK part 4: where we’re going, we don’t need the C standard library" /><published>2024-01-01T13:00:00+01:00</published><updated>2024-01-01T13:00:00+01:00</updated><id>https://boricj.net/atari-jaguar-sdk/2024/01/01/part-4</id><content type="html" xml:base="https://boricj.net/atari-jaguar-sdk/2024/01/01/part-4.html"><![CDATA[<p><a href="/atari-jaguar-sdk/2023/12/18/part-3.html">Previously</a> in this <a href="/atari-jaguar-sdk/2023/11/27/introduction.html">series of articles</a>, we ported <code class="language-plaintext highlighter-rouge">aln</code> as a whole from a Linux a.out executable to a modern Linux ELF executable, <code class="language-plaintext highlighter-rouge">aln.elf</code>.
However, that executable still contains the old, glibc 1.xx C standard library <code class="language-plaintext highlighter-rouge">aln</code> was originally built with.
If we are to make more ambitious ports of <code class="language-plaintext highlighter-rouge">aln</code>, we need to get rid of it.</p>

<h2 id="static-library-switcharoo">Static library switcharoo</h2>

<p>This time, rather than just basically repackaging <code class="language-plaintext highlighter-rouge">aln</code> from the a.out file format to ELF, we’ll swap out the C standard library <code class="language-plaintext highlighter-rouge">aln</code> was statically linked with for a contemporary glibc 2.xx one.
To do so, instead of exporting the whole of <code class="language-plaintext highlighter-rouge">aln</code> as an object file (<code class="language-plaintext highlighter-rouge">aln.whole.o</code>) like we did previously, we’ll export <code class="language-plaintext highlighter-rouge">aln</code> without its C standard library bits (<code class="language-plaintext highlighter-rouge">aln.o</code>) instead, then link it as if it was a normal, everyday object file.</p>

<p>We’ll separate the different components of <code class="language-plaintext highlighter-rouge">aln</code> inside of a new program tree.
After creating folders and memory fragments, then triaging the program bits by dragging and dropping selections of the program into memory fragments, we end up with the following pieces:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-4/aln-program-trees-object-files.png" />
                <figcaption>Figure 1: Sliced up program trees for aln</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>Slicing up <code class="language-plaintext highlighter-rouge">aln</code> in that manner makes it easy to export each piece by right-clicking on a piece and selecting <code class="language-plaintext highlighter-rouge">Select Addresses</code>, then exporting the selection as an object file with <a href="https://github.com/boricj/ghidra-delinker-extension">my Ghidra extension</a>, like in <a href="/atari-jaguar-sdk/2023/12/18/part-3.html">the previous part</a>.</p>

<p>After exporting <code class="language-plaintext highlighter-rouge">aln.o</code>, we can inspect the undefined symbols for this object file with <code class="language-plaintext highlighter-rouge">nm</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ i686-linux-gnu-nm --undefined-only aln.o
         U calloc
         U close
         U _ctype_b
         U _ctype_tolower
         U _ctype_toupper
         U DAT_0001de20
         U exit
         U free
         U FUN_0000fba0
         U getenv
         U index
         U _IO_fflush
         U _IO_fprintf
         U _IO_gets
         U _IO_printf
         U longjmp
         U lseek
         U malloc
         U memmove
         U memset
         U open
         U puts
         U read
         U realloc
         U rindex
         U scanf
         U _setjmp
         U sprintf
         U stderr
         U stdout
         U strcat
         U strcmp
         U strcpy
         U strncmp
         U strncpy
         U write
</code></pre></div></div>

<p>If we can provide compatible replacements for every undefined symbol in this file, then <code class="language-plaintext highlighter-rouge">aln.o</code> will link successfully and <em>should</em> run, regardless of the provenance of these replacements.</p>

<h3 id="what-could-possibly-go-wrong">What could possibly go wrong?</h3>

<p>For starters, let’s link <code class="language-plaintext highlighter-rouge">aln.o</code> statically as if this was a normal, everyday object file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ i686-linux-gnu-gcc -static -o aln.static.elf aln.o
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o: in function `FUN_00002824':
aln.o:(.text+0x1867): undefined reference to `_ctype_toupper'
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o:(.text+0x1a42): undefined reference to `_ctype_toupper'
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o:(.text+0x1b22): undefined reference to `_ctype_toupper'
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o:(.text+0x1c02): undefined reference to `_ctype_toupper'
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o:(.text+0x2194): undefined reference to `_ctype_toupper'
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o:aln.o:(.text+0x2572): more undefined references to `_ctype_toupper' follow
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o: in function `main':
aln.o:(.text+0x3123): undefined reference to `FUN_0000fba0'
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o: in function `FUN_00004bc4':
aln.o:(.text+0x3bce): undefined reference to `_ctype_b'
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o: in function `FUN_00004c74':
aln.o:(.text+0x3c61): undefined reference to `_ctype_tolower'
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.o:(.text+0x3c7e): undefined reference to `_ctype_b'
collect2: error: ld returned 1 exit status
make: *** [Makefile:11: aln.static.elf] Error 1
</code></pre></div></div>

<p>Well, it wouldn’t be worthy of a blog article if it was <em>that</em> simple…</p>

<h3 id="just-put-some-flex-tape-stubs-on-it">Just put some <del>Flex TAPE®</del> stubs on it</h3>

<p>We have several undefined references to fix:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">_ctype_toupper</code>, <code class="language-plaintext highlighter-rouge">_ctype_tolower</code> and <code class="language-plaintext highlighter-rouge">_ctype_b</code> are global variables, part of glibc 1.xx’s implementation of <code class="language-plaintext highlighter-rouge">ctype.h</code> ;</li>
  <li><code class="language-plaintext highlighter-rouge">FUN_0000fba0</code> is as far as I can tell an internal initialization function of the C standard library that is called from <code class="language-plaintext highlighter-rouge">main</code> for some reason.</li>
</ul>

<p>The first issue occurs because we’re trying to use glibc 2.xx from the host Linux system, yet <code class="language-plaintext highlighter-rouge">aln</code> was originally built with glibc 1.xx ; what we see here are the first signs of ABI mismatches, where the expectations of the original program don’t line up with its new environment.</p>

<p>Instead of fixing this at the source (which we don’t have), we’ll cheese it by stubbing out <code class="language-plaintext highlighter-rouge">FUN_0000fba0</code> and borrowing the reconstructed <code class="language-plaintext highlighter-rouge">ctype.o</code> from <code class="language-plaintext highlighter-rouge">aln</code> into <code class="language-plaintext highlighter-rouge">aln.elf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat FUN_0000fba0.c
void FUN_0000fba0() {
}
$ i686-linux-gnu-gcc -static -o aln.static.elf aln.o ctype.o FUN_0000fba0.c
$ file aln.static.elf 
aln.static.elf: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, BuildID[sha1]=004086f9962fe01ffdf12dab5f3c264773cf922b, for GNU/Linux 3.2.0, with debug_info, not stripped
</code></pre></div></div>

<p>We have a successful link!</p>

<div class="box box-warning">
  <p>We’re not aiming for purity here but rather pragmatism: as long as <code class="language-plaintext highlighter-rouge">aln</code> can be tricked into running successfully, <em>anything</em> goes.</p>
</div>

<p>Now that we have an <code class="language-plaintext highlighter-rouge">aln.static.elf</code> file, let’s try something simple like printing out the help message with debug mode on:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./aln.static.elf -z -?
Option `?'
Segmentation fault (core dumped)
</code></pre></div></div>

<p>Hmm…
It managed to write out some output, but it crashed fairly quickly.</p>

<h3 id="you-say-application-binary-interface-i-say-ɛ̃tɛɾfasə-daplikasjõ-binɛɾə">You say “Application Binary Interface”, I say [ɛ̃tɛɾfasə daplikasjõ binɛɾə]</h3>

<p>Let’s see what’s happening with GDB:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gdb-multiarch --args ./aln.static.elf -z -?
...
Reading symbols from ./aln.static.elf...
(gdb) run
Starting program: /home/boricj/Documents/atari-sdk-elf/aln.static.elf -z -\?
Option `?'

Program received signal SIGSEGV, Segmentation fault.
0x0806d6e8 in fflush ()
(gdb) backtrace
#0  0x0806d6e8 in fflush ()
#1  0x0804b6b6 in ?? ()
#2  0x0804d005 in ?? ()
#3  0x08059528 in __libc_start_main ()
#4  0x08049ce2 in _start ()
(gdb) 
</code></pre></div></div>

<p>GDB isn’t telling us much, but we have a thread to pull on, <code class="language-plaintext highlighter-rouge">fflush()</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) break fflush
Breakpoint 1 at 0x806d674
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/boricj/Documents/atari-sdk-elf/aln.static.elf -z -\?
Option `?'

Breakpoint 1, 0x0806d674 in fflush ()
(gdb) x/4wx $ebp
0xffffd188:     0xffffd1d0      0x0804b6b6      0x08118bb0      0xffffd314
(gdb) info symbol 0x08118bb0
stdout in section .data of /home/boricj/Documents/atari-sdk-elf/aln.static.elf
(gdb) x/wx &amp;stdout
0x8118bb0 &lt;stdout&gt;:     0x08118940
(gdb) x/wx (void*)stdout
0x8118940 &lt;_IO_2_1_stdout_&gt;:    0xfbad2a84
</code></pre></div></div>

<p>GDB isn’t providing much information, but we can recover what we need by hand.
With the standard calling convention on i386, the arguments to a function are passed on the stack.
They can be inspected using the frame pointer register, starting at <code class="language-plaintext highlighter-rouge">$ebp+8</code>.
<code class="language-plaintext highlighter-rouge">fflush()</code> takes a single <code class="language-plaintext highlighter-rouge">FILE*</code> argument and we can deduce that its raw value here is <code class="language-plaintext highlighter-rouge">0x08118bb0</code>, a pointer to <code class="language-plaintext highlighter-rouge">stdout</code>.</p>
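
<p>As a sketch of that reasoning, the four words dumped at <code class="language-plaintext highlighter-rouge">$ebp</code> can be labelled as follows.
The array and the helper are hypothetical names used for illustration only, and the layout assumes the usual frame-pointer prologue has already run, which the return address matching frame #1 of the backtrace suggests:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* The four words printed by `x/4wx $ebp` at the fflush() breakpoint. */
static const uint32_t frame_words[4] = {
    0xffffd1d0u, /* $ebp+0:  caller's saved frame pointer */
    0x0804b6b6u, /* $ebp+4:  return address (frame #1 in the backtrace) */
    0x08118bb0u, /* $ebp+8:  first argument */
    0xffffd314u, /* $ebp+12: not an argument of fflush(), just caller stack */
};

/* Returns the n-th (0-based) stack argument, given the words starting at $ebp. */
static uint32_t stack_argument(const uint32_t *words_at_ebp, unsigned n)
{
    return words_at_ebp[2 + n];
}
</code></pre></div></div>

<p>Here <code class="language-plaintext highlighter-rouge">stack_argument(frame_words, 0)</code> yields <code class="language-plaintext highlighter-rouge">0x08118bb0</code>, the raw value that was handed to <code class="language-plaintext highlighter-rouge">fflush()</code>.</p>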

<p>Looking inside glibc’s source code, we can see that the opaque <a href="https://elixir.bootlin.com/glibc/glibc-2.38/source/libio/bits/types/FILE.h#L7"><code class="language-plaintext highlighter-rouge">FILE</code> structure</a> from the C standard library is a typedef for the <a href="https://elixir.bootlin.com/glibc/glibc-2.38/source/libio/bits/types/struct_FILE.h#L49"><code class="language-plaintext highlighter-rouge">_IO_FILE</code> structure</a> private to glibc.
This structure begins with a flags word whose upper 16 bits hold <a href="https://elixir.bootlin.com/glibc/glibc-2.38/source/libio/libio.h#L66">the magic number <code class="language-plaintext highlighter-rouge">0xFBAD</code></a>.
The 32-bit word at <code class="language-plaintext highlighter-rouge">0x8118940 &lt;_IO_2_1_stdout_&gt;</code> contains it, but the 32-bit word at <code class="language-plaintext highlighter-rouge">0x8118bb0 &lt;stdout&gt;</code> is a pointer to the former.</p>

<p>Somehow, things got mixed up where a <code class="language-plaintext highlighter-rouge">FILE**</code> value was passed to <code class="language-plaintext highlighter-rouge">fflush()</code> instead of a <code class="language-plaintext highlighter-rouge">FILE*</code> as expected, which led to the segmentation fault.
To fix this, we need to rename the symbol <code class="language-plaintext highlighter-rouge">stdout</code> to <code class="language-plaintext highlighter-rouge">_IO_2_1_stdout_</code> so that <code class="language-plaintext highlighter-rouge">fflush()</code> gets called with the correct value.</p>
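
<p>The mix-up can be reproduced in miniature.
This is a sketch with stand-in types, not glibc’s real definitions: <code class="language-plaintext highlighter-rouge">struct fake_file</code>, <code class="language-plaintext highlighter-rouge">fake_stdout_object</code>, <code class="language-plaintext highlighter-rouge">fake_stdout</code> and <code class="language-plaintext highlighter-rouge">has_file_magic()</code> are all made up for the occasion:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* Stand-in for glibc's struct _IO_FILE: only the leading flags word matters here. */
struct fake_file {
    uint32_t flags; /* upper 16 bits hold the 0xFBAD magic */
};

/* Plays the role of _IO_2_1_stdout_ (the structure itself)... */
static struct fake_file fake_stdout_object = { 0xfbad2a84u };
/* ...and this one the role of stdout (a pointer to the structure). */
static struct fake_file *fake_stdout = &amp;fake_stdout_object;

/* Returns 1 if the 32-bit word at `p` carries the 0xFBAD magic. */
static int has_file_magic(const void *p)
{
    uint32_t word;
    memcpy(&amp;word, p, sizeof word);
    return (word &gt;&gt; 16) == 0xfbadu;
}
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">has_file_magic(fake_stdout)</code> finds the magic number, whereas <code class="language-plaintext highlighter-rouge">has_file_magic(&amp;fake_stdout)</code> ends up inspecting the bytes of the pointer itself, which is the situation <code class="language-plaintext highlighter-rouge">fflush()</code> found itself in.</p>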

<p>Rather than adjusting it inside the Ghidra database (whose contents are supposedly correct for an executable with glibc 1.xx), we’ll rename it after the fact with <code class="language-plaintext highlighter-rouge">objcopy</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat glibc-1xx-to-glibc-2xx.syms
stdin           _IO_2_1_stdin_
stdout          _IO_2_1_stdout_
stderr          _IO_2_1_stderr_
$ i686-linux-gnu-objcopy --redefine-syms=glibc-1xx-to-glibc-2xx.syms aln.o aln.glibc20.o
$ i686-linux-gnu-gcc -g -static -no-pie -fno-pic -o aln.static.elf aln.glibc20.o ctype.o FUN_0000fba0.c
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">glibc-1xx-to-glibc-2xx.syms</code> is a text file listing the symbols to redefine, one pair per line, with the old name followed by the new one:</p>

<table>
  <thead>
    <tr>
      <th>Old symbol name</th>
      <th>New name (glibc 2.xx)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">stdin</code></td>
      <td><code class="language-plaintext highlighter-rouge">_IO_2_1_stdin_</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">stdout</code></td>
      <td><code class="language-plaintext highlighter-rouge">_IO_2_1_stdout_</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">stderr</code></td>
      <td><code class="language-plaintext highlighter-rouge">_IO_2_1_stderr_</code></td>
    </tr>
  </tbody>
</table>

<div class="box box-warning">
  <p>We’re trying to mend ABIs together that aren’t supposed to be compatible with one another.
As noted above, <em>anything goes</em> as long as it runs, no matter how dubious the hacks are.</p>
</div>

<p>With that out of the way, let’s see if it made a difference:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./aln.static.elf -z -?
Option `?'
Usage: ./aln.static.elf [-options] &lt;files|-x file|-i[i] &lt;fname&gt; &lt;label&gt;&gt;
Where options are:
?: print this
a &lt;text&gt; &lt;data&gt; &lt;bss&gt;: output absolute file
        hex value: segment address
        r: relocatable segment
        x: contiguous segment
b: don't remove multiply defined local labels
d: wait for key after link
...
No object files to link.
</code></pre></div></div>

<p>Good, it’s no longer crashing.
What about our nominal test case, linking the “Hello, world!” sample from the Atari Jaguar community SDK?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm -f jaghello.cof &amp;&amp; make LINK=aln.static.elf V=1
aln.static.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof
make: *** [Makefile:12: jaghello.cof] Segmentation fault (core dumped)
</code></pre></div></div>

<p><em>Sigh.</em></p>

<p>It’s never that simple, is it?</p>

<h3 id="despair-kicks-in">Despair kicks in</h3>

<p>Let’s run <code class="language-plaintext highlighter-rouge">aln</code> with the debug mode on to see what’s going on:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ aln.static.elf -z -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
...
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
...
bsddoobj(startup.o,0x854f240,&lt;none&gt;)
bsdaddsyms for file startup.o
add_global(gSetOLP,startup.o,80,a200,GLOBAL)
...
add_global(_vidmem,startup.o,50,a100,GLOBAL)
add_unresolved(___main,startup.o)
bsddoobj(jag.o,0x8552420,&lt;none&gt;)
AlignSect: off=0X20, size=0X14C, align=0X10, padd=0x4
AlignSect: off=0X170, size=0X40C, align=0X10, padd=0x4
bsdaddsyms for file jag.o
add_global(_textfont,jag.o,0,a400,GLOBAL)
...
add_global(___main,jag.o,100,a200,GLOBAL)
Initialize symbols
add_local(_TEXT_E,(null))
...
add_global(_BSS_E,(null),0,a100,GLOBAL)
find_global(___main) =&gt;  global  in BSD object jag.o
DOUNRESOLVED
DOCOMMON
add_local(_jagscreen,(null))
DOSYM(startup.o)
add_local(DRAM,(null))
...
add_local(VC,(null))
Segmentation fault (core dumped)
</code></pre></div></div>

<p>It’s crashing in the middle of the linking process.
What can GDB tell us?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gdb-multiarch --args aln.static.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
...
Reading symbols from aln.static.elf...
(gdb) r
Starting program: /home/boricj/Documents/atari-sdk-elf/aln.static.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof

Program received signal SIGSEGV, Segmentation fault.
0x0804d801 in FUN_00004a04 ()
(gdb) bt
#0  0x0804d801 in FUN_00004a04 ()
#1  0x0805760f in ?? ()
#2  0x08057922 in ?? ()
#3  0x08057c7a in ?? ()
#4  0x0804d179 in LAB_00004388 ()
#5  0x08118000 in ?? ()
#6  0x08059528 in __libc_start_main ()
#7  0x08049ce2 in _start ()
(gdb) 
</code></pre></div></div>

<p><em>Oh no.</em></p>

<p>It’s not crashing inside glibc, it’s crashing <em>deep</em> inside <code class="language-plaintext highlighter-rouge">aln</code>.
That means I have no obvious threads to pull on to figure this one out…</p>

<div class="box box-information">
  <p>I’ll admit, this one had me stumped for quite a long time, chasing one dead end after another.
I’ll skip ahead to the resolution.</p>
</div>

<p><code class="language-plaintext highlighter-rouge">FUN_00004a04()</code> seems to be part of a hash table implementation and its decompilation contains the following line:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iVar1</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)(</span><span class="o">&amp;</span><span class="n">DAT_0001de20</span> <span class="o">+</span> <span class="p">(</span><span class="n">uVar2</span> <span class="o">&amp;</span> <span class="mh">0xff</span><span class="p">)</span> <span class="o">*</span> <span class="mi">4</span><span class="p">);</span>
</code></pre></div></div>

<p>From this, we can assume that <code class="language-plaintext highlighter-rouge">DAT_0001de20</code> is an array of 256 elements, each 4 bytes wide, for a total of 1024 bytes.
Say, what does that array contain?</p>

<div class="tabs">
<input type="radio" name="gdb-DAT_0001de20-aln" id="tab-gdb-DAT_0001de20-aln" checked="" />
<label for="tab-gdb-DAT_0001de20-aln">DAT_0001de20 (aln.elf)</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>(gdb) info address DAT&#95;0001de20
Symbol "DAT&#95;0001de20" is at 0x8065e20 in a file compiled without debugging.
(gdb) x/256wx 0x8065e20
0x8065e20 &lt;DAT&#95;0001de20&gt;:       0x00000000      0x00000000      0x00000000      0x00000000
0x8065e30:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065e40:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065e50:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065e60:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065e70:      0x00000000      0x08078ee0      0x00000000      0x00000000
0x8065e80:      0x00000000      0x00000000      0x00000000      0x08078e20
0x8065e90:      0x00000000      0x08078da0      0x00000000      0x08078f40
0x8065ea0:      0x00000000      0x00000000      0x08078ea0      0x00000000
0x8065eb0:      0x08078ec0      0x00000000      0x00000000      0x00000000
0x8065ec0:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065ed0:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065ee0:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065ef0:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065f00:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065f10:      0x00000000      0x00000000      0x08078d20      0x00000000
0x8065f20:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065f30:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065f40:      0x08078de0      0x00000000      0x00000000      0x00000000
0x8065f50:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065f60:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065f70:      0x00000000      0x00000000      0x00000000      0x08078e00
0x8065f80:      0x00000000      0x08078fa0      0x00000000      0x00000000
0x8065f90:      0x00000000      0x00000000      0x00000000      0x08078e80
0x8065fa0:      0x00000000      0x00000000      0x08078d40      0x08078fe0
0x8065fb0:      0x08078e60      0x00000000      0x08078000      0x00000000
0x8065fc0:      0x00000000      0x00000000      0x00000000      0x00000000
0x8065fd0:      0x00000000      0x00000000      0x00000000      0x08078f00
0x8065fe0:      0x08078dc0      0x00000000      0x00000000      0x00000000
0x8065ff0:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066000:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066010:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066020:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066030:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066040:      0x00000000      0x00000000      0x00000000      0x08078f20
0x8066050:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066060:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066070:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066080:      0x00000000      0x08078fc0      0x00000000      0x00000000
0x8066090:      0x00000000      0x00000000      0x00000000      0x00000000
0x80660a0:      0x00000000      0x00000000      0x00000000      0x00000000
0x80660b0:      0x00000000      0x00000000      0x00000000      0x00000000
0x80660c0:      0x08078e40      0x00000000      0x00000000      0x00000000
0x80660d0:      0x00000000      0x00000000      0x00000000      0x00000000
0x80660e0:      0x00000000      0x00000000      0x00000000      0x00000000
0x80660f0:      0x00000000      0x00000000      0x00000000      0x08078f60
0x8066100:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066110:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066120:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066130:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066140:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066150:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066160:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066170:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066180:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066190:      0x00000000      0x00000000      0x00000000      0x00000000
0x80661a0:      0x00000000      0x00000000      0x00000000      0x00000000
0x80661b0:      0x00000000      0x00000000      0x00000000      0x00000000
0x80661c0:      0x00000000      0x00000000      0x00000000      0x08078d60
0x80661d0:      0x00000000      0x00000000      0x00000000      0x00000000
0x80661e0:      0x00000000      0x00000000      0x00000000      0x00000000
0x80661f0:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066200:      0x00000000      0x00000000      0x00000000      0x00000000
0x8066210:      0x00000000      0x00000000      0x00000000      0x00000000</code>
</pre>
</div>
</div></div>
<input type="radio" name="gdb-DAT_0001de20-aln" id="tab-gdb-DAT_0001de20-aln-static" />
<label for="tab-gdb-DAT_0001de20-aln-static">DAT_0001de20 (aln.static.elf)</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>(gdb) info address DAT&#95;0001de20
Symbol "DAT&#95;0001de20" is at 0x8118600 in a file compiled without debugging.
(gdb) x/256wx 0x8118600
0x8118600 &lt;DAT&#95;0001de20&gt;:       0x00000000      0x00000000      0x00000000      0x00000000
0x8118610:      0x00000000      0x00000000      0x00000000      0x00000000
0x8118620:      0x00000000      0x00000000      0x00000000      0x00000000
0x8118630:      0x00000000      0x00000000      0x00000000      0x00000000
0x8118640:      0x00000000      0x00000000      0x00000000      0x00000000
0x8118650:      0x00000000      0x08124060      0x00000000      0x00000000
0x8118660:      0x00000000      0x00000000      0x00000000      0x08124040
0x8118670:      0x00000000      0x081245b0      0x00000000      0x08124590
0x8118680:      0x00000000      0x00000000      0x08124140      0x00000000
0x8118690:      0x08124120      0x00000000      0x00000000      0x00000000
0x81186a0:      0x00000000      0x00000000      0x00000000      0x00000000
0x81186b0:      0x00000000      0x00000000      0x00000000      0x00000000
0x81186c0:      0x00000000      0x00000000      0x00000000      0x00000000
0x81186d0:      0x00000000      0x00000000      0x00000000      0x00000000
0x81186e0:      0x00000000      0x00000000      0x00000000      0x00000000
0x81186f0:      0x00000000      0x00000000      0x08124080      0x00000000
0x8118700:      0x00000000      0x00000000      0x00000000      0x00000000
0x8118710:      0x00000000      0x00000000      0x00000000      0x00000000
0x8118720:      0x08124570      0x00000000      0x00000000      0x00000000
0x8118730:      0x00000000      0x00000000      0x00000000      0x00000000
0x8118740:      0x00000000      0x00000000      0x00000000      0x00000000
0x8118750:      0x00000000      0x00000000      0x00000000      0x08123fa0
0x8118760:      0x00000000      0x081245d0      0x00000000      0x00000000
0x8118770:      0x00000000      0x00000000      0x00000000      0x081241a0
0x8118780:      0x00000000      0x00000000      0x08124000      0x08124610
0x8118790:      0x08123f60      0x00000000      0x08123fc0      0x00000000
0x81187a0:      0x00000000      0x00000000      0x00000000      0x00000000
0x81187b0:      0x00000000      0x00000000      0x00000000      0x08124180
0x81187c0:      0x08123fe0      0x00000000      0x00000000      0x00000000
0x81187d0:      0x00000000      0x00000000      0x00000000      0x00000000
0x81187e0 &lt;&#95;ctype&#95;b&gt;:   0x080588e2      0x08058ae3      0x08058be4      0x080e32bc
0x81187f0 &lt;&#95;&#95;exit&#95;funcs&gt;:       0x0811a8e0      0x080e1b40      0x08118800      0x00000000
0x8118800 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;&gt;:    0xfbad2086      0x00000000      0x00000000      0x00000000
0x8118810 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;+16&gt;: 0x00000000      0x00000000      0x00000000      0x00000000
0x8118820 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;+32&gt;: 0x00000000      0x00000000      0x00000000      0x08123f40
0x8118830 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;+48&gt;: 0x00000000      0x08118940      0x00000002      0x00000000
0x8118840 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;+64&gt;: 0xffffffff      0x00000000      0x0811ab04      0xffffffff
0x8118850 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;+80&gt;: 0xffffffff      0x00000000      0x081188a0      0x00000000
0x8118860 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;+96&gt;: 0x00000000      0x081241c0      0x00000000      0x00000000
0x8118870 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;+112&gt;:        0x00000000      0x00000000      0x00000000      0x00000000
0x8118880 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;+128&gt;:        0x00000000      0x00000000      0x00000000      0x00000000
0x8118890 &lt;&#95;IO&#95;2&#95;1&#95;stderr&#95;+144&gt;:        0x00000000      0x08119980      0x00000000      0x00000000
0x81188a0 &lt;&#95;IO&#95;wide&#95;data&#95;2&gt;:    0x08124020      0x00000000      0x00000000      0x00000000
0x81188b0 &lt;&#95;IO&#95;wide&#95;data&#95;2+16&gt;: 0x00000000      0x00000000      0x00000000      0x00000000
0x81188c0 &lt;&#95;IO&#95;wide&#95;data&#95;2+32&gt;: 0x00000000      0x00000000      0x00000000      0x00000000
0x81188d0 &lt;&#95;IO&#95;wide&#95;data&#95;2+48&gt;: 0x00000000      0x00000000      0x00000000      0x08124160
0x81188e0 &lt;&#95;IO&#95;wide&#95;data&#95;2+64&gt;: 0x00000000      0x00000000      0x00000000      0x00000000
0x81188f0 &lt;&#95;IO&#95;wide&#95;data&#95;2+80&gt;: 0x00000000      0x00000000      0x00000000      0x00000000
0x8118900 &lt;&#95;IO&#95;wide&#95;data&#95;2+96&gt;: 0x00000000      0x00000000      0x00000000      0x00000000
0x8118910 &lt;&#95;IO&#95;wide&#95;data&#95;2+112&gt;:        0x00000000      0x00000000      0x00000000      0x00000000
0x8118920 &lt;&#95;IO&#95;wide&#95;data&#95;2+128&gt;:        0x00000000      0x00000000      0x08119860      0x00000000
0x8118930:      0x00000000      0x00000000      0x00000000      0x00000000
0x8118940 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;&gt;:    0xfbad2a84      0x0811cdb0      0x0811cdb0      0x0811cdb0
0x8118950 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;+16&gt;: 0x0811cdb0      0x0811cdb0      0x0811cdb0      0x0811cdb0
0x8118960 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;+32&gt;: 0x0811d1b0      0x00000000      0x00000000      0x00000000
0x8118970 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;+48&gt;: 0x00000000      0x08118a80      0x00000001      0x00000000
0x8118980 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;+64&gt;: 0xffffffff      0x00000000      0x0811ab10      0xffffffff
0x8118990 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;+80&gt;: 0xffffffff      0x00000000      0x081189e0      0x00000000
0x81189a0 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;+96&gt;: 0x00000000      0x00000000      0xffffffff      0x081245f0
0x81189b0 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;+112&gt;:        0x00000000      0x00000000      0x00000000      0x00000000
0x81189c0 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;+128&gt;:        0x00000000      0x00000000      0x00000000      0x00000000
0x81189d0 &lt;&#95;IO&#95;2&#95;1&#95;stdout&#95;+144&gt;:        0x00000000      0x08119980      0x00000000      0x00000000
0x81189e0 &lt;&#95;IO&#95;wide&#95;data&#95;1&gt;:    0x00000000      0x00000000      0x00000000      0x00000000
0x81189f0 &lt;&#95;IO&#95;wide&#95;data&#95;1+16&gt;: 0x00000000      0x00000000      0x00000000      0x00000000</code>
</pre>
</div>
</div></div>
</div>

<p>Whereas <code class="language-plaintext highlighter-rouge">aln.elf</code> seems in good order, inside <code class="language-plaintext highlighter-rouge">aln.static.elf</code> this array appears to be truncated at 480 bytes, with unrelated variables following it.
However, <code class="language-plaintext highlighter-rouge">aln</code>’s code still assumed it was 1024 bytes long, so these variables were overwritten until that memory corruption led to a segmentation fault.</p>
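
<p>For context, here is a minimal sketch of the kind of chained hash table that the decompiled line hints at.
It is a guess at the general shape, not a reconstruction of <code class="language-plaintext highlighter-rouge">aln</code>’s code: the structure layout, the function names and the hash function are all assumptions.
On i386, where pointers are 4 bytes wide, the <code class="language-plaintext highlighter-rouge">buckets</code> array below occupies the same 1024 bytes:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

#define BUCKET_COUNT 256 /* matches the (uVar2 &amp; 0xff) mask */

struct entry {
    const char *name;
    struct entry *next; /* next entry in the same bucket */
};

static struct entry *buckets[BUCKET_COUNT];

/* Hypothetical hash function; the one aln really uses is not shown here. */
static unsigned hash_name(const char *name)
{
    unsigned h = 0;
    while (*name != '\0')
        h = h * 31u + (unsigned char)*name++;
    return h;
}

/* Pushes an entry at the head of its bucket: one pointer write into the table. */
static void insert_entry(struct entry *e)
{
    unsigned index = hash_name(e-&gt;name) &amp; 0xff;
    e-&gt;next = buckets[index];
    buckets[index] = e;
}

/* Walks the bucket's chain until the name matches, or returns NULL. */
static struct entry *find_entry(const char *name)
{
    unsigned index = hash_name(name) &amp; 0xff;
    for (struct entry *e = buckets[index]; e != NULL; e = e-&gt;next)
        if (strcmp(e-&gt;name, name) == 0)
            return e;
    return NULL;
}
</code></pre></div></div>

<p>With a table like this, every insertion stores a pointer somewhere within the full 256 slots, so any slot that no longer belongs to the array gets clobbered as soon as a name hashes to it.</p>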

<p>How come <code class="language-plaintext highlighter-rouge">DAT_0001de20</code> was truncated in <code class="language-plaintext highlighter-rouge">aln.static.elf</code> and not in <code class="language-plaintext highlighter-rouge">aln.elf</code>?
That array was originally located inside <code class="language-plaintext highlighter-rouge">aln</code> from <code class="language-plaintext highlighter-rouge">0x1de20</code> to <code class="language-plaintext highlighter-rouge">0x1e21f</code>.
However, the <code class="language-plaintext highlighter-rouge">.data</code> section stops at <code class="language-plaintext highlighter-rouge">0x1dfff</code> and the <code class="language-plaintext highlighter-rouge">.bss</code> section begins at <code class="language-plaintext highlighter-rouge">0x1e000</code>.
As far as I can tell, that variable is laid across <em>two</em> sections.</p>

<div class="box box-warning">
  <p>I might not give a damn about computer engineering conventions in this series of articles, but apparently neither did toolchains in the 90s.</p>

  <p>(╯°□°)╯︵ ┻━┻</p>
</div>

<p>When the <code class="language-plaintext highlighter-rouge">aln.o</code> object file got exported, these two sections became untethered.
Later on, the linker didn’t place these bits next to each other, <em>splitting</em> that variable in two, effectively truncating it down to 480 bytes.</p>
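
<p>The arithmetic behind that 480 bytes figure can be spelled out with the addresses quoted above.
The macro and function names are only there for illustration:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* Addresses quoted above, in the address space of the original aln. */
#define ARRAY_START 0x1de20u /* first byte of DAT_0001de20 */
#define ARRAY_END   0x1e220u /* one past its last byte (0x1e21f) */
#define BSS_START   0x1e000u /* first byte of .bss, .data ends at 0x1dfff */

/* Bytes of the array that live in .data. */
static uint32_t bytes_in_data(void)
{
    return BSS_START - ARRAY_START;
}

/* Bytes of the array that live in .bss. */
static uint32_t bytes_in_bss(void)
{
    return ARRAY_END - BSS_START;
}

/* Index of the first 4-byte slot that belongs to .bss rather than .data. */
static uint32_t first_slot_in_bss(void)
{
    return bytes_in_data() / 4;
}
</code></pre></div></div>

<p>That gives 480 bytes (slots 0 to 119) in <code class="language-plaintext highlighter-rouge">.data</code> and 544 bytes (slots 120 to 255) in <code class="language-plaintext highlighter-rouge">.bss</code>.
It also lines up with the dump of <code class="language-plaintext highlighter-rouge">aln.static.elf</code>, where <code class="language-plaintext highlighter-rouge">_ctype_b</code> shows up at <code class="language-plaintext highlighter-rouge">0x81187e0</code>, that is <code class="language-plaintext highlighter-rouge">0x1e0</code> (480) bytes past the start of the array.</p>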

<p>That was a <em>fun</em> one to track down.</p>

<h3 id="work-goddamnit">Work, goddamnit!</h3>

<p>So, let’s excise that problematic array from <code class="language-plaintext highlighter-rouge">aln.o</code> and substitute it with a clean stub:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat DAT_0001de20.c
int DAT_0001de20[256];
$ i686-linux-gnu-gcc -g -static -no-pie -fno-pic -o aln.static.elf aln.glibc20.o ctype.o FUN_0000fba0.c DAT_0001de20.c
</code></pre></div></div>

<div class="box box-information">
  <p>We don’t care about the type of <code class="language-plaintext highlighter-rouge">DAT_0001de20</code>, we only want it to be at least 1024 bytes long.
Unlike a compiler, a traditional linker only processes symbol names and doesn’t care about typing information.</p>
</div>
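
<p>To illustrate that point, a stub with an entirely different C type should do just as well, as long as it carries the same symbol name and spans enough bytes.
This variant is a sketch of mine rather than a file from the case study, and it assumes the toolchain gives the array an alignment that <code class="language-plaintext highlighter-rouge">aln</code>’s 4-byte accesses can live with (x86 tolerates unaligned accesses anyway):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;assert.h&gt;

/* Alternative stub: same symbol name, different C type, same footprint. */
unsigned char DAT_0001de20[256 * 4];

/* Compile-time check that the stub covers the 1024 bytes aln expects. */
static_assert(sizeof DAT_0001de20 &gt;= 1024, "stub must span at least 1024 bytes");
</code></pre></div></div>

<p>Being an uninitialized global, the array starts out zero-filled, which is how an empty table should look.</p>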

<p><em>Please</em> work this time…</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm -f jaghello.cof &amp;&amp; aln.static.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA54   1C00

Link complete.
$ sha256sum jaghello.cof
f9c8269cdc998de01c0ac7a3e815c16b7ced106e25f10f92a7078c722a220dbb  jaghello.cof
</code></pre></div></div>

<p><em>Finally</em>, <code class="language-plaintext highlighter-rouge">aln</code> lives on with a transplanted C standard library from the 21st century and it only took the better part of my sanity.</p>

<div class="box box-information">
  <p>Despite all this nonsense, it wasn’t as bad as it could be.
In particular, <code class="language-plaintext highlighter-rouge">aln</code> doesn’t use <code class="language-plaintext highlighter-rouge">errno</code>, which was a plain global variable in the old days and is a thread-local one nowadays.
These two storage schemes are mutually incompatible.</p>

  <p>There are ways to work around this problem, but thankfully I didn’t have to deal with this nonsense here.</p>
</div>

<h2 id="dynamic-library-displacement">Dynamic library displacement</h2>

<p>We’ve swapped one C standard library for another, so how about removing it altogether from the executable?
We can do so by dynamically linking it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ i686-linux-gnu-gcc -g -no-pie -fno-pic -o aln.dynamic.elf aln.glibc20.o ctype.o FUN_0000fba0.c DAT_0001de20.c
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: aln.glibc20.o: warning: relocation against `_IO_2_1_stderr_@@GLIBC_2.1' in read-only section `.text'
/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: warning: creating DT_TEXTREL in a PIE
$ file aln.dynamic.elf 
aln.dynamic.elf: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=d6aabe004514845f6530384228f12741aa87dbc1, for GNU/Linux 3.2.0, with debug_info, not stripped
</code></pre></div></div>

<p>The linker isn’t very happy about it, but it did spit out an executable.
Let’s try it out:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm -f jaghello.cof &amp;&amp; aln.dynamic.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA54   1C00

Link complete.
$ sha256sum jaghello.cof
f9c8269cdc998de01c0ac7a3e815c16b7ced106e25f10f92a7078c722a220dbb  jaghello.cof
</code></pre></div></div>

<p>It worked out of the box?
Well, after dealing with all these shenanigans, it had better!</p>

<div class="box box-information">
  <p>The files for this case study can be found here: <a href="/atari-jaguar-sdk/assets/case-study.tar.gz" download="">case-study.tar.gz</a></p>
</div>

<h2 id="conclusion">Conclusion</h2>

<p>We have delinked <code class="language-plaintext highlighter-rouge">aln</code> back into a standard object file without its C standard library, then produced both statically and dynamically linked versions of <code class="language-plaintext highlighter-rouge">aln</code> using contemporary versions of glibc.
Next time, we’ll attempt our most ambitious port yet: turning <code class="language-plaintext highlighter-rouge">aln</code> from a Linux program into a native Windows executable.</p>

<div style="display: grid; grid-template-columns: 1fr 1fr; margin-top: 1ch; margin-bottom: 1ch;">
<div style="padding-right: 15px;">


<a class="prev" href="/atari-jaguar-sdk/2023/12/18/part-3.html">&laquo; Porting the Atari Jaguar SDK part 3: makin' elves</a>

</div>
<div style="padding-left: 15px; text-align: right;">


<a href="/atari-jaguar-sdk/2024/01/02/part-5.html">Porting the Atari Jaguar SDK part 5: I have a feeling we're not on Linux anymore &raquo;</a>

</div>
</div>]]></content><author><name>Jean-Baptiste Boric</name></author><category term="atari-jaguar-sdk" /><summary type="html"><![CDATA[Previously in this series of articles, we ported aln as a whole from a Linux a.out executable to a modern Linux ELF executable, aln.elf. However, that executable still contains the old, glibc 1.xx C standard library aln was originally built with. If we are to make more ambitious ports of aln, we need to get rid of it.]]></summary></entry><entry><title type="html">Porting the Atari Jaguar SDK part 3: makin’ elves</title><link href="https://boricj.net/atari-jaguar-sdk/2023/12/18/part-3.html" rel="alternate" type="text/html" title="Porting the Atari Jaguar SDK part 3: makin’ elves" /><published>2023-12-18T13:00:00+01:00</published><updated>2023-12-18T13:00:00+01:00</updated><id>https://boricj.net/atari-jaguar-sdk/2023/12/18/part-3</id><content type="html" xml:base="https://boricj.net/atari-jaguar-sdk/2023/12/18/part-3.html"><![CDATA[<p><a href="/atari-jaguar-sdk/2023/12/11/part-2.html">Previously</a> in this <a href="/atari-jaguar-sdk/2023/11/27/introduction.html">series of articles</a>, we used Ghidra’s Version Tracking tool to annotate parts of <code class="language-plaintext highlighter-rouge">aln</code> with a closely matching C static library from Slackware 2.3.
In this article, we will start making software ports of <code class="language-plaintext highlighter-rouge">aln</code>, despite not having the source code for it.</p>

<h2 id="what-are-we-porting-to-anyways">What are we porting to anyways?</h2>

<p>Currently, <code class="language-plaintext highlighter-rouge">aln</code> is a statically-linked Linux a.out executable for Linux.
For our first port, we’ll aim for something small: a statically-linked Linux <strong>ELF</strong> executable for Linux.
Since I want this part to be meaningful and not just stick an ELF header on it and call it a day, I’ll place another restriction: it must work as-is on modern Linux systems.</p>

<p>Recall from <a href="/atari-jaguar-sdk/2023/12/04/part-1.html">part one</a> that running <code class="language-plaintext highlighter-rouge">aln</code> nowadays first requires <code class="language-plaintext highlighter-rouge">sysctl -w vm.mmap_min_addr=4096</code>.
This is because modern Linux systems have a minimum virtual address for user processes configured by default to <code class="language-plaintext highlighter-rouge">0x10000</code>, whereas <code class="language-plaintext highlighter-rouge">aln</code> is based at the lower address <code class="language-plaintext highlighter-rouge">0x1000</code>.</p>
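
<p>To make the constraint concrete, here is a minimal Python sketch of the check that is effectively applied to unprivileged mappings. The helper name <code class="language-plaintext highlighter-rouge">can_map_at</code> is hypothetical, and the default of <code class="language-plaintext highlighter-rouge">0x10000</code> is simply the value quoted above; a real system exposes its limit in <code class="language-plaintext highlighter-rouge">/proc/sys/vm/mmap_min_addr</code>.</p>

```python
# Hypothetical helper illustrating the vm.mmap_min_addr restriction.
DEFAULT_MMAP_MIN_ADDR = 0x10000  # assumed default, as quoted in the text


def can_map_at(base_address, mmap_min_addr=DEFAULT_MMAP_MIN_ADDR):
    """Return True if a mapping at base_address respects the minimum address."""
    return base_address >= mmap_min_addr


# aln is based at 0x1000: rejected by default, accepted once the limit is 4096.
print(can_map_at(0x1000))
print(can_map_at(0x1000, mmap_min_addr=4096))
```

<p>Lowering the limit system-wide works, but it is a workaround on the host rather than a fix in the program, which is why the port needs a higher base address instead.</p>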

<p>To fulfill the stated restriction for the ELF port, it must be based at a higher address than <code class="language-plaintext highlighter-rouge">aln</code> currently is.
Simply moving the bits of <code class="language-plaintext highlighter-rouge">aln</code> around in the address space will end in disaster: references to absolute addresses embedded within <code class="language-plaintext highlighter-rouge">aln</code> would not shift around with the bits, leading to incorrect memory accesses and most likely crashes.</p>

<h2 id="porting-aln-with-the-power-of-delinking">Porting aln with the power of delinking</h2>

<p>To work around the problem of absolute references, we’ll use a technique described in my previous <a href="/reverse-engineering/2023/05/01/introduction.html">series of articles on reverse-engineering</a>: delinking <code class="language-plaintext highlighter-rouge">aln</code> back into an object file so that we can then relink it however we want.
This solves the issue because references to absolute addresses will be converted to relocations as part of this process and the linker is going to stitch everything back together, as if nothing happened.</p>
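
<p>As a rough illustration of what the linker does with such a relocation, the Python sketch below applies a single <code class="language-plaintext highlighter-rouge">R_386_32</code> relocation (value = symbol address + addend) to a toy section. The section contents, the zero addend and the final symbol address are assumptions made up for the example, not bytes taken from <code class="language-plaintext highlighter-rouge">aln</code>.</p>

```python
def apply_r_386_32(image, offset, symbol_address):
    """Patch the 32-bit little-endian word at offset with symbol + addend.

    With REL-style relocations, the addend is stored in the word itself.
    """
    addend = int.from_bytes(image[offset:offset + 4], "little")
    value = (symbol_address + addend) % 2**32
    image[offset:offset + 4] = value.to_bytes(4, "little")


# Toy section: opcode 0xA1 (load EAX from an absolute address) followed by
# a 4-byte address field that holds only the addend (0 here).
section = bytearray(b"\xa1\x00\x00\x00\x00")

# Assumed final address of the referenced symbol after relinking.
apply_r_386_32(section, offset=1, symbol_address=0x08065064)
print(section.hex())
```

<p>Because the address field is recomputed from the symbol at link time, the same object file can be laid out at any base address without leaving stale pointers behind.</p>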

<h3 id="making-an-object-file-out-of-a-program">Making an object file out of a program</h3>

<p>Continuing from the <a href="/atari-jaguar-sdk/2023/12/11/part-2.html">previous part</a>, we have a partially analyzed artifact.
After some further work on it (annotating things, fixing up incorrectly identified references…), we are ready for our first attempt at delinking it with the help of <a href="https://github.com/boricj/ghidra-delinker-extension">my Ghidra extension</a>.</p>

<p>Select the address range <code class="language-plaintext highlighter-rouge">0x1020</code> to <code class="language-plaintext highlighter-rouge">0x1e993</code> (the entire <code class="language-plaintext highlighter-rouge">aln</code> program without the a.out header), then click on <code class="language-plaintext highlighter-rouge">Analysis &gt; One shot &gt; Relocation Table Analyzer</code>.
Click on <code class="language-plaintext highlighter-rouge">Window &gt; Relocation Table (synthesized)</code> to view the reconstructed relocations:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-3/aln-relocation-table-resynthesized-window.png" />
                <figcaption>Figure 1: Resynthesized relocations for aln</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>We have all the data we need to make an object file out of <code class="language-plaintext highlighter-rouge">aln</code>.
Click on <code class="language-plaintext highlighter-rouge">File &gt; Export Program...</code> (or hit the <code class="language-plaintext highlighter-rouge">O</code> key), and fill in the wizard as follows:</p>
<ul>
  <li>Format: <code class="language-plaintext highlighter-rouge">ELF relocatable object</code> ;</li>
  <li>Output File: <code class="language-plaintext highlighter-rouge">aln.whole.o</code> ;</li>
  <li>Selection Only: checked.</li>
</ul>

<p>Furthermore, click on <code class="language-plaintext highlighter-rouge">Options...</code> and ensure that <code class="language-plaintext highlighter-rouge">Include dynamic symbols</code> and <code class="language-plaintext highlighter-rouge">Strip leading underscore</code> are checked:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-3/export-aln.whole.o-dialog.png" />
                <figcaption>Figure 2: Exporting aln.whole.o as an ELF object file</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>Click on <code class="language-plaintext highlighter-rouge">OK</code> and the object file will be written to disk.
We can inspect that object file using our toolchain:</p>

<div class="tabs">
<input type="radio" name="file-aln-whole-o" id="tab-header-readelf-aln-whole-o" />
<label for="tab-header-readelf-aln-whole-o">Header</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ readelf --wide --file-header aln.whole.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x0
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          52 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         9
  Section header string table index: 8</code>
</pre>
</div>
</div></div>
<input type="radio" name="file-aln-whole-o" id="tab-sections-readelf-aln-whole-o" checked="" />
<label for="tab-sections-readelf-aln-whole-o">Sections</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ readelf --wide --section-headers aln.whole.o 
There are 9 section headers, starting at offset 0x34:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          00000000 00019c 00e9cd 00      0   0  1
  [ 2] .symtab           SYMTAB          00000000 00eb6c 00eeb0 10      1 155  4
  [ 3] .text             PROGBITS        00000000 01da20 01bfe0 00 WAX  0   0 16
  [ 4] .data             PROGBITS        00000000 039a00 001000 00  WA  0   0 16
  [ 5] .bss              NOBITS          00000000 000000 000994 00  WA  0   0 16
  [ 6] .rel.text         REL             00000000 03aa00 005c10 08   I  2   3  4
  [ 7] .rel.data         REL             00000000 040610 000268 08   I  2   4  4
  [ 8] .shstrtab         STRTAB          00000000 040878 000040 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)</code>
</pre>
</div>
</div></div>
<input type="radio" name="file-aln-whole-o" id="tab-symbols-readelf-aln-whole-o" />
<label for="tab-symbols-readelf-aln-whole-o">Symbols</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ readelf --wide --symbols aln.whole.o

Symbol table '.symtab' contains 3819 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS aln.whole.o
     2: 00000000     0 SECTION LOCAL  DEFAULT    3 
     3: 00000000     0 SECTION LOCAL  DEFAULT    4 
     4: 00000000     0 SECTION LOCAL  DEFAULT    5 
     5: 000018cf     7 OBJECT  LOCAL  DEFAULT    3 switchD_000028ef::switchD
     6: 000018e4     4 OBJECT  LOCAL  DEFAULT    3 switchD_000028ef::switchdataD_00002904
     7: 00001984     3 OBJECT  LOCAL  DEFAULT    3 switchD_000028ef::caseD_3f
     8: 000019a4     3 OBJECT  LOCAL  DEFAULT    3 switchD_000028ef::caseD_5a
     9: 000019d4     3 OBJECT  LOCAL  DEFAULT    3 switchD_000028ef::caseD_41
    10: 00001cf4     3 OBJECT  LOCAL  DEFAULT    3 switchD_000028ef::caseD_42
...
  3808: 000008d8     1 OBJECT  GLOBAL DEFAULT    5 DAT_0001e8d8
  3809: 000008e0     1 OBJECT  GLOBAL DEFAULT    5 __malloc_initialized
  3810: 000008e4     1 OBJECT  GLOBAL DEFAULT    5 DAT_0001e8e4
  3811: 000008e8     4 OBJECT  GLOBAL DEFAULT    5 DAT_0001e8e8
  3812: 000008f0     1 OBJECT  GLOBAL DEFAULT    5 DAT_0001e8f0
  3813: 00000920    96 OBJECT  GLOBAL DEFAULT    5 _fraghead
  3814: 00000928     4 OBJECT  GLOBAL DEFAULT    5 _fraghead[1].next
  3815: 00000980     4 OBJECT  GLOBAL DEFAULT    5 __malloc_hook
  3816: 00000984     4 OBJECT  GLOBAL DEFAULT    5 DAT_0001e984
  3817: 00000988     4 OBJECT  GLOBAL DEFAULT    5 DAT_0001e988
  3818: 00000990     4 OBJECT  GLOBAL DEFAULT    5 DAT_0001e990</code>
</pre>
</div>
</div></div>
<input type="radio" name="file-aln-whole-o" id="tab-relocations-readelf-aln-whole-o" />
<label for="tab-relocations-readelf-aln-whole-o">Relocations</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ readelf --wide --relocs aln.whole.o

Relocation section '.rel.text' at offset 0x3aa00 contains 2946 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00000012  000e3401 R_386_32               00000064   DAT_0001d064
0000001b  000e3501 R_386_32               00000068   DAT_0001d068
00000022  000e3601 R_386_32               0000006c   DAT_0001d06c
00000568  000e1e01 R_386_32               00000004   PTR_s_aln_0001d004
0000056e  00009f01 R_386_32               0000005b   s_Usage:_%s_[-options]_&lt;files|-x_f_0000107b
00000578  0000a001 R_386_32               00000097   s_Where_options_are:_000010b7
00000582  0000a101 R_386_32               000000aa   s_?:_print_this_000010ca
...</code>
</pre>
</div>
</div></div>
</div>

<p>It has sections, symbols and relocations, like any other object file produced through more conventional means.</p>
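
<p>The <code class="language-plaintext highlighter-rouge">Info</code> column of the relocation listing packs two fields into one 32-bit word: the symbol table index in the upper 24 bits and the relocation type in the lowest byte. The short Python sketch below decodes it; the function name and the small type table are illustrative only.</p>

```python
# Subset of the i386 relocation types, enough for this listing.
R_386_TYPES = {1: "R_386_32", 2: "R_386_PC32"}


def decode_r_info(info):
    """Split an ELF32 r_info word into (symbol index, relocation type name)."""
    symbol_index, type_number = divmod(info, 0x100)
    return symbol_index, R_386_TYPES.get(type_number, str(type_number))


# First entry of .rel.text in the readelf output.
print(decode_r_info(0x000E3401))
```

<p>For the first entry, this yields symbol index 3636 with type <code class="language-plaintext highlighter-rouge">R_386_32</code>, matching the type column printed by <code class="language-plaintext highlighter-rouge">readelf</code>.</p>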

<h3 id="making-a-program-out-of-an-object-file-out-of-a-program">Making a program out of an object file out of a program</h3>

<p>Let’s try making an executable out of <code class="language-plaintext highlighter-rouge">aln.whole.o</code>.
Since this object file is a whole program, we need to build it statically with no extra libraries whatsoever:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ i686-linux-gnu-gcc -nostdlib -nostartfiles -e_entry -static -o aln.elf aln.whole.o
</code></pre></div></div>

<p>Again, we can inspect the executable with the toolchain:</p>

<div class="tabs">
<input type="radio" name="file-aln-elf" id="tab-header-readelf-aln-elf" />
<label for="tab-header-readelf-aln-elf">Header</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ readelf --wide --file-header aln.elf
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8049000
  Start of program headers:          52 (bytes into file)
  Start of section headers:          243928 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         4
  Size of section headers:           40 (bytes)
  Number of section headers:         8
  Section header string table index: 7</code>
</pre>
</div>
</div></div>
<input type="radio" name="file-aln-elf" id="tab-sections-readelf-aln-elf" checked="" />
<label for="tab-sections-readelf-aln-elf">Sections</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ readelf --wide --section-headers aln.elf
There are 8 section headers, starting at offset 0x3b8d8:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .note.gnu.build-id NOTE            080480b4 0000b4 000024 00   A  0   0  4
  [ 2] .text             PROGBITS        08049000 001000 01bfe0 00  AX  0   0 16
  [ 3] .data             PROGBITS        08065000 01d000 001000 00  WA  0   0 16
  [ 4] .bss              NOBITS          08066000 01e000 000994 00  WA  0   0 16
  [ 5] .symtab           SYMTAB          00000000 01e000 00eef0 10      6 156  4
  [ 6] .strtab           STRTAB          00000000 02cef0 00e9a6 00      0   0  1
  [ 7] .shstrtab         STRTAB          00000000 03b896 00003f 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)</code>
</pre>
</div>
</div></div>
<input type="radio" name="file-aln-elf" id="tab-symbols-readelf-aln-elf" />
<label for="tab-symbols-readelf-aln-elf">Symbols</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ readelf --wide --symbols aln.elf | head -n 20

Symbol table '.symtab' contains 3823 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 080480b4     0 SECTION LOCAL  DEFAULT    1 
     2: 08049000     0 SECTION LOCAL  DEFAULT    2 
     3: 08065000     0 SECTION LOCAL  DEFAULT    3 
     4: 08066000     0 SECTION LOCAL  DEFAULT    4 
     5: 00000000     0 FILE    LOCAL  DEFAULT  ABS aln.whole.o
     6: 0804a8cf     7 OBJECT  LOCAL  DEFAULT    2 switchD_000028ef::switchD
     7: 0804a8e4     4 OBJECT  LOCAL  DEFAULT    2 switchD_000028ef::switchdataD_00002904
     8: 0804a984     3 OBJECT  LOCAL  DEFAULT    2 switchD_000028ef::caseD_3f
     9: 0804a9a4     3 OBJECT  LOCAL  DEFAULT    2 switchD_000028ef::caseD_5a
    10: 0804a9d4     3 OBJECT  LOCAL  DEFAULT    2 switchD_000028ef::caseD_41
...
  3812: 080518d8    10 OBJECT  GLOBAL DEFAULT    2 LAB_000098f8
  3813: 08052808     3 OBJECT  GLOBAL DEFAULT    2 LAB_0000a828
  3814: 0804d330     3 OBJECT  GLOBAL DEFAULT    2 LAB_00005350
  3815: 080642e4     1 OBJECT  GLOBAL DEFAULT    2 LAB_0001c304
  3816: 0806400c     3 OBJECT  GLOBAL DEFAULT    2 LAB_0001c02c
  3817: 08063a0a     7 OBJECT  GLOBAL DEFAULT    2 LAB_0001ba2a
  3818: 080566ee     2 OBJECT  GLOBAL DEFAULT    2 LAB_0000e70e
  3819: 0805c375     2 OBJECT  GLOBAL DEFAULT    2 LAB_00014395
  3820: 08055b2c   179 FUNC    GLOBAL DEFAULT    2 FUN_0000db4c
  3821: 080593a7     3 OBJECT  GLOBAL DEFAULT    2 LAB_000113c7
  3822: 08055e98     1 OBJECT  GLOBAL DEFAULT    2 LAB_0000deb8</code>
</pre>
</div>
</div></div>
<input type="radio" name="file-aln-elf" id="tab-relocations-readelf-aln-elf" />
<label for="tab-relocations-readelf-aln-elf">Relocations</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ readelf --wide --relocs aln.elf

There are no relocations in this file.</code>
</pre>
</div>
</div></div>
</div>

<p>We can observe that its base address (automatically set by the toolchain) is now <code class="language-plaintext highlighter-rouge">0x8049000</code>, fulfilling our requirement of a minimum virtual address of <code class="language-plaintext highlighter-rouge">0x10000</code> or higher.
We can also see that the symbols, whose names include the original addresses from the a.out layout, are shifted from their initial locations.</p>
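
<p>The shift can be checked mechanically. The Python sketch below takes three <code class="language-plaintext highlighter-rouge">.text</code> symbols from the listing, extracts the original address embedded in each name and subtracts it from the new value. The regular expression assumes Ghidra’s default naming with an eight-digit hexadecimal address; data and bss symbols are left out on purpose, since those sections do not necessarily move by the same amount as the code.</p>

```python
import re

# (value in aln.elf, symbol name) pairs copied from the readelf output.
SYMBOLS = [
    (0x0804A8CF, "switchD_000028ef::switchD"),
    (0x080518D8, "LAB_000098f8"),
    (0x08055B2C, "FUN_0000db4c"),
]


def original_address(name):
    """Extract the a.out address embedded in a default Ghidra symbol name."""
    match = re.search(r"_([0-9a-f]{8})", name)
    return int(match.group(1), 16)


deltas = {value - original_address(name) for value, name in SYMBOLS}
print([hex(delta) for delta in deltas])
```

<p>All three symbols yield the same delta, <code class="language-plaintext highlighter-rouge">0x8047fe0</code>, which is also the distance between the old start of the code at <code class="language-plaintext highlighter-rouge">0x1020</code> and the new one at <code class="language-plaintext highlighter-rouge">0x8049000</code>.</p>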

<h2 id="debugging-aln-because-of-delinking">Debugging aln because of delinking</h2>

<p>Let’s take <code class="language-plaintext highlighter-rouge">aln.elf</code> for a spin:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm -f jaghello.cof &amp;&amp; make LINK=aln.elf V=1
aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
make: *** [Makefile:12: jaghello.cof] Segmentation fault (core dumped)
</code></pre></div></div>

<p>… and it crashed immediately.
Let’s run <code class="language-plaintext highlighter-rouge">aln.elf</code> under GDB:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gdb --args aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later &lt;http://gnu.org/licenses/gpl.html&gt;
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
&lt;https://www.gnu.org/software/gdb/bugs/&gt;.
Find the GDB manual and other documentation resources online at:
    &lt;http://www.gnu.org/software/gdb/documentation/&gt;.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from aln.elf...
(No debugging symbols found in aln.elf)
(gdb) r
Starting program: /home/boricj/Documents/atari-sdk-elf/aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o

Program received signal SIGSEGV, Segmentation fault.
0x08058f6b in strncmp ()
(gdb) bt
#0  0x08058f6b in strncmp ()
#1  0x08058db2 in getenv ()
#2  0x0804c199 in main ()
(gdb)
</code></pre></div></div>

<p>The program crashed while doing some sort of processing with environment variables.
The <em>fun</em> part of getting this port to run begins now.</p>

<h3 id="entrypoint-shenanigans">Entrypoint shenanigans</h3>

<p>Linux initializes the process differently between an a.out and an ELF executable: the latter is set up per the ELF i386 psABI specification.
The relevant part is the difference in stack layout: for a.out executables, the Linux kernel used to set it up <em>as if</em> the entrypoint were a C function with this signature that had just been called:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[[</span><span class="n">noreturn</span><span class="p">]]</span> <span class="kt">void</span> <span class="nf">_start</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">,</span> <span class="kt">char</span><span class="o">**</span> <span class="n">environ</span><span class="p">);</span>
</code></pre></div></div>

<p>Needless to say, the ELF i386 psABI specification mandates something completely different, which is why <code class="language-plaintext highlighter-rouge">aln.elf</code> is crashing due to the ABI mismatch.
The ELF entrypoint therefore needs to shuffle things around before it can jump into the a.out entrypoint.
The following assembly snippet is one way to implement it:</p>

<pre><code class="language-asm">.text
.global _start
_start:
        xor %ebp, %ebp              # Terminate the call stack chain.
        mov %esp, %eax              # Save the old stack pointer for later.
        and $-16, %esp              # Align the stack to 16 bytes 'cause we can.

        mov (%eax), %ebx            # Get argc from the old stack.
        lea 4(%eax), %esi           # Get argv from the old stack.
        lea 8(%eax, %ebx, 4), %edi  # Get environ from the old stack.

        push %edi                   # Push arguments in reverse order,
        push %esi                   # per the System-V i386 calling
        push %ebx                   # convention.

        jmp _entry                  # Jump into the a.out entrypoint.
</code></pre>
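
<p>For readers less comfortable with AT&amp;T syntax, the pointer arithmetic of that snippet can be mirrored in a few lines of Python. This is only a model of the initial ELF process stack (argc, the argv pointers, a NULL, the environment pointers, another NULL); the function name and the sample addresses are made up for the illustration.</p>

```python
WORD = 4  # size of a pointer on i386


def locate_arguments(stack_words, stack_pointer):
    """Derive argc, argv and environ the same way the assembly does.

    stack_words models the initial stack as a list of 32-bit words,
    the first of which sits at address stack_pointer.
    """
    argc = stack_words[0]                             # mov (%eax), %ebx
    argv = stack_pointer + WORD                       # lea 4(%eax), %esi
    environ = stack_pointer + 2 * WORD + argc * WORD  # lea 8(%eax, %ebx, 4), %edi
    return argc, argv, environ


# Two arguments and one environment variable, with made-up pointer values.
stack = [2, 0x1000, 0x1008, 0, 0x2000, 0]
print(locate_arguments(stack, 0xFFFFD000))
```

<p>The extra 8 bytes in the <code class="language-plaintext highlighter-rouge">environ</code> computation skip over <code class="language-plaintext highlighter-rouge">argc</code> itself and the NULL pointer that terminates <code class="language-plaintext highlighter-rouge">argv</code>.</p>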

<p>Now, let’s fast-forward through the process of fixing crashes until <code class="language-plaintext highlighter-rouge">aln.elf</code> doesn’t crash anymore:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ i686-linux-gnu-gcc -nostdlib -nostartfiles -e_start -static -o aln.elf aln.whole.o crt0.S
$ rm -f jaghello.cof &amp;&amp; make LINK=aln.elf V=1
aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof
/home/boricj/Documents/jaguar-sdk/jaguar/examples/jaghello/startup.s(1): Unresolved reference: in BSD object startup.o, symbol ___main
(@T+0X106): Unresolved reference: in BSD object jag.o, symbol _vidmem

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA50   1C0E

Link complete.
$ file jaghello.cof
jaghello.cof: mc68k COFF object not stripped
$ sha256sum jaghello.cof 
dfe2d010a3b526bc3d9e573016b614d0bfd0b382bca004a4a42f7e8a89a22c29  jaghello.cof
</code></pre></div></div>

<p>This time <code class="language-plaintext highlighter-rouge">aln.elf</code> didn’t crash, but it has a different observable behavior than the original one and the SHA-256 hash of <code class="language-plaintext highlighter-rouge">jaghello.cof</code> doesn’t match the one from <a href="/atari-jaguar-sdk/2023/12/04/part-1.html">part 1</a>…</p>

<p>Since <code class="language-plaintext highlighter-rouge">aln.elf</code> didn’t crash, we don’t have an obvious spot to analyze with a debugger.
Fortunately, we do have the original <code class="language-plaintext highlighter-rouge">aln</code> we can run and cross-check our port with.
All we need to do is to compare two different programs as they execute and find out where they start behaving differently and why.</p>
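
<p>Conceptually, the comparison boils down to walking two execution traces in lockstep and stopping at the first step where they disagree. The Python sketch below does this for lists of program counter values; it assumes that every code address in the ELF port is shifted by a constant amount relative to the a.out original, which holds for <code class="language-plaintext highlighter-rouge">.text</code> here but is an assumption of the sketch, not a general property.</p>

```python
from itertools import zip_longest

# Shift of the code between the a.out layout and the ELF layout.
DELTA = 0x8049000 - 0x1020


def first_divergence(aout_trace, elf_trace, delta=DELTA):
    """Return the index of the first step where the two traces disagree.

    Addresses from the a.out run are shifted by delta before comparing.
    Returns None when both traces match from start to finish.
    """
    pairs = zip_longest(aout_trace, elf_trace)
    for index, (old, new) in enumerate(pairs):
        if old is None or new is None or old + delta != new:
            return index
    return None


# Made-up traces that agree for two steps, then take different paths.
print(first_divergence([0x1020, 0x1025, 0x2000],
                       [0x8049000, 0x8049005, 0x804A000]))
```

<p>A real comparison also has to account for register and memory contents, but the principle of normalizing addresses and then looking for the first mismatch stays the same.</p>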

<h3 id="when-all-you-have-is-a-hammer">When all you have is a hammer…</h3>

<p>One little-known feature of GDB is the ability to do time travel debugging, in other words to step <em>backwards</em>.
To do so, GDB records the side-effects of instructions as they execute: stepping backwards can then be achieved by <em>unapplying</em> those side-effects in reverse.
We’re not interested in <em>time-travelling debugging</em> here, but rather in <em>multi-dimensional diffing</em>.</p>
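
<p>The record-and-undo idea can be modelled in a few lines of Python. This toy class is not how GDB is implemented internally; it only illustrates the principle of logging the previous value of every location before it is overwritten, so that writes can later be unapplied in reverse order.</p>

```python
class Recorder:
    """Toy machine state that logs the old value of everything it overwrites."""

    def __init__(self):
        self.state = {}
        self.log = []

    def write(self, location, value):
        # Save the previous contents so this write can be undone later.
        self.log.append((location, self.state.get(location)))
        self.state[location] = value

    def step_back(self):
        # Undo the most recent write by restoring the saved contents.
        location, previous = self.log.pop()
        if previous is None:
            del self.state[location]
        else:
            self.state[location] = previous


machine = Recorder()
machine.write("eax", 1)
machine.write("eax", 2)
machine.step_back()
print(machine.state)
```

<p>A log like this one is also a complete trace of what the program did, which is the property exploited in the rest of this article.</p>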

<p>We’ll use GDB to make and save the recordings.
For this to work, we need to replicate the conditions of the two runs as closely as possible: command lines, environment variables…
The recordings are also started in both cases from <code class="language-plaintext highlighter-rouge">_entry</code> onwards, the original entrypoint from the a.out executable:</p>

<div class="tabs">
<input type="radio" name="gdb-record" id="tab-gdb-record-aln" checked="" />
<label for="tab-gdb-record-aln">GDB recording of aln</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ rm -f jaghello.cof &amp;&amp; gdb-multiarch --args aout-loader aln -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later &lt;http://gnu.org/licenses/gpl.html&gt;
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
&lt;https://www.gnu.org/software/gdb/bugs/&gt;.
Find the GDB manual and other documentation resources online at:
    &lt;http://www.gnu.org/software/gdb/documentation/&gt;.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from aout-loader...
(No debugging symbols found in aout-loader)
(gdb) starti
Starting program: /home/boricj/Documents/jaguar-sdk/tools/bin/aout-loader aln -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o

Program stopped.
0xf7fe4490 in ?? () from /lib/ld-linux.so.2
(gdb) hb *0x1020
Hardware assisted breakpoint 1 at 0x1020
(gdb) c
Continuing.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x00001020 in ?? ()
(gdb) set record full insn-max-instructions unlimited
Undefined set record full command: "insn-max-instructions unlimited".  Try "help set record full".
(gdb) set record full insn-number-max unlimited
(gdb) record full
(gdb) c
Continuing.
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA54   1C00

Link complete.
The next instruction is syscall exit.  It will make the program exit.  Do you want to stop the program?([y] or n) y
Process record: inferior program stopped.

Program stopped.
0x00011e64 in ?? ()
(gdb) record save aln.record
warning: target file /proc/11340/cmdline contained unexpected null characters
Saved core file aln.record with execution log.</code>
</pre>
</div>
</div></div>
<input type="radio" name="gdb-record" id="tab-gdb-record-aln-elf" />
<label for="tab-gdb-record-aln-elf">GDB recording of aln.elf</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ rm -f jaghello.cof &amp;&amp; gdb-multiarch --args aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later &lt;http://gnu.org/licenses/gpl.html&gt;
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
&lt;https://www.gnu.org/software/gdb/bugs/&gt;.
Find the GDB manual and other documentation resources online at:
    &lt;http://www.gnu.org/software/gdb/documentation/&gt;.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from aln.elf...
(No debugging symbols found in aln.elf)
(gdb) b _entry
Breakpoint 1 at 0x8049000
(gdb) r
Starting program: /home/boricj/Documents/jaguar-sdk/jaguar/bin/linux/aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o

Breakpoint 1, 0x08049000 in _entry ()
(gdb) set record full insn-number-max unlimited
(gdb) record full
(gdb) c
Continuing.
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof
/home/boricj/Documents/jaguar-sdk/jaguar/examples/jaghello/startup.s(1): Unresolved reference: in BSD object startup.o, symbol ___main
(@T+0X106): Unresolved reference: in BSD object jag.o, symbol _vidmem

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA50   1C0E

Link complete.
The next instruction is syscall exit.  It will make the program exit.  Do you want to stop the program?([y] or n) y
Process record: inferior program stopped.

Program stopped.
0x08059e44 in _exit ()
(gdb) record save aln.elf.record
warning: target file /proc/17807/cmdline contained unexpected null characters
Saved core file aln.elf.record with execution log.</code>
</pre>
</div>
</div></div>
</div>

<p>We have two recordings (weighing over 30 MiB each), but we don’t have a tool to compare them.
It is time to improvise: we’ll hack support for the precord format produced by GDB into <a href="https://github.com/eliben/pyelftools">pyelftools</a>, a Python package for manipulating ELF files, then write a script to make the comparison:</p>

<div class="tabs">
<input type="radio" name="pyelftools" id="tab-pyelftools-diff" />
<label for="tab-pyelftools-diff">pyelftools.diff</label>
<div class="tab"><div class="language-diff highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>diff --git a/elftools/elf/elffile.py b/elftools/elf/elffile.py
index 446d970..5074cb1 100644
--- a/elftools/elf/elffile.py
+++ b/elftools/elf/elffile.py
@@ -30,7 +30,8 @@ from .structs import ELFStructs
 from .sections import (
         Section, StringTableSection, SymbolTableSection,
         SymbolTableIndexSection, SUNWSyminfoTableSection, NullSection,
-        NoteSection, StabSection, ARMAttributesSection, RISCVAttributesSection)
+        NoteSection, StabSection, ARMAttributesSection, RISCVAttributesSection,
+        PrecordSection)
 from .dynamic import DynamicSection, DynamicSegment
 from .relocation import (RelocationSection, RelocationHandler,
         RelrRelocationSection)
@@ -661,6 +662,8 @@ class ELFFile(object):
             return NoteSection(section_header, name, self)
         elif sectype == 'SHT_PROGBITS' and name == '.stab':
             return StabSection(section_header, name, self)
+        elif sectype == 'SHT_PROGBITS' and name == 'precord':
+            return PrecordSection(section_header, name, self)
         elif sectype == 'SHT_ARM_ATTRIBUTES':
             return ARMAttributesSection(section_header, name, self)
         elif sectype == 'SHT_RISCV_ATTRIBUTES':
diff --git a/elftools/elf/sections.py b/elftools/elf/sections.py
index 3805962..16ac706 100644
--- a/elftools/elf/sections.py
+++ b/elftools/elf/sections.py
@@ -6,12 +6,13 @@
 # Eli Bendersky (eliben@gmail.com)
 # This code is in the public domain
 #-------------------------------------------------------------------------------
-from ..common.exceptions import ELFCompressionError
+from ..common.exceptions import ELFCompressionError, ELFParseError
 from ..common.utils import struct_parse, elf_assert, parse_cstring_from_stream
 from collections import defaultdict
 from .constants import SH_FLAGS
 from .notes import iter_notes
 
+import struct
 import zlib
 
 
@@ -278,6 +279,79 @@ class NoteSection(Section):
         """
         return iter_notes(self.elffile, self['sh_offset'], self['sh_size'])
 
+class PrecordEnd(object):
+    """ PrecordEnd object - representing the end of a precord list.
+    """
+    def __init__(self, signal, count):
+        self.signal = signal
+        self.count = count
+
+    def __repr__(self):
+        s = '&lt;%s: signal=0x%08x count=%d&gt;' % \
+            (self.__class__.__name__, self.signal, self.count)
+        return s
+
+class PrecordReg(object):
+    """ PrecordReg object - representing a register inside a precord list.
+    """
+    def __init__(self, number, name, value, length):
+        self.number = number
+        self.name = name
+        self.value = value
+        self.length = length
+
+    def __repr__(self):
+        s = '&lt;%s (%s): 0x%s&gt;' % \
+            (self.__class__.__name__, self.name, hex(self.value)[2:].zfill(self.length * 2))
+        return s
+
+
+class PrecordMem(object):
+    """ PrecordMem object - representing a memory write inside a precord list.
+    """
+    def __init__(self, memlen, memaddr, memval):
+        self.memlen = memlen
+        self.memaddr = memaddr
+        self.memval = memval
+
+    def __repr__(self):
+        s = '&lt;%s (%016x, %d): 0x%s&gt;' % \
+            (self.__class__.__name__, self.memaddr, self.memlen, hex(self.memval)[2:].zfill(self.memlen * 2))
+        return s
+
+class PrecordSection(Section):
+    """ GDB program record section.
+    """
+    def iter_precords(self, parse_reg, byteorder):
+        """ Yield all precord entries.  Result types are precord objects.
+        """
+        offset = self['sh_offset']
+        size = self['sh_size']
+        end = offset + size
+
+        self.stream.seek(offset)
+        magic = struct.unpack('!I', self.stream.read(4))[0]
+
+        if magic != 0x20091016:
+            raise ELFParseError("Unknown precord magic %08x" % magic)
+
+        while self.stream.tell() &lt; end:
+            record_type = struct.unpack('!B', self.stream.read(1))[0]
+
+            if record_type == 0x00:
+                signal, count = struct.unpack('!II', self.stream.read(8))
+                yield PrecordEnd(signal, count)
+            elif record_type == 0x01:
+                regnum = struct.unpack('!I', self.stream.read(4))[0]
+                regname, regval, reglength = parse_reg(regnum, self.stream)
+                yield PrecordReg(regnum, regname, int.from_bytes(regval, byteorder), reglength)
+            elif record_type == 0x02:
+                memlen, memaddr = struct.unpack('!IQ', self.stream.read(12))
+                memval = self.stream.read(memlen)
+                yield PrecordMem(memlen, memaddr, int.from_bytes(memval, byteorder))
+            else:
+                raise ELFParseError("Unknown precord type %02x" % record_type)
+
 
 class StabSection(Section):
     """ ELF stab section.
</code>
</pre>
</div>
</div></div>
<input type="radio" name="pyelftools" id="tab-pyelftools-precordcompare" checked="" />
<label for="tab-pyelftools-precordcompare">precordcompare.py</label>
<div class="tab"><div class="language-python highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>#!/usr/bin/env python
import argparse

from elftools.elf.elffile import ELFFile
from elftools.elf.sections import PrecordReg, PrecordEnd

PRECORD_NAME = 'precord'

def parse_reg_i386(regnum, stream):
  # Taken from binutils-gdb/gdb/features/i386/32bit-core.xml
  regnames=[
    "eax",   "ecx",    "edx",   "ebx",
    "esp",   "ebp",    "esi",   "edi",
    "eip",   "eflags", "cs",    "ss",
    "ds",    "es",     "fs",    "gs",
    "st0",   "st1",    "st2",   "st3",
    "st4",   "st5",    "st6",   "st7",
    "fctrl", "fstat",  "ftag",  "fiseg",
    "fioff", "foseg",  "fooff", "fop"
  ]
  regsizes=[
    4, 4, 4, 4,
    4, 4, 4, 4,
    4, 4, 4, 4,
    4, 4, 4, 4,
    10, 10, 10, 10,
    10, 10, 10, 10,
    4, 4, 4, 4,
    4, 4, 4, 4
  ]
  return regnames[regnum], stream.read(regsizes[regnum]), regsizes[regnum]

class AddressMapping:
    def __init__(self, spec):
        parts = spec.split(':')
        self.source = int(parts[0], 16)
        self.destination = int(parts[1], 16)
        self.length = int(parts[2], 16)

def map_address(addr, mappings):
    for mapping in mappings:
        if addr &gt;= mapping.source and addr &lt; (mapping.source + mapping.length):
            return addr - mapping.source + mapping.destination
    return addr

def main():
    argparser = argparse.ArgumentParser()
    argparser.add_argument('precord', nargs=2, default=None, help='Record files to compare')
    argparser.add_argument('-m', '--mapping', nargs='*', type=AddressMapping, dest='mappings', default=[])
    argparser.add_argument('-r', '--check-register-mappings', nargs='*', type=str, dest='registers_mappings', default=[])

    args = argparser.parse_args()

    with open(args.precord[0], 'rb') as file1:
        with open(args.precord[1], 'rb') as file2:
            elffile1 = ELFFile(file1)
            elffile2 = ELFFile(file2)

            section1 = elffile1.get_section_by_name(PRECORD_NAME)
            iter_precords1 = section1.iter_precords(parse_reg_i386, 'little')
            section2 = elffile2.get_section_by_name(PRECORD_NAME)
            iter_precords2 = section2.iter_precords(parse_reg_i386, 'little')

            registers1 = dict()
            
            registers2 = dict()
            for record1, record2 in zip(iter_precords1, iter_precords2):
                print("%-79s %s" % (record1, record2))
                if type(record1) != type(record2):
                    print("Record type divergence")
                    break
                elif type(record1) == PrecordEnd:
                    if record1.count != record2.count:
                        print("End record count divergence")
                        break
                elif type(record1) == PrecordReg:
                    if record1.name != record2.name:
                        print("Register name divergence")
                        break
                    registers1[record1.name] = record1.value
                    registers2[record2.name] = record2.value
                    
                    if record1.name in args.registers_mappings:
                        mapped_value = map_address(record1.value, args.mappings)
                        if mapped_value != record2.value:
                            print("Register value divergence (0x%016x -&gt; 0x%016x != 0x%016x)" % (record1.value, mapped_value, record2.value))
                            break

if __name__ == '__main__':
    main()
</code>
</pre>
</div>
</div></div>
</div>

<div class="box box-warning">
  <p>Maybe there are easier, off-the-shelf ways to do this kind of behavioral difference analysis between dissimilar programs, but I don’t know of one.</p>
</div>

<p>Let’s try it out:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ PYTHONPATH=${HOME}/Documents/pyelftools python3 ./precordcompare.py ${JAGHELLO}/aln.record ${JAGHELLO}/aln.elf.record --check-register-mappings eip
&lt;PrecordReg (esp): 0xf7d8bf24&gt;                                                  &lt;PrecordReg (esp): 0xffffce90&gt;
&lt;PrecordMem (00000000f7d8bf24, 4): 0x00001025&gt;                                  &lt;PrecordMem (00000000ffffce90, 4): 0x08049005&gt;
&lt;PrecordReg (eip): 0x0000fd4c&gt;                                                  &lt;PrecordReg (eip): 0x08057d2c&gt;
Register value divergence (0x000000000000fd4c -&gt; 0x000000000000fd4c != 0x0000000008057d2c)
</code></pre></div></div>

<p>…and it diverged immediately.</p>

<p>Another tricky point is that we’re trying to compare two execution traces across two <em>different</em> programs, which don’t have the same memory layout.
<code class="language-plaintext highlighter-rouge">aln</code> was delinked and relinked as a whole, so the meat of it is essentially the same but moved to a different virtual address.
We’ll account for this by adding mappings from one address space to another:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ PYTHONPATH=${HOME}/Documents/pyelftools python3 ./precordcompare.py ${JAGHELLO}/aln.record ${JAGHELLO}/aln.elf.record --mapping 0x1020:0x08049000:0x01e973 --check-register-mappings eip
&lt;PrecordReg (esp): 0xf7d8bf24&gt;                                                  &lt;PrecordReg (esp): 0xffffce90&gt;
&lt;PrecordMem (00000000f7d8bf24, 4): 0x00001025&gt;                                  &lt;PrecordMem (00000000ffffce90, 4): 0x08049005&gt;
&lt;PrecordReg (eip): 0x0000fd4c&gt;                                                  &lt;PrecordReg (eip): 0x08057d2c&gt;
&lt;PrecordEnd: signal=0x00000000 count=1&gt;                                         &lt;PrecordEnd: signal=0x00000000 count=1&gt;
&lt;PrecordReg (esp): 0xf7d8bf20&gt;                                                  &lt;PrecordReg (esp): 0xffffce8c&gt;
&lt;PrecordMem (00000000f7d8bf20, 4): 0xffffcdd8&gt;                                  &lt;PrecordMem (00000000ffffce8c, 4): 0x00000000&gt;
&lt;PrecordReg (eip): 0x0000fd4d&gt;                                                  &lt;PrecordReg (eip): 0x08057d2d&gt;
&lt;PrecordEnd: signal=0x00000000 count=2&gt;                                         &lt;PrecordEnd: signal=0x00000000 count=2&gt;
...
&lt;PrecordReg (esp): 0xf7d8be0c&gt;                                                  &lt;PrecordReg (esp): 0xffffcd78&gt;
&lt;PrecordReg (eflags): 0x00000286&gt;                                               &lt;PrecordReg (eflags): 0x00000286&gt;
&lt;PrecordReg (eip): 0x00009da3&gt;                                                  &lt;PrecordReg (eip): 0x08051d83&gt;
&lt;PrecordEnd: signal=0x00000000 count=215180&gt;                                    &lt;PrecordEnd: signal=0x00000000 count=215180&gt;
&lt;PrecordReg (eax): 0x5656a200&gt;                                                  &lt;PrecordReg (eax): 0x0807a200&gt;
&lt;PrecordReg (eip): 0x00009da7&gt;                                                  &lt;PrecordReg (eip): 0x08051d87&gt;
&lt;PrecordEnd: signal=0x00000000 count=215181&gt;                                    &lt;PrecordEnd: signal=0x00000000 count=215181&gt;
&lt;PrecordReg (eax): 0x56562000&gt;                                                  &lt;PrecordReg (eax): 0x08070000&gt;
&lt;PrecordReg (eflags): 0x00000206&gt;                                               &lt;PrecordReg (eflags): 0x00000246&gt;
&lt;PrecordReg (eip): 0x00009dab&gt;                                                  &lt;PrecordReg (eip): 0x08051d8b&gt;
&lt;PrecordEnd: signal=0x00000000 count=215182&gt;                                    &lt;PrecordEnd: signal=0x00000000 count=215182&gt;
&lt;PrecordReg (eflags): 0x00000246&gt;                                               &lt;PrecordReg (eflags): 0x00000287&gt;
&lt;PrecordReg (eip): 0x00009daf&gt;                                                  &lt;PrecordReg (eip): 0x08051d8f&gt;
&lt;PrecordEnd: signal=0x00000000 count=215183&gt;                                    &lt;PrecordEnd: signal=0x00000000 count=215183&gt;
&lt;PrecordReg (eip): 0x00009e28&gt;                                                  &lt;PrecordReg (eip): 0x08051d91&gt;
Register value divergence (0x0000000000009e28 -&gt; 0x0000000008051e08 != 0x0000000008051d91)
</code></pre></div></div>

<p>Aha!
We have an instruction pointer divergence.
Looking backwards at the previous instruction pointer values (<code class="language-plaintext highlighter-rouge">0x9e28</code>, <code class="language-plaintext highlighter-rouge">0x9daf</code>, <code class="language-plaintext highlighter-rouge">0x9dab</code>, <code class="language-plaintext highlighter-rouge">0x9da7</code>), we stumble upon the smoking gun, a bad reference:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-3/aln-bad-reference.png" />
                <figcaption>Figure 3: Bad reference to s_t_enough_arguments_000027fa</figcaption>
            </figure>
        </div>
    </label>
</div>
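
<p>The address translation behind the <code class="language-plaintext highlighter-rouge">--mapping</code> option can be checked by hand.
Here is a minimal standalone sketch of that arithmetic, assuming the same <code class="language-plaintext highlighter-rouge">source:destination:length</code> syntax. The <code class="language-plaintext highlighter-rouge">Mapping</code> class and its <code class="language-plaintext highlighter-rouge">parse</code> and <code class="language-plaintext highlighter-rouge">translate</code> methods are illustrative names, not part of <code class="language-plaintext highlighter-rouge">precordcompare.py</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass


@dataclass
class Mapping:
    source: int       # start of the range in the original aln (a.out)
    destination: int  # start of the same range in the relinked aln.elf
    length: int       # size of the range in bytes

    @classmethod
    def parse(cls, spec):
        # Same "source:destination:length" hexadecimal syntax as --mapping.
        source, destination, length = (int(part, 16) for part in spec.split(":"))
        return cls(source, destination, length)

    def translate(self, addr):
        # Addresses outside the range are returned unchanged, like map_address().
        if addr in range(self.source, self.source + self.length):
            return addr - self.source + self.destination
        return addr


mapping = Mapping.parse("0x1020:0x08049000:0x01e973")
for eip in (0xfd4c, 0x9daf, 0x9e28):
    print("0x%08x maps to 0x%08x" % (eip, mapping.translate(eip)))
</code></pre></div></div>

<p>With the mapping from the run above, <code class="language-plaintext highlighter-rouge">0x9e28</code> translates to <code class="language-plaintext highlighter-rouge">0x08051e08</code>, which is the value the comparison expected, whereas <code class="language-plaintext highlighter-rouge">aln.elf</code> recorded <code class="language-plaintext highlighter-rouge">0x08051d91</code>.</p>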

<p>Finally, by cross-referencing the raw disassembly of the artifacts at hand we can piece together what happened in detail:</p>

<div class="tabs">
<input type="radio" name="disassembly-diff" id="tab-disassembly-aln" checked="" />
<label for="tab-disassembly-aln">aln (original a.out)</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ i686-linux-gnu-objdump -D -b binary -m i386 --adjust-vma=0x1000 aln
...
    9da0:       83 c4 10                add    $0x10,%esp
    9da3:       66 8b 45 fe             mov    -0x2(%ebp),%ax
    9da7:       66 25 00 28             and    $0x2800,%ax
    9dab:       66 3d 00 20             cmp    $0x2000,%ax
    9daf:       74 77                   je     0x9e28
...</code>
</pre>
</div>
</div></div>
<input type="radio" name="disassembly-diff" id="tab-disassembly-aln-whole-o" />
<label for="tab-disassembly-aln-whole-o">aln.whole.o (delinked)</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ i686-linux-gnu-objdump -Dr aln.whole.o
...
    8d80:       83 c4 10                add    $0x10,%esp
    8d83:       66 8b 45 fe             mov    -0x2(%ebp),%ax
    8d87:       66 25 06 00             and    $0x6,%ax
                        8d89: R_386_NONE        s_-x:_not_enough_arguments_000027fa
    8d8b:       66 3d 00 20             cmp    $0x2000,%ax
    8d8f:       74 77                   je     8e08 &lt;LAB_00009e28&gt;
...</code>
</pre>
</div>
</div></div>
<input type="radio" name="disassembly-diff" id="tab-disassembly-aln-elf" />
<label for="tab-disassembly-aln-elf">aln.elf (relinked)</label>
<div class="tab"><div class="language-plaintext highlighter-rouge">
<div class="highlight">
<pre class="highlight">
<code>$ i686-linux-gnu-objdump -D aln.elf
...
 8051d80:       83 c4 10                add    $0x10,%esp
 8051d83:       66 8b 45 fe             mov    -0x2(%ebp),%ax
 8051d87:       66 25 06 00             and    $0x6,%ax
 8051d8b:       66 3d 00 20             cmp    $0x2000,%ax
 8051d8f:       74 77                   je     8051e08 &lt;LAB_00009e28&gt;
...</code>
</pre>
</div>
</div></div>
</div>

<ol>
  <li>The integer constant <code class="language-plaintext highlighter-rouge">0x2800</code> of the instruction <code class="language-plaintext highlighter-rouge">and ax, 0x2800</code> was misidentified as an address during Ghidra’s static analysis ;</li>
  <li>This led to the introduction of a fake reference to the symbol <code class="language-plaintext highlighter-rouge">s_t_enough_arguments_000027fa</code> at this location in the Ghidra database ;</li>
  <li>That reference caused the relocation synthesizer analyzer to emit a bogus relocation for this integer constant ;</li>
  <li>This bogus relocation led to the corruption of the integer constant when <code class="language-plaintext highlighter-rouge">aln.whole.o</code> was exported and <code class="language-plaintext highlighter-rouge">aln.elf</code> was linked ;</li>
  <li>The instruction with the altered integer constant ultimately led to the observable change in behavior of <code class="language-plaintext highlighter-rouge">aln.elf</code>.</li>
</ol>
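
<p>The effect of the corrupted constant described in the steps above can be reproduced from the instruction bytes in the disassembly listings.
The following is only a sketch: the helper names <code class="language-plaintext highlighter-rouge">imm16</code> and <code class="language-plaintext highlighter-rouge">branch_can_be_taken</code> are made up for illustration, and the decoding only handles the two four-byte instruction forms shown here.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def imm16(instruction_hex):
    # "66 25 iw" is "and ax, imm16" and "66 3d iw" is "cmp ax, imm16".
    # In both forms the last two bytes hold the little-endian immediate.
    raw = bytes.fromhex(instruction_hex)
    return int.from_bytes(raw[2:], "little")


def branch_can_be_taken(mask, target):
    # After "and ax, mask", ax can only equal target if every bit set in
    # target is also set in mask.
    return (mask | target) == mask


target = imm16("66 3d 00 20")     # cmp $0x2000,%ax
original = imm16("66 25 00 28")   # and $0x2800,%ax in aln
corrupted = imm16("66 25 06 00")  # and $0x6,%ax in aln.elf

for name, mask in (("aln", original), ("aln.elf", corrupted)):
    print("%-8s mask=0x%04x je possible: %s" % (name, mask, branch_can_be_taken(mask, target)))
</code></pre></div></div>

<p>With a mask of <code class="language-plaintext highlighter-rouge">0x6</code> the comparison against <code class="language-plaintext highlighter-rouge">0x2000</code> can never succeed, so the <code class="language-plaintext highlighter-rouge">je</code> is never taken in <code class="language-plaintext highlighter-rouge">aln.elf</code>.
Incidentally, <code class="language-plaintext highlighter-rouge">0x2800 - 0x27fa</code> is 6, which matches the corrupted immediate. That is consistent with the constant having been rewritten as an offset from the bogus symbol, although this is an inference from the numbers rather than something verified here.</p>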

<p>Whew!
That was quite the journey to track this down.
The fix here is to remove the bogus reference by clicking on it and hitting the <code class="language-plaintext highlighter-rouge">Delete</code> key (or right-clicking it and selecting <code class="language-plaintext highlighter-rouge">Delete Memory References</code>), so that it doesn’t end up corrupting the exported object file.</p>

<h2 id="same-software-different-case">Same software, different case</h2>

<p>After another round of debugging <code class="language-plaintext highlighter-rouge">aln.elf</code>, we finally end up with the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm -f jaghello.cof &amp;&amp; make LINK=aln.elf V=1
aln.elf -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA54   1C00

Link complete.
$ file jaghello.cof
jaghello.cof: mc68k COFF object not stripped
$ sha256sum jaghello.cof
f9c8269cdc998de01c0ac7a3e815c16b7ced106e25f10f92a7078c722a220dbb  jaghello.cof
</code></pre></div></div>

<p>We have reproduced <code class="language-plaintext highlighter-rouge">jaghello.cof</code> identically with our newly minted <code class="language-plaintext highlighter-rouge">aln.elf</code> port, as per the requirements laid out in <a href="/atari-jaguar-sdk/2023/12/04/part-1.html">part 1</a>.</p>

<div class="box box-information">
  <p>The files for this case study can be found here: <a href="/atari-jaguar-sdk/assets/case-study.tar.gz" download="">case-study.tar.gz</a></p>
</div>

<h2 id="conclusion">Conclusion</h2>

<p>We have successfully ported the whole of <code class="language-plaintext highlighter-rouge">aln</code> from a Linux a.out executable to a modern Linux ELF executable.
Next time, we’ll rip out the old C standard library embedded within <code class="language-plaintext highlighter-rouge">aln</code> and twist it further away from its roots.</p>

<div style="display: grid; grid-template-columns: 1fr 1fr; margin-top: 1ch; margin-bottom: 1ch;">
<div style="padding-right: 15px;">


<a class="prev" href="/atari-jaguar-sdk/2023/12/11/part-2.html">&laquo; Porting the Atari Jaguar SDK part 2: abusing Ghidra's version tracking tool for fun and profit</a>

</div>
<div style="padding-left: 15px; text-align: right;">


<a href="/atari-jaguar-sdk/2024/01/01/part-4.html">Porting the Atari Jaguar SDK part 4: where we're going, we don't need the C standard library &raquo;</a>

</div>
</div>]]></content><author><name>Jean-Baptiste Boric</name></author><category term="atari-jaguar-sdk" /><summary type="html"><![CDATA[Previously in this series of articles, we used Ghidra’s Version Tracking tool to annotate parts of aln with a closely matching C static library from Slackware 2.3. In this article, we will start making software ports of aln, despite not having the source code for it.]]></summary></entry><entry><title type="html">Porting the Atari Jaguar SDK part 2: abusing Ghidra’s version tracking tool for fun and profit</title><link href="https://boricj.net/atari-jaguar-sdk/2023/12/11/part-2.html" rel="alternate" type="text/html" title="Porting the Atari Jaguar SDK part 2: abusing Ghidra’s version tracking tool for fun and profit" /><published>2023-12-11T13:00:00+01:00</published><updated>2023-12-11T13:00:00+01:00</updated><id>https://boricj.net/atari-jaguar-sdk/2023/12/11/part-2</id><content type="html" xml:base="https://boricj.net/atari-jaguar-sdk/2023/12/11/part-2.html"><![CDATA[<p><a href="/atari-jaguar-sdk/2023/12/04/part-1.html">Previously</a> in this <a href="/atari-jaguar-sdk/2023/11/27/introduction.html">series of articles</a>, we loaded by hand into Ghidra <code class="language-plaintext highlighter-rouge">aln</code>, an executable artifact in the traditional Unix a.out format, despite Ghidra not offering support for this particular file format.
In this article, we will start reverse-engineering <code class="language-plaintext highlighter-rouge">aln</code> with the intent to port it to new environments.</p>

<h2 id="reverse-engineering-tricks-for-lazy-people">Reverse-engineering tricks for lazy people</h2>

<p>In order to make software ports of <code class="language-plaintext highlighter-rouge">aln</code>, we will need some level of understanding of how it is put together.
The <code class="language-plaintext highlighter-rouge">aln</code> executable itself is about 116 KiB in size, which is a large program to reverse-engineer.
Unlike our past <a href="/reverse-engineering/2023/05/01/introduction.html">case study</a>, we will not attempt a thorough or detailed analysis of <code class="language-plaintext highlighter-rouge">aln</code>, since that would simply take way too long.
Instead, we’ll cheat and exploit every trick or leverage that we can to cut down on the amount of work required.</p>

<p>One such point of leverage is the fact that <code class="language-plaintext highlighter-rouge">aln</code> is a statically linked program.
It is self-sufficient and doesn’t require external libraries to run: any library the program needs is embedded within it.
In particular, we can expect it to contain parts of a C runtime library somewhere in there.</p>

<h3 id="hunting-for-the-c-standard-library">Hunting for the C standard library</h3>

<p>Given its age and context, <code class="language-plaintext highlighter-rouge">aln</code> can reasonably be expected to contain a glibc 1.x static library from a Linux distribution of this era.
Unfortunately, getting a hold of derelict Linux userlands can be challenging and getting them to run even more so.
After trying a bunch of distributions and much gnashing of teeth, Slackware 2.3’s <code class="language-plaintext highlighter-rouge">libc.a</code> happens to be a near-perfect match.</p>

<h3 id="loading-the-c-standard-library">Loading the C standard library</h3>

<p>Static libraries are an archive of object files, but we’re going to use Ghidra’s Version Tracking tool, which can only work on one source/destination program pair at a time.
We need to extract the static archive (<code class="language-plaintext highlighter-rouge">ar x libc.a</code>) and then combine all those object files into one (<code class="language-plaintext highlighter-rouge">ld -r *.o -o libc.o</code>) so that the tool can work with it.</p>
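
<p>For repeated experiments, those two commands can be wrapped in a small script.
This is only a sketch under a few assumptions: <code class="language-plaintext highlighter-rouge">merge_archive</code> is a made-up helper name, the scratch directory starts out empty, and the <code class="language-plaintext highlighter-rouge">ar</code> and <code class="language-plaintext highlighter-rouge">ld</code> found on the <code class="language-plaintext highlighter-rouge">PATH</code> understand the object format stored in the archive, which a stock modern toolchain may not.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import subprocess
from pathlib import Path


def merge_archive(archive, workdir, output="libc.o", run=subprocess.run):
    """Extract a static archive and relink its members into one object file."""
    workdir = Path(workdir)
    # Equivalent of "ar x libc.a", run inside the scratch directory.
    run(["ar", "x", str(Path(archive).resolve())], cwd=workdir, check=True)
    # Equivalent of the "*.o" shell glob. The scratch directory is assumed
    # to be empty beforehand, so only extracted members are picked up.
    members = sorted(path.name for path in workdir.glob("*.o"))
    # Equivalent of "ld -r *.o -o libc.o": a relocatable (partial) link.
    run(["ld", "-r", *members, "-o", output], cwd=workdir, check=True)
    return workdir / output
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">run</code> parameter exists only so that the command lines can be inspected without invoking the real tools.</p>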

<p>The upside is that after combining <code class="language-plaintext highlighter-rouge">libc.a</code> into <code class="language-plaintext highlighter-rouge">libc.o</code> we have one object file which contains every symbol from the various <code class="language-plaintext highlighter-rouge">.o</code> files that <code class="language-plaintext highlighter-rouge">libc.a</code> is made up of.
The downside is that <code class="language-plaintext highlighter-rouge">libc.a</code> doesn’t carry any <em>debugging</em> symbols, but we can cross-reference the original glibc source code if we need to make up for it.</p>

<p>Now that we have one file to work with, we need to actually load it.
As of Ghidra 10.4 the a.out file format isn’t supported, but there is a <a href="https://github.com/NationalSecurityAgency/ghidra/pull/5004">pull request</a> that implements it. Since <code class="language-plaintext highlighter-rouge">libc.o</code> contains a lot of metadata like symbols we want to use, <a href="/atari-jaguar-sdk/2023/12/04/part-1.html">unlike last time</a> we will not try to load this object file by hand.
Instead, after building this modified version of Ghidra we can load <code class="language-plaintext highlighter-rouge">libc.o</code> using the UNIX A.out loader format:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-2/import-libc.o.png" />
                <figcaption>Figure 1: Importing libc.o</figcaption>
            </figure>
        </div>
    </label>
</div>

<h2 id="version-tracking-to-the-rescue">Version tracking to the rescue</h2>

<p>Now that we have both our source program (<code class="language-plaintext highlighter-rouge">libc.o</code>) and our destination program (<code class="language-plaintext highlighter-rouge">aln</code>), we can finally start Ghidra’s Version Tracking tool by clicking on the footprints icon in the project window:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-2/version-tracking-empty.png" />
                <figcaption>Figure 2: Version tracking tool with no session opened</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>Since we directly launched this tool, it started without a session opened.
Start a new session by clicking on the footstep icon (or <code class="language-plaintext highlighter-rouge">File &gt; New Session...</code>):</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-2/version-tracking-new-session.png" />
                <figcaption>Figure 3: New session of version tracking</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>After running the precondition checks and finishing the new session wizard, Ghidra will open two CodeBrowser windows, one for the source program and one for the destination program.</p>

<p>From there, to leverage the source file we need to start matching it to the destination file.
To get matches, we need to run correlators by clicking on the green plus icon (or <code class="language-plaintext highlighter-rouge">File &gt; Add to session</code>) and selecting which correlation algorithms to use:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-2/version-tracking-adding-correlators.png" />
                <figcaption>Figure 4: Adding new correlators to the version tracking session</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>Since we have a source artifact that is a very close match and a destination artifact that has no symbols, we’ll wing it with just the exact match correlation algorithms with the default settings for now.</p>

<p>After running the correlation algorithms, over 300 potential matches appeared in the version tracking window:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-2/version-tracking-new-matches-found.png" />
                <figcaption>Figure 5: Newly found matches in the version tracking session</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>We can see the details of a particular match by selecting it, such as the exact function mnemonics match for <code class="language-plaintext highlighter-rouge">_brk</code>:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-2/version-tracking-_brk_match.png" />
                <figcaption>Figure 6: Exact function mnemonics match for _brk</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>The bottom part of the version tracking tool shows how the matched pieces of the two programs correlate.
From there, we have three options:</p>
<ul>
  <li>Accepting a match, giving it a green flag status ;</li>
  <li>Rejecting a match, giving it a forbidden icon status ;</li>
  <li>Applying a match’s markup, removing it from the list of matches.</li>
</ul>

<p>We’ll accept this match by clicking on the green flag (or by right-clicking it and selecting <code class="language-plaintext highlighter-rouge">Accept</code>):</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-2/version-tracking-_brk_match_accepted.png" />
                <figcaption>Figure 7: Exact function mnemonics match for _brk accepted</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>Both matches for <code class="language-plaintext highlighter-rouge">_brk</code> have been accepted and the <code class="language-plaintext highlighter-rouge">FUN_0000fe78</code> function in the destination program has been renamed <code class="language-plaintext highlighter-rouge">_brk</code>.
Furthermore, two new implied data matches for <code class="language-plaintext highlighter-rouge">____brk_addr</code> and <code class="language-plaintext highlighter-rouge">_errno</code> appeared with red highlights, since both references inside the <code class="language-plaintext highlighter-rouge">_brk</code> function are identical between the two programs.
We can retire accepted matches by clicking on the green checkmark icon (or right-clicking them and selecting <code class="language-plaintext highlighter-rouge">Apply Markup</code>), which will make them disappear from the matches list to unclutter it.</p>

<p>Using the version tracking tool effectively requires triaging matches.
This in turn may generate further implied matches, transferring more and more of the markups from the source program to the destination program.</p>

<p>After processing every match, the destination program (<code class="language-plaintext highlighter-rouge">aln</code>) is fully marked up with all the information from the source program (<code class="language-plaintext highlighter-rouge">libc.o</code>):</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-2/aln-symbol-table.png" />
                <figcaption>Figure 8: Symbol table for aln populated from libc.o</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>That’s a lot of information inside <code class="language-plaintext highlighter-rouge">aln</code> we didn’t need to reverse-engineer by hand.</p>

<div class="box box-information">
  <p>The files for this case study can be found here: <a href="/atari-jaguar-sdk/assets/case-study.tar.gz" download="">case-study.tar.gz</a></p>
</div>

<h2 id="conclusion">Conclusion</h2>

<p>We have learned how to use Ghidra’s version tracking tool to transfer markings from one program to another and used it to annotate parts of <code class="language-plaintext highlighter-rouge">aln</code> with a closely matching C static library.
Next time, we’ll start our porting journey by converting this a.out executable into a modern ELF executable.</p>

<div style="display: grid; grid-template-columns: 1fr 1fr; margin-top: 1ch; margin-bottom: 1ch;">
<div style="padding-right: 15px;">


<a class="prev" href="/atari-jaguar-sdk/2023/12/04/part-1.html">&laquo; Porting the Atari Jaguar SDK part 1: loading executables into Ghidra the hard way</a>

</div>
<div style="padding-left: 15px; text-align: right;">


<a href="/atari-jaguar-sdk/2023/12/18/part-3.html">Porting the Atari Jaguar SDK part 3: makin' elves &raquo;</a>

</div>
</div>]]></content><author><name>Jean-Baptiste Boric</name></author><category term="atari-jaguar-sdk" /><summary type="html"><![CDATA[Previously in this series of articles, we loaded by hand into Ghidra aln, an executable artifact in the traditional Unix a.out format, despite Ghidra not offering support for this particular file format. In this article, we will start reverse-engineering aln with the intent to port it to new environments.]]></summary></entry><entry><title type="html">Porting the Atari Jaguar SDK part 1: loading executables into Ghidra the hard way</title><link href="https://boricj.net/atari-jaguar-sdk/2023/12/04/part-1.html" rel="alternate" type="text/html" title="Porting the Atari Jaguar SDK part 1: loading executables into Ghidra the hard way" /><published>2023-12-04T13:00:00+01:00</published><updated>2023-12-04T13:00:00+01:00</updated><id>https://boricj.net/atari-jaguar-sdk/2023/12/04/part-1</id><content type="html" xml:base="https://boricj.net/atari-jaguar-sdk/2023/12/04/part-1.html"><![CDATA[<p>After <a href="/atari-jaguar-sdk/2023/11/27/introduction.html">laying out the context for this series of articles</a>, let’s begin our journey on how to make software ports of programs with no source code.
Our case study will be <code class="language-plaintext highlighter-rouge">aln</code>, the Atari linker for the Atari Jaguar ; specifically, its original Linux port in the a.out executable file format for the 32-bit x86 architecture.</p>

<h2 id="running-aln">Running aln</h2>

<p>Before trying to reverse-engineer the artifact head-on, let’s run it first as-is on a modern x86_64 Linux system.
After <a href="https://github.com/cubanismo/jaguar-sdk/raw/22abac7e0b0e83871e413c328d43ec4fc577036c/jaguar/bin/linux/aln-a.out">downloading</a> the <code class="language-plaintext highlighter-rouge">aln</code> artifact, we’ll use Kees Cook’s <a href="https://github.com/kees/kernel-tools/tree/trunk/a.out">a.out loader</a> to execute it.
Let’s download and build it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ wget https://raw.githubusercontent.com/kees/kernel-tools/trunk/a.out/aout.c
$ i686-linux-gnu-gcc aout.c -o aout -static
</code></pre></div></div>

<p>Then, we can use the loader to run <code class="language-plaintext highlighter-rouge">aln</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./aout ./aln -v
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
No object files to link.
Link aborted.
</code></pre></div></div>

<div class="box box-information">
  <p>a.out QMAGIC executables are based very low in the virtual address space (4096), which is likely below the minimum <code class="language-plaintext highlighter-rouge">mmap()</code> address configured on a modern Linux system.
Either lower that minimum address with <code class="language-plaintext highlighter-rouge">sysctl -w vm.mmap_min_addr=4096</code> or use QEMU’s user-mode emulation to run <code class="language-plaintext highlighter-rouge">aout</code>.</p>
</div>
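
<p>To see whether a given system needs this workaround, the configured minimum can be compared against the base address. This is only a minimal sketch, assuming a Linux system that exposes the setting at <code class="language-plaintext highlighter-rouge">/proc/sys/vm/mmap_min_addr</code> ; the helper names <code class="language-plaintext highlighter-rouge">can_map_qmagic</code> and <code class="language-plaintext highlighter-rouge">read_mmap_min_addr</code> are made up for illustration, and the check ignores privileged processes, which may be exempt from the limit:</p>

```python
from pathlib import Path

# Base address of a.out QMAGIC executables, as stated in the note above.
QMAGIC_BASE = 4096


def can_map_qmagic(mmap_min_addr, base=QMAGIC_BASE):
    """Return True if a mapping at `base` is at or above the configured minimum."""
    return base >= mmap_min_addr


def read_mmap_min_addr(path="/proc/sys/vm/mmap_min_addr"):
    """Read the current minimum mmap() address (assumes Linux procfs)."""
    return int(Path(path).read_text().strip())
```

<p>Calling <code class="language-plaintext highlighter-rouge">can_map_qmagic(read_mmap_min_addr())</code> on the machine that will run <code class="language-plaintext highlighter-rouge">aout</code> tells whether the minimum has to be lowered first.</p>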

<p>There is a packaging of a complete <a href="https://github.com/cubanismo/jaguar-sdk">Jaguar SDK by cubanismo</a>, ready to use on modern systems.
It offers the choice between running the original tools and modern replacements, if any.
After setting it up, we can build the included sample using the original <code class="language-plaintext highlighter-rouge">aln</code> executable like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make LINK=aln V=1
rmac -fb -g2 -rd -v +o0 +o1 +o2 startup.s
startup.s 81: Warning: RISC code generated with no origin defined
[Writing object file: startup.o]
TEXT segment: 624 bytes
DATA segment: 0 bytes
BSS  segment: 64080 bytes
Total       : 64704 bytes
TextRel size: 248 bytes
DataRel size: 0 bytes
m68k-aout-gcc -DJAGUAR -I/home/boricj/Documents/jaguar-sdk/jaguar/include -O2 -c -o jag.o jag.c
aln -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&amp;Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA54   1C00

Link complete.
$ file jaghello.cof
jaghello.cof: mc68k COFF object not stripped
$ sha256sum jaghello.cof
f9c8269cdc998de01c0ac7a3e815c16b7ced106e25f10f92a7078c722a220dbb  jaghello.cof
</code></pre></div></div>

<p>We’ll consider our future ports of <code class="language-plaintext highlighter-rouge">aln</code> successful if they can replicate <code class="language-plaintext highlighter-rouge">jaghello.cof</code>.
This doesn’t completely prove that the ports are fit for purpose, as it is hardly an exhaustive stress test of the linker, but it will demonstrate at least some basic level of functionality.</p>
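
<p>One way to automate that comparison is to hash the output of a port and compare it against the checksum printed above. This is a hedged sketch rather than part of the SDK: the function names are hypothetical, and it assumes the port is asked to write its own <code class="language-plaintext highlighter-rouge">jaghello.cof</code> from the same inputs and options.</p>

```python
import hashlib

# SHA-256 of the jaghello.cof produced by the original aln (from the listing above).
REFERENCE_SHA256 = "f9c8269cdc998de01c0ac7a3e815c16b7ced106e25f10f92a7078c722a220dbb"


def sha256_of(data):
    """Return the hexadecimal SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def matches_reference(path, expected=REFERENCE_SHA256):
    """Return True if the file at `path` hashes to the expected digest."""
    with open(path, "rb") as f:
        return sha256_of(f.read()) == expected
```

<p>A port would then pass this check when <code class="language-plaintext highlighter-rouge">matches_reference("jaghello.cof")</code> returns <code class="language-plaintext highlighter-rouge">True</code> for the file it linked.</p>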

<h2 id="loading-aln-into-ghidra">Loading aln into Ghidra</h2>

<p>The Linux port of <code class="language-plaintext highlighter-rouge">aln</code> is a statically-linked QMAGIC a.out executable.
At the time of writing this article, vanilla Ghidra doesn’t have an a.out file loader.
There is a <a href="https://github.com/NationalSecurityAgency/ghidra/pull/5004">pull request</a> that aims to implement one, but we’ll demonstrate how to load <code class="language-plaintext highlighter-rouge">aln</code> into Ghidra by hand.</p>

<p>For now, we’ll keep things simple and assume the a.out file format is composed of just four parts:</p>

<ol>
  <li>A header ;</li>
  <li>A <code class="language-plaintext highlighter-rouge">.text</code> segment ;</li>
  <li>A <code class="language-plaintext highlighter-rouge">.data</code> segment ;</li>
  <li>A <code class="language-plaintext highlighter-rouge">.bss</code> segment.</li>
</ol>

<p>The header has the following structure:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">aout_header</span>
<span class="p">{</span>
	<span class="kt">uint32_t</span> <span class="n">a_info</span><span class="p">;</span>    <span class="cm">/* machine type, magic, etc */</span>
	<span class="kt">uint32_t</span> <span class="n">a_text</span><span class="p">;</span>    <span class="cm">/* text size */</span>
	<span class="kt">uint32_t</span> <span class="n">a_data</span><span class="p">;</span>    <span class="cm">/* data size */</span>
	<span class="kt">uint32_t</span> <span class="n">a_bss</span><span class="p">;</span>     <span class="cm">/* desired bss size */</span>
	<span class="kt">uint32_t</span> <span class="n">a_syms</span><span class="p">;</span>    <span class="cm">/* symbol table size */</span>
	<span class="kt">uint32_t</span> <span class="n">a_entry</span><span class="p">;</span>   <span class="cm">/* entry address */</span>
	<span class="kt">uint32_t</span> <span class="n">a_trsize</span><span class="p">;</span>  <span class="cm">/* text relocation size */</span>
	<span class="kt">uint32_t</span> <span class="n">a_drsize</span><span class="p">;</span>  <span class="cm">/* data relocation size */</span>
<span class="p">};</span>
</code></pre></div></div>

<p>When running an a.out QMAGIC executable, the Linux kernel loads it as follows:</p>

<ul>
  <li>The header and <code class="language-plaintext highlighter-rouge">.text</code> segment are mapped read-execute at address 4096 ;</li>
  <li>The <code class="language-plaintext highlighter-rouge">.data</code> segment is mapped read-write-execute immediately after the <code class="language-plaintext highlighter-rouge">.text</code> segment ;</li>
  <li>The <code class="language-plaintext highlighter-rouge">.bss</code> segment is mapped read-write immediately after the <code class="language-plaintext highlighter-rouge">.data</code> segment.</li>
</ul>

<p>First, let’s analyze the header:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ hexdump -C aln | head -n 2
00000000  cc 00 64 00 00 c0 01 00  00 10 00 00 94 09 00 00  |..d.............|
00000010  00 00 00 00 20 10 00 00  00 00 00 00 00 00 00 00  |.... ...........|
</code></pre></div></div>

<p>Using the data structure above (and remembering that its fields are stored in little-endian byte order), we can read the following interesting values:</p>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Offset</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">a_text</code></td>
      <td><code class="language-plaintext highlighter-rouge">4</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x0001c000</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">a_data</code></td>
      <td><code class="language-plaintext highlighter-rouge">8</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x00001000</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">a_bss</code></td>
      <td><code class="language-plaintext highlighter-rouge">12</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x00000994</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">a_entry</code></td>
      <td><code class="language-plaintext highlighter-rouge">20</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x00001020</code></td>
    </tr>
  </tbody>
</table>
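
<p>The same values can be decoded programmatically. The sketch below hard-codes the 32 bytes from the hexdump above and splits them into the eight 32-bit fields of the header structure ; <code class="language-plaintext highlighter-rouge">parse_aout_header</code> is a hypothetical helper name, and the interpretation of the low 16 bits of <code class="language-plaintext highlighter-rouge">a_info</code> as the magic number follows the conventions of Linux’s <code class="language-plaintext highlighter-rouge">a.out.h</code>.</p>

```python
# First 32 bytes of aln, copied from the hexdump above.
HEADER = bytes.fromhex(
    "cc 00 64 00 00 c0 01 00 00 10 00 00 94 09 00 00"
    "00 00 00 00 20 10 00 00 00 00 00 00 00 00 00 00"
)

# Field names in the order they appear in struct aout_header.
FIELDS = ("a_info", "a_text", "a_data", "a_bss",
          "a_syms", "a_entry", "a_trsize", "a_drsize")


def parse_aout_header(raw):
    """Decode eight consecutive little-endian 32-bit fields into a dict."""
    if len(raw) != 4 * len(FIELDS):
        raise ValueError("expected a 32-byte a.out header")
    return {
        name: int.from_bytes(raw[4 * i:4 * i + 4], "little")
        for i, name in enumerate(FIELDS)
    }


header = parse_aout_header(HEADER)
for name in ("a_text", "a_data", "a_bss", "a_entry"):
    print(f"{name:8} 0x{header[name]:08x}")

# Low 16 bits of a_info hold the magic number; octal 0314 is QMAGIC.
print("QMAGIC" if header["a_info"] & 0xFFFF == 0o314 else "other magic")
```

<p>Reading the fields by name rather than by eye makes it harder to mix up byte order or offsets when repeating this on another a.out file.</p>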

<p>Now that we have all the information needed from the header, we’ll import this executable inside Ghidra as a raw file and use the following settings:</p>

<ul>
  <li>Language: <code class="language-plaintext highlighter-rouge">x86:LE:32:default:gcc</code> ;</li>
  <li>Block name: <code class="language-plaintext highlighter-rouge">.text</code> ;</li>
  <li>Base address: <code class="language-plaintext highlighter-rouge">0x00001000</code> ;</li>
  <li>Length: <code class="language-plaintext highlighter-rouge">0x0001d000</code>.</li>
</ul>
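
<p>The base address and length used here follow from the header values and the loading rules described earlier. As a rough worked example (assuming, as the import length suggests, that <code class="language-plaintext highlighter-rouge">a_text</code> already covers the 32-byte header), the segments can be laid out one after the other ; <code class="language-plaintext highlighter-rouge">memory_map</code> is a made-up helper for illustration:</p>

```python
BASE = 0x1000      # QMAGIC load address (4096)
A_TEXT = 0x1C000   # a_text, assumed to include the 32-byte header
A_DATA = 0x1000    # a_data
A_BSS = 0x994      # a_bss


def memory_map(base, a_text, a_data, a_bss):
    """Return (name, start, length) tuples for back-to-back segments."""
    blocks = []
    start = base
    for name, length in ((".text", a_text), (".data", a_data), (".bss", a_bss)):
        blocks.append((name, start, length))
        start += length
    return blocks


for name, start, length in memory_map(BASE, A_TEXT, A_DATA, A_BSS):
    print(f"{name:6} start=0x{start:08x} length=0x{length:08x}")

# Only .text and .data are stored in the file, so the raw import covers both.
print(f"import length = 0x{A_TEXT + A_DATA:08x}")
```

<p>This places <code class="language-plaintext highlighter-rouge">.data</code> at <code class="language-plaintext highlighter-rouge">0x0001d000</code> and <code class="language-plaintext highlighter-rouge">.bss</code> at <code class="language-plaintext highlighter-rouge">0x0001e000</code>, with an import length of <code class="language-plaintext highlighter-rouge">0x0001d000</code> for the two file-backed segments.</p>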

<p>Open the file and skip auto-analysis for now.
Click on <code class="language-plaintext highlighter-rouge">Window &gt; Memory Map</code> to display the memory map:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-1/memory-map-initial.png" />
                <figcaption>Figure 1: Initial memory map for aln</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>Ghidra has loaded the whole of <code class="language-plaintext highlighter-rouge">aln</code> as one <code class="language-plaintext highlighter-rouge">.text</code> section (as instructed), but this isn’t how this a.out file is actually structured.
We need to fix up the sections by hand so that Ghidra’s memory map matches the layout of the a.out file.</p>

<p>First, we’ll split off <code class="language-plaintext highlighter-rouge">.data</code> from <code class="language-plaintext highlighter-rouge">.text</code>.
Select the <code class="language-plaintext highlighter-rouge">.text</code> section and click on the orange “🟰” button on the top-right corner with the tooltip <code class="language-plaintext highlighter-rouge">Split a block</code>.
Fill in the dialog box as follows:</p>

<ul>
  <li>Block name: <code class="language-plaintext highlighter-rouge">.data</code> ;</li>
  <li>Block length: <code class="language-plaintext highlighter-rouge">0x00001000</code>.</li>
</ul>

<p>The rest of the fields will automatically adjust.
Click on <code class="language-plaintext highlighter-rouge">OK</code> to split the memory block.</p>

<p>Every initialized section is now correctly set up, but we still have <code class="language-plaintext highlighter-rouge">.bss</code> missing, which is an uninitialized section (meaning these bytes aren’t actually stored inside the a.out file).
Click on the green “➕” button on the top-right corner with the tooltip <code class="language-plaintext highlighter-rouge">Add a new block to memory</code>.
Fill in the dialog box as follows:</p>

<ul>
  <li>Block name: <code class="language-plaintext highlighter-rouge">.bss</code> ;</li>
  <li>Start address: <code class="language-plaintext highlighter-rouge">ram:0x0001e000</code> ;</li>
  <li>Block length: <code class="language-plaintext highlighter-rouge">0x00000994</code> ;</li>
  <li>Permissions: read, write ;</li>
  <li>Uninitialized.</li>
</ul>

<p>Click on <code class="language-plaintext highlighter-rouge">OK</code> to add the memory block.
The memory map should now look like this:</p>

<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/atari-jaguar-sdk/assets/part-1/memory-map-fixed.png" />
                <figcaption>Figure 2: Adjusted memory map for aln</figcaption>
            </figure>
        </div>
    </label>
</div>

<p>The artifact is now properly loaded, but there is one last piece of information from the header that we can apply: the entrypoint.
Select the address <code class="language-plaintext highlighter-rouge">0x00001020</code> in the listing view and hit <code class="language-plaintext highlighter-rouge">F</code> (or right-click and select <code class="language-plaintext highlighter-rouge">Create Function</code>).
Then, hit the <code class="language-plaintext highlighter-rouge">L</code> key, rename the function to <code class="language-plaintext highlighter-rouge">_start</code> and mark it as an entry point.</p>

<div class="box box-information">
  <p>The files for this case study can be found here: <a href="/atari-jaguar-sdk/assets/case-study.tar.gz" download="">case-study.tar.gz</a></p>
</div>

<h2 id="conclusion">Conclusion</h2>

<p>We have loaded the <code class="language-plaintext highlighter-rouge">aln</code> executable artifact into Ghidra by hand.
Next time, we’ll start reverse-engineering this artifact in order to start porting <code class="language-plaintext highlighter-rouge">aln</code> to new environments.</p>

<div style="display: grid; grid-template-columns: 1fr 1fr; margin-top: 1ch; margin-bottom: 1ch;">
<div style="padding-right: 15px;">


<a class="prev" href="/atari-jaguar-sdk/2023/11/27/introduction.html">&laquo; Porting the Atari Jaguar SDK: introduction</a>

</div>
<div style="padding-left: 15px; text-align: right;">


<a href="/atari-jaguar-sdk/2023/12/11/part-2.html">Porting the Atari Jaguar SDK part 2: abusing Ghidra's version tracking tool for fun and profit &raquo;</a>

</div>
</div>]]></content><author><name>Jean-Baptiste Boric</name></author><category term="atari-jaguar-sdk" /><summary type="html"><![CDATA[After laying out the context for this series of articles, let’s begin our journey on how to make software ports of programs with no source code. Our case study will be aln, the Atari linker for the Atari Jaguar ; specifically, its original Linux port in the a.out executable file format for the 32-bit x86 architecture.]]></summary></entry><entry><title type="html">Porting the Atari Jaguar SDK: introduction</title><link href="https://boricj.net/atari-jaguar-sdk/2023/11/27/introduction.html" rel="alternate" type="text/html" title="Porting the Atari Jaguar SDK: introduction" /><published>2023-11-27T13:00:00+01:00</published><updated>2023-11-27T13:00:00+01:00</updated><id>https://boricj.net/atari-jaguar-sdk/2023/11/27/introduction</id><content type="html" xml:base="https://boricj.net/atari-jaguar-sdk/2023/11/27/introduction.html"><![CDATA[<p>The Atari Jaguar is a video game console marketed from 1993 to 1996, with an active homebrew community.
The original SDK provided by Atari was last updated in the mid-1990s and the source code for these programs was never released.</p>

<p>Over time, the homebrew community has developed <a href="http://rmac.is-slick.com/">modern reimplementations</a> for some of these tools, but there are still extant use cases for <a href="https://lore.kernel.org/lkml/4c449fab-8135-5057-7d2c-7b948ce130cc@theinnocuous.com/">running the original ones</a>.
Keeping these ancient tools running as-is on modern systems is proving to be increasingly difficult as time goes on and requires an ever-increasing number of workarounds to pull off:</p>
<ul>
  <li>The default <code class="language-plaintext highlighter-rouge">mmap()</code> minimum address has been bumped to 65536, which is too high for a.out executables ; fixing this requires lowering <code class="language-plaintext highlighter-rouge">vm.mmap_min_addr</code>, a privileged operation.</li>
  <li>The Linux kernel dropped support for the traditional a.out executable format in <a href="https://lore.kernel.org/lkml/20220113160115.5375-1-bp@alien8.de/">2022</a>, requiring <a href="https://github.com/cubanismo/jaguar-sdk/commit/22abac7e0b0e83871e413c328d43ec4fc577036c">a user-land utility</a> to act as an a.out executable loader/shim.</li>
</ul>

<p>In this series of articles, we’ll explore an application of the delinking reverse-engineering technique <a href="/reverse-engineering/2023/05/01/introduction.html">previously explained here</a>: making software ports of programs without access to their original source code.</p>

<div class="box box-information">
  <p>The files for this case study can be found here: <a href="/atari-jaguar-sdk/assets/case-study.tar.gz" download="">case-study.tar.gz</a></p>
</div>

<ul>
  
    <li>
      <a href="/atari-jaguar-sdk/2023/11/27/introduction.html">Porting the Atari Jaguar SDK: introduction</a>
    </li>
  
    <li>
      <a href="/atari-jaguar-sdk/2023/12/04/part-1.html">Porting the Atari Jaguar SDK part 1: loading executables into Ghidra the hard way</a>
    </li>
  
    <li>
      <a href="/atari-jaguar-sdk/2023/12/11/part-2.html">Porting the Atari Jaguar SDK part 2: abusing Ghidra's version tracking tool for fun and profit</a>
    </li>
  
    <li>
      <a href="/atari-jaguar-sdk/2023/12/18/part-3.html">Porting the Atari Jaguar SDK part 3: makin' elves</a>
    </li>
  
    <li>
      <a href="/atari-jaguar-sdk/2024/01/01/part-4.html">Porting the Atari Jaguar SDK part 4: where we're going, we don't need the C standard library</a>
    </li>
  
    <li>
      <a href="/atari-jaguar-sdk/2024/01/02/part-5.html">Porting the Atari Jaguar SDK part 5: I have a feeling we're not on Linux anymore</a>
    </li>
  
</ul>

<div style="display: grid; grid-template-columns: 1fr 1fr; margin-top: 1ch; margin-bottom: 1ch;">
<div style="padding-right: 15px;">

</div>
<div style="padding-left: 15px; text-align: right;">


<a href="/atari-jaguar-sdk/2023/12/04/part-1.html">Porting the Atari Jaguar SDK part 1: loading executables into Ghidra the hard way &raquo;</a>

</div>
</div>]]></content><author><name>Jean-Baptiste Boric</name></author><category term="atari-jaguar-sdk" /><summary type="html"><![CDATA[The Atari Jaguar is a video game console marketed from 1993 to 1996, with an active homebrew community. The original SDK provided by Atari was last updated in the mid-1990s and the source code for these programs was never released.]]></summary></entry><entry><title type="html">The taxonomy of delinking, or how I misnamed the unlinking technique</title><link href="https://boricj.net/2023/10/02/taxonomy-delinking.html" rel="alternate" type="text/html" title="The taxonomy of delinking, or how I misnamed the unlinking technique" /><published>2023-10-02T14:00:00+02:00</published><updated>2023-10-02T14:00:00+02:00</updated><id>https://boricj.net/2023/10/02/taxonomy-delinking</id><content type="html" xml:base="https://boricj.net/2023/10/02/taxonomy-delinking.html"><![CDATA[<p>It occurred to me that I’ve incorrectly named the main technique shown in my <a href="/reverse-engineering/2023/05/01/introduction.html">series of articles about reverse-engineering</a>.
To explain why, let’s take this diagram from <a href="/reverse-engineering/2023/05/15/part-2.html">part 2</a>:</p>
<div class="fullwindow-zoom-click">
    <label>
        <input type="checkbox" onclick="onclick_fullWindowZoom(event)" />
        <div class="fullwindow-zoom-click-content">
            <figure>
                <img src="/reverse-engineering/assets/diagram-case-study-build-workflow.svg" />
                <figcaption>Figure 1: Diagram of the case study build workflow.</figcaption>
            </figure>
        </div>
    </label>
</div>

<ul>
  <li>First, the source code gets <strong>compiled</strong> into an assembly file ;</li>
  <li>Then, the assembly file gets <strong>assembled</strong> into an object file ;</li>
  <li>Finally, the object files get <strong>linked</strong> into an executable file.</li>
</ul>

<p>Since one <strong>decompiles</strong> an artifact to get source code and one <strong>disassembles</strong> an artifact to get assembly code, it would be logical that one <strong>delinks</strong> an artifact to get object files ; yet I had been calling that operation <em>unlinking</em> until recently.</p>

<p><em>D’oh!</em></p>

<p>I’ve corrected all instances of this verb in this blog and renamed <a href="https://github.com/boricj/ghidra-delinker-extension">my Ghidra extension</a> accordingly.
I’m sorry in advance if that made a mess of the RSS feed ; hopefully I didn’t make a mess of the terminology too, since there are very few resources online about it.</p>]]></content><author><name>Jean-Baptiste Boric</name></author><summary type="html"><![CDATA[It occurred to me that I’ve incorrectly named the main technique shown in my series of articles about reverse-engineering. To explain why, let’s take this diagram from part 2: Figure 1: Diagram of the case study build workflow.]]></summary></entry></feed>