Previously in this series of articles, we turned aln, a statically-linked Linux a.out program, into a dynamically-linked Linux ELF program with the power of delinking. In this part, we’ll make a native port of aln to Windows, despite not having access to its source code.

This blog post demonstrates a complete disregard towards ABIs, computer engineering conventions and common sense in the name of science. Its contents may be dangerous to junior programmers, academics or the faint of heart.

As such, this is your one and only warning to look away before I start butcherin’.

Windows? What’s that?

Normally, porting software requires access to its source code in order to build it for a new platform. Here, we do not have such luxuries and must make do with just a binary executable artifact for Linux. In the previous parts, we created aln.o, a “normal” Linux ELF object file containing a complete program, delinked from the carcass of aln, an actual Linux program.

Here comes our first problem: how do we get from an ELF object file for Linux to a Windows executable?

For want of a COFF file

Traditionally, Windows toolchains use the COFF file format for objects. My Ghidra extension currently only has an object file exporter for the ELF file format. In theory, I would have to implement a full-blown COFF object file exporter in order to produce an object file that Windows toolchains can grok, but that’s a lot of work.

Fortunately for me, I’ve forsaken tradition in this series of articles and there is one toolchain targeting Windows out there that can ingest ELF object files: MinGW-w64.

$ sudo apt-get install mingw-w64

Don’t worry, the final artifact will be a bona fide PE executable for Windows, but I’ll take whatever shortcuts I can in order to finish this in a reasonable amount of time.

This is also why I’ll run aln.exe with WINE on Linux at first: I’m not going to bother dealing with Windows until I have to.

Now, we’ll take aln.o and the couple of stubs from the last part and just let’er rip:

$ i686-w64-mingw32-gcc -g -no-pie -fno-pic -o aln.exe aln.o ctype.o FUN_0000fba0.c DAT_0001de20.c
/usr/bin/i686-w64-mingw32-ld: aln.o: in function `FUN_00001584':
aln.o:(.text+0x51f): undefined reference to `IO_printf'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x529): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x533): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x53d): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x547): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:(.text+0x551): undefined reference to `puts'
/usr/bin/i686-w64-mingw32-ld: aln.o:aln.o:(.text+0x55b): more undefined references to `puts' follow
...
/usr/bin/i686-w64-mingw32-ld: aln.o: in function `FUN_0000fa3c':
aln.o:(.text+0xe9b3): undefined reference to `malloc'
/usr/bin/i686-w64-mingw32-ld: /usr/lib/gcc/i686-w64-mingw32/10-win32/../../../../i686-w64-mingw32/lib/../lib/libmingw32.a(lib32_libmingw32_a-crt0_c.o): in function `main':
./build/i686-w64-mingw32-i686-w64-mingw32-crt/./mingw-w64-crt/crt/crt0_c.c:18: undefined reference to `WinMain@16'

Well, basically none of the undefined symbols inside aln.o matched anything inside MinGW-w64, and no, it’s not just because I’m using a Linux object file with a Windows toolchain.

When I originally exported aln.o, I actually stripped the leading underscore from all symbols. That’s useful when going from glibc 1.xx to glibc 2.xx since modern Linux toolchains don’t use an underscore prefix for C symbol names. Unfortunately, that’s not the case for MinGW-w64, which does prefix C symbols with an underscore.

Rather than re-exporting aln.o with the original symbol names left intact, I’ll just use objcopy to rename the undefined symbols, putting the leading underscore back. Since I’ll have to make adjustments anyways, I might as well be lazy about it.
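GNU objcopy’s --redefine-syms option reads a flat file with one “old new” symbol pair per line. As a minimal sketch (not the tooling actually used here), such a file could be generated by prefixing every name with an underscore, except for a hypothetical override table holding the renames that need more than that. The names mingw_symbol_name(), write_redefine_syms() and the overrides table are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* A rename that needs more than an underscore prefix. */
struct symbol_rename {
    const char *old_name;
    const char *new_name;
};

/* Hypothetical override table; the entries mirror renames from this article. */
static const struct symbol_rename overrides[] = {
    { "index",  "_strchr"  },
    { "rindex", "_strrchr" },
};

/* Computes the i686 MinGW-w64 name for a glibc-era symbol into buf.
   Returns what snprintf() returns (the untruncated length). */
int mingw_symbol_name(const char *old_name, char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof(overrides) / sizeof(overrides[0]); i++) {
        if (strcmp(old_name, overrides[i].old_name) == 0) {
            return snprintf(buf, len, "%s", overrides[i].new_name);
        }
    }
    /* Default case: 32-bit MinGW-w64 prefixes C symbols with an underscore. */
    return snprintf(buf, len, "_%s", old_name);
}

/* Writes one "old new" pair per line, the flat format that
   objcopy --redefine-syms expects. */
void write_redefine_syms(FILE *out, const char *const names[], size_t count)
{
    char buf[256];
    for (size_t i = 0; i < count; i++) {
        mingw_symbol_name(names[i], buf, sizeof(buf));
        fprintf(out, "%s %s\n", names[i], buf);
    }
}
```

The input list could come from nm --undefined-only on aln.o, and the resulting file is then handed to objcopy --redefine-syms.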

In theory, I’m trying to mix two different things that aren’t ABI compatible. In practice, the calling convention happens to be the same and aln uses a subset of POSIX that MinGW-w64 does provide, so I’ll start by aliasing equivalent symbols together and hope for the best.

Remember, the goal of this series is to trick aln into linking a sample program successfully on new environments. Anything goes as long as it works, no matter how wrong it is.

We’ll skip anything that’s just an underscore prefix away and go through the remaining troublemakers.

FUN_0000fba0() and DAT_0001de20

The linker doesn’t seem to find the stubs from the previous parts, despite the fact that I do provide these source files on the command line.

Most likely this is the same underscore business as before: the stubs are compiled by MinGW-w64, which emits their symbols as _FUN_0000fba0 and _DAT_0001de20, while aln.o references them without the prefix. Either way, I just want to smash the bits together, so let’s cheese it by prefixing these symbols with an underscore:

Old symbol name    New name (MinGW-w64)
DAT_0001de20       _DAT_0001de20
FUN_0000fba0       _FUN_0000fba0

index() & rindex()

MinGW-w64 doesn’t appear to provide these two functions:

char *index(const char *s, int c);
char *rindex(const char *s, int c);

These functions search for a character in an ASCIIZ string and come from 4.3BSD. They were marked legacy in POSIX.1-2001 and removed in POSIX.1-2008, with a recommendation to migrate to strchr() and strrchr() respectively:

char *strchr(const char *s, int c);
char *strrchr(const char *s, int c);

Luckily for us, these are drop-in replacements and MinGW-w64 does provide them. Let’s rename the symbols accordingly:

Old symbol name    New name (MinGW-w64)
index              _strchr
rindex             _strrchr
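Renaming is enough here because the semantics are identical, including the corner case where c is '\0' and the terminating NUL itself is matched. A hypothetical shim would therefore be a one-line forward; the sketch below (index_shim() and rindex_shim() are illustrative names, not part of the case study) just makes that equivalence explicit.

```c
#include <string.h>

/* Hypothetical shims: index() and rindex() are the legacy BSD names for
   strchr() and strrchr(), with the same signatures and semantics. */
char *index_shim(const char *s, int c)
{
    return strchr(s, c);   /* first occurrence of c, or NULL */
}

char *rindex_shim(const char *s, int c)
{
    return strrchr(s, c);  /* last occurrence of c, or NULL */
}
```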

stdout, fprintf() & Co.

How do the C standard I/O functions work under MinGW-w64? I could go spelunking in the source code to find out, but I can just ask the toolchain directly by compiling a sample program and analyzing the artifact:

#include <stdio.h>

int main(int argc, char **argv) {
    fprintf(stdout, "Hello, world!\n");
    return 0;
}

Let’s build it with MinGW-w64 and disassemble the object file with objdump:

$ i686-w64-mingw32-gcc hello-world.c -c -o hello-world.o
$ i686-w64-mingw32-objdump --reloc --disassemble hello-world.o

hello-world.o:     file format pe-i386


Disassembly of section .text:

00000000 <_fprintf>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 28                sub    $0x28,%esp
   6:   8d 45 10                lea    0x10(%ebp),%eax
   9:   89 45 f0                mov    %eax,-0x10(%ebp)
   c:   8b 45 f0                mov    -0x10(%ebp),%eax
   f:   89 44 24 08             mov    %eax,0x8(%esp)
  13:   8b 45 0c                mov    0xc(%ebp),%eax
  16:   89 44 24 04             mov    %eax,0x4(%esp)
  1a:   8b 45 08                mov    0x8(%ebp),%eax
  1d:   89 04 24                mov    %eax,(%esp)
  20:   e8 00 00 00 00          call   25 <_fprintf+0x25>
                        21: DISP32      ___mingw_vfprintf
  25:   89 45 f4                mov    %eax,-0xc(%ebp)
  28:   8b 45 f4                mov    -0xc(%ebp),%eax
  2b:   c9                      leave  
  2c:   c3                      ret    

0000002d <_main>:
  2d:   55                      push   %ebp
  2e:   89 e5                   mov    %esp,%ebp
  30:   83 e4 f0                and    $0xfffffff0,%esp
  33:   83 ec 10                sub    $0x10,%esp
  36:   e8 00 00 00 00          call   3b <_main+0xe>
                        37: DISP32      ___main
  3b:   c7 04 24 01 00 00 00    movl   $0x1,(%esp)
  42:   a1 00 00 00 00          mov    0x0,%eax
                        43: dir32       __imp____acrt_iob_func
  47:   ff d0                   call   *%eax
  49:   c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)
  50:   00 
                        4d: dir32       .rdata
  51:   89 04 24                mov    %eax,(%esp)
  54:   e8 a7 ff ff ff          call   0 <_fprintf>
  59:   b8 00 00 00 00          mov    $0x0,%eax
  5e:   c9                      leave  
  5f:   c3                      ret

In MinGW-w64 land, stdout appears to be a macro that expands to a call to __acrt_iob_func(1), made indirectly through the __imp____acrt_iob_func import pointer. The standard output stream is retrieved through a function call, unlike with glibc where it is a direct data reference.
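To see why this defeats symbol renaming, here is a hypothetical imitation of the scheme in portable C. iob_func_sketch() and stdout_sketch are made-up stand-ins for __acrt_iob_func() and the stdout macro, not the actual MinGW-w64 headers.

```c
#include <stdio.h>

/* Hypothetical stand-in for __acrt_iob_func(): the standard streams are
   handed out by an accessor function indexed by 0, 1 and 2. */
FILE *iob_func_sketch(unsigned index)
{
    switch (index) {
    case 0:  return stdin;
    case 1:  return stdout;
    case 2:  return stderr;
    default: return NULL;
    }
}

/* Code compiled against such a header never references a "stdout" data
   symbol: every use becomes a function call, so there is no data object
   that a glibc-era reference to stdout could simply be aliased to. */
#define stdout_sketch (iob_func_sketch(1))
```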

No mere symbol renaming can get us out of this mess, so it’s time to shim:

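#include <fcntl.h>
#include <stdio.h>
#include <stdarg.h>
#include <sys/stat.h>
#include <unistd.h>

/* aln.o's references to glibc's stdin/stdout/stderr objects are renamed
   to these placeholders; only their addresses matter. */
void* stdin_placeholder;
void* stdout_placeholder;
void* stderr_placeholder;

/* Translates a placeholder address into the real MinGW-w64 stream.
   Any other FILE* (e.g. one returned by fopen()) is passed through. */
static FILE* normalize_file(FILE* fp)
{
    if ((void*)fp == &stdin_placeholder) {
        fp = stdin;
    } else if ((void*)fp == &stdout_placeholder) {
        fp = stdout;
    } else if ((void*)fp == &stderr_placeholder) {
        fp = stderr;
    }

    return fp;
}

/* Normalizes the stream, then forwards the variadic arguments to vfprintf(). */
int fprintf_shim(FILE* fp, const char* fmt, ...)
{
    va_list argptr;
    va_start(argptr, fmt);

    int ret = vfprintf(normalize_file(fp), fmt, argptr);
    va_end(argptr);

    return ret;
}

int fflush_shim(FILE* fp)
{
    return fflush(normalize_file(fp));
}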

Instead of aliasing a symbol directly to a native MinGW-w64 one, we’ll redirect it to an ad hoc implementation. With functions in particular, this allows us to bridge the gap between two incompatible ABIs, for example by replacing placeholder values with the real ones before forwarding to the underlying implementation:

Old symbol name    New name (MinGW-w64)
stdin              _stdin_placeholder
stdout             _stdout_placeholder
stderr             _stderr_placeholder
_IO_fflush         _fflush_shim
_IO_fprintf        _fprintf_shim
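The same placeholder pattern extends to any other stream function. As a sketch, assuming the imported code also called fputs() or fputc() (hypothetical here, not confirmed for aln), the extra shims would only need to normalize their FILE* argument before forwarding; fputs_shim() and fputc_shim() are illustrative names.

```c
#include <stdio.h>

/* Placeholder objects standing in for glibc's stdin/stdout/stderr. */
void *stdin_placeholder;
void *stdout_placeholder;
void *stderr_placeholder;

/* Maps a placeholder address to the real stream; other FILE* pass through. */
static FILE *normalize_file(FILE *fp)
{
    if ((void *)fp == &stdin_placeholder)  return stdin;
    if ((void *)fp == &stdout_placeholder) return stdout;
    if ((void *)fp == &stderr_placeholder) return stderr;
    return fp;
}

/* Hypothetical extra shims following the same pattern. */
int fputs_shim(const char *s, FILE *fp)
{
    return fputs(s, normalize_file(fp));
}

int fputc_shim(int c, FILE *fp)
{
    return fputc(c, normalize_file(fp));
}
```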

open()

At this point, the MinGW-w64 toolchain manages to link a Windows program:

$ file aln.o 
aln.o: ELF 32-bit LSB relocatable, Intel 80386, invalid version (SYSV), not stripped
$ i686-linux-gnu-objcopy --redefine-syms=glibc-1xx-to-mingw32.syms aln.o aln.mingw32.o
$ i686-w64-mingw32-gcc -g -no-pie -fno-pic -o aln.exe aln.mingw32.o ctype.o FUN_0000fba0.c DAT_0001de20.c mingw32-shims.c
$ file aln.exe 
aln.exe: PE32 executable (console) Intel 80386, for MS Windows

However, aln.exe can’t actually link the sample program from the Atari Jaguar community SDK:

$ rm -f jaghello.cof && make LINK="wine aln.exe" V=1
wine aln.exe -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof
File read error on startup.o
Link aborted.
make: *** [Makefile:12: jaghello.cof] Error 1

After digging a bit, I found a triple case of subtle incompatibilities:

  • Windows C runtimes provide an O_BINARY flag for open() (a DOS/Windows extension, not part of POSIX) to request binary mode, and aln doesn’t set it. On Linux there is no difference between text mode and binary mode, but on Windows text mode translates CR-LF sequences into LF and treats Ctrl-Z (0x1A) as end-of-file when reading, so binary files aren’t read correctly.
  • Even if we account for this, the values of the flags for the second argument of open() differ between Linux and MinGW-w64, leading to further problems down the road. POSIX specifies the meaning of these constants, but not their numeric values.
  • Similarly, the values for the third argument (permissions) are different.

Therefore, even though the function signatures of open() happen to be compatible between the two platforms, we still need to shim it to deal with these mismatches:

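/* oflag uses the Linux i386 values (octal) baked into aln.o. */
int open_shim(const char *filename, int oflag, int pmode)
{
    /* O_RDONLY/O_WRONLY/O_RDWR (0/1/2) match on both sides; force binary mode. */
    int realoflag = (oflag & 00000003) | O_BINARY;
    if (oflag & 00000100) {     /* Linux O_CREAT */
        realoflag |= O_CREAT;
    }
    if (oflag & 00001000) {     /* Linux O_TRUNC */
        realoflag |= O_TRUNC;
    }
    if (oflag & 00002000) {     /* Linux O_APPEND */
        realoflag |= O_APPEND;
    }
    /* pmode holds Linux permission bits; ignore it and request a read/write file. */
    (void)pmode;
    return open(filename, realoflag, S_IREAD | S_IWRITE);
}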

Old symbol name    New name (MinGW-w64)
open               _open_shim

Besides converting the bitflags, we’ll also fix the first issue by forcing O_BINARY in our shim. Technically this is a bug in aln and this should be fixed at the source (or rather binary patched at the artifact in this case), but I’m not going to bother touching aln.o unless I have to.
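If more flags ever needed converting, a table-driven translation keeps things manageable. The sketch below is an alternative layout, not the shim used in this case study; translate_open_flags() and the LINUX_O_* names are hypothetical, and the values are the octal i386 Linux ones (O_EXCL at 0200 is added for illustration). O_BINARY is defined to 0 where it doesn’t exist so the sketch also builds on Linux.

```c
#include <fcntl.h>
#include <stddef.h>

/* O_BINARY only exists on DOS/Windows C runtimes. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* open() flag values as baked into the Linux i386 binary (octal). */
#define LINUX_O_ACCMODE 00000003
#define LINUX_O_CREAT   00000100
#define LINUX_O_EXCL    00000200
#define LINUX_O_TRUNC   00001000
#define LINUX_O_APPEND  00002000

static const struct { int linux_flag; int host_flag; } open_flag_map[] = {
    { LINUX_O_CREAT,  O_CREAT  },
    { LINUX_O_EXCL,   O_EXCL   },
    { LINUX_O_TRUNC,  O_TRUNC  },
    { LINUX_O_APPEND, O_APPEND },
};

/* Translates Linux open() flags into the host C runtime's flags. The access
   mode bits (0/1/2 on both sides) are copied as-is, O_BINARY is always
   forced on, and any flag missing from the table is dropped. */
int translate_open_flags(int linux_oflag)
{
    int host_oflag = (linux_oflag & LINUX_O_ACCMODE) | O_BINARY;
    for (size_t i = 0; i < sizeof(open_flag_map) / sizeof(open_flag_map[0]); i++) {
        if (linux_oflag & open_flag_map[i].linux_flag) {
            host_oflag |= open_flag_map[i].host_flag;
        }
    }
    return host_oflag;
}
```

Unlike chained if statements, adding a flag here is a one-line change to the table.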

Throwing aln at the Windows

After the last round of mutilations, we finally have our chimera up and running:

$ rm -f jaghello.cof && make LINK="wine aln.exe" V=1
wine aln.exe -e -l -g2 -rd -a 4000 x x -v -o jaghello.cof startup.o jag.o
***********************************
*   ATARI LINKER (Mar 17 1995)    *
*  Adds from Atari version 1.11   *
*     and PC/DOS&Linux ports      *
*  Copyright 1993-95 Brainstorm.  *
** Copyright 1987-95 Atari Corp. **
***********************************
Output file is jaghello.cof

Sizes:   Text   Data    Bss   Syms
(hex)     3C0    410   FA54   1C00

Link complete.
$ sha256sum jaghello.cof
f9c8269cdc998de01c0ac7a3e815c16b7ced106e25f10f92a7078c722a220dbb  jaghello.cof

Who needs source code to make software ports anyways? Just throw the bits at the virtual address space and apply glue until everything sticks.

Yes, it does work on Windows too.

The files for this case study can be found here: case-study.tar.gz

Conclusion

We have ported aln from a Linux a.out executable to a native Windows PE executable. This concludes this series of articles about how to leverage the delinking technique to make software ports without having access to the original source code.

Together with my series of series of articles on reverse-engineering, I have hopefully demonstrated by now that delinking is a powerful and practical technique for reverse-engineering. What was an obscure ritual mastered by few is now within reach of the everyday reverse-engineer, thanks to my Ghidra extension, which reduces crafting object files from bits of programs to a couple of mouse clicks.