Reverse-engineering part 4: running our case study
Previously in this series of articles, we’ve analyzed a number of binary artifacts by using the various tools of the toolchain. Now, it’s high time we actually run the case study.
A reminder on computer security: be mindful of the origin and purpose of any program you plan to execute during its reverse-engineering and exercise caution if necessary. For example, you most likely do not want to launch a computer virus you are trying to study directly on your own computer, or at least without taking the necessary precautions to do so in a safe manner.
Running the case study
Our case study is a Linux executable for the little-endian, 32-bit MIPS architecture. While it can run natively on such a platform, these machines are fairly uncommon, so we’ll assume we do not have access to suitable hardware. Instead, we will leverage QEMU’s user-mode emulation to run this executable on any Linux machine, regardless of the host architecture.
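Before reaching for the emulator, it can help to confirm what the ELF header says about the target. The sketch below is not part of the case study: the function name is_mipsel32_elf() is hypothetical, and the constants are assumed to come from the Linux <elf.h> header. It only inspects the first bytes of an ELF image held in memory.

```c
#include <elf.h>      /* ELF constants and Elf32_Ehdr (Linux-specific header) */
#include <stddef.h>   /* size_t, offsetof */
#include <string.h>   /* memcmp */

/* Hypothetical helper: returns 1 if buf starts with the header of a
 * little-endian, 32-bit MIPS ELF file, and 0 otherwise. */
int is_mipsel32_elf(const unsigned char *buf, size_t len)
{
    if (len < sizeof(Elf32_Ehdr))
        return 0;                              /* too short for a header */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;                              /* missing "\x7fELF" magic */
    if (buf[EI_CLASS] != ELFCLASS32 || buf[EI_DATA] != ELFDATA2LSB)
        return 0;                              /* not 32-bit little-endian */

    /* e_machine is a 16-bit field; decode it as little-endian by hand so
     * the check does not depend on the host's byte order. */
    size_t off = offsetof(Elf32_Ehdr, e_machine);
    unsigned machine = buf[off] | ((unsigned)buf[off + 1] << 8);
    return machine == EM_MIPS;
}
```

A caller would read the first sizeof(Elf32_Ehdr) bytes of the file into a buffer and pass them in; a result of 1 suggests that qemu-mipsel (rather than qemu-mips) is the right emulator.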
First, we need to install QEMU’s user-mode emulators in our reverse-engineering environment:
$ sudo apt-get install \
qemu-user
Once installed, we can run our case study executable by invoking the command qemu-mipsel ascii-table.elf, which yields the following output:
$ qemu-mipsel ascii-table.elf
0000 c 0020 p s 0040 @ gp ! 0060 ` gp !
0001 c 0021 ! gp ! 0041 A gp Aa U 0061 a gp Aa l
0002 c 0022 " gp ! 0042 B gp Aa U 0062 b gp Aa l
0003 c 0023 # gp ! 0043 C gp Aa U 0063 c gp Aa l
0004 c 0024 $ gp ! 0044 D gp Aa U 0064 d gp Aa l
0005 c 0025 % gp ! 0045 E gp Aa U 0065 e gp Aa l
0006 c 0026 & gp ! 0046 F gp Aa U 0066 f gp Aa l
0007 c 0027 ' gp ! 0047 G gp Aa U 0067 g gp Aa l
0008 c 0028 ( gp ! 0048 H gp Aa U 0068 h gp Aa l
0009 cs 0029 ) gp ! 0049 I gp Aa U 0069 i gp Aa l
000a cs 002a * gp ! 004a J gp Aa U 006a j gp Aa l
000b cs 002b + gp ! 004b K gp Aa U 006b k gp Aa l
000c cs 002c , gp ! 004c L gp Aa U 006c l gp Aa l
000d cs 002d - gp ! 004d M gp Aa U 006d m gp Aa l
000e c 002e . gp ! 004e N gp Aa U 006e n gp Aa l
000f c 002f / gp ! 004f O gp Aa U 006f o gp Aa l
0010 c 0030 0 gp A d 0050 P gp Aa U 0070 p gp Aa l
0011 c 0031 1 gp A d 0051 Q gp Aa U 0071 q gp Aa l
0012 c 0032 2 gp A d 0052 R gp Aa U 0072 r gp Aa l
0013 c 0033 3 gp A d 0053 S gp Aa U 0073 s gp Aa l
0014 c 0034 4 gp A d 0054 T gp Aa U 0074 t gp Aa l
0015 c 0035 5 gp A d 0055 U gp Aa U 0075 u gp Aa l
0016 c 0036 6 gp A d 0056 V gp Aa U 0076 v gp Aa l
0017 c 0037 7 gp A d 0057 W gp Aa U 0077 w gp Aa l
0018 c 0038 8 gp A d 0058 X gp Aa U 0078 x gp Aa l
0019 c 0039 9 gp A d 0059 Y gp Aa U 0079 y gp Aa l
001a c 003a : gp ! 005a Z gp Aa U 007a z gp Aa l
001b c 003b ; gp ! 005b [ gp ! 007b { gp !
001c c 003c < gp ! 005c \ gp ! 007c | gp !
001d c 003d = gp ! 005d ] gp ! 007d } gp !
001e c 003e > gp ! 005e ^ gp ! 007e ~ gp !
001f c 003f ? gp ! 005f _ gp ! 007f c
Debugging the case study
Running the case study is useful, but we can only observe its visible side-effects when doing so (namely, printing out the ASCII table). If we want to observe the inner workings of the case study as it executes, we’ll need to debug it.
Let’s install a suitable debugger:
$ sudo apt-get install \
gdb-multiarch
Ordinarily we can use the debugger directly on a process, but the program being run from the point of view of the reverse-engineering environment is qemu-mipsel, not ascii-table.elf (which itself is being interpreted by qemu-mipsel).
To debug ascii-table.elf itself and not qemu-mipsel, we need to leverage QEMU’s GDB stub server and connect our debugger to it.
Setting up the debugger and the debuggee
First, we need to launch qemu-mipsel with the GDB stub enabled, listening on TCP port 1234:
$ qemu-mipsel -g 1234 ./ascii-table.elf
We can observe that the process does not seem to be doing anything for now.
What actually happened is that qemu-mipsel is waiting for a debugger to connect to its GDB stub before starting the emulation of ascii-table.elf.
Let’s launch the debugger:
$ gdb-multiarch ./ascii-table.elf
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./ascii-table.elf...
(gdb)
Then, let’s connect the debugger to qemu-mipsel’s GDB stub:
(gdb) set architecture mips:isa32r2
The target architecture is set to "mips:isa32r2".
(gdb) target remote :1234
Remote debugging using :1234
__start () at libstd.c:45
45 int r = main();
(gdb)
We can now debug ascii-table.elf.
Stepping through
The execution is paused at the entry point of the executable. Let’s ask GDB to disassemble the current function:
(gdb) disassemble
Dump of assembler code for function __start:
=> 0x00400568 <+0>: addiu sp,sp,-24
0x0040056c <+4>: sw ra,20(sp)
0x00400570 <+8>: jal 0x400130 <main>
0x00400574 <+12>: nop
0x00400578 <+16>: jal 0x400560 <exit>
0x0040057c <+20>: move a0,v0
End of assembler dump.
(gdb)
GDB indicates the current instruction pointer (i.e. the next machine instruction to be run) in the disassembly with the => symbol.
Let’s step three instructions forward until we jump into the main() function (pressing Enter at an empty prompt repeats the previous command):
(gdb) stepi
0x0040056c 45 int r = main();
(gdb)
0x00400570 45 int r = main();
(gdb)
main () at ascii-table.c:58
58 for (int i = 0; i < 128; i++) {
(gdb)
We are now at the beginning of the main() function.
Let’s add a breakpoint for the print_ascii_entry() function so that the debugger will pause execution when we enter it, then continue the execution of the program:
(gdb) break print_ascii_entry
Breakpoint 1 at 0x40025c: file ascii-table.c, line 38.
(gdb) continue
Continuing.
Breakpoint 1, print_ascii_entry (character=0 '\000', properties=0x400584 <s_ascii_properties>, num_ascii_properties=10) at ascii-table.c:38
38 print_number(character);
(gdb)
The debugger paused the execution because the execution flow reached the breakpoint for the print_ascii_entry() function.
Let’s finish the execution of the current function:
(gdb) finish
Run till exit from #0 print_ascii_entry (character=0 '\000', properties=0x400584 <s_ascii_properties>, num_ascii_properties=10) at ascii-table.c:38
main () at ascii-table.c:65
65 putchar(i % COLUMNS == COLUMNS - 1 ? '\n' : '\t');
(gdb)
We can observe that the program printed the following on the console, which is the first entry of the ASCII table.
0000 c
Let’s remove the breakpoint for the print_ascii_entry() function, add a breakpoint for the print_number() function and continue execution:
(gdb) delete 1
(gdb) break print_number
Breakpoint 2 at 0x4001d0: file ascii-table.c, line 27.
(gdb) c
Continuing.
Breakpoint 2, print_number (num=32) at ascii-table.c:27
27 for (int n = 3; n >= 0; n--) {
(gdb)
The debugger paused the execution at the first instruction inside the print_number() function.
Let’s continue past this breakpoint forty times, which stops just before the number for the ASCII character * (code 42, or 0x2a) is printed:
(gdb) continue 40
Will ignore next 39 crossings of breakpoint 2. Continuing.
Breakpoint 2, print_number (num=42) at ascii-table.c:27
27 for (int n = 3; n >= 0; n--) {
(gdb)
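The jump from num=32 to num=42 after forty more hits makes sense once we account for the order in which the table is printed. The following is a minimal sketch, not the case study’s actual code: entry_character() and ROWS are hypothetical names, and the formula is inferred from the program output (four columns of 32 rows) and from the GDB session, where the first two entries printed are 0 and 32.

```c
#define COLUMNS 4                /* four columns, as seen in the output */
#define ROWS (128 / COLUMNS)     /* assumption: 32 rows cover 128 codes */

/* Hypothetical reconstruction: the character shown by the i-th entry when
 * the table is printed row by row but numbered column by column. */
int entry_character(int i)
{
    return (i % COLUMNS) * ROWS + i / COLUMNS;
}
```

With this mapping, entry 1 is character 32 and entry 41 is character 42, forty entries later, which matches what the debugger reports.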
The print_number() function
Let’s disassemble the print_number() function alongside the source code (this is possible because we have the debugging symbols for the executable):
(gdb) disassemble /s
Dump of assembler code for function print_number:
ascii-table.c:
27 for (int n = 3; n >= 0; n--) {
=> 0x004001d0 <+0>: addiu sp,sp,-48
0x004001d4 <+4>: sw s3,40(sp)
0x004001d8 <+8>: li s3,12
0x004001dc <+12>: sw s2,36(sp)
0x004001e0 <+16>: li s2,-4
26 void print_number(int num) {
0x004001e4 <+20>: sw s1,32(sp)
28 int digit = (num >> (4 * n)) % 16;
0x004001e8 <+24>: li s1,16
26 void print_number(int num) {
0x004001ec <+28>: sw s0,28(sp)
0x004001f0 <+32>: move s0,a0
0x004001f4 <+36>: sw ra,44(sp)
28 int digit = (num >> (4 * n)) % 16;
0x004001f8 <+40>: srav v0,s0,s3
0x004001fc <+44>: div zero,v0,s1
0x00400200 <+48>: mfhi v0
29
30 if (digit < 10)
0x00400204 <+52>: andi v1,v0,0xff
0x00400208 <+56>: slti v0,v0,10
0x0040020c <+60>: beqz v0,0x400254 <print_number+132>
0x00400210 <+64>: nop
31 putchar('0' + digit);
0x00400214 <+68>: addiu v1,v1,48
32 else
33 putchar('a' + digit - 10);
0x00400218 <+72>: li a2,1
0x0040021c <+76>: sb v1,16(sp)
0x00400220 <+80>: addiu a1,sp,16
0x00400224 <+84>: li a0,1
0x00400228 <+88>: jal 0x400550 <write>
0x0040022c <+92>: addiu s3,s3,-4
27 for (int n = 3; n >= 0; n--) {
0x00400230 <+96>: bne s3,s2,0x4001fc <print_number+44>
0x00400234 <+100>: srav v0,s0,s3
34 }
35 }
0x00400238 <+104>: lw ra,44(sp)
0x0040023c <+108>: lw s3,40(sp)
0x00400240 <+112>: lw s2,36(sp)
0x00400244 <+116>: lw s1,32(sp)
0x00400248 <+120>: lw s0,28(sp)
0x0040024c <+124>: jr ra
0x00400250 <+128>: addiu sp,sp,48
33 putchar('a' + digit - 10);
0x00400254 <+132>: b 0x400218 <print_number+72>
0x00400258 <+136>: addiu v1,v1,87
End of assembler dump.
(gdb)
We can observe the MIPS calling convention in action when the write() function gets called at address 0x00400228.
Here, the first three parameters are passed in registers:
Register | write() function parameter | Instruction writing it |
---|---|---|
a0 | File descriptor to use | 0x00400224 <+84>: li a0,1 |
a1 | Pointer to buffer to write out | 0x00400220 <+80>: addiu a1,sp,16 |
a2 | Number of bytes to write | 0x00400218 <+72>: li a2,1 |
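In C terms, this call site corresponds to writing a single byte from a stack buffer, following the write(fd, buf, len) prototype that GDB reports. The sketch below is an illustration under assumptions: put_byte() is a hypothetical name, and it calls the C library’s write() rather than the case study’s own implementation in libstd.c.

```c
#include <unistd.h>   /* write() */

/* Hypothetical helper mirroring the call at 0x00400228: store one character
 * in a stack slot, then pass its address and a length of 1 to write(). */
int put_byte(int fd, char c)
{
    char buf = c;                      /* sb v1,16(sp)                   */
    return (int)write(fd, &buf, 1);    /* a0 = fd, a1 = &buf, a2 = 1     */
}
```

Calling put_byte(1, '*') would print a single asterisk on standard output, just as the case study does with a file descriptor of 1.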
The iterations of the for() loop
Just before the write() function gets called, the following registers also contain these values:
Register | Purpose | Constant? |
---|---|---|
v0 | Set to one if the current digit is less than 10 | No |
v1 | ASCII character of the current digit | No |
s0 | Number to print (num) | Yes |
s1 | Value 16 (radix or base of hexadecimal) | Yes |
s2 | Value -4 (counter value at which the loop ends) | Yes |
s3 | Iteration counter | No |
We’ll study the iterations of the for loop at this point, first by setting up two breakpoints, one for write() and one at the start of print_number()’s epilogue:
(gdb) b write
Breakpoint 3 at 0x400550: file libstd.c, line 13.
(gdb) b *0x00400238
Breakpoint 4 at 0x400238: file ascii-table.c, line 35.
(gdb)
Then, we will display on each break the value of the registers in the previous table:
(gdb) display $v0
1: $v0 = 10
(gdb) display/c $v1
2: /c $v1 = -1 '\377'
(gdb) display/x $s0
3: /x $s0 = 0x400584
(gdb) display $s1
4: $s1 = 42
(gdb) display $s2
5: $s2 = 10
(gdb) display $s3
6: $s3 = 32
(gdb)
We’ll ignore the current register values printed since we haven’t started executing the function yet. Finally, we’ll step through the iterations of the loop:
(gdb) c
Continuing.
Breakpoint 3, write (fd=1, buf=0x40800198, len=1) at libstd.c:13
13 register int v0 asm("v0") = __NR_write;
1: $v0 = 1
2: /c $v1 = 48 '0'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = 8
(gdb) c
Continuing.
Breakpoint 3, write (fd=1, buf=0x40800198, len=1) at libstd.c:13
13 register int v0 asm("v0") = __NR_write;
1: $v0 = 1
2: /c $v1 = 48 '0'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = 4
(gdb) c
Continuing.
Breakpoint 3, write (fd=1, buf=0x40800198, len=1) at libstd.c:13
13 register int v0 asm("v0") = __NR_write;
1: $v0 = 1
2: /c $v1 = 50 '2'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = 0
(gdb) c
Continuing.
Breakpoint 3, write (fd=1, buf=0x40800198, len=1) at libstd.c:13
13 register int v0 asm("v0") = __NR_write;
1: $v0 = 0
2: /c $v1 = 97 'a'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = -4
(gdb) c
Continuing.
Breakpoint 4, 0x00400238 in print_number (num=42) at ascii-table.c:35
35 }
1: $v0 = 0
2: /c $v1 = 97 'a'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = -4
(gdb)
Synthesizing the output into a table yields this execution trace:
Iteration | v0 | v1 | s0 | s1 | s2 | s3 |
---|---|---|---|---|---|---|
0 | 1 | 48 '0' | 0x2a | 16 | -4 | 8 |
1 | 1 | 48 '0' | 0x2a | 16 | -4 | 4 |
2 | 1 | 50 '2' | 0x2a | 16 | -4 | 0 |
3 | 0 | 97 'a' | 0x2a | 16 | -4 | -4 |
The compiler optimized the loop so that the counter variable n does double duty as the number of bits to shift right in each iteration.
This is why the s3 register counts down from 8 to -4 in decrements of 4 instead of counting down from 3 to 0 in decrements of 1, as originally written in the source code.
The reason we observe s3 counting down from 8 and not 12 (as initialized at address 0x004001d8 by the instruction li s3,12) is that we’re not observing the iterations at the beginning of the loop, but rather in the middle of it, on entry to write(). By then, the addiu s3,s3,-4 instruction in the delay slot of the jal at 0x00400228 has already executed.
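To make the transformation concrete, here is a hedged C sketch that places the loop as written next to a loop shaped like the compiled code. The function names are hypothetical, and both versions store the digits in a buffer instead of calling putchar() so that they can be compared; num is assumed to be non-negative.

```c
/* Hex digit to ASCII, as in the source: '0' + digit or 'a' + digit - 10.
 * In the disassembly, 'a' - 10 appears folded into the constant 87. */
static char hex_char(int digit)
{
    return (char)(digit < 10 ? '0' + digit : 'a' + digit - 10);
}

/* Loop as written in ascii-table.c: n counts 3, 2, 1, 0. */
void format_as_written(int num, char out[5])
{
    for (int n = 3; n >= 0; n--)
        *out++ = hex_char((num >> (4 * n)) % 16);
    *out = '\0';
}

/* Loop shaped like the compiled code: the counter holds the shift amount
 * itself and runs 12, 8, 4, 0, ending when it equals the sentinel -4. */
void format_as_compiled(int num, char out[5])
{
    int shift = 12;                    /* li s3,12                      */
    do {
        *out++ = hex_char((num >> shift) % 16);
        shift -= 4;                    /* addiu s3,s3,-4 (delay slot)   */
    } while (shift != -4);             /* bne s3,s2,... with s2 = -4    */
    *out = '\0';
}
```

Both functions produce the same four digits for the same input; for num = 42 the buffer holds 002a, matching the values of v1 in the trace.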
We have collected all the information we wanted from this debugging session, so we can now quit it:
(gdb) quit
A debugging session is active.
Inferior 1 [process 1] will be killed.
Quit anyway? (y or n) y
$
The files for this case study can be found here: case-study.tar.gz
Conclusion
We have learned how to run our executable with QEMU as well as the basics of debugging with GDB.
We’ve also observed how the for loop within the print_number() function iterates over the digits to print a hexadecimal number.
Next time, we will modify ascii-table.elf to change the behavior of the print_number() function through the process of binary patching, without recompiling or relinking the executable itself.