Previously in this series of articles, we’ve analyzed a number of binary artifacts by using the various tools of the toolchain. Now, it’s high time we actually run the case study.

A reminder on computer security: be mindful of the origin and purpose of any program you plan to execute during its reverse-engineering and exercise caution if necessary. For example, you most likely do not want to launch a computer virus you are trying to study directly on your own computer, or at least without taking the necessary precautions to do so in a safe manner.

Running the case study

Our case study is a Linux executable for the little-endian, 32-bit MIPS architecture. While it can be run directly on such a platform, these machines are fairly uncommon, so we’ll assume we do not have access to a suitable physical machine. Instead, we will leverage QEMU’s user-mode emulation to run this Linux executable on any Linux machine, regardless of the host platform.

First, we need to install QEMU’s user-mode emulators in our reverse-engineering environment:

$ sudo apt-get install \
    qemu-user

Once installed, we can run our case study executable by invoking the command qemu-mipsel ascii-table.elf, which yields the following output:

$ qemu-mipsel ascii-table.elf
0000     c              0020    p s             0040 @ gp  !            0060 ` gp  !     
0001     c              0021 ! gp  !            0041 A gp   Aa U        0061 a gp   Aa  l
0002     c              0022 " gp  !            0042 B gp   Aa U        0062 b gp   Aa  l
0003     c              0023 # gp  !            0043 C gp   Aa U        0063 c gp   Aa  l
0004     c              0024 $ gp  !            0044 D gp   Aa U        0064 d gp   Aa  l
0005     c              0025 % gp  !            0045 E gp   Aa U        0065 e gp   Aa  l
0006     c              0026 & gp  !            0046 F gp   Aa U        0066 f gp   Aa  l
0007     c              0027 ' gp  !            0047 G gp   Aa U        0067 g gp   Aa  l
0008     c              0028 ( gp  !            0048 H gp   Aa U        0068 h gp   Aa  l
0009     cs             0029 ) gp  !            0049 I gp   Aa U        0069 i gp   Aa  l
000a     cs             002a * gp  !            004a J gp   Aa U        006a j gp   Aa  l
000b     cs             002b + gp  !            004b K gp   Aa U        006b k gp   Aa  l
000c     cs             002c , gp  !            004c L gp   Aa U        006c l gp   Aa  l
000d     cs             002d - gp  !            004d M gp   Aa U        006d m gp   Aa  l
000e     c              002e . gp  !            004e N gp   Aa U        006e n gp   Aa  l
000f     c              002f / gp  !            004f O gp   Aa U        006f o gp   Aa  l
0010     c              0030 0 gp   A d         0050 P gp   Aa U        0070 p gp   Aa  l
0011     c              0031 1 gp   A d         0051 Q gp   Aa U        0071 q gp   Aa  l
0012     c              0032 2 gp   A d         0052 R gp   Aa U        0072 r gp   Aa  l
0013     c              0033 3 gp   A d         0053 S gp   Aa U        0073 s gp   Aa  l
0014     c              0034 4 gp   A d         0054 T gp   Aa U        0074 t gp   Aa  l
0015     c              0035 5 gp   A d         0055 U gp   Aa U        0075 u gp   Aa  l
0016     c              0036 6 gp   A d         0056 V gp   Aa U        0076 v gp   Aa  l
0017     c              0037 7 gp   A d         0057 W gp   Aa U        0077 w gp   Aa  l
0018     c              0038 8 gp   A d         0058 X gp   Aa U        0078 x gp   Aa  l
0019     c              0039 9 gp   A d         0059 Y gp   Aa U        0079 y gp   Aa  l
001a     c              003a : gp  !            005a Z gp   Aa U        007a z gp   Aa  l
001b     c              003b ; gp  !            005b [ gp  !            007b { gp  !     
001c     c              003c < gp  !            005c \ gp  !            007c | gp  !     
001d     c              003d = gp  !            005d ] gp  !            007d } gp  !     
001e     c              003e > gp  !            005e ^ gp  !            007e ~ gp  !     
001f     c              003f ? gp  !            005f _ gp  !            007f     c       

Debugging the case study

Running the case study is useful, but we can only observe its visible side-effects when doing so (namely, printing out the ASCII table). If we want to observe the inner workings of the case study as it executes, we’ll need to debug it.

Let’s install a suitable debugger:

$ sudo apt-get install \
    gdb-multiarch

Ordinarily, we can use the debugger directly on a process. Here, however, the program running from the point of view of the reverse-engineering environment is qemu-mipsel, not ascii-table.elf (which is itself being interpreted by qemu-mipsel). To debug ascii-table.elf itself and not qemu-mipsel, we need to leverage QEMU’s GDB stub server and connect our debugger to it.

Setting up the debugger and the debuggee

First, we need to launch qemu-mipsel with the GDB stub enabled:

$ qemu-mipsel -g 1234 ./ascii-table.elf

The process does not seem to be doing anything for now. In fact, qemu-mipsel is waiting for a debugger to connect to its GDB stub before it starts emulating ascii-table.elf.

Let’s launch the debugger:

$ gdb-multiarch ./ascii-table.elf
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./ascii-table.elf...
(gdb) 

Then, let’s connect the debugger to qemu-mipsel’s GDB stub:

(gdb) set architecture mips:isa32r2
The target architecture is set to "mips:isa32r2".
(gdb) target remote :1234
Remote debugging using :1234
__start () at libstd.c:45
45          int r = main();
(gdb) 

We can now debug ascii-table.elf.

Stepping through

The execution is paused at the entry point of the executable. Let’s ask GDB to disassemble the current function:

(gdb) disassemble 
Dump of assembler code for function __start:
=> 0x00400568 <+0>:     addiu   sp,sp,-24
   0x0040056c <+4>:     sw      ra,20(sp)
   0x00400570 <+8>:     jal     0x400130 <main>
   0x00400574 <+12>:    nop
   0x00400578 <+16>:    jal     0x400560 <exit>
   0x0040057c <+20>:    move    a0,v0
End of assembler dump.
(gdb) 

GDB indicates the current instruction pointer (i.e. the next machine instruction to be run) in the disassembly with the => symbol. Let’s step three instructions forward until we jump into the main() function:

(gdb) stepi
0x0040056c      45          int r = main();
(gdb) 
0x00400570      45          int r = main();
(gdb) 
main () at ascii-table.c:58
58          for (int i = 0; i < 128; i++) {
(gdb) 

We are now at the beginning of the main() function. Let’s add a breakpoint for the print_ascii_entry() function so that the debugger will pause execution when we enter it, then continue the execution of the program:

(gdb) break print_ascii_entry
Breakpoint 1 at 0x40025c: file ascii-table.c, line 38.
(gdb) continue
Continuing.

Breakpoint 1, print_ascii_entry (character=0 '\000', properties=0x400584 <s_ascii_properties>, num_ascii_properties=10) at ascii-table.c:38
38          print_number(character);
(gdb) 

The debugger paused the execution because the execution flow reached the breakpoint for the print_ascii_entry() function. Let’s finish the execution of the current function:

(gdb) finish
Run till exit from #0  print_ascii_entry (character=0 '\000', properties=0x400584 <s_ascii_properties>, num_ascii_properties=10) at ascii-table.c:38
main () at ascii-table.c:65
65              putchar(i % COLUMNS == COLUMNS - 1 ? '\n' : '\t');
(gdb) 

We can observe that the program printed the following on the console, which is the first entry of the ASCII table.

0000     c       

Let’s remove the breakpoint for the print_ascii_entry() function, add a breakpoint for the print_number() function and continue execution:

(gdb) delete 1
(gdb) break print_number
Breakpoint 2 at 0x4001d0: file ascii-table.c, line 27.
(gdb) c
Continuing.

Breakpoint 2, print_number (num=32) at ascii-table.c:27
27          for (int n = 3; n >= 0; n--) {
(gdb) 

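The debugger paused the execution at the first instruction inside the print_number() function. Let’s continue past this breakpoint forty times (continue 40 tells GDB to ignore the next 39 crossings of the breakpoint and stop on the fortieth), which brings us just before the number for the ASCII character * is printed: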

(gdb) continue 40
Will ignore next 39 crossings of breakpoint 2.  Continuing.

Breakpoint 2, print_number (num=42) at ascii-table.c:27
27          for (int n = 3; n >= 0; n--) {
(gdb)
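
The jump from num=32 to num=42 after forty crossings follows from the order in which the table entries are printed. The sketch below is an assumption inferred from the program's output (four columns per row, each column 32 characters apart) and from the single print_number() call visible in print_ascii_entry(); it is not code taken from ascii-table.c, and the names character_for_call and ROWS are hypothetical.

```c
#include <assert.h>

/* Assumed layout, inferred from the printed table: 4 columns of 32 rows,
 * printed row by row, so consecutive calls are 32 characters apart. */
enum { COLUMNS = 4, ROWS = 32 };

/* Character handled by the i-th call to print_number(), counting from 0. */
int character_for_call(int i)
{
    assert(i >= 0 && i < COLUMNS * ROWS);
    return (i % COLUMNS) * ROWS + i / COLUMNS;
}
```

Under this assumption, breakpoint 2 was first hit on call 1, where character_for_call(1) is 32 (0x20). Forty crossings later, call 41 gives 42 (0x2a), which is the character *.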

The print_number() function

Let’s disassemble the print_number() function alongside the source code (this is possible because we have the debugging symbols for the executable):

(gdb) disassemble /s
Dump of assembler code for function print_number:
ascii-table.c:
27          for (int n = 3; n >= 0; n--) {
=> 0x004001d0 <+0>:     addiu   sp,sp,-48
   0x004001d4 <+4>:     sw      s3,40(sp)
   0x004001d8 <+8>:     li      s3,12
   0x004001dc <+12>:    sw      s2,36(sp)
   0x004001e0 <+16>:    li      s2,-4

26      void print_number(int num) {
   0x004001e4 <+20>:    sw      s1,32(sp)

28              int digit = (num >> (4 * n)) % 16;
   0x004001e8 <+24>:    li      s1,16

26      void print_number(int num) {
   0x004001ec <+28>:    sw      s0,28(sp)
   0x004001f0 <+32>:    move    s0,a0
   0x004001f4 <+36>:    sw      ra,44(sp)

28              int digit = (num >> (4 * n)) % 16;
   0x004001f8 <+40>:    srav    v0,s0,s3
   0x004001fc <+44>:    div     zero,v0,s1
   0x00400200 <+48>:    mfhi    v0

29
30              if (digit < 10)
   0x00400204 <+52>:    andi    v1,v0,0xff
   0x00400208 <+56>:    slti    v0,v0,10
   0x0040020c <+60>:    beqz    v0,0x400254 <print_number+132>
   0x00400210 <+64>:    nop

31                  putchar('0' + digit);
   0x00400214 <+68>:    addiu   v1,v1,48

32              else
33                  putchar('a' + digit - 10);
   0x00400218 <+72>:    li      a2,1
   0x0040021c <+76>:    sb      v1,16(sp)
   0x00400220 <+80>:    addiu   a1,sp,16
   0x00400224 <+84>:    li      a0,1
   0x00400228 <+88>:    jal     0x400550 <write>
   0x0040022c <+92>:    addiu   s3,s3,-4

27          for (int n = 3; n >= 0; n--) {
   0x00400230 <+96>:    bne     s3,s2,0x4001fc <print_number+44>
   0x00400234 <+100>:   srav    v0,s0,s3

34          }
35      }
   0x00400238 <+104>:   lw      ra,44(sp)
   0x0040023c <+108>:   lw      s3,40(sp)
   0x00400240 <+112>:   lw      s2,36(sp)
   0x00400244 <+116>:   lw      s1,32(sp)
   0x00400248 <+120>:   lw      s0,28(sp)
   0x0040024c <+124>:   jr      ra
   0x00400250 <+128>:   addiu   sp,sp,48

33                  putchar('a' + digit - 10);
   0x00400254 <+132>:   b       0x400218 <print_number+72>
   0x00400258 <+136>:   addiu   v1,v1,87
End of assembler dump.
(gdb) 

We can observe the MIPS calling convention in action when the write() function gets called at address 0x00400228. Here, the first three parameters are passed by registers:

Register  write() function parameter      Instruction writing it
a0        File descriptor to use          0x00400224 <+84>: li a0,1
a1        Pointer to buffer to write out  0x00400220 <+80>: addiu a1,sp,16
a2        Number of bytes to write        0x00400218 <+72>: li a2,1
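
To relate these registers back to C, here is a minimal sketch of a putchar-like helper built on the POSIX write() call. The name put_char_fd is hypothetical and this is not the libstd.c implementation; it only illustrates the argument order (file descriptor, buffer, length) that the calling convention maps onto a0, a1 and a2.

```c
#include <assert.h>
#include <unistd.h>

/* Write one character to a file descriptor, mirroring the call seen in
 * the disassembly: write(fd, buf, len). */
ssize_t put_char_fd(int fd, char c)
{
    char buf = c;   /* like `sb v1,16(sp)`: the byte is stored in memory first */

    assert(fd >= 0);
    /* 1st argument (a0): file descriptor
     * 2nd argument (a1): pointer to the buffer
     * 3rd argument (a2): number of bytes to write */
    return write(fd, &buf, 1);
}
```

Calling put_char_fd(1, '*') would write a single * to the standard output, which is what each iteration of the loop in print_number() ends up doing for one digit.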

The iterations of the for() loop

Just before the write() function gets called, the following registers also contain these values:

Register  Purpose                                          Constant?
v0        Set to one if the current digit is less than 10  No
v1        ASCII character of the current digit             No
s0        Number to print (num)                            Yes
s1        Value 16 (radix or base of hexadecimal)          Yes
s2        Value -4 (counter value that ends the loop)      Yes
s3        Iteration counter (also the shift amount)        No

We’ll study the iterations of the for loop at this point, first by setting up two breakpoints, one for write() and one at the start of print_number()’s epilogue:

(gdb) b write
Breakpoint 3 at 0x400550: file libstd.c, line 13.
(gdb) b *0x00400238
Breakpoint 4 at 0x400238: file ascii-table.c, line 35.
(gdb) 

Then, we will display on each break the value of the registers in the previous table:

(gdb) display $v0
1: $v0 = 10
(gdb) display/c $v1
2: /c $v1 = -1 '\377'
(gdb) display/x $s0
3: /x $s0 = 0x400584
(gdb) display $s1
4: $s1 = 42
(gdb) display $s2
5: $s2 = 10
(gdb) display $s3
6: $s3 = 32
(gdb) 

We’ll ignore the current register values printed since we haven’t started executing the function yet. Finally, we’ll step through the iterations of the loop:

(gdb) c
Continuing.

Breakpoint 3, write (fd=1, buf=0x40800198, len=1) at libstd.c:13
13          register int v0 asm("v0") = __NR_write;
1: $v0 = 1
2: /c $v1 = 48 '0'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = 8
(gdb) c
Continuing.

Breakpoint 3, write (fd=1, buf=0x40800198, len=1) at libstd.c:13
13          register int v0 asm("v0") = __NR_write;
1: $v0 = 1
2: /c $v1 = 48 '0'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = 4
(gdb) c
Continuing.

Breakpoint 3, write (fd=1, buf=0x40800198, len=1) at libstd.c:13
13          register int v0 asm("v0") = __NR_write;
1: $v0 = 1
2: /c $v1 = 50 '2'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = 0
(gdb) c
Continuing.

Breakpoint 3, write (fd=1, buf=0x40800198, len=1) at libstd.c:13
13          register int v0 asm("v0") = __NR_write;
1: $v0 = 0
2: /c $v1 = 97 'a'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = -4
(gdb) c
Continuing.

Breakpoint 4, 0x00400238 in print_number (num=42) at ascii-table.c:35
35      }
1: $v0 = 0
2: /c $v1 = 97 'a'
3: /x $s0 = 0x2a
4: $s1 = 16
5: $s2 = -4
6: $s3 = -4
(gdb) 

Summarizing the output in a table yields this execution trace:

Iteration  v0  v1      s0    s1  s2  s3
0          1   48 '0'  0x2a  16  -4  8
1          1   48 '0'  0x2a  16  -4  4
2          1   50 '2'  0x2a  16  -4  0
3          0   97 'a'  0x2a  16  -4  -4

The compiler optimized the loop so that the counter variable n does double duty as the number of bits to shift right in each iteration. This is why the s3 register counts down from 8 to -4 in decrements of 4 instead of counting down from 3 to 0 in decrements of 1, as originally written in the source code.
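
The transformation can be sketched in C by writing the loop both ways. This is an illustration under stated assumptions rather than the program's actual code: the helper names hex_char, hex4_as_written and hex4_as_compiled are hypothetical, and the digits are stored in a buffer instead of being printed so that the two variants can be compared.

```c
#include <assert.h>

/* Convert a value in 0..15 to its lowercase hexadecimal character. */
static char hex_char(int digit)
{
    return (char)(digit < 10 ? '0' + digit : 'a' + digit - 10);
}

/* Loop shaped like the source code: n counts 3, 2, 1, 0. */
void hex4_as_written(int num, char out[5])
{
    int pos = 0;

    assert(num >= 0);
    for (int n = 3; n >= 0; n--)
        out[pos++] = hex_char((num >> (4 * n)) % 16);
    out[pos] = '\0';
}

/* Loop shaped like the compiled code: the counter is the shift amount. */
void hex4_as_compiled(int num, char out[5])
{
    int pos = 0;
    int shift = 12;                 /* li s3,12 */

    assert(num >= 0);
    do {
        out[pos++] = hex_char((num >> shift) % 16);
        shift -= 4;                 /* addiu s3,s3,-4 */
    } while (shift != -4);          /* bne s3,s2 with s2 holding -4 */
    out[pos] = '\0';
}
```

For num = 42, both functions fill the buffer with the string 002a. The second variant saves the multiplication by 4 in every iteration, which is presumably why the compiler prefers it.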

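We observe s3 counting down from 8 and not 12 (its initial value, set at address 0x004001d8 by the instruction li s3,12) because we’re not observing the iterations at the beginning of the loop, but rather at the point where write() is entered. By then, the instruction addiu s3,s3,-4 at address 0x0040022c has already run: it sits in the delay slot of the jal instruction, so it executes before the first instruction of write().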
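
The register values seen at each write() breakpoint can be reproduced with a small model of the loop. This is a simplified sketch, not an emulation of the MIPS code: the struct trace_row type and the trace_print_number function are hypothetical names, and only v0, v1 and s3 are tracked.

```c
#include <assert.h>

/* One row of the trace, as seen when write() is entered. */
struct trace_row {
    int v0;   /* 1 if the digit is less than 10, else 0 (slti) */
    int v1;   /* ASCII code of the digit */
    int s3;   /* shift amount, already decremented */
};

/* Fill rows[0..3] with the values observed at each write() call. */
void trace_print_number(int num, struct trace_row rows[4])
{
    int s3 = 12;                    /* li s3,12 */

    assert(num >= 0);
    for (int i = 0; i < 4; i++) {
        int digit = (num >> s3) % 16;

        rows[i].v0 = digit < 10;
        rows[i].v1 = digit < 10 ? digit + 48 : digit + 87;
        s3 -= 4;                    /* decrement happens before write() starts */
        rows[i].s3 = s3;            /* value visible at the write() breakpoint */
    }
}
```

With num = 42, the four rows hold s3 values of 8, 4, 0 and -4, together with the same v0 and v1 values as the debugging session.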

We have collected all the information we wanted from this debugging session, so we can quit it:

(gdb) quit
A debugging session is active.

        Inferior 1 [process 1] will be killed.

Quit anyway? (y or n) y
$ 

The files for this case study can be found here: case-study.tar.gz

Conclusion

We have learned how to run our executable with QEMU as well as the basics of debugging with GDB. We’ve also observed how the for() loop within the print_number() function iterates over the digits to print a hexadecimal number. Next time, we will modify ascii-table.elf to change the behavior of the print_number() function through the process of binary patching, without recompiling or relinking the executable itself.