Making the basic buffer overflow demo more approachable

TLDR

I wrote up a basic buffer overflow to program redirection demo which attempts to improve upon the standard buffer overflow demo in the following ways:

Makes the target address a printable string (in this case "HACK").
Provides feedback on faults so that students can complete the assignment without a debugger.

The demo supports x86, x86_64, and ARM targets.

You can find the code here.

Introduction

I took a Computer Architecture course at Johns Hopkins toward my M.S. in Computer Science this term. The course had an optional final project. This blog post is adapted from that project.

My professor, Dr. Kann, had a list of recommended projects. On the list was a project to rewrite their buffer overflow example for undergraduates. They had a stack buffer overflow example based in the MARS Simulator. He wanted an updated example that ran as a binary on ARM, so that the students could use their Raspberry Pis, and that was mindful of the fact that some of the students might not be familiar with debuggers.

Motivation

The following is a perfectly valid buffer overflow demonstration. The function strcpy is unbounded while the buffer is bounded, so we can overflow variables on the stack, including the return address saved there. On a modern system it will need a flag to disable stack canaries (-fno-stack-protector), but otherwise this works fine.

#include <string.h>

int main(int argc, char* argv[]){
    char buf[16];
    strcpy(buf, argv[1]);  /* no bounds check: input longer than buf overwrites the stack */
    return 0;
}

If we wanted to continue with this demo we might write a target function, expect the students to find the address, and then also expect students to know how to pipe the associated characters from the address into an argument field like this:

./bin `python -c "print('a'*20 + '\x01\x02\x03\x04'[::-1])"`

That's a lot of information to expect someone just starting to know. Next up let's see what happens if they run the binary with a string long enough to overflow the return address on the stack.

$ ./bin AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)

So informative! Students may realize they did the wrong thing, but they gain no information from these crash messages. Of course, students could use a debugger on the run or on the core file, but that's more than we should expect of a student seeing a buffer overflow for the first time.

In the following sections we address both of these problems.

Making the winning string typeable

If we leave it to the compiler, the address of a function we may want to hit could be something like 0x08048911. Take a look at each byte (0x08, 0x04, 0x89, and 0x11) and find the corresponding key on your keyboard. I will assist you with an ASCII chart below. You'll find that none of these appear on your keyboard: that is, they are not typeable. Most addresses will not be fully typeable. So we've defined our problem: we want a function at a fixed address whose bytes are all typeable.

Let's start with putting a function at a fixed address. When I first came across this problem I thought it would be easy. The internet was littered with potential options:

mmap a memory region RWX, memcpy a function, and then call a function pointer at that address: link
- I discounted this option because I wanted the functionality in the binary itself to be clear and also for the address of the function to both be available in the symbols output and for that output at that address to be analyze-able in a disassembler.
Various options related to __attribute__((at(address))): link
- I didn't pursue this option because it seems to be supported only in specific embedded compilers, and I wanted our system to build with generally available compilers.
objcopy the data into the program: link
- objcopy is the traditional way to get binary data into a C program. It works pretty well and if I had thought about it before this next option I might have used it instead. In the end it's less fun than our actual solution so I'm not too worried about it.
Writing a custom linker script: link
- This was very close to what I needed, but unfortunately there is very little documentation on non-embedded customization of linker scripts. It's really not a common use case.

I decided to move forward with a custom linker script. I put some time into attempting to write something custom using the documentation I found online for embedded systems and the linker script docs themselves, but the issue is that the linker scripts written for embedded systems are far simpler than what a semi-functioning Linux binary actually requires. There always seemed to be some section or issue I forgot.

It was around that time that I realized that my machine must be using some default linker script that I could just modify somehow. So I started attempting to locate my linker script. I searched high and low for .ld scripts on my machine: "surely it must be somewhere!"

Eventually I gave up and googled it, and I found out a couple of things: first, that people are a bit frustrated by the lack of linker documentation, and second, that the binary ld, the linker itself, would produce its linker script for me if I asked nicely.

If you run ld --verbose (and add -m to specify machine type) it will produce something like:

GNU ld (GNU Binutils for Ubuntu) 2.30
  Supported emulations:
   elf_x86_64
   ...
   using internal linker script:
==================================================
/* Script for -z combreloc: combine and sort reloc sections */
/* Copyright (C) 2014-2018 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.  */
OUTPUT_FORMAT("elf32-i386", "elf32-i386",
              "elf32-i386")
OUTPUT_ARCH(i386)
ENTRY(_start)

If we tee this to a file and edit it a bit we can see what our system is actually doing under the hood. We see things like ENTRY(_start), which is the declaration that the _start symbol is the actual entry point for our binary. We can see the definition of the text segment relative to the program start and other sections.

PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); 
. = SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS;
.interp         : { *(.interp) }
.note.gnu.build-id : { *(.note.gnu.build-id) }
.hash           : { *(.hash) }
.gnu.hash       : { *(.gnu.hash) }

For those of you who have not had the displeasure of looking through linker scripts, I'll provide a very basic primer. There are commands such as SECTIONS, MEMORY, PHDRS, and VERSION which define, as you might guess, the sections and memory and whatnot. We'll only be dealing with the SECTIONS command today. There is also the concept of the '.' variable. This is the location counter: its value is updated as assignments are made while the linker traverses the command from top to bottom.

We would like to put code at a custom address, so we need a new section containing just our code. Adding to our linker script is a little more difficult than we might expect. We do not want to put our custom values at the beginning because we don't want to disrupt the rest of the binary. We also do not want them at the very end of the SECTIONS command because that is where debug information that is not loaded is placed.

So we place our section declaration right before the _end symbol, set the location counter to a custom value, and then use the linker syntax to define a custom section.

  . = 0x4b434148;
  .win_sec          : { *(.win_sec) }
  _end = .; PROVIDE (end = .);
  . = DATA_SEGMENT_END (.);

This works well enough. It creates a section, but our winning code is not yet placed in that section. We need to point our assembly at the new section. In ARM assembly we add:

.section .win_sec, "ax"

Next, we just had to pick a custom string that fit some criteria. In particular, the string is an address, so its least significant byte needs to land on an instruction boundary. On little-endian systems the first character of the string sits at the lowest memory address and therefore becomes the least significant byte of the word, so it's actually the first character of the custom string that needs to be aligned. To support 64-bit machines I made sure that this first character was divisible by 8.

Uppercase ASCII letters whose codes are divisible by 8 are pretty limited. There are three: 'H' (0x48), 'P' (0x50), and 'X' (0x58). Of course, we jumped on 'H' and chose our string to be HACK. A valid solution to our exercise is:

$ ./demo_amd64 aaaaaaaaaaaaaaaaaaaaaaaaHACK
Hello to the string consumer 3000!
You provided me a string! Yum!
  _____
 /     \
| () () |
 \  ^  /
  |||||
  |||||

Oh no! There's haxx0rs in the mainframe!

Making the system give meaningful feedback

I wanted students to be able to succeed at this exercise without a debugger. That meant making sure that each time they hit a signal that stopped the program, it gave back meaningful feedback.

Initially I just programmed up a basic signal handler, but eventually decided that I wanted additional information about the crash. The struct sigaction takes a flag called SA_SIGINFO, which passes two additional arguments to the handler that provide a lot of great information about the crash:

void fault_handler(int signo, siginfo_t *info, void *ucontext);

However, every once in a while a crash would come along where the siginfo_t and the ucontext pointers were invalid. I looked through the sigaction documentation thinking I had missed a configuration option until I came across SA_ONSTACK:

Call the signal handler on an alternate signal stack
provided by sigaltstack(2).  If an alternate stack is not
available, the default stack will be used.  This flag is
meaningful only when establishing a signal handler.

The only conclusion I could come to was that when I was smashing the stack, the signal handler somehow could not recover it. I followed the example here, which uses malloc to allocate a new stack and sigaltstack to register it as the alternate stack. After that the handler worked perfectly.

As for the logic internal to the handler, we wanted to give some sense of the error and pass along the additional information the system provides (usually in the form of the si_code field). In particular, we separated out SIGSEGV codes to make them clearer and provided the address of the crash so that students can use it to make informed guesses. Additionally, we print the register state for each architecture we support.

An error state on ARM looks like:

------------------------FAULT-------------------------

Goodbye cruel world! I was a young program. And I have died too soon!

You can avenge my death! I received a fault.
That means something went wrong. I received a "Segmentation fault".
If you haven't seen that term before google it! The Wikipedia article is pretty good.

Signal Information:
Seems like the faulty address was 0x4140.
Looks like that address isn't mapped.

Register state:
R0           0xfffef0c0
R1           0xfffef3a3
R2           0x0
R3           0x73752f3d
R4           0xfffef0f8
R5           0x0
R6           0x0
R7           0x41414141
R8           0x0
R9           0x0
R10          0xff7ee000
FP           0x0
SP           0xfffef0d8
LR           0x109ed
PC           0x4140

Keep in mind we want our address to be 0x4b434148

A student playing around with this will see that they have modified the Program Counter, and with some help they may be able to figure out what the padding size and address string should be.

Conclusions

I built a code redirection buffer overflow demo that should be an easier initial target for students, especially those who are not familiar with debuggers. This blog post was written with the intention that the link be listed in the book, so that anyone interested in a deeper explanation of the problem would be able to find it here.