Computers, WTF? Part 5 - Avengers, Assembly!

The previous post was very conceptual. To balance things out, this one is going to be very practical, but only if you think learning to write (very) basic assembly is practical! Personally, I think it’s important for understanding what your programming language code turns into, but I’m not about to spend hours of my time solely writing assembly by hand (but I won’t knock you if you do! To each their own!).

We’ll write some x86-64 assembly in AT&T syntax. There are a few different syntaxes that you can use to write assembly, but this is the only one I’m really familiar with (as of now) so I’m going to stick with it. I’ll also be using gcc to assemble the assembly on my Ubuntu 18.04 64-bit system. If you don’t have that, you may be able to follow these steps on a Mac, but I’ll be making use of some Linux system calls and those won’t necessarily translate. I’d recommend spinnin’ up the ol’ virtual machine that you created to try out sudo rm -rf --no-preserve-root /.

Hello, Ass-World!

Let’s write our first assembly! It’ll be a variation on the typical “Hello, World!” program. I’ll present it here in toto, then go over each line in a tick.

hello.s

.global _start

.text
_start:
  mov $1, %rdi
  mov $message, %rsi
  mov $18, %rdx
  mov $1, %rax
  syscall

  mov $0, %rdi
  mov $60, %rax
  syscall

message:
  .ascii "Hello, Ass-World!\n"

Here’s the line-by-line breakdown:

.global _start

Things that start with . are called Assembler Directives. They tell the assembler to do something special with the following argument(s). In other words, directives mark off things that are not instructions, but still have important meanings during assembly. In this case, .global tells the assembler that the _start label is a symbol that is visible to the linker, the software that connects the assembled object file to other object files to make a single executable.

.text

This directive tells the assembler that what follows should be assembled. You can optionally follow it with a “subsection name” if you want to specify where it should be assembled.

_start:

This is a label, it marks a specific location in the assembly code so that it can be easily referred to in other sections of the assembly (such as in the .global directive). Technically, its value is the memory address of the following instruction or piece of data.

mov $1, %rdi

This is first instruction statement. It combines a CPU instruction, mov with a couple of operands.

mov tells the CPU to copy the value from the first operand to the second operand. The $ character means that we are using the literal value 1. The % character means we are referring to a CPU register. The “r” at the front of “rdi” means that this is a 64-bit register (don’t ask me why it’s an “r”, I’m sure there’s a perfectly good explanation that I don’t know about). In sum, we are storing the integer 1 in register rdi.

A quick aside – a lot of x86 instructions can have a postfix character that specifies what size of data is being manipulated. For example, movb, movw, movl, and movq specify moving a byte, word, long, and quad, respectively. Just using mov without a prefix leaves the assembler to figure out what size the data should be. I’ve heard moving your quads a lot can lead to really toned legs.

mov $message, %rsi

This instruction moves the value of the label message: into register rsi. message: is defined later.

mov $18, %rdx

Again, moving the literal value 18 into register rdx.

mov $1, %rax

Moving another literal 1 into register rax.

syscall

Now, our chickens come home to roost on their nearly assembled eggs, so to speak. syscall tells the CPU to make a system call to the operating system kernel – the big process that makes your computer do the fancy stuff. This is where the OS you are working on often makes a difference, as different kernels will have different syscall semantics.

Syscalls are used for things like I/O and process management things that the kernel handles on your behalf. In linux, which syscall gets used is defined by the value currently in register ax. We’ve moved the value 1 into rax, so the syscall we are using is “write”.

The write syscall takes a few arguments. If you’ve ever used the write function from C’s unistd.h, you’ll recognize these. We load the number of the file descriptor (fd) that we want to write to into register di (we put 1, the fd for standard out (stdout) into register rdi). We load the memory address of the first byte we want to write into register si (we put $message into rsi). We load the amount of bytes we want to write into register dx (we put 18 into rdx). Finally we make the syscall itself.

At this point in execution, our program should output Hello, Ass-World! to stdout, which is 18 bytes (one byte per character, plus the newline at the end).

Back to our line-by-line

mov $0, %rdi

We move literal zero into rdi.

mov $60, %rax

We move literal 60 into rax.

syscall

Here we make another syscall. This time though, we’ve changed the value in rax to 60, which indicates that the syscall we want to make is exit. exit takes a code that it exits with, and in UNIX systems, exiting with code zero means no errors have occurred, so that’s why we moved zero into rdi.

message:

This is the label we talked about earlier – it lets us refer to the memory address of the following data in our instructions by this name.

.ascii "Hello, Ass-World!\n"

This is our payload for our syscall to write. The .ascii directive indicates that it is a string literal.

Build-A-Binary Workshop

So we’ve written our assembly, but we want to be able to run it too! Programs are as disappointing to us as we are to our parents if they don’t run, so let’s give it a shot:

Save the above assembly as hello.s somewhere, anywhere, then assemble and run it using:

$ gcc -nostdlib -no-pie hello.s && ./a.out

You should see this output:

Hello, Ass-World!

Yay! It (hopefully) works! We used gcc, the GNU C and C++ compiler, to assemble our assembly. The -nostdlib flag told it not to link the C standard library, as we haven’t conformed to the requirements for that. The -no-pie flag tells it not to bake us a pie at the end. Actually it tells it not to produce a “position independent executable”, which is a binary that can be used in shared libraries and the like (position-independentally). We likewise haven’t conformed to the requirements to make that work. Unfortunately there is no -extra-pie flag.

The Ol’ Static Shocker

Now that we’ve gotten some (very) basic assembly under our x86 wings, we can take a step back into data and talk a little bit about Static Variables. What does it mean for a variable to be statically allocated? Where’s Waldo? We’ll be jumping back and forth between Java/C code and assembly to answer these questions next time.