Computers, WTF? Part 5 - Avengers, Assembly!
The previous post was very conceptual. To balance things out, this one is going to be very practical, but only if you think learning to write (very) basic assembly is practical! Personally, I think it’s important for understanding what your programming language code turns into, but I’m not about to spend hours of my time solely writing assembly by hand (but I won’t knock you if you do! To each their own!).
We’ll write some x86-64 assembly in AT&T syntax.
There are a few different syntaxes that you can use to write assembly, but
this is the only one I’m really familiar with (as of now) so I’m going to
stick with it. I’ll also be using
gcc to assemble the assembly on my Ubuntu
18.04 64-bit system. If you don’t have that, you may be able to follow
these steps on a Mac, but I’ll be making use of some Linux system calls
and those won’t necessarily translate. I’d recommend spinnin’ up the ol’
virtual machine that you created to try out
sudo rm -rf --no-preserve-root /.
Let’s write our first assembly! It’ll be a variation on the typical “Hello, World!” program. I’ll present it here in toto, then go over each line in a tick.
        .global _start

        .text
    _start:
        mov $1, %rdi
        mov $message, %rsi
        mov $18, %rdx
        mov $1, %rax
        syscall

        mov $0, %rdi
        mov $60, %rax
        syscall

    message:
        .ascii "Hello, Ass-World!\n"
Here’s the line-by-line breakdown:
.global _start
Things that start with . are called assembler directives. They tell the assembler to do something special with the argument(s) that follow. In other words, directives mark off things that are not instructions but still have important meanings during assembly. In this case, .global tells the assembler that the _start label is a symbol that is visible to the linker, the software that connects the assembled object file to other object files to make a single executable. The linker looks for _start by default as the place where execution begins, which is why it has to be visible.
.text
This directive tells the assembler that what follows should be assembled into the text section, the part of the program that holds executable code. You can optionally follow it with a subsection number if you want to specify where within that section it should go.
_start:
This is a label. It marks a specific location in the assembly code so that it can be easily referred to in other parts of the assembly (such as in the .global directive). Technically, its value is the memory address of the following instruction or piece of data.
mov $1, %rdi
This is the first instruction statement. It combines a CPU instruction, mov, with a couple of operands. mov tells the CPU to copy the value from the first operand to the second. The $ character means that we are using the literal value 1. The % character means we are referring to a CPU register. The “r” at the front of “rdi” means that this is a 64-bit register (don’t ask me why it’s an “r”, I’m sure there’s a perfectly good explanation that I don’t know about). In sum, we are storing the integer 1 in register rdi.
A quick aside: a lot of x86 instructions can have a postfix character that specifies what size of data is being manipulated. For example, movb, movw, movl, and movq specify moving a byte (8 bits), word (16 bits), long (32 bits), and quad (64 bits), respectively. Using mov without a postfix leaves the assembler to figure out what size the data should be, which it usually does from the size of the register operand. I’ve heard moving your quads a lot can lead to really
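As a rough cross-check of those postfix sizes, here is a tiny sketch in C (C rather than assembly, just so it is easy to compile and poke at). The helper name suffix_size is made up for illustration, and the sizes assume the usual x86-64 AT&T convention described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any real API): maps an AT&T size
 * postfix to the number of bytes the instruction operates on. */
size_t suffix_size(char suffix)
{
    switch (suffix) {
    case 'b': return sizeof(int8_t);   /* byte: movb */
    case 'w': return sizeof(int16_t);  /* word: movw */
    case 'l': return sizeof(int32_t);  /* long: movl */
    case 'q': return sizeof(int64_t);  /* quad: movq */
    default:  return 0;                /* not a size postfix */
    }
}
```

So a movq works on 8 bytes, which lines up with rdi being a 64-bit register.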
mov $message, %rsi
This instruction moves the value of the label message, which is the memory address of our string, into register rsi. The label message: is defined later.
mov $18, %rdx
Again, moving the literal value 18 into register rdx.
mov $1, %rax
Moving another literal 1 into register rax.
syscall
Now our chickens come home to roost on their nearly assembled eggs, so to speak. syscall tells the CPU to make a system call to the operating system kernel, the big program that makes your computer do the fancy stuff. This is where the OS you are working on often makes a difference, as different kernels have different syscall semantics.
Syscalls are used for things like I/O and process management, things that the kernel handles on your behalf. In Linux, which syscall gets used is defined by the value currently in register rax. We’ve moved the value 1 into rax, so the syscall we are using is “write”. The write syscall takes a few arguments. If you’ve ever used the write function from C’s unistd.h, you’ll recognize these. We load the number of the file descriptor (fd) that we want to write to into register rdi (we put 1, the fd for standard out (stdout), into rdi). We load the memory address of the first byte we want to write into register rsi (we put message into rsi). We load the number of bytes we want to write into rdx (we put 18 into rdx). Finally, we make the syscall itself.
At this point in execution, our program should output Hello, Ass-World! to stdout, which is 18 bytes (one byte per character, plus the newline at the end).
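If the register shuffling feels abstract, it may help to see roughly the same call written in C. This is only a sketch of the equivalent, not what the assembler produces: write and STDOUT_FILENO come from unistd.h, while say_hello and message_length are names I made up for illustration.

```c
#include <string.h>
#include <unistd.h>

/* The same bytes that follow the message: label in the assembly. */
static const char message[] = "Hello, Ass-World!\n";

/* Hypothetical helper: counts the bytes we ask the kernel to write. */
size_t message_length(void)
{
    return strlen(message);
}

/* Hypothetical wrapper: returns the number of bytes written, or -1 on error.
 * In the assembly, the three arguments live in rdi, rsi, and rdx. */
ssize_t say_hello(void)
{
    return write(STDOUT_FILENO, message, message_length());
}
```

Here message_length() comes out to 18, the same number we loaded into rdx by hand, and STDOUT_FILENO is the 1 we put into rdi.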
Back to our line-by-line:
mov $0, %rdi
We move literal zero into rdi.
mov $60, %rax
We move literal 60 into rax.
syscall
Here we make another syscall. This time, though, we’ve changed the value in rax to 60, which indicates that the syscall we want to make is exit. exit takes a code that it exits with, and on UNIX systems, exiting with code zero means no errors have occurred, so that’s why we moved zero into rdi.
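To watch an exit code travel from a program back to whoever launched it, here is a small C sketch. It assumes Linux; child_exit_status is a hypothetical helper of mine, and it uses glibc's syscall() wrapper with SYS_exit to stay close to what our assembly does by hand.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with `code` through the raw
 * exit syscall, then returns the code the parent sees (-1 on failure). */
int child_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* fork failed */
    if (pid == 0) {
        syscall(SYS_exit, code);  /* child: SYS_exit is 60 on x86-64 Linux */
        _exit(127);               /* only reached if the syscall fails */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;                /* waiting on the child failed */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling child_exit_status(0) should hand back 0, the "no errors" code our assembly exits with. After running ./a.out you can see the same value in your shell with echo $?.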
message:
This is the label we talked about earlier. It lets us refer to the memory address of the following data in our instructions by this name.
.ascii "Hello, Ass-World!\n"
This is our payload for our syscall to write. The .ascii directive indicates that it is a string literal.
So we’ve written our assembly, but we want to be able to run it too! Programs are as disappointing to us as we are to our parents if they don’t run, so let’s give it a shot:
Save the above assembly as
hello.s somewhere, anywhere, then assemble and
run it using:
$ gcc -nostdlib -no-pie hello.s && ./a.out
You should see this output:
Hello, Ass-World!
Yay! It (hopefully) works! We used
gcc, the GNU C and C++ compiler, to
assemble our assembly. The
-nostdlib flag told it not to link the C standard
library, as we haven’t conformed to the requirements for that. The -no-pie flag tells it not to bake us a pie at the end. Actually it tells it not to produce a “position independent executable”, which is a binary that can be loaded at any memory address, the way shared libraries and the like are (position-independentally). We likewise
haven’t conformed to the requirements to make that work. Unfortunately there
The Ol’ Static Shocker
Now that we’ve gotten some (very) basic assembly under our x86 wings, we can take a step back into data and talk a little bit about Static Variables. What does it mean for a variable to be statically allocated? Where’s Waldo? We’ll be jumping back and forth between Java/C code and assembly to answer these questions next time.