Variables, you probably know about them already, right? You know them as the names of values you use in your programs. You declare them, define them, and get really annoyed when JavaScript tells you that they’re undefined. Today we are going to go a little bit deeper into variables, specifically those of the statically allocated type. You may already be familiar with the keyword static as used in some languages. Depending on the language, it may be a related concept, or it may have nothing to do with this at all (yay, semantic inconsistency).

What I’m trying to lay down here is the concept of static allocation. Allocation is the decision of where in memory a value will be stored. Static allocation is when this decision is made once for the value, then is not changed for the rest of the runtime of the program. This generally means that the memory address is decided by the compiler (with help from the linker) when it turns your code into an executable, rather than by your code while it runs.

Freeze, Sucka

Here’s some C. I know you were begging for some code after looking at assembly last time.

int a;
int b[10];

int main(void) {
  a = 0;
  b[a] = a;
}

In the code above, a and b are declared as an int and an array of ten ints at the top. Then, in main, some of the values are changed. However, these references to a and b still refer to the exact same variables. The location and size of these variables can be inferred at compilation time, so the compiler statically allocates them. The x86 assembly for this allocation might look like:

.comm   a,4
.comm   b,40

.comm declares a “common symbol”, which can also be thought of as a kind of uninitialized global variable. a,4 is a common symbol named a with a size of 4 bytes. b,40 is a common symbol named b with a size of 40 bytes (ten 4-byte ints).

Alternatively, we can also just use labels to statically allocate our variables, but this requires we give them initial values:

a:
  .int 0
b:
  .int 0
  .int 1
  .int 2
  # <3..8>
  .int 9

The memory addresses allocated for the members of the array b are sequential – they come one after another. When a member of b is accessed, say, like b[2], the integer provided is multiplied by the size of one member of b to get the memory offset from the start of b for that member. Say b is laid out in memory like so:

address   value
0x1000    b[0]
0x1004    b[1]
0x1008    b[2]
...
0x1024    b[9]

The aforementioned b[2] gets translated to “the data at memory address 0x1000 + 2×4 = 0x1000 + 8 = 0x1008”, as 4 bytes is the size of a member of b, an int. This is why both the type of an array and its size must be known at compile time for it to be statically allocated.

In x86-64 assembly, the instructions for accessing b[2] might look like this, assuming we have a label b:

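mov $b, %rax       # Store the address of b in rax
mov 8(%rax), %eax  # Load b[2] into eax

The 8(%rax) syntax tells the CPU to read from the memory address 8 bytes past the address stored in rax. The destination is eax, the 32-bit lower half of rax, so exactly 4 bytes (one int) are loaded; using rax as the destination would load 8 bytes and pull in b[3] along with b[2]. 8 bytes is the size of two ints, so we get b[2].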

Lost in Translation

Next time, we’ll try translating the above C code into assembly. The code itself doesn’t have any output, and neither will our assembly(!), but it should run just fine, and it’s good practice because I say it is. Later!