Aiden Fox Ivey

To begin this blog post, I see it as most fitting to place a definition. The reasoning behind which I will elaborate upon shortly.

Linguae Francae: A common language used by people of diverse backgrounds to communicate with one another, often a basic form of speech with simplified grammar, particularly, one that is not the first language of any of its speakers.

In the context of modern computers, it is hard to imagine any programming language as more fitting for the title of linguae francae than the C programming language.

I will elaborate on this point, however I first need to provide some context as to the history of C and justification for still using it.

To some extent, C’s popularity can be credited with Dennis Ritchie (the creator of C) reimplementing the kernel of Research Unix in C. It is also likely a byproduct of the GNU userspace, the Linux kernel, and common runtimes for languages such as Ruby or Python being implemented in C.

It’s also worth mentioning that virtually all modern operating systems define their interfaces in terms of C’s abstract model. Between this fact and a faulty understanding of how modern computers work, this is how people casually assume that C is a low level language.

The result of this choice of that when a non trivial program is to be run in a language other than C, there must be some connection with C code at some point down the line.

A secondary reason to want to interface with C is that a large number of the most widely useful (or hard to replace) libraries are written in C.

libcurl, openssl, and zlib are all great examples of this. Respectively, they provide a way to interact with network protocols, use cryptographic algorithms, or compress data. Since the time and effort to make sure that these implementations are correct and efficient is immense, it would suit a clever software engineer well to reuse these libraries.

If you’ve ever worked with a library in C before, you can probably get the gist of how someone might use a big C library like this within C code. (Don’t worry if you haven’t. Just know that it implies using #include and then adding a flag when you compile it.)

However, this doesn’t leave much room for programmers who want to use a language other than C. This is where the namesake of this blog post comes in. The foreign function interface is the idea of making an interface between different programming languages. Usually the term implicitly describes interfacing a given language with the C programming language, although this is not necessary for the definition.

(In this article, I will use FFI as shorthand to refer to some general foreign function interface between a given language and C.)

A quick example

First, let’s write a simple definition for some code that calculates the nth value of the Fibonacci sequence.

The crux of a Fibonacci sequence is that a given Fibonacci number is defined as the sum of the two Fibonacci numbers before it. As a result, it lends itself to be written recursively. We will write it instead, imperatively. In the spirit of neuroticism, we will first figure out what the most appropriate C language data type is to store the loop counter. Normal people might opt to use the integer data type, but we will consider this more carefully. (Do not do this when you are programming normally. It is an anti-pattern.)

On my platform, unsigned long long has a max size of 2^64 - 1. Let’s determine the largest Fibonacci number that fits in a 64-bit integer. I’ll use SageMath for this. (SageMath is kind of confusing - it’s like a mathematics system that integrates lots of open source libraries with a language that is essentially a modified form of Python.)

limit = 2^64 - 1
x = 0
while fibonacci(x) <= limit:
    x += 1
print(x - 1)

This prints 93, which means that the 93rd Fibonacci number is the largest number we can store within our 64-bit unsigned integer. Thus, we can use a char to store our loop counter variable. This is because the max size of the signed char, which is the default when not specified as unsigned or signed, is 127.

unsigned long long fibonacci(unsigned long long n) {
    char i = 0;
    unsigned long long a = 0, b = 1, tmp;

    for (; i < n; i++) {
        tmp = a;
        a = b;
        b = tmp + b;
    }

    return a;
}

A foray into compilers

Normally, as I mentioned before, Fibonacci would be recursively defined, but this makes more sense in functional languages like Haskell where their compilers are written to rewrite recursive definitions into imperative code. In a language where the compilers lack this feature, this code example would incur unnecessary stack frames while executing.

Consider this recursive definition.

unsigned long long fibonacci(unsigned long long n) {
    if (n == 0 || n == 1) {
        return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
}

After running clang -S -O3 recursive_fib.c, we are treated with the following (slightly cleaned up) ARM64 assembly code.

fibonacci:
	stp	x20, x19, [sp, #-32]!
	stp	x29, x30, [sp, #16]
	add	x29, sp, #16
	.cfi_def_cfa w29, 16
	.cfi_offset w30, -8
	.cfi_offset w29, -16
	.cfi_offset w19, -24
	.cfi_offset w20, -32
	mov	x19, x0
	cmp	x0, #2
	b.hs	LBB0_2
	mov	x20, #0
	add	x0, x19, x20
	ldp	x29, x30, [sp, #16]
	ldp	x20, x19, [sp], #32
	ret
LBB0_2:
	mov	x20, #0
LBB0_3:
	sub	x0, x19, #1
	bl	fibonacci
	sub	x19, x19, #2
	add	x20, x0, x20
	cmp	x19, #1
	b.hi	LBB0_3
	add	x0, x19, x20
	ldp	x29, x30, [sp, #16]
	ldp	x20, x19, [sp], #32
	ret

First, the code moves the argument register x0 to x19 to save it and restore before returning. cmp x0, #2 coupled with b.hs jumps to the label LBB0_2 if the value is greater than or equal to 2. Otherwise, we just return the value of the argument that was held in x0. In the other case, we calculate fibonacci(n-1), and then sum it, checking for when we reach the n == 1 || n == 0 case.

Interestingly, the compiler has outwitted us here, seeing as though we expected that we would get a naive implementation that called the Fibonacci sequence twice recursively rather than once recursively within a loop.

This is certainly better than what we forecast, but still not optimal, given that a truly optimized iterative solution would just use a single loop without any recursion and, as a result, any additional stack frames.

We can now dispel this temporary foray into compiler optimization and head back to the land of foreign function interfaces.

Back on track

Let’s take the iterative example I wrote and turn it into a shared object. In order to do this, I run clang -shared -o libfibonacci.so -fPIC fib.c.

A shared object (.so) is a kind of object file that can be loaded into a program at runtime rather than at compile time. In order to use our Fibonacci implementation inside Python code, we must write it in the form of a shared object, since the Python implementation will load it at runtime and Python does not have a separate compilation step in the same sense as C.

Notably, we do have to either output an object file or a shared object file here, because the file we wrote earlier does not have a main function. As a result, trying to turn it into an executable will fail, since there is no place for code execution to start from.

On more esoteric targets like microcontrollers, frequently one can specify the name of the entry point they would like to invoke rather than, on modern operating systems, which usually just assume “main” is the name of the entry point (at least during compilation).

Also, note that we have the flag -fPIC. This means that any given code needs to be generated such that it doesn’t depend on being at a specific address to work. Instead, the code will opt to generate instructions that jump to offsets with respect to labels such that it can be mapped anywhere in memory.

The last thing that might be worthwhile to mention is that if you are on a Mac and you’ve poked around the disk a lot, you may have noticed files called “dylib”. These are a more modern but still compatible form of the shared object that we referred to earlier. They’re commonly called dynamic libraries.

Now let’s write a Python file that invokes the shared library we wrote and executes it:

import ctypes
import time

def fibonacci(n):
    i, a, b = 0, 0, 1

    while i < n:
        a, b = b, a + b
        i += 1

    return a

def main():
    lib = ctypes.CDLL('./libfibonacci.so')
    lib.fibonacci.argtypes = [ctypes.c_ulonglong]
    lib.fibonacci.restype = ctypes.c_ulonglong

    python_start = time.time()
    python_result = fibonacci(93)
    python_duration = time.time() - python_start

    c_start = time.time()
    c_result = lib.fibonacci(93)
    c_duration = time.time() - c_start

    print("In C: ", c_result, c_duration)
    print("In Python: ", python_result, python_duration)

main()

We find that both solutions are the identical value of 12200160415121876738, with the C implementation taking 3.10e-06 seconds and the Python implementation taking 5.96e-06 seconds.

We are now calling into C from Python! If you’ve been following along, you can safely say that you’ve used a foreign function interface.

One point of curiosity, though, is that the C code (when run through the FFI) is not an order of magnitude faster than the Python implementation.

Let’s then see how long this given code takes to run without using the foreign function interface by adapting our C code into its own standalone executable.

#include <stdio.h>
#include <time.h>

unsigned long long fibonacci(unsigned long long n) {
    char i = 0;
    unsigned long long a = 0, b = 1, tmp;

    for (; i < n; i++) {
        tmp = a;
        a = b;
        b = tmp + b;
    }

    return a;
}

int main(void) {
    time_t start, end;
    double cpu_time_used;
    unsigned long long result;

    start = clock();
    result = fibonacci(93);
    end = clock();

    cpu_time_used = ((double)(end - start)) / CLOCKS_PER_SEC;
    printf("%llu which took %e\n", result, cpu_time_used);

    return 0;
}

On my machine, this took somewhere around 1.50e-06 seconds to execute, which is notably less than the C code running through the foreign function interface.