Un-locking Zig

For the Zigtoberfest 2025 I gave a talk about lock-free — or non-blocking — data structures. This page provides a small summary of the talk.

Not in a reading mood? Let me do the talking over on YouTube:

Intro

Do you like cooking? What about cooking with multiple people at the same time? If so, you might have experienced the problem of multiple cooks trying to use the same utensil at the same time. You are constantly blocking each other from using the single available knife, someone is always cutting the veggies wrong, and in the end, it takes longer than if you had just cooked alone.

It’s the same with programming. When multiple threads try to access the same data structure at the same time, they need to coordinate their access. Usually, this is done with locks like mutexes, semaphores, or read-write locks. And while locks are a simple and often effective solution, one can easily overlook their flaws.

Just to name a few hidden costs:

- Deadlocks and priority inversion when lock acquisition goes wrong.
- Contention: threads queue up behind a lock and get serialized.
- A thread that is preempted (or crashes) while holding a lock blocks everybody else.
- Context switches and scheduler round-trips whenever a lock has to put a thread to sleep.

So … can we synchronize access without using locks?

How to go lock-free

To get rid of locks, we need to rethink the critical section. If it were so small that multiple threads could never be inside it at the same time, locks would become obsolete. This is where atomic operations come into play.

Atomic operations are hardware-supported, meaning that the CPU directly ensures that no intermediate results are visible to other threads. Zig provides access to those instructions through built-ins like @atomicLoad, @atomicStore, and @cmpxchgStrong. The latter is especially interesting, as it powers most lock-free data structures. We call it Compare And Swap (CAS). This is how it could be implemented non-atomically:

fn cas(comptime T: type, ptr: *T, expected: T, new: T) ?T {
    const current = ptr.*;
    if (current != expected) {
        // Failure: report the value we actually saw.
        return current;
    }
    ptr.* = new;
    // Success is signaled by returning null (just like @cmpxchgStrong).
    return null;
}

The idea is simple: we want to update the value at ptr to new if, and only if, its current value is still expected (so nobody else changed the state underneath us). If the current value differs, we fail and change nothing. In reality, Zig’s @cmpxchgStrong compiles directly down to the corresponding CPU instruction (e.g., lock cmpxchg on x86), without any of the above logic happening in software.
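To make the calling convention concrete, here is a small single-threaded sketch. Note that the builtin signals success by returning null, and returns the value it actually observed on failure:

```zig
const std = @import("std");

pub fn main() void {
    var value: u32 = 5;

    // Succeeds: value is 5, so it is swapped to 10 and null is returned.
    const r1 = @cmpxchgStrong(u32, &value, 5, 10, .seq_cst, .seq_cst);
    std.debug.assert(r1 == null);
    std.debug.assert(value == 10);

    // Fails: value is 10, not 5. The observed value is returned instead.
    const r2 = @cmpxchgStrong(u32, &value, 5, 20, .seq_cst, .seq_cst);
    std.debug.assert(r2.? == 10);
    std.debug.assert(value == 10); // nothing was changed
}
```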

Typically, CAS is embedded in a CAS loop. I’ll show this using a stack’s push operation:

fn push(self: *Stack, value: T) !void {
    const new_head = try self.allocator.create(Node);
    new_head.* = Node{
        .value = value,
        .next = null,
    };

    while (true) {
        const old_head = @atomicLoad(?*Node, &self.head, .acquire);
        new_head.next = old_head;
        if (@cmpxchgStrong(?*Node, &self.head, old_head, new_head,
            .release, .monotonic) == null) {
            return; // success, exit the function
        }
        // failed, try again
    }
}
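For symmetry, here is what a matching pop could look like, assuming the same Stack and Node definitions as in the push example (with head being a ?*Node). This is only a sketch: it ignores safe memory reclamation, which a real implementation must address.

```zig
fn pop(self: *Stack) ?T {
    while (true) {
        // If the stack is empty, there is nothing to pop.
        const old_head = @atomicLoad(?*Node, &self.head, .acquire) orelse return null;
        const new_head = old_head.next;
        if (@cmpxchgStrong(?*Node, &self.head, old_head, new_head,
            .release, .monotonic) == null) {
            const value = old_head.value;
            // Caution: freeing here is only safe if no other thread can
            // still be reading old_head; real implementations need a
            // reclamation scheme (see the ABA discussion further down).
            self.allocator.destroy(old_head);
            return value;
        }
        // failed, try again
    }
}
```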

Now that you have a rough idea of how lock-free data structures work, let’s further look at some properties and considerations.

Freedom

In contrast to lock-based data structures, where arguing about progress guarantees is tricky, non-blocking data structures can fulfill different levels of freedom:

- Obstruction-free: a thread finishes its operation in a bounded number of steps if it runs in isolation, i.e., without interference from other threads.
- Lock-free: at any point in time, at least one thread makes progress.
- Wait-free: every thread finishes its operation in a bounded number of steps.

With this hierarchy, stronger guarantees automatically entail weaker ones.

Every non-blocking data structure is at least obstruction-free. Achieving lock-freedom is also easy, as most CAS loops are lock-free by design. Think about it: a CAS of one thread can only fail if another thread successfully changed the state, i.e., made progress. Wait-freedom, however, is harder to achieve and often requires more complex algorithms or additional helping mechanisms. For example, threads could reserve certain memory slots for themselves to ensure they can always make progress when accessing those. Or threads could help each other by completing each other’s operations (non-blocking implementations of queues often use this technique).
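As a tiny illustration of wait-freedom: a shared counter bumped with a single hardware fetch-and-add completes in a bounded number of steps no matter what other threads do. This sketch uses Zig’s @atomicRmw builtin (names as in Zig 0.13, e.g. .Add and .monotonic):

```zig
const std = @import("std");

var counter: u64 = 0;

fn worker() void {
    // A single hardware fetch-and-add: it always completes in a bounded
    // number of steps, so this increment is wait-free.
    _ = @atomicRmw(u64, &counter, .Add, 1, .monotonic);
}

pub fn main() !void {
    var threads: [4]std.Thread = undefined;
    for (&threads) |*t| t.* = try std.Thread.spawn(.{}, worker, .{});
    for (threads) |t| t.join();
    std.debug.assert(@atomicLoad(u64, &counter, .monotonic) == 4);
}
```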

Considerations

Another thing we have to reason about is the correct memory ordering of our operations. In essence, the problem is that both the CPU and the compiler reorder instructions to, e.g., hide latencies. While this is usually beneficial for performance, it can lead to unexpected behavior in concurrent programs, as semantic data dependencies might not be obvious to the optimizer.

To have control over this, we can attach memory orderings (which act as memory barriers) to our atomic operations. Zig (through LLVM) supports the following:

- .unordered: only guarantees that the access itself is atomic; no ordering at all.
- .monotonic: all atomic operations on the same variable appear in a single total order, but there is no ordering with respect to other memory.
- .acquire: for loads; later memory accesses cannot be reordered before it.
- .release: for stores; earlier memory accesses cannot be reordered after it.
- .acq_rel: combines .acquire and .release, for read-modify-write operations.
- .seq_cst: additionally, all .seq_cst operations across all variables appear in one global total order.

You know what? Most of the time, this rule of thumb will do: put a .release on your writes and an .acquire on your reads. When in doubt, go with .seq_cst; it will cost you a bit of performance, but ensures correctness.
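To see the release/acquire pairing in action, here is a minimal, hypothetical publication pattern: the producer’s .release store guarantees that a consumer observing ready == true through an .acquire load also sees the plain write to data.

```zig
const std = @import("std");

var data: u32 = 0;
var ready: bool = false;

fn producer() void {
    data = 42; // plain, non-atomic write
    // .release: all writes before this store become visible to any
    // thread that observes ready == true with an .acquire load.
    @atomicStore(bool, &ready, true, .release);
}

fn consumer() void {
    // .acquire: pairs with the .release store above.
    while (!@atomicLoad(bool, &ready, .acquire)) {
        std.atomic.spinLoopHint();
    }
    std.debug.assert(data == 42); // guaranteed by the release/acquire pair
}

pub fn main() !void {
    const t = try std.Thread.spawn(.{}, consumer, .{});
    producer();
    t.join();
}
```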

Challenges

While lock-free data structures have many advantages, they also come with their own set of challenges. A common one is the ABA problem. Consider the following illustration:

Here, the driver is waiting at a red light until it turns green. However, since they have to wait for so long, they get sleepy and don’t notice when the light turns green. Only after the light has turned red again do they wake up. From the driver’s perspective, the light never changed, even though it did.

This can also happen in lock-free data structures. Imagine a thread reads a value A from a shared variable. Before it can perform a CAS operation to update the value, another thread changes the value from A to B and then back to A. When the first thread performs the CAS operation, it sees that the value is still A and successfully updates it, unaware that the value had changed in the meantime.

When working with pointers, this can quickly corrupt the data structure, leading to undefined and definitely unwanted behavior.

To mitigate the ABA problem, several techniques can be employed:

- Tagged pointers / version counters: pair every value with a counter that is bumped on each update, so an A → B → A sequence no longer compares equal.
- Safe memory reclamation schemes such as hazard pointers or epoch-based reclamation, which defer freeing nodes while other threads might still access them.
- Stronger primitives where available, e.g., double-width CAS or LL/SC instructions.
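One common mitigation is a version counter: pack the value together with a counter and CAS both at once, so an A → B → A change still bumps the version and a stale CAS fails. A minimal sketch (all names here are illustrative):

```zig
const std = @import("std");

// Pack a pool index and a version counter into one CAS-able word.
const Tagged = packed struct(u64) {
    index: u32, // e.g., an index into a node pool instead of a raw pointer
    version: u32, // bumped on every successful update
};

fn casTagged(ptr: *u64, old: Tagged, new_index: u32) bool {
    const new = Tagged{ .index = new_index, .version = old.version +% 1 };
    return @cmpxchgStrong(u64, ptr, @as(u64, @bitCast(old)),
        @as(u64, @bitCast(new)), .acq_rel, .monotonic) == null;
}
```

Even if another thread swaps the index from A to B and back to A in the meantime, the version differs, so the stale CAS fails as intended.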

TL;DR

Now, how do I decide whether to use lock-free data structures or not? Here are some guidelines:

- Measure first: under low contention, a simple mutex is often just as fast and much easier to get right.
- Reach for non-blocking structures when contention is high, latency matters, or blocking is not an option.
- Prefer existing, battle-tested implementations over writing your own, and budget time for reasoning about memory ordering, ABA, and reclamation.

Don’t be afraid to use locks, but don’t limit yourself to them either!