The most common way to share data safely between threads is locking with a Mutex. However, locks block waiting threads and often become performance bottlenecks. In this article, I will explain atomic operations, a lower-level mechanism that guarantees data safety without locks and is often faster, and std::atomic, which implements it in C++.
What is an Atomic Operation?
An atomic operation (indivisible operation) is an operation guaranteed never to be interrupted by other threads mid-execution. For example, if the sequence "read a counter value, add 1, and write it back" is executed atomically, no other thread can ever observe an intermediate value.
The standard library provides the class template std::atomic<T>. By declaring a variable as, for example, atomic<int> or atomic<bool>, operations on it (assignment, reading, etc.) become atomic. Integer and pointer types are supported, and C++20 adds dedicated specializations for floating-point types as well.
Memory Order: Controlling “Arbitrary Optimization” by Compiler and CPU
The most important concept for correctly understanding atomic operations is memory order. To improve performance, compilers and CPUs may reorder instructions when they judge that the program's single-threaded behavior is unaffected. This is fine in a single-threaded environment, but in a multi-threaded environment this "arbitrary reordering" can cause fatal bugs. Memory order is an instruction that finely controls how much of this reordering is permitted.
Practice: Synchronization Between Threads with release-acquire
One of the most typical uses of memory order is synchronization between a Producer thread and a Consumer thread. The following code is an example where one thread prepares data and the other thread reads that data safely.
#include <iostream>
#include <thread>
#include <atomic>
#include <string>

// Data shared between threads
std::string shared_message;

// Atomic flag used for synchronization between threads
std::atomic<bool> data_is_ready{false};

// "Producer" thread that prepares data and sets the flag
void producer_task() {
    std::cout << "Producer: Preparing data...\n";
    shared_message = "This message is passed safely.";
    // Release store: guarantees that all memory writes before this
    // operation are visible to any thread that observes this store
    // with an acquire load.
    data_is_ready.store(true, std::memory_order_release);
    std::cout << "Producer: Data preparation complete.\n";
}

// "Consumer" thread that monitors the flag and reads data
void consumer_task() {
    std::cout << "Consumer: Waiting for data...\n";
    // Acquire load: once this load observes the release store,
    // all writes the producer made before that store are visible here.
    while (!data_is_ready.load(std::memory_order_acquire)) {
        // Wait until data is ready (busy loop)
    }
    // At this point shared_message can be read safely
    std::cout << "Consumer: Received data: \"" << shared_message << "\"\n";
}

int main() {
    std::thread producer(producer_task);
    std::thread consumer(consumer_task);
    producer.join();
    consumer.join();
    return 0;
}
Code Point Explanation
- memory_order_release (store operation): Used in the producer side's store. This memory order imposes a strong constraint: "all writes before this store, such as the assignment to shared_message, must be completed before the flag is written." This prevents the worst-case reordering where data_is_ready becomes true before shared_message is written.
- memory_order_acquire (load operation): Used in the consumer side's load. This memory order imposes the mirror constraint: "all reads after a load that observes true must be performed after that load."
By pairing this release and acquire, synchronization is established: if the consumer reads true, all writes the producer performed before writing true are visible to the consumer. This allows data to be passed safely without using a Mutex.
Other Memory Orders
There are several other memory orders.
- memory_order_relaxed: The loosest constraint; it guarantees only atomicity, with no ordering relative to other memory operations. Used when pure atomicity is all you need, such as simple atomic counters.
- memory_order_seq_cst: The strongest constraint; all seq_cst operations appear in a single total order that every thread agrees on. It is the default memory order and the safest, but it may carry the largest performance overhead.
Summary
std::atomic is a very powerful tool when you want to avoid the locking overhead of a Mutex. It performs especially well for synchronizing simple flags and counters. However, to use it correctly, you need an accurate understanding of memory order: an incorrect memory order causes very troublesome bugs with low reproducibility. It is best to start with the safest memory_order_seq_cst and move to weaker memory orders only carefully, based on performance measurements.
