A Brief Talk on Atomic Operations

2016/10/01 09:43

Atomic operations are the most fine-grained synchronization primitive for exchanging data between threads: they guarantee that reading and writing a given value happens atomically across threads.

Because they do not need a heavyweight mutex for synchronization, they are very lightweight, and they avoid scheduling switches back and forth into the kernel, so they are highly efficient.
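To see what that atomicity guarantee buys you, here is a minimal sketch using pthreads and the gcc builtin covered below. Without the atomic builtin, a plain g_count++ would be a racy read-modify-write and the final count would be unpredictable:

    #include <pthread.h>
    #include <stdio.h>

    static volatile long g_count = 0;

    static void* count_thread(void* priv)
    {
        long i = 0;
        for (i = 0; i < 100000; i++)
        {
            // atomically: read g_count, add 1, write it back, as one indivisible step
            __sync_fetch_and_add(&g_count, 1);
        }
        return (void*)0;
    }

    int main(void)
    {
        pthread_t t0, t1;
        pthread_create(&t0, 0, count_thread, 0);
        pthread_create(&t1, 0, count_thread, 0);
        pthread_join(t0, 0);
        pthread_join(t1, 0);

        // always 200000 with the atomic add; with a plain g_count++ it usually is not
        printf("count: %ld\n", g_count);
        return 0;
    }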

How do you use atomic operations? Every platform provides related APIs, and the gcc and clang compilers also offer __builtin-level interfaces (see the short sketch after this list):

  1. The InterlockedXxx and InterlockedXxx64 series APIs on Windows
  2. The OSAtomicXXX series APIs on macOS
  3. The gcc __builtin interfaces such as __sync_val_compare_and_swap and __sync_val_compare_and_swap_8
  4. The lock-prefixed assembly instructions on the x86 and x86_64 architectures
  5. The cross-platform atomic interface in tbox
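Before looking at tbox, here is a minimal sketch of what the raw calls look like on two of these platforms. InterlockedIncrement and __sync_fetch_and_add are the real native names; the surrounding variables are just for illustration:

    // Windows (needs <windows.h>): increments atomically, returns the NEW value
    volatile LONG wv = 0;
    LONG n = InterlockedIncrement(&wv);        // wv == 1, n == 1

    // gcc/clang builtin: adds atomically, returns the OLD value
    volatile long gv = 0;
    long o = __sync_fetch_and_add(&gv, 1);     // gv == 1, o == 0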

Using the tbox interface

Let's start with tbox's tb_atomic_fetch_and_add. As the name suggests, this API first reads the original value, then adds the given value to it:

    // equivalent to the atomic operation: b = (*a)++;
    tb_atomic_t a = 0;
    tb_long_t   b = tb_atomic_fetch_and_add(&a, 1);

If you need to perform the addition first and then return the result, use:

    // equivalent to the atomic operation: b = ++(*a);
    tb_atomic_t a = 0;
    tb_long_t   b = tb_atomic_add_and_fetch(&a, 1);

Or it can be simplified to:

    tb_long_t b = tb_atomic_fetch_and_inc(&a);
    tb_long_t b = tb_atomic_inc_and_fetch(&a);

How does tbox adapt to the various platforms internally? Let's take a quick look. Basically, it just wraps the native APIs.

Windows interface encapsulation

    static __tb_inline__ tb_long_t tb_atomic_fetch_and_add_windows(tb_atomic_t* a, tb_long_t v)
    {
        return (tb_long_t)InterlockedExchangeAdd((LONG __tb_volatile__*)a, v);
    }
    static __tb_inline__ tb_long_t tb_atomic_inc_and_fetch_windows(tb_atomic_t* a)
    {
        return (tb_long_t)InterlockedIncrement((LONG __tb_volatile__*)a);
    }

Encapsulation of the gcc interface

    static __tb_inline__ tb_long_t tb_atomic_fetch_and_add_sync(tb_atomic_t* a, tb_long_t v)
    {
        return __sync_fetch_and_add(a, v);
    }

Assembly implementation for x86 and x86_64

    static __tb_inline__ tb_long_t tb_atomic_fetch_and_add_x86(tb_atomic_t* a, tb_long_t v)
    {
        /* xaddl v, [a]:
         *
         * o = [a]
         * [a] += v
         * v = o
         *
         * cf, af, of, sf, zf, pf ... maybe changed
         */
        __tb_asm__ __tb_volatile__
        (
    #if TB_CPU_BITSIZE == 64
            "lock xaddq %0, %1 \n"          //!< xaddq v, [a]
    #else
            "lock xaddl %0, %1 \n"          //!< xaddl v, [a]
    #endif
            : "+r" (v)
            : "m" (*a)
            : "cc", "memory"
        );
        return v;
    }

Besides addition, subtraction, multiplication, and division on int32 and int64 values, atomic operations can also perform logical operations such as xor, or, and and. The usage is similar, so I won't say much more about it here.
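For instance, with the gcc builtins (the matching tbox names would presumably follow the tb_atomic_fetch_and_xxx pattern shown above, but take that as an assumption):

    volatile long flags = 0x3;
    long o1 = __sync_fetch_and_or (&flags, 0x4);    // flags: 0x3 -> 0x7, returns 0x3
    long o2 = __sync_fetch_and_and(&flags, 0x5);    // flags: 0x7 -> 0x5, returns 0x7
    long o3 = __sync_fetch_and_xor(&flags, 0x1);    // flags: 0x5 -> 0x4, returns 0x5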

Now let's look at some practical application scenarios for atomics, such as:

  • Implementing spin locks
  • Implementing lock-free queues
  • State synchronization between threads
  • Implementing singletons

and so on.

Implementing a spin lock

Let's first look at how to implement a simple spin lock. To keep the demonstration code uniform, the code below uses the atomic interface provided by tbox:

    static __tb_inline_force__ tb_bool_t tb_spinlock_init(tb_spinlock_ref_t lock)
    {
        // init
        *lock = 0;

        // ok
        return tb_true;
    }
    static __tb_inline_force__ tb_void_t tb_spinlock_exit(tb_spinlock_ref_t lock)
    {
        // exit
        *lock = 0;
    }
    static __tb_inline_force__ tb_void_t tb_spinlock_enter(tb_spinlock_ref_t lock)
    {
        /* Try to read the lock's state: if the lock has not been taken (state 0), take it (set it to 1).
         * If another thread already holds the lock (state 1), loop and keep trying.
         *
         * Note: reading and setting the state is one atomic step and cannot be interrupted.
         */
        tb_size_t tryn = 5;
        while (tb_atomic_fetch_and_pset((tb_atomic_t*)lock, 0, 1))
        {
            // if the lock still has not been obtained after five attempts, let the CPU
            // switch to other threads, then come back and try to take the lock again
            if (!tryn--)
            {
                // yield
                tb_sched_yield();

                // reset tryn
                tryn = 5;
            }
        }
    }
    static __tb_inline_force__ tb_void_t tb_spinlock_leave(tb_spinlock_ref_t lock)
    {
        /* Release the lock. No atomic is needed here: even if the write were torn halfway,
         * the other threads would simply keep waiting until the value becomes 0,
         * with no ill effect.
         */
        *((tb_atomic_t*)lock) = 0;
    }

This implementation is very simple. In tbox, spinlocks are used by default almost everywhere, because most of the multithreaded code in tbox works at a very fine granularity.

In most cases a spin lock is enough, and there is no need to enter kernel mode to switch and wait.

The usage is as follows:

    // take the lock
    tb_spinlock_enter(&lock);

    // some synchronized operations
    // ...

    // release the lock
    tb_spinlock_leave(&lock);

The init and exit calls are omitted in the code above; in real use, do the corresponding handling at the appropriate initialization and release points.
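For completeness, a full lifecycle might look roughly like this. This is a sketch: it assumes tb_spinlock_t is the value type that tb_spinlock_ref_t points to:

    // assumption: tb_spinlock_t is the value type behind tb_spinlock_ref_t
    static tb_spinlock_t g_lock;

    // at startup
    if (tb_spinlock_init(&g_lock))
    {
        // in any worker thread
        tb_spinlock_enter(&g_lock);
        // ... critical section ...
        tb_spinlock_leave(&g_lock);
    }

    // at shutdown
    tb_spinlock_exit(&g_lock);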

Implementing a pthread_once-like interface

pthread_once guarantees that, in a multithreaded program, the passed-in function is called only once. It is typically used to initialize a global singleton or a TLS key.
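For reference, the standard POSIX usage looks like this. Note that the POSIX init routine takes no arguments, while the tbox variant below also passes through a user data pointer:

    #include <pthread.h>

    static pthread_once_t g_once = PTHREAD_ONCE_INIT;

    static void init_func(void)
    {
        // runs exactly once, no matter how many threads call pthread_once()
    }

    static void* thread_func(void* priv)
    {
        pthread_once(&g_once, init_func);
        return (void*)0;
    }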

Taking tbox's interface as an example, let's see how this kind of function is used:

    // the initialization function, guaranteed to be called only once
    static tb_bool_t tb_once_func(tb_cpointer_t priv)
    {
        // initialize some singleton objects and global variables,
        // or perform some one-time initialization calls
        return tb_true;
    }

    // the thread function
    static tb_int_t tb_thread_func(tb_cpointer_t priv)
    {
        // the global lock state, initialized to 0
        static tb_atomic_t lock = 0;
        if (tb_thread_once(&lock, tb_once_func, "user data"))
        {
            // ok
        }
        return 0;
    }

Here we use atomic operations to simulate a simple implementation of this function:

    tb_bool_t tb_thread_once(tb_atomic_t* lock, tb_bool_t (*func)(tb_cpointer_t), tb_cpointer_t priv)
    {
        // check
        tb_check_return_val(lock && func, tb_false);

        /* Atomically fetch the lock state:
         *
         *  0: func has not been called yet
         *  1: the lock has been taken, func is being called by another thread
         *  2: func has been called and returned ok
         * -1: func has been called and returned failure
         */
        tb_atomic_t called = tb_atomic_fetch_and_pset(lock, 0, 1);

        // has func already been called by another thread? return directly
        if (called && called != 1)
        {
            return called == 2;
        }
        // func has not been called yet? then call it
        else if (!called)
        {
            // call the function
            tb_bool_t ok = func(priv);

            // set the return state
            tb_atomic_set(lock, ok ? 2 : -1);

            // ok?
            return ok;
        }
        // another thread holds the lock and is still calling func? wait for it
        else
        {
            // simply sleep-loop here until the other thread finishes executing func
            tb_size_t tryn = 50;
            while ((1 == tb_atomic_get(lock)) && tryn--)
            {
                // wait some time
                tb_msleep(100);
            }
        }

        /* Re-fetch the lock state to determine the result:
         *
         * success: 2
         * timeout: 1
         * failure: -1
         *
         * anything other than 2 counts as failure
         */
        return tb_atomic_get(lock) == 2;
    }

64-bit atomic operations

The 64-bit operations are used in exactly the same way as the 32-bit interfaces, except for the variable types:

  1. The type in tbox is tb_atomic64_t, and the interfaces become the tb_atomic64_xxxx series
  2. The type in gcc is volatile long long, and the interfaces become the __sync_xxxx_8 series
  3. The InterlockedXxx64 series on Windows

For specific usage, refer to the 32-bit interfaces; it will not be described in detail here.
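As a short sketch: the gcc line below uses the real builtin (which maps to the _8 variant for 64-bit operands), while the commented tbox lines just follow the naming pattern described above and are an assumption:

    // gcc/clang: the same builtin works on 64-bit values
    volatile long long v64 = 0;
    long long o64 = __sync_fetch_and_add(&v64, 1);   // v64 == 1, o64 == 0

    // tbox style (assumed naming, per the pattern above):
    // tb_atomic64_t a64 = 0;
    // tb_hong_t     b64 = tb_atomic64_fetch_and_add(&a64, 1);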


Personal homepage: TBOOX open source project
