Table of Contents
- Syscalls, memory, and your first therad
- The pointer to self and thread-local storage
- Futexes, mutexes, and memory sychronization
- Joining threads and dynamic initialization
- Cancellation
- Scheduling and task priority
- RW Locks
- Condition variables
- Final thoughts
Intro
The second part of our threading journey covers the thread-local storage. I do
not mean here the compiler generated TLS that I mentioned in the first part. I
mean the stuff that involves pthread_setspecific
and friends. I don't think
it's the most critical part of all this threading business. I barely ever use it
in practice. However, we will need to refer to the current thread all over the
place, and this requires the TLS. It's better to deal with it once and for all,
especially that it's not particularly complicated.
Self
How does one store something in a fixed location and make it distinct for each
execution thread? The only two answers that come to my mind are either using
syscalls to associate something with kernel's task_struct
or using CPU
registers. The first approach requires context switches to retrieve the value,
so it's rather inefficient. The second option, though, should be pretty fast.
Conveniently, on x86_64, some registers are left unused (see
StackOverflow). In fact, the SETTLS
option to clone
takes a pointer
and puts it in the file segment register for us, so we don't even need to make
an extra syscall just for that.
Since fs
is a segment register, we cannot retrieve it's absolute value without
engaging the operating system. Linux on x86_64 uses the arch_prctl
syscall for
this purpose:
1#include "tb.h"
2#include <asm/prctl.h>
3
4tbthread_t tbthread_self()
5{
6 tbthread_t self;
7 SYSCALL2(__NR_arch_prctl, ARCH_GET_FS, &self);
8 return self;
9}
This syscall seems expensive (link) and making it defies the reason for using a register in the first place. We can read the memory at an address relative to the value of the register, though. Using this fact, we can make our thread struct point to itself and then retrieve this pointer using inline assembly. Here's how:
1tbthread_t tbthread_self()
2{
3 tbthread_t self;
4 asm("movq %%fs:0, %0\n\t" : "=r" (self));
5 return self;
6}
For the main thread, the linker initializes the TLS segment for pthreads
automatically using arch_prctl
. We cannot use it for our purposes, but we can
count on tbthread_self
returning a unique, meaningful value.
See the full patch here.
TLS
The actual TLS is handled by the following four functions: pthread_key_create
,
pthread_key_delete
, pthread_getspecific
, pthread_setspecific
. I will not
explain what they do because it should pretty self-evident. If it's not, see the
man pages.
In principle, we could just have a hash table in the tbthread
struct. We could
then use setspecific
and getspecific
to set and retrieve the value
associated with each given key. Calling setspecific
with a NULL pointer would
delete the key. Unfortunately, the designers of pthreads made it a bit more
complicated by having separate key_create
and key_delete
functions, with the
delete function invalidating the key in all the threads. Glibc uses a global
array of keys and sequence numbers in a clever way to solve this problem. We
will take almost the same approach in a less efficient but a bit clearer way.
We will have a global array representing keys. Each element of this array will
host a pointer to a destructor and a sequence number. The index in the array
will be the actual key passed around between the TLS functions. Both
key_create
and key_delete
will bump the sequence number associated with a
key in an atomic way. If the number is even, the key is not allocated, if the
number is odd, it is.
1static struct
2{
3 uint64_t seq;
4 void (*destructor)(void *);
5} keys[TBTHREAD_MAX_KEYS];
6
7#define KEY_UNUSED(k) ((keys[k].seq&1) == 0)
8#define KEY_ACQUIRE(k) (__sync_bool_compare_and_swap(&(keys[k].seq), keys[k].seq, keys[k].seq+1))
9#define KEY_RELEASE(k) (__sync_bool_compare_and_swap(&(keys[k].seq), keys[k].seq, keys[k].seq+1))
10
11int tbthread_key_create(tbthread_key_t *key, void (*destructor)(void *))
12{
13 for(int i = 0; i < TBTHREAD_MAX_KEYS; ++i) {
14 if(KEY_UNUSED(i) && KEY_ACQUIRE(i)) {
15 *key = i;
16 keys[i].destructor = destructor;
17 return 0;
18 }
19 }
20 return -ENOMEM;
21}
Each tbthread
struct will hold an array of the same size as the global array
of keys. Each element in this array will hold a data pointer and a sequence
number. Storing data will set the sequence number as well. Retrieving the data
will check whether the local and the global sequence number match before
proceeding.
1void *tbthread_getspecific(tbthread_key_t key)
2{
3 if(key >= TBTHREAD_MAX_KEYS || KEY_UNUSED(key))
4 return 0;
5
6 tbthread_t self = tbthread_self();
7 if(self->tls[key].seq == keys[key].seq)
8 return self->tls[key].data;
9 return 0;
10}
See the full patch here.
Atomics
KEY_ACQUIRE
and KEY_RELEASE
use gcc atomic builtins described here.