Table of Contents
- Syscalls, memory, and your first thread
- The pointer to self and thread-local storage
- Futexes, mutexes, and memory synchronization
- Joining threads and dynamic initialization
- Cancellation
- Scheduling and task priority
- RW Locks
- Condition variables
- Final thoughts
Thread Scheduling
The scheduler decides which thread to run next based on two parameters: the scheduling policy and the priority.
Conceptually, the scheduler maintains a list of runnable threads for each possible sched_priority value. In order to determine which thread runs next, the scheduler looks for the nonempty list with the highest static priority and selects the thread at the head of this list.
A thread's scheduling policy determines where it will be inserted into the list of threads with equal static priority and how it will move inside this list. (the sched(7) man page)
The supported policies are:
- SCHED_NORMAL - the default Linux policy for threads not requiring any real-time machinery. All such threads have a priority of 0, and the decision of which thread runs next is based on the nice mechanism.
- SCHED_FIFO - threads have a priority from 1 (low) to 99 (high). When a SCHED_FIFO thread becomes runnable, it will preempt any thread of lower priority. There is no time slicing, so the high-priority thread runs for as long as it must.
- SCHED_RR - RR stands for round-robin. It is the same as SCHED_FIFO, except that each thread is allowed to run only for a limited time quantum.
See the sched(7) man page for more details.
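To see the priority range a given kernel accepts for each policy, you can simply query it. The snippet below uses the glibc wrappers instead of raw syscalls for brevity; note that SCHED_NORMAL is exposed to userspace as SCHED_OTHER:

#include <sched.h>
#include <stdio.h>

int main()
{
  /* SCHED_NORMAL is called SCHED_OTHER in the userspace headers. */
  printf("SCHED_OTHER: %d-%d\n", sched_get_priority_min(SCHED_OTHER),
         sched_get_priority_max(SCHED_OTHER));
  printf("SCHED_FIFO:  %d-%d\n", sched_get_priority_min(SCHED_FIFO),
         sched_get_priority_max(SCHED_FIFO));
  printf("SCHED_RR:    %d-%d\n", sched_get_priority_min(SCHED_RR),
         sched_get_priority_max(SCHED_RR));
  return 0;
}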
Setting the scheduling parameters of a kernel task boils down to invoking the sys_sched_setscheduler syscall, as shown below.
struct tb_sched_param
{
  int sched_priority;
};

int tb_set_sched(tbthread_t thread, int policy, int priority)
{
  struct tb_sched_param p;
  p.sched_priority = priority;
  int ret = SYSCALL3(__NR_sched_setscheduler, thread->tid, policy, &p);
  if(!ret)
    thread->sched_info = SCHED_INFO_PACK(policy, priority);
  return ret;
}
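A call site could look like the short sketch below. The raw syscall returns a negative errno value, and switching to a real-time policy normally requires CAP_SYS_NICE (or an adequate RLIMIT_RTPRIO), so -EPERM is the most likely failure mode:

int ret = tb_set_sched(thread, SCHED_FIFO, 42);
if(ret)
  return ret;  /* e.g. -EPERM when the caller lacks CAP_SYS_NICE */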
There is a bit more fun to it, though. As you can see, the kernel can only schedule a task that already exists, so we need a way to set the thread's priority after it has been spawned but before it invokes the user function; the reason is that we may need to abort the thread immediately should the scheduler setting fail. We do it with another futex that we wake once we know whether the thread is allowed to run or not:
if(th->start_status != TB_START_OK) {
  SYSCALL3(__NR_futex, &th->start_status, FUTEX_WAIT, TB_START_WAIT);
  if(th->start_status == TB_START_EXIT)
    SYSCALL1(__NR_exit, 0);
}
This futex is initialized in the tbthread_create function depending on the thread attributes:
if(!attr->sched_inherit)
  (*thread)->start_status = TB_START_WAIT;
else {
  tbthread_t self = tbthread_self();
  (*thread)->sched_info = self->sched_info;
}
And then set to either TB_START_OK or TB_START_EXIT after we spawn the thread but before tbthread_create exits:
if(!attr->sched_inherit) {
  ret = tb_set_sched(*thread, attr->sched_policy, attr->sched_priority);

  if(ret) (*thread)->start_status = TB_START_EXIT;
  else (*thread)->start_status = TB_START_OK;
  SYSCALL3(__NR_futex, &(*thread)->start_status, FUTEX_WAKE, 1);

  if(ret) {
    wait_for_thread(*thread);
    goto error;
  }
}
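Putting it together, creating a thread with explicit scheduling parameters could look roughly like this. The attribute initializer and the exact tbthread_create signature are assumed to mirror their pthread counterparts and may not match the actual thread-bites API:

/* Sketch only: attribute handling assumed to mirror pthreads. */
tbthread_attr_t attr;
tbthread_attr_init(&attr);        /* assumed initializer */
attr.sched_inherit  = 0;          /* do not inherit the creator's settings */
attr.sched_policy   = SCHED_RR;
attr.sched_priority = 20;

tbthread_t thread;
int ret = tbthread_create(&thread, &attr, worker, 0);  /* worker is the user function */
if(ret)
  return ret;  /* the new thread was aborted before reaching the user function */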
See the patch at GitHub.
Priority mutexes
Priority mutexes come in three varieties:
- TBTHREAD_PRIO_NONE - acquiring this type of mutex does not change the scheduling characteristics of the thread.
- TBTHREAD_PRIO_INHERIT - when the mutex owner blocks another thread with a higher priority, the owner inherits the priority of the thread it blocks if it is higher than its own.
- TBTHREAD_PRIO_PROTECT - acquiring this kind of mutex raises the priority of the owner to the prioceiling value of the mutex.
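These correspond one-to-one to the POSIX mutex protocols; for comparison, the same thing is expressed with the standard API like this:

#include <pthread.h>

pthread_mutexattr_t ma;
pthread_mutexattr_init(&ma);
pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_PROTECT);  /* or PTHREAD_PRIO_INHERIT, PTHREAD_PRIO_NONE */
pthread_mutexattr_setprioceiling(&ma, 50);                 /* only meaningful for PRIO_PROTECT */

pthread_mutex_t mutex;
pthread_mutex_init(&mutex, &ma);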
Thread Bites implements this functionality by keeping lists of PRIO_INHERIT and PRIO_PROTECT mutexes. It then calculates the highest possible priority, taking into account the priority of the mutexes and the priority set by the user.
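The recalculation itself boils down to taking a maximum. A minimal sketch of the idea, with the mutex priorities flattened into an array because the real list and field names are not shown here:

/* Illustrative only: the effective priority is the highest of the priority the
   user asked for and the priorities contributed by the mutexes the thread
   holds (ceilings for PRIO_PROTECT, inherited values for PRIO_INHERIT). */
static int tb_effective_priority(int user_priority, const int *mutex_prios, int count)
{
  int prio = user_priority;
  for(int i = 0; i < count; ++i)
    if(mutex_prios[i] > prio)
      prio = mutex_prios[i];
  return prio;
}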
The implementation of PRIO_PROTECT is relatively straightforward. Whenever a thread acquires this kind of mutex, it is added to the list, and the priority of the thread is recalculated:
static int lock_prio_protect(tbthread_mutex_t *mutex)
{
  lock_prio_none(mutex);
  tb_protect_mutex_sched(mutex);
  return 0;
}

static int unlock_prio_protect(tbthread_mutex_t *mutex)
{
  tbthread_t self = tbthread_self();
  tb_protect_mutex_unsched(mutex);
  unlock_prio_none(mutex);
  return 0;
}
Implementing PRIO_INHERIT is a lot trickier. We add the mutex to the appropriate list when a thread acquires it. Whenever a higher-priority thread tries to lock the mutex, it bumps the priority of the blocker, but the priority recalculation is done only at this point. Implementing it like this covers all the main cases and is not horrendously hard. It allows for simple recursion: if the owner of a mutex gets blocked, the blocker inherits the priority that comes with the first mutex. It also has a couple of drawbacks:
- It assumes that the kernel will always wake the highest priority thread. It makes sense and is most likely the case. However, I have not tested it.
- If the owner of a PRIO_INHERIT mutex is already blocked on another mutex of the same kind and its priority gets bumped later, the last thread in the line won't be affected.
static int lock_prio_inherit(tbthread_mutex_t *mutex)
{
  tbthread_t self = tbthread_self();

  while(1) {
    int locked = 0;
    tb_futex_lock(&mutex->internal_futex);
    if(mutex->futex == 0) {
      locked = 1;
      mutex->owner = self;
      mutex->futex = 1;
      tb_inherit_mutex_add(mutex);
    }
    else
      tb_inherit_mutex_sched(mutex, self);
    tb_futex_unlock(&mutex->internal_futex);
    if(locked)
      return 0;
    SYSCALL3(__NR_futex, &mutex->futex, FUTEX_WAIT, 1);
  }
}

static int unlock_prio_inherit(tbthread_mutex_t *mutex)
{
  tb_futex_lock(&mutex->internal_futex);
  tb_inherit_mutex_unsched(mutex);
  mutex->owner = 0;
  mutex->futex = 0;
  SYSCALL3(__NR_futex, &mutex->futex, FUTEX_WAKE, 1);
  tb_futex_unlock(&mutex->internal_futex);
  return 0;
}
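tb_inherit_mutex_sched, called on the contended path above, is where the priority bump happens. A minimal sketch of the idea, with the unpacking macros and the exact bookkeeping assumed rather than taken from the sources:

/* Sketch only: if the thread about to block outranks the current owner, raise
   the owner's priority to match. SCHED_INFO_PRIORITY/SCHED_INFO_POLICY are
   assumed counterparts of SCHED_INFO_PACK; the real code also has to remember
   the boost so it can be dropped again in tb_inherit_mutex_unsched. */
static void tb_inherit_mutex_sched(tbthread_mutex_t *mutex, tbthread_t waiter)
{
  int waiter_prio = SCHED_INFO_PRIORITY(waiter->sched_info);
  int owner_prio  = SCHED_INFO_PRIORITY(mutex->owner->sched_info);
  if(waiter_prio > owner_prio)
    tb_set_sched(mutex->owner, SCHED_INFO_POLICY(waiter->sched_info), waiter_prio);
}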
It was by far the most challenging part so far. See the patch at GitHub.
Remaining functions
- pthread_setconcurrency - defines how many kernel tasks should be created to handle the user-level threads. It does not make sense in our case because we create a kernel task for every thread.
- pthread_attr_setscope - defines the set of threads against which the thread will compete for resources. There are two settings: PTHREAD_SCOPE_SYSTEM, meaning all the threads in the entire system, and PTHREAD_SCOPE_PROCESS, meaning only the threads within the process. The man page says that Linux only supports PTHREAD_SCOPE_SYSTEM, but I am not sure whether that is still the case with all the cgroups stuff.
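For completeness, this is how the scope is selected through the POSIX API; on Linux, glibc rejects PTHREAD_SCOPE_PROCESS with ENOTSUP:

#include <pthread.h>

pthread_attr_t attr;
pthread_attr_init(&attr);

/* The only scope Linux supports; PTHREAD_SCOPE_PROCESS fails with ENOTSUP. */
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);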