Content from 2016-12

Table of Contents

  1. Compiling and start-up code
  2. Hardware Abstraction Layer and UART
  3. Debugging, display, heap and fonts
  4. Timers and ADC
  5. DAC, Sound and Nokia Tunes
  6. Random Number Generator, Rendering Engine and the Game
  7. Operating System


The game code up until this point abuses timers a lot. It has a timer to handle rendering and to refresh the display, and a timer to change notes of a tune. These tasks are not very time sensitive. A couple of milliseconds delay here or there is not going to be noticeable to users. The timer interrupts are more appropriate for things like maintaining a sound wave of the proper frequency. A slight delay here lowers the quality of the user experience significantly.

We could, of course, do even more complex time management to handle both the graphics and the sound in one loop, but that would be painful. It's much nicer to have a scheduling system that can alternate between multiple threads of execution. It is what I will describe in this post.

Thread Control Block and the Stack

Since there's usually only one CPU, the threads need to share it. The easiest way to achieve time sharing is to have a fixed time slice at the end of which the system will switch to another thread. The systick interrupt perfect for this purpose. Not only is it invoked periodically, but it can also by requested manually by manipulating a register. This property will be useful in implementation of sleeping and blocking.

But first things first: we need to have a structure that will describe a thread, a. k. a. a Thread Control Block:

1 struct IO_sys_thread {
2   uint32_t             *stack_ptr;
3   uint32_t              flags;
4   void (*func)();
5   struct IO_sys_thread *next;
6   uint32_t              sleep;
7   IO_sys_semaphore     *blocker;
8   uint8_t               priority;
9 };
  • stack_ptr - points to the top of the thread's stack
  • flags - properties describing the thread; we will need just one to indicate whether the thread used the floating-point coprocessor
  • func - thread's entry point
  • next - pointer to the next thread in the queue (used for scheduling)
  • sleep - number of milliseconds the thread still needs to sleep
  • blocker - a pointer to a semaphore blocking the thread (if any)
  • priority - thread's priority

When invoking an interrupt handler, the CPU saves most of the running state of the current thread to the stack. Therefore, the task of the interrupt handler boils down to switching the stack pointers. The CPU will then pop all the registers back from the new stack. This behavior means that we need to do some initialization first:

 1 void IO_sys_stack_init(IO_sys_thread *thread, void (*func)(void *), void *arg,
 2   void *stack, uint32_t stack_size)
 3 {
 4   uint32_t sp1 = (uint32_t)stack;
 5   uint32_t sp2 = (uint32_t)stack;
 6   sp2 += stack_size;
 7   sp2 = (sp2 >> 3) << 3;          // the stack base needs to be 8-aligned
 8   if(sp1 % 4)
 9     sp1 = ((sp1 >> 2) << 2) + 4;  // make the end of the stack 4-aligned
10   stack_size = (sp2 - sp1) / 4;   // new size in double words
12   uint32_t *sp = (uint32_t *)sp1;
13   sp[stack_size-1] = 0x01000000;          // PSR with thumb bit
14   sp[stack_size-2] = (uint32_t)func;      // program counter
15   sp[stack_size-3] = 0xffffffff;          // link register
16   sp[stack_size-8] = (uint32_t)arg;       // r0 - the argument
17   thread->stack_ptr = &sp[stack_size-16]; // top of the stack
18 }

The ARM ABI requires that the top of the stack is 8-aligned and we will typically push and pop 4-byte words. The first part of the setup function makes sure that the stack boundaries are right. The second part sets the initial values of the registers. Have a look here for details.

  • the PSR register needs to have the Thumb bit switched on
  • we put the startup function address to the program counter
  • we put 0xffffffff to the link register to avoid confusing stack traces in GDB
  • r0 gets the argument to the startup function
  • an interrupt pushes 16 words worth of registers to the stack, so the initial value of the stack pointer needs to reflect that

This function is typically called as:

1 IO_sys_stack_init(thread, thread_wrapper, thread, stack, stack_size);

Note that we do not call the user thread function directly. Rather we have a wrapper function that gets the TBC as its argument. It is because we need to remove the thread from the scheduling queue if the user-specified function returns.

The context switcher

Let's now have a look at the code that does the actual context switching. Since it needs to operate directly on the stack, it needs to be written in assembly. It is not very complicated, though. What it does is:

  • pushing some registers to the stack
  • storing the current stack pointer in the stack_ptr variable of the current TCB
  • calling the scheduler to select the next thread
  • loading the stack pointer from the new thread's TCB
  • popping some registers from the new stack
 1 #define OFF_STACK_PTR 0
 2 #define OFF_FLAGS     4
 3 #define FLAG_FPU      0x01
 5   .thumb
 6   .syntax unified
 8   .global IO_sys_current
 9   .global IO_sys_schedule
11   .text
13   .global systick_handler
14   .type systick_handler STT_FUNC
15   .thumb_func
16   .align  2
17 systick_handler:
18   cpsid i                     ; disable interrupts
19   push  {r4-r11}              ; push r4-11
20   ldr   r0, =IO_sys_current   ; pointer to IO_sys_current to r1
21   ldr   r1, [r0]              ; r1 = OS_current
23   ubfx  r2, lr, #4, #1        ; extract the fourth bit from the lr register
24   cbnz  r2, .Lsave_stack      ; no FPU context to save
25   vstmdb sp!, {s16-s31}       ; push FPU registers, this triggers pushing of
26                               ; s0-s15
27   ldr   r2, [r1, #OFF_FLAGS]  ; load the flags
28   orr   r2, r2, #FLAG_FPU     ; set the FPU context flag
29   str   r2, [r1, #OFF_FLAGS]  ; store the flags
31 .Lsave_stack:
32   str   sp, [r1, #OFF_STACK_PTR] ; store the stack pointer at *OS_current
34   push  {r0, lr}              ; calling c code, so store r0 and the link
35                               ; register
36   bl    IO_sys_schedule       ; call the scheduler
37   pop   {r0, lr}              ; restore r0 and lr
39   ldr   r1, [r0]              ; load the new TCB pointer to r1
40   ldr   sp, [r1, #OFF_STACK_PTR] ; get the stack pointer of the new thread
42   orr   lr, lr, #0x10         ; clear the floating point flag in EXC_RETURN
43   ldr   r2, [r1, #OFF_FLAGS]  ; load the flags
44   tst   r2, #0x01             ; see if we have the FPU context
45   beq   .Lrestore_regs        ; no FPU context
46   vldmia sp!, {s16-s31}       ; pop the FPU registers
47   bic   lr, lr, #0x10         ; set the floating point flag in EXC_RETURN
49 .Lrestore_regs:
50   pop   {r4-r11}              ; restore regs r4-11
51   cpsie i                     ; enable interrupts
52   bx    lr                    ;  exit the interrupt, restore r0-r3, r12, lr, pc,
53                               ; psr

The only complication here is that we sometimes need to store the floating point registers in addition to the regular ones. It is, however, only necessary if the thread used the FPU. The fourth bit of EXC_RETURN, the value in the LR register, indicates the status of the FPU. Go here and here for more details. If the value of the bit is 0, we need to save the high floating-point registers to the stack and set the FPU flag in the TCB.

Also, after selecting the new thread, we check if its stack contains the FPU registers by checking the FPU flag in its TCB. If it does, we pop these registers and change EXC_RETURN accordingly.

The Lazy Stacking is taken care of by simply pushing and popping the high registers - it counts as an FPU operation.

Semaphores, sleeping and idling

We can now run threads and switch between them, but it would be useful to be able to put threads to sleep and make them wait for events.

Sleeping is easy. We just need to set the sleep field in the TCB of the current thread and make the scheduler ignore threads whenever their sleep field is not zero:

1 void IO_sys_sleep(uint32_t time)
2 {
3   IO_sys_current->sleep = time;
4   IO_sys_yield();
5 }

The ISR that handles the system time can loop over all threads and decrement this counter every millisecond.

Waiting for a semaphore works in a similar way. We mark the current thread as blocked:

 1 void IO_sys_wait(IO_sys_semaphore *sem)
 2 {
 3   IO_disable_interrupts();
 4   --*sem;
 5   if(*sem < 0) {
 6     IO_sys_current->blocker = sem;
 7     IO_sys_yield();
 8   }
 9   IO_enable_interrupts();
10 }

The purpose of IO_sys_yield is to indicate that the current thread does not need to run anymore and force a context switch. The function resets the systick counter and forces the interrupt:

1 void IO_sys_yield()
2 {
3   STCURRENT_REG = 0;          // clear the systick counter
4   INTCTRL_REG   = 0x04000000; // trigger systick
5 }

Waking a thread waiting for a semaphore is somewhat more complex:

 1 void IO_sys_signal(IO_sys_semaphore *sem)
 2 {
 3   IO_disable_interrupts();
 4   ++*sem;
 5   if(*sem <= 0 && threads) {
 6     IO_sys_thread *t;
 7     for(t = threads; t->blocker != sem; t = t->next);
 8     t->blocker = 0;
 9   }
10   IO_enable_interrupts();
11 }

If the value of the semaphore was negative, we find a thread that it was blocking and unblock it. It will make the scheduler consider this thread for running in the future.

None of the user-defined threads may be runnable at the time the scheduler makes its decision. All of them may be either sleeping or waiting for a semaphore. In that case, we need to keep the CPU occupied with something, i.e., we need a fake thread:

1 static void iddle_thread_func(void *arg)
2 {
3   (void)arg;
4   while(1) IO_wait_for_interrupt();
5 }


The system maintains a circular linked list of TCBs called threads. The job of the scheduler is to loop over this list and select the next thread to run. It places its selection in a global variable called IO_sys_current so that other functions may access it.

 1 void IO_sys_schedule()
 2 {
 3   if(!threads) {
 4     IO_sys_current = &iddle_thread;
 5     return;
 6   }
 8   IO_sys_thread *stop = IO_sys_current->next;
10   if(IO_sys_current == &iddle_thread)
11     stop = threads;
13   IO_sys_thread *cur  = stop;
14   IO_sys_thread *sel  = 0;
15   int            prio = 266;
17   do {
18     if(!cur->sleep && !cur->blocker && cur->priority < prio) {
19       sel = cur;
20       prio = sel->priority;
21     }
22     cur = cur->next;
23   }
24   while(cur != stop);
26   if(!sel)
27     sel = &iddle_thread;
29   IO_sys_current = sel;
30 }

This scheduler is simple:

  • whenever there is nothing to run, select the idle thread
  • otherwise select the next highest priority thread that is neither sleeping nor blocked on a semaphore

Starting up the beast

So how do we get this whole business running? We need to invoke the scheduler that will preempt the current thread and select the next one to run. The problem is that we're running using the stack provided by the bootstrap code and don't have a TCB. Nothing prevents us from creating a dummy one, though. We can create it on the current stack (it's useful only once) and point it to the beginning of our real queue of TCBs:

1 IO_sys_thread dummy;
2 = threads;
3 IO_sys_current = &dummy;

We then set the systick up:

1 STCTRL_REG     = 0;            // turn off
2 STCURRENT_REG  = 0;            // reset
3 SYSPRI3_REG   |= 0xE0000000;   // priority 7
4 STRELOAD_REG   = time_slice-1; // reload value

And force its interrupt:

1 IO_sys_yield();
2 IO_enable_interrupts();


Tests 11 and 12 run a dummy calculation for some time and then return. After this happens, the system can only run the idle thread. If we plug-in the profiler code, we can observe the timings on a logic analyzer:

Test #11
Test #11

Test 13 is more complicated than the two previous ones. Three threads are running in a loop, sleeping, and signaling semaphores. Two more threads are waiting for these semaphores, changing some local variables and signaling other semaphores. Finally, there is the writer thread that blocks on the last set of semaphores and displays the current state of the environment. The output from the logic analyzer shows that the writer thread needs around 3.3 time-slices to refresh the screen:

Test #13
Test #13

Silly Invaders

How all this makes Silly Invaders better? The main advantage is that we don't need to calculate complex timings for multiple functions of the program. We create two threads, one for rendering of the scene and another one for playing the music tune. Each thread cares about its own timing. Everything else takes care of itself with good enough time guarantees.

 1 IO_sys_thread game_thread;
 2 void game_thread_func()
 3 {
 4   while(1) {
 5     SI_scene_render(&scenes[current_scene].scene, &display);
 6     IO_sys_sleep(1000/scenes[current_scene].scene.fps);
 7   }
 8 }
10 IO_sys_thread sound_thread;
11 void sound_thread_func()
12 {
13   IO_sound_player_run(&sound_player);
14 }

The threads are registered with the system in the main function:

1 IO_sys_thread_add(&game_thread,  game_thread_func,  2000, 255);
2 IO_sys_thread_add(&sound_thread, sound_thread_func, 1000, 255);
3 IO_sys_run(1000);

For the complete code see:


There is a great course on EdX called Realtime Bluetooth Networks explaining the topic in more details. I highly recommend it.