Table of Contents
- Compiling and degubbing code for Tiva
- Hardware Abstraction Layer
- Debugging, heap, and display
- Analog Digital Converter and Timers
- Playing Nokia Tunes
- Random Number Generator, Rendering Engine, and the Game
- A real-time operating system
Intro
The game code up until this point abuses timers a lot. It has a timer to handle rendering and to refresh the display, and a timer to change notes of a tune. These tasks are not very time sensitive. A couple of milliseconds delay here or there is not going to be noticeable to users. The timer interrupts are more appropriate for things like maintaining a sound wave of the proper frequency. A slight delay here lowers the quality of the user experience significantly.
We could, of course, do even more complex time management to handle both the graphics and the sound in one loop, but that would be painful. It's much nicer to have a scheduling system that can alternate between multiple threads of execution. It is what I will describe in this post.
Thread Control Block and the Stack
Since there's usually only one CPU, the threads need to share it. The easiest
way to achieve time sharing is to have a fixed time slice at the end of which
the system will switch to another thread. The systick
interrupt perfect for
this purpose. Not only is it invoked periodically, but it can also by requested
manually by manipulating a register. This property will be useful in
implementation of sleeping and blocking.
But first things first: we need to have a structure that will describe a thread, a. k. a. a Thread Control Block:
1struct IO_sys_thread {
2 uint32_t *stack_ptr;
3 uint32_t flags;
4 void (*func)();
5 struct IO_sys_thread *next;
6 uint32_t sleep;
7 IO_sys_semaphore *blocker;
8 uint8_t priority;
9};
stack_ptr
- points to the top of the thread's stackflags
- properties describing the thread; we will need just one to indicate whether the thread used the floating-point coprocessorfunc
- thread's entry pointnext
- pointer to the next thread in the queue (used for scheduling)sleep
- number of milliseconds the thread still needs to sleepblocker
- a pointer to a semaphore blocking the thread (if any)priority
- thread's priority
When invoking an interrupt handler, the CPU saves most of the running state of the current thread to the stack. Therefore, the task of the interrupt handler boils down to switching the stack pointers. The CPU will then pop all the registers back from the new stack. This behavior means that we need to do some initialization first:
1void IO_sys_stack_init(IO_sys_thread *thread, void (*func)(void *), void *arg,
2 void *stack, uint32_t stack_size)
3{
4 uint32_t sp1 = (uint32_t)stack;
5 uint32_t sp2 = (uint32_t)stack;
6 sp2 += stack_size;
7 sp2 = (sp2 >> 3) << 3; // the stack base needs to be 8-aligned
8 if(sp1 % 4)
9 sp1 = ((sp1 >> 2) << 2) + 4; // make the end of the stack 4-aligned
10 stack_size = (sp2 - sp1) / 4; // new size in double words
11
12 uint32_t *sp = (uint32_t *)sp1;
13 sp[stack_size-1] = 0x01000000; // PSR with thumb bit
14 sp[stack_size-2] = (uint32_t)func; // program counter
15 sp[stack_size-3] = 0xffffffff; // link register
16 sp[stack_size-8] = (uint32_t)arg; // r0 - the argument
17 thread->stack_ptr = &sp[stack_size-16]; // top of the stack
18}
The ARM ABI requires that the top of the stack is 8-aligned and we will typically push and pop 4-byte words. The first part of the setup function makes sure that the stack boundaries are right. The second part sets the initial values of the registers. Have a look here for details.
- the
PSR
register needs to have the Thumb bit switched on - we put the startup function address to the program counter
- we put
0xffffffff
to the link register to avoid confusing stack traces in GDB r0
gets the argument to the startup function- an interrupt pushes 16 words worth of registers to the stack, so the initial value of the stack pointer needs to reflect that
This function is typically called as:
1IO_sys_stack_init(thread, thread_wrapper, thread, stack, stack_size);
Note that we do not call the user thread function directly. Rather we have a wrapper function that gets the TBC as its argument. It is because we need to remove the thread from the scheduling queue if the user-specified function returns.
The context switcher
Let's now have a look at the code that does the actual context switching. Since it needs to operate directly on the stack, it needs to be written in assembly. It is not very complicated, though. What it does is:
- pushing some registers to the stack
- storing the current stack pointer in the
stack_ptr
variable of the current TCB - calling the scheduler to select the next thread
- loading the stack pointer from the new thread's TCB
- popping some registers from the new stack
1#define OFF_STACK_PTR 0
2#define OFF_FLAGS 4
3#define FLAG_FPU 0x01
4
5 .thumb
6 .syntax unified
7
8 .global IO_sys_current
9 .global IO_sys_schedule
10
11 .text
12
13 .global systick_handler
14 .type systick_handler STT_FUNC
15 .thumb_func
16 .align 2
17systick_handler:
18 cpsid i ; disable interrupts
19 push {r4-r11} ; push r4-11
20 ldr r0, =IO_sys_current ; pointer to IO_sys_current to r1
21 ldr r1, [r0] ; r1 = OS_current
22
23 ubfx r2, lr, #4, #1 ; extract the fourth bit from the lr register
24 cbnz r2, .Lsave_stack ; no FPU context to save
25 vstmdb sp!, {s16-s31} ; push FPU registers, this triggers pushing of
26 ; s0-s15
27 ldr r2, [r1, #OFF_FLAGS] ; load the flags
28 orr r2, r2, #FLAG_FPU ; set the FPU context flag
29 str r2, [r1, #OFF_FLAGS] ; store the flags
30
31.Lsave_stack:
32 str sp, [r1, #OFF_STACK_PTR] ; store the stack pointer at *OS_current
33
34 push {r0, lr} ; calling c code, so store r0 and the link
35 ; register
36 bl IO_sys_schedule ; call the scheduler
37 pop {r0, lr} ; restore r0 and lr
38
39 ldr r1, [r0] ; load the new TCB pointer to r1
40 ldr sp, [r1, #OFF_STACK_PTR] ; get the stack pointer of the new thread
41
42 orr lr, lr, #0x10 ; clear the floating point flag in EXC_RETURN
43 ldr r2, [r1, #OFF_FLAGS] ; load the flags
44 tst r2, #0x01 ; see if we have the FPU context
45 beq .Lrestore_regs ; no FPU context
46 vldmia sp!, {s16-s31} ; pop the FPU registers
47 bic lr, lr, #0x10 ; set the floating point flag in EXC_RETURN
48
49.Lrestore_regs:
50 pop {r4-r11} ; restore regs r4-11
51 cpsie i ; enable interrupts
52 bx lr ; exit the interrupt, restore r0-r3, r12, lr, pc,
53 ; psr
The only complication here is that we sometimes need to store the floating point
registers in addition to the regular ones. It is, however, only necessary if the
thread used the FPU. The fourth bit of EXC_RETURN
, the value in the LR
register, indicates the status of the FPU. Go here and here for more
details. If the value of the bit is 0
, we need to save the high floating-point
registers to the stack and set the FPU flag in the TCB.
Also, after selecting the new thread, we check if its stack contains the FPU
registers by checking the FPU flag in its TCB. If it does, we pop these
registers and change EXC_RETURN
accordingly.
The Lazy Stacking is taken care of by simply pushing and popping the high registers - it counts as an FPU operation.
Semaphores, sleeping and idling
We can now run threads and switch between them, but it would be useful to be able to put threads to sleep and make them wait for events.
Sleeping is easy. We just need to set the sleep
field in the TCB of the
current thread and make the scheduler ignore threads whenever their sleep
field is not zero:
1void IO_sys_sleep(uint32_t time)
2{
3 IO_sys_current->sleep = time;
4 IO_sys_yield();
5}
The ISR that handles the system time can loop over all threads and decrement this counter every millisecond.
Waiting for a semaphore works in a similar way. We mark the current thread as blocked:
1void IO_sys_wait(IO_sys_semaphore *sem)
2{
3 IO_disable_interrupts();
4 --*sem;
5 if(*sem < 0) {
6 IO_sys_current->blocker = sem;
7 IO_sys_yield();
8 }
9 IO_enable_interrupts();
10}
The purpose of IO_sys_yield
is to indicate that the current thread does not
need to run anymore and force a context switch. The function resets the
systick
counter and forces the interrupt:
1void IO_sys_yield()
2{
3 STCURRENT_REG = 0; // clear the systick counter
4 INTCTRL_REG = 0x04000000; // trigger systick
5}
Waking a thread waiting for a semaphore is somewhat more complex:
1void IO_sys_signal(IO_sys_semaphore *sem)
2{
3 IO_disable_interrupts();
4 ++*sem;
5 if(*sem <= 0 && threads) {
6 IO_sys_thread *t;
7 for(t = threads; t->blocker != sem; t = t->next);
8 t->blocker = 0;
9 }
10 IO_enable_interrupts();
11}
If the value of the semaphore was negative, we find a thread that it was blocking and unblock it. It will make the scheduler consider this thread for running in the future.
None of the user-defined threads may be runnable at the time the scheduler makes its decision. All of them may be either sleeping or waiting for a semaphore. In that case, we need to keep the CPU occupied with something, i.e., we need a fake thread:
1static void iddle_thread_func(void *arg)
2{
3 (void)arg;
4 while(1) IO_wait_for_interrupt();
5}
Scheduler
The system maintains a circular linked list of TCBs called threads
. The job of
the scheduler is to loop over this list and select the next thread to run. It
places its selection in a global variable called IO_sys_current
so that other
functions may access it.
1void IO_sys_schedule()
2{
3 if(!threads) {
4 IO_sys_current = &iddle_thread;
5 return;
6 }
7
8 IO_sys_thread *stop = IO_sys_current->next;
9
10 if(IO_sys_current == &iddle_thread)
11 stop = threads;
12
13 IO_sys_thread *cur = stop;
14 IO_sys_thread *sel = 0;
15 int prio = 266;
16
17 do {
18 if(!cur->sleep && !cur->blocker && cur->priority < prio) {
19 sel = cur;
20 prio = sel->priority;
21 }
22 cur = cur->next;
23 }
24 while(cur != stop);
25
26 if(!sel)
27 sel = &iddle_thread;
28
29 IO_sys_current = sel;
30}
This scheduler is simple:
- whenever there is nothing to run, select the idle thread
- otherwise select the next highest priority thread that is neither sleeping nor blocked on a semaphore
Starting up the beast
So how do we get this whole business running? We need to invoke the scheduler that will preempt the current thread and select the next one to run. The problem is that we're running using the stack provided by the bootstrap code and don't have a TCB. Nothing prevents us from creating a dummy one, though. We can create it on the current stack (it's useful only once) and point it to the beginning of our real queue of TCBs:
1IO_sys_thread dummy;
2dummy.next = threads;
3IO_sys_current = &dummy;
We then set the systick
up:
1STCTRL_REG = 0; // turn off
2STCURRENT_REG = 0; // reset
3SYSPRI3_REG |= 0xE0000000; // priority 7
4STRELOAD_REG = time_slice-1; // reload value
And force its interrupt:
1IO_sys_yield();
2IO_enable_interrupts();
Tests
Tests 11 and 12 run a dummy calculation for some time and then return. After this happens, the system can only run the idle thread. If we plug-in the profiler code, we can observe the timings on a logic analyzer:
Test 13 is more complicated than the two previous ones. Three threads are running in a loop, sleeping, and signaling semaphores. Two more threads are waiting for these semaphores, changing some local variables and signaling other semaphores. Finally, there is the writer thread that blocks on the last set of semaphores and displays the current state of the environment. The output from the logic analyzer shows that the writer thread needs around 3.3 time-slices to refresh the screen:
Silly Invaders
How all this makes Silly Invaders better? The main advantage is that we don't need to calculate complex timings for multiple functions of the program. We create two threads, one for rendering of the scene and another one for playing the music tune. Each thread cares about its own timing. Everything else takes care of itself with good enough time guarantees.
1IO_sys_thread game_thread;
2void game_thread_func()
3{
4 while(1) {
5 SI_scene_render(&scenes[current_scene].scene, &display);
6 IO_sys_sleep(1000/scenes[current_scene].scene.fps);
7 }
8}
9
10IO_sys_thread sound_thread;
11void sound_thread_func()
12{
13 IO_sound_player_run(&sound_player);
14}
The threads are registered with the system in the main
function:
1IO_sys_thread_add(&game_thread, game_thread_func, 2000, 255);
2IO_sys_thread_add(&sound_thread, sound_thread_func, 1000, 255);
3IO_sys_run(1000);
For the complete code see:
Referrences
There is a great course on EdX called Realtime Bluetooth Networks explaining the topic in more details. I highly recommend it.
If you like this kind of content, you can subscribe to my newsletter, follow me on Twitter, or subscribe to my RSS channel.