Content from 2016-05

Intro

My medium-term goal is to port my Silly Invaders game to a Real Time Operating System. Zephyr seems to be a good choice. It's open source, operates under the auspices of the Linux Foundation and has an active community with many developers from Intel committing the code.

They, unfortunately, do not support Tiva so I will need to port the OS before I can proceed with the application. I decided to buy the Freescale K64F board, which is supported, to familiarize myself a little with Zephyr before I start the porting work. The howto page for setting up K64F seems to be terribly complicated and requires a JTAG programmer. I summarize here a simpler way using cmsis-dap over USB.

Setup

I updated the MBED interface firmware following the instructions on this site. I also build my own OpenOCD from the head of the master branch using the following configuration options:

./configure --prefix=/home/ljanyst/Apps/openocd  --enable-cmsis-dap

Things may work fine with the stock firmware and the stock OpenOCD as well, but I did not try that. It's also probably a good idea to add the following udev rule so that you don't have to run things as root:

]==> cat /etc/udev/rules.d/99-openocd.rules
# frdm-k64f
ATTRS{idVendor}=="0d28", ATTRS{idProduct}=="0204", GROUP="plugdev", MODE="0660"
]==> sudo udevadm control --reload-rules

Hello world!

I use the ARM cross-compiler provided by Debian to compile Zephyr and then just copy the resulting binary to the MBED disk:

]==> cd samples/hello_world/nanokernel
]==> make BOARD=frdm_k64f CROSS_COMPILE=arm-none-eabi- CFLAGS=-O0
]==> cp outdir/zephyr.bin /media/ljanyst/MBED/

You can see the effects in the UART console using screen:

]==> screen /dev/ttyACM0 115200,cs8
Hello World!

I then run OpenOCD using the following script:

]==> cat k64f.cfg
set CHIPNAME k60
source [find target/kx.cfg]

$_TARGETNAME configure -event gdb-attach {
  halt
}

]==> openocd -s /home/ljanyst/Apps/openocd/share/openocd/scripts/ -c "interface cmsis-dap" -f k64f.cfg

And GDB:

]==> cat remote1.conf
target extended-remote :3333
monitor reset init
break main
continue
]==> arm-none-eabi-gdb  --command=remote1.conf outdir/zephyr.elf
...
Breakpoint 1 at 0x129c: file /home/ljanyst/Projects/zephyr/zephyr-project/samples/hello_world/nanokernel/src/main.c, line 37.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, main () at /home/ljanyst/Projects/zephyr/zephyr-project/samples/hello_world/nanokernel/src/main.c:37
37              PRINT("Hello World!\n");
(gdb) s
printk (fmt=0x2c90 "Hello World!\n") at /home/ljanyst/Projects/zephyr/zephyr-project/misc/printk.c:164
164             va_start(ap, fmt);
(gdb) s
165             _vprintk(fmt, ap);
(gdb) s
_vprintk (fmt=0x2c90 "Hello World!\n", ap=...) at /home/ljanyst/Projects/zephyr/zephyr-project/misc/printk.c:75
75              int might_format = 0; /* 1 if encountered a '%' */
(gdb) where
#0  _vprintk (fmt=0x2c90 "Hello World!\n", ap=...) at /home/ljanyst/Projects/zephyr/zephyr-project/misc/printk.c:75
#1  0x00001b46 in printk (fmt=0x2c90 "Hello World!\n") at /home/ljanyst/Projects/zephyr/zephyr-project/misc/printk.c:165
#2  0x000012a2 in main () at /home/ljanyst/Projects/zephyr/zephyr-project/samples/hello_world/nanokernel/src/main.c:37

The problem

I have been playing with one of my toy projects and ended up having to shuffle text many times in a file that is around 10MB long. A naïve implementation chopping a string and gluing it together again ended up being painfully slow:

$$ O(n \cdot k) $$

where n is the length of the string and k is the number of shuffling operations.

A solution

It turns out that the problem can be efficiently solved using splay trees, where each node holds:

  • the starting offset in the source string
  • the length of the substring it represents
  • the offset from the beginning of the substring taking into account all of the children on the left-hand-side
  • total length of the substring taking into account all the children on the right-hand side

The find operation looks for a node containing the offset and splits it in two, if the node does not start with the offset.

 1 static Node *Find(uint32_t offset, Node *root)
 2 {
 3   if(!root)
 4     return 0;
 5   if(offset >= root->offset && offset < (root->offset + root->length))
 6     return root;
 7   if(offset < root->offset)
 8     return Find(offset, root->left);
 9   return Find(offset-root->offset-root->length, root->right);
10 }
11 
12 static Node *STFind(uint32_t offset, Node *root)
13 {
14   Node *n = Find(offset, root);
15   Splay(n);
16 
17   if(n->offset < offset) {
18     Node *newNode = new Node();
19     newNode->start  = n->start;
20     newNode->length = offset - n->offset;
21     n->start  = newNode->start + newNode->length;
22     n->length -= newNode->length;
23     newNode->left = n->left;
24     update(newNode);
25     n->left = newNode;
26     update(n);
27   }
28   return n;
29 }

The actual shuffling boils down to splitting and merging sub-trees and producing the result to traversing the whole tree in-order.

 1 void process(int i, int j, int k)
 2 {
 3   Node *n, *t, *left, *right = 0;
 4   STSplit(left, n, i, pRoot);
 5   if(j+1 < pS.length())
 6     STSplit(n, right, j+1-i, n);
 7   t = STMerge(left, right);
 8   if(k < (t->offset + t->total_length))
 9     STSplit(left, right, k, t);
10   else {
11     right = 0;
12     left = t;
13   }
14   pRoot = STMerge(left, n);
15   pRoot = STMerge(pRoot, right);
16 }

The amortized complexity is:

$$ O(k \cdot log(k) + n) $$

See rope-splay.cpp.

A test

It all runs pretty nicely for a test case with 10M characters and 100k shuffle operations:

]==> ./testgen 10000000 100000 > test
]==> time ./rope-naive < test > test-o-naive
./rope-naive < test > test-o-naive  386.17s user 66.19s system 99% cpu 7:32.86 total
]==> time ./rope-splay < test > test-o-splay
./rope-splay < test > test-o-splay  0.71s user 0.04s system 99% cpu 0.752 total
]==> diff test-o-splay test-o-naive

Table of Contents

  1. Compiling and start-up code
  2. Hardware Abstraction Layer and UART
  3. Debugging, display, heap and fonts
  4. Timers and ADC
  5. DAC, Sound and Nokia Tunes
  6. Random Number Generator, Rendering Engine and the Game
  7. Operating System

Random Number Generator

To make the game more engaging, we introduce some randomness into it. We don't need anything cryptographically secure, so a Linear Congruential Generator will do just fine. We count the time from the start-up in millisecond-long jiffies and wait for a first button press to select the seed.

 1 void button_event(IO_io *io, uint16_t event)
 2 {
 3   uint64_t btn;
 4   IO_get(io, &btn);
 5   if(btn)
 6     button_value = 1;
 7 
 8   if(!rng_initialized) {
 9     rng_initialized = 1;
10     IO_rng_seed(IO_time());
11   }
12 }

Rendering Engine

The rendering engine takes a scene descriptor, a display device, and a timer. Based on this information it computes new positions of objects, draws them on the screen if necessary and checks for collisions.

 1 struct SI_scene {
 2   SI_object **objects;
 3   void      (*pre_render)(struct SI_scene *);
 4   void      (*collision)(SI_object *obj1, SI_object *obj2);
 5   uint8_t     fps;
 6   uint8_t     num_objects;
 7   uint8_t     flags;
 8 };
 9 
10 void SI_scene_render(SI_scene *scene, IO_io *display, IO_io *timer);

Each SI_scene holds a list of "polymorphic" objects that should be rendered, a pointer to a pre_render function that calculates a new position of each object, and a pointer to a collision callback that is invoked when the scene renderer detects an overlap between two objects. The SI_scene_render function runs after every interrupt:

1 while(1) {
2     SI_scene_render(&scenes[current_scene].scene, &display, &scene_timer);
3     IO_wait_for_interrupt();
4   }

Whether it gets executed or not, depends on the flag parameter of the scene. If it's set to SI_SCENE_IGNORE, the renderer returns immediately. On the other hand, if it's set to SI_SCENE_RENDER, the renderer calls the pre_render callback, draws the objects on the screen, and computes the object overlaps notifying the collision callback if necessary. After each frame, the scene is disabled (SI_SCENE_IGNORE). It is re-enabled by the timer interrupt in a time quantum that depends on the fps parameter.

See SI_scene.h and SI_scene.c.

Each object has a draw function that enables the renderer to draw it on the screen. There are three types of objects: a generic object, a bitmap object, and a text object:

 1 struct SI_object {
 2   uint16_t x;
 3   uint16_t y;
 4   uint16_t width;
 5   uint16_t height;
 6   uint8_t  flags;
 7   uint8_t  user_flags;
 8   void (*draw)(struct SI_object *this, IO_io *display);
 9 };
10 
11 struct SI_object_bitmap {
12   SI_object        obj;
13   const IO_bitmap *bmp;
14 };
15 
16 struct SI_object_text {
17   SI_object      obj;
18   const char    *text;
19   const IO_font *font;
20 };

The object array in the scene is initialized with the SI_object pointers:

1 static SI_object         score_obj;
2 static SI_object_bitmap  invader_obj[5];
3 scene->objects[1] = &score_obj;
4 scene->objects[i+5] = &invader_obj[i].obj;

See SI_scene_game.c.

The renderer calls the draw function of each SI_OBJECT_VISIBLE object:

1 obj->draw(obj, display);

Finally, each draw method uses the CONTAINER_OF macro to compute the pointer to the actual object of concrete type:

 1 #define CONTAINER_OF(TYPE, MEMBER, MEMBER_ADDR) \
 2   ((TYPE *) ( (char *)MEMBER_ADDR - offsetof(TYPE, MEMBER)))
 3 
 4 void SI_object_bitmap_draw(SI_object *obj, IO_io *display)
 5 {
 6   SI_object_bitmap *this = CONTAINER_OF(SI_object_bitmap, obj, obj);
 7   IO_display_print_bitmap(display, obj->x, obj->y, this->bmp);
 8 }
 9 
10 void SI_object_text_draw(SI_object *obj, IO_io *display)
11 {
12   SI_object_text *this = CONTAINER_OF(SI_object_text, obj, obj);
13   IO_display_set_font(display, this->font);
14   IO_display_cursor_goto(display, obj->x, obj->y);
15   IO_print(display, "%s", this->text);
16 }

The Game

All this seems to work pretty well when put together:

The Game

See silly-invaders.c.