Content tagged lisp

Intro

Writing this blog became increasingly tedious over time. The reason was the slowness of the rendering tool I use - coleslaw. It seemed to work well for other people, though, so I decided to investigate what I was doing wrong. The problem came from the fact that the code coloring implementation (which I co-wrote) spawned a Python process every time it received a code block to handle. The coloring itself was fast; starting and stopping Python every time was the cause of the issue. The solution to this malady is fairly simple: you keep the Python process running at all times and communicate with it via standard IO.

Surprisingly enough, I could not find an easy and convenient way to do it. The dominant paradigm of uiop:run-program seems to be spawn-process-close, and it does not allow for easy access to the actual streams. sb-ext:run-program does hand me the stream objects that I need, but it's not portable. While reading the code of uiop trying to figure out how to extract the stream objects from run-program, I accidentally discovered uiop:launch-program which does exactly what I need in a portable manner. It was implemented in asdf-3.1.7.39 released on Dec 1st, 2016 (a month and a half ago!). This post is meant as a piece of documentation that can be indexed by search engines to help spread my happy discovery. :)

Python

The Python code reads commands from standard input and writes the responses to standard output. Both commands and response headers are followed by a newline and an optional payload.

The commands are:

  • exit - what it does is self-evident
  • pygmentize|len|lang[|opts]:
    • len is the length of the code snippet
    • lang is the language to colorize
    • optional parameter opts is the configuration of the HTML formatter
    • after the newline, len utf-8 characters of the code block need to follow

There's only one response: colorized|len, followed by a newline and len utf-8 characters of the colorized code as an HTML snippet.
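The protocol above can be sketched as a small read-and-dispatch loop. This is a minimal sketch under stated assumptions rather than the actual script: the function name `serve` and the `colorize` callback are hypothetical (the real script would call Pygments at that point), and the streams are assumed to be text streams, so lengths are counted in characters.

```python
import io


def serve(inp, out, colorize):
    """Answer pygmentize commands read from inp until 'exit' or end of input.

    colorize(code, lang, opts) is a hypothetical callback returning HTML;
    the real script would call Pygments at this point.
    """
    while True:
        header = inp.readline()
        if not header or header.strip() == "exit":
            break
        # Header layout: pygmentize|len|lang[|opts]
        fields = header.rstrip("\n").split("|")
        if fields[0] != "pygmentize" or len(fields) < 3:
            continue  # skip anything we do not understand
        length, lang = int(fields[1]), fields[2]
        opts = fields[3] if len(fields) > 3 else None
        code = inp.read(length)  # payload: exactly `length` characters
        html = colorize(code, lang, opts)
        out.write("colorized|{}\n".format(len(html)))
        out.write(html)
        out.flush()  # the client blocks until the whole reply arrives


# Example run with in-memory streams and a fake colorizer.
request = io.StringIO("pygmentize|5|lisp\n(+ 1)exit\n")
reply = io.StringIO()
serve(request, reply, lambda code, lang, opts: "<pre>" + code + "</pre>")
```

Passing the colorizer in as a callback keeps the framing logic testable without Pygments installed.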

Python's automatic inference of standard IO's encoding is still pretty messed up, even in Python 3. It's a good idea to create wrapper objects and interact only with them:

import io
import sys
input  = io.TextIOWrapper(sys.stdin.buffer,  encoding='utf-8')
output = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

Printing diagnostic messages to standard error output is useful for debugging:

def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)

Lisp

OK, I have a Python script that does the coloring. Before I can use it, I need to tell ASDF about it and find where it lives in the filesystem. The former is done by using the :static-file qualifier in the :components list. The latter is a bit more complicated, but since the file's location is known relative to the Lisp file it will be used with, it's doable.

(defvar *pygmentize-path*
  (merge-pathnames "pygmentize.py"
                   #.(or *compile-file-truename* *load-truename*))
  "Path to the pygmentize script")

The trick here is to use #. to evaluate the form at read time. You can see the full explanation here.

With that out of the way, I can start the renderer with:

(defmethod start-concrete-renderer ((renderer (eql :pygments)))
  (setf *pygmentize-process* (uiop:launch-program
                              (list *python-command*
                                    (namestring *pygmentize-path*))
                              :input :stream
                              :output :stream)))

For debugging purposes, it's useful to add :error-output "/tmp/debug", so that the diagnostics do not get eaten up by /dev/null.

To stop the process, we send it the exit command, flush the stream, and wait until the process dies:

(defmethod stop-concrete-renderer ((renderer (eql :pygments)))
  (write-line "exit" (process-info-input *pygmentize-process*))
  (force-output (process-info-input *pygmentize-process*))
  (wait-process *pygmentize-process*))

The Lisp part of the colorizer sends the pygmentize command together with the code snippet to Python and receives the colorized HTML:

(defun pygmentize-code (lang params code)
  (let ((proc-input (process-info-input *pygmentize-process*))
        (proc-output (process-info-output *pygmentize-process*)))
    (write-line (format nil "pygmentize|~a|~a~@[|~a~]"
                        (length code) lang params)
                proc-input)
    (write-string code proc-input)
    (force-output proc-input)
    (let ((nchars (parse-integer
                   (nth 1
                        (split-sequence #\| (read-line proc-output))))))
      (coerce (loop repeat nchars
                 for x = (read-char proc-output)
                 collect x)
              'string))))
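For trying the exchange out without Lisp, the same client side can be sketched in Python. This is only an illustration: the helper name `request` is made up, and the child process is a stand-in that wraps the payload in <pre> tags instead of calling Pygments. The point it shows is that one long-lived process answers several requests.

```python
import subprocess
import sys

# Stand-in child (an assumption for this demo): it answers every request by
# wrapping the payload in <pre> tags instead of calling Pygments.
CHILD = r'''
import io, sys
inp = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8")
out = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
while True:
    header = inp.readline().strip()
    if header in ("", "exit"):
        break
    length = int(header.split("|")[1])
    html = "<pre>" + inp.read(length) + "</pre>"
    out.write("colorized|%d\n%s" % (len(html), html))
    out.flush()
'''


def request(proc, lang, code):
    """Send one pygmentize command and return the colorized payload."""
    proc.stdin.write("pygmentize|{}|{}\n{}".format(len(code), lang, code))
    proc.stdin.flush()
    header = proc.stdout.readline()
    nchars = int(header.split("|")[1])
    return proc.stdout.read(nchars)  # reads characters, not bytes


# Start the process once, reuse it for several snippets, then shut it down.
proc = subprocess.Popen([sys.executable, "-c", CHILD],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        encoding="utf-8")
first = request(proc, "lisp", "(car '(λ))")
second = request(proc, "python", "print(1)")
proc.stdin.write("exit\n")
proc.stdin.flush()
proc.wait()
```

Both ends count characters rather than bytes, which is why both wrap their streams with an explicit utf-8 encoding.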

See the entire pull request here.

Stats

I was able to get the time it takes to generate this blog down from well over a minute to less than three seconds.

]==> time ./coleslaw-old.x /path/to/blog/
./coleslaw-old.x /path/to/blog/  66.40s user 6.19s system 98% cpu 1:13.55 total
]==> time ./coleslaw-new-no-renderer.x /path/to/blog/
./coleslaw-new-no-renderer.x /path/to/blog/  65.50s user 6.03s system 98% cpu 1:12.53 total
]==> time ./coleslaw-new-renderer.x /path/to/blog/
./coleslaw-new-renderer.x /path/to/blog/  2.78s user 0.27s system 106% cpu 2.849 total
  • coleslaw-old.x is the original code
  • coleslaw-new-no-renderer.x starts and stops the renderer with every code snippet
  • coleslaw-new-renderer.x starts the renderer beforehand and stops it after all the work is done
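To put a number on the improvement, the wall-clock totals from the timings above can be compared directly. The variable names are mine; the figures are copied from the `time` output.

```python
# Wall-clock totals copied from the `time` output above, in seconds.
old_total = 1 * 60 + 13.55   # coleslaw-old.x: 1:13.55
new_total = 2.849            # coleslaw-new-renderer.x

speedup = old_total / new_total
print("speedup: {:.1f}x".format(speedup))
```

That works out to a speedup of roughly 26 times for a full rebuild of the blog.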

"Within a couple weeks of learning Lisp I found programming in any other language unbearably constraining." - Paul Graham

The more I play with Lisp, the more I agree with the statement above. The truth is, however, that the barrier to entry is incredibly high for Lisp. This is probably due to a vicious circle: there are fairly few people programming in it in comparison to other languages, so there is relatively little documentation on the Internet; the lack of documentation, in turn, makes it hard for new people to start using Lisp. The effect of this is twofold. If you search for answers, they are likely not readily available, except in the code itself - this may be pretty frustrating and cause projects to move slowly. On the other hand, the people who make it past the obstacles are pretty good at what they do, so the community is great and the quality of the work they do can be mind-blowing.

But let's get back to the beginning. I want to start writing web apps in Common Lisp. After some research, it turned out that people recommend using one of the frameworks based on Clack for this purpose. I went to the web site, found an example and typed it into the REPL.

(ql:quickload :clack)

(clack:clackup
  (lambda (env)
    '(200
      (:content-type "text/plain")
      ("Hello, Clack!"))))

Within seconds I had Hunchentoot running and all was fine and dandy. It was, in fact, much easier than what I typically have to do when I am dealing with Python. Before I could go any further, though, I needed to figure out how to deploy this in production in my hosting environment, with Apache as the web server. And here's where the fun begins. :) Apparently, I can run Clack in FastCGI mode, so, in principle, I can just pass :server :fcgi to clackup and be all set. Not so, unfortunately. Clack's fastcgi module, by default, opens a TCP listening socket and expects the web server to connect to it. Apache, however, can communicate with FastCGI only via Unix domain sockets. This is not a problem, because I can pass an :fd parameter to clackup telling it to communicate over this descriptor instead of a TCP connection. All I need to do is figure out the path to the socket, open it, and pass the descriptor to Clack.

So, how do you figure out where Apache keeps its FastCGI Unix sockets? The documentation won't tell you, and even if it did, how would you know which one you should connect to? Let's look at the relevant part of the source of Apache's mod_fcgid:

        || (rv = apr_os_file_put(&file, &unix_socket, 0,
                                 procnode->proc_pool)) != APR_SUCCESS
        || (rv = apr_procattr_child_in_set(procattr, file, NULL)) != APR_SUCCESS) {
        ap_log_error(APLOG_MARK, APLOG_ERR, rv, procinfo->main_server,
                     "mod_fcgid: couldn't set child process attributes: %s",
                     unix_addr.sun_path);
        close(unix_socket);
        return rv;

It seems that Apache will spawn a child process, replace its stdin with the Unix socket and execve the child's executable. All I need to do, then, is pass :fd 0 to Clack to make things work. And indeed they do. After compiling the script below, I could see "Hello, Clack!" with Apache.
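The mechanism is easy to reproduce outside Apache. The sketch below is Unix-only and is not mod_fcgid's or Clack's code: a parent process plays the web server, binds a Unix domain socket and starts a child with that socket as its standard input. The child adopts descriptor 0 as a listening socket and answers one connection. The socket path and the message are made up for the demo.

```python
import os
import socket
import subprocess
import sys
import tempfile

# Child: treats file descriptor 0 as a listening socket, the way an
# application started by the web server would.
CHILD = r'''
import socket
listener = socket.socket(fileno=0)   # adopt the inherited descriptor
conn, _ = listener.accept()
conn.sendall(b"Hello from fd 0")
conn.close()
'''

# Parent: plays the role of the web server.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(1)

child = subprocess.Popen([sys.executable, "-c", CHILD], stdin=server.fileno())
server.close()  # the child keeps its own copy of the descriptor

# Connect like a web server forwarding a request and read the reply.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)
chunks = []
while True:
    data = client.recv(1024)
    if not data:
        break
    chunks.append(data)
reply = b"".join(chunks)
client.close()
child.wait()
os.unlink(path)
os.rmdir(tmpdir)
```

The child never learns the socket's path. It only needs to know that descriptor 0 is the socket, which is exactly what :fd 0 tells Clack.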

(ql:quickload 'clack :silent t)
(ql:quickload 'lack-middleware-backtrace :silent t)
(ql:quickload 'clack-handler-fcgi :silent t)

(defun main ()
  (clack:clackup
    (lambda (env)
      '(200
        (:content-type "text/plain")
        ("Hello, Clack!")))
   :server :fcgi
   :use-thread nil
   :fd 0))

(sb-ext:save-lisp-and-die "test.fcgi" :toplevel #'main :executable t)

This is a fairly trivial thing and it still took me around half an hour to figure out. I am a fairly experienced engineer, but this would likely be an impossible obstacle for someone rather new to programming. By comparison, you can write a web "Hello, World!" in Python or Ruby within minutes without much prior experience.

Intro

Following a piece of advice from a friend, I decided to buy this new domain name and start writing down all the cool things I do. I have written a bit in a bunch of other places before and have other quasi-failed blogs, so I actually already have a bit of content to bootstrap this one.

There are plenty of blogware options out there, but, as a programmer, I like the ones that keep the content in a version control system intended for software. From the alternatives available in this department, I decided to go for c()λeslaw. It's kind of similar to Jekyll, which I have used before, and it's written in Common Lisp, which is typically a good sign.

There is very little guidance on the Internet about how to use it, but it's not hard to figure out after reading the code. This post is a brief summary of what I have done to create this website and convert the content from Jekyll.

Site Structure

The first thing that you need to do is to create a .coleslawrc file describing the layout of the site, the theme to be used to render the final HTML, and other such things. There's a good example here and you can get the full picture by reading the source. :) I like to change the separator (:separator "---"), so that --- is used to distinguish the metadata from the content section in source files; this makes things look the Jekyll way. The static-pages plugin makes it possible to create content other than blog posts and indices.

Coleslaw will search the repo for files ending with .post (and .page if the static-pages plugin is enabled) and run them through the renderer selected in the page's metadata section. It will generate the indices automatically and copy verbatim everything it finds in the static directory.

You can create your own theme following the rules described here or choose something from the built-in options. I built the theme you see here more or less from scratch using Bootstrap and the live customizer to tweak the colors. It was a fairly easy and pleasant exercise.

In the end, the resulting directory structure looks roughly like this:

==> find
./.coleslawrc
./about.page
./pictures.page
./talks.page
./posts/0027-blogging-with-coleslaw.post
...
./static/pictures/pic_0001.jpg
...
./static/scripts/jquery.min.js
./static/images/profile.jpg
./static/images/favicon.png
...
./plugins/markdown.lisp
./plugins/preprocessor.lisp
./plugins/deploy-rsync.lisp
./themes/jany-st/base.tmpl
./themes/jany-st/index.tmpl
./themes/jany-st/post.tmpl
./themes/jany-st/js/bootstrap.min.js
./themes/jany-st/css/bootstrap.min.css
./themes/jany-st/css/custom.css
./themes/jany-st/css/syntax.css

The first few lines of the post you are reading right now look like this:

---
title: Blogging with Coleslaw
date: 2015-12-07
tags: blogging, lisp, programming, linux, sbcl
format: md
---

Intro
-----

Following a piece of advice from a friend, I decided to buy this new domain name
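To make the layout of a source file concrete, here is a small sketch of how such a header could be separated from the body. It is an illustration only and not coleslaw's parser; the function name `split_post` is hypothetical, and it assumes the "---" separator configured above.

```python
def split_post(text, separator="---"):
    """Split a post into (metadata dict, body) at the separator lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != separator:
        raise ValueError("post does not start with the separator")
    end = lines.index(separator, 1)  # the second separator closes the header
    metadata = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return metadata, body


post = """---
title: Blogging with Coleslaw
date: 2015-12-07
tags: blogging, lisp, programming, linux, sbcl
format: md
---

Intro
-----
"""
meta, body = split_post(post)
tags = [tag.strip() for tag in meta["tags"].split(",")]
```

The format field is what selects the renderer; here it is md, so the body goes through the markdown renderer.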

Patches

Coleslaw and the packages it depends on work pretty well to begin with, but I made a couple of improvements to make them fit my particular tastes better:

  1. Some themes and plugins are site specific and cannot be generalized. There is very little point in keeping them in the coleslaw source tree when they really belong with the site content. I submitted patches to make it possible to define themes and plugins in the content repo. See PR-98 and PR-101.
  2. I like to have the HTML files named in a certain way in the resulting web site, so it's convenient for me to be able to specify lambdas in .coleslawrc mapping the content metadata to file names. I made a pull request to allow that (PR-100), but Brit, the maintainer of coleslaw, has different ideas on how to approach this problem.
  3. I think Pygments has no real competition when it comes to coloring source code, so I made changes to 3bmd - the markdown rendering library used by coleslaw - allowing it to use Pygments. See PR-24.
  4. It's nice to be able to control how the rendered HTML tables look. In order to do that, you need to be able to specify the css class for the table. See PR-25.

Customization

3bmd makes it fairly easy to customize how the final HTML is rendered. For instance, you can change the resulting markup for images by defining an :around method on print-tagged-element. I want the images on this web site to have frames and captions, so I did this:

(defmethod print-tagged-element :around ((tag (eql :image)) stream rest)
  (setf rest (cdr (first rest)))
  (let ((fmt (concatenate 'string
                          "<div class=\"center-wrapper\">"
                          "  <div class=\"img\">"
                          "    <img src=\"~a\" ~@[alt=\"~a\"~] />"
                          "    <div class=\"img-caption\">~@[~a~]</div>"
                          "  </div>"
                          "</div>"))
        (caption (with-output-to-string (s)
                   (mapcar (lambda (a) (print-element a s))
                           (getf rest :label)))))
    (format stream
            fmt
            (getf rest :source)
            caption
            caption)))

Being able to use $config.domain and other variables in the markdown makes it possible to define relative paths to images and other resources. This comes in handy if you want to test the web site using different locations. In order to achieve this, you can define an :around method on render-text in the following way:

(defmethod render-text :around (text format)
  (let ((processed
         (reduce #'funcall
                 (list
                  #'process-embeds
                  (lambda (text)
                    (regex-replace-all "{\\\$config.domain}"
                                       text
                                       (domain *config*)))
                  (lambda (text)
                    (regex-replace-all "{\\\$config.repo-dir}"
                                       text
                                       (namestring (repo-dir *config*))))
                  text)
                 :from-end t)))
    (call-next-method processed format)))
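The pattern used here, a list of text-to-text functions applied right to left with the raw text as the seed, may be easier to see in isolation. The following is a rough Python analogue and not part of coleslaw: CONFIG, `substitute` and `preprocess` are hypothetical names, and the configuration values are made up.

```python
import re
from functools import reduce

# Hypothetical configuration values standing in for coleslaw's *config*.
CONFIG = {"domain": "https://example.com", "repo-dir": "/path/to/repo/"}


def substitute(name):
    """Return a function replacing {$config.NAME} with its configured value."""
    pattern = re.compile(re.escape("{$config." + name + "}"))
    # A function replacement avoids backslash processing in the value.
    return lambda text: pattern.sub(lambda _match: CONFIG[name], text)


# Applied right to left, mirroring (reduce #'funcall ... :from-end t).
steps = [substitute("domain"), substitute("repo-dir")]


def preprocess(text):
    """Run the text through every step, last step first."""
    return reduce(lambda acc, step: step(acc), reversed(steps), text)


result = preprocess("![logo]({$config.domain}/static/images/profile.jpg)")
```

Adding another preprocessing step then only means adding one more function to the list.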

Deployment

I use DreamHost for my web hosting and want to use sbcl as the lisp implementation. Unfortunately, all of my attempts to run sbcl there ended up with error messages like this one:

mmap: wanted 1040384 bytes at 0x20000000, actually mapped at 0x3cfc6467000
ensure_space: failed to validate 1040384 bytes at 0x20000000
(hint: Try "ulimit -a"; maybe you should increase memory limits.)

After some investigation, it turned out that DreamHost uses grsecurity kernel patches and, it looks like, their implementation of ASLR (Address Space Layout Randomization) does not respect the ADDR_NO_RANDOMIZE personality that is indeed set by sbcl at startup. They still allow the memory to be mapped at a specific location, which is a requirement for sbcl, if the MAP_FIXED flag is passed to mmap. The patch fixing this problem was a fairly simple one once I figured out what was going on. It looks like it will be included in sbcl 1.3.2. Until then, you will have to recompile the sources yourself.

Let's see if we get a speedup if we compile the code. The snippets below list the contents of col1.lisp and col2.lisp respectively:

;; col1.lisp
(require 'coleslaw)
(coleslaw:main "/path/to/repo/")
(uiop:quit)

;; col2.lisp
(require 'coleslaw)
(defun main () (coleslaw:main (nth 1 *posix-argv*)))
(sb-ext:save-lisp-and-die "coleslaw.x" :toplevel #'main :executable t)

And this is what you get:

]==> time sbcl --noinform --load col1.lisp
sbcl --load col2.lisp  6.39s user 1.05s system 97% cpu 7.609 total

]==> sbcl --noinform --load col2.lisp
[undoing binding stack and other enclosing state... done]
[saving current Lisp image into coleslaw.x:
writing 4944 bytes from the read-only space at 0x20000000
writing 3168 bytes from the static space at 0x20100000
writing 85229568 bytes from the dynamic space at 0x1000000000
done]

]==> time ./coleslaw.x /path/to/repo/
./coleslaw.x /path/to/repo/  3.37s user 0.74s system 95% cpu 4.304 total

]==> du -sh ./coleslaw.x
83M     ./coleslaw.x

The compiled code runs almost twice as fast, but the executable weighs 83M!

I wrote the following post-receive hook in order to have the site rendered automatically every time I push new content to the master branch of the repo.

CLONE_DIR=$(mktemp -d)

echo "Cloning the repository..."
# Abort the hook if the clone fails
git clone "$PWD" "$CLONE_DIR" > /dev/null || exit 1

# git feeds one "oldrev newrev refname" line per updated ref on stdin
while read oldrev newrev refname; do
  if [ "$refname" = "refs/heads/master" ]; then
    echo "Running coleslaw..."
    coleslaw.x "$CLONE_DIR/" > /dev/null
  fi
done

rm -rf "$CLONE_DIR"
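The loop in the hook relies on git feeding it one line per updated ref, with the old revision, the new revision and the ref name separated by spaces. A small sketch of that decision in Python, with a hypothetical function name and made-up revisions, looks like this:

```python
import io


def master_updated(stream, branch="refs/heads/master"):
    """Return True if any pushed ref on the hook's stdin is `branch`."""
    for line in stream:
        fields = line.split()
        # Each line is: <old revision> <new revision> <ref name>
        if len(fields) == 3 and fields[2] == branch:
            return True
    return False


# Simulated hook input: a push that updates two branches.
hook_input = io.StringIO(
    "1111111 2222222 refs/heads/drafts\n"
    "3333333 4444444 refs/heads/master\n"
)
render = master_updated(hook_input)
```

A push that only touches other branches leaves the site untouched, which is the behavior of the shell loop as well.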

Conclusion

Building this web site was quite an instructive experience, especially since it was my first non-toy project done in Common Lisp. It showed me how easy it is to use and hack on CL projects and how handy QuickLisp is. There are plenty of good libraries around and, if they have areas in which they are lacking, it's quite a bit of fun to fill the gaps. The library environment is definitely not as mature as that of Python or Ruby, so new users may find it difficult, but, overall, I think it's worth it to spend the time getting comfortable with Common Lisp. I finally feel emotionally prepared to go through Peter Norvig's Paradigms of Artificial Intelligence Programming. :)