Content tagged stdio

Intro

Writing this blog became increasingly tedious over time. The reason for this was the slowness of the rendering tool I use - coleslaw. It seemed to work well for other people, though, so I decided to investigate what I am doing wrong. The problem came from the fact that the code coloring implementation (which I co-wrote) spawned a Python process every time it received a code block to handle. The coloring itself was fast. Starting and stopping Python every time was the cause of the issue. A solution for this malady is fairly simple. You keep the Python process running at all times and communicate with it via standard IO.

Surprisingly enough, I could not find an easy and convenient way to do it. The dominant paradigm of uiop:run-program seems to be spawn-process-close, and it does not allow for easy access to the actual streams. sb-ext:run-program does hand me the stream objects that I need, but it's not portable. While reading the code of uiop trying to figure out how to extract the stream objects from run-program, I accidentally discovered uiop:launch-program which does exactly what I need in a portable manner. It was implemented in asdf-3.1.7.39 released on Dec 1st, 2016 (a month and a half ago!). This post is meant as a piece of documentation that can be indexed by search engines to help spread my happy discovery. :)

Python

The Python code reads commands from standard input and writes the responses to standard output. Both, commands and response headers are followed by newlines and an optional payload.

The commands are:

  • exit - what it does is self-evident
  • pygmentize|len|lang[|opts]:
    • len is the length of the code snippet
    • lang is the language to colorize
    • optional parameter opts is the configuration of the HTML formatter
    • after the newline, len utf-8 characters of the code block need to follow

There's only one response: colorized|len, followed by a newline and len utf-8 characters of the colorized code as an HTML snippet.

Python's automatic inference of standard IO's encoding is still pretty messed up, even in Python 3. It's a good idea to create wrapper objects and interact only with them:

1 input  = io.TextIOWrapper(sys.stdin.buffer,  encoding='utf-8')
2 output = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

Printing diagnostic messages to standard error output is useful for debugging:

1 def eprint(*args, **kwargs):
2     print(*args, file=sys.stderr, **kwargs)

Lisp

OK, I have a python script that does the coloring. Before I can use it, I need to tell ASDF about it and locate where it is in the filesystem. The former is done by using the :static-file qualifier in the :components list. The latter is a bit more complicated. Since the file's location is known relative to the lisp file it will be used with, it's doable.

1 (defvar *pygmentize-path*
2   (merge-pathnames "pygmentize.py"
3                    #.(or *compile-file-truename* *load-truename*))
4   "Path to the pygmentize script")

The trick here is to use #. to execute the statement at read-time. You can see the full explanation here.

With that out of the way, I can start the renderer with:

1 (defmethod start-concrete-renderer ((renderer (eql :pygments)))
2   (setf *pygmentize-process* (uiop:launch-program
3                               (list *python-command*
4                                     (namestring *pygmentize-path*))
5                               :input :stream
6                               :output :stream)))

For debugging purposes, it's useful to add :error-output "/tmp/debug", so that the diagnostics do not get eaten up by /dev/null.

To stop the process, we send it the exit command, flush the stream, and wait until the process dies:

1 (defmethod stop-concrete-renderer ((renderer (eql :pygments)))
2   (write-line "exit" (process-info-input *pygmentize-process*))
3   (force-output  (process-info-input *pygmentize-process*))
4   (wait-process *pygmentize-process*))

The Lisp part of the colorizer sends the pygmentize command together with the code snippet to Python and receives the colorized HTML:

 1 (defun pygmentize-code (lang params code)
 2   (let ((proc-input (process-info-input *pygmentize-process*))
 3         (proc-output (process-info-output *pygmentize-process*)))
 4     (write-line (format nil "pygmentize|~a|~a~@[|~a~]"
 5                         (length code) lang params)
 6                 proc-input)
 7     (write-string code proc-input)
 8     (force-output proc-input)
 9     (let ((nchars (parse-integer
10                    (nth 1
11                         (split-sequence #\| (read-line proc-output))))))
12       (coerce (loop repeat nchars
13                  for x = (read-char proc-output)
14                  collect x)
15               'string))))

See the entire pull request here.

Stats

I was able to get down from well over a minute to less that three seconds with the time it takes to generate this blog.

]==> time ./coleslaw-old.x /path/to/blog/
./coleslaw-old.x /path/to/blog/  66.40s user 6.19s system 98% cpu 1:13.55 total
]==> time ./coleslaw-new-no-renderer.x /path/to/blog/
./coleslaw-new-no-renderer.x /path/to/blog/  65.50s user 6.03s system 98% cpu 1:12.53 total
]==> time ./coleslaw-new-renderer.x /path/to/blog/
./coleslaw-new-renderer.x /path/to/blog/  2.78s user 0.27s system 106% cpu 2.849 total
  • coleslaw-old.x is the original code
  • coleslaw-new-no-renderer.x starts and stops the renderer with every code snippet
  • coleslaw-new-renderer.x starts the renderer beforehand and stops it after all the job is done