Content tagged pygments
Writing this blog became increasingly tedious over time. The reason for this was the slowness of the rendering tool I use - coleslaw. It seemed to work well for other people, though, so I decided to investigate what I am doing wrong. The problem came from the fact that the code coloring implementation (which I co-wrote) spawned a Python process every time it received a code block to handle. The coloring itself was fast. Starting and stopping Python every time was the cause of the issue. A solution for this malady is fairly simple. You keep the Python process running at all times and communicate with it via standard IO.
Surprisingly enough, I could not find an easy and convenient way to do it. The
dominant paradigm of
uiop:run-program seems to be spawn-process-close, and it
does not allow for easy access to the actual streams.
hand me the stream objects that I need, but it's not portable. While reading the
code of uiop trying to figure out how to extract the stream objects from
run-program, I accidentally discovered
uiop:launch-program which does
exactly what I need in a portable manner. It was implemented in asdf-184.108.40.206
released on Dec 1st, 2016 (a month and a half ago!). This post is meant as a
piece of documentation that can be indexed by search engines to help spread my
happy discovery. :)
The Python code reads commands from standard input and writes the responses to standard output. Both, commands and response headers are followed by newlines and an optional payload.
The commands are:
exit- what it does is self-evident
There's only one response:
colorized|len, followed by a newline and
utf-8 characters of the colorized code as an HTML snippet.
Python's automatic inference of standard IO's encoding is still pretty messed up, even in Python 3. It's a good idea to create wrapper objects and interact only with them:
1 input = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8') 2 output = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
Printing diagnostic messages to standard error output is useful for debugging:
1 def eprint(*args, **kwargs): 2 print(*args, file=sys.stderr, **kwargs)
OK, I have a python script that does the coloring. Before I can use it, I need
to tell ASDF about it and locate where it is in the filesystem. The former is
done by using the
:static-file qualifier in the
:components list. The latter
is a bit more complicated. Since the file's location is known relative to the
lisp file it will be used with, it's doable.
1 (defvar *pygmentize-path* 2 (merge-pathnames "pygmentize.py" 3 #.(or *compile-file-truename* *load-truename*)) 4 "Path to the pygmentize script")
The trick here is to use
#. to execute the statement at read-time. You can see
the full explanation here.
With that out of the way, I can start the renderer with:
1 (defmethod start-concrete-renderer ((renderer (eql :pygments))) 2 (setf *pygmentize-process* (uiop:launch-program 3 (list *python-command* 4 (namestring *pygmentize-path*)) 5 :input :stream 6 :output :stream)))
For debugging purposes, it's useful to add
:error-output "/tmp/debug", so that
the diagnostics do not get eaten up by
To stop the process, we send it the
exit command, flush the stream, and wait
until the process dies:
1 (defmethod stop-concrete-renderer ((renderer (eql :pygments))) 2 (write-line "exit" (process-info-input *pygmentize-process*)) 3 (force-output (process-info-input *pygmentize-process*)) 4 (wait-process *pygmentize-process*))
The Lisp part of the colorizer sends the
pygmentize command together with the
code snippet to Python and receives the colorized HTML:
1 (defun pygmentize-code (lang params code) 2 (let ((proc-input (process-info-input *pygmentize-process*)) 3 (proc-output (process-info-output *pygmentize-process*))) 4 (write-line (format nil "pygmentize|~a|~a~@[|~a~]" 5 (length code) lang params) 6 proc-input) 7 (write-string code proc-input) 8 (force-output proc-input) 9 (let ((nchars (parse-integer 10 (nth 1 11 (split-sequence #\| (read-line proc-output)))))) 12 (coerce (loop repeat nchars 13 for x = (read-char proc-output) 14 collect x) 15 'string))))
See the entire pull request here.
I was able to get down from well over a minute to less that three seconds with the time it takes to generate this blog.
]==> time ./coleslaw-old.x /path/to/blog/ ./coleslaw-old.x /path/to/blog/ 66.40s user 6.19s system 98% cpu 1:13.55 total ]==> time ./coleslaw-new-no-renderer.x /path/to/blog/ ./coleslaw-new-no-renderer.x /path/to/blog/ 65.50s user 6.03s system 98% cpu 1:12.53 total ]==> time ./coleslaw-new-renderer.x /path/to/blog/ ./coleslaw-new-renderer.x /path/to/blog/ 2.78s user 0.27s system 106% cpu 2.849 total
coleslaw-old.xis the original code
coleslaw-new-no-renderer.xstarts and stops the renderer with every code snippet
coleslaw-new-renderer.xstarts the renderer beforehand and stops it after all the job is done