Content tagged python
Writing this blog became increasingly tedious over time. The reason for this was the slowness of the rendering tool I use - coleslaw. It seemed to work well for other people, though, so I decided to investigate what I am doing wrong. The problem came from the fact that the code coloring implementation (which I co-wrote) spawned a Python process every time it received a code block to handle. The coloring itself was fast. Starting and stopping Python every time was the cause of the issue. A solution for this malady is fairly simple. You keep the Python process running at all times and communicate with it via standard IO.
Surprisingly enough, I could not find an easy and convenient way to do it. The
dominant paradigm of
uiop:run-program seems to be spawn-process-close, and it
does not allow for easy access to the actual streams.
hand me the stream objects that I need, but it's not portable. While reading the
code of uiop trying to figure out how to extract the stream objects from
run-program, I accidentally discovered
uiop:launch-program which does
exactly what I need in a portable manner. It was implemented in asdf-18.104.22.168
released on Dec 1st, 2016 (a month and a half ago!). This post is meant as a
piece of documentation that can be indexed by search engines to help spread my
happy discovery. :)
The Python code reads commands from standard input and writes the responses to standard output. Both, commands and response headers are followed by newlines and an optional payload.
The commands are:
exit- what it does is self-evident
There's only one response:
colorized|len, followed by a newline and
utf-8 characters of the colorized code as an HTML snippet.
Python's automatic inference of standard IO's encoding is still pretty messed up, even in Python 3. It's a good idea to create wrapper objects and interact only with them:
1 input = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8') 2 output = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
Printing diagnostic messages to standard error output is useful for debugging:
1 def eprint(*args, **kwargs): 2 print(*args, file=sys.stderr, **kwargs)
OK, I have a python script that does the coloring. Before I can use it, I need
to tell ASDF about it and locate where it is in the filesystem. The former is
done by using the
:static-file qualifier in the
:components list. The latter
is a bit more complicated. Since the file's location is known relative to the
lisp file it will be used with, it's doable.
1 (defvar *pygmentize-path* 2 (merge-pathnames "pygmentize.py" 3 #.(or *compile-file-truename* *load-truename*)) 4 "Path to the pygmentize script")
The trick here is to use
#. to execute the statement at read-time. You can see
the full explanation here.
With that out of the way, I can start the renderer with:
1 (defmethod start-concrete-renderer ((renderer (eql :pygments))) 2 (setf *pygmentize-process* (uiop:launch-program 3 (list *python-command* 4 (namestring *pygmentize-path*)) 5 :input :stream 6 :output :stream)))
For debugging purposes, it's useful to add
:error-output "/tmp/debug", so that
the diagnostics do not get eaten up by
To stop the process, we send it the
exit command, flush the stream, and wait
until the process dies:
1 (defmethod stop-concrete-renderer ((renderer (eql :pygments))) 2 (write-line "exit" (process-info-input *pygmentize-process*)) 3 (force-output (process-info-input *pygmentize-process*)) 4 (wait-process *pygmentize-process*))
The Lisp part of the colorizer sends the
pygmentize command together with the
code snippet to Python and receives the colorized HTML:
1 (defun pygmentize-code (lang params code) 2 (let ((proc-input (process-info-input *pygmentize-process*)) 3 (proc-output (process-info-output *pygmentize-process*))) 4 (write-line (format nil "pygmentize|~a|~a~@[|~a~]" 5 (length code) lang params) 6 proc-input) 7 (write-string code proc-input) 8 (force-output proc-input) 9 (let ((nchars (parse-integer 10 (nth 1 11 (split-sequence #\| (read-line proc-output)))))) 12 (coerce (loop repeat nchars 13 for x = (read-char proc-output) 14 collect x) 15 'string))))
See the entire pull request here.
I was able to get down from well over a minute to less that three seconds with the time it takes to generate this blog.
]==> time ./coleslaw-old.x /path/to/blog/ ./coleslaw-old.x /path/to/blog/ 66.40s user 6.19s system 98% cpu 1:13.55 total ]==> time ./coleslaw-new-no-renderer.x /path/to/blog/ ./coleslaw-new-no-renderer.x /path/to/blog/ 65.50s user 6.03s system 98% cpu 1:12.53 total ]==> time ./coleslaw-new-renderer.x /path/to/blog/ ./coleslaw-new-renderer.x /path/to/blog/ 2.78s user 0.27s system 106% cpu 2.849 total
coleslaw-old.xis the original code
coleslaw-new-no-renderer.xstarts and stops the renderer with every code snippet
coleslaw-new-renderer.xstarts the renderer beforehand and stops it after all the job is done
Pretty interesting talk on how to prevent squirrels from stealing bird food using python and computer vision.
Steps to recognize a squirrel on a picture:
- Subtract background.
- Compute average value of each pixel over time to build a background profile.
- Detect blobs.
- Discriminate blobs. Distinguish between squirrels from other things. The
author used support vector machines.
- Blob size
- Color histograms
- Entropy detection (squirrel tail)
Other interesting stuff mentioned:
Why, oh why?
Using IMAP, of course. There's a couple of articles on the web describing how to do that, but they all have one problem. They tell you to copy your emails from the "All Mail" folder in the gmail account, which means that you will lose all your labels, and that's probably not acceptable for someone with thousands of conversation threads all placed in the boxes they belong to. I have written a couple of python scripts that should help solving this problem. You can get them from github. Remember: You use them at your own risk. Read on, if you want to know what they do.
Problem with labels
There are some issues with labels when accessing gmail through IMAP. Since the labels are mapped to the IMAP folders, you might, at first, expect that, when you open the folder and browse it, you will see all the emails in all the conversations bearing the label. This is not true. GMail operates and labels things at the level of conversations, IMAP, however, operates at the level of messages, so you will only see the messages that were part of the conversation when you assigned the label but not the new ones. This makes some sense if you use gmail with a normal IMAP client, ie. you don't see the same new messages arriving in many folders, an you avoid having the archived messages back in the IMAP inbox when somebody responds to a message in an archived conversation.
This is a real problem when you'd like to move your nicely sorted conversations out of gmail to some other service. There is a suggestion at the Google Product Forums on how to deal with it: you should go to the web browser, open the conversation, remove the label, and then re-apply it. It's not really a viable solution if you have more than a couple of conversations, fortunately Google provides an API that will enable us to do that auto-magically.
This is what the gmail_label_remap.py does, it finds every conversation and re-applies the labels. You should see something like this when you run it from your terminal:
]==> ./gmail_label_remap.py username (won't be echoed): password (won't be echoed): [i] Identify the IMAP folder name for "All Mail"... [i] Done: [Gmail]/Tous les messages [i] Opening [Gmail]/Tous les messages [i] [Gmail]/Tous les messages contains 15299 messages [i] Identify all the threads (may take some time) [i] Found 4182 threads [i] Processing thread 4182 of 4182 [i] Orphaned threads: 414 [i] Sent only threads: 255 [i] ALL DONE
It will create two extra labels:
- orphaned - for the conversations without labels
- sent_only - for the emails that you have sent but got no answer to them, so they have no assigned label
Moving emails to a different IMAP account
Now that all the label issues are sorted out, you need to copy the emails. You may use an IMAP client of your preference, or the imap_copy.py script. Using the script you may get the list of all your IMAP folders:
]==> ./imap_copy.py --list --source=imap.gmail.com:993 imap.gmail.com's username (won't be echoed): imap.gmail.com's password (won't be echoed): Fun (1641) INBOX (114) Private (484) Work (1010) [Gmail]/Brouillons (0) [Gmail]/Corbeille (563) [Gmail]/Important (5204) [Gmail]/Messages envoy&AOk-s (4440) [Gmail]/Spam (16) [Gmail]/Suivis (59) [Gmail]/Tous les messages (14736)
It will also let you copy the messages from one IMAP account to another:
]==> ./imap_copy.py --copy=INBOX.gmail-inbox,Fun \ --source=imap.gmail.com:993 \ --destination=imap.somewhere.else:993 imap.gmail.com's username (won't be echoed): imap.gmail.com's password (won't be echoed): imap.somewhere.else's username (won't be echoed): imap.somewhere.else's password (won't be echoed): Copying INBOX => gmail-inbox Copying 114 of 114 Copying Fun => Fun Copying 1641 of 1641 ALL DONE
What you specify after the 'equal' sign of the --copy parameter is a comma separated list of folders. It may just be a folder name, like Fun in the example above. Or something like: INBOX.gmail-inbox which will take the INBOX folder at the source and copy its contents to the gmail-inbox folder at the destination.
For the time being I will use KMail for e-mails and Psi for jabber, they are pretty decent, but I liked having everything (conversation history, settings, contacts) in one place on the web, so I will probably look for, or help developing, a solution that will give me all that and more. RoundCube looks promising.
Edit 28-04-2012: Continuation post has been added.