Safety And The Lisp Read Function

For all my blathering about security, here’s a potential exploit in Common Lisp that I never thought about. William Halliburton has a nice explanation of Lisp’s read function and the assoicated read-macro #.. When read sees the sequence #. it will evaluate the next Lisp object that it reads. For example, given the following inputs to read the corresponding object is returned as the result of the read:

(+ 1 (/ 8 4)) → (+ 1 (/ 8 4))
#.(+ 1 (/ 8 4)) → 3
(+ 1 #.(/ 8 4)) → (+ 1 2)

That’s pretty neat but as Halliburton points out it also poses a security threat because a user can use it to execute arbitrary code in a Lisp program that naively reads data from the user.

Follow the link for a more complete explanation of the #. read-macro, an example of using it for an exploit, and a simple safe-read function that prevents the problem.

If you’d like to play around with read and #. to see how they act and what they return, here’s a function that acts as a REPL without the evaluation that you can use.

(defun rpl ()
  "A REPL without the evaluate. Type (quit) to exit."
  (princ "rpl> ")
  (let ((obj (read)))
    (prin1 obj)
    (terpri)
    (unless (equal obj '(quit))
      (rpl))))
Posted in Programming | Tagged | 1 Comment

Blogging With Emacs—Another Method

Today, someone on Hacker News pointed to an old post on Blogging With Emacs over at Work In Progress. Since some of my readers had expressed an interest in how I use Emacs and Org-mode to blog, I thought there might be some interest in seeing how someone else does the same thing.

The blog’s author takes a very different approach to blogging with Emacs than I do. First of all he uses Jekyll to generate a static site rather than using WordPress. That means that he doesn’t have the wonderful org2blog at his disposal but he solves that problem by using Org mode’s project publishing facility. I’ve never used that part of Org mode but it seems to be pretty nice. Bernt Hansen has a nice writeup on Publishing and Exporting with Org mode that you should read if you want to use this facility.

The Work In Progress post is interesting and shows how easily Emacs can handle different work flows to solve the same problem. My only quibble with the post is in the first sentence where he says, “I don’t want to waste time explaining why I use Emacs all day, every day, and eschew more traditional development environments.” I would argue that Emacs is the traditional editor—or at least that Emacs shares that honor with Vi. Yes, yes, I know all about ed but let’s be serious.

By the way, DJCB over at EMACS-FU also has an old post on using Org mode to blog, this time with Blogger. As always, there’s a lot to learn at EMACS-FU.

Posted in Blogging | Tagged , | Leave a comment

Working With Matching Lines In Emacs

This is another note to myself. For some reason I have a hard time remembering the handful of commands that do things to the lines in a buffer that match a regular expression. The table below are the ones that I know about but have a hard time remembering. If you have any similar commands that you find useful, leave a comment.

Command Alias Action
keep-lines delete-non-matching-lines Delete lines not matching regexp
flush-lines delete-matching-lines Delete lines matching regexp
how-many count-matches Count lines matching regexp
occur list-matching-lines Show lines matching regexp in another buffer
highlight-lines-matching-regexp hi-lock-line-face-buffer Highlight lines matching regexp

Update Phil reminds me of highlight-lines-matching-regexp, which I meant to include but forgot. (See what I mean?) He also mentions multi-occur (just like occur but it acts on multiple buffers) and multi-occur-in-matching-buffers (like multi-occur but you specify the buffers with a regexp), which he says he finds particularly useful.

Posted in General | Tagged | 8 Comments

The Popularity Of Programming Languages

TIOBE Software has published its list of programming language popularity for January 2012 and awarded Objective-C the TIOBE Programming Language Award for 2011. The award is given for the language with the most growth in market share for the year. Objective-C’s growth is driven, of course, by the popularity of the iPhone and iPad and their App Store infrastructure.

There aren’t a lot of surprises. Java is in first place (but losing share) and C is second and gaining share. Surprisingly, Python lost a significant amount of share and dropped from 5th place last year to 8th this year. Lisp has positive growth and maintained its 13th place from last year.

I don’t take the list too seriously but it does, as TIOBE says, give programmers a view of what languages are “important” at the moment and an indication of whether or not their skills need updating. Other than that the list is interesting on its own terms. Who would have thought, for instance, that Objective Pascal is in 11th place ahead of Ruby in 12th?

Posted in Programming | Leave a comment

The Emacs count-lines-region Command

I’ve been working on a Lisp program that generates combinations of certain objects. During development, I like to print them out to see what, exactly, is getting generated. Often, I’ll realize that I also need to know how many objects got generated. Because I’m using slime, all the output is going into an Emacs buffer so I have all the normal Emacs tools available.

I didn’t know a command for counting lines but I was pretty sure there would be one and after a quick trip to the built-in documentation I found count-lines-region, which was just what I needed. All I had to do was mark the output and call count-lines-region. Happily, there’s even a shortcut for it so I can just mark the region and type 【Meta+=】 to get a line count of the output.

Posted in General | Tagged | 3 Comments

Common Lisp Pitfalls

Here is a list of Common Lisp pitfalls compiled by Jeff Dalton. They were originally posted to comp.lang.lisp in 1995. If, like me, you’re a Lisp programmer but you don’t write in Common Lisp everyday, it’s probably a good idea to read over this list now and then to remind yourself of the gotchas that are lurking in Common Lisp.

Posted in Programming | Tagged | Leave a comment

Writing Log Files In JSON

Readers of my series on writing log files as Lisp code (1, 2, 3, 4, 5, 6, 7) may enjoy Paul Querna’s post over at Paul’s Journal entitled Write Logs for Machines, use JSON. In it he says that it’s time give up the traditional printf style log entries that are meant primarily for humans and to write them instead in JSON so that they can be processed by machines.

I agree that that idea has a lot of merit. I think you could make a good case that JSON makes more sense than s-expressions because there are tools to work with them in virtually every language whereas sexprs pretty much require some sort of Lisp language to handle well.

Querna’s post is well worth some of your time if you’re involved with writing or processing log files. He goes over some of the tools available to help with the processing and talks about some of his home-grown ones.

Posted in Programming | Leave a comment

Lexical Scoping In Emacs Lisp

I’ve said before that Emacs Lisp is pretty much like other Lisps but one area where that is not true is in variable scoping. Common Lisp and Scheme both enjoy lexical scoping whereas Emacs Lisp has dynamic scoping. Until you actually use lexical scoping and see what it can do, this seems like an esoteric point that most programmers needn’t worry about. Emacs 24 introduces support for lexical scoping so Elisp programmers can finally make use of this powerful mechanism.

Yoo Box has a long post on lexical scoping and dynamic scoping in Emacs Lisp. It turns out that you can enable lexical scoping for an Elisp file by adding

;;; -*- lexical-binding: t -*-

to the top of the file. If you’re already familiar with lexical scoping that’s just about everything you need to know. If not, then Box has an excellent and detailed explanation of the differences between lexical and dynamic scoping with plenty of examples. The examples are short and seem simple but they require careful study to get their full value.

I really recommend Box’s post if lexical scoping is new to you. A careful reading will pay dividends. For the rest of us, it’s really great that Emacs is finally supporting lexical scoping. It’s still not Common Lisp but it’s a huge improvement.

Posted in Programming | Tagged , | 6 Comments

A Puzzle

After stumbling onto Justin Heyes-Jones’ compile-command tip that I wrote about previously, I decided to trawl through JustinHJ’s Coding Blog to see what other goodies I could find. One interesting post, Word numbers programming puzzle, discusses one of the ITA candidate puzzles from before they were acquired by Google. These were non-trivial programming challenges that candidates were to submit with their applications.

This particular challenge asks you to generate all the numbers from 1 to 999,999,999 as English words, sort them, and then find the 50,000,000,000 character excluding spaces, commas, dashes, and the word “and”. When Justin tried the obvious solution, he quickly ran out of heap space and even smaller problems took considerable run time. That’s not surprising. Even from the statement of the problem we know that the resulting sorted text exceeds 50 gigabytes, far more memory than anyone reading this blog is apt to have available.

It’s pretty clear that to solve the problem you have to generate each of the numbers in alphabetical order and feed them to the counting routine one at a time. My idea was to use 3 lists of 999 entries each. One list contained the numbers one to ninehundredninetynine in alphabetical order. The seconds list was generated from the first by appending “thousand” to each entry and resorting it. The third list was like the second but with “million” appended.

Most of the numbers are of the form: XmillionYthousandZ so it’s tempting to write something like

(dolist (m third-list)
  (dolist (t second-list)
    (dolist (o first-list)
      (count (concatenate 'string m t o)))))

but unfortunately a small number (2,997,000) of the words don’t follow that pattern. You’ve got numbers like ten and fourmillionthree, so you’ve got to generate them separately and in the right order. After some careful thought (and false starts) I realized that these numbers occur in two ways: without a “million” prefix and after a “million” prefix but without any “thousand” term. The numbers 1 through 999,999 are the first case and the numbers 1 through 999 are the second. To deal with the first case I made a new list by merging the first and second lists. The second case is handled by reusing the first list.

With that introduction, here is the code

(defparameter *word-count* 0)       ;count of generated number words
(defparameter *ones* nil)           ;number words 1..999 sorted
(defparameter *thousands* nil)      ;number words 1..999 thousand sorted
(defparameter *millions* nil)       ;number words 1..999 million sorted
(defparameter *prefixless* nil)     ;number words with no million prefix

(defun make-char-counter (goal)
  "Report the goalth character of the generated number words."
  (let ((cnt 0))
    (lambda (p pl w)
      (incf cnt (+ pl (cdr w)))
      (when (>= cnt goal)
        (let* ((word (concatenate 'string p (car w)))
               (word-len (length word))
               (goal-char (+ (- word-len cnt 1) goal)))
          (format t "The goal character is ~c~%"
                  (subseq word goal-char (1+ goal-char)))
          (throw 'done t))))))

(defun run-prefixless (action)
  "Generate a prefixless word."
  (if (numberp (cdar *prefixless*))
      (funcall action "" 0 (pop *prefixless*))
      (run-single-thousand action (car (pop *prefixless*)))))

(defun run-ones (action prefix)
  "Add the 1..999 suffixes in alphabetical order."
  (let ((prefix-len (length prefix)))
    (dolist (w *ones*)
      (funcall action prefix prefix-len w))))

(defun run-single-thousand (action prefix)
  "Generate all the words for Xthousand..Xthousandninehundredninetynine."
  (funcall action prefix (length prefix) '("" . 0))
  (run-ones action prefix))

(defun run-thousands (action prefix thousands ones)
  "Generate all the thousands for a particular million."
  (unless (null thousands)
    (while (and ones (string-lessp (caar ones) (car thousands)))
      (funcall action prefix (length prefix) (pop ones)))
    (let ((prefix (concatenate 'string prefix (car thousands))))
      (run-single-thousand action prefix))
    (run-thousands action prefix (cdr thousands) ones)))

(defun run-millions (action millions)
  "Generate all the number words in alphabetical order."
  (unless (null millions)
    (let ((prefix (car millions)))
      (while (and *prefixless* (string-lessp (caar *prefixless*) prefix))
        (run-prefixless action))
      (funcall action prefix (length prefix) '("" . 0))
      (run-thousands action prefix *thousands* *ones*))
    (run-millions action (cdr millions)))
  (while *prefixless*
    (run-prefixless action)))

(defun generate-words (&optional (i 1) (w nil))
  "Generate one..ninehundredninetynine in alphabetical order."
  (if (< i 1000)
      (generate-words
       (1+ i)
       (cons (remove-if-not #'alpha-char-p (format nil "~R" i)) w))
      (sort w #'string-lessp)))

(defun generate (action)
  "Initialize globals and call run-millions to generate the number words."
  (let ((words (generate-words)))
    (setq *ones* (mapcar (lambda (w) (cons w (length w))) words))
    (setq *thousands* (sort (mapcar
                             (lambda (w) (concatenate 'string w "thousand"))
                             words) #'string-lessp))
    (setq *millions* (sort (mapcar
                            (lambda (w) (concatenate 'string w "million"))
                            words) #'string-lessp)))
  (setq *prefixless* (merge 'list (copy-list *ones*)
                            (mapcar (lambda (w) (cons w t)) *thousands*)
                            #'string-lessp :key #'car))
  (setq *word-count* 0)
  (catch 'done (run-millions action *millions*)))

The main function is generate, which you can think of as a machine that generates each number word in the correct order and passes them to the action function. It starts out by generating the four lists I talked about in the introduction and then calls run-millions to get things rolling. The run-millions routine implements the idea of the three nested dolists from the introduction but also looks for those numbers with no “million” part. Similarly, run-thousands takes care of adding the “thousand” component but also checks to see if it’s time to put in one of the numbers without a “thousand” component.

The action function that counts the characters is returned by make-char-counter. When it finds that the character count has exceeded the goal it looks at the current word and backtracks to the correct character. It outputs that character and then throws an exception to exit the program.

None of the code is very complicated so I’ll leave it to those interested to dig out the details. One point that’s not obvious is why I passed in the action function. That’s because I had several debugging functions that I used during development. For example, make-checker returns a function that checks that the generated words are in alphabetical order and that the right number of words is generated.

(defun make-checker ()
  "Ensure the correct number of words are generated words and in order."
  (let ((last ""))
    (lambda (p pl w)
      (declare (ignore pl))
      (let ((num (concatenate 'string p (car w))))
        (unless (string-lessp last num)
          (cerror "~s >= ~s~%" 'abort last num))
        (setf last num)
        (incf *word-count*)))))

Interestingly, checking that the generator outputs the correct words takes much longer to run than the actual character counting. With make-char-counter ((generator (make-char-counter 50000000000))), the run time is just under 12 seconds (on a 2.66 GHz Intel Core i5 iMac). Running with make-checker ((generator (make-checker))) takes a bit more than 11 minutes, 15 seconds. That’s not too surprising since most of the lengths are precalculated so there’s not much work to do for finding the 50 billionth character but the checker has to concatenate the three parts of each word and then do a string compare.

Update: Rereading Justin’s post, I see that I solved a slightly different problem. I’ll leave it as an exercise to the reader, as they say, to make the changes necessary to solve the actual problem. Since the hard part of generating those numbers one at a time in alphabetical order is already done, the necessary changes are pretty easy.

Posted in Programming | Tagged | 2 Comments

Free Programming eBooks

Michael Kohl over at citizen428.blog() has a nice list of free programming ebooks. Many of these are for Lisp—including Emacs Lisp—but there’s a nice selection of other languages including Ruby, Javascript, Haskell, and many others. There are also books on miscellaneous subjects such as algorithms, parsing techniques, and functional programming.

I’ve seen this list mentioned a couple of times lately, but it offers a good selection of books so I’m passing it on in case any of my readers missed the other references.

Posted in Programming | 1 Comment