Heaps and Priority Queues

In a previous post, I talked about FIFO queues and gave an implementation. Now I want to talk about another type of queue: the priority queue. In these queues every element has a priority and elements are removed from the queue in order of their priority. Thus the element with the lowest (or highest) priority is removed first regardless of the order in which it was inserted. A naïve way to implement a priority queue is simply to sort it by priority after the elements are inserted. The problem with that implementation is that sorting can be a heavyweight operation. That would be OK if you only had to do it once, but usually items are being popped off and pushed on the queue repeatedly so that it would have to be resorted every time a new element is added.

The traditional way of implementing priority queues is with a binary heap. This has the advantage that removal and insertion are O(log n). Here’s an implementation of a binary heap from my standard library.

 1:  ;; Make a heap. Size is the total number of entries that the heap will
 2:  ;; hold, less? is a function to compare the values of two entries, and
 3:  ;; move-hook is a function that will be called everytime an object is
 4:  ;; moved in the heap. It is called with the arguments object and slot-number.
 5:  ;; Arguments to the resulting heap are:
 6:  ;; 'push obj -- Push obj onto the heap and move it to its correct position
 7:  ;; 'pop -- pop off the top object and fix the heap
 8:  ;; 'recalc slot -- the object in slot slot has changed its value. Move it to
 9:  ;;                 its correct position
10:  ;; 'map f -- Apply f to (slot . object) for every entry in the heap
11:  ;; 'show -- dump the current size of the heap and the heap array
12:  
13:  (define make-heap
14:    (lambda (size less? . move-hook)
15:      (let ((h (make-vector (1+ size))) (used 0)
16:            (hook (if (null? move-hook) (lambda x #t) (car move-hook))))
17:        (letrec ((up-heap (lambda (c t)
18:                            (let ((p (quotient c 2)))
19:                              (if (or (= c 1)
20:                                      (less? (vector-ref h p) t))
21:                                  (begin
22:                                    (vector-set! h c t)
23:                                    (hook t c))
24:                                  (begin
25:                                    (vector-set! h c (vector-ref h p))
26:                                    (hook (vector-ref h p) c)
27:                                    (up-heap p t))))))
28:                 (down-heap (lambda (p t)
29:                              (let ((lc (* p 2)))
30:                                (if (> lc used)
31:                                    (begin
32:                                      (vector-set! h p t)
33:                                      (hook t p))
34:                                    (let ((rc (1+ lc)))
35:                                      (if (and (<= rc used)
36:                                               (less? (vector-ref h rc)
37:                                                      (vector-ref h lc)))
38:                                          (set! lc rc))
39:                                      (if (less? t (vector-ref h lc))
40:                                          (begin
41:                                            (vector-set! h p t)
42:                                            (hook t p))
43:                                          (begin
44:                                            (vector-set! h p (vector-ref h lc))
45:                                            (hook (vector-ref h lc) p)
46:                                            (down-heap lc t)))))))))
47:          (lambda (cmd . data)
48:            (case cmd
49:              ((push) (set! used (1+ used))
50:               (if (> used size)
51:                   (error #f "HEAP: heap full")
52:                   (up-heap used (car data))))
53:              ((pop) (let ((result (vector-ref h 1)))
54:                       (set! used (1- used))
55:                       (if (negative? used)
56:                           #f
57:                           (begin
58:                             (down-heap 1 (vector-ref h (1+ used)))
59:                             result))))
60:              ((recalc) (let ((i (car data)))
61:                          (cond
62:                           ((= i 1) (down-heap 1 (vector-ref h 1)))
63:                           ((= i used) (up-heap used (vector-ref h used)))
64:                           ((less? (vector-ref h i) (vector-ref h (quotient i 2)))
65:                            (up-heap i (vector-ref h i)))
66:                           (else (down-heap i (vector-ref h i))))))
67:              ((map) (let ((f (car data)))
68:                       (let mapper ((ix 1) (result '()))
69:                         (if (> ix used)
70:                             (reverse result)
71:                             (mapper (1+ ix)
72:                                     (cons (f (cons ix (vector-ref h ix)))
73:                                           result))))))
74:              ((show) (list used h))
75:              (else (error "HEAP: illegal command:" cmd))))))))

Sometimes events external to the queue will cause the priority of one of its entries to change—this happens with the Dijkstra’s shortest path algorithm, for example. To handle these cases, you can specify a function to be called whenever an object moves in the heap. That way you can keep track of its position in case you need to adjust its priority. The recalc method will move an entry whose priority has been changed to its new correct position in the heap.

Despite its name, less? can be written to choose the element with the larger priority instead of the smaller. In that case, entries with the higher priority are removed first.

All of the real work is done in up-heap (lines 17–27) and down-heap (lines 28–46). Items are inserted into the queue with the push command (line 49) by putting them at the end of the queue and calling up-heap to move it into its final position. The next item to be removed is popped off the queue with the pop command (line 53) by returning the entry in slot 1 of the queue (slot 0 is not used) and then putting the entry in the last slot into slot 1 and calling down-heap to move it to its correct position.

The push and pop commands are the standard heap operations familiar from any basic algorithms book or course. This implementation also has the afore mentioned recalc command and a map command. For debugging, the show command will display the size of the queue and the vector that holds the queue entries.

In my next post, I will give an example that uses a priority queue.

Posted in Programming | Tagged | Leave a comment

Sedach on Lisp Macros

Via Loper OS, here’s Vladimir Sedach on Lisp macros. He points out that the purpose of programming is automation and that once having realized that the next question is, “How can we automate our programming?” Sedach goes on to argue that macros are the only general system we have for that—they are, after all, little programs that write code—and that not having macros means that you don’t understand why you are writing code. He make some other, less controversial, points as well.

It’s a good read and at only 5 short paragraphs it will only take you a minute or two to do so. If you like it, you might want to head over to Loper OS to see what Stanislav Datskovskiy has to say about it.

Posted in General | Tagged | Leave a comment

Cleaning Up My Emacs Environment

While I was installing magit the other day, I decided it was a good time to do a little digital house cleaning. I keep all my important file groups—writing, software development, blogging, and personal—in Git repositories. Oddly, given that these all flow through Emacs, I didn’t have my Emacs environment in Git. As I wrote yesterday in my Magit post, one of the primary uses for these Git repositories is to keep my various machines in sync so not having .emacs.d in Git made no sense at all.

Part of the problem was some early poor choices on my part. I installed the Org-mode distribution as a subdirectory of .emacs.d. There’s nothing necessarily wrong with that but I had it in there as a Git repository. A while back I decided to try to put .emacs.d in Git but the nested Git repositories didn’t work out so well—not at all, in fact. When I finally got around to installing magit I already had some Emacs packages in my tools directory, so I moved the Org-mode repository out of .emacs.d and into tools. I had already replaced .emacs with init.el in .emacs.d so all I had to do was initialize .emacs.d as a Git repository, clone it as a bare repository, and move the cloned repository to the server with the rest of the repositories.

Many people like to break their .emacs into several files with a separate file for each mode or function. Some even use Babel’s literate programming facilities to tie everything together. I keep everything in one file otherwise I’d never be able to find anything. That means that I don’t have much in my .emacs.d but init.el and some elisp files for small packages that don’t need their own subdirectory. Here’s what it looks like

auto-save-list/
dired+.el
dired+.elc
gnuplot-gui.elc
gnuplot.elc
htmlize.el
init.el
paredit.el
quack.el
slime/
tramp
tutorial/
xml-rpc.el

Looking at that list, I could probably get rid of quack.el since Geiser has pretty much taken over its functions. Unfortunately the .emacs.bmk file lives in my home directory by default and there doesn’t appear to be a way to change it. That’s not really a problem for me because I don’t use book marks but it would be nice to get everything in one place. I suppose I could set a symbolic link if I did start using them.

Posted in General | 6 Comments

Magit

One of the first things I did when I started using Emacs was to install the git.el package that interfaces with Git. I chose git.el because it worked the same as the rest of the Emacs source control interfaces. There’s not much to do to install the package but somehow I must have messed it up because I could never get it to commit.

I use Git mostly to keep versioned backups of my work and to move data between the machines that I use. I have Git repositories on a local server that I can push to and pull from. It’s not hard to pop into a shell buffer and just call Git by hand and that’s what I fell into doing. Every once in a while I would try M-x git-status again and, of course, it still didn’t work. I checked that I had the latest version but other than that I didn’t look into the problem further.

All this time, what I really wanted to use was magit but I had it in my head that it was better to stick with a package that had a UI the same as for all the other SCM systems. Then one day it occurred to me that I use only Git and that, gee, maybe I could learn another interface if the need arose. But I still didn’t install magit. I know, I know, I’m a lazy, lazy person.

A few days ago I saw this video by Alex Vollmer and decided that I really should give it a spin. I’m very pleased that I did. The result is similar to what happened when I discovered I could FTP right out of dired. It hasn’t changed my life but it has made my work flow easier and more efficient. Even more important, I find that I do a lot more commits and pushes than I did before. I used to put off committing until I had several files or until I needed to refresh my files on another machine. Now it’s so simple that I do it as soon as I finish with a file. That means the repositories on the server are always up to date and I don’t have to trudge over to my main machine to push everything so that I can pull it onto another machine—I just pull and get the latest files.

It turns out that magit is very sensitive to the .git/config files so if you want to use it you will need to get those right. Here’s what I have for my local set up.

[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
        ignorecase = true
[remote "org-repo"]
        fetch = +refs/heads/*:refs/remotes/org-repo/*
        url = ssh://bedia/home/jcs/repos/org.git
[branch "master"]
        remote = org-repo
        merge = refs/heads/master

The [core] section of the configuration was auto-generated by Git, and the [branch "master"] section was generated by magit. That leaves the fetch line in the [remote "org-repo"] section. If you don’t have that, things will still work, mostly, but you get error messages in your magit-status buffer for some of the remote items.

Posted in Programming | Leave a comment

Mean!

This is just mean. Mean, but funny. It’s also awe-inspiring in the same disturbing way that someone who memorizes the Boston phone directory is.

Posted in Programming | Tagged | Leave a comment

Google Code University

Recently, via Hacker News, I came across this link to Google Code University. The University consists of introductions to a variety of subject matter in software development and Web programming. Some are internal courses that Google uses to train their own engineers and others are from outside sources such as Princeton, Stanford, University of Washington, and many more. They have courses on Programming Languages, Web Programming, Web Security, Algorithms, Android, Distributed Systems, Tools, and Google APIs and Tools.

As I have only minimal HTML and CSS skills and no Javascript skills at all, I decided to try out the HTML, CSS, and Javascript from the Ground Up course. It’s about 2 hours long and is broken up into several videos and exercises. There is a zip file with course material so that you can follow along and work the exercises. As you would expect from a 2 hour course, I’m hardly and expert Web designer after going through the material but I did learn some things I didn’t know before. More importantly, I came away with a refined way of thinking about HTML, CSS, Javascript, and the relationships between them.

The course is perfect for someone like me. I let Org-mode do most of the heavy lifting in generating HTML and only occasionally insert HTML code manually to tweak the result. Some of the tricks I learned from the course will help me do this in a more effective way. The CSS material will help me refine the look of my new Home page once I get around to setting it up.

All in all, the course was enjoyable and well worth my time. I will probably try out some of the other courses too. They have a wide variety and many of them look interesting.

Posted in Programming | Leave a comment

Setting the Babel Evaluate Confirm Status

Because Babel provides a facility to execute arbitrary code, it presents a security risk. Code blocks are evaluated when a document is exported as well as when a user explicitly asks for evaluation by typing C-c C-c in the block. To prevent an unwary user from unintentionally executing malicious code, Babel asks for confirmation before executing any code block. That’s a sensible precaution and one that I leave on even though it can be disabled.

There are times, however, when it’s really inconvenient. For example, in the Calling Babel From A Table post, Babel asks for confirmation for each table cell. In that case, I wanted to turn off the confirmation request. You do that by setting org-confirm-babel-evaluate to nil. That’s a long name and if a week goes by without my using it, I won’t remember the name and will have to look it up. It’s also a pain to type. Also, there’s no way to check whether you forgot to turn it back on other than by looking at org-confirm-babel-evaluate.

Here’s a little function that does all that in a convenient way and that has a name short enough to remember. Just add it to your .emacs or init.el file.

(defun babel-confirm (flag)
  "Report the setting of org-confirm-babel-evaluate.
If invoked with C-u, toggle the setting"
  (interactive "P")
  (if (equal flag '(4))
      (setq org-confirm-babel-evaluate (not org-confirm-babel-evaluate)))
  (message "Babel evaluation confirmation is %s"
           (if org-confirm-babel-evaluate "on" "off")))

Calling babel-confirm will tell you if confirmation is on or off. If you give it a prefix argument (call it as C-u M-x babel-confirm) it will toggle the setting and tell you what the new setting is.

If this isn’t fine enough control, you can also set org-confirm-babel-evaluate to a function. The function is passed the language of the code block and the block itself. The function can decide if the block is safe or not and return t to have Babel ask for confirmation or nil to just execute the block. All this is explained in the Org manual’s Miscellaneous section.

Posted in Programming | Tagged | 1 Comment

Kicking the Hornet’s Nest

Bloomberg Businessweek has an article up entitled Sony: The Company That Kicked the Hornet’s Nest. It’s about the break in and Sony’s penchant for suing and prosecuting “hackers.” There’s not much new in the article (although their telling of the story is interesting) but it is, I think, a bad sign for Sony when even Bloomberg is beating up on them. The article makes the point that it will be months, if not years, until the full extent of the damage to Sony, both in dollars and damage to their brand, will be known.

Posted in General | Leave a comment

Fibonacci Run Times

­

This posts combines the results of the last two posts. I thought it would be interesting to compare the actual run times of the three methods of calculating Fibonacci numbers so I put together a Babel table like the one in the Calling Babel From A Table post and ran the experiment. The results were a little disappointing because they didn’t really give me much information. The table below gives the times (in 100 Hz timer ticks) that the 3 methods took to calculate fibn for various values of n. (The code is the same as in the Calculating the Fibonacci Numbers post.)

n Recursive Iterative Logarithmic
10 0 0 0
20 0 0 0
30 15 0 0
40 1899 0 0
50 234539 0 0
100 Too Long 0 0
1000 Too Long 0 0
10000 Too Long 1 0
100000 Too Long 40 0
1000000 Too Long 8952 5

As expected, the recursive method quickly becomes impractical, taking 39 minutes to calculate fib50. The iterative method doesn’t register any time at all until fib10000 and is still under half a second for fib100000. The time for the iterative method to calculate fib1000000 is misleading because the wall clock time was much longer. The reason for that was that it used up all the memory (4 Gigs) and spent a lot of time swapping. Presumably this was the result of the big nums representations taking up memory. Interestingly, I could watch the garbage collector at work. The free memory would get down to a few MB and then start to climb again. Then it would drop again. This happened several times.

The results for the logarithmic method don’t provide much information other than that if you want to calculate fibn for large n you should use that method.

Posted in Programming | Leave a comment

Calculating the Fibonacci Numbers

If, like me, you’re fascinated by the Fibonacci numbers you should take a look at Robin Houston’s The Worst Algorithm in the World post over at the Bosker Blog.

Houston lists and analyzes the three main ways of calculating fib(n):

  • The naïve recursive method that results from programming right from the definition
    (define fib-rec
      (lambda (n)
        (if (< n 2)
            n
            (+ (fib-rec (1- n)) (fib-rec (- n 2))))))
    
  • The iterative method, which can be seen as a dynamic programming solution
    (define fib-it
      (lambda (n)
        (let fib ((k n) (a 1) (b 0))
          (if (zero? k)
              b
              (fib (1- k) (+ a b) a)))))
    
  • The logarithmic method, which is similar to fast exponentiation (see Exercise 1.19 of Structure and Interpretation of Computer Programs)
    (define fib-log
      (lambda (n)
        (define sum-of-squares
          (lambda ( a b)
            (+ (* a a) (* b b))))
        (define fib
          (lambda (n p q a b)
            (cond ((= n 0) b)
                  ((even? n) (fib (/ n 2) (sum-of-squares p q)
                                   (+ (* 2 p q) (* q q)) a b))
                  (else (fib (- n 1) p q (+ (* b q) (* a q) (* a p))
                              (+ (* b p) (* a q)))))))
        (fib n 0 1 1 0)))
    

Readers of this blog will know that the recursive method is horribly inefficient and Houston shows that, in fact, calculating (fib-rec n) takes 2 × fibn+1 – 1 calls to fib-rec. Thus (fib-rec 100) calls fib-rec 1,146,295,688,027,634,168,201 times.

We’re used to seeing just the first few terms (0, 1, 1, 2, 3, 5, …) of the Fibonacci sequence and it’s easy to forget how rapidly it grows. Here are the values of fibn for some relatively small n:

n fibn
0 0
10 55
20 6765
30 832040
40 102334155
50 12586269025
60 1548008755920
70 190392490709135
80 23416728348467685
90 2.880067194370816e+18

Another interesting thing that Houston points out is that the iterative method is actually O(n2) not O(n) as often supposed. That’s because when the numbers get large you must use a multiprecision numeric library and can no longer add the terms in constant time.

Finally, Houston gives a nice way of looking at fib-log and why it works. As I say, this is a really interesting post and worth a few minutes of your time.

Posted in Programming | Leave a comment