Calling Babel From A Table

While I was reading the Lisp Idioms article that I blogged about yesterday, it occurred to me that the first example on accumulating a list presents an excellent opportunity to talk some more about Babel and how to use it with Org-mode tables. It will also give us a nice, if trivial, example of Reproducible Research.

Most Org-mode users are aware of the spreadsheet capabilities of Org-mode tables. You can, for example, sum all the values in a row or column or perform other familiar operations on the data. With Babel, you can also have a cell make a call to a Babel code block to get its value. Whereas in our previous examples we’ve used code blocks to build new tables, in this example we will first build the table and then have the code blocks populate (most of) the cells.

To review, in the Lisp Idioms article one of the examples was of ways to accumulate a list from a stream of values. This is, of course, a common operation in Lisp and Scheme and there are some mildly diverse ways of doing it. Just for variety, we will write our functions in Scheme rather than Common Lisp.

Cons the items onto a list and reverse the results at the end

The most common way of doing this is shown in the first code block. The cntr definition just provides a stream of integers—a new one each time it’s called. The E1 function does the actual work of assembling the list and the let at the end just calls E1 and calculates the time it took to do its work. All this should be very familiar to even beginning Scheme programmers.

(define cntr ((lambda () (let ((k 0)) (lambda () (set! k (1+ k)) k)))))
(define E1
  (lambda (n)
    (let loop ((k n) (lst '()))
      (if (zero? k)
          (reverse lst)
          (loop (1- k) (cons (cntr) lst))))))
(let ((start (get-internal-run-time)))
  (E1 n)
  (/ (- (get-internal-run-time) start) internal-time-units-per-second))

The next code block is almost the same as the first except that it uses reverse! instead of reverse to reverse the list. Theoretically, reverse! should be faster because it reuses the cons cells from the first list rather than making a separate list from scratch.

(define cntr ((lambda () (let ((k 0)) (lambda () (set! k (1+ k)) k)))))
(define E2
  (lambda (n)
    (let loop ((k n) (lst '()))
      (if (zero? k)
          (reverse! lst)
          (loop (1- k) (cons (cntr) lst))))))
(let ((start (get-internal-run-time)))
  (E2 n)
  (/ (- (get-internal-run-time) start) internal-time-units-per-second))

Append the items onto the end of the list

In some sense this is the most intuitive, if naive, method if you are not a Lisp or Scheme programmer. The problem is that append must search down the list each time to find the list’s end and is therefore very inefficient. No serious Lisp or Scheme programmer would ever do this.

(define cntr ((lambda () (let ((k 0)) (lambda () (set! k (1+ k)) k)))))
(define E3
  (lambda (n)
    (let loop ((k n) (lst '()))
      (if (zero? k)
          lst
          (loop (1- k) (append  lst (list (cntr))))))))
(let ((start (get-internal-run-time)))
  (E3 n)
  (/ (- (get-internal-run-time) start) internal-time-units-per-second))

Place the items in a vector in the order in which you retrieve them

This is certainly the fastest method and the one that, say, C programmers would use. Its problem is that you have to know the largest number of items that you can get so that you can preallocate the vector. Common Lisp has a way of extending vectors and the Lisp Idioms article has an experiment for that method but Scheme doesn’t have resizable vectors so we don’t have that case.

(define cntr ((lambda () (let ((k 0)) (lambda () (set! k (1+ k)) k)))))
(define E4
  (lambda (n)
    (let ((vec (make-vector n)))
      (let loop ((k 0))
        (if (= k n)
            vec
            (begin
              (vector-set! vec k (cntr))
              (loop (1+ k))))))))
(let ((start (get-internal-run-time)))
  (E4 n)
  (/ (- (get-internal-run-time) start) internal-time-units-per-second))

Running the experiment

We start with a table like this:

|       n | Reverse | Reverse! | Append   | Vector |
|---------+---------+----------+----------+--------|
|       1 |         |          |          |        |
|       2 |         |          |          |        |
|    1024 |         |          |          |        |
|    4096 |         |          |          |        |
|   16384 |         |          |          |        |
|  262144 |         |          | Too Long |        |
| 1048576 |         |          | Too Long |        |

The code blocks have the same names as in the header. For example, the first code block has a

#+SRCNAME Reverse(n)

just above it. The last two rows of the Append column are set to “Too Long” because the Append method takes too long to run for lists of those sizes. All the other cells in columns 2–5 are empty.

Next, we set the code for each of the empty cells to ‘(sbe @1 (n $1)). This is easily done with the table editor and results in the TBLFM line

#+TBLFM: $2='(sbe @1 (n $1))::$3='(sbe @1 (n $1))::$5='(sbe @1 (n $1))::@2$4..@7$4='(sbe @1 (n $1))

sbe is a macro that calls the code block with the name of its first argument and passes the code block the rest of its arguments. The @1 means row one of the current column and thus refers to the name of the code block from the table header. The $1 in the second argument means column 1 of the current row and thus refers to the n-value for this cell.

Now all we need do is put the point anywhere in the table and type C-u C-c C-c and the table will be filled in with the run times.

n Reverse Reverse! Append Vector
1 0 0 0 0
2 0 0 0 0
1024 0 0 1/100 0
4096 0 0 6/25 0
16384 1/100 0 387/100 0
65536 1/100 1/100 1321/20 1/100
262144 1/25 3/100 Too Long 3/100
1048576 19/100 13/100 Too Long 1/10

get-internal-run-time returns number of 100 Hz ticks used by the Guile interpreter (thus internal-time-units-per-second = 100). As expected, the Vector method is fastest but not by as much as we might expect. The other interesting result is that except for really long lists, reverse! does not outperform reverse to any significant extent. The Append method, of course, is out of the running.

This post is a simple example of Reproducible Research because everything needed to reproduce the post is contained in a single the Org-mode file. If this were a serious research paper, any scientist could request a copy of the file and run the experiments for his or herself.

Update: Corrected the definition of get-internal-run-time.

Posted in Programming | Tagged | Leave a comment

An Oldie but Goodie

I stumbled across this list of Lisp idioms on Hacker News yesterday. I’ve seen it before and if you’ve been around for a while, you probably have too. Still, it’s always fun to read it again. I enjoy the terse elegance of the little snippets of code. Who would not be charmed by this one-liner to take the transpose of a matrix (stored as a list of rows)?

(apply #'mapcar (cons #'list matrix))
Posted in Programming | Tagged | 1 Comment

LastPass and the Press

I’ve written a couple of posts about LastPass recently, both with praise for the way they are handling a potential security event. One might think that I’m a satisfied customer or even an investor but, in fact, I’d never heard of them before I wrote the Doing It Right post. I just think that a company that has a clue and does its best to serve and protect its customers deserves some praise.

That’s why it’s so outrageous that a lazy and ignorant press lumped a potential security event and LastPass’s appropriate response to it with the obviously egregious actions of Sony. It’s not like the facts are hard to understand:

  • Sony
    Stored passwords, personal information, and some credit card information in plain text, used an outdated, unpatched version of Apache, and didn’t bother to install a firewall. When the inevitable attack came and was successful they delayed notifying their customers and were less than transparent when they did.
  • LastPass
    Performed regular, scrupulous reviews of their logs accounting for every anomaly. When a change in the traffic pattern occurred that they couldn’t account for, they proactively warned their customers even though there was no direct evidence of a compromise. They immediately put precautionary procedures in place to ameliorate the damage in case a break in did occur. They kept their customers informed throughout the event with regular updates on their site and an interview with PC World.

How are these two companies’ actions even remotely similar? Yet much of the press treated them as if they were similar. Some even reported as a fact that LastPass had been “hacked.”

Patrick Mylund Nielsen has a nice post over at Throwing Fire that explores this issue a little more deeply but if you really want some red meat about the Tech Press, Michael Arrington over at TechCrunch has got it for you.

Posted in General | Tagged | Leave a comment

R and Babel

I decided to accept my own challenge so I downloaded and installed R. It’s a huge system with a lot to learn but I did manage to figure out enough to generate some basic statistics and a couple graphs. I’ll be using the same data as in the 10,000 Steps post, but I won’t bother to reproduce the table here (it’s in the org file for this post, of course, I’m just not exporting it).

If you want to use R with Babel you really need to read the Org-babel-R page. Babel works a little differently with R and it’s easy to waste a lot of time trying to get things to work because you weren’t aware of those differences.

­

The first thing I wanted to do was calculate some basic statistics from the data. Nothing fancy, just maximum value, minimum value, mean, median, and standard deviation. That’s pretty easy in R because there are R primitives for them.

Here’s the code block for the basic statistics. As before, steps is the table of data. The in[,2] is the second column of that data (it’s a slice of the array that Babel creates from the table for R’s use). I put it in vals just to avoid typing in[,2] over and over. The only tricky part was figuring out how to form the output array but that’s because I don’t really know R yet.

#+BEGIN_SRC R :var inp=steps :exports results
vals <- inp[,2]
array( c("Max", "Min", "Mean", "Median", "Std. Dev.",
  max(vals), min(vals), mean(vals), median(vals),
  sd(vals)), dim=c(5, 2))
#+END_SRC

Here’s the resulting table:

Max 18078
Min 840
Mean 11665.6428571429
Median 12572
Std. Dev. 5372.68656994208

Next I wanted to generate a pie chart showing the distribution of steps in groups of 1,000. I was thinking that I’d have to do some low level programming to build the frequency table. It turns out, however, that R has the hist command that will do that for you. It will even plot a histogram. All you need to do is pass it the data and indicate where you want the breaks to be. This simple code block

#+BEGIN_SRC R :var inp=steps :file http://irreal.org/blog/wp-content/uploads/2011/05/wpid-hist.png :results graphics :exports results
hist(inp[,2], breaks=0+1000*(0:19), plot=TRUE, main="Steps Histogram", xlab="Steps")
#+END_SRC

results in the the following histogram.

http://irreal.org/blog/wp-content/uploads/2011/05/wpid-hist.png
Most of the arguments to hist involve setting the graph and axes labels.

The output of hist is really an object (think structure) that has entries for things like the bucket breaks, bucket midpoints, and bucket counts. You can see that in action in the next code block, which uses hist again but with no plotting. Instead the output object is captured and used as input to the pie primitive.

#+BEGIN_SRC R :var inp=steps :file http://irreal.org/blog/wp-content/uploads/2011/05/wpid-pie-steps.png :results graphics :R-dev-args pointsize=7 :exports results
ftab <- hist(inp[,2], breaks=0+1000*(0:19), plot=FALSE)
pie(ftab$counts, labels = ftab$mids/1000, radius=.75, main="Frequency (thousands)")
#+END_SRC

I had to scrunch up the pie chart a bit to prevent the bucket labels from colliding too much. Still, it’s a nice chart for very little effort.

http://irreal.org/blog/wp-content/uploads/2011/05/wpid-pie-steps.png
If you do any work involving statistics, R is definitely a tool you need. Of course, if you do any work involving statistics, you already know that. Using Org-mode and Babel is a great way to have all the data, graphs, and text for a paper in one file that can then easily be exported to HTML or LaTeX. I’ve said before that I really like Babel and the more I use it and learn about it the more I like it.

Posted in Programming | Tagged | Leave a comment

Doing It Wrong

Yesterday, I blogged about a company, LastPass, that takes security seriously and has well thought out procedures in place to protect its users. LastPass is a small company run by relatively young people who have perhaps a decade’s experience in software development and running companies.

What are we to think, then, of a large international company that is almost 140 years old, with very deep pockets, and run by people with decades of experience that nevertheless can not manage to deploy a Web site in a way that will protect their users? The Wall Street Journal has announced that they are starting their very own Wikileaks called WSJ SafeHouse.

It’s an idea with some merit. Properly implemented, it would give whistle blowers a safe way to inform the public of things they believe the public should know while at the same time ensuring that the information is evaluated and vetted by experienced editors. The problems started appearing as soon as people looked at their Terms of Use. The WSJ offers three options for submitting data but does not guarantee anonymity for any of them. Right there you’ve lost any whistle blower with even a minimal survival instinct.

But it gets worse. The Terms of Use goes on to say that you warrant that you have the legal right to upload the data and that it will not violate any law or the rights of any person. In other words, whistle blowers need not apply. This is so ridiculous that it could only have been dreamed up by lawyers. It doesn’t matter, though, because no one is going to seriously consider using the site.

You may roll your eyes and think, “Well yes, but this is just the usual lawyer-babble that the WSJ has to put there to protect themselves. We all know that they will go to the wall to protect their sources.” Indeed, after the laughter started, the WSJ put out a statement saying they are committed to protecting their sources to the fullest extent possible under the law. Again, what whistle blower in his or her their right mind is going to have anything to do with these people?

It turns out, though, that the lawyers aren’t the only ones without a clue. The security community took a look at the site and reported several problems with its security. In the first place, their handoff from the unencrypted http://www.wsjsafehouse.com to the encrypted https://www.wsjsafehouse.com fails to use Strict Transport Security and therefore renders the user susceptible to a man-in-the-middle attack that strips out the SSL protection. Even worse, the site’s SSL server supports several cipher suites without perfect forward secrecy so any successful attack on the server could immediately render all previous traffic using one of those suites vulnerable to decryption.

The previously mentioned statement from the WSJ says that many of the security problems have already been fixed and promises to address the others shortly. Unfortunately, whatever credibility WSJ SafeHouse had as a secure place for whistle blowers has already been squandered and will be hard, if not impossible, to get back.

Posted in General | Tagged | Leave a comment

Doing It Right

By now, even the non-Geeks among us have heard of the Sony PSN and Online Entertainment break ins and the subsequent loss of personal information and perhaps credit card numbers of over a 100 million Sony customers. Sony admitted that at least 10,000 customers did have their debit card information compromised.

The facts that Sony did not notify the FBI of the attack for 2 days, that it did not meet with the FBI for 5 days, and that it did not notify its customers of the potential loss of their personal data for 6 days is bad enough but it gets worse. On Wednesday (2011-05-04) Eugene Spafford, a professor at Purdue University, testified before the House Subcommittee on Commerce, Manufacturing and Trade that Sony was running an old unpatched version of Apache and that they didn’t have a firewall installed. Furthermore, Dr. Spafford testified, this information was publicly discussed on an open forum, which Sony employees monitor, 2 or 3 months prior to the break in.

The story is still unfolding but it’s clear that Sony did an extremely poor job of protecting its customers’ data and notifying them in a timely fashion after the compromise took place. It’s also clear that Sony is going to take a major hit to their reputation and customer goodwill at the least. Just how much of a hit remains to be seen. And, of course, the lawsuits are already being filed.

Now consider this announcement from LastPass. The differences could not be more pronounced.

  • Proactive Security
    LastPass scours—not just checks but scours—their logs every day. They aren’t satisfied until they’ve explained every anomaly.
  • Proactive Notification
    LastPass notices an anomaly that they can’t explain. They immediately notify their customers that something may have happened and tells them what they should do to protect themselves.
  • Proactive Followup
    LastPass keeps their customers informed with frequent updates. As events unwind, they revise the suggested actions that their customers should take. The company’s CEO gives an interview to PC World in which he further explains what happened and what they are doing. He says they are trying to handle it the way they would want it handled if they were the users.
  • Proactive Security Improvements
    LastPass had already been engineering a change to use PBKDF2 to generate their password hashes. When just the suggestion of a possible breach presents itself, they immediately roll out the change. They also put in place some further procedures to help protect their users in case they were broken into. Then, although there was no indication that any of the boxes or software had been tampered with, they rebuild the servers in question and check the hashes of the repositories to make sure they hadn’t been tampered with either.

This is doing things the right way and LastPass deserves their customers’ gratitude for doing everything possible to protect the data that their users have entrusted to them. The rest of us owe them our gratitude as well. They have shown us that it is possible to handle the inevitable attacks in a responsible and open way. It’s a pity that Sony, a huge corporation, couldn’t perform even remotely as well.

Posted in General | Tagged | Leave a comment

Another region-or-thing Example

In my last post on region-or-thing, I promised an example that uses the bounds information that region-or-thing returns. This example is much like the google-search function except that instead of looking up a word or phrase in Google, it turns the word or phrase into an Org-mode link to Wikipedia.

That requires a little explanation. Org-mode uses plain text for everything it does. You could, if you like, write your Org-mode files in any editor but then, of course, the display formatting that Org-mode does wouldn’t appear and you wouldn’t have the keyboard shortcuts. In its role as a markup language, it needs a way to encode links such as the link to the region-or-thing post above. The actual link is encoded as

[[http://irreal.org/blog/?p%3D50][region-or-thing]]

although Org-mode has a more convenient way of entering links than adding all the square brackets yourself. One other thing you should know is that for Wikipedia links, blanks are replaced by underscores.

With that information, it’s pretty easy to write our function

1:  (defun org-wp-linkify ()
2:    "Turn the region or word at the point into an org Wikipedia link"
3:    (interactive)
4:    (let* ((thing (region-or-thing 'word))
5:           (subject (replace-regexp-in-string " " "_" (elt thing 0))))
6:      (delete-region (elt thing 1) (elt thing 2))
7:      (insert (concat "[[http://en.wikipedia.org/wiki/" subject
8:                      "][" (elt thing 0) "]]"))))

On line 4, we call region-or-thing to get the vector describing the word or phrase and put it in the local variable thing. On line 5, we replace any spaces in the text with an underscore and store the result in subject. Next we delete the word or phrase from the Emacs buffer (line 6)—note how we use the bounds of the text to do that. Finally, we insert the link to Wikipedia in its place.

That’s a surprising amount of functionality for 8 lines of code (3 of which are boilerplate), but then it’s Lisp. Notice that some simple and obvious changes to lines 7–8 could make the function build a “normal” HTML link for use in non-Org files.

Posted in Programming | Tagged | Leave a comment

region-or-thing

Emacs Lisp has an amazingly useful function called thing-at-point, which will return the object that the point is currently on. You call it as

(thing-at-point THING)

where THING is a symbol that specifies the kind of object you want. Possibilities, as listed on the help page, include ‘symbol’, ‘list’, ‘sexp’, ‘defun’, ‘filename’, ‘url’, ‘email’, ‘word’, ‘sentence’, ‘whitespace’, ‘line’, ‘page’, and others. There is a companion function called bounds-of-thing-at-point that returns the bounds of the object.

As an example of using thing-at-point, suppose we want a function that will do a Google lookup of the symbol that the point is on. We can do that as

(defun google-search ()
  "Do a Google search of the symbol at the point"
  (interactive)
  (browse-url (concat "http://www.google.com/search?q="
                      (thing-at-point 'symbol))))

We’re using symbol instead of word because it’s more inclusive.

Of course, as soon as we have this function we realize that we sometimes want to look up a phrase. So now we have to worry about whether or not a region is active. We end up with something like this:

(defun google-search ()
  "Do a Google search of the region or symbol at the point"
  (interactive)
  (let ((phrase (if (use-region-p)
                    (buffer-substring-no-properties
                     (region-beginning) (region-end))
                  (thing-at-point 'symbol))))
    (browse-url (concat "http://www.google.com/search?q="
                        (replace-regexp-in-string " " "+" phrase)))))

The buffer-substring-no-properties function returns the substring in the region but removes all the text properties. The call to replace-regexp-in-string replaces any spaces in the text with plus signs as required by Google.

This version of google-search is reasonably complete except that it doesn’t cover some corner cases such as an ‘&’ in the region’s text. But for purposes of the example, let’s call it good enough.

The next thing we notice is that we keep writing code like the above to handle the two cases of a region or a single object at the point. So we abstract that code out to a function called region-or-thing, which returns a vector whose first entry is the text in question, and whose second and third entries are the bounds of that text.

(defun region-or-thing (thing)
  "Return a vector containing the region and its bounds if there is one
or the thing at the point and its bounds if there is no region"
  (if (use-region-p)
      (vector (buffer-substring-no-properties (region-beginning) (region-end))
              (region-beginning) (region-end))
    (let* ((bounds (bounds-of-thing-at-point thing))
           (beg (car bounds))
           (end (cdr bounds)))
      (vector (buffer-substring-no-properties beg end) beg end))))

The region-or-thing function is based on an idea from Xah Lee who has a much more elaborate version called get-selection-or-unit.

With region-or-thing we can rewrite google-search as

(defun google-search ()
  "Do a Google search of the region or symbol at the point"
  (interactive)
  (let ((phrase (elt (region-or-thing 'symbol) 0)))
    (browse-url (concat "http://www.google.com/search?q="
                        (replace-regexp-in-string " " "+" phrase)))))

The google-search function doesn’t make use of the bounds information that region-or-thing returns. In a subsequent post, I’ll show some examples that do make use of the bounds.

Posted in Programming | Tagged | 1 Comment

AirPush: NOT Coming to an iPhone Near You

Apple’s App Store gets a lot of grief for their locked down curation process—seemingly always with a reference to “walled gardens.” There’s a lot of truth in many of those complaints but only if you’re an iPhone developer; users aren’t much affected and are, in fact, somewhat protected by the policies. Case in point: AirPush. AirPush is a mobile ad network that pushes advertisements to the Android notification tray. App developers can include the software in their apps and earn money from the ads. These ads appear at random times and even if the app that is pushing them is not running. Of course, many free apps have advertisements but usually only when the app is running and certainly not in the notification tray. Here’s an AirPush pitch to developers.

This is apt to be popular with some developers and as Quentyn Kennemer points out, perhaps inevitable. Still, there’s been a lot of push (heh) back: Martin Adamek, apparently the first to implement the system in his popular APNdroid app, had his app suspended by Google after users complained it was malware. Adamek apologized to his users, removed AirPush, and said he had learned his lesson.

Joshua Scholl worries that AirPush could destroy the Android Market. He notes that the hugely negative reaction to it has caused AirPush to back off some of their more odious practices. He also notes that this sort of thing doesn’t happen with IOS apps because Apple is concerned with the iPhone’s user experience.

And that’s exactly the point. While developers may have cause to hate Apple’s curation policies, their users have reason to be grateful for them.

Posted in General | Leave a comment

10,000 Steps

About a year ago, I bought Gordon Bell and Jim Gemmell’s book Total Recall: How the E-Memory Revolution Will Change Everything. That was the first time I had ever read about the health benefits of taking at least 10,000 steps every day. Right afterward, of course, I began seeing references to it everywhere. I like measuring and keeping track of things—although I’m not as obsessive as these guys—so I ordered an Omron HJ-150 Pedometer from Amazon and started keeping track. I walk a lot and I was sure that I was easily doing 10,000 steps a day. Hah! Not even close. It turns out it was more like 6000–7000 so I started walking more and every morning I would record the previous day’s steps in an Org-mode table. Here’s the last 4 weeks worth of data:

Date Steps
2011-04-04 Mon 18078
2011-04-05 Tue 4025
2011-04-06 Wed 11820
2011-04-07 Thu 15827
2011-04-08 Fri 15990
2011-04-09 Sat 11048
2011-04-10 Sun 3151
2011-04-11 Mon 12974
2011-04-12 Tue 17522
2011-04-13 Wed 16498
2011-04-14 Thu 11476
2011-04-15 Fri 15877
2011-04-16 Sat 840
2011-04-17 Sun 984
2011-04-18 Mon 9253
2011-04-19 Tue 15125
2011-04-20 Wed 12010
2011-04-21 Thu 17198
2011-04-22 Fri 16229
2011-04-23 Sat 13029
2011-04-24 Sun 2674
2011-04-25 Mon 15151
2011-04-26 Tue 15746
2011-04-27 Wed 11496
2011-04-28 Thu 12001
2011-04-29 Fri 15112
2011-04-30 Sat 12170
2011-05-01 Sun 3334

Of course, once you have data like this there’s an overwhelming urge to do something with it. If I were an R user, I could run all sorts of statistical calculations on it but since I’m not, I had to settle for graphing it. That’s OK because it gives me an excuse to play with Babel some more. I just added

#+BEGIN_SRC gnuplot :var data=steps :file http://irreal.org/blog/wp-content/uploads/2011/05/wpid-steps.png
set size .75,1
set xdata time
set timefmt "%Y-%m-%d"
set xrange ["2011-04-03":"2011-05-02"]
set timefmt "%Y-%m-%d-00:00:00"
set format x "%m/%d"
plot data using 1:2 with linespoints pointtype 13 title 'Steps'
#+END_SRC

to the end of the file, typed C-c C-c and had a nice graph of my data:

http://irreal.org/blog/wp-content/uploads/2011/05/wpid-steps.png

As you can see, I take Sundays off. Perhaps I’ll install R and run some statistics on it. Babel has an R mode, you know.

Posted in Programming | Tagged | Leave a comment