The Unix Repository Project

I’m a big believer in the notion that one of the best ways to become a master programmer is to study the code of those who already are. There was a time when that was a lot harder than it is today. Once upon a time there was no Linux and Unix source code was not available. Eventually, the BSD and early AT&T Unix source code became widely available and it became easy to study how the masters did it.

I’ve collected all that code and spent many happy hours studying it and absorbing its lessons. Sadly, it’s scattered across a bunch of CDs and the hard disks of decommissioned computers. It would be nice to have the collection organized and easily accessible.

Diomidis Spinellis to the rescue. He’s created a GitHub repository of most of that code and is working on filling in the metadata as much as possible. The project makes it easy to browse the code for fun or to look up some particular aspect. The project is ongoing and it would be great if the later Unix code were also made available. The Unix source code is an important part of our profession’s cultural heritage and it would be nice to have it all available in a unified git repository.

Posted in Programming | Tagged | Leave a comment

Eshell and Abbreviations

I’m a big fan of eshell and much prefer it to running Bash inside a terminal buffer. I like to run it full frame and wrote some code to automatically save my window configuration and then run eshell in the entire frame. When I quit eshell, the old configuration is restored.

Xah Lee has an interesting post on using abbreviations with eshell. He points out that eshell is superior to running a Bash shell because in addition to having all the power of Emacs available, you can leverage the Emacs abbreviation system. Lee, of course, is very concerned with ergonomics and invoking functionality with the least effort possible. One way he does that is to have a collection of abbreviations for some of the complex shell commands he runs.

But now there’s a problem. He has a lot of abbreviations and many have similar names because they are variations on a theme, so he has a hard time remembering them. To solve that problem he wrote a bit of Elisp to access those abbreviations through ido. For example, several of his abbreviations start with img. To choose the correct one he just types 【i】 and ido provides him with a handy list that he can select from in the usual ido way.

If, like Lee, you have a lot of long, complex shell commands you might want to steal his code or work up something similar of your own. Even if you don’t have this particular problem, you still have to love how Emacs allows you to adapt it to your workflow.
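If you want to work up something of your own, here is a rough sketch of the idea. It is not Lee’s code: the table my-shell-abbrevs, its two img entries, and both function names are made up for illustration. It simply collects the names in an abbrev table and hands them to ido-completing-read.

```elisp
(require 'ido)

;; Hypothetical abbrevs; the expansions are placeholders, not Lee's commands.
(define-abbrev-table 'my-shell-abbrevs
  '(("imgscale" "convert in.png -resize 50% out.png")
    ("imgstrip" "convert in.jpg -strip out.jpg")))

(defun my-abbrev-names (table)
  "Return the names of the abbrevs defined in TABLE, sorted."
  (let (names)
    ;; An abbrev table is an obarray, so mapatoms can walk it.  Skip the
    ;; empty-named symbol that holds the table's own properties.
    (mapatoms (lambda (sym)
                (when (and (boundp sym)
                           (symbol-value sym)
                           (not (string= (symbol-name sym) "")))
                  (push (symbol-name sym) names)))
              table)
    (sort names #'string<)))

(defun my-insert-shell-abbrev ()
  "Pick an abbrev from `my-shell-abbrevs' with ido and insert its expansion."
  (interactive)
  (let ((name (ido-completing-read "Abbrev: "
                                   (my-abbrev-names my-shell-abbrevs))))
    (insert (abbrev-expansion name my-shell-abbrevs))))
```

Calling my-insert-shell-abbrev with M-x in an eshell buffer and typing img should narrow the list to the two image entries.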


ido Completions Buffer

I just learned something that the rest of you probably already know. Sometimes when using ido you can get a long list of candidates. Scrolling through them can be a pain. That’s helped a bit by ido-vertical-mode, which I like very much and recommend. Sometimes, though, there are just too many choices to easily pick one.

It turns out that if you type ‘?’ ido will pop open a buffer with all the possible completions in it. Then you can either click or press return on the desired item or, I suppose, use it to further refine your partial result.

This isn’t a problem I have very often but for those times when I do, this is a very handy solution. I’m happy to have discovered it from this reddit post.


Troy Hunt on the Cobra Effect

Troy Hunt comments on that incredibly silly tweet by British Gas explaining why they disable pasting into the password field on their site. I wrote about that here. Sadly, it turns out that this practice is more widespread than I thought.

Hunt explains how this is a fine example of the cobra effect. These sites aren’t, of course, doing this just to annoy their customers. They believe that they’re improving the security of their site. They aren’t; just the opposite. And that’s why it’s an example of the cobra effect: the supposed solution actually makes the problem worse.

Here’s what happens. Disallowing pasting into the password field essentially makes password managers useless[1]. No one is going to type a 20-character random string into a password field every time they want to log in. Therefore, they pick a short password but because they have to be able to remember it, they make it a real word, maybe trying to obscure it a bit with ’leet speak. This all but guarantees that their password will be compromised as soon as someone gets hold of the hash database. So rather than making the site more secure, they actually make it less secure: a perfect example of the cobra effect.

Take a look at Hunt’s post to see some of the other clueless sites that do this. He even gives a reason for doing it that makes a bit (but only a bit) more sense than the one British Gas offered.

Footnotes:

1

Some password managers use autofill instead of pasting the password into the field. If the site is just using an onpaste handler on the input element to disable pasting, these password managers may still be able to fill the field.


Emacs and Vi

Mike Kozlowski has a thoughtful post on the lessons of Emacs and Vi(m). His thesis is that while the lessons of Emacs have largely been absorbed by those writing editors, those of Vi have not. Emacs versus Vi is one of our industry’s oldest religious wars, of course, but there’s a lot more heat than light in the skirmishes. Happily, Kozlowski has some light to shed on the matter.

Kozlowski says that the big lesson from Emacs is extensibility and that “modern” editors have incorporated this lesson. Thus all the new editors have some sort of—I would argue inferior—extension language that allows you to customize the editor. It’s hard to argue with this and, indeed, I’ve said the same thing many times in this blog.

But what about Vi and its progeny? What everyone says about Vi (and don’t forget I was a longtime Vi user before I switched to Emacs) is that it’s extraordinarily efficient at editing text. As Kozlowski explains, this is because Vi commands are composable. Thus the delete command can be composed with motion commands to provide a panoply of delete commands that all make sense once you know the command for delete and the commands for motion. If you now learn the command for copy, you can compose it with the motion commands to have a suite of copy commands. Emacs doesn’t have this: every delete (say) command is different and has nothing to do with the motion commands. This is a powerful concept and goes a long way in explaining Vi(m)’s enduring popularity.
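To make the idea of composition concrete, here is a small Elisp sketch. It is only an illustration of the concept, not how Vi implements it and not something Kozlowski proposes; the name my-operate is made up. Any function that takes a start and an end acts as the operator, and any function that moves point acts as the motion.

```elisp
(defun my-operate (operator motion)
  "Apply OPERATOR to the text between point and where MOTION would move point.
OPERATOR is called with two arguments, the start and end of that text.
MOTION is called with no arguments."
  (let* ((start (point))
         ;; Run the motion only to find out where it lands; point itself
         ;; is restored by save-excursion.
         (end (save-excursion (funcall motion) (point))))
    (funcall operator (min start end) (max start end))))

;; One operator combines with every motion, and one motion with every operator:
;; (my-operate #'delete-region #'forward-word)      ; delete to end of word
;; (my-operate #'delete-region #'end-of-line)       ; delete to end of line
;; (my-operate #'copy-region-as-kill #'forward-word) ; copy to end of word
;; (my-operate #'upcase-region #'end-of-line)       ; upcase to end of line
```

Learning one new operator or one new motion immediately gives you a whole row or column of new commands, which is the point Kozlowski is making.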

I believe that Emacs’ provision of a Lisp machine more than makes up for the advantages of Vi’s composability but no one can deny how super efficient you can be with Vi. As usual, we all make our own decisions about what editor to use but whatever choice you make it’s worth remembering that those on the other side have a point.


Removing Repeated Occurrences of a Target Character From a String

The Problem

While I was going through my RSS feed the other day, I came across this Programming Praxis problem and thought right away that the solution called for a state machine. It’s one of those problems that seems easy enough but if you (or, at least, I) try to write it straight out without a state machine you end up drowning in little details and inevitably get something wrong.

Later, I thought that it would serve as an excellent example of Christopher Wellons’ buffer passing style that I wrote about previously. I coded it up from memory so of course I got the problem wrong. Not just wrong but backwards. It doesn’t really matter for our purposes so the problem we’re solving is to remove any repeated instance of the target character. Thus if X is the target character, then XabXXcdXeXXXfX becomes XabcdXefX.

We want to write a function, remove-repeated, that will take the target character and a string and remove all repeated sequences of the target character. You can see right away why it’s a perfect example of the problem Wellons was discussing: we need to build an output string piece by piece so the buffer passing style is just what we need.

The State Machine

This is a pretty simple problem so a simple state machine with three states is all we need.

[Figure: remove-sm.png, the state diagram]

The state diagram doesn’t show any actions—see the code for that—it just lists the states and their transitions. In the diagram, X stands for the target character and ANY is any other character.

The Code

Here’s the state machine implemented in Emacs Lisp. If there were a lot of states, the case-style implementation wouldn’t be very efficient but it’s fine for our three states.

 1: (defun remove-repeated (ch-to-remove string)
 2:   "Insert STRING at point, omitting every run of two or more
 3: occurrences of CH-TO-REMOVE."
 4:   (let ((state :start)
 5:         (case-fold-search nil))     ; local binding: char-equal is case-sensitive
 6:     (dolist (current-ch (string-to-list string))
 7:       (pcase state                  ; built in; keywords match by equality
 8:         (:start (if (char-equal current-ch ch-to-remove)
 9:                     (setq state :check-dupe)
10:                   (insert current-ch)))
11:         (:check-dupe (if (char-equal current-ch ch-to-remove)
12:                          (setq state :remove-dupe)
13:                        (insert ch-to-remove)
14:                        (insert current-ch)
15:                        (setq state :start)))
16:         (:remove-dupe (unless (char-equal current-ch ch-to-remove)
17:                         (insert current-ch)
18:                         (setq state :start)))))
19:     (when (eql state :check-dupe)
20:       (insert ch-to-remove))))

The when in line 19 takes care of the case where there is a single target character at the end of the string.

When we run the code with

(with-temp-buffer
  (remove-repeated ?X "XabXXcdXeXXXfX")
  (buffer-string))

we get XabcdXefX as expected. Notice how remove-repeated doesn’t bother building a string. It just puts the required characters in the current buffer and returns. The caller coerces it into a string and the temporary buffer is killed when control flows out of the with-temp-buffer form.

To illustrate the flexibility of this approach, suppose we want to return the length of the resulting string. We could, of course, just change the last line of the call to

(length (buffer-string))

but a more efficient solution is to simply replace the last line with

(buffer-size)

and not bother forming the actual string at all.
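As a sanity check on the state machine, the same result can be had from a regular expression. This is a different technique from the one above, and the name my-remove-repeated-regexp is made up; it builds an ordinary string rather than using the buffer passing style.

```elisp
(defun my-remove-repeated-regexp (ch-to-remove string)
  "Return STRING with every run of two or more CH-TO-REMOVE deleted."
  (let ((case-fold-search nil))         ; match the character case-sensitively
    (replace-regexp-in-string
     ;; The quoted character followed by \{2,\} matches a run of two or more.
     (concat (regexp-quote (string ch-to-remove)) "\\{2,\\}")
     "" string t t)))

(my-remove-repeated-regexp ?X "XabXXcdXeXXXfX")   ; returns "XabcdXefX"
```

For a one-off this is shorter, but the state machine generalizes more easily when the rules get more complicated than a single regexp can express comfortably.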


Pictures of NSA Corrupting New Networking Equipment

We knew about the NSA’s intercepting network equipment and installing backdoor hardware before shipping it on to its intended destination. The idea, of course, was to enable easy access to networks that would otherwise be hard to exploit. The NSA describes this program as one of their most productive TAO[1] operations.

It’s one thing to know this but it’s nevertheless shocking to see actual pictures of them installing the hardware. The pictures are one of the most recent revelations in the ongoing publication of the Snowden documents.

If you are a non-American buyer of networking equipment would you purchase from an American company? What if you are an American buyer? It’s not clear whether the vendors are complicit in this program or not but either way they should be doing everything in their power to disassociate themselves from it. Otherwise, customers will be running for the exits. Really, it’s hard to see why that’s not already happening.

If these vendors were complicit, then they’re getting what they deserve. If they weren’t, then they’ve got a real beef with the US government, which they should pursue in the courts.

Footnotes:

1

Tailored Access Operations.


The PhD Movie

If you’ve been a graduate student since the late 1990s, you’re probably familiar with PhD Comics, a comic drawn by Jorge Cham that captures the absurdities of his graduate studies in a way that will resonate with any graduate student. If you have graduate student experience and haven’t seen PhD Comics, you should definitely add it to your daily reading list.

A couple of years ago, Cham recruited some graduate students to make a movie that captures some of the fun of the comic. It’s about an hour long and has been for sale since its release. Now Cham is raising funds for a second movie and as part of that effort you can view the first movie for free during June. Just go to this page to enjoy the fun. It’s amazing how well the actors capture the comic’s characters.

You probably have to have been a graduate student to appreciate the in-jokes and humor of the movie but if you have that experience, the movie is sure to bring back those days of joy and pain. I love this comic and read it every day and I really enjoyed the movie. Now’s your chance to see it for free.

If you like it or are a PhD Comics fan, you might want to kick in some funds for the new movie. Just follow the second link above for the Kickstarter page.


Magnar Continues His WebRebels Videos

The estimable Magnar Sveen has, to use his words, come out of hibernation and continued the posting of his WebRebels talk. This video is part 5 of the series.

It’s been a while so if you don’t remember, the official video of the talk had technical problems but Sveen has recreated it bit by bit. If you’ve never seen Sveen talk, it’s a real treat. He’s an excellent speaker and his mastery of Emacs is impressive.

He’s best known for his spectacular Emacs Rocks! videos. If you’re an Emacs user and haven’t watched them, stop right now and have a look. His original talk at the Oslo WebRebels is astounding and you should definitely watch it too. You can get links to the other videos for the current WebRebels talk at this page.

If you like these videos and want to learn a bit more about Sveen, Sacha Chua interviewed him in one of her Emacs Chats.


Mutable String and Emacs Buffer Passing Style

Speaking of Christopher Wellons, he’s got a very interesting post over at null program on mutable strings and Emacs buffer passing style. Wellons starts by pointing out that strings in Emacs Lisp, as in many other languages, have a fixed size and any operation that changes the size requires that a new string be allocated and the current content copied into it. That can be a problem when you’re building up a string incrementally. Next, he observes that Elisp has a natural character-oriented data structure: buffers. Why not use a buffer to build up the string components and then turn the buffer into a string with buffer-string? Something along the lines of:

(with-temp-buffer
  ;; build up the string
  (buffer-string))
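To put a little flesh on that skeleton, here is a minimal sketch. The function my-numbered-lines is made up for illustration and is not from Wellons’ post; it builds its result one piece at a time in a temporary buffer and converts to a string only at the end.

```elisp
(defun my-numbered-lines (lines)
  "Return LINES, a list of strings, as one string with each line numbered."
  (with-temp-buffer
    (let ((n 0))
      (dolist (line lines)
        (setq n (1+ n))
        ;; Each insert grows the buffer in place, so the accumulated
        ;; result is never copied into a new, larger string.
        (insert (format "%d: %s\n" n line))))
    ;; Convert to a string once, after everything has been inserted.
    (buffer-string)))

(my-numbered-lines '("foo" "bar"))      ; returns "1: foo\n2: bar\n"
```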

That’s a good tip all by itself but Wellons goes further.

Suppose you are writing a function to construct a string. A natural technique is to have the function build the string in a buffer and then pass the buffer back to the caller who can do whatever they need with it (perhaps never coercing it into an actual string). The problem is that using that method makes it easy to leak buffers, which are not garbage collected. Wellons has a nice example of how that can happen in his post.

To avoid this problem, Wellons advocates what he calls a buffer passing style. With this approach, a buffer is implicitly passed to the function, which fills it with the desired data. “Implicitly” because the caller arranges for the current buffer to be the buffer that the function uses. Then the caller can kill the buffer after using the data. The actual idiom is something like:

(with-temp-buffer
  (fill-buffer-with-desired-data)       ;string building function
  ;; process the data
  )

This is nice because the with-temp-buffer macro takes care of killing the buffer when you reach the end of the macro body.
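Here is a small sketch of the idiom, with the caveat that the names my-insert-csv-row, my-csv-string, and my-csv-line-count are invented for this example and do not appear in Wellons’ post. The filling function knows nothing about which buffer it writes to; it just inserts at point in whatever buffer is current.

```elisp
(defun my-insert-csv-row (fields)
  "Insert FIELDS, a list of strings, into the current buffer as one CSV line.
No quoting is attempted; this is only a sketch."
  (insert (mapconcat #'identity fields ",") "\n"))

(defun my-csv-string (rows)
  "Return ROWS, a list of lists of strings, as a CSV string."
  (with-temp-buffer
    ;; The temporary buffer is current here, so it is passed implicitly.
    (dolist (row rows)
      (my-insert-csv-row row))
    (buffer-string)))

(defun my-csv-line-count (rows)
  "Return the number of CSV lines ROWS produces, without building a string."
  (with-temp-buffer
    (dolist (row rows)
      (my-insert-csv-row row))
    (count-lines (point-min) (point-max))))

(my-csv-string '(("a" "b") ("c" "d")))  ; returns "a,b\nc,d\n"
```

Both callers reuse the same filling function and neither can leak a buffer, because with-temp-buffer kills it on the way out even if the body signals an error.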

Take a look at Wellons’ post. It’s very informative and gives some fleshed out, realistic examples rather than the skeletons I give here.
