Emacs Coding Systems

Those of us with, ahem, extensive experience can remember when ASCII was pretty much all there was. Things are much different and better today. Unicode is all but universal and, in the U.S. at least, UTF-8 is king. Emacs now uses a UTF-8-based encoding as its internal format, and for most of us UTF-8 is our preferred coding system. Sometimes, though, UTF-8 isn’t enough and we have to fall back on one of the other coding systems. Happily, most of us seldom, if ever, encounter this need, but when we do, Emacs is up to the challenge.

The incomparable Mickey over at Mastering Emacs has an excellent post on how to deal with coding systems in Emacs. If you find yourself needing to use an alternate (that is, non-UTF-8) coding system, this post has the information you need to get the job done. He even presents a bit of Elisp that automatically sets environment variables specifying the coding system for other applications. Even if all your work is in the ASCII subset of UTF-8, it’s worthwhile bookmarking his post against the day that you need to deal with other systems.
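If a file arrives in some other encoding, the conversion can also be done outside Emacs. Here’s a minimal sketch using the standard iconv utility; the file names are made up for the example.

```shell
# Create a file containing "café" encoded as ISO-8859-1 (Latin-1),
# where é is the single byte 0xE9.
printf 'caf\xe9\n' > latin1.txt

# Convert it to UTF-8; é becomes the two-byte sequence 0xC3 0xA9.
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > utf8.txt

# Inspect the bytes to confirm the conversion.
od -An -tx1 utf8.txt
```

Inside Emacs the analogous commands are 【Ctrl+x RET r】 (reread the file with an explicit coding system) and 【Ctrl+x RET f】 (set the coding system used when saving), both of which Mickey covers.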

Posted in General | Tagged | Leave a comment

The Many Faces of Regex

One of a programmer’s most useful tools, Jamie Zawinski notwithstanding, is regular expressions. In a strictly Lisp world, s-expressions solve an astounding number of problems but in the real world of mixed technologies regexes are an incredibly useful and necessary tool. The problem is that there is no single “standard regex” syntax.

Even in the Unix world, where regular expressions made their first beachhead, there are two versions of regular expressions: basic and extended. Today there are probably dozens of regex variants, but for most of us there are really only four:

  • Unix basic regular expressions
  • Unix extended regular expressions
  • Perl regular expressions
  • Emacs regular expressions

Of the four, Perl regular expressions are probably the richest variety but they all have their advantages. In an ideal world grep, egrep, perl, and Emacs would all support Perl regular expressions plus whatever useful features the others have. Sadly, that is not the case so we must deal with multiple versions.

For the weak-minded proprietor of Irreal, it’s always a challenge to keep the differences in mind. One way I’ve found to cope is Xah Lee’s Regex Tutorial. Lee has recently updated it, so I’m mentioning it again for the benefit of those who missed my previous discussion. I have it bookmarked and if you, like me, need some help remembering the details of the various subtypes, you should too.

Posted in Programming | Leave a comment

Two Factor Authentication for Gmail

Mat Honan’s terrifying tale of being hacked should make all of us examine our digital security closely. If, like me and many others, a significant part of your life is lived or stored on-line, Honan’s story makes clear how vulnerable you can be.

I’ll probably have more to say about the Honan debacle later but suffice it to say there are a couple of really important lessons to be learned. The first is that you must have reliable and continuous backups if you care at all about your data. Read Honan’s story to see how devastating, on a personal level, its loss can be.

The second lesson is that you must secure access to your on-line accounts. Honan was the victim of social engineering but there were still things he could have done to help mitigate the damage. One of those things is to lock down your Gmail account. These days, almost everyone has at least one and they often channel multiple accounts through a single Gmail account so that it is their gateway to all their email. The loss or compromise of your Gmail account can be devastating.

One way to prevent that is to use two factor authentication on your Gmail account. That can work in a couple of ways. The simplest is that when you sign on and give your password, Google will send an SMS message to your phone with a code that you have to enter in addition to the password. You can configure this so that Gmail will trust your computer for 30 days or more so that it isn’t as inconvenient as it might seem at first.

Matt Cutts, the head of the Google Web Spam team, has a nice post and video on how to set things up and some of the ways you can work with the system. I really recommend that you check out his post and video and that you implement two factor authentication. As Honan’s ordeal makes clear, the downside of failing to do so is just too horrible.

Posted in General | Tagged | 2 Comments

More You Can’t Make This Stuff Up

One hardly knows what to make of the Megaupload fiasco. On the one hand, Kim Dotcom hardly seems the picture of innocence. On the other hand, the worst that could be said of him is that there is some evidence he was involved with copyright infringement. Why, then, was it necessary to send an elite counterterrorism unit—complete with a helicopter insertion and serious combat weapons—to raid his New Zealand home and arrest him? Now, thanks to a judicial review of the raid, we have the answer.

It seems the FBI was convinced that Dotcom, Blofeld-like, had a doomsday device that would erase all evidence of his piracy from servers all over the world. I’d love to believe that story but, of course, it’s complete nonsense. Leaving aside the fact that the FBI had already seized those servers and shut them down, how exactly would this work? In the event, it took the task force several minutes to locate Dotcom in his home’s panic room, giving him adequate time to unleash his doomsday machine. Needless to say, neither the NZ government nor the FBI could produce any evidence of the putative device.

It seems clear that the New Zealand government and the FBI got caught flatfooted with their overreaching and made up a story that only a teenage Bond fan could believe. One needn’t be sympathetic to piracy and those who engage in it to recognize that this whole raid was way over the top. Assault weapons and helicopters for copyright infringement? Really?

Posted in General | Tagged | Leave a comment

Whither TextMate?

Allan Odgaard recently announced that he is open sourcing the popular Mac editor TextMate. I’ve long considered it one of the few editors suitable for serious programmers so I treated the announcement as good news. It means that (what I consider) the three most important editors are open source.

Others, however, were not as sanguine. John Gruber of Daring Fireball writes “Pretty sure this is it for TextMate…” and Josh Kerr says that this is what happens when you set out to rewrite your codebase. Marco Arment joked that TextMate was just sent to retire on a farm upstate. My friend Watts Martin agrees that this is the end for TextMate.

All these naysayers are smart guys and may well be correct, but it will be a shame if they are. One of the things that most everyone agrees really hurt TextMate was the six years without a major update while Odgaard rewrote the code. Now that the source is available, it’s at least conceivable that a critical mass of developers could coalesce around the codebase and provide the regular updates needed to keep TextMate a major player in the field of text editors. Certainly, there are a large number of developers with a huge investment in the muscle memory and mental model for TextMate. Of course, most of those developers won’t have the time or inclination to take on a maintainer’s role, but perhaps a few will. In any event, I hope so. TextMate is a great editor and deserves to live on.

Afterword: Odgaard speaks out and insists that he is not abandoning TextMate. Indeed, he says he will continue working on it as long as he is a Mac user.

Posted in General | 1 Comment

When SSL Is Not SSL

Troy Hunt has a nice post on SSL and how many sites misuse it. As Hunt says, SSL is not about encryption. The problem that Hunt is writing about is sites that deliver a login page, say, in http and then post the login credentials over https. The idea is that the credentials are sent encrypted so everything is nice and secure. Often these sites will even display a padlock icon suggesting that the login is secure.

The problem, as Hunt explains in detail, is that the user has no way of knowing to whom those encrypted credentials are being sent. With SSL, the user is assured[1] that the login screen comes from the site it purports to represent. If the login screen is not sent over SSL (indicated by an https connection), the user has no way of knowing where it came from or where the login credentials will be sent once the user enters them.

Hunt gives real examples of governments exploiting this vulnerability, so we’re not talking about a theoretical threat. Sadly, many sites continue to get this wrong, putting their users at risk. That authoritarian governments are exploiting it means that it can literally be a matter of life and death. Remarkably, when Google moved Gmail to SSL the increase in load was within 1% of existing load, so there really is no excuse for not using SSL, at least for logins.
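A quick way to audit a page for this problem is to look at where its login forms actually post. Here’s a rough sketch in shell that works against a saved copy of a page; the file and URLs are made-up examples.

```shell
# In practice you'd first save the page, e.g.
#   curl -s http://www.example.com/login > login.html
# (the URL is a placeholder). Here we create a sample page instead.
cat > login.html <<'EOF'
<form action="https://secure.example.com/login" method="post">
  <input type="text" name="user">
  <input type="password" name="pass">
</form>
EOF

# Extract every form action on the page.
grep -o 'action="[^"]*"' login.html
```

Note that an https form action is not enough by itself, which is exactly Hunt’s point: if the page was delivered over plain http, an attacker could have rewritten that action before the user ever saw the form.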

This is a great post and very informative. If you are developing Web sites, you definitely need to know this material. Hunt’s post is a great way to get started.

Footnotes:

[1] For various values of “assured.”

Posted in General | Tagged | 1 Comment

Emacs Compilation Errors

Over at the Definitely a plug blog there’s a very nice mini-tutorial on using the Emacs compilation facility. When you use 【Meta+x】 compile to run a compilation process, Emacs will parse the error output and allow you to step through the errors, jumping to the source of each error as you go. You can go to the next error in the current buffer with 【Ctrl+x `】 and move to the next or previous file with 【Meta+}】 and 【Meta+{】. More conveniently, you can move to the *compilation* buffer and use the buffer-specific navigation keys listed in the post to deal with each error.

Most of this is known to developers who use Emacs regularly but the really interesting part of the post is how to define rules for parsing the results of new (or previously unsupported) compilers. That this is possible should be obvious to any Emacs user but I must admit I didn’t know about it.

The really nice thing about defining new rules is that it’s pretty easy. There’s a simple example in the post that serves as a go-by. This is really interesting and useful material. If you compile things from within Emacs then I really recommend this post—it’s full of good information.
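The example in the post is the authoritative one, but the general shape of such a rule looks something like this sketch in Elisp. The tool name and message format here are entirely made up.

```elisp
;; Suppose a hypothetical checker emits errors like:
;;   mychecker: foo.c:12: undefined variable
;; Register a rule: group 1 is the file name, group 2 the line number.
(add-to-list 'compilation-error-regexp-alist-alist
             '(mychecker "^mychecker: \\([^:]+\\):\\([0-9]+\\):" 1 2))
(add-to-list 'compilation-error-regexp-alist 'mychecker)
```

With that in place, next-error and the *compilation* buffer keys treat the new tool’s output just like any supported compiler’s.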

Posted in Programming | Tagged | 1 Comment

Forensics with DRAKMA

I ran across the Open States website, which provides information on legislative activity for many (and eventually all) of the U.S. states. They have an API that allows you to query information via HTTP and get answers formatted as JSON. I thought it would be fun to play with this in Lisp but I didn’t have a Web client so I asked Google about Lisp Web clients and got pointed to DRAKMA, Edi Weitz’s Common Lisp Web client. DRAKMA provides a simple and easy to use library that allows you to make HTTP requests in Common Lisp.

I loaded it with Quicklisp and was quickly retrieving data for my state. The Open States site is a nice resource if you live in the United States and want to keep an eye on what your state legislators are up to.

While I was playing with DRAKMA it occurred to me that it would have been perfect for investigating the malware problem that I had a while ago. If you followed that sorry tale, you’ll recall that one of my site’s WordPress PHP functions was modified to serve some malicious JavaScript to Windows users. As part of cleaning up the site I needed to verify that the JavaScript was no longer being served. I originally did that with curl but that was a little limited. Using DRAKMA I was able to send a request, pretending to be a Windows/MSIE user, and check all the JavaScript scripts that came back. Here’s a sample run on the now clean site.

DRAKMA-USER> (ppcre:all-matches-as-strings "<script.*?</script>" (http-request "http://irreal.org/blog/" :user-agent :explorer))
GET /blog/ HTTP/1.1
Host: irreal.org
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Accept: */*
Connection: close

HTTP/1.1 200 OK
Date: Sat, 28 Jul 2012 16:00:21 GMT
Server: Apache
X-Pingback: http://irreal.org/blog/xmlrpc.php
X-Powered-By: PHP/5.2.17
Vary: *
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

("<script type='text/javascript' src='http://irreal.org/blog/wp-includes/js/jquery/jquery.js?ver=1.7.2'></script>"
 "<script type='text/javascript' src='http://irreal.org/blog/wp-content/plugins/nucaptcha/res/js/wp-nucaptcha-form.js?ver=3.4.1'></script>")

As you can see, only two scripts are served and they are both legitimate WordPress scripts. I set DRAKMA to print the HTTP headers so that I could verify that the correct User-Agent header was being sent. This is very nice and since you have the power of Lisp at your disposal, it’s easy to ask any appropriate question about the data that comes back. In the above example, for instance, I used Weitz’s Portable Perl Compatible Regular Expressions (PPCRE) library to pick out any scripts in the output.
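For comparison, here’s a rough shell equivalent of the same check. To keep it runnable offline, the fetch step is shown as a comment and a saved sample page is used instead.

```shell
# Fetching the page while masquerading as MSIE would look like:
#   curl -s -A 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)' \
#        http://irreal.org/blog/ > page.html
# Here we create a sample of the page instead.
cat > page.html <<'EOF'
<html><head>
<script type='text/javascript' src='http://irreal.org/blog/wp-includes/js/jquery/jquery.js?ver=1.7.2'></script>
</head><body></body></html>
EOF

# List every script element, as the PPCRE call does in the transcript.
grep -o "<script[^>]*>[^<]*</script>" page.html
```

It works, but this is where the shell version runs out of steam: asking richer questions about the responses is much easier with all of Lisp available.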

Posted in General | Tagged | Leave a comment

Now Let’s All Enjoy A Moment of Schadenfreude

ZDNet is reporting that SCO is in Chapter 7 and thus effectively dead. They’ve been in Chapter 11 since 2007 and really had no hope of getting out so this news is not surprising.

Those of us with memories of the computer industry that extend as far back as 5 years will remember the uproar and disruption that SCO caused with their suit against IBM for supporting Linux, which they claimed had stolen their copyrighted IP. It turned out, of course, that Linux had done no such thing and that in any event SCO didn’t own the copyrights to Unix that they were using to press their claims. These guys were the worst sort of IP trolls and almost all of geekdom will feel a bit of satisfaction at their belated demise.

By far, the best thing to come out of the debacle was Groklaw, which even today continues to defend free software and comment on the antics of those who prefer lawsuits to innovation. For those of you who were there and remember the excitement of pulling up Groklaw every day to see the latest news on the SCO spectacle, here’s Groklaw’s report on SCO’s demise written by PJ herself. The other great thing to come out of the SCO saga was the utter humiliation of those in the tech press who shamelessly spread SCO’s FUD and predicted their ultimate victory. We won’t sully the sacred Irreal franchise by mentioning their names but they know who they are.

Posted in General | Leave a comment

Reusability and NIH

Peter Donis has a nice post discussing a programming challenge given to Knuth and critiqued by McIlroy. Don Knuth, certainly one of the premier computer scientists (period), was asked to demonstrate literate programming by solving a simple problem involving finding the n most frequent words in a given text. Doug McIlroy, who’s also well known but not nearly as much as he should be[1], was asked to critique Knuth’s solution. McIlroy’s critique consisted of a 6-line shell script that solved the exact same problem as Knuth’s 10-page Pascal program.

McIlroy’s larger point was about reusability (follow the link to Donis’s post for the details) but what struck me about the story—and made me vaguely uncomfortable because I’m often guilty of this—was how often those of us with the hacker disposition are inclined to jump in and implement a solution to a problem in our $FAVORITE_LANGUAGE rather than use existing functionality as McIlroy did.

It’s amazing how often a short shell script can solve a problem simply and elegantly. It’s just crazy that we would rather write pages of code than simply use a pipeline of a few of the robust and tested Unix utilities. The problem, I think, is that we really love programming, and it seems like cheating to solve a problem with a line or two of shell script.
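McIlroy’s pipeline itself is worth seeing. This is a reconstruction from published accounts of the exchange; the wrapper function is my own addition for convenience.

```shell
# Print the q most frequent words in the text on standard input.
topwords() {
  tr -cs 'A-Za-z' '\n' |  # split input into one word per line
  tr 'A-Z' 'a-z'       |  # normalize case
  sort                 |  # bring identical words together
  uniq -c              |  # count each distinct word
  sort -rn             |  # most frequent first
  sed "${1}q"             # keep only the top q lines
}

printf 'the cat sat on the mat and the dog sat too\n' | topwords 2
```

Six small, battle-tested utilities, each doing one job, against ten pages of bespoke Pascal. That was McIlroy’s point.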

It’s worth remembering that the goal is—most often—solving the problem in the fastest and easiest way. Sometimes that means a simple shell script rather than rolling up our sleeves and cranking out code.

Footnotes:

[1] This story is from Dennis Ritchie as told on his home page.

Posted in Programming | 9 Comments