(Lots of) Stuff You Didn’t Know About Emacs and Unicode

Christopher Wellons has another great post on the minutiae of Emacs. This time it’s about Emacs Unicode pitfalls. Most of us know that Emacs uses UTF-8 (strictly, an extended variant of it) as its internal representation for text, but little more. That’s mostly okay, because almost all the time Emacs does the right thing without our having to worry about it.

It turns out, though, that there are some edge cases you probably don’t know about. For example, some non-ASCII characters display identically but are made up of different code points: “é” can be the single code point U+00E9 or a plain “e” followed by the combining acute accent U+0301. That means equality comparisons between strings that look the same can fail. Wellons discusses these cases and shows how to deal with them.
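To make the equality problem concrete, here is a minimal sketch in Python rather than Emacs Lisp, using the standard unicodedata module. It shows the two spellings of “é” comparing unequal, then equal once both are normalized to NFC (canonical composition). The helper names code_points and same_text are hypothetical, and normalization is one common remedy, not necessarily the approach Wellons recommends; Emacs ships comparable functions in its ucs-normalize library.

```python
import unicodedata

# "é" as one precomposed code point: U+00E9 LATIN SMALL LETTER E WITH ACUTE
precomposed = "\u00e9"
# "é" as a base letter plus a combining mark: "e" (U+0065) + U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0301"


def code_points(s: str) -> list[str]:
    """Hypothetical helper: list the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # Both strings render as "é" but are different code-point sequences.
    print(precomposed, decomposed)
    print(code_points(precomposed), code_points(decomposed))
    # Plain == compares code points, so this prints False.
    print(precomposed == decomposed)
    # After canonical normalization the strings compare equal, so this prints True.
    print(same_text(precomposed, decomposed))
```

Normalizing to NFC on both sides is the usual choice when you only care whether two strings are canonically equivalent; NFD would work equally well for the comparison, as long as both sides use the same form.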

You will rarely run into these cases, but Wellons tells you what to look out for and how to handle them when they do occur. This is a post you should read and bookmark for the times when you need the information it contains.

This entry was posted in Programming.