Unicode Representation in Emacs Strings

Xah Lee posted a useful fact that I’m sure I knew but had forgotten or at least not internalized. The tip is how to encode Unicode characters in Emacs strings. Given that Emacs supports Unicode and, indeed, uses UTF-8 as its default file format, you can usually just place the Unicode symbol right in the string. You can also encode the symbol as \uXXXX (exactly four hex digits) or \U00XXXXXX (eight hex digits, needed for code points above U+FFFF); see Lee’s post for details.
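As a quick illustration, here is a minimal sketch showing that the literal and escaped forms produce the same string. The variable names are hypothetical, made up for this example, and it assumes the file is read as UTF-8.

```elisp
;; Hypothetical names, for illustration only.
;; The literal character and its \u escape denote the same string.
(defvar my-lambda-literal "λ")
(defvar my-lambda-escaped "\u03bb")     ; U+03BB GREEK SMALL LETTER LAMDA

;; Code points above U+FFFF need the eight-digit \U form.
(defvar my-emoji-escaped "\U0001F600")  ; U+1F600

(string= my-lambda-literal my-lambda-escaped)   ; t
(string= my-emoji-escaped (string #x1F600))     ; t
(length my-emoji-escaped)                       ; 1, a single character
```

Evaluating the last three forms with C-x C-e (or in M-x ielm) should confirm that the escape is just another way of spelling the same character.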

Why would we need this alternative representation? As Lee points out, sometimes you want to embed a non-printable character in the string, and the \u or \U representation is more convenient, especially when the non-printable character involves cursor motion of some type.
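A small sketch of the non-printable case, using U+200B ZERO WIDTH SPACE as a stand-in (my choice of character, not necessarily the one from Lee’s post). The variable names are hypothetical.

```elisp
;; Hypothetical names, for illustration only.
;; Typed literally, a zero width space is invisible in the source;
;; the escape makes its presence obvious.
(defvar my-zwsp "\u200b")               ; U+200B ZERO WIDTH SPACE

;; concat keeps the escape visually separate from the surrounding text.
(defvar my-joined (concat "foo" my-zwsp "bar"))

(length my-joined)                      ; 7: three + one + three characters
(string= my-joined "foobar")            ; nil, the hidden character counts
```

With the literal character in place of the escape, the two strings in the last comparison would look identical on screen, which is exactly the kind of thing that is painful to debug.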

Another use is dealing with missing glyphs in a font. For example, the Inconsolata font that I use doesn’t support some of the glyphs that I need. An easy way of representing them is to use the alternative encoding. I can still embed the character itself in the string, of course, but it will appear as a tiny sliver of white space. Unless you look carefully, it appears as if nothing is there. With the alternative encoding you can see that something’s there, even if you have to look up what it represents.
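For the lookup step, one possible approach is to ask Emacs for the character’s Unicode name. This is only a sketch: my-char-name is a hypothetical helper, and U+2192 is just an example code point, not one of the glyphs I’m actually missing.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-char-name (char)
  "Return the Unicode name of CHAR, or nil if it has none."
  (get-char-code-property char 'name))

(my-char-name ?\u2192)          ; "RIGHTWARDS ARROW"

;; char-displayable-p reports whether the current display can show
;; the character; the result depends on your fonts and terminal.
(char-displayable-p ?\u2192)
```

Interactively, M-x describe-char with point on the character gives the same name along with the font (if any) used to display it.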
