Xah Lee recently tweeted out a reference to one of his old pages on Emacs Lisp. This one covers specifying Unicode escape sequences in Emacs strings. It’s a handy guide and well worth a look. Doubtless you could find the same information in the Emacs documentation, but Lee’s page collects it in one place, which is useful since most of us won’t need the facility very often.
The TL;DR is that if you want to specify a Unicode character in an Emacs string, you can either insert it directly, like this: “Here is the Unicode character 😁 in a string.” Or you can specify it as an escape sequence, like this: “Here is the Unicode character \U0001f601 in a string.”
The rules are pretty simple. If the code point for the Unicode character has 4 or fewer hexadecimal digits, you specify it as \uxxxx, where each x is a hexadecimal digit and you must add leading zeroes if necessary. If the code point has 5 or 6 hexadecimal digits, you specify it as \U00xxxxxx, where, again, each x is a hexadecimal digit and you must add leading zeroes. In other words, \u always takes exactly 4 hex digits and \U always takes exactly 8.
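The padding rules are mechanical enough to capture in a few lines. Here’s a minimal sketch, in Python rather than Emacs Lisp, of a hypothetical helper, emacs_unicode_escape, that produces the escape text you’d type into an Emacs string for a given code point. It only does the zero-padding arithmetic described above; it assumes lowercase hex digits are acceptable (as in the \U0001f601 example) and doesn’t model Emacs’s reader itself.

```python
def emacs_unicode_escape(code_point: int) -> str:
    """Return the Emacs string escape text for code_point (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if code_point <= 0xFFFF:
        # 4 or fewer hex digits: \u followed by exactly four digits, zero-padded.
        return f"\\u{code_point:04x}"
    # 5 or 6 hex digits: \U followed by exactly eight digits,
    # so the padding always starts with 00.
    return f"\\U{code_point:08x}"


def emacs_escape_char(ch: str) -> str:
    """Convenience wrapper: escape a single character instead of a code point."""
    return emacs_unicode_escape(ord(ch))


if __name__ == "__main__":
    # Prints each character next to the escape you would write in Emacs.
    for ch in ["é", "λ", "😁"]:
        print(ch, emacs_escape_char(ch))
```

Running it prints \u00e9 for é, \u03bb for λ, and \U0001f601 for 😁, the last matching the escape in the example string.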
Take a look at Lee’s post for more details and further examples.