Replace Digits By Subscripts In Emacs

Xah Lee had an interesting Emacs Lisp problem last week, but I was away and unable to join the fun. The problem is to replace every digit in a string with the equivalent subscript: 1 becomes ₁, 2 becomes ₂, and so on. As soon as I saw the problem, the following solution popped into my mind:

(defun replace-digits-by-subscript (str)
  (replace-regexp-in-string
   "[0-9]"
   (lambda (v) (format "%c" (+ (string-to-number v) 8320))) str))

This works because the subscript digits occupy ten consecutive codepoints starting at 8320 (U+2080), so the subscript for a digit d always sits at codepoint 8320 + d. As each digit is found, it is converted to a number, added to 8320, and then turned back into a one-character string by the format function.
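A quick way to check that arithmetic is to evaluate a few expressions in the *scratch* buffer or with M-:. This is only a sanity check, not part of the solution; note that the number being added to 8320 is the digit's numeric value, not the codepoint of the ASCII digit character.

```elisp
;; SUBSCRIPT ZERO is U+2080, which is 8320 in decimal.
(= ?₀ #x2080 8320)                        ; => t

;; Digit value + 8320 gives the codepoint of the matching subscript.
(string (+ (string-to-number "2") 8320))  ; => "₂"

;; The ASCII character 0 has codepoint 48, so the gap between the two
;; characters themselves is 8272, not 8320.
(- ?₀ ?0)                                 ; => 8272
```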

Someone else has a similar, though slightly more complicated, solution, but Lee didn’t like it because it depends on the particular coding of the character sets. I disagree with that. Although it’s a good rule in general not to depend on a character’s encoding, Unicode codepoint assignments are fixed, so there is no harm in making use of their structure.
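For comparison, here is a minimal sketch of what a version that avoids codepoint arithmetic might look like. It is not Lee's solution or anyone else's from that discussion; the function name my-digits-to-subscripts-by-table is made up for illustration, and the lookup table is simply a string whose character at index d is the subscript for digit d.

```elisp
;; Hypothetical name; a table-driven variant that makes no assumption
;; about where the subscripts live in Unicode.
(defun my-digits-to-subscripts-by-table (str)
  "Return STR with each ASCII digit replaced by its subscript."
  (let ((subscripts "₀₁₂₃₄₅₆₇₈₉"))  ; index d holds the subscript for digit d
    (replace-regexp-in-string
     "[0-9]"
     (lambda (digit)
       ;; Look the subscript up by the digit's numeric value.
       (string (aref subscripts (string-to-number digit))))
     str t t)))                      ; t t: keep case, treat result literally

(my-digits-to-subscripts-by-table "H2O")  ; => "H₂O"
```

The table costs a line more than the arithmetic and buys independence from the codepoint layout, which is the trade-off the disagreement is about.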

If you wanted to do this interactively on some buffer, a similar solution uses query-replace-regexp with a regular expression of \([0-9]\) and the replacement string

\,(format "%c" (+ \#1 8320))

Actually, it’s too bad replace-regexp-in-string doesn’t support the \, construct as that would make the solution to the original problem even easier.
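If you want the buffer version as a command rather than an interactive query, the usual search-and-replace loop does the same job. The following is a rough sketch under that assumption; the command name my-subscript-digits-in-region is invented here, and it replaces every digit in the region without asking.

```elisp
;; Hypothetical command name; non-interactive counterpart of the
;; query-replace-regexp recipe above.
(defun my-subscript-digits-in-region (beg end)
  "Replace each ASCII digit between BEG and END with its subscript."
  (interactive "r")
  (save-excursion
    (save-restriction
      ;; Narrowing keeps the search inside the region.
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (re-search-forward "[0-9]" nil t)
        (replace-match
         ;; Same arithmetic as before: digit value + 8320.
         (string (+ (string-to-number (match-string 0)) 8320))
         t t)))))
```

Select a region and run M-x my-subscript-digits-in-region, or call it from Lisp with explicit buffer positions.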
