Replace Digits By Subscripts In Emacs

Xah Lee had an interesting Emacs Lisp problem last week but I was away and unable to join the fun. The problem is to replace any occurrence of a digit in a string by the equivalent subscript. Thus 1 becomes ₁, 2 becomes ₂, and so on. As soon as I saw the problem the following solution popped into my mind:

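(defun replace-digits-by-subscript (str)
  ;; Call the lambda on each digit and splice its result into the string.
  (replace-regexp-in-string "[0-9]"
   (lambda (v) (format "%c" (+ (string-to-number v) 8320))) str))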

This works because the subscript digits occupy consecutive codepoints starting at 8320 (U+2080, SUBSCRIPT ZERO), so the subscript for a digit d always sits at codepoint 8320 + d. As each digit is found, it is converted to a number, added to 8320, and then turned back into a one-character string by the format function.
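To make the arithmetic concrete, here is a minimal sketch of the per-digit step on its own. The helper name my/digit-to-subscript is made up for illustration and is not part of the solution above.

```elisp
;; U+2080 (decimal 8320) is SUBSCRIPT ZERO; the subscripts for 0 through 9
;; occupy the ten consecutive codepoints starting there.
(defun my/digit-to-subscript (digit)
  "Return a one-character string with the subscript for DIGIT, an integer 0-9.
Hypothetical helper, named for illustration only."
  (format "%c" (+ digit 8320)))

(my/digit-to-subscript 0)   ; "₀"  (codepoint 8320 = #x2080)
(my/digit-to-subscript 2)   ; "₂"  (codepoint 8322 = #x2082)
(my/digit-to-subscript 9)   ; "₉"  (codepoint 8329 = #x2089)
```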

Someone else has a similar, though slightly more complicated, solution but Lee didn't like it because it depends on the particular coding of the character sets. I disagree with that. Although it's a good rule in general not to depend on a character's encoding, Unicode codepoint assignments are fixed once they are made, so there is no harm in making use of the structure of that encoding.
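For comparison, a version that avoids codepoint arithmetic might look something like the sketch below, which indexes into a literal string of subscripts. The names my/subscript-digits and my/replace-digits-by-lookup are hypothetical, and this is only one way such a lookup could be written, not the other solution mentioned above.

```elisp
;; The ten subscript digits spelled out, so nothing depends on where
;; they sit in the codepoint space.
(defconst my/subscript-digits "₀₁₂₃₄₅₆₇₈₉"
  "Subscript characters indexed by digit value (illustrative name).")

(defun my/replace-digits-by-lookup (str)
  "Return STR with each ASCII digit replaced by its subscript.
Hypothetical helper that uses a lookup string instead of arithmetic."
  (replace-regexp-in-string
   "[0-9]"
   (lambda (d)
     ;; Convert the matched digit to a number and use it as an index.
     (string (aref my/subscript-digits (string-to-number d))))
   str t t))

(my/replace-digits-by-lookup "H2O")   ; "H₂O"
```

It is a little longer and carries a table around, which is the price of not leaning on the encoding.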

If you wanted to do this interactively on some buffer, a similar solution uses query-replace-regexp with a regular expression of [0-9] and the replacement string

\,(format "%c" (+ \#& 8320))

Actually, it's too bad replace-regexp-in-string doesn't support the \, construct as that would make the solution to the original problem even easier.
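If you want the buffer version without the interactive prompts, something along these lines should work. It is a sketch using the usual re-search-forward and replace-match loop; the command name my/subscript-digits-in-region is invented for illustration.

```elisp
(defun my/subscript-digits-in-region (beg end)
  "Replace each ASCII digit between BEG and END with its subscript.
Hypothetical command, named for illustration only."
  (interactive "r")
  (save-excursion
    (goto-char beg)
    ;; Each replacement swaps one character for one character, so END
    ;; stays a valid bound for the whole loop.
    (while (re-search-forward "[0-9]" end t)
      (replace-match
       (string (+ (string-to-number (match-string 0)) 8320))
       t t))))
```

Select a region and run it with M-x, or call it from Lisp with explicit positions, for example (my/subscript-digits-in-region (point-min) (point-max)).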

  • nice solution.

    Nice use of "format", which i have always underused. (because somehow i find "format" a bad hack, from its C lineage. More precisely, i think it's because its syntax is completely ad hoc. e.g. it supports printing in hexadecimal, but not in an arbitrary number base, and its syntax cannot be extended... Many of these thoughts come from my Mathematica background, where everything is as general as possible)

    after reading your comment about using charset code point solution, it lessened my thought about it being a hack.

    • jcs

      Yup, it's really just printf but I've crushed so much C code that it seems natural to me.