Removing Diacritics From A String

Jeremy Friesen has another interesting post. This time it’s about removing diacritics from a string. His problem is that he uses the same string for the title of a post and for the name of the file containing the post, but when a title contains diacritics, he doesn’t want them to appear in the file name.

To solve that problem he wrote some Elisp to replace each character having a diacritic with the corresponding character without the diacritic. For example, ç would be replaced with c. His code works fine but strikes me as being unlispy even though it’s written in Elisp.

Here, as an exercise, is my version of his code. First we need to generate the mapping from the characters with diacritics to those without. Friesen’s code produces cons cells with the characters as strings: ("ç" . "c"), for example. Since we will be replacing characters, it makes sense to keep the mappings as characters rather than strings. Here’s my replacement for his code that builds the mapping jf/diacritics-to-non-diacritics-map.

(require 'cl-lib)   ; provides `cl-map'

(defvar jf/diacritics-to-non-diacritics-map
  ;; Pair each accented character with its plain counterpart, position by position.
  (cl-map 'list #'cons
          "ÀÁÂÃÄÅàáâãäåÒÓÔÕÕÖØòóôõöøÈÉÊËèéêëðÇçÐÌÍÎÏìíîïÙÚÛÜùúûüÑñŠšŸÿýŽž"
          "AAAAAAaaaaaaOOOOOOOooooooEEEEeeeeeCcDIIIIiiiiUUUUuuuuNnSsYyyZz"))

That seems to me to be much simpler than Friesen’s solution. There’s no nasty indexing into a list: cl-map walks both strings in parallel and handles the whole thing.
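To make the shape of the result concrete, here is a minimal sketch that uses cl-map (the cl-lib name for Common Lisp’s map) on two deliberately short strings. The name my/demo-map and the three-character strings are hypothetical and stand in for the full table. It also assumes the source file stores the accented characters in precomposed form.

```elisp
(require 'cl-lib)

;; Hypothetical, shortened inputs used only for illustration.
(defvar my/demo-map
  (cl-map 'list #'cons "çéñ" "cen"))

;; Each entry is a cons of two characters, which Elisp stores as integers.
(car my/demo-map)              ; the pair (?ç . ?c), printed as (231 . 99)

;; Looking up a character key returns its plain counterpart.
(cdr (assoc ?é my/demo-map))   ; ?e, printed as 101
```

Because the keys are characters rather than one-character strings, a lookup can be fed directly with the elements of the string being converted.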

His code to actually replace the diacritics is reasonably Lispy and his use of reduce is clever. It would never have occurred to me. Here’s what I came up with:

;; Relies on `cl-map' from cl-lib; (require 'cl-lib) first.
(defun jf/remove-diacritics-from (string)
  "Remove the diacritics from STRING."
  (cl-map 'string
          (lambda (c) (or (cdr (assoc c jf/diacritics-to-non-diacritics-map)) c))
          string))

If nothing else, my code has the advantage of parsimony.
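As a quick check of the behavior, the sketch below applies the same lookup-or-keep idea to a sample title. The names my/demo-diacritics-map and my/demo-remove-diacritics are hypothetical, and the three-entry mapping is only a stand-in for the full one, so treat it as an illustration rather than a drop-in replacement.

```elisp
(require 'cl-lib)

;; Hypothetical three-entry mapping, standing in for the full table.
(defvar my/demo-diacritics-map
  (cl-map 'list #'cons "çéñ" "cen"))

(defun my/demo-remove-diacritics (string)
  "Return STRING with the characters in `my/demo-diacritics-map' replaced."
  (cl-map 'string
          ;; Use the mapped character if there is one, else keep C unchanged.
          (lambda (c) (or (cdr (assoc c my/demo-diacritics-map)) c))
          string))

(my/demo-remove-diacritics "Façade señor café")  ; => "Facade senor cafe"
(my/demo-remove-diacritics "plain text")         ; => "plain text"
```

Characters that have no entry in the mapping pass through untouched, so plain ASCII titles come out exactly as they went in.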

Whichever code you prefer, Friesen’s idea is a useful one. Why have diacritics in a file name when they only serve to make it harder to type?
