A Simple Programming Praxis Solution in Elisp

Programming Praxis has a simple problem that is meant to explore how easy your favorite programming language makes it to work with strings. The problem is to double every blank in a string. That’s pretty simple but some languages make you deal with string allocation or perhaps, in the case of languages with immutable strings, repeatedly allocate new strings.

I have two solutions for this using Elisp. The first is a sort of direct solution that doesn’t think too hard about the special functions Emacs has for dealing with strings.

(defun double-blanks (string)
  (with-temp-buffer
    (dolist (ch (string-to-list string))
      (insert ch)
      (if (eql ch ? ) (insert ? )))
    (buffer-string)))

This code deals with the fact that strings in Elisp are immutable by doing all the work in a temporary buffer as I discussed in this post. Except for the temporary buffer trick, it’s pretty much what any programmer would write. Of course, that string-to-list call is going to do a lot of consing, so for a really long string it might make sense to get the string characters by hand.

Of course, this is Emacs so there’s a prebuilt solution.

(defun double-blanks (string)
  (replace-regexp-in-string " " "  " string))

The replace-regexp-in-string function works by building up a list of matches and combining the list into a single string at the end so it also avoids the problem a immutable strings. If there are a lot of blanks, replace-regexp-in-string is going to do a lot of consing too.

If you’re dealing with large strings with lots of blanks, the best strategy is probably to combine the two approaches above: use a regular expression search for the blanks and build the output string in a buffer as you find them. That wouldn’t be hard and makes a good exercise for anyone interested in learning Elisp.

UPDATE [2017-04-10 Mon 13:49]: In the comments Phil remarks that string-to-list isn’t very efficient and suggests another strategy. I replied that it would be interesting to see if my (unimplemented) strategy or his would be more efficient and said that perhaps I’d try them out and report back. Phil, being Phil, took care of that before I could get around to it and go some interesting results.

For some reason his main post with the results isn’t showing even though it appears in the recent comments bar. Here’s the meat of that comment. I have to agree with Phil that it’s a bit surprising that the split-string method was the winner but that goes to show that one’s intuition about these things is most often wrong.

(length teststr)    
2070    

(length (split-string teststr " "))    
553    

(benchmark-run 10000 (double-blanks1 teststr)) ;; string-to-list    
(28.203663436 967 6.414597634999999) ;; uncompiled    
(23.110737061000002 968 8.26656119499986) ;; compiled    

(benchmark-run 10000 (double-blanks2 teststr)) ;; replace-regexp-in-string    
(21.614643389999998 1913 11.981117643999973) ;; uncompiled    
(25.317938649 1913 15.727532714000041) ;; compiled    

(benchmark-run 10000 (double-blanks3 teststr)) ;; search-forward/insert    
(8.588276816 149 0.9523469339999999) ;; uncompiled    
(8.355207331 147 1.2536677860000367) ;; compiled    

(benchmark-run 10000 (double-blanks4 teststr)) ;; split-string/mapconcat    
(7.9514880880000005 476 3.2033875379999657) ;; uncompiled    
(7.953855983 477 3.237625087999996) ;; compiled
This entry was posted in Programming and tagged , . Bookmark the permalink.
  • grimnebulin2

    Emacs strings are mutable; you can change individual characters with the aset function. You can't change the string's length, though, which considerably restricts the situations where the mutability is useful.

    I once dealt with a library that invoked an external command, passing it a "-h" switch. For reasons that escape me, my local version of the command needed the switch specified as "-H". I couldn't easily tell the library to use the alternate switch directly, but I did have access to the command before it was called, so I used aset to capitalize the switch. Hacky, but it worked.

  • Phil

    Using with-temp-buffer is a classic elisp approach, but using string-to-list and processing that list is unnecessarily complicated compared to inserting the string into the temp buffer, and then processing the buffer text directly. While search-forward for a space, insert another space. No additional list required.

    • jcs

      Absolutely, that's why the caveat about the consing that string-to-list is going to do. An interesting question is whether your solution or the (unimplemented) one that I mentioned at the end is more efficient. My solution brings in the heavier machinery of the regex engine but doesn't require repeatedly moving the tail of the string to the right as yours does. If I over caffeinate in the next couple days, perhaps I'll try them out and write up the results. Probably scanning the original string in place and moving the pieces into the buffer is fastest, if slightly more complicated.

      All that said, if I were writing code that wasn't time critical or was only going to be used once or twice, I'd probably use string-to-list just because it's easy and gets me back into the Lisp comfort zone.

      • Phil

        I've just done some benchmarking:


        (length teststr)
        2070

        (length (split-string teststr " "))
        553

        (benchmark-run 10000 (double-blanks1 teststr)) ;; string-to-list
        (28.203663436 967 6.414597634999999) ;; uncompiled
        (23.110737061000002 968 8.26656119499986) ;; compiled

        (benchmark-run 10000 (double-blanks2 teststr)) ;; replace-regexp-in-string
        (21.614643389999998 1913 11.981117643999973) ;; uncompiled
        (25.317938649 1913 15.727532714000041) ;; compiled

        (benchmark-run 10000 (double-blanks3 teststr)) ;; search-forward/insert
        (8.588276816 149 0.9523469339999999) ;; uncompiled
        (8.355207331 147 1.2536677860000367) ;; compiled

        (benchmark-run 10000 (double-blanks4 teststr)) ;; split-string/mapconcat
        (7.9514880880000005 476 3.2033875379999657) ;; uncompiled
        (7.953855983 477 3.237625087999996) ;; compiled

      • Phil

        So (mapconcat 'identity (split-string string " ") " ") turns out to be the best overall performer in my tests, interestingly enough. In the pathological case of a string which is entirely spaces, it was notably quicker (19s) than the search-forward/insert loop (which was 2nd best at 29s). For a test string with no spaces, only the string-to-list approach took any appreciable amount of time to run.

        • jcs

          I got notified of your other post (with the results) by Disqus but it's not showing up here for some reason. If it doesn't resolve itself by today, I'll add your results as an update to the original post.