Programming Praxis has a simple problem that is meant to explore how easy your favorite programming language makes it to work with strings. The problem is to double every blank in a string. That’s pretty simple but some languages make you deal with string allocation or perhaps, in the case of languages with immutable strings, repeatedly allocate new strings.
I have two solutions for this using Elisp. The first is a sort of direct solution that doesn’t think too hard about the special functions Emacs has for dealing with strings.
(defun double-blanks (string) (with-temp-buffer (dolist (ch (string-to-list string)) (insert ch) (if (eql ch ? ) (insert ? ))) (buffer-string)))
This code deals with the fact that strings in Elisp are immutable by doing all the work in a temporary buffer as I discussed in this post. Except for the temporary buffer trick, it’s pretty much what any programmer would write. Of course, that
string-to-list call is going to do a lot of consing, so for a really long string it might make sense to get the string characters by hand.
Of course, this is Emacs so there’s a prebuilt solution.
(defun double-blanks (string) (replace-regexp-in-string " " " " string))
replace-regexp-in-string function works by building up a list of matches and combining the list into a single string at the end so it also avoids the problem a immutable strings. If there are a lot of blanks,
replace-regexp-in-string is going to do a lot of consing too.
If you’re dealing with large strings with lots of blanks, the best strategy is probably to combine the two approaches above: use a regular expression search for the blanks and build the output string in a buffer as you find them. That wouldn’t be hard and makes a good exercise for anyone interested in learning Elisp.
UPDATE Phil remarks that
string-to-list isn’t very efficient and suggests another strategy. I replied that it would be interesting to see if my (unimplemented) strategy or his would be more efficient and said that perhaps I’d try them out and report back. Phil, being Phil, took care of that before I could get around to it and go some interesting results.
For some reason his main post with the results isn’t showing even though it appears in the recent comments bar. Here’s the meat of that comment. I have to agree with Phil that it’s a bit surprising that the
split-string method was the winner but that goes to show that one’s intuition about these things is most often wrong.
(length teststr) 2070 (length (split-string teststr " ")) 553 (benchmark-run 10000 (double-blanks1 teststr)) ;; string-to-list (28.203663436 967 6.414597634999999) ;; uncompiled (23.110737061000002 968 8.26656119499986) ;; compiled (benchmark-run 10000 (double-blanks2 teststr)) ;; replace-regexp-in-string (21.614643389999998 1913 11.981117643999973) ;; uncompiled (25.317938649 1913 15.727532714000041) ;; compiled (benchmark-run 10000 (double-blanks3 teststr)) ;; search-forward/insert (8.588276816 149 0.9523469339999999) ;; uncompiled (8.355207331 147 1.2536677860000367) ;; compiled (benchmark-run 10000 (double-blanks4 teststr)) ;; split-string/mapconcat (7.9514880880000005 476 3.2033875379999657) ;; uncompiled (7.953855983 477 3.237625087999996) ;; compiled