A couple of days ago, I wrote about upgrading to Clozure Lisp 1.7 and remarked that I should try it out with a miniproject, perhaps from Programming Praxis. I went over there and found the Sum of Two Integers problem, which looked interesting. The problem is given a list of integers and a target integer determine if there are two integers in the list that sum to the target.
I chose to interpret the rules strictly:
- You can make no assumptions about the list of integers. They could be positive, negative, or zero. Numbers might be repeated multiple times. They are not sorted and you do not know the maximum or minimum number without checking the list.
- Similarly, you can make no assumptions about the target number other than that it's an integer
- You must find exactly two integers that sum to the target number—you can't reuse a number or just find the target in the list. Thus, if the target is 10 and 5 is in the list, there must be another 5 in the list for (5, 5) to be a solution. Similarly, if 10 is in the list, that's not a solution unless 0 is also in the list.
Truth to tell, if this were a one-off most of us would just hack up the obvious n2 solution with two nested loops checking each pair of numbers in the list. Of course, none of us would admit to doing such a thing so the question is: can we find a better solution? My first thought was to use a bit array where the nth bit is set if n is in the list. That leads to a O(n) solution but there are some problems:
- You need a second array to indicate whether a given number is repeated at least once in the list.
- You really need to know the maximum number in the list to size the bit array properly.
- It doesn't handle negative numbers gracefully.
I did code up a solution like this under the assumption that there were no negative numbers but that doesn't solve the problem so we need another solution.
We can use almost the same solution by substituting a hash table for the bit array. With this solution, each integer in the list is a key for the hash table and the value corresponding to the key is the count of the occurrences of the key in the list of integers. We need to go through the list twice1: once to build the hash table and once to check to see if a number in the list has the required number that sums to the target.
The code below implements this solution. The
check routine on lines 9–19 does all the real work. For each number in the list (
have) we calculate the number we need to sum to the target (
need) and then check to see if
need is in the hash table. If it is, we've found a solution (line 18), if it's not we recursively call
check with the
cdr of the list (line 19). Lines 15–17 check for the special case of a number being exactly half of the target. In that case we've found the solution if the number is in the list at least twice.
1: (defun hash-numbers (numbers) 2: (let ((ht (make-hash-table :size (length numbers)))) 3: (dolist (n numbers) 4: (setf (gethash n ht) (1+ (gethash n ht 0)))) 5: ht)) 6: 7: (defun check-sums (goal numbers) 8: (let ((ht (hash-numbers numbers))) 9: (labels ((check (numbers) 10: (if (null numbers) 11: nil 12: (let* ((have (car numbers)) 13: (need (- goal have))) 14: (cond 15: ((= need have) (if (> (gethash need ht) 1) 16: (list have have) 17: (check (cdr numbers)))) 18: ((gethash need ht) (list have need)) 19: (t (check (cdr numbers)))))))) 20: (check numbers)))) 21:
This solution won't be as fast as the bit array solution (given a decent implementation of bit arrays) but it's still pretty efficient and it avoids all the problems with the bit array solution.
Can we do better than O(n)? What if we know the list is sorted?
I looked through the solutions that other Programming Praxis readers submitted looking for something better than O(n). I didn't find anything but Graham and some others noticed that you could combine the testing with the building of the hash table. Here's my translation of Graham's Python solution
(defun check-sums-2 (goal numbers) (let ((ht (make-hash-table :size (length numbers)))) (dolist (n numbers) (if (gethash (- goal n) ht) (return (list n (- goal n))) (setf (gethash n ht) t)))))
It still O(n) but faster because it only goes through the list once. It's also simpler.
1 Actually, the solution goes through the list 3 times; calculating the size of the list also requires reading through the list.