Bjarne Stroustrup on Linked Lists

Suppose you have a large, sorted sequence of integers that you wish to store in memory. You need to be able to efficiently add and delete elements to and from the sequence while keeping it sorted. What data structure would you use?

The two natural candidates are arrays and linked lists. Which one do you think is most efficient? If you’re like most programmers you would choose the linked list to limit the amount of data you have to move. For example, if you use arrays and your sequence is 1,000,000 integers long, you will, on average, have to move 500,000 integers for each addition or deletion. With a linked list, on the other hand, it’s simply a matter of adjusting a couple of pointers. It makes sense, then, that a linked list is the most efficient data structure to use.

Except it’s not. The linked list is, in fact, a couple of orders of magnitude slower than using an array. How can this be? It turns out, as Bjarne Stroustrup explains in this video that the process is completely dominated by the linear search to find the insertion/deletion point. The reason for that has to do with the cache; watch the video for the details.

If you’re familiar with Emacs internals, you’ve probably wondered why Emacs treats its buffers as arrays and moves data when it needs to insert or delete characters from the middle of the buffer rather than using a linked list of lines as many editors do. It may seem grossly inefficient to do things this way but as the video shows, it is, in fact, far more efficient on modern machines.

The video is only 7 minutes and 45 seconds but makes an important point. It’s entertaining as well as informative and well worth your time.

This entry was posted in Programming and tagged . Bookmark the permalink.