How would you solve this problem?
Given a file of text lines, remove the duplicate lines while keeping the original order.
The first thing that springs to mind is using
sort | uniq
or perhaps
sort -u
but both of those reorder the lines as a side effect of sorting, so they fail to maintain the original order.
Lazarus Lazaridis has a very nice solution that accomplishes the task with an AWK one-liner. I wrote about the power of AWK one-liners a couple of weeks ago, so this is a nice coda. Lazaridis’ solution leverages AWK’s associative arrays, so a similar solution is available in many other languages. Try, though, to implement it in, say, Python: it’s straightforward but definitely not a one-liner.
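Lazaridis’ one-liner isn’t reproduced here, but the classic AWK idiom for order-preserving deduplication, quite possibly the one he uses, looks like this (input.txt stands in for whatever file you’re processing):
awk '!seen[$0]++' input.txt
The associative array seen counts how many times each whole line ($0) has occurred. The expression !seen[$0]++ is true only on a line’s first appearance, and since there’s no explicit action, AWK falls back to its default action of printing the line.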
Lazaridis’ post explains his solution in detail, so it’s worth reading even if you aren’t an AWK user. He also explains how you can solve the problem with sort, although not in a trivial way.
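His sort-based pipeline isn’t shown here, but a common way to do it with the standard tools (not necessarily the variant in his post) is to number the lines, deduplicate on the content, and then restore the original order:
cat -n input.txt | sort -uk2 | sort -n | cut -f2-
Here cat -n prefixes each line with its line number, sort -uk2 sorts on the content (field 2 onward) and, relying on GNU sort keeping the first of a run of equal keys, discards later duplicates, sort -n puts the survivors back in their original order, and cut -f2- strips the line numbers again.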
Notice how the AWK solution leverages the two powerful features of AWK that I discussed in my AWK post: the implicit main loop and associative arrays. It’s amazing how many problems can be trivially solved by using those two features.
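Spelled out as a longer AWK program, the same idea makes those two features explicit; this is just the one-liner above rewritten as a commented pattern-action rule:
awk '
!seen[$0] {        # AWK runs this rule for every input line: the implicit main loop
    print          # first time this exact line appears, so print it
    seen[$0] = 1   # record it in the associative array for later lookups
}' input.txt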