Exploratory Data Analysis with Unix

Seth Brown over at Dr. Bunsen has a great post on how the ordinary Unix coreutils can be used for exploratory data analysis. His first example

(head -5; tail -5) < data

to show the first and last 5 lines of a file is something that we all understand immediately but might not think of doing ourselves.
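
If you want to try the idea on a named file with an adjustable line count, here is a minimal sketch. The peek function and the sample file are my own illustrative inventions, not something from Brown's post, and the function names the file twice rather than sharing one redirected input between head and tail.

```shell
# peek is a hypothetical helper, not part of Brown's post.
# Usage: peek FILE [N]   (N defaults to 5)
peek() {
    file=$1
    n=${2:-5}
    head -n "$n" "$file"   # first N lines
    tail -n "$n" "$file"   # last N lines
}

# Hypothetical sample input: the integers 1 through 20, one per line.
sample=$(mktemp)
seq 1 20 > "$sample"

# Show the first and last three lines of the sample.
peek "$sample" 3

rm -f "$sample"
```

Naming the file twice costs a second open, but it means the result does not depend on where head leaves the read position of a shared input stream.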

He goes on to show how to change the shape of data with utilities like paste and how to enumerate data with wc and simple awk scripts. He then gives some examples of massaging data into forms suitable as input to statistical analysis and plotting programs.
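
To give a flavor of those three tools, here is a rough sketch of my own. The sample values and the file are made up for illustration and are not taken from Brown's post.

```shell
# Hypothetical sample data: six values, one per line.
values=$(mktemp)
printf '%s\n' 3 1 4 1 5 9 > "$values"

# Reshape: fold the single column into rows of three tab-separated fields.
# Each "-" tells paste to read the next line from standard input.
paste - - - < "$values"

# Enumerate: count the number of lines (records) in the file.
wc -l < "$values"

# Summarize: print the sum and the mean of the first field.
awk '{ sum += $1 } END { print sum, sum / NR }' "$values"

rm -f "$values"
```

Each command reads the same plain text file, so the pieces can be swapped or chained with pipes as the analysis evolves.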

There are a lot of Unix utilities that many of us have forgotten or remember only dimly. That’s too bad because, as Brown demonstrates, they can be extraordinarily useful. As one of the commenters remarked, “…far more people need to get back to basics and learn coreutils.” I couldn’t agree more.
