Tidying Data for Statistical Analysis

William Denton has an interesting post on tidying data to make statistical analysis easier. It may interest Irreal readers because he uses Org mode and Babel both to tidy the data and to analyze it.

The idea of tidy data comes from Hadley Wickham's Tidy Data paper: the data is arranged in tables where each variable is a column, each observation is a row, and each type of observational unit is a table. Wickham’s paper explains this in greater detail.

Starting with a simple data set of expenditures for two years, Denton first arranges it in a table that is easy for humans to read but awkward for statistical analysis. Using some R code, he rearranges the data into a tidy format and then produces a couple of graphs that display the data in meaningful ways. All of this is done in an Org file using Babel.
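To make the rearrangement concrete, here is a minimal sketch of the same kind of reshaping. It is not Denton's code: he works in R, while this uses plain Python, and the expenditure figures, category names, and the `tidy` helper are all invented for illustration. The human-friendly table has one column per year; the tidy version has one row per observation, with the year as a variable of its own.

```python
# Hypothetical expenditure figures, invented purely for illustration.
# This is the "easy to read" layout: one row per category, one column per year.
wide = [
    {"category": "Books", "2022": 100, "2023": 120},
    {"category": "Software", "2022": 50, "2023": 65},
]


def tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows into one row per observation (hypothetical helper)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every output row
            long_rows.append({id_key: row[id_key], var_name: key, value_name: value})
    return long_rows


# Tidy layout: each variable (category, year, amount) is a column.
tidy_rows = tidy(wide, "category", "year", "amount")
for r in tidy_rows:
    print(r)

# With tidy rows, a per-year total is a simple grouping over one column.
totals = {}
for r in tidy_rows:
    totals[r["year"]] = totals.get(r["year"], 0) + r["amount"]
print(totals)
```

Once the year is an ordinary column rather than part of the table's header, grouping, filtering, and plotting by year all become the same kind of operation, which is the point of the tidy layout.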

This exact subject is of interest mainly to data scientists, of course, but I like seeing how Emacs and Org can be leveraged to help make the workflow easier.
