R and Babel

I decided to accept my own challenge, so I downloaded and installed R. It’s a huge system with a lot to learn, but I did manage to figure out enough to generate some basic statistics and a couple of graphs. I’ll be using the same data as in the 10,000 Steps post, but I won’t bother to reproduce the table here (it’s in the org file for this post, of course; I’m just not exporting it).

If you want to use R with Babel you really need to read the Org-babel-R page. Babel works a little differently with R and it’s easy to waste a lot of time trying to get things to work because you weren’t aware of those differences.

The first thing I wanted to do was calculate some basic statistics from the data. Nothing fancy, just maximum value, minimum value, mean, median, and standard deviation. That’s pretty easy in R because there are R primitives for them.

Here’s the code block for the basic statistics. As before, steps is the table of data, which Babel hands to R under the name inp. The inp[,2] is the second column of that data (it’s a slice of the array that Babel creates from the table for R’s use). I put it in vals just to avoid typing inp[,2] over and over. The only tricky part was figuring out how to form the output array, but that’s because I don’t really know R yet.

#+BEGIN_SRC R :var inp=steps :exports results
vals <- inp[,2]
array( c("Max", "Min", "Mean", "Median", "Std. Dev.",
  max(vals), min(vals), mean(vals), median(vals),
  sd(vals)), dim=c(5, 2))
#+END_SRC

Here’s the resulting table:

| Max       | 18078            |
| Min       | 840              |
| Mean      | 11665.6428571429 |
| Median    | 12572            |
| Std. Dev. | 5372.68656994208 |
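
One thing worth knowing about the array above: because c() is given both strings and numbers, R coerces everything to character, so the statistics come back as text. A data frame is one possible alternative that keeps the values numeric. This is only a sketch in plain R, not what the post actually used; the vals vector here is made-up sample data standing in for the second column of the steps table, and the column names Statistic and Value are my own choice.

#+BEGIN_SRC R
# Hypothetical step counts; in the post these would come from inp[,2].
vals <- c(840, 12572, 18078, 9500, 13250)

# A data frame holds the labels and the numbers in separate columns,
# so the numeric column is not coerced to character.
stats <- data.frame(
  Statistic = c("Max", "Min", "Mean", "Median", "Std. Dev."),
  Value     = c(max(vals), min(vals), mean(vals), median(vals), sd(vals))
)

stats
#+END_SRC

Keeping the numbers numeric means they can still be rounded or reused later, for example with round(stats$Value, 1).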

Next I wanted to generate a pie chart showing the distribution of steps in groups of 1,000. I was thinking that I’d have to do some low-level programming to build the frequency table. It turns out, however, that R has the hist function, which will do that for you. It will even plot a histogram. All you need to do is pass it the data and indicate where you want the breaks to be. This simple code block

#+BEGIN_SRC R :var inp=steps :file http://irreal.org/blog/wp-content/uploads/2011/05/wpid-hist.png :results graphics :exports results
hist(inp[,2], breaks=0+1000*(0:19), plot=TRUE, main="Steps Histogram", xlab="Steps")
#+END_SRC

results in the following histogram.

http://irreal.org/blog/wp-content/uploads/2011/05/wpid-hist.png
Most of the arguments to hist involve setting the graph and axes labels.
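
The breaks argument is the one that isn't about labels. The expression 0+1000*(0:19) just builds the vector of bucket boundaries 0, 1000, ..., 19000. As a small sketch (the names breaks_a and breaks_b are mine, for illustration), here it is next to what I believe is the equivalent seq call:

#+BEGIN_SRC R
# The expression used in the hist call: 20 boundaries, hence 19 buckets.
breaks_a <- 0 + 1000 * (0:19)

# The same boundaries written with seq().
breaks_b <- seq(0, 19000, by = 1000)

length(breaks_a)           # number of boundaries
range(breaks_a)            # lowest and highest boundary
all(breaks_a == breaks_b)  # do the two forms agree?
#+END_SRC

The boundaries have to span the whole range of the data; hist complains if some values fall outside them. With a maximum of 18078 steps, a top boundary of 19000 is enough.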

The output of hist is really an object (think structure) that has entries for things like the bucket breaks, bucket midpoints, and bucket counts. You can see that in action in the next code block, which uses hist again but with no plotting. Instead the output object is captured and used as input to the pie primitive.

#+BEGIN_SRC R :var inp=steps :file http://irreal.org/blog/wp-content/uploads/2011/05/wpid-pie-steps.png :results graphics :R-dev-args pointsize=7 :exports results
ftab <- hist(inp[,2], breaks=0+1000*(0:19), plot=FALSE)
pie(ftab$counts, labels = ftab$mids/1000, radius=.75, main="Frequency (thousands)")
#+END_SRC

I had to scrunch up the pie chart a bit to prevent the bucket labels from colliding too much. Still, it’s a nice chart for very little effort.

http://irreal.org/blog/wp-content/uploads/2011/05/wpid-pie-steps.png
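
If you want to look inside the object that hist returns, something like the following should do it. This is a sketch in plain R outside of Babel; the vals vector is made-up sample data, not the real steps table.

#+BEGIN_SRC R
# Hypothetical step counts; in the post these would come from inp[,2].
vals <- c(840, 12572, 18078, 9500, 13250)

# plot=FALSE returns the histogram object without drawing anything.
ftab <- hist(vals, breaks = 0 + 1000 * (0:19), plot = FALSE)

class(ftab)   # the object's class
names(ftab)   # the components it carries
ftab$breaks   # bucket boundaries
ftab$mids     # bucket midpoints (used for the pie labels above)
ftab$counts   # number of values that fell in each bucket
#+END_SRC

The counts and mids vectors have one entry per bucket, which is why they line up when passed to pie as the slice sizes and labels.
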
If you do any work involving statistics, R is definitely a tool you need. Of course, if you do any work involving statistics, you already know that. Using Org-mode and Babel is a great way to have all the data, graphs, and text for a paper in one file that can then easily be exported to HTML or LaTeX. I’ve said before that I really like Babel and the more I use it and learn about it the more I like it.
