R: open source statistical computing

r_20080131.jpg

I was digging around for an open source statistics package today and came across R, a GPLed statistics and data analysis suite. Sweet!

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
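To give a rough idea of what that looks like, here is a minimal sketch that draws a normal density curve and puts a formula in the title using R's plotmath expressions. The file name and the choice of curve are my own assumptions for illustration, not something taken from the R site.

```r
# Write the plot to a PDF file (hypothetical file name in a temp directory)
out_file <- file.path(tempdir(), "density_sketch.pdf")
pdf(out_file, width = 6, height = 4)

# Draw the standard normal density between -4 and 4,
# with axis labels and title written as plotmath expressions
curve(dnorm(x), from = -4, to = 4,
      xlab = expression(x),
      ylab = expression(phi(x)),
      main = expression(phi(x) == frac(1, sqrt(2 * pi)) * e^{-x^2 / 2}))

# Close the device so the file is flushed to disk
dev.off()
```

Swapping pdf() for png() or another device changes the output format without touching the plotting code.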

So I've been messing around with this for the last half hour and it's really an exciting package, especially if you're a coder or unix geek. You interface with R through a command line programming interface, executing simple statements, setting variables, and defining functions. It feels similar to issuing commands at a unix prompt, except you're working with data sets instead of file descriptors.
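A short session along those lines might look like the sketch below. The data values and the function name rescale are made up for illustration.

```r
# Set a variable: a small numeric vector (made-up values)
heights <- c(1.62, 1.75, 1.80, 1.68)

# Execute a simple statement: the mean of the vector
mean(heights)

# Define a function (hypothetical name) that rescales values to the 0-1 range
rescale <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Call the function on the data
rescale(heights)
```

Typing an expression on its own at the prompt prints its value, which is what makes the console feel like a shell.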

What's cool is the robust capability of the standard function set. Want to read in a data set from a tab delimited table you found on the internet? Check this out:

# Read a table in from a URL (tab delimited table with a header row)
Mydata <- read.table('http://someserver.com/table.txt', header=TRUE, sep='\t')

# Display summary (mean, median, min, max, etc.) for each column
summary(Mydata)

# Get the standard deviation for the values in column "foo"
attach(Mydata)
sd(foo)
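Since someserver.com is only a placeholder, here is a self-contained sketch of the same steps that you can paste straight into the console. It assumes a tiny made-up table held in a string, and it addresses the column with $ rather than attach(), which avoids name clashes with other variables.

```r
# Stand-in for the remote file: a small tab-delimited table (made-up values)
raw <- "foo\tbar\n1.5\t10\n2.5\t20\n4.0\t30\n"

# Parse it the same way as above, but from a string instead of a URL
Mydata <- read.table(text = raw, header = TRUE, sep = "\t")

# Per-column summary (min, quartiles, median, mean, max)
summary(Mydata)

# Standard deviation of column "foo", addressed with $ instead of attach()
sd(Mydata$foo)
```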

Learning the command set is a little daunting at first, but the console even does tab completion. If you don't know what a function does, just put a question mark before it. For instance, "?sd" will quickly pull up help for the standard deviation function.
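A few related ways to look things up from the console, sketched with base R functions only:

```r
# Open the help page for sd (the long form of ?sd)
help("sd")

# List objects whose names contain "sd"
apropos("sd")

# Show a function's arguments without opening the help page
args(sd)

# Run the examples from the sd help page
example(sd)
```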

I've only scratched the surface, but there are links below to some R beginner guides which should help you get started. Anyone out there more familiar with the package? Please share any useful links and tips in the comments.

The R Project for Statistical Computing - Link
An Introduction to Statistical Computing in R - Link
Producing Simple Graphs with R - Link

Posted by Jason Striegel | Jan 31, 2008 08:35 PM
Math, Science, Statistics

Comments

Posted by: Andy Lester on January 31, 2008 at 10:57 PM

Not sure why the sample image got converted to a JPG, but anything other than a photo looks awful as a JPG. Compare the image above to the original at http://www.r-project.org/hpgraphic.png and see how muddy the compression artifacts make things look.


Posted by: Jake on February 1, 2008 at 4:12 AM

I'm a PhD student in computational biology and I program a LOT of R (pretty much exclusively). It is incredibly powerful because 1) it's extensible, even with C or FORTRAN code; 2) it can be used as a general-purpose scripting language (personally I find it much more intuitive than Python, for instance); 3) it's geared toward handling real data, so advanced machine learning, modeling, etc. are pretty easy to do; 4) it's easy to create beautiful plots and graphics in very few lines of code.

Right now as a hobby I'm working on an R package to interface with Arduino. I've already got basic serial communication between the board and R working well. My dream is to someday release the package so that R and Arduino (aRduino, anyone?) can be used together as a platform for open scientific instrumentation development.

The serial communication part is easy, but the real beauty would be in leveraging R's powerful analysis tools in a real-time way, so that data collection and analysis can happen simultaneously. R also has GTK+ bindings, so it's pretty easy to write user interfaces in R, which would be nice for instrument software development.


Posted by: Jake on February 1, 2008 at 4:41 AM

Some links:

1) A gallery of example R graphics:
http://addictedtor.free.fr/graphiques/allgraph.php

These are contributed by users and vary quite a bit in quality. The output also looks much better when directed to a Cairo or PDF or Quartz (on OS X) device, as opposed to what is shown here.

2) Bioconductor, a subset of R packages for bioinformatics.
http://www.bioconductor.org

3) some good general R tips
http://pj.freefaculty.org/R/Rtips.html

4) the useR! conference is held every year with presentations and posters from a very wide variety of fields where R is applied in analysis. Here's the program of presentations and posters (with PDFs) from 2007:
http://user2007.org/program/


Posted by: tomas iš mažosios t on February 3, 2008 at 2:45 PM

R is quite a nice package, but I was not able to make it read data from MySQL in a reasonable time (500,000 records: R took more than 24 hours, Matlab about 2 minutes). It could be that you have to use some trick, but I did not find any solution.


Posted by: jake on February 5, 2008 at 2:25 PM

That's strange. I routinely fetch 30000+ records from a remote MySQL server in just a few (maybe 10-12) seconds from within R.


Posted by: tomas iš mažosios t on February 8, 2008 at 4:24 PM

Well, I probably have to try once more ;-)

Do you fetch only numerical data, or some strings too?

