
R starter resources

I’m hardly the first person you would want to talk to about learning statistics in R. But if you’re bent on teaching yourself R, and you’ve ended up at my blog, here are some resources I found useful. (No opinions here about whether R is better or worse than Excel, Minitab, Matlab, Octave, SPSS, Stata, SAS, or others.)

R Project is the mothership.

RStudio is an IDE for R, which provides a better GUI for some basic tasks. It has most of what you’d expect from a modern IDE: syntax highlighting, GUI commands for loading and saving data and for setting the working directory, and separate panes for help files.

UCLA tutorials are a well-written introduction to basic data entry, functions, and graphics in R. There are similar tutorials for Stata and other languages here as well.

Quick-R is a blog and a book written by a statistician for people switching from SPSS and Stata to R. It is an excellent and concise website detailing all of the basics: data entry, functions, plots, and how to think about all of the above.

R help list and archives are a way to ask questions of experienced users. You’ll get excellent help here, but it’s important to respect the etiquette. Basically, (1) read the package manual, (2) work up a minimal example with your question, and (3) be extremely precise about the data you have and the data you want, as opposed to the way you’re trying to solve that problem. This will become clearer if you read a few discussions in the archives.
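As a rough illustration of what a minimal example might look like, here is a sketch. The data frame `df`, its column names, and the question itself are all made up for this purpose, and `aggregate()` is just one base-R way to show the target output.

```r
# Hypothetical question: "How do I get the mean of `score` for each `group`?"

# 1. Build a tiny data set inline so anyone can paste and run it.
df <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(1, 3, 10, 20)
)

# 2. Show the data you have. dput() prints a version that others can
#    paste straight into their own R session.
print(df)
dput(df)

# 3. Show the data you want: one row per group with the mean score.
wanted <- aggregate(score ~ group, data = df, FUN = mean)
print(wanted)

# 4. Mention your R version (sessionInfo() gives fuller detail).
print(R.version.string)
```

The point is that a reader can run the whole thing in a few seconds without needing your real data set.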

StackExchange is a glorified bulletin board for programmers exchanging help and (frequently great) advice. Search the archives before posting new questions, because the regulars here hate duplicate postings. But it’s easier to navigate than the R help archives.

S Poetry explains some of the syntax and style of R. It’s a longish treatise that includes innumerable gems, such as how subscripting works in R.
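For a taste of what subscripting means in practice, here is a short sketch of the common indexing forms in base R. The objects `x`, `m`, and `lst` are invented for illustration and do not come from the treatise.

```r
# A named numeric vector to index into.
x <- c(a = 10, b = 20, c = 30, d = 40)

x[2]          # by position: the second element, name kept
x[c(1, 3)]    # several positions at once
x[-1]         # negative index: everything except the first element
x[x > 15]     # logical index: elements where the condition is TRUE
x["c"]        # by name
x[["c"]]      # double brackets: a single element, name dropped

# Matrices take one subscript per dimension: [row, column].
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column
m[1, ]        # the whole first row
m[, 2]        # the whole second column
m[2, 3]       # the single entry in row 2, column 3

# Lists distinguish "a smaller list" from "the element itself".
lst <- list(n = 1:3, s = "text")
lst["n"]      # single bracket: still a list, of length one
lst[["n"]]    # double bracket: the integer vector inside
lst$n         # by-name shorthand for pulling out one element
```

The single versus double bracket distinction on lists trips up most newcomers, so it is worth trying these lines at the console.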

Generally speaking, you’ll need to get used to reading the manuals for the packages you want to use. Reverse engineering other people’s code is also particularly useful, because syntax conventions are not very consistent across packages.

R has thousands of packages. Benefits: lightweight software, highly extensible, possible for anyone to code updates anywhere in the world. Drawbacks: hard to know which libraries you need, and whether your favorites are best in class or obsolete.

What packages should you use in R? That’s sort of a moving target, and it depends on what you want to do. CRAN does organize some collections called task views, notably SocialSciences, Econometrics, Spatial, Bayesian, TimeSeries, and Survival.

I’ve found myself frequently installing these in addition to the more obvious ones.

  • foreign–read/write data formats from SPSS, Stata, Matlab, SAS, dbase, etc.
  • vcd–visualizing categorical data
  • lattice–trellis plots
  • Hmisc–grab bag of useful odds and ends
  • reshape–lets you choose whether observations are organized in lots of columns or one big column
  • ggplot2–simplifies plotting commands
  • plyr–simplifies reshape commands
  • Zelig–grandiose ambitions to unify model specification
  • datasets–lots of built-in datasets you can play with
  • ISOcodes–uniform country abbreviations
  • sp–spatial statistics
  • statnet–network statistics, graphs, topology
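For anyone new to packages, the basic workflow is to install once and then load in each session. The sketch below shows one way to check what is already installed and to peek at a built-in data set. The choice of four packages to check and of `mtcars` as the data set is arbitrary, and the install line is commented out because it needs a network connection.

```r
# Installing is a one-time download from CRAN (left commented out here).
# install.packages(c("ggplot2", "plyr"))

# Check which of a few packages from the list are already installed.
pkgs <- c("foreign", "lattice", "ggplot2", "plyr")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# `datasets` ships with R and is normally attached at startup,
# so this call is only here to make the dependency explicit.
library(datasets)

str(mtcars)       # structure of one built-in data set
head(mtcars, 3)   # its first three rows
```

Typing `library(help = "datasets")` at the console lists everything else that ships in that package.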

Some really key concepts are hard to appreciate at the beginning. For example, why there are different data types in R. These turn out to be very useful, but at first they seem like a pain. The data types include scalars (really just vectors of length one), vectors, matrices, arrays, lists, and data frames. Vectors can take on a number of types: numerics, characters, logicals, factors, and ordered factors. Matrices and arrays are composed of a single type of data, with two dimensions for a matrix and any number for an array. Lists can hold elements of different types and sizes. Data frames are the things most like a Stata dataset, where the different columns of a data frame can contain different types of data (numbers, strings, qualitative data, etc.).
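To make those types concrete, here is a minimal sketch that builds one of each in base R. All of the object names and values are invented for illustration.

```r
# A "scalar" is just a vector of length one.
s <- 3.14
length(s)

# Vectors hold a single type of data.
num <- c(1.5, 2, 3)               # numeric
chr <- c("low", "high", "low")    # character
fac <- factor(chr)                # factor: categories with no order
ord <- factor(chr, levels = c("low", "high"), ordered = TRUE)  # ordered factor

# Matrices and arrays: one type of data, two or more dimensions.
m <- matrix(1:6, nrow = 2)            # 2 x 3
a <- array(1:24, dim = c(2, 3, 4))    # 2 x 3 x 4

# A list can mix types and sizes freely.
lst <- list(num, chr, m)

# A data frame is a table whose columns can each be a different type.
df <- data.frame(id = 1:3, value = num, level = ord)
str(df)    # shows the type of every column
```

Running `str()` on any unfamiliar object is a quick way to find out which of these types you are dealing with.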

And it’s never a bad idea to join a group of users. Mine for the Tufts campus fizzled badly, so I abandoned it.

About Ben Mazzotta

Ben Mazzotta is a postdoc at the Center for Emerging Market Enterprises (CEME). His study of the Cost of Cash is part of CEME's research into inclusive growth.


3 thoughts on “R starter resources”

  1. Here is a list of R helpful links from the Dallas R Users Group.

    Posted by Larry - IEOR Tools | June 8, 2012, 3:03 pm
    • Perfect. Thanks so much–I get occasional inquiries from students about how to learn R, even though my department is basically a Stata shop. It’s nice to see a well curated list that isn’t too long.

      Posted by Ben Mazzotta | June 11, 2012, 9:44 am
  2. You might find this a useful site: http://offensivepolitics.net/blog It’s not mine.

    Posted by Robert Young | June 8, 2012, 6:30 pm



People mentioned in this blog are hereby invited to post a reply, on this blog, to any remarks, disparaging or otherwise, that I make here.

For that matter, if you're an interested reader and you'd like to share your thoughts, I would welcome proposals for cross-posting at your blog, guest blogging, and other creative ideas you may have.