Computing with Data Seminar

Fall 2013 and Spring 2014

Materials for 2014-2015 Seminar

Steven Buechler
Department of Applied and Computational Mathematics and Statistics
University of Notre Dame


This is a set of notes, examples, data and projects for a short course in advanced R programming, aimed at graduate students and senior-level undergraduates. It is intended to help students develop the skills they'll need in their research or employment. In R there are many ways to accomplish a goal, and these notes make no attempt to be comprehensive. Often I'll simply describe one method to get a result and leave it to the student to explore alternatives.


Lecture Notes

  1. Introduction: The general perspective of the course, with pointers to the main tools for generating statistical reports, namely RStudio, RMarkdown and knitr.
  2. Motivating example: This study of height and weight with respect to gender illustrates how ggplot2 can help organize even very simple analyses.
  3. Vectors, factors, lists: Setting a baseline of knowledge about the most fundamental R objects.
  4. Matrices: The most basic structure for doing linear algebra and storing tabular data.
  5. Data frames: First principles about working with R's foundational structure for storing tabular data.
  6. Functions: How to define your own functions.
  7. Functions on matrices: Using the apply function to compute on rows and columns.
  8. Functions on lists: Using lapply to apply a function to each component of a list.
  9. Split-apply example: An example using lapply and the long form of a data frame to handle a large number of possible covariates.
  10. Introduction to ggplot2: Introducing ggplot2 as a better way to create graphics.
  11. Examples of geoms: Setting aesthetics and using the most common geoms.
  12. Scales and themes: Using scales and themes to control visual aspects of ggplot2 plots.
  13. Facets: Plots with panels ranging over subgroups.
  14. Topics for next semester: Possible topics to cover in the next set of lectures.
  15. Split-apply methodology with plyr: Introduction to the plyr package for flexibly grouping data and applying a function to the pieces.
  16. Baseball example: Career performance for power hitters is analyzed using plyr.
  17. Refining kNN: Refinement of a machine learning application with the help of plyr.
  18. Plyr Practice: Some in-class practice in using plyr.
  19. Manipulating strings and text: The basics of manipulating and finding patterns in textual data.
  20. Text Mining Example: A spam filter as an example of text mining.
  21. Looping and iterators, part I: An introduction to the foreach and iterators packages.
  22. Parallelization with foreach: The parallel backend to foreach.
  23. Importing data: Importing data from a variety of sources.
  24. Introduction to dplyr: Manipulating data with the dplyr package.

Homework

  1. Homework 1
  2. Homework 2
  3. Homework 2-26-14
  4. Homework 7-27-14

Practice work

  1. lapply practice data: source("http://www3.nd.edu/~steve/computing_with_data/practice/lapply_practice.R")