Computing with Data Seminar
Fall 2013 and Spring 2014
Materials for 2014-2015 Seminar
Steven Buechler
Department of Applied and Computational Mathematics and Statistics
University of Notre Dame
This is a set of notes, examples, data and projects for a short course in advanced R programming, aimed at graduate students and senior-level undergraduates. It is intended to help students develop the skills they'll need in their research or employment. In R there are many ways to accomplish a goal, and these notes make no attempt to be comprehensive. Often I'll simply describe one method to get a result and leave it to the student to explore alternatives.
Lecture Notes
- Introduction The general perspective of the course, with pointers to the main tools for generating statistical reports: RStudio, RMarkdown and knitr.
- Motivating example This study of height and weight with respect to gender illustrates how ggplot2 can help organize even very simple analyses.
- Vectors, factors, lists Setting a baseline of knowledge about the most fundamental R objects
- Matrices The most basic structure for doing linear algebra and storing tabular data
- Data frames First principles about working with R's foundational structure for storing tabular data.
- Functions How to define your own functions
- Functions on matrices Using the apply function to compute on rows and columns
- Functions on lists Using lapply to apply a function to each component of a list
- Split-apply example An example using lapply and the long form of a data frame to handle a large number of possible covariates
- Introduction to ggplot2 Introducing ggplot2 as a better way to create graphics
- Examples of geoms Examples of setting aesthetics and the most common geoms.
- Scales and themes Using scales and themes to control the visual aspects of ggplot2 graphics.
- Facets Plots with panels ranging over subgroups.
- Topics for next semester Possible topics to cover in the next set of lectures.
- Split-apply methodology with plyr Introduction to the plyr package for flexibly grouping data and applying a function to the pieces.
- Baseball example Career performance for power hitters is analyzed using plyr.
- Refining kNN Refinement of a machine learning application with the help of plyr
- Plyr Practice Some in-class practice using plyr
- Manipulating strings and text The basics of manipulating and finding patterns in textual data.
- Text Mining Example An example of text mining: a spam filter
- Looping and iterators, part I An introduction to the foreach and iterators packages
- Parallelization with foreach The parallel backend to foreach.
- Importing data Importing data from a variety of sources.
- Introduction to dplyr Manipulating data with the dplyr package.
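As a rough illustration of the apply, lapply and split-apply topics listed above, here is a minimal sketch in base R. The data and variable names are invented for illustration and are not taken from the lecture notes themselves.

```r
# A small numeric matrix with 2 rows and 3 columns (made-up values)
m <- matrix(1:6, nrow = 2)

# apply() with MARGIN = 1 works over rows; MARGIN = 2 works over columns
row_sums  <- apply(m, 1, sum)
col_means <- apply(m, 2, mean)

# lapply() calls a function on each component of a list and returns a list
scores <- list(a = c(1, 2, 3), b = c(10, 20))
component_means <- lapply(scores, mean)

# Split-apply: split a vector by a grouping factor, then summarize each piece
heights <- c(60, 62, 70, 72)            # hypothetical measurements
group   <- factor(c("F", "F", "M", "M"))
group_means <- lapply(split(heights, group), mean)
```

The plyr and dplyr lectures cover packages that organize this same split, apply, combine pattern for data frames.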
Homework
- Homework 1
- Homework 2
- Homework 2-26-14
- Homework 7-27-14
Practice work
- lapply practice data:
source("http://www3.nd.edu/~steve/computing_with_data/practice/lapply_practice.R")