# Computing with Data Seminar

Fall 2013 and Spring 2014

Materials for 2014-2015 Seminar

This is a set of notes, examples, data and projects for a short course in advanced R programming, aimed at graduate students and senior-level undergraduates. It is intended to help students develop the skills they'll need in their research or employment. In R there are many ways to accomplish a goal, and these notes make no attempt to be comprehensive. Often I'll simply describe one method to get a result and leave it to the student to explore alternatives.

## Lecture Notes

1. Introduction: The general perspective of the course, plus pointers to the main tools for generating statistical reports, namely RStudio, RMarkdown and knitr.
2. Motivating example: This study of height and weight with respect to gender illustrates how ggplot2 can help organize even very simple analyses.
3. Vectors, factors, lists: Setting a baseline of knowledge about the most fundamental R objects.
4. Matrices: The most basic structure for doing linear algebra and storing tabular data.
5. Data frames: First principles about working with R's foundational structure for storing tabular data.
6. Functions: How to define your own functions.
7. Functions on matrices: Using the `apply` function to compute on rows and columns.
8. Functions on lists: Using `lapply` to apply a function to each component of a list.
9. Split-apply example: An example using `lapply` and the long form of a data frame to handle a large number of possible covariates.
10. Introduction to ggplot2: Introducing ggplot2 as a better way to create graphics.
11. Examples of geoms: Examples of setting aesthetics and the most common geoms.
12. Scales and themes: Using scales and themes to control the visual aspects of ggplots.
13. Facets: Plots with panels ranging over subgroups.
14. Topics for next semester: Possible topics to cover in the next set of lectures.
15. Split-apply methodology with plyr: Introduction to the `plyr` package for flexibly grouping data and applying a function to the pieces.
16. Baseball example: Career performance for power hitters is analyzed using `plyr`.
17. Refining kNN: Refinement of a machine learning application with the help of `plyr`.
18. Plyr Practice: Some in-class practice in using `plyr`.
19. Manipulating strings and text: The basics of manipulating and finding patterns in textual data.
20. Text Mining Example: A spam filter as an example of text mining.
21. Looping and iterators, part I: An introduction to the `foreach` and `iterators` packages.
22. Parallelization with `foreach`: The parallel backend to `foreach`.
23. Importing data: Importing data from a variety of sources.
24. Introduction to `dplyr`: Manipulating data with the `dplyr` package.
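
As a quick taste of the `apply`, `lapply` and split-apply lectures listed above, here is a minimal base R sketch. The toy objects (`scores`, `pieces`, `df`) are hypothetical and are not taken from the course materials; the lectures themselves may use different data and may approach these tasks differently.

```r
# Hypothetical toy data, not taken from the course materials.
scores <- matrix(1:6, nrow = 2,
                 dimnames = list(c("a", "b"), c("x", "y", "z")))

# apply(): MARGIN = 1 computes over rows, MARGIN = 2 over columns.
row_totals <- apply(scores, 1, sum)
col_means  <- apply(scores, 2, mean)

# lapply(): call a function on each component of a list; the result is a list.
pieces <- list(first = c(1, 2, 3), second = c(10, 20))
piece_means <- lapply(pieces, mean)

# Split-apply in base R: split a column by a grouping variable,
# then summarize each piece with lapply().
df <- data.frame(group = c("g1", "g1", "g2"), value = c(1, 3, 10))
group_means <- lapply(split(df$value, df$group), mean)
```

Packages such as `plyr` and `dplyr`, covered in the later lectures, wrap this same split-apply pattern in a more uniform interface.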

## Practice work

1. lapply practice data: `source("http://www3.nd.edu/~steve/computing_with_data/practice/lapply_practice.R")`