February 20, 2014
setwd("/Users/steve/Documents/Computing with Data/18_plyr_practice/")
library(ggplot2)
library(plyr)
library(hflights)
names(hflights)
[1] "Year" "Month" "DayofMonth"
[4] "DayOfWeek" "DepTime" "ArrTime"
[7] "UniqueCarrier" "FlightNum" "TailNum"
[10] "ActualElapsedTime" "AirTime" "ArrDelay"
[13] "DepDelay" "Origin" "Dest"
[16] "Distance" "TaxiIn" "TaxiOut"
[19] "Cancelled" "CancellationCode" "Diverted"
names() only lists the column names; use str(hflights) to see each variable's type and a few sample values as well.
hflights2 <- transform(hflights, date = paste(Month, "-", DayofMonth, sep=""))
There is another way: order the days and add an indicator for the day of the year. It takes a little more typing.
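A minimal sketch of that alternative, using base R only. The toy data frame and the column name DayOfYear are assumptions made for illustration; the Year, Month and DayofMonth columns mirror the ones in hflights.

```r
# Toy stand-in for hflights with the same three date columns (illustrative values).
toy <- data.frame(Year = 2011,
                  Month = c(1, 1, 2, 12),
                  DayofMonth = c(1, 10, 1, 31))

# Build a proper Date, then extract the day of the year (1 to 365) with "%j".
toy_dates <- as.Date(paste(toy$Year, toy$Month, toy$DayofMonth, sep = "-"),
                     format = "%Y-%m-%d")
toy <- transform(toy, DayOfYear = as.integer(format(toy_dates, "%j")))
toy$DayOfYear
```

Unlike the "1-1", "1-10", "1-11" strings, this indicator sorts in calendar order.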
Things to think about:
jan1_flights <- subset(hflights2, date == "1-1")
nrow(jan1_flights)
[1] 552
To count the flights on every day, use ddply or daply.
To produce a data.frame:
flights_by_day_df <- ddply(hflights2, .(date), function(df) nrow(df))
head(flights_by_day_df, n=3)
date V1
1 1-1 552
2 1-10 659
3 1-11 583
names(flights_by_day_df)[2] <- "FlightsByDay"
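As a possible shortcut, plyr's summarise can name the count column inside the ddply call, so the rename step is not needed. This is a sketch on a toy data frame (the values are made up for illustration), not the approach used in the session above.

```r
library(plyr)

# Toy stand-in for hflights2: one row per flight, keyed by the date string.
toy <- data.frame(date = c("1-1", "1-1", "1-10", "1-11", "1-11", "1-11"))

# summarise builds a new data.frame per group; length(date) counts its rows.
toy_counts <- ddply(toy, .(date), summarise, FlightsByDay = length(date))
toy_counts
```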
To produce a named vector (a one-dimensional array):
flights_by_day_vector <- daply(hflights2, .(date), function(df) nrow(df))
flights_by_day_vector[1:3]
1-1 1-10 1-11
552 659 583
The variables are: Month, day of month, day of week, date
date_properties_df <- subset(hflights2, select=c(Month, DayofMonth, DayOfWeek, date))
How many rows here?
nrow(date_properties_df)
[1] 227496
date_properties_df <- unique(date_properties_df)
When the flights-by-day counts are in a data.frame, merge on the shared date column:
date_properties_df2 <- merge(date_properties_df, flights_by_day_df)
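When the flights-by-day counts are in a named vector, look them up by date. Converting date with as.character makes sure the lookup is by name even if date is stored as a factor:
date_properties_df2_2 <- transform(date_properties_df, FlightsByDay = flights_by_day_vector[as.character(date)])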
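A quick sanity check that the two approaches agree, sketched in base R on a toy example. The object names here are hypothetical; the three counts are the ones shown for 1-1, 1-10 and 1-11 above.

```r
# Toy date properties, deliberately not in sorted order.
props <- data.frame(date = c("1-10", "1-1", "1-11"),
                    DayOfWeek = c(1, 6, 2),
                    stringsAsFactors = FALSE)

# The same counts held two ways: as a data.frame and as a named vector.
counts_df <- data.frame(date = c("1-1", "1-10", "1-11"),
                        FlightsByDay = c(552, 659, 583),
                        stringsAsFactors = FALSE)
counts_vec <- setNames(counts_df$FlightsByDay, counts_df$date)

by_merge  <- merge(props, counts_df)
by_lookup <- transform(props, FlightsByDay = counts_vec[as.character(date)])

# merge may reorder rows, so sort both results by date before comparing.
by_merge  <- by_merge[order(by_merge$date), ]
by_lookup <- by_lookup[order(by_lookup$date), ]
same_counts <- isTRUE(all.equal(by_merge$FlightsByDay,
                                unname(by_lookup$FlightsByDay)))
same_counts
```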
Next, a summary statistic by month: the mean number of flights per day.
flights_by_month <- daply(date_properties_df2, .(Month), function(df) mean(df$FlightsByDay))
flights_by_month[1:6]
1 2 3 4 5 6
610.0 611.7 628.1 619.8 618.5 653.3
flights_by_month[7:12]
7 8 9 10 11 12
662.8 650.8 602.2 603.1 600.7 616.7
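There is a definite up-tick in the summer: June, July and August average about 651 to 663 flights per day, against roughly 601 to 628 in the other months.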
flights_by_day_week <- daply(date_properties_df2, .(DayOfWeek), function(df) mean(df$FlightsByDay))
flights_by_day_week
1 2 3 4 5 6 7
660.8 608.6 614.0 671.2 672.5 521.3 616.5
Monday, Thursday and Friday (days 1, 4 and 5) have the most flights, averaging about 661 to 673 per day; day 6 is the quietest at about 521.