Import two data.frames we’ll use in the problems.
ht_wt_df <- read.csv("http://www3.nd.edu/~steve/computing_with_data/Data/heights_weights_genders.csv")
wisc_bc_df <- read.csv("http://www3.nd.edu/~steve/computing_with_data/Data/wisc_bc_data.csv")
Compute the mean of the heights separately for the males and the females in ht_wt_df.
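One possible approach to a grouped mean is sketched below on a small made-up data.frame. The name toy_df and its columns Gender and Height are assumptions for illustration only; check the actual column names of ht_wt_df with names() or str() before adapting it.

```r
# Toy data: the column names Gender and Height are hypothetical,
# chosen only to illustrate the technique.
toy_df <- data.frame(
  Gender = c("Male", "Female", "Male", "Female"),
  Height = c(70, 64, 72, 66)
)

# tapply splits the first vector by the levels of the second
# and applies the function (here mean) to each group.
group_means <- tapply(toy_df$Height, toy_df$Gender, mean)
print(group_means)
```

aggregate(Height ~ Gender, data = toy_df, FUN = mean) should give the same numbers as a data.frame rather than a named vector.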
Form a sub-data.frame sub_df1 of ht_wt_df that contains every third row of ht_wt_df; that is, rows 3, 6, 9, …. What are the mean and variance of the heights and of the weights of the people in sub_df1? HINT: You will need the seq function.
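As a rough sketch of how seq can drive row selection, the example below picks every third row of a made-up data.frame. The names toy_df, id and value are hypothetical, and starting at row 3 is an assumption; change the from argument to start at a different row.

```r
# Toy data with 10 rows; the names are hypothetical.
toy_df <- data.frame(id = 1:10, value = (1:10)^2)

# Row indices 3, 6, 9: start at 3 and step by 3 up to the last row.
idx <- seq(from = 3, to = nrow(toy_df), by = 3)

# Keep only those rows; the empty second index keeps all columns.
sub_toy <- toy_df[idx, ]

# Summary statistics of one column in the subset.
print(mean(sub_toy$value))
print(var(sub_toy$value))
```

Note that var computes the sample variance (dividing by n - 1).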
Use the str function to identify the variables (columns) in wisc_bc_df that are numeric. (a) Form a new data.frame num_bc_df that contains only the numeric variables. (b) Compute the correlation matrix of num_bc_df. HINT: Use the cor function. Look up the help on this function to learn the correct way to use it. (c) List the variables that have a correlation > 0.90 with radius_mean.
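The three steps can be sketched on a made-up data.frame as follows. The names toy_df, label, x, y and z are hypothetical, and selecting columns with sapply and is.numeric is a programmatic alternative to reading the str output by eye, not necessarily the intended solution. Be aware that is.numeric is also TRUE for integer columns, so an identifier column would be kept unless you drop it yourself.

```r
# Toy data: one character column and three numeric ones (names are hypothetical).
toy_df <- data.frame(
  label = c("a", "b", "c", "d", "e"),
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 8.1, 9.8),
  z = c(5, 1, 4, 2, 3),
  stringsAsFactors = FALSE
)

# (a) Keep only the numeric columns.
num_toy <- toy_df[, sapply(toy_df, is.numeric)]

# (b) Correlation matrix; rows and columns are named after the variables.
cor_mat <- cor(num_toy)

# (c) Variables whose correlation with x exceeds 0.90, excluding x itself.
x_cors <- cor_mat["x", ]
high <- names(x_cors)[x_cors > 0.90 & names(x_cors) != "x"]
print(high)
```

If a data set contains missing values, cor returns NA for the affected pairs by default; the use argument described in its help page controls that behavior.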