version 9.2
set more off

* Change the -use- command if you want to use a local copy of the data.
use "https://www3.nd.edu/~rwilliam/statafiles/missing.dta", clear

* Part 1. Do frequencies/descriptives on the original vars. Look at MD
*         patterns, problems with coding. The -fre- command, available from
*         ssc, needs to be installed.
sum rincome educ age sex race paeduc
fre rincome educ age sex race paeduc, tab(10)

* Part 2. I don't like the way RINCOME is coded. I also don't think the
*         MD categories are quite right. Create a new variable, INCOME,
*         that is coded better. I won't distinguish between MD codes.
recode rincome (1=.5) (2=2) (3=3) (4=4.5) (5=5.5) (6=6.5) (7=7.5) (8=9) ///
    (9=12.5) (10=17.5) (11=22.5) (12=25) (else=.), gen(income)
fre income

* Part 3. Let's fix the RACE and SEX variables too. Even though race
*         has 3 categories, I think it is better to only make one dummy.
recode race (1=1)(else=0), gen(white)
recode sex (1=1)(else=0), gen(male)
fre white male

* Part 4. Create a modified PAEDUC2 that I can use later. Create
*         an MD indicator. Using the impute command makes it
*         easy and also more precise.
gen one = 1
gen mdpaeduc = missing(paeduc)
impute paeduc one, gen(paeduc2)
fre paeduc2 mdpaeduc

* Part 5. Listwise deletion of MD.
reg income educ age male paeduc white

* Part 6. Sorry, unlike SPSS, no easy way to do pairwise in Stata. If I was a fanatic
*         about it, I could probably use the pwcorr and corr2data commands.

* Part 7. Mean substitution of MD (both IVs and DVs). Seems questionable for
*         the DV. I'll use the impute command to create new vars
*         with the mean substituted for MD.
impute income one, gen(incomex)
impute educ one, gen(educx)
impute age one, gen(agex)
impute male one, gen(malex)
impute paeduc one, gen(paeducx)
impute white one, gen(whitex)
reg incomex educx agex malex paeducx whitex

* Part 8. Mean substitution, Father's education only, without and then with an MD indicator.
*         The final regression will give us an idea of whether or not the MD in PAEDUC is missing
*         on a random basis.
reg income educ age male paeduc2 white
reg income educ age male paeduc2 white mdpaeduc

* Part 9. Add any additional analyses you think are useful.

* Other suggestions. Drop paeduc completely!
reg income educ age male white

* Try to id where the MD is.
gen mdinc = missing(income)
tabulate male mdinc, chi2 exact lrchi2 row
tabulate white mdinc, chi2 exact lrchi2 row
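
* A possible follow-up, offered only as a sketch and not part of the
* analyses above: the tabulations check the dummies, so the continuous
* IVs could be compared across cases with and without MD on income.
* The choice of educ and age here is an assumption made for illustration.
ttest educ, by(mdinc)
ttest age, by(mdinc)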