version 12.1

*** Imputation for a single continuous variable using regress
webuse mheart0, clear
sum
mi set mlong
mi register imputed bmi
mi register regular attack smokes age hsgrad female
mi impute regress bmi attack smokes age hsgrad female, add(20) rseed(2232)
list bmi attack smokes age hsgrad female _mi_id _mi_miss _mi_m if _mi_id == 8
mi xeq 0 1 20: summarize bmi
mi estimate, dots: logit attack smokes age bmi hsgrad female
mi xeq 0: logit attack smokes age bmi hsgrad female, nolog
mi describe

* Use mimrgns (a user-written command), but with caution
mi estimate, dots: logit attack i.smokes age bmi i.hsgrad i.female
mimrgns smokes, at(age = (20(10)90)) predict(pr) cmdmargins vsquish
marginsplot, noci scheme(sj) name(mimrgnsplot)

*** Appendix A: More examples

* PMM - Predictive Mean Matching
webuse mheart0, clear
mi set mlong
mi register imputed bmi
version 12.1
mi impute pmm bmi attack smokes age hsgrad female, add(20) knn(5) rseed(2232)
mi estimate: logit attack smokes age bmi hsgrad female

* Logit
webuse mheart2, clear
mi set mlong
* This will show us how much data are missing, and the ranges of observed values
mi misstable summarize
mi register imputed hsgrad
version 12.1
mi impute logit hsgrad attack smokes age bmi female, add(10) rseed(2232)
* Estimates before imputation
mi xeq 0: logit attack smokes age bmi female hsgrad, nolog
* Estimates after imputation
mi estimate: logit attack smokes age bmi female hsgrad

* Multinomial Logit
webuse mheart3, clear
mi set mlong
mi misstable summarize
mi register imputed marstatus
version 12.1
mi impute mlogit marstatus attack smokes age bmi female hsgrad, add(20) rseed(2232)
* Estimates before imputation
mi xeq 0: logit attack smokes age bmi female hsgrad i.marstatus
* Estimates after imputation
mi estimate: logit attack smokes age bmi female hsgrad i.marstatus

* Ordered Logit
use http://www.stata-press.com/data/r13/mheart4, clear
tabulate alcohol, missing
mi set mlong
mi register imputed alcohol
version 12.1
mi impute ologit alcohol attack smokes age bmi female hsgrad, ///
    add(10) rseed(2232)
mi estimate: logit attack smokes age bmi female hsgrad i.alcohol

* Poisson - Count Variables
use http://www.stata-press.com/data/r13/mheartpois, clear
misstable summarize
tab2 female npreg, missing
mi set mlong
mi register imputed npreg
version 12.1
mi impute poisson npreg attack smokes age bmi hsgrad, ///
    add(20) conditional(if female==1) rseed(2232)
mi estimate: logit attack smokes age bmi female hsgrad npreg

* nbreg (Negative Binomial Regression) - Count Variables
use http://www.stata-press.com/data/r13/mheartpois, clear
mi set mlong
mi register imputed npreg
version 12.1
mi impute nbreg npreg attack smokes age bmi hsgrad, ///
    add(20) conditional(if female==1) rseed(2232)
mi estimate: logit attack smokes age bmi female hsgrad npreg

*** Appendix B: More than 1 variable
webuse mheart8s0, clear
mi describe
mi misstable patterns, frequency
version 12.1
mi impute chained (regress) bmi age = attack smokes hsgrad female, add(20) rseed(2232)
mi xeq 0: logit attack smokes age bmi hsgrad female, nolog
mi estimate: logit attack smokes age bmi hsgrad female

*** Appendix C: Approximate do it yourself multiple imputation
version 12.1
webuse mheart0, clear
* Imputation model for bmi
regress bmi attack smokes age hsgrad female
predict bmihat if missing(bmi)
scalar rmse = e(rmse)
* As shown in the output, the rmse is a little over 4. To confirm:
display rmse

* Impute the values for missing cases 20 times
version 12.1
set seed 2232
gen e = 0 if !missing(bmi)
* Compute 20 random error terms
forval i = 1/20 {
    quietly gen e`i' = rnormal() if missing(bmi)
}
* Compute 20 imputed values for each case
* Imputed value = bmihat + random variation
forval i = 1/20 {
    quietly clonevar bmi`i' = bmi
    quietly replace bmi`i' = bmihat + rmse * e`i' if missing(bmi)
}
* Compare the imputed values, first 3 imputations
list bmihat bmi bmi1 bmi2 bmi3 e1 e2 e3 if missing(bmi)
* Compare the summary stats, first 3 imputations
sum bmihat bmi bmi1 bmi2 bmi3 e1 e2 e3, sep(5)
* Convert to mi format. We will use wide, as each case now has
* one record with several imputed variables
mi import wide, imputed(e = e1-e20 bmi = bmi1-bmi20) clear drop
* We can now convert to the more efficient mlong format
mi convert mlong, clear
* Now do the analytic model
mi estimate: logit attack smokes age bmi hsgrad female

*** Alternative coding; no need to generate the e vars
version 12.1
webuse mheart0, clear
* Imputation model for bmi
regress bmi attack smokes age hsgrad female
predict bmihat if missing(bmi)
scalar rmse = e(rmse)
* As shown in the output, the rmse is a little over 4. To confirm:
display rmse

* Impute the values for missing cases 20 times
version 12.1
set seed 2232
* Compute 20 imputed values for each case
* Imputed value = bmihat + random variation
forval i = 1/20 {
    quietly clonevar bmi`i' = bmi
    quietly replace bmi`i' = bmihat + rnormal(0, rmse) if missing(bmi)
}
* Compare the imputed values, first 3 imputations
list bmihat bmi bmi1 bmi2 bmi3 if missing(bmi)
* Compare the summary stats, first 3 imputations
sum bmihat bmi bmi1 bmi2 bmi3, sep(5)
* Convert to mi format. We will use wide, as each case now has
* one record with several imputed variables
mi import wide, imputed(bmi = bmi1-bmi20) clear drop
* We can now convert to the more efficient mlong format
mi convert mlong, clear
* Now do the analytic model
mi estimate: logit attack smokes age bmi hsgrad female

*** Appendix D: Full Information Maximum Likelihood
* Adapted from Example 2 of the mi estimate chapter of the Stata 14 MI manual
webuse mhouses1993s30, clear
version 12.1
* Regular regression with missing data
mi xeq 0: regress price tax sqft age nfeatures ne custom corner
* Corresponding SEM model with missing data
sem (price <- tax sqft age nfeatures ne custom corner) if _mi_m == 0, nolog
* Multiple imputation model
mi estimate: regress price tax sqft age nfeatures ne custom corner
* Corresponding SEM model using FIML
sem (price <- tax sqft age nfeatures ne custom corner) if _mi_m == 0, method(mlmv) nolog