version 12.1

*** Imputation for a single continuous variable using regress
webuse mheart0, clear
sum
mi set mlong
mi register imputed bmi
mi register regular attack smokes age hsgrad female
mi impute regress bmi attack smokes age hsgrad female, add(20) rseed(2232)
list bmi attack smokes age hsgrad female _mi_id _mi_miss _mi_m if _mi_id == 8
mi xeq 0 1 20: summarize bmi
mi estimate, dots: logit attack smokes age bmi hsgrad female
mi xeq 0: logit attack smokes age bmi hsgrad female, nolog
mi describe

*** Appendix A: More examples

* PMM - Predictive Mean Matching
webuse mheart0, clear
mi set mlong
mi register imputed bmi
mi impute pmm bmi attack smokes age hsgrad female, add(20) knn(5) rseed(2232)
mi estimate: logit attack smokes age bmi hsgrad female

* Logit
webuse mheart2, clear
mi set mlong
* This will show us how much missing data, and the ranges of observed values
mi misstable summarize
mi register imputed hsgrad
mi impute logit hsgrad attack smokes age bmi female, add(10) rseed(2232)
* Estimates before imputation
mi xeq 0: logit attack smokes age bmi female hsgrad, nolog
* Estimates after imputation
mi estimate: logit attack smokes age bmi female hsgrad

* Multinomial Logit
webuse mheart3, clear
mi set mlong
mi misstable summarize
mi register imputed marstatus
mi impute mlogit marstatus attack smokes age bmi female hsgrad, add(20) rseed(2232)
* Estimates before imputation
mi xeq 0: logit attack smokes age bmi female hsgrad i.marstatus
* Estimates after imputation
mi estimate: logit attack smokes age bmi female hsgrad i.marstatus

* Ordered Logit
use http://www.stata-press.com/data/r13/mheart4, clear
tabulate alcohol, missing
mi set mlong
mi register imputed alcohol
mi impute ologit alcohol attack smokes age bmi female hsgrad, ///
    add(10) rseed(2232)
mi estimate: logit attack smokes age bmi female hsgrad i.alcohol

* Poisson - Count Variables
use http://www.stata-press.com/data/r13/mheartpois, clear
misstable summarize
tab2 female npreg, missing
mi set mlong
mi register imputed npreg
mi impute poisson npreg attack smokes age bmi hsgrad, ///
    add(20) conditional(if female==1) rseed(2232)
mi estimate: logit attack smokes age bmi female hsgrad npreg

* nbreg (Negative Binomial Regression) - Count Variables
use http://www.stata-press.com/data/r13/mheartpois, clear
mi set mlong
mi register imputed npreg
mi impute nbreg npreg attack smokes age bmi hsgrad, ///
    add(20) conditional(if female==1) rseed(2232)
mi estimate: logit attack smokes age bmi female hsgrad npreg

*** Appendix B: More than 1 variable
webuse mheart8s0, clear
mi describe
mi misstable patterns, frequency
mi impute chained (regress) bmi age = attack smokes hsgrad female, add(20) rseed(2232)
mi xeq 0: logit attack smokes age bmi hsgrad female, nolog
mi estimate: logit attack smokes age bmi hsgrad female

*** Appendix C: Approximate do it yourself multiple imputation
version 12.1
webuse mheart0, clear
* Imputation model for bmi
regress bmi attack smokes age hsgrad female
predict bmihat if missing(bmi)
scalar rmse = e(rmse)
* As shown in the output, the rmse is a little over 4. To confirm,
display rmse
* Impute the values for missing cases 20 times
set seed 2232
gen e = 0 if !missing(bmi)
* Compute 20 random error terms
forval i = 1/20 {
    quietly gen e`i' = rnormal() if missing(bmi)
}
* Compute 20 imputed values for each case
* Imputed value = bmihat + random variation
forval i = 1/20 {
    quietly clonevar bmi`i' = bmi
    quietly replace bmi`i' = bmihat + rmse * e`i' if missing(bmi)
}
* Compare the imputed values, first 3 imputations
list bmihat bmi bmi1 bmi2 bmi3 e1 e2 e3 if missing(bmi)
* Compare the summary stats, first 3 imputations
sum bmihat bmi bmi1 bmi2 bmi3 e1 e2 e3, sep(5)
* Convert to mi format. We will use wide, as each case now has
* one record with several imputed variables
mi import wide, imputed(e = e1-e20 bmi = bmi1-bmi20) clear drop
* We can now convert to the more efficient mlong format
mi convert mlong, clear
* Now do the analytic model
mi estimate: logit attack smokes age bmi hsgrad female

*** Alternative coding; no need to generate the e vars
version 12.1
webuse mheart0, clear
* Imputation model for bmi
regress bmi attack smokes age hsgrad female
predict bmihat if missing(bmi)
scalar rmse = e(rmse)
* As shown in the output, the rmse is a little over 4. To confirm,
display rmse
* Impute the values for missing cases 20 times
set seed 2232
* Compute 20 imputed values for each case
* Imputed value = bmihat + random variation
forval i = 1/20 {
    quietly clonevar bmi`i' = bmi
    quietly replace bmi`i' = bmihat + rnormal(0, rmse) if missing(bmi)
}
* Compare the imputed values, first 3 imputations
list bmihat bmi bmi1 bmi2 bmi3 if missing(bmi)
* Compare the summary stats, first 3 imputations
sum bmihat bmi bmi1 bmi2 bmi3, sep(5)
* Convert to mi format. We will use wide, as each case now has
* one record with several imputed variables
mi import wide, imputed(bmi = bmi1-bmi20) clear drop
* We can now convert to the more efficient mlong format
mi convert mlong, clear
* Now do the analytic model
mi estimate: logit attack smokes age bmi hsgrad female
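
*** Sketch: approximate do it yourself pooling of the estimates
* mi estimate combines the 20 sets of results with Rubin's rules. The lines
* below are only an illustrative sketch of that pooling, done by hand for
* the bmi coefficient. They assume the mi data created just above are still
* in memory with 20 imputations. The scalar names (nimp, sumb, sumb2, sumv,
* qbar, wvar, bvar, tvar) are made up for this example.
scalar nimp  = 20
scalar sumb  = 0
scalar sumb2 = 0
scalar sumv  = 0
* Fit the analytic model in each imputed data set and accumulate results
forval m = 1/20 {
    preserve
    quietly mi extract `m', clear
    quietly logit attack smokes age bmi hsgrad female
    scalar sumb  = sumb  + _b[bmi]
    scalar sumb2 = sumb2 + _b[bmi]^2
    scalar sumv  = sumv  + _se[bmi]^2
    restore
}
* Pooled estimate = mean of the 20 coefficients
scalar qbar = sumb / nimp
* Within-imputation variance = mean of the 20 squared standard errors
scalar wvar = sumv / nimp
* Between-imputation variance = sample variance of the 20 coefficients
scalar bvar = (sumb2 - nimp * qbar^2) / (nimp - 1)
* Total variance = within + (1 + 1/M) * between
scalar tvar = wvar + (1 + 1/nimp) * bvar
display "Pooled bmi coefficient: " qbar
display "Pooled standard error:  " sqrt(tvar)
* These two numbers should be close to the bmi row reported by mi estimate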