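use https://www3.nd.edu/~rwilliam/statafiles/mdpart1.dta, clear
sum

* Mean substitution: regressing educ on a constant alone makes -impute-
* fill each missing educ with the mean of the observed educ values.
* (-impute- is out of date as of Stata 11; -mi impute- supersedes it.)
gen one = 1
impute educ one, gen(xeduc1)
sum educ xeduc1

* Add a missing-data indicator (md = 1 when educ was missing and imputed)
gen md = 0
replace md = 1 if xeduc1!=educ

* Subgroup means: each missing educ is replaced by the mean of its race group
impute educ black white other, gen(xeduc2)
tab race, sum(xeduc2)

* Substitute regression estimate: predict missing educ from jobexp and race
impute educ jobexp black other white, gen(xeduc3)
sum educ xed*
tab race, sum(xeduc3)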