clear all
use https://www3.nd.edu/~rwilliam/statafiles/anes_codeddata, clear
* estout, fre, khb, & spost13_ado need to be installed
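* Optional sketch (added, not part of the original file): install any of the
* required packages that are missing. It assumes the SSC package names match
* the command names. The spost13_ado location below is also an assumption;
* if it has moved, type "search spost13_ado" and install from the link shown.
foreach pkg in estout fre khb {
    capture which `pkg'
    if _rc ssc install `pkg'
}
capture which listcoef
if _rc net install spost13_ado, from(https://jslsoc.sitehost.iu.edu/stata)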

********** Prepare the data first! **********
* Clean up income variables & some labels
mvdecode V083309A, mv(-4 = .a \ 98 = .b)
label variable income "Estimated Family Income in $1000s (recode of V083309A)"
label variable pres2008 "Who did R vote for in 2008? (0 = McCain, 1 = Obama)"
* pres2008 is a recode of the variable president, with the 25
* who voted for other candidates recoded to missing.

* We aren't going to use weights in this problem, but we'll
* keep the weight variable and svyset the data in case you want to
* double-check whether the logistic regression coefficients and the
* marginal effects change much when weights are used.
svyset CASEID [pweight = V080102A]

* Dichotomize race to make analysis simpler. The nonwhite groups all voted
* for Obama by wide margins so combining them seems reasonable.
recode race (1 = 1 "White") (2 3 4 = 0 "NonWhite") (else = .), gen(white) label(white)
label variable white "race recoded to 0 = NonWhite, 1 = White"
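
* Optional sketch (added, not part of the original file): if you want weighted
* results for comparison, this is one place to get them, because all cases are
* still in the data at this point. It assumes the svyset command above carries
* all the design information you need.
svy: logit pres2008 i.white age income bush feminist
margins, dydx(white)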

* For convenience, we will keep ONLY the variables and cases we will be using.
* DON'T drop the cases if you want to use svy, though: dropping cases from
* survey data can make the standard errors incorrect.
keep CASEID V080102A pres2008 white age income bush feminist V083309A
keep if !missing(CASEID, V080102A, pres2008, white, age, income, bush, feminist, V083309A)

* Descriptive information on the variables used. The descriptive stats are a bit
* off because weights are not being used; e.g., Obama's margin of victory
* was NOT as large as the unweighted numbers imply.
codebook, compact
fre pres2008 white
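
* Optional sketch (added, not part of the original file): a weighted tabulation
* of the vote, to see roughly how far the unweighted percentages above are off.
* Using aweights here is an assumption made for convenience; only the
* percentages are of interest, not standard errors.
tabulate pres2008 [aweight = V080102A]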

********** Now do the actual analysis! **********
capture estimates drop m1 m2
* We won't use weights because the khb command does not support them.
* Luckily, the logistic regression coefficients and margins results
* are similar whether or not you use svy:.

* 1a. First, compare coefficients across nested models:
logit pres2008 i.white, nolog
est store m1
logit pres2008 i.white age income bush feminist, nolog
est store m2
esttab m1 m2, t scalars(r2_p chi2 df_m p bic) sfmt(%9.4f) obslast nobaselevels
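
* Optional sketch (added, not part of the original file): put a number on the
* naive comparison by computing the percentage change in the logit coefficient
* for white between m1 and m2. The scalar names b_m1 and b_m2 are made up for
* this illustration. Keep in mind that this raw comparison is exactly what the
* later steps are meant to check.
estimates restore m1
scalar b_m1 = _b[1.white]
estimates restore m2
scalar b_m2 = _b[1.white]
display "Percent change in coefficient for white, m1 to m2: " %6.1f 100*(b_m2 - b_m1)/b_m1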

* 1b. Now look at the y-standardized coefficients, given in the bStdY column,
* and see if they lead to the same or different conclusions about 
* how the effect of white changes as more variables are added to the model.
est restore m1
listcoef, std
est restore m2
listcoef, std
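
* Optional sketch (added, not part of the original file): reproduce the idea
* behind bStdY by hand for m2. For a logit model the variance of the latent y*
* is taken to be Var(xb) + pi^2/3, so the y-standardized coefficient is
* b / sd(y*). The names xb_m2 and sd_ystar are made up for this illustration,
* and the result may differ slightly from listcoef because of rounding or
* differences in how the variance is computed.
predict double xb_m2 if e(sample), xb
quietly summarize xb_m2
scalar sd_ystar = sqrt(r(Var) + _pi^2/3)
display "Hand-computed y-standardized coefficient for white (m2): " %8.4f _b[1.white]/sd_ystar
drop xb_m2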

* 1c. Now look at the khb results, and see if they lead to the 
* same or different conclusions as the original coefficients did about 
* how the effect of white changes as more variables are added to the model.
khb logit pres2008 i.white || age income bush feminist, nolog

* 1e. Now look at the marginal effects. In this case it doesn't matter much
* whether you use the margins command, the MDL approach, or khb.
estimates restore m1
margins, dydx(white)
estimates restore m2
margins, dydx(white)
khb logit pres2008 i.white || age income bush feminist, nolog ape