use http://www3.nd.edu/~rwilliam/statafiles/outliers.dta, clear
reg dv iv
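* Basic descriptive stats
* fre and extremes are user-written commands; if they are not installed, run
* "ssc install fre" and "ssc install extremes" first
fre dv iv, tabulate(3)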
sum dv iv
extremes dv iv
scatter dv iv
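* Case ID, used to label points in the plot
gen id = _n
* Leverage versus squared residual plot, based on the most recent regression
lvr2plot, mlabel(id)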
* Residual statistics
* Discrepancy measures
* Standardized residuals -- absolute values greater than 3 may be a problem
predict stdresid, rstandard
* Studentized residual
predict rstudent, rstudent
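* A minimal sketch, not part of the original analysis: list the cases whose
* standardized or studentized residual exceeds 3 in absolute value (the
* cutoff noted above). The !missing() checks are needed because Stata treats
* missing values as larger than any number.
list id dv iv stdresid rstudent ///
    if (abs(stdresid) > 3 & !missing(stdresid)) | ///
       (abs(rstudent) > 3 & !missing(rstudent))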
* Leverage measure
* Leverage (the hat value) identifies cases that can have a large effect on
* the fitted model even if the corresponding residual is small
* Leverage > 2k/N indicates high leverage
* Consider using 3k/N instead when N is small
predict leverage, leverage
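* A hedged sketch of the 2k/N rule of thumb. Assumption: k is taken here to
* be the number of estimated coefficients including the constant, that is,
* e(df_m) + 1 from the last regression; adjust if k is meant differently.
* The scalar name lev_cut is hypothetical, chosen only for this illustration.
scalar lev_cut = 2*(e(df_m) + 1)/e(N)
display "High-leverage cutoff (2k/N): " scalar(lev_cut)
list id iv leverage if leverage > scalar(lev_cut) & !missing(leverage)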
* Influence measures
* DFBetas -- SPSS calls these SDBETAS -- absolute values larger than 1
* or larger than 2/sqrt(N) (about .316 in this case) are a problem
* dfbeta stores the values for iv in the new variable _dfbeta_1
dfbeta
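* Sketch, assuming the size-adjusted cutoff 2/sqrt(N) is the one of interest:
* list the cases whose DFBETA for iv exceeds it in absolute value. e(N) is
* the sample size stored by the last regression.
list id dv iv _dfbeta_1 ///
    if abs(_dfbeta_1) > 2/sqrt(e(N)) & !missing(_dfbeta_1)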
* Get Cook's Distance measure -- values greater than 4/N may cause concern
predict cooksd, cooksd
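* Sketch of the 4/N rule of thumb for Cook's distance: list the cases that
* exceed the cutoff, again using e(N) from the last regression.
list id dv iv cooksd if cooksd > 4/e(N) & !missing(cooksd)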
* Summarize all variables, then show the most extreme values of each diagnostic
sum
extremes stdresid rstudent _dfbeta_1 cooksd leverage
* Reload the data and rerun the regression, excluding cases where dv equals 99
use http://www3.nd.edu/~rwilliam/statafiles/outliers.dta, clear
reg dv iv if dv!=99
* Appendix A: Leverage
use http://www3.nd.edu/~rwilliam/statafiles/outliers.dta, clear
* Case 9: set dv to 999 and refit
replace dv = 999 in 9
reg dv iv
* Case 9: set dv to 9999 and refit
replace dv = 9999 in 9
reg dv iv
* Make case 9 exactly average on iv
replace iv = -3.996264 in 9
* Make dv = 99
replace dv = 99 in 9
reg dv iv
* Make dv = 999
replace dv = 999 in 9
reg dv iv
* Make dv = 9999
replace dv = 9999 in 9
reg dv iv
* Appendix B: Robust regression techniques
use http://www3.nd.edu/~rwilliam/statafiles/outliers.dta, clear
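* OLS excluding cases where dv equals 99, for comparison
reg dv iv if dv!=99
* Robust regression on all cases (iteratively reweighted least squares)
rreg dv iv, nolog
* Quantile regression on all cases (median regression by default)
qreg dv iv, nolog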
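* A minimal sketch for comparing the fits side by side; it is an illustration,
* not part of the original analysis. The stored-estimate names (m_all,
* m_drop, m_rreg, m_qreg) are hypothetical.
quietly reg dv iv
estimates store m_all
quietly reg dv iv if dv!=99
estimates store m_drop
quietly rreg dv iv
estimates store m_rreg
quietly qreg dv iv
estimates store m_qreg
* Coefficients with standard errors beneath them, plus each model's N
estimates table m_all m_drop m_rreg m_qreg, b(%9.4f) se(%9.4f) stats(N)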