* Note: fre and extremes are user-written commands; if they are not
* installed, run "ssc install fre" and "ssc install extremes" first
use http://www3.nd.edu/~rwilliam/statafiles/outliers.dta, clear
reg dv iv

* Basic descriptive stats
fre dv iv, tabulate(3)
sum dv iv
extremes dv iv
scatter dv iv
gen id = _n
lvr2plot, mlabel(id)

* Residual statistics

* Discrepancy measures
* Standardized residuals -- values more extreme than 3 may be a problem
predict stdresid, rstandard
* Studentized residuals
predict rstudent, rstudent

* Leverage measure
* Leverage (or hat) identifies cases that can have a large effect on the
* fitted model even if the corresponding residual is small
* When the leverage > 2k/n then there is high leverage
* Maybe use 3k/n when N is small
predict leverage, leverage

* Influence measures
* DFBetas -- SPSS calls these SDBETAS -- values larger than 1
* or > 2/sqrt(N) (about .316 in this case) are a problem
dfbeta
* Get Cook's Distance measure -- values greater than 4/N may cause concern
predict cooksd, cooksd

sum
extremes stdresid rstudent _dfbeta_1 cooksd leverage

* Rerun the regression without the outlying case (dv = 99)
use http://www3.nd.edu/~rwilliam/statafiles/outliers.dta, clear
reg dv iv if dv!=99

* Appendix A: Leverage
use http://www3.nd.edu/~rwilliam/statafiles/outliers.dta, clear
* Case 9: dv = 999
replace dv = 999 in 9
reg dv iv
* Case 9: dv = 9999
replace dv = 9999 in 9
reg dv iv
* Make case 9 exactly average on iv
replace iv = -3.996264 in 9
* Make dv = 99
replace dv = 99 in 9
reg dv iv
* Make dv = 999
replace dv = 999 in 9
reg dv iv
* Make dv = 9999
replace dv = 9999 in 9
reg dv iv

* Appendix B: Robust regression techniques
use http://www3.nd.edu/~rwilliam/statafiles/outliers.dta, clear
reg dv iv if dv!=99
rreg dv iv, nolog
qreg dv iv, nolog
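
* Illustrative sketch (an addition, not part of the original commands):
* one possible way to list the cases that exceed the rule-of-thumb
* cutoffs quoted in the comments above, rather than reading them off
* the summary output by eye.
* Assumptions: k is taken to be the number of estimated coefficients
* including the constant (e(df_m) + 1), which the comments above do not
* spell out; the flag_* variable names and the locals n and k are
* hypothetical names chosen for this sketch.
use http://www3.nd.edu/~rwilliam/statafiles/outliers.dta, clear
gen id = _n
quietly reg dv iv
* Store N and k from the estimation results
local n = e(N)
local k = e(df_m) + 1
* Recompute the diagnostics for the freshly loaded data
predict stdresid, rstandard
predict leverage, leverage
predict cooksd, cooksd
dfbeta
* Flag cases beyond each cutoff (missing values are never flagged)
gen byte flag_resid = abs(stdresid) > 3 & !missing(stdresid)
gen byte flag_lev = leverage > 2*`k'/`n' & !missing(leverage)
gen byte flag_cook = cooksd > 4/`n' & !missing(cooksd)
gen byte flag_dfb = abs(_dfbeta_1) > 2/sqrt(`n') & !missing(_dfbeta_1)
* Show every case flagged by at least one measure
list id dv iv stdresid leverage cooksd _dfbeta_1 if flag_resid | flag_lev | flag_cook | flag_dfb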