Richard Williams, Notre Dame Sociology

Sociology 63993

Graduate Statistics II

Richard Williams, Instructor

Spring 2015



NOTE: These are the Spring 2015 course notes for the second semester of my graduate statistics courses. The notes for the first-semester course, Sociology 63992, are also available. These pages make extensive use of Stata and SPSS. If you are mostly interested in learning how to use Stata, the Stata Highlights page lists several of the most relevant handouts from both courses. Some pages are more "stand alone" than others, so adjacent handouts may help clear up any questions you have.

Feel free to email Richard Williams if you have comments or suggestions.

The following special types of files are used on this web page:

PDF Pdf files. Require Adobe Acrobat.  Get Acrobat Reader
Tbk Toolbook files. Viewing Instructions
SPSS SPSSWIN files. Necessary for doing homework problems. Can probably be adapted for other SPSS platforms. You should save these files to your local hard disk and then use them with SPSS.
Stata files. Necessary for doing homework problems. You should save these files to your local hard disk and then use them with Stata.

In addition, some files are in zipped (compressed) format.  If you don't have an unzipping program (e.g. Winzip), you can use the free PC Magazine PCDEZIP utility.

Finally, please note that the answer keys for the exams and homework differ in the amount of detail provided. I sometimes give very detailed answers; other times the answers are much more minimal (and, given the information provided, I assume the student can figure out the rest). Students should always aim for complete answers in their homework and exams. In particular, it is hard to give partial credit when it is not clear why an error was made.


Syllabus

Readings Packet (You need a Notre Dame NETID to access these)

Notre Dame's Center for Social Research (has links to several data sets and describes support services)

Dropbox. I strongly encourage you to set up a Dropbox account if you do not already have one. Dropbox gives you a minimum of 2GB of free online storage. More critically, with Dropbox you can set up shared folders. This makes it much easier when you want me or others to help you with your research. You can create a folder, put your data and programs in it, and then share the folder with me. If you set up an account, use your .edu email address, because you can get more bonus storage that way. For more click on this link. Or, you may wish to use the Box or Google Drive features offered by Notre Dame.


Useful sites for learning about Stata and SPSS

Rich Williams' Stata Highlights Page

UCLA's Statistical Computing Resources 
RW Suggestions for Using Stata at Notre Dame 

UCLA's Stata Starter Kit

RW's Suggested downloads

UCLA's SPSS Starter Kit
Resources for learning Stata

UCLA - How does Stata compare with SAS and SPSS?

The Stata User Support Page

Ben Jann's estout/esttab support page (esttab & estout are great for formatting output from Stata)

 


PART I: In this section, we briefly review the basics of OLS regression. We talk about some of the most common issues (measurement error, missing data, violations of OLS assumptions) encountered in regression analysis.

Using SPSS for OLS Regression (Optional; Read it on your own if you want/need to use SPSS)

reg01.sav - Data file used in the SPSS Regression handout

Using Stata 9 and higher for OLS Regression (Read on your own & ask questions in Lab as needed.)

OLS-Stata.do, reg01.dta - Stata files used in the Stata Regression handout

Overview

Review of Multiple Regression

Homework # 1 (Due Jan 28)

sphrd.dta (Stata data file required for HW # 1)

Homework # 1 Answer Key

Multicollinearity

multicoll.do, mulicoll.dta - Stata files used in the Multicollinearity handout

Missing Data Part 1: Overview, Traditional Methods

  mdpart1.do, md.dta - Stata files used in the Missing Data Part 1 handout & in the homework

Missing Data Part 2: Multiple Imputation

mdpart2.do - Stata file for the MD Part 2 handout

Also Recommended: Wisconsin SSCC's Multiple Imputation in Stata: Introduction. If you want to do serious analysis using Multiple Imputation, you should be sure to read this.

Homework # 2 (Due Feb 4)

longley.dta, md.dta, mdpov2.dta, missing.dta

missing.do

Homework # 2 Answer Key

hw02ak.do (Additional Stata Analyses for last part)

Measurement Error 1: Consequences of measurement error

merror1.do - Stata file for the measurement error handout

Measurement Error 2: Scale Construction (Very Brief Overview)

  merror2.do, anomia.dta - Stata files used in the Scale Construction handout

anomia.sav - SPSS data file used in the Scale Construction handout

Outliers

outliers.do, outliers.dta - Stata files used in the Outliers handout

outliers.sav - SPSS data file used in the Outliers handout

Also Recommended: Robert Yaffee's Robust Regression Modeling with Stata (This is 93 pages long but it is basically overhead slides and hence much shorter than it at first appears to be.  Nice discussions of how to deal with outliers and with heteroskedasticity.)

Also Recommended: UCLA's Regression Diagnostics Page.  Shows a lot of the techniques that are available with Stata for detecting outliers, heteroskedasticity, multicollinearity, serial correlation and other problems with regression models.

Heteroskedasticity

hetero.do, reg01.dta - Stata files used in the Heteroskedasticity handout

Complex Survey Data. By default, most statistical techniques assume that data were collected via simple random sampling. This is often not true for large national data sets. Fortunately, Stata makes it easy to analyze such data, but there are some important differences in how you go about testing hypotheses and assessing model fit. 
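As a rough illustration of the point above, here is a minimal Stata sketch. It assumes the nhanes2f example dataset that ships with Stata (loaded via webuse), which is not one of the course files; the choice of outcome and predictors is arbitrary and only meant to show the workflow.

```stata
* Sketch only: nhanes2f is a Stata example dataset, not a course file
webuse nhanes2f, clear

* Declare the survey design: PSU, sampling weight, and strata
svyset psuid [pweight=finalwgt], strata(stratid)

* Naive OLS that treats the data as a simple random sample (for comparison)
regress weight height age

* Design-based estimates: the svy: prefix applies the weights and
* uses the declared design when computing standard errors
svy: regress weight height age

* Joint test of both predictors; after svy estimation this is
* reported as an adjusted Wald test rather than the usual F test
test height age
```

Comparing the two sets of output shows how the design can change both the coefficients and the standard errors. Likelihood-based tools that work after ordinary estimation commands are generally not appropriate after svy estimation, which is one reason hypothesis tests and fit assessment are handled differently.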

Also Recommended: UCLA's (see lower third of page) and StataCorp's FAQs on Survey Data Analysis (Optional; you may want to refer to these if you use the svy commands)

Serial Correlation (Optional)

Homework # 3 (Due Feb 11)

resales.dta

Homework # 3 Answer Key

Sample first exams and answer keys

    Exam 1, 2015

    Exam 1, 2015 Answer Key


PART II: This section shows how regression can be used to properly specify a causal model. We begin by introducing "the logic of causal order," which lets us understand the different kinds of causal relationships that might be present between variables. Common model mis-specifications are then addressed (e.g. omitted variables, extraneous variables, variables with nonlinear effects). We discuss how to choose between alternative causal models. Finally, we introduce path analysis as a method for causal modeling.

tbklogic.zip These are toolbook presentations which we will go over in class.  Viewing Instructions  [NOTE: You may have trouble using this with Win 7. I did with Win7 64 bit. But it can be done. If you really want to do so I can tell you how to do it.]

[Optional] If you also want more conventional notes for the above material, click here and here. In class, I'll only use these notes if there is a problem with the Toolbook presentation.

Logic of Causal Order, Handout 1: Variable Naming

Logic of Causal Order, Handout 2: Sample Problem, Logic of Causal Order

Logic of Causal Order, Handout 3: Suppressor Effects

Logic of Causal Order, Handout 4: Interaction Effects

Logic of Causal Order, Handout 5: Another Sample Problem for the Logic of Causal Order

The Logic of Causal Order, Closing Comments

Homework # 4 (due Feb 25)

Homework # 4 Answer Key

Specification Error

specerror.do, reg01.dta - Stata files used in the specification error handout

Imposing and Testing Equality Constraints in Models

equalitytests.do, blwh.dta - Stata files used in the constraints & group comparisons handouts

Group Comparisons: Differences in Composition Versus Differences in Models and Effects

groupcomparisons1.do, blwh.dta - Stata files used in the constraints & group comparisons handouts

Group Comparisons: Using "What If" Scenarios to Decompose Differences Across Groups

groupcomparisons2.do, blwh.dta, goodpay.dta - Stata files used in the second group comparisons handout

Homework # 5 (Due March 4)

gender.dta

Homework # 5 Answer Key

hw05ak.do - Stata program used in Homework #5 Answer key

Interaction Effects and Group Comparisons

interactions1.do, blwh.dta - Stata files used in the interaction effects and group comparisons handout

Models for Group Comparisons - Summary

groupsummary.do, blwh.dta - Stata files used in the group comparisons summary handout

Interpreting Interaction Effects; Interaction Effects and Centering

centering.do, drinking.dta - Stata files used in the Interpreting Interaction Effects handout

  Interactions Between Continuous Variables (Optional)

Homework # 6 (Due March 18)

gender.dta

jqges2.do

jqges2.dta

Homework # 6 Answer Key

Nonlinear Relationships

nonlinear.do, nonlin1.dta, nonlinln.dta - Stata files used in the nonlinear relations handout

Also recommended: http://fmwww.bc.edu/repec/bocode/t/transint.html This is a nice discussion of the reasons for doing transformations and some of the more common types of transformations.

Introduction to Path Analysis

 pathanalysis.do - Stata file used in the path analysis handout

Introduction to Path Analysis - Highlights

Homework # 7 (Due March 25)

nonlinhw.do

nonlinhw.dta

Homework # 7 Answer Key

Sample second exams and answer keys

    Exam 2, 2015

    Exam 2, 2015 Answer Key


PART III: Here, we develop path analysis techniques more fully. We talk about more complicated models that cannot be accurately estimated through conventional OLS regression techniques (e.g. nonrecursive models). We also talk about situations where the nature of the data makes OLS regression inappropriate (e.g. dichotomous dependent variables) or less than optimal.

Structural Coefficients in Recursive Models/ Evils of Standardization

Computing R Square/ Evils of R Square

Homework # 8 (Due April 8)

evilstnd.do

Homework # 8 Answer Key

Logistic Regression I: Problems with the Linear Probability Model (LPM)

 logit1.do - Stata file(s) used in the logistic regression 1 handout

Logistic Regression II: The Logistic Regression Model (LRM)

 logit2.do - Stata file(s) used in the logistic regression 2 handout

Logistic Regression III: Hypothesis Testing, Comparisons with OLS

 logit3.do - Stata file(s) used in the logistic regression 3 handout

Using Stata for Logistic Regression (be sure to read this on your own, as it covers important details we may not go over in class)

 logistic-stata.do - Stata file(s) used in the Using Stata for Logistic Regression handout

logist.dta - Stata data file used in the Logistic Regression handouts

Homework # 9 (Due April 22)

lrb.dta

Homework # 9 Answer Key

Brief Overviews of Other Advanced Methods.

Advanced Categorical Data Analysis (Optional). Depending on your research topics you may want to look at some of the notes from my Soc 73994 class.

Brief Overview of Panel Data

panel.do, nlsy.dta and teenpov.dta - Stata files used in the Panel Data handout

nlsyxt.dta and teenpovxt.dta - reshaped Stata data files used in the Panel Data handout

Brief Overview of Manova

blwh.dta - Stata data file used in the Manova handout

Nonrecursive Models (Highlights)

Nonrecursive Models (Optional Long Version) This is an older version of the handout that has much more detail if you want it; the highlights version is probably all you need in practice, at least for a basic understanding.

nonrecur.dta - Stata data file used in the Nonrecursive Models handouts

Extremely Brief Overviews of Event History Analysis and Hierarchical Linear Modeling -- Read Ch. 9 of Paul Allison's Multiple Regression Primer, paying particular attention to section 9.9 (Multilevel Models) and section 9.12 (Event History Analysis)

Brief Overview of Structural Equation Modeling using Stata's sem commands

Brief Overview of Structural Equation Modeling using LISREL (Optional; you can read this if you ever happen to be using LISREL)

Homework # 10 (Due April 29)

Homework # 10 Answer Key

Sample final exams and answer keys

    Exam 3, 2015

    Exam 3, 2015 Answer Key


Other materials may be available upon request.