PROBLEM 8: TESTING FOR HETEROSKEDASTICITY
OBJECT OF THE PROBLEM
To test for heteroskedasticity using the Glejser test and the
Goldfeld-Quandt test.
INTRODUCTION
Heteroskedasticity is a problem in econometric estimation because
it violates the OLS assumption that the error term has the same
variance for every observation. While OLS estimates are still
unbiased and consistent, efficiency is lost, and the usual
standard errors (and hence the t and F tests) are no longer
reliable. Two tests for heteroskedasticity are the Glejser test
and the Goldfeld-Quandt test.
GLEJSER TEST
The Glejser test attempts to determine whether the spread of the
error term changes systematically as an independent variable
increases in size. This is done by regressing the absolute value
of the residuals from the estimated model against the independent
variables. A high t-statistic in absolute value (or low
PROB-VALUE) for the estimated coefficient of the independent
variable(s) would indicate the presence of heteroskedasticity.
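In notation, the idea can be sketched as follows. The Greek
symbols are assumptions introduced here for illustration; the
variable names are those used in the program for this problem,
where e_i is the OLS residual (RESID) and |e_i| is its absolute
value (ARESID).

```latex
\begin{align*}
\text{CONSUMP}_i &= \beta_0 + \beta_1\,\text{INCOME}_i + \beta_2\,\text{EDUC}_i + \beta_3\,\text{AGE}_i + \varepsilon_i \\
|e_i| &= \gamma_0 + \gamma_1\,\text{INCOME}_i + \gamma_2\,\text{EDUC}_i + \gamma_3\,\text{AGE}_i + v_i \\
H_0 &: \gamma_j = 0 \quad \text{(no heteroskedasticity related to regressor } j\text{)}
\end{align*}
```

Rejecting H_0 for some regressor j suggests that the spread of the
errors moves with that regressor.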
The following program can be used to perform the Glejser
test:
//PROB08A JOB (AF,A592),yourname,REGION=4096K
/*OPENBIN
//S1 EXEC SAS
//SYSIN DD *
DATA LABOR;INPUT EDUC 2-3 AGE 6-7 HRWORK 10-13 WAGE 14-17 .2
INVEST 18-23 YIELD 24-25 .2 CONSUMP 26-31;
PAY=WAGE*HRWORK;
IVEARN=YIELD*INVEST;
INCOME=PAY+IVEARN;
CARDS;
*NOTE USE SAME CARDS AS IN PROBLEM THREE (PROB03);
PROC PRINT;
PROC REG; MODEL CONSUMP=INCOME EDUC AGE;
OUTPUT OUT=LORENZ
RESIDUAL=RESID;
DATA ONE; SET LORENZ;
ARESID=ABS(RESID);
PROC REG;
MODEL ARESID=INCOME EDUC AGE;
MODEL ARESID=INCOME AGE;
MODEL ARESID=INCOME EDUC;
MODEL ARESID=EDUC AGE;
MODEL ARESID=INCOME;
MODEL ARESID=EDUC;
MODEL ARESID=AGE;
TITLE 'GLEJSER TESTS';
//
In this example we use the job-worker characteristics data, which
is not reprinted here since it has been used in other problems
(e.g. PROB03). The first PROC REG estimates the primary model, and
its OUTPUT statement creates a data set (LORENZ) containing the
residuals. The DATA ONE step then takes their absolute value
(ARESID), and the second PROC REG regresses ARESID against the
independent variables, jointly and in every subset.
The statistic of interest in the REG output is the t-statistic
of each independent variable when the dependent variable is the
absolute value of the residual. A high t-statistic in absolute
value (i.e. low prob-value) is an indication of
heteroskedasticity.
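As a check against the tables, the critical t value can also be
computed in a short DATA step. This is only a sketch: the sample
size N below is a hypothetical placeholder (the number of cards
in PROB03 is not stated here), and K assumes the auxiliary model
with all three independent variables plus an intercept.

```sas
DATA _NULL_;
  N = 100;                  /* HYPOTHETICAL sample size; replace with the number of cards read */
  K = 4;                    /* intercept plus INCOME, EDUC, AGE */
  DF = N - K;               /* degrees of freedom of the t-statistic */
  TCRIT = TINV(0.975, DF);  /* two-sided 5 percent critical value */
  PUT DF= TCRIT=;           /* written to the SAS log */
RUN;
```

A t-statistic from the GLEJSER TESTS output that exceeds TCRIT in
absolute value would be significant at the 5 percent level. For
the smaller auxiliary models, K should be reduced accordingly.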
GOLDFELD-QUANDT TEST
The Goldfeld-Quandt test orders the observations by one of the
independent variables, splits them into a low group and a high
group, estimates the model separately for each group, and then
compares the error variances of the two regressions with an F
test. See pp. 148-149 in Pindyck & Rubinfeld. If the variances
differ significantly then we have an indication of
heteroskedasticity. When splitting the data a middle portion of
the data is excluded from both groups. This increases the power
of the test by accentuating the difference between the two
groups. The rule of thumb for the number of observations to
exclude is 4/15ths of the data.
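The comparison of the two groups is usually written as an F
ratio. In the sketch below, SSE_1 and SSE_2 are the error sums of
squares from the low and high groups, n_1 and n_2 are the group
sizes, and k is the number of estimated coefficients including
the intercept. The notation is an assumption made here for
illustration; it also assumes the high group is the one suspected
of having the larger variance, so that it goes in the numerator.

```latex
\[
F = \frac{SSE_2 / (n_2 - k)}{SSE_1 / (n_1 - k)}
\sim F_{\,n_2 - k,\; n_1 - k}
\quad \text{under } H_0:\ \sigma_1^2 = \sigma_2^2
\]
```

When the two groups are the same size, the degrees of freedom
cancel and the statistic reduces to SSE_2 / SSE_1.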
The following program can be used in applying the Goldfeld-
Quandt test:
//PROB08B JOB (AF,A592),yourname,REGION=4096K
/*OPENBIN
//S1 EXEC SAS
//ZALKIN DD DSN=AU5920.ECON592.DATA(FARMDATA),DISP=SHR
//SYSIN DD *
DATA FARM; INFILE ZALKIN;
INPUT CORN LABOR LAND FERT;
PROC REG; MODEL CORN=LAND LABOR FERT;
TITLE 'ORIGINAL MODEL';
DATA FARM2; RETAIN CORN LABOR LAND FERT; SET FARM;
PROC SORT DATA=FARM2 OUT=TEST1; BY LABOR;
DATA TEST1A; SET TEST1;
X+1; IF X>55 THEN STOP;
PROC REG; MODEL CORN=LABOR LAND FERT;
TITLE 'GOLDFELD-QUANDT LOWER FOR LABOR';
DATA TEST1B; SET TEST1;
X+1; IF X<96 THEN DELETE;
PROC REG; MODEL CORN=LABOR LAND FERT;
TITLE 'GOLDFELD-QUANDT UPPER FOR LABOR';
//
The sequence starting with DATA FARM2 is the test for the labor
variable. It can be repeated for each of the other independent
variables.
Note that we had to create a data set (TEST1) sorted by the
independent variable. The SORT is necessary in order to make sure
that our lower data set (TEST1A) contains only the low values of
the independent variable of interest, and that our upper data set
(TEST1B) contains only the high values. Since there are 150
observations in the data set, we use the IF statements in the two
DATA steps to keep the first 55 and the last 55 observations,
which excludes the middle 40 (4/15ths of 150).
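The cutoffs 55 and 96 used in the program follow from this
arithmetic, which can be sketched in a short DATA step. The
variable names are hypothetical, and the 5 percent significance
level is an assumption; the step also prints a critical F value
to compare against the tables.

```sas
DATA _NULL_;
  N = 150;                     /* observations in FARM */
  NOMIT = ROUND(N * 4 / 15);   /* middle observations dropped: 40 */
  NGROUP = (N - NOMIT) / 2;    /* observations in each group: 55 */
  UPSTART = N - NGROUP + 1;    /* first observation of the upper group: 96 */
  K = 4;                       /* intercept plus LABOR, LAND, FERT */
  DF = NGROUP - K;             /* 51, numerator and denominator */
  FCRIT = FINV(0.95, DF, DF);  /* assumed 5 percent critical value */
  PUT NOMIT= NGROUP= UPSTART= DF= FCRIT=;
RUN;
```

The ratio of the error mean squares from the UPPER and LOWER
regressions can then be compared with FCRIT.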
INSTRUCTIONS FOR WRITE-UP
Carefully read pp. 148-152 in Pindyck & Rubinfeld. Explain each
step of the program both intuitively and in econometric notation,
and give an intuitive explanation of what the problem is trying to
accomplish. Write down the estimated models and analyze them. Give
an intuitive explanation of each model. Demonstrate the Glejser
test by analyzing all the necessary test statistics, giving
critical values from the tables, and drawing the t and F
distribution graphs. Do you feel that the primary model in the
first part of this problem is heteroskedastic? Perform the
Goldfeld-Quandt test, including the test statistic, distribution,
and critical point, and indicate whether the result is significant
or not. Draw a scatter plot of the dependent variable (CORN)
against the independent variable (LABOR), showing the three
regions: the SSE1 region, the deleted region, and the SSE2 region.
Do you feel that the primary model in the second part of this
problem is heteroskedastic? In general, what are the properties of
the ordinary least squares (OLS) estimators in the presence of
heteroskedasticity?