PROBLEM 8: TESTING FOR HETEROSKEDASTICITY

OBJECT OF THE PROBLEM

To test for heteroskedasticity using the Glejser test and the

Goldfeld-Quandt test.

INTRODUCTION

Heteroskedasticity is a problem in econometric estimation because

it violates the OLS assumption that the error term has constant

variance across all observations. While OLS estimates are still

unbiased and consistent, efficiency is lost, and the usual

standard errors (and therefore the t and F tests) are unreliable.

Two tests for heteroskedasticity are the Glejser test and the

Goldfeld-Quandt test.

GLEJSER TEST

The Glejser test attempts to determine whether the variance of

the error term changes (typically increases) as an independent

variable increases in size. This is done by regressing the

absolute value of the residuals from the estimated model against

the independent variables. A high t-statistic (or low PROB-VALUE)

for the estimated coefficient of the independent variable(s) would

indicate the presence of heteroskedasticity.
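
In notation, the procedure can be sketched as follows for a single

independent variable. The symbols (e_i for the OLS residual, X_i

for the independent variable, v_i for the error of the auxiliary

regression) are chosen here for illustration and do not appear in

the program.

```latex
\begin{align*}
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i
  && \text{(original model, estimated by OLS)} \\
|e_i| &= \alpha_0 + \alpha_1 X_i + v_i
  && \text{(auxiliary regression on the absolute residuals)} \\
H_0 &: \alpha_1 = 0 \ \text{(homoskedasticity)}
  && H_1 : \alpha_1 \neq 0
\end{align*}
```

A t-test on the estimate of alpha_1 is then the test described

above.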

The following program can be used to perform the Glejser

test:

//PROB08A JOB (AF,A592),yourname,REGION=4096K

/*OPENBIN

//S1 EXEC SAS

//SYSIN DD *

DATA LABOR;INPUT EDUC 2-3 AGE 6-7 HRWORK 10-13 WAGE 14-17 2

INVEST 18-23 YIELD 24-25 2 CONSUMP 26-31;

PAY=WAGE*HRWORK;

IVEARN=YIELD*INVEST;

INCOME=PAY+IVEARN;

CARDS;

*NOTE USE SAME CARDS AS IN PROBLEM THREE (PROB03);

PROC PRINT;

PROC REG; MODEL CONSUMP=INCOME EDUC AGE;

OUTPUT OUT=LORENZ

RESIDUAL=RESID;

DATA ONE; SET LORENZ;

ARESID=ABS(RESID);

PROC REG;

MODEL ARESID=INCOME EDUC AGE;

MODEL ARESID=INCOME AGE;

MODEL ARESID=INCOME EDUC;

MODEL ARESID=EDUC AGE;

MODEL ARESID=INCOME;

MODEL ARESID=EDUC;

MODEL ARESID=AGE;

TITLE 'GLEJSER TESTS';

//

In this example we use the job-worker characteristics data, which

is not reprinted here since it has been used in other problems

(e.g. PROB03). The first PROC REG estimates the model, and its

OUTPUT statement creates a data set (LORENZ) containing the

residuals. The DATA ONE step then takes the absolute value of

each residual (ARESID), and the second PROC REG regresses ARESID

against the independent variables, jointly and in subsets.

The statistic of interest in the REG output is the t-statistic

of each independent variable when the dependent variable is the

absolute value of the residual. A high t-statistic (i.e. low

prob-value) is an indication of heteroskedasticity.
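
As a cross-check on the printed tables, the critical value and

prob-value for a single t-statistic can be computed in a short

DATA step. This is only a sketch: the values of TSTAT and DF below

are hypothetical placeholders, not results from this problem, and

the data set name TCHECK is made up for the illustration.

```sas
* SKETCH ONLY - TSTAT AND DF ARE HYPOTHETICAL, NOT ACTUAL RESULTS;
DATA TCHECK;
  TSTAT  = 2.31;                          * T FROM ONE ARESID MODEL;
  DF     = 96;                            * N MINUS ESTIMATED PARAMETERS;
  TCRIT  = TINV(0.975, DF);               * TWO-SIDED 5 PERCENT CRITICAL VALUE;
  PVALUE = 2*(1 - PROBT(ABS(TSTAT), DF)); * TWO-SIDED PROB-VALUE;
  REJECT = (ABS(TSTAT) > TCRIT);          * 1 MEANS REJECT HOMOSKEDASTICITY;
PROC PRINT DATA=TCHECK;
  TITLE 'GLEJSER T-TEST CHECK (HYPOTHETICAL INPUT)';
RUN;
```

Replace TSTAT and DF with the values reported by PROC REG for the

coefficient being tested.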

GOLDFELD-QUANDT TEST

The Goldfeld-Quandt test orders the observations by an

independent variable, splits them into a low group and a high

group, and estimates a separate regression for each group. See

pp. 148-149 in Pindyck & Rubinfeld. If the residual variances of

the two regressions differ significantly, as judged by an F test

on the ratio of their error sums of squares, we have an

indication of heteroskedasticity. When splitting the data, a

middle portion of the observations is excluded from both groups.

This increases the power of the test by accentuating the

difference between the two groups. The rule of thumb used here is

to exclude 4/15ths of the data.
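
The test statistic can be written out as below. The notation is

an assumption made for this sketch: SSE_1 and SSE_2 are the error

sums of squares of the low and high groups, n_1 and n_2 are the

group sizes, and k is the number of estimated parameters

(including the intercept). The high group is placed in the

numerator on the assumption that the error variance rises with the

sorting variable.

```latex
\[
F = \frac{SSE_2/(n_2 - k)}{SSE_1/(n_1 - k)}
  \;\sim\; F_{\,n_2 - k,\; n_1 - k}
  \quad \text{under } H_0 : \sigma_1^2 = \sigma_2^2
\]
```

When the two groups are the same size, the degrees of freedom

cancel and the statistic reduces to SSE_2/SSE_1.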

The following program can be used in applying the Goldfeld-

Quandt test:

//PROB08B JOB (AF,A592),yourname,REGION=4096K

/*OPENBIN

//S1 EXEC SAS

//ZALKIN DD DSN=AU5920.ECON592.DATA(FARMDATA),DISP=SHR

//SYSIN DD *

DATA FARM; INFILE ZALKIN;

INPUT CORN LABOR LAND FERT;

PROC REG; MODEL CORN=LAND LABOR FERT;

TITLE 'ORIGINAL MODEL';

DATA FARM2; RETAIN CORN LABOR LAND FERT; SET FARM;

PROC SORT DATA=FARM2 OUT=TEST1; BY LABOR;

DATA TEST1A; SET TEST1;

X+1; IF X>55 THEN STOP; * KEEP ONLY THE FIRST 55 (LOWEST) OBSERVATIONS;

PROC REG; MODEL CORN=LABOR LAND FERT;

TITLE 'GOLDFELD-QUANDT LOWER FOR LABOR';

DATA TEST1B; SET TEST1;

X+1; IF X<96 THEN DELETE; * KEEP ONLY OBSERVATIONS 96-150 (HIGHEST);

PROC REG; MODEL CORN=LABOR LAND FERT;

TITLE 'GOLDFELD-QUANDT UPPER FOR LABOR';

//

The sequence starting with DATA FARM2 is the test for the labor

variable. It can be repeated for each of the other independent

variables.
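
For example, the same steps for the LAND variable might look like

the sketch below. It assumes that FARM2 already exists as created

in the program above. The data set names TEST2, TEST2A and TEST2B

are illustrative and are not part of the original program.

```sas
* SKETCH - SAME STEPS AS FOR LABOR, NOW SORTED BY LAND;
* ASSUMES FARM2 WAS CREATED BY THE PROGRAM ABOVE;
PROC SORT DATA=FARM2 OUT=TEST2; BY LAND;
DATA TEST2A; SET TEST2;
  X+1; IF X>55 THEN STOP;      * FIRST 55 OBSERVATIONS (LOW LAND);
PROC REG; MODEL CORN=LABOR LAND FERT;
  TITLE 'GOLDFELD-QUANDT LOWER FOR LAND';
DATA TEST2B; SET TEST2;
  X+1; IF X<96 THEN DELETE;    * LAST 55 OBSERVATIONS (HIGH LAND);
PROC REG; MODEL CORN=LABOR LAND FERT;
  TITLE 'GOLDFELD-QUANDT UPPER FOR LAND';
RUN;
```

Only the BY variable and the titles change. The model itself is

the same in every run.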

Note that we had to create a data set (TEST1) sorted by the

independent variable. The SORT is necessary in order to make sure

that our lower data set contains only the low values of the

independent variable of interest, and that our upper data set

contains only the high values. Since there are 150 observations

in the data set, we use the two IF statements to exclude the

middle 40 (4/15ths of 150), leaving 55 observations in each

group.
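
Once the two regressions have been run, the F test itself is a

matter of arithmetic, which could be done in a DATA step such as

the sketch below. The values of SSE1 and SSE2 are hypothetical

placeholders, not results from the farm data, and the data set

name GQ is made up for the illustration. The degrees of freedom

assume four estimated parameters per regression (an intercept and

three slopes).

```sas
* SKETCH ONLY - SSE1 AND SSE2 ARE HYPOTHETICAL PLACEHOLDERS;
* REPLACE THEM WITH THE ERROR SS FROM THE TWO PROC REG OUTPUTS;
DATA GQ;
  N = 150;  D = 40;  K = 4;     * OBSERVATIONS, DROPPED, PARAMETERS;
  DF = (N - D - 2*K)/2;         * 51 DEGREES OF FREEDOM PER GROUP;
  SSE1 = 1000;                  * LOWER GROUP (HYPOTHETICAL);
  SSE2 = 1800;                  * UPPER GROUP (HYPOTHETICAL);
  F = (SSE2/DF)/(SSE1/DF);      * RATIO OF RESIDUAL VARIANCES;
  FCRIT = FINV(0.95, DF, DF);   * 5 PERCENT CRITICAL VALUE;
  PVALUE = 1 - PROBF(F, DF, DF);
PROC PRINT DATA=GQ;
  TITLE 'GOLDFELD-QUANDT F TEST (HYPOTHETICAL SSE VALUES)';
RUN;
```

An F value above FCRIT would be evidence against equal variances

in the two groups.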

INSTRUCTIONS FOR WRITE-UP

Carefully read pp. 148-152 in Pindyck & Rubinfeld. Explain each

step of the program both intuitively and in econometric notation,

and give an intuitive explanation of what the problem is trying

to accomplish. Write down the estimated models and analyze them.

Give an intuitive explanation of each model. Demonstrate the

Glejser test by analyzing all the necessary test statistics,

giving critical values from the tables, and drawing the t and F

distribution graphs. Do you feel that the primary model in the

first part of this problem is heteroskedastic? Perform the

Goldfeld-Quandt test, reporting the test statistic, distribution,

and critical point, and indicate whether the result is

significant. Draw a scatter plot of the dependent variable (CORN)

against the independent variable (LABOR), showing the three

regions: the SSE1 region, the deleted region, and the SSE2

region. Do you feel that the primary model in the second part of

this problem is heteroskedastic? In general, what are the

properties of the ordinary least squares (OLS) estimators in the

presence of heteroskedasticity?