PROBLEM 7: MEASURING MULTICOLLINEARITY

OBJECT OF THE PROBLEM

To use various measures of the degree of multicollinearity.

INTRODUCTION

There are basically six ways of measuring multicollinearity. The

first is to look for weak t-statistics among the explanatory

variables when the F statistic is significant. A second measure

is to regress each explanatory variable against all of the other

explanatory variables. A high R-square indicates that the

variable is nearly a linear combination of the others. A

third measure is the R-square delete measure, done by dropping

explanatory variables from the model. If the R-square value

changes very little then we have an indication of

multicollinearity. A fourth measure is to look at the determinant

of the X'X matrix, having standardized the data. If it is close

to zero then we have an indication of multicollinearity, since

a nearly singular X'X matrix implies that the coefficient

variances are very large. The fifth measure is the condition

number; if it is greater than 30 then multicollinearity is

indicated. The final measure is the Variance Inflation Factors

(VIFs); if they are greater than 5 then multicollinearity is

indicated.
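
The second and sixth measures are closely linked, and the fifth
is a simple function of eigenvalues. As a reminder, in standard
notation that does not appear in the program (R_j^2 is the
R-square from regressing the j-th explanatory variable on the
others; lambda_max and lambda_min are the largest and smallest
eigenvalues of the standardized X'X matrix):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\text{condition number} = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}
```

For example, the cutoff VIF_j = 5 corresponds to R_j^2 = 0.8,
since 1/(1 - 0.8) = 5.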

PROBLEM PROGRAM

The following program can be used to provide the six measures

of multicollinearity:

//PROB07 JOB (AF,A592),yourname,REGION=4096K

/*OPENBIN

//S1 EXEC SAS,SYSOUT=S,UCS=TN

//DAVISSON DD DSN=AU5920.ECON592.DATA(PRDATA),DISP=SHR

//SYSIN DD *

DATA MACRO; INFILE DAVISSON;

INPUT GNP 14-18 RESINV 44-47 PRICEIN 61-65

#2 LRINT 2-6 SRINT 8-12;

PROC PRINT;

* (1) PRIMARY REGRESSION MODEL;

PROC REG;

MODEL RESINV = GNP PRICEIN LRINT SRINT / VIF COLLIN STB COVB;

TITLE 'PRIMARY REGRESSION MODEL';

* (2) T-STATISTIC AND EXPLANATORY VARIABLE REGRESSION MODEL;

PROC REG;

MODEL SRINT = GNP PRICEIN LRINT;

MODEL LRINT = GNP PRICEIN SRINT;

MODEL PRICEIN = GNP LRINT SRINT;

MODEL GNP = LRINT SRINT PRICEIN;

TITLE 'T-STATISTIC AND EXPLANATORY VARIABLE REGRESSION MODEL';

* (3) RSQUARE DELETE MEASURE;

PROC RSQUARE; MODEL RESINV=GNP PRICEIN LRINT SRINT;

TITLE 'RSQUARE DELETE MEASURE';

* (4) STANDARDIZED X'X DETERMINANT MEASURE;

DATA STAN1; SET MACRO;

PROC MEANS NOPRINT;

OUTPUT OUT = STATS1

N = OBS;

DATA STAN2; SET MACRO;

PROC STANDARD MEAN=0 STD=1 OUT=STAN3;

DATA BACON;

IF _N_ = 1 THEN SET STATS1;

SET STAN3;

N = OBS;

SGNP = GNP/SQRT(N-1);

SPRICEIN = PRICEIN/SQRT(N-1);

SLRINT = LRINT/SQRT(N-1);

SSRINT = SRINT/SQRT(N-1);

PROC IML;

TITLE 'PROC IML RESULTS';

USE BACON;

READ ALL VAR{SGNP,SPRICEIN,SLRINT,SSRINT} INTO X;

INT=J(NROW(X),1,1); * CREATES A VECTOR OF 1S FOR THE INTERCEPT;

X=INT||X;

XT=X`;

XTX=XT*X;

DXTX=DET(XTX);

PRINT 'STANDARDIZED X''X DETERMINANT MEASURE',DXTX;

* (5) CONDITION NUMBER;

M = EIGVAL(XTX);

MINEIGV = MIN(M);

MAXEIGV = MAX(M);

CONDNUM = SQRT(MAXEIGV/MINEIGV);

PRINT 'CONDITION NUMBER',CONDNUM;

* (6) VARIANCE INFLATION FACTORS;

IXTX = INV(XTX);

VIFS = VECDIAG(IXTX);

PRINT 'VARIANCE INFLATION FACTORS',XTX,IXTX,VIFS;

//

DISCUSSION OF THE PROGRAM

The model used for our problem tries to explain movements in

residential investment (housing construction). The four

independent variables are GNP, the consumer price index, a

long run interest rate, and a short run interest rate.

The first measure is found by using the first PROC REG statement

and comparing the model's t-statistics with its F-statistic. The

second measure is found by using the second PROC REG statement

with four separate models, regressing each independent variable

against all the other explanatory variables. By using PROC

RSQUARE we do the third measure, the Rsquare delete measure.

Fourth, having standardized the variables, we use PROC IML to

look at the determinant of the standardized X'X matrix. Fifth, we

find the maximum and minimum eigenvalues of the standardized X'X

matrix. The Condition Number is then found by taking the square

root of the quotient of the maximum and minimum eigenvalues.

Sixth, the Variance Inflation Factors are found by getting the

diagonal of the inverse of the standardized X'X matrix.
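
Because the standardized columns have mean zero, the intercept
column of 1s is orthogonal to them. The X'X matrix built in the
program is therefore block diagonal: its first diagonal element
is the number of observations, and the remaining block is the
correlation matrix of the explanatory variables. The sketch
below is not part of the original program. It repeats measures
(4) to (6) without the intercept column, assuming the BACON data
set created above contains no missing values. The names Z, R,
DETR, LAMBDA, CONDR and VIFR are illustrative.

```sas
PROC IML;
USE BACON;
READ ALL VAR {SGNP SPRICEIN SLRINT SSRINT} INTO Z;
* Z HAS MEAN-ZERO, UNIT-LENGTH COLUMNS, SO ITS CROSS-PRODUCT;
* MATRIX IS THE CORRELATION MATRIX OF THE REGRESSORS;
R = Z`*Z;
* (4) DETERMINANT OF THE CORRELATION MATRIX;
DETR = DET(R);
* (5) CONDITION NUMBER FROM THE EIGENVALUES OF R;
LAMBDA = EIGVAL(R);
CONDR = SQRT(MAX(LAMBDA)/MIN(LAMBDA));
* (6) VIFS ARE THE DIAGONAL OF THE INVERSE OF R;
VIFR = VECDIAG(INV(R));
PRINT 'WITHOUT INTERCEPT COLUMN',DETR,CONDR,VIFR;
QUIT;
```

In this version the determinant lies between 0 and 1, and the
diagonal of the inverse holds only the four VIFs. In the
program's version the determinant is multiplied by the number
of observations, the first diagonal element of the inverse
belongs to the intercept, and the condition number is larger
whenever the number of observations exceeds the largest
eigenvalue of R.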

INSTRUCTIONS FOR WRITE-UP

Explain each step of the program, both intuitively and in

econometric notation. Give an intuitive explanation of

what the problem is trying to accomplish. Multicollinearity is

tricky business. If you are going to blame poor regression

results on multicollinearity, you must show evidence of its

existence. Analyze the six methods of measuring

multicollinearity. Pay special attention to the t- and

F-statistics and the R-Squares. Are the t-statistics "small" while

the F-statistic remains "large"? Are any of the independent

variables highly correlated with the rest of the independent

variables? Does the R-Square decrease significantly if one or

more variables are dropped? Is DET(XTX) "small", that is, close

to zero? Is the Condition Number greater than 30? Are the

Variance Inflation Factors greater than 5? How subjective a

process is measuring for multicollinearity? Does this begin to

tell you about the nature of much of econometric analysis? Is

your sample multicollinear? Does exact (perfect)

multicollinearity violate the assumptions of ordinary least

squares? Does imperfect multicollinearity violate the assumptions

of ordinary least squares? What are the resulting OLS properties

of the estimators in each case?