PROBLEM 7: MEASURING MULTICOLLINEARITY
OBJECT OF THE PROBLEM
To use various measures of the degree of multicollinearity.
INTRODUCTION
There are six basic ways of measuring multicollinearity. The
first is to look for weak t-statistics among the explanatory
variables while the F-statistic for the regression as a whole is
significant. A second measure is to regress each explanatory
variable against all of the other explanatory variables; a high
R-square indicates that the variable is nearly a linear
combination of the others. A third measure is the R-square
delete measure, done by dropping explanatory variables from the
model. If the R-square value changes very little, then we have
an indication of multicollinearity. A fourth measure is the
determinant of the X'X matrix, computed after standardizing the
data. If the determinant is close to zero, the matrix is nearly
singular and the variances of the estimated coefficients become
very large, an indication of multicollinearity. The fifth
measure is the condition number; if it is greater than 30, then
multicollinearity is indicated. The final measure is the set of
Variance Inflation Factors (VIFs); if any are greater than 5,
then multicollinearity is indicated.
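For reference, the quantitative measures above can be written
compactly. Writing R_j^2 for the R-square from regressing the
j-th explanatory variable on the remaining explanatory variables,
and lambda_max and lambda_min for the largest and smallest
eigenvalues of the standardized X'X matrix, the rules of thumb
are:

    VIF_j = \frac{1}{1 - R_j^2} > 5, \qquad
    \kappa(X) = \sqrt{\lambda_{\max}/\lambda_{\min}} > 30, \qquad
    \det(X'X) \approx 0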
PROBLEM PROGRAM
The following program can be used to provide the six measures
of multicollinearity:
//PROB07 JOB (AF,A592),yourname,REGION=4096K
/*OPENBIN
//S1 EXEC SAS,SYSOUT=S,UCS=TN
//DAVISSON DD DSN=AU5920.ECON592.DATA(PRDATA),DISP=SHR
//SYSIN DD *
DATA MACRO; INFILE DAVISSON;
INPUT GNP 14-18 RESINV 44-47 PRICEIN 61-65 #2 LRINT 2-6 SRINT 8-12;
PROC PRINT;
* (1) PRIMARY REGRESSION MODEL;
PROC REG;
MODEL RESINV = GNP PRICEIN LRINT SRINT / VIF COLLIN STB COVB;
TITLE 'PRIMARY REGRESSION MODEL';
* (2) T-STATISTIC AND EXPLANATORY VARIABLE REGRESSION MODEL;
PROC REG;
MODEL SRINT = GNP PRICEIN LRINT;
MODEL LRINT = GNP PRICEIN SRINT;
MODEL PRICEIN = GNP LRINT SRINT;
MODEL GNP = LRINT SRINT PRICEIN;
TITLE 'T-STATISTIC AND EXPLANATORY VARIABLE REGRESSION MODEL';
* (3) RSQUARE DELETE MEASURE;
PROC RSQUARE; MODEL RESINV=GNP PRICEIN LRINT SRINT;
TITLE 'RSQUARE DELETE MEASURE';
* (4) STANDARDIZED X'X DETERMINANT MEASURE;
DATA STAN1; SET MACRO;
PROC MEANS NOPRINT;
OUTPUT OUT = STATS1
N = OBS;
DATA STAN2; SET MACRO;
PROC STANDARD MEAN=0 STD=1 OUT=STAN3;
DATA BACON;
IF _N_ = 1 THEN SET STATS1;
SET STAN3;
N = OBS;
SGNP = GNP/SQRT(N-1);
SPRICEIN = PRICEIN/SQRT(N-1);
SLRINT = LRINT/SQRT(N-1);
SSRINT = SRINT/SQRT(N-1);
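* NOTE: BECAUSE EACH VARIABLE HAS BEEN STANDARDIZED TO MEAN 0 AND
  STANDARD DEVIATION 1 AND THEN DIVIDED BY SQRT(N-1), THE CROSS
  PRODUCTS OF SGNP, SPRICEIN, SLRINT AND SSRINT FORM THE
  CORRELATION MATRIX OF THE EXPLANATORY VARIABLES;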
PROC IML;
TITLE 'PROC IML RESULTS';
USE BACON;
READ ALL VAR{SGNP,SPRICEIN,SLRINT,SSRINT} INTO X;
INT=J(NROW(X),1,1); * CREATES A VECTOR OF 1S FOR THE INTERCEPT;
X=INT||X;
XT=X`;
XTX=XT*X;
DXTX=DET(XTX);
PRINT 'STANDARDIZED X''X DETERMINANT MEASURE',DXTX;
* (5) CONDITION NUMBER;
M = EIGVAL(XTX);
MINEIGV = MIN(M);
MAXEIGV = MAX(M);
CONDNUM = SQRT(MAXEIGV/MINEIGV);
PRINT 'CONDITION NUMBER',CONDNUM;
* (6) VARIANCE INFLATION FACTORS;
IXTX = INV(XTX);
VIFS = VECDIAG(IXTX);
PRINT 'VARIANCE INFLATION FACTORS',XTX,IXTX,VIFS;
//
DISCUSSION OF THE PROGRAM
The model used for our problem tries to explain movements in
residential investment (housing construction). Four independent
variables are used: GNP, the consumer price index, a long-run
interest rate, and a short-run interest rate.
The first measure is found with the first PROC REG step by
comparing the model's t-statistics with its F-statistic. The
second measure is found with the second PROC REG step, which
fits four separate models, regressing each independent variable
against all the other explanatory variables. The third measure,
the R-square delete measure, is obtained with PROC RSQUARE.
Fourth, having standardized the variables, we use PROC IML to
look at the determinant of the standardized X'X matrix. Fifth, we
find the maximum and minimum eigenvalues of the standardized X'X
matrix. The Condition Number is then found by taking the square
root of the quotient of the maximum and minimum eigenvalues.
Sixth, the Variance Inflation Factors are found by taking the
diagonal of the inverse of the standardized X'X matrix.
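As an optional cross-check, not part of the assigned program,
measures (2) and (6) can be tied together directly: each VIF
should equal 1/(1 - R_j^2), where R_j^2 is the R-square from the
corresponding auxiliary regression. A minimal sketch, assuming
the MACRO data set created above is still in the WORK library
(the output data set name CORRMAT is arbitrary):

PROC CORR DATA=MACRO OUTP=CORRMAT NOPRINT;
VAR GNP PRICEIN LRINT SRINT;
PROC IML;
USE CORRMAT;
READ ALL VAR{GNP PRICEIN LRINT SRINT} INTO R WHERE(_TYPE_='CORR');
RINV = INV(R); * INVERSE OF THE CORRELATION MATRIX;
VIF = VECDIAG(RINV); * VIFS ARE THE DIAGONAL OF THE INVERSE;
RSQJ = 1 - 1/VIF; * IMPLIED AUXILIARY R-SQUARES;
PRINT VIF RSQJ;
QUIT;

The VIF column printed here should agree with the values
produced by the VIF option in the first PROC REG step.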
INSTRUCTIONS FOR WRITE-UP
Give an explanation, in both intuitive terms and econometric
notation, of each step of the program. Give an intuitive
explanation of
what the problem is trying to accomplish. Multicollinearity is
tricky business. If you are going to blame poor regression
results on multicollinearity, you must show evidence of its
existence. Analyze the six methods of measuring
multicollinearity. Pay special attention to the t and F-
statistics and the R-Squares. Are the t-statistics "small" while
the F-statistic remains "large"? Are any of the independent
variables highly correlated with the rest of the independent
variables? Does the R-square decrease significantly if one or
more variables are dropped? Is the DET(XTX) "small", that is, close
to zero? Is the Condition Number greater than 30? Are the
Variance Inflation Factors greater than 5? How subjective a
process is measuring multicollinearity? What does this begin to
tell you about the nature of much of econometric analysis? Is
your sample multicollinear? Does exact (perfect)
multicollinearity violate the assumptions of ordinary least
squares? Does imperfect multicollinearity violate the assumptions
of ordinary least squares? What are the resulting OLS properties
of the estimators in each case?