Case 8

CHAPTER FOUR

Case #8: AUTOCORRELATION and SPURIOUS REGRESSION

Goal: This case examines the consequences of the common time-series problem of serial correlation (autocorrelation). Specifically, it introduces:

Basic Keynesian Consumption Theory
Testing for the presence of Autocorrelation
Estimation of Autocorrelated Data using Data First-Differencing
Implications of Spurious Regression

Problem Spreadsheet

The spreadsheet for this problem is CH4_Case2.xls. It contains the following data:

Variable	Data Range
CON	1970Q1-1990Q4
GNP	1970Q1-1990Q4
DIFF_CON	1970Q1-1990Q4
DIFF_GNP	1970Q1-1990Q4

The series CON is quarterly data on Personal Consumption Expenditure (seasonally adjusted in current dollars). This variable measures aggregate consumption expenditure.

The series GNP is quarterly data on Gross National Product (seasonally adjusted in current dollars). This variable measures the total value of goods and services produced by the national economy.

The series DIFF_CON is the first difference of CON, calculated as:

DIFF_CON_t = CON_t - CON_{t-1}

The series DIFF_GNP is the first difference of GNP calculated as above.
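
The differencing step can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet's own calculation: the function name first_difference and the sample values are assumptions made for the example.

```python
def first_difference(series):
    """Return x_t - x_(t-1) for each consecutive pair of values."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Hypothetical quarterly values, for illustration only (not taken from the spreadsheet).
con_example = [100.0, 104.0, 109.0, 115.0]
diff_con_example = first_difference(con_example)
print(diff_con_example)  # [4.0, 5.0, 6.0]
```

Note that differencing n values yields n - 1 differences, because the first observation has no predecessor.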

Keynesian Theory of Consumption

John Maynard Keynes, the father of modern macroeconomics, made two important theoretical observations about aggregate consumption behavior: (1) what is not consumed out of current income is saved; and (2) current consumption depends on current income in a less than proportionate manner.

Keynes’ theories placed emphasis on a parameter called the marginal propensity to consume (MPC) which is the slope of the aggregate consumption function and a key factor in determining the "multiplier effect" of tax and spending policies.

To examine behavior of personal consumption expenditure (CON) and gross national product (GNP) we generated a time-series plot in Excel.

Question #1: Based on examination of the time-series plot of CON and GNP, are these series stationary?

ANSWER:

To formally test whether both series are nonstationary, we estimated the autocorrelation function using the Analyze button in FORECASTX^TM.

Lag	CON ACF	GNP ACF
1	.9653	.9666
2	.9296	.9319
3	.8940	.8961
4	.8580	.8600
5	.8226	.8248
6	.7870	.7891
7	.7513	.7532
8	.7157	.7175
9	.6799	.6824
10	.6447	.6480
11	.6100	.6138
12	.5757	.5801
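
For readers who want to see how such autocorrelation coefficients arise, here is a minimal sketch of the usual sample autocorrelation formula. It is not the FORECASTX routine; the function names are hypothetical, the input is a synthetic linear trend rather than the CON or GNP data, and the 2/sqrt(N) band is a common rule of thumb for an approximate 5-percent test, assumed here rather than taken from the text.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag around the overall series mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

def approx_significance_bound(n):
    """Rule-of-thumb 5-percent band: |r_k| > 2 / sqrt(n) is treated as significant."""
    return 2.0 / math.sqrt(n)

# Synthetic trending series with 84 observations (an assumption, not the case data).
trend = [float(t) for t in range(1, 85)]
acf = sample_acf(trend, 12)
bound = approx_significance_bound(len(trend))

for lag, r in enumerate(acf, start=1):
    flag = "significant" if abs(r) > bound else "not significant"
    print(lag, round(r, 4), flag)
```

A trending series produces coefficients that start near one and decline slowly with the lag, which is the same pattern the table above shows for CON and GNP.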

Correlograms for CON and GNP are reported below.

Question #2: Based on examination of the correlogram for each series, are the data stationary, i.e., can we reject all nulls of zero autocorrelation for a twelve-quarter lag structure at the 5-percent level of significance?

ANSWER:

Next, to examine the relationship between consumption and income, we estimated the correlation matrix for the two series using FORECASTX^TM.

Audit Trail -- Correlation Coefficient Table
Series
Description	CON	GNP
CON	1.00E+00	9.99E-01
GNP	9.99E-01	1.00E+00

Using Excel, a scatterplot of the data in levels is shown below.

Question #3: Given the scatterplot and the estimated correlation between CON and GNP, do the two series appear to be related?

ANSWER:

Estimating the Marginal Propensity to Consume

Keynes’ analysis of the economy placed great emphasis on the size of the marginal propensity to consume (MPC), which determined the "multiplier effect" associated with fiscal and monetary policy. Keynes argued that the MPC was less than, but close to, one.
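
To see why the size of the MPC matters, the sketch below assumes the simplest textbook spending multiplier, 1 / (1 - MPC). The case does not state which multiplier formula it has in mind, so treat this as an illustration; the function name and the MPC values are hypothetical.

```python
def simple_multiplier(mpc):
    """Simple spending multiplier 1 / (1 - MPC); defined for 0 <= MPC < 1."""
    if not 0.0 <= mpc < 1.0:
        raise ValueError("MPC must lie in [0, 1)")
    return 1.0 / (1.0 - mpc)

# Hypothetical MPC values, chosen only to show the sensitivity of the multiplier.
for mpc in (0.5, 0.75):
    print(mpc, simple_multiplier(mpc))  # 0.5 2.0, then 0.75 4.0
```

Under this assumption the multiplier grows quickly as the MPC approaches one, which is why an accurate estimate of the slope matters for policy analysis.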

We can obtain an estimate of the MPC by applying Ordinary Least Squares to the following aggregate consumption function:

CON = a + b(GNP) + e

where the slope parameter (b) is the marginal propensity to consume.
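
As a rough sketch of what the least-squares fit computes, the following code estimates a and b for a one-regressor model. It is not the FORECASTX procedure; the function name ols_simple and the data are made up so that the points lie exactly on CON = 10 + 0.5(GNP).

```python
def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope: the estimated MPC in this setting
    a = y_bar - b * x_bar         # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - sse / sst
    return a, b, residuals, r_squared

# Hypothetical data lying exactly on the line CON = 10 + 0.5 * GNP.
gnp_example = [100.0, 200.0, 300.0, 400.0]
con_example = [60.0, 110.0, 160.0, 210.0]
a_hat, b_hat, resid, r2 = ols_simple(gnp_example, con_example)
print(a_hat, b_hat, r2)
```

The residuals returned here are the inputs needed for the serial-correlation diagnostics discussed later in the case.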

Estimation results of the Keynesian consumption function using FORECASTX^TM are reported below.

Multiple Regression -- Result Formula

CON = -99.00 + ( (GNP) 0.673365 )

Audit Trail -- Coefficient Table (Multiple Regression Selected)
Series	Included		Standard			Overall
Description	in Model	Coefficient	Error	T-test	P-value	F-test
CON	Dependent	-9.90E+01	9.60E+00	-10.31	0.00	55,092.75
GNP	Yes	6.73E-01	2.87E-03	234.72	0.00

Accuracy Measures				Value
AIC				852.80
BIC				855.23
Mean Absolute Percentage Error (MAPE)				2.12%
Sum Squared Error (SSE)				123,189.23
R-Square				99.85%
Adjusted R-Square				99.85%

Question #4: How well does this model fit the data? Explain!

ANSWER:

The forecaster should be suspicious, however, that these results may be overly optimistic. The reason is that both series are nonstationary and therefore may share a common trend. In other words, the regression may be plagued by serial correlation.

Testing for Serial Correlation

Using FORECASTX^TM, we obtained the following estimate of the Durbin-Watson statistic.

Forecast Statistics		Value
Durbin Watson		1.64E-01
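
The Durbin-Watson statistic is the sum of squared changes in successive residuals divided by the sum of squared residuals. A minimal sketch follows; the function name and the residual patterns are hypothetical and are not the residuals from the regression above.

```python
def durbin_watson(residuals):
    """d = sum of (e_t - e_(t-1))^2 over t = 2..n, divided by sum of e_t^2 over t = 1..n."""
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

# Made-up residual patterns for illustration.
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # long runs of the same sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every period

print(durbin_watson(persistent))   # well below 2
print(durbin_watson(alternating))  # well above 2
```

Values near 2 are consistent with no first-order autocorrelation, values near 0 point to positive autocorrelation, and values near 4 point to negative autocorrelation. In large samples d is approximately 2(1 - r), where r is the first-order autocorrelation of the residuals.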

Question #5: Do the data exhibit autocorrelation? Specifically, test the null hypothesis of no autocorrelation using the Durbin-Watson test.

ANSWER:

Correcting for Autocorrelation

When significant autocorrelation is present, spurious regression may arise in that our results appear to be highly accurate when in fact they are not, since the OLS estimator of the regression error variance is biased downward. To investigate this possibility, we will re-estimate our consumption function using first differences of the original data. This transformation is designed to eliminate any common linear trend in the data.
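
The effect of a common trend can be illustrated with simulated data. The sketch below is an assumption-laden illustration, not part of the case: it generates two independent random walks with drift (unrelated by construction), then compares the R-squared from a regression in levels with the R-squared from a regression in first differences. All names, the seed, and the drift value are arbitrary choices.

```python
import random

def r_squared(x, y):
    """R-squared of a one-regressor OLS fit with intercept (the squared correlation)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def random_walk_with_drift(n, drift, rng):
    """Cumulative sum of (drift + standard normal shock)."""
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0.0, 1.0)
        path.append(level)
    return path

rng = random.Random(0)                       # fixed seed so the run is repeatable
x = random_walk_with_drift(200, 1.0, rng)    # two series generated independently
y = random_walk_with_drift(200, 1.0, rng)
dx = [b - a for a, b in zip(x, x[1:])]       # first differences
dy = [b - a for a, b in zip(y, y[1:])]

r2_levels = r_squared(x, y)
r2_diffs = r_squared(dx, dy)
print("R-squared in levels:     ", round(r2_levels, 3))
print("R-squared in differences:", round(r2_diffs, 3))
```

Because both simulated series drift upward, the levels regression typically reports a very high R-squared even though the series are unrelated, while the differenced regression reports a value near zero.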

A scatterplot of the transformed series DIFF_CON and DIFF_GNP is shown below.

Note that although the relationship between DIFF_CON and DIFF_GNP is still positive, it is much noisier than the relationship between CON and GNP.

To test for stationarity in the first-differences of the data we produced the following correlograms using FORECASTX^TM.

Question #6: Based upon the correlograms above, are the first-differenced series stationary?

ANSWER:

Using FORECASTX^TM, the following results were obtained from estimating the consumption function using DIFF_CON and DIFF_GNP.

Multiple Regression -- Result Formula

DIFF_CON = 14.86 + ( (DIFF_GNP) 0.42388 )

Audit Trail -- Coefficient Table (Multiple Regression Selected)
Series	Included		Standard			Overall
Description	in Model	Coefficient	Error	T-test	P-value	F-test
DIFF_CON	Dependent	1.49E+01	2.89E+00	5.14	0.00	93.84
DIFF_GNP	Yes	4.24E-01	4.38E-02	9.69	0.00

Accuracy Measures			Value
AIC			672.97
BIC			675.40
Mean Absolute Percentage Error (MAPE)			38.04%
Sum Squared Error (SSE)			14,482.62
R-Square			53.37%
Adjusted R-Square			52.80%

Question #7: How well does the revised model fit the data? Explain!

ANSWER:

To examine whether serial correlation is still a problem we estimated the Durbin-Watson statistic for the revised model.

Forecast Statistics		Value
Durbin Watson		1.76E+00

Question #8: By reference to the Durbin-Watson statistic, does the revised model exhibit autocorrelation? Specifically, test the null hypothesis of no autocorrelation using the Durbin-Watson test.

ANSWER: Given K = 1 and N = 84, Table 4-7 gives us:

dL = 1.62 and dU = 1.67.

Given d = 1.76, application of Figure 4-12 leads us to conclude that we no longer have autocorrelation: because d exceeds dU = 1.67 (and lies below 4 - dU = 2.33), we cannot reject the null of no autocorrelation in favor of positive autocorrelation.
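
The decision logic can be written out explicitly. The sketch below encodes the standard five-region Durbin-Watson rule, which is presumably what Figure 4-12 depicts (an assumption, since the figure is not reproduced here). The function name is hypothetical. Applying the same critical values to the levels regression assumes that K = 1 and N = 84 hold there as well.

```python
def dw_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the five-region decision rule."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4.0 - d_upper:
        return "do not reject H0: no evidence of autocorrelation"
    if d <= 4.0 - d_lower:
        return "inconclusive"
    return "reject H0: negative autocorrelation"

D_LOWER, D_UPPER = 1.62, 1.67   # critical values quoted in the answer above

print(dw_decision(1.76, D_LOWER, D_UPPER))   # first-difference model
print(dw_decision(0.164, D_LOWER, D_UPPER))  # levels model, same bounds assumed
```

With these bounds the no-rejection region runs from 1.67 to 2.33, so d = 1.76 falls inside it, while d = 0.164 falls far below dL.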

Question #9: Compare the estimated coefficient of determination (R-squared), the estimated coefficient on the independent variable, the standard error of the estimated coefficient on the independent variable, and t-values for both versions of the model.

Does the original model in levels (CON and GNP) show signs of spurious regression? Explain.

ANSWER: Comparisons of the two models are summarized below:

Estimate	Data in Levels	Data in First Differences
Slope Parameter (b)	.6734	.4239
Std. Error of Slope Estimate	.00287	.0438
t-statistic on Slope Estimate	234.72	9.69
R-squared	.9985	.5337

As we expect to find under autocorrelation, the two coefficient estimates are rather close, reflecting the fact that the OLS slope estimator is still unbiased under autocorrelation, i.e., the slope estimates should converge for very large samples.

On the other hand, the estimated standard error of the slope coefficient is much greater in the latter model than in the former (.0438 versus .00287), consistent with OLS underestimating the regression variance in the first model. The contrast is even sharper for the t-statistics on the estimated slope coefficients, which differ by a factor of about 24 (234.72 versus 9.69). Both of these results lead us to conclude that "Spurious Regression" plagues the first version of the model. In other words, the reported quality of our initial results is overstated.
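
The size of the gap between the two models can be checked directly from the reported values. The short calculation below uses only the figures from the two coefficient tables; the variable names are arbitrary.

```python
# Values copied from the two regression outputs reported in this case.
levels = {"slope": 0.673365, "se": 0.00287, "t": 234.72}
diffs = {"slope": 0.42388, "se": 0.0438, "t": 9.69}

se_ratio = diffs["se"] / levels["se"]   # how much larger the standard error becomes
t_ratio = levels["t"] / diffs["t"]      # how much smaller the t-statistic becomes

print(round(se_ratio, 1), round(t_ratio, 1))  # 15.3 24.2
```

The standard error of the slope is roughly 15 times larger in the first-difference model, and the t-statistic is roughly 24 times smaller.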

Student Practice Questions

Question #1: Explain why researchers should always check their regression models for the presence of serial correlation. What is "spurious" about this problem?

Question #2: Repeat the analysis of this case using second differences of CON and GNP. Contrast and compare your results to those of this case.