CHAPTER FIVE

Case #10: DEALING WITH SEASONAL DATA

Goal: This assignment deals with the common problem of seasonality in time series data.

Specifically, it introduces:

Problem Spreadsheet

The spreadsheet for this problem is CH5_Case2.xls. It contains the following data:

Variable

Data Range

N_HOME_SALES

1980M1-1985M12

TIME

1980M1-1985M12

M2

1980M1-1985M12

M3

1980M1-1985M12

M4

1980M1-1985M12

M4

1980M1-1985M12

M5

1980M1-1985M12

M6

1980M1-1985M12

M7

1980M1-1985M12

M8

1980M1-1985M12

M9

1980M1-1985M12

M10

1980M1-1985M12

M11

1980M1-1985M12

M12

1980M1-1985M12

The series N_HOME_SALES is monthly data on new homes sales in the United States.

The series TIME is an index taking the value of 1 in 1980M1 and increasing every year by a unit. This variable will proxy any linear growth over time in new homes sold.

The series M2-M12 are monthly seasonal dummy variables taking the value of one in their respective month. For example, the variable M5 takes the value of 1 in month five, and zero values elsewhere. The set of 11 dummies will allow us to quantify seasonal behavior in the context of multiple regression.

Examining Data for Seasonality

Seasonality is defined as variations in the level of data that occur with regularity at the same time each year. Accordingly, when the data are seasonal, we can use this information to improve our forecasts since, to a large extent, seasonal effects are predictable. For example, we can safely predict that ski lodges will have reduced business during the summer months.

To visually inspect for seasonality, we plotted new homes sales data using Excel.

To formally test for data stationarity, we estimated a correlogram for N_HOME_SALES using FORECASTXTM.

Question #1: Based upon examination of the time-series plot and correlogram of N_HOME_SALES, are the data seasonal? Explain.

ANSWER:

Linear Trend Regression

First, let’s estimate a linear trend model using 1985 as our holdout period to assess out-of-sample forecast accuracy as well as generate forecasts into 1986. This is easily done in FORECASTXTM by using the holdback period button. The results are reported below in a series of tables.

Multiple Regression -- Result Formula

N_HOME_SALES = 36.23 + ( (TIME) * 0.27952 )

Forecast -- Multiple Regression Selected

 

 

Forecast

 

Date

Monthly

Quarterly

Annual

Jan-1986

56.63

Feb-1986

56.91

Mar-1986

57.19

170.73

Apr-1986

57.47

May-1986

57.75

Jun-1986

58.02

173.24

Jul-1986

58.30

Aug-1986

58.58

Sep-1986

58.86

175.74

Oct-1986

59.14

Nov-1986

59.42

Dec-1986

59.70

178.25

697.96

Avg

58.16

174.49

697.96

Max

59.70

178.25

697.96

Min

56.63

170.73

697.96

Audit Trail -- Coefficient Table (Multiple Regression Selected)

 

 

 

Series

Included

Standard

Overall

Description

in Model

Coefficient

Error

T-test

P-value

F-test

N_HOME_SALES

Dependent

3.62E+01

2.19E+00

16.54

0.00

28.73

TIME

Yes

2.80E-01

5.21E-02

5.36

0.00

 

The following accuracy statistics refer to the holdout period of 1985.

Accuracy Measures

 

Value

Mean Absolute Percentage Error (MAPE)

18.01%

Sum Squared Error (SSE)

5,919.93

R-Square

29.10%

Adjusted R-Square

28.09%

Question #2: How well does the linear trend model fit the data?

ANSWER:

Time Trend with Seasonal Dummy Variables

Next we generated new home sales forecasts for the holdout period 1985M1-1985M12 as well as forecasts into 1986 based upon a linear trend model with seasonal dummies. Specifically, our forecasts will be based upon a model with two features:

1) Trend, accomplished by adding a time index to the model as an explanatory variable.

2) Season, accomplished by adding a series of seasonal dummies as explanatory variables. Here the dummy variables act as on-off switches for a certain time of the year for every year, allowing us to track seasonal variation in the regression model constant term. Accordingly, this approach handles seasonality in an additive manner.

Note: In creating our seasonal dummy variables, we have followed our "N-1" rule, as a set of 12 monthly seasonal dummy variables would be perfectly collinear.

The following multiple regression model designed to account for trend and data seasonality was estimated using FORECASTXTM:

yt = a + bTIME + d2M2 + d3M3 + d4M4 + d5M5 + d6M6

+ d7M7 + d8M8 + d9M9 + d10M10 + d11M11 + d12M12 + et

Here the M2 through M12 are dummy variables with associated parameters d2, ...d12.

Accordingly, each dummy variable coefficient shows the level of housing sales in a given month relative to January, the base period.

Results from estimating the time trend model with seasonal dummies using 1985 as our holdout period as well as generate forecasts into 1986 are reported below.

Forecast -- Multiple Regression Selected

 

 

Forecast

 

Date

Monthly

Quarterly

Annual

Jan-1986

54.77

Feb-1986

58.11

Mar-1986

65.44

178.32

Apr-1986

61.44

May-1986

64.94

Jun-1986

63.43

189.81

Jul-1986

60.77

Aug-1986

61.10

Sep-1986

58.10

179.96

Oct-1986

58.26

Nov-1986

53.26

Dec-1986

50.76

162.29

710.38

Avg

59.20

177.59

710.38

Max

65.44

189.81

710.38

Min

50.76

162.29

710.38

Audit Trail -- Coefficient Table (Multiple Regression Selected)

 

 

 

Series

Included

Standard

Overall

Description

in Model

Coefficient

Error

T-test

P-value

F-test

N_HOME_SALES

Dependent

3.26E+01

3.75E+00

8.69

0.00

4.92

TIME

Yes

3.04E-01

4.84E-02

6.29

0.00

M10

Yes

7.63E-01

4.87E+00

0.16

0.88

M11

Yes

-4.54E+00

4.88E+00

-0.93

0.36

M12

Yes

-7.35E+00

4.88E+00

-1.50

0.14

M2

Yes

3.03E+00

4.86E+00

0.62

0.54

M3

Yes

1.01E+01

4.86E+00

2.07

0.04

M4

Yes

5.75E+00

4.86E+00

1.18

0.24

M5

Yes

8.95E+00

4.86E+00

1.84

0.07

M6

Yes

7.15E+00

4.86E+00

1.47

0.15

M7

Yes

4.18E+00

4.86E+00

0.86

0.39

M8

Yes

4.20E+00

4.87E+00

0.86

0.39

M9

Yes

9.00E-01

4.87E+00

0.18

0.85

 

The following accuracy statistics refer to the holdout period of 1985.

Accuracy Measures

 

Value

Mean Absolute Percentage Error (MAPE)

14.20%

Sum Squared Error (SSE)

4,172.11

R-Square

50.03%

Adjusted R-Square

39.87%

Question #3: Based upon the regression results above, do the pattern of estimated coefficients of the eleven dummy variables make sense? Which month has the highest new houses sold and which month the lowest? Explain noting the interpretation of the estimated parameters of the dummy variables.

ANSWER:

Question #4: Comment on how each model generates forecasts for 1996. Which method shows seasonality in forecasts for 1996?

ANSWER:

Student Practice Questions

Question #1: Examine the seasonal linear time trend model (See Question #2) for the presence of serial correlation. Re-estimate the data using first differences of the data. Contrast and compare your results with those in this case.

Question #2: Try forecasting N_HOME_SALES using Winter's Exponential Smoothing. Contrast and compare your results with those of this case. Explain.