INTERPRETING COMPLEX NONLINEAR MODELS

Lawrence Marsh and Maureen A. McGlynn, University of Notre Dame

Deboparn Chakraborty, University of Wisconsin-Milwaukee

ABSTRACT

This paper demonstrates the use of SAS software in the systematic evaluation and interpretation of complex nonlinear models. The impact of a unit change in one explanatory variable on the dependent variable of a nonlinear model is difficult to determine since the change in the predicted value is conditional on all of the other variables for that observation. Furthermore, attempts to approximate this relationship have proven inadequate and often meaningless. In this paper, we present an alternative approach to evaluating the marginal effects of a unit change in the explanatory variables: it uses each individual observation to calculate the change in the predicted value brought about by a unit change in one of the explanatory variables, thereby illustrating that variable's distributional impact on the dependent variable. Examples of a CES production function, a logit model and a neural network model are provided to illustrate the application of this technique using SAS software.

INTRODUCTION

The standard nonlinear regression model can be expressed by the general form

y_i = f(x_i, b) + ε_i,    i = 1, ..., n,

where y_i is the endogenous variable, x_i is a 1×N vector of exogenous variables, b is a vector of unknown parameters, and ε_i is the error term, which is independently and identically distributed with E(ε_i) = 0 and VAR(ε_i) = σ².

To estimate the unknown parameters of a nonlinear model, an objective function is specified and the optimal value of b is computed by solving a system of nonlinear equations. If the distribution of the random error term is unknown, then nonlinear least squares is used to estimate b. The least squares technique finds the value of b that minimizes the sum of squared deviations,

S(b) = Σ_{i=1}^{n} [y_i − f(x_i, b)]².



Minimization is carried out by an iterative procedure since the system of nonlinear equations does not have an explicit solution.
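For illustration, such a nonlinear least squares fit might be set up with the NLIN procedure along the following lines; the data set MYDATA, the variables Y and X, the exponential mean function, and the starting values are illustrative assumptions only, chosen to show the general pattern.

PROC NLIN DATA=MYDATA METHOD=GAUSS;  * Gauss-Newton iterations;
   PARMS B0=1 B1=0.1;                * starting values for the iterative search;
   MODEL Y = B0 * EXP(B1 * X);       * hypothetical nonlinear mean function;
RUN;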

If the distribution of the disturbance is known, then b is estimated using a maximum likelihood approach, where the likelihood function is based on the joint probability density of the sample.

For example, if we can assume that

ε ~ N(0, σ²I),

then the likelihood function may be expressed as

L(b, σ) = (2πσ²)^(−n/2) exp{ −(1/(2σ²)) Σ_{i=1}^{n} [y_i − f(x_i, b)]² },

and the maximum likelihood estimates b′ and σ′ of b and σ are chosen such that

L(b′, σ′) ≥ L(b, σ)

for all possible values of b and σ.
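If the normal-errors assumption above is maintained, the likelihood can also be maximized directly. A minimal sketch using the NLMIXED procedure with the same hypothetical exponential mean function (MYDATA, Y, X, and the starting values remain assumptions) is:

PROC NLMIXED DATA=MYDATA;
   PARMS B0=1 B1=0.1 S2=1;           * starting values, including the error variance;
   MEAN = B0 * EXP(B1 * X);          * hypothetical nonlinear mean function;
   MODEL Y ~ NORMAL(MEAN, S2);       * maximizes the normal log-likelihood;
RUN;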

Despite the fact that complex nonlinear models are becoming more common in economic analysis as well as in other disciplines, their results can easily be misinterpreted. Two common problems associated with complex nonlinear models are the identification of parameters and the interpretation of those parameters in terms of the response of the dependent variable to particular explanatory variables. The problem of identifying specific parameters becomes irrelevant in the context of universal approximators, since many alternative universal approximators with different parameter specifications can be shown to converge asymptotically to the appropriate generating function; this makes the identification of individual parameters inconsequential. However, the determination and interpretation of the derivative of the dependent variable with respect to each of the explanatory variables requires further analysis.

INTERPRETING COMPLEX NONLINEAR MODELS

In the case of simple linear regression, the interpretation of estimated parameters is straightforward. The linear regression model, which assumes the functional form

y = Xb + ε,

produces estimates of b which are easily interpreted as

∂E(y)/∂x_j = b_j,    j = 1, ..., N.

In other words, the parameter estimates of the linear regression model are the partial derivatives of the dependent variable with respect to each of the independent variables and can be interpreted as the marginal impact of a unit change in an independent variable on the dependent variable.

However, when dealing with a nonlinear model of the functional form

y = f(X, b) + ε,

the interpretation of the coefficients is not as straightforward, because the derivative of y with respect to x varies with the values of x. That is,

∂E(y)/∂x_j = ∂f(x, b)/∂x_j,

which depends on the values of all of the explanatory variables and implies that the estimated parameters alone do not represent the marginal effects of a change in x on y.

The most common method used to determine the impact of a unit change in one of the explanatory variables on the dependent variable is to approximate the relationship by evaluating the derivative at the sample means of the explanatory variables,

∂f(x̄, b′)/∂x_j.

However, such attempts at approximation may create an artificial and often meaningless interpretation. First of all, the mean of the sample may not represent any particular individual or group in the data and may therefore produce misleading implications for policy analysis. For example, if we are interested in the impact of an additional year of education on the wage rate, it may be the case that an additional year of education has little impact on the wage rates of five sixths of the population but has a large impact on the remaining one sixth. Although the average impact may seem significant, it may be insignificant for a large portion of the population and be of importance to only a small group. This distributional impact is not captured by the traditional method, which therefore does not always provide adequate information about the sample.

Secondly, in the case of models which use discrete or dummy variables as explanatory variables, partial derivatives can only be interpreted as Dirac derivatives. Therefore, an alternative approach is necessary.

AN ALTERNATIVE APPROACH

An alternative approach to the interpretation of nonlinear models uses each individual observation and calculates the change in the predicted value of the dependent variable,

ŷ*_i − ŷ_i = f(x_i + Δ_j, b′) − f(x_i, b′),

brought about by a unit change Δ_j in the explanatory variable of interest. For linear models, the change in the predicted value would be equal for all observations, but for a nonlinear model the change in the predicted value is conditional on all of the other variables for that observation. Therefore, the impact that each explanatory variable has on the dependent variable differs from observation to observation. The changes in the predicted value brought about by a unit change in one of the explanatory variables may then be sorted in ascending order across observations and plotted to demonstrate the distributional impact of that explanatory variable on the predicted value of the dependent variable.

The remainder of this paper demonstrates this new approach using SAS software for a logistic model of cancer remission, the CES production function, and a neural network model for a nonlinear production function.

Example 1: Logit Model of Cancer Remission

Lee (1974) estimated the probability of remission in cancer patients based on a set of relevant patient characteristics. We have replicated Lee's analysis to illustrate how a change in each of the characteristic variables may affect the probability of cancer remission. We begin by estimating the predicted probabilities of the logit model, which are of the form

P_i = exp(x_i b) / (1 + exp(x_i b)),

by using the LOGISTIC procedure in SAS and storing the estimated coefficients and predicted probabilities in a new data set. Each characteristic variable is then increased by one unit while all other characteristic variables are held constant at their observed values, and the new predicted probability for each observation is calculated. The impact on the dependent variable brought about by each unit change is equal to the difference between the new predicted probabilities (PSTAR) and the original predicted probabilities (PROB) generated by the LOGISTIC procedure.
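A minimal sketch of this preliminary estimation and scoring step is given below; the data set names REMISS, BETAS, PREDS, and REMISS2, along with the use of the DESCENDING option, are illustrative assumptions rather than the original program.

PROC LOGISTIC DATA=REMISS DESCENDING OUTEST=BETAS;
   MODEL REMISS = CELL SMEAR INFIL LI BLAST TEMP;
   OUTPUT OUT=PREDS P=PROB;          * original predicted probabilities;
RUN;

DATA BETAS;
   * rename the OUTEST= coefficients to B0-B6 for use below;
   SET BETAS(RENAME=(INTERCEPT=B0 CELL=B1 SMEAR=B2 INFIL=B3
                     LI=B4 BLAST=B5 TEMP=B6));
   KEEP B0-B6;
RUN;

DATA REMISS2;
   IF _N_ = 1 THEN SET BETAS;        * attach B0-B6 to every observation;
   SET PREDS;
RUN;

The following SAS statements are then used to compute the distributional impact on the dependent variable.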

* one array element per explanatory variable;
ARRAY VARIABLE CELL SMEAR INFIL LI BLAST TEMP;
ARRAY PLUSONE  CELL SMEAR INFIL LI BLAST TEMP;
ARRAY LOGSTAR  LOGSTAR1-LOGSTAR6;
ARRAY PSTAR    PSTAR1-PSTAR6;
* PDIFF represents the impact of an independent unit change in each
  of the explanatory variables on the probability of cancer remission;
ARRAY PDIFF    PDIFF1-PDIFF6;

DO OVER PLUSONE;
   * increase the current explanatory variable by one unit;
   PLUSONE = 1 + VARIABLE;
   * recompute the linear predictor and the predicted probability;
   LOGSTAR = B0 + B1*CELL + B2*SMEAR + B3*INFIL
                + B4*LI + B5*BLAST + B6*TEMP;
   PSTAR   = EXP(LOGSTAR) / (1 + EXP(LOGSTAR));
   * change in the predicted probability relative to the original;
   PDIFF   = PSTAR - PROB;
   * restore the explanatory variable to its observed value;
   PLUSONE = VARIABLE - 1;
END;

The PDIFFs, which represent the change in the predicted probability of remission (PSTAR − PROB) for each observation, may then be sorted in ascending order and plotted using SAS/GRAPH. Figures 1 and 2 illustrate the impact of a unit increase in CELL and TEMP, respectively. In both of these graphs, we see that there is a substantial impact on most of the group. It is also interesting to note that the impact of a unit change in these two explanatory variables on the probability of cancer remission produces a sigmoidal shape, which is characteristic of the logit function, even though the values of the explanatory variables vary across observations. This suggests that the shape which the impact variable assumes, beyond being monotonic by design of the sorting, is influenced by the structure of the model being estimated.
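A minimal sketch of the sorting and plotting step for the CELL impacts (PDIFF1 under the array ordering above) follows; the data set name IMPACT, assumed to hold the output of the DATA step above, is hypothetical.

PROC SORT DATA=IMPACT OUT=SORTED;
   BY PDIFF1;                        * ascending order of the CELL impacts;
RUN;

DATA SORTED;
   SET SORTED;
   RANK = _N_;                       * observation rank for the horizontal axis;
RUN;

PROC GPLOT DATA=SORTED;
   PLOT PDIFF1 * RANK;               * distributional impact of a unit change in CELL;
RUN;
QUIT;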

Example 2: CES Production Function

The CES production function to be estimated can be expressed as

The data consist of 309 observations and were taken from Lutkepohl (1980). The NLIN procedure, using the Gauss-Newton estimation method, produced the following estimates of b0, b1, b2, and b3:

B0= 0.124485105

B1=-0.336341238

B2= 0.663292542

B3=-3.010617415
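A minimal sketch of the estimation step is given below, assuming one common CES parameterization and hypothetical data set and variable names (PROD for the production data, Q for output, K for capital, L for labor); the exact functional form used in the paper may differ.

PROC NLIN DATA=PROD METHOD=GAUSS OUTEST=CESEST;   * Gauss-Newton iterations;
   PARMS B0=1 B1=0.5 B2=0.5 B3=-1;                * starting values for the search;
   * one common CES form; the paper's exact parameterization may differ;
   MODEL Q = B0 * (B1*K**(-B3) + B2*L**(-B3))**(-1/B3);
RUN;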

The same SAS procedure used in Example 1 was followed to compute the marginal impact of labor and capital on output, that is, the change in predicted output for each observation when labor or capital is increased by one unit while the other input is held at its observed value. Figures 3 and 4 show the impact of each factor of production. Both figures show a great degree of disparity across the observations: for a large portion of the sample there was a very small impact on output when capital and labor were marginally increased, while a small group experienced a large impact. The next example uses the same production data to estimate the impact of each factor of production on output using a different model specification.
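Under the CES parameterization and names assumed above, the per-observation impacts could be computed with a DATA step of the following general form; CESIMP, QHAT, and the other new names are hypothetical.

DATA CESIMP;
   IF _N_ = 1 THEN SET CESEST(WHERE=(_TYPE_='FINAL'));
   * B0-B3 are the final estimates from PROC NLIN's OUTEST= data set;
   SET PROD;
   * predicted output at the observed inputs;
   QHAT   = B0 * (B1*K**(-B3) + B2*L**(-B3))**(-1/B3);
   * predicted output when labor, then capital, is increased by one unit;
   QSTARL = B0 * (B1*K**(-B3) + B2*(L+1)**(-B3))**(-1/B3);
   QSTARK = B0 * (B1*(K+1)**(-B3) + B2*L**(-B3))**(-1/B3);
   QDIFFL = QSTARL - QHAT;           * impact of one additional unit of labor;
   QDIFFK = QSTARK - QHAT;           * impact of one additional unit of capital;
RUN;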

Example 3: Neural Networks and Production

In this example, we use a weighted linear combination of nonlinear logistic functions to estimate a production model using the same data as in Example 2.

The neural network model to be optimized can be expressed as

y_i = Σ_{j=1}^{k} a_j G(x_i b_j) + ε_i,    where G(z) = 1/(1 + e^(−z)),

where k is the number of nodes in the neural network and a and b are weights which define the network.

More specifically, the network used in this example is

Once the model was estimated using the SAS NLIN procedure, we followed the same procedures as in the previous two examples to produce Figures 5 and 6. Whereas for the CES production function model the impact of labor and capital on output produced a concave function in which most of the group experienced no impact, the neural network production model shows that a vast majority of the sample experiences some impact when labor and capital are marginally increased.
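A minimal sketch of how such a network could be fit with PROC NLIN is shown below, assuming two logistic nodes and the hypothetical data set and variable names used above; the actual number of nodes and starting weights used in the paper are not reproduced here.

PROC NLIN DATA=PROD METHOD=GAUSS;
   PARMS A1=1 A2=1
         B10=0 B11=0.1 B12=0.1
         B20=0 B21=-0.1 B22=-0.1;    * initial weights defining the network;
   NODE1 = 1 / (1 + EXP(-(B10 + B11*L + B12*K)));
   NODE2 = 1 / (1 + EXP(-(B20 + B21*L + B22*K)));
   MODEL Q = A1*NODE1 + A2*NODE2;    * weighted combination of logistic nodes;
RUN;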

Despite the fact that the same data set was used in Examples 2 and 3, we show that the two models produce very different results with respect to these per-observation impacts. If the means of the variables had been used to evaluate the marginal impacts, that is, if the derivatives had been evaluated at the sample means of labor and capital, the results most likely would not have detected this difference in the distributional effects of the CES and neural network models of production.

CONCLUSION

In all three of the examples provided, the results are influenced by the model structure. For the logit model, the impact variable (PDIFF) retained a sigmoidal shape characteristic of the logit model, and Examples 2 and 3 produced opposite results. This reinforces the idea that the distributional effects of the explanatory variables on the dependent variable can be of great use in econometric practice. Not only is this approach interesting for policy analyses that attempt to reach a variety of people, especially in issues of social justice, but the results can also offer insight into model design and structure. The fact that the CES and neural network models produced opposite results suggests that perhaps neither of these models represents the appropriate structure for estimating output. Therefore, we may wish to choose a universal approximator rather than attempt to identify specific parameters. Nonparametric versions of neural network models can sometimes be formulated as universal approximators, which have the ability to asymptotically approximate any function and all of its derivatives.