Dynamic Panel Data Modeling using Maximum Likelihood
Paul D. Allison, University of Pennsylvania
Enrique Moral-Benito, Banco de España, Madrid
Richard Williams, University of Notre Dame
Overview. Paul Allison, Enrique Moral-Benito, and Richard Williams are currently working on a project entitled "Dynamic Panel Data Modeling using Maximum Likelihood." Panel data have many advantages when trying to make causal inferences but can also be difficult to work with. We show that maximum likelihood (ML) provides an alternative to widely used generalized method of moments (GMM) methods such as Arellano-Bond, and that it is superior in many cases. We have prepared a Stata command called xtdpdml that greatly simplifies the process of estimating our models. This page lists the materials that are currently available.
Description. Panel data make it possible both to control for unobserved confounders and to include lagged, endogenous regressors. Trying to do both at the same time, however, leads to serious estimation difficulties. In the econometric literature, these problems have been solved by using lagged instrumental variables together with the generalized method of moments (GMM). In Stata, commands such as xtabond and xtdpdsys have been used for these models. xtdpdml addresses the same problems via maximum likelihood estimation implemented with Stata's structural equation modeling (sem) command. The ML (sem) method is substantially more efficient than the GMM method when the normality assumption is met and suffers less from finite sample biases. xtdpdml greatly simplifies the SEM model specification process; makes it possible to test and relax many of the constraints that are typically embodied in dynamic panel models; unlike most related methods, allows for the inclusion of time-invariant variables in the model; takes advantage of Stata's ability to use full information maximum likelihood (FIML) for dealing with missing data; provides an overall goodness of fit measure by default and provides easy access to others; and can also generate code for use with Mplus.
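The estimation difficulty mentioned above can be seen in a small simulation. The sketch below is written in Python rather than Stata, and it is not xtdpdml, Arellano-Bond GMM, or the ML-SEM method; it only illustrates the underlying problem under stated assumptions. It generates an AR(1) panel y_it = rho*y_i,t-1 + alpha_i + e_it with normal errors and stationary starting values (rho = 0.5, T = 5, N = 20,000 and the seed are arbitrary illustrative choices). With a short panel, the within (fixed-effects) estimator is biased downward, in line with the large-N limit derived by Nickell (1981). The simplest lagged-instrument estimator, Anderson-Hsiao (first differences instrumented by the level y_i,t-2), recovers rho; Arellano-Bond generalizes that idea by using all available lags in a GMM framework. All function names are hypothetical.

```python
"""Sketch: why fixed effects plus a lagged dependent variable is hard.

Simulates y_it = rho * y_i,t-1 + alpha_i + e_it, then compares
  * the within (fixed-effects) estimator, biased for short T (Nickell bias), and
  * the Anderson-Hsiao estimator: first differences instrumented by y_i,t-2.
Pure standard library; all names here are illustrative, not part of xtdpdml.
"""
import math
import random


def simulate_panel(n, t_periods, rho, seed=2016):
    """Return a list of n series, each holding y_0 .. y_T (T = t_periods)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)  # unobserved, time-invariant confounder
        # Start each series at its stationary distribution.
        y = [alpha / (1 - rho) + rng.gauss(0.0, 1.0) / math.sqrt(1 - rho ** 2)]
        for _ in range(t_periods):
            y.append(rho * y[-1] + alpha + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel


def within_estimator(panel):
    """Regress demeaned y_t on demeaned y_{t-1}, t = 1..T, within each unit."""
    num = den = 0.0
    for y in panel:
        cur, lag = y[1:], y[:-1]
        m_cur, m_lag = sum(cur) / len(cur), sum(lag) / len(lag)
        for c, l in zip(cur, lag):
            num += (l - m_lag) * (c - m_cur)
            den += (l - m_lag) ** 2
    return num / den


def anderson_hsiao(panel):
    """IV on first differences: instrument dy_{t-1} with the level y_{t-2}."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            num += y[t - 2] * (y[t] - y[t - 1])
            den += y[t - 2] * (y[t - 1] - y[t - 2])
    return num / den


def nickell_plim(rho, t_periods):
    """Large-N limit of the within estimator (Nickell 1981), stationary start."""
    T = t_periods
    g = (1 - rho ** T) / (T * (1 - rho))
    bias = -(1 + rho) / (T - 1) * (1 - g)
    bias /= 1 - 2 * rho / ((1 - rho) * (T - 1)) * (1 - g)
    return rho + bias


if __name__ == "__main__":
    RHO, T, N = 0.5, 5, 20_000
    data = simulate_panel(N, T, RHO)
    print(f"true rho             : {RHO:.3f}")
    print(f"within (FE) estimate : {within_estimator(data):.3f}")
    print(f"Nickell large-N limit: {nickell_plim(RHO, T):.3f}")
    print(f"Anderson-Hsiao IV    : {anderson_hsiao(data):.3f}")
```

With these settings the within estimate should land near the Nickell limit of about 0.17, far below the true value of 0.5, while the Anderson-Hsiao estimate should be close to 0.5. Instrumental-variable estimators of this kind are consistent but can be imprecise, which is part of the motivation for the ML alternative.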
Accessing the command. xtdpdml has now been officially released on SSC! If you have previously installed the command from this page, uninstall it first:
ado uninstall xtdpdml
To install the official version, type
ssc install xtdpdml, replace
Beta versions may continue to be released on this page before they are sent to SSC.
Main Suggested Readings:
Allison, Paul D., Richard Williams and Enrique Moral-Benito. "Maximum Likelihood for Cross-Lagged Panel Models with Fixed Effects." (A revised version is forthcoming in Socius and will be posted here once it is available.)
Abstract: Panel data make it possible both to control for unobserved confounders and to allow for lagged, reciprocal causation. Trying to do both at the same time, however, leads to serious estimation difficulties. In the econometric literature, these problems have been solved by using lagged instrumental variables together with the generalized method of moments (GMM). Here we show that the same problems can be solved by maximum likelihood estimation implemented with standard software packages for structural equation modeling (SEM). Monte Carlo simulations show that the ML-SEM method is less biased and more efficient than the GMM method under a wide range of conditions. ML-SEM also makes it possible to test and relax many of the constraints that are typically embodied in dynamic panel models.
Williams, Richard, Paul D. Allison and Enrique Moral-Benito. (In progress; last revised November 21, 2016.) "xtdpdml: Linear Dynamic Panel-Data Estimation using Maximum Likelihood and Structural Equation Modeling." Note: This paper focuses on how to use the xtdpdml command. Slides (PDF and PowerPoint) for an earlier version of the paper, presented at the 2015 Stata Users Conference in Columbus, Ohio, are also available.
Abstract: Panel data make it possible both to control for unobserved confounders and to include lagged, endogenous regressors. Trying to do both at the same time, however, leads to serious estimation difficulties. In the econometric literature, these problems have been addressed by using lagged instrumental variables together with the generalized method of moments (GMM), while in Sociology the same problems have been dealt with via maximum likelihood estimation and Structural Equation Modeling. While both approaches have merit, we show that the ML (SEM) method is substantially more efficient than the GMM method when the normality assumption is met and suffers less from finite sample biases. We introduce a command named xtdpdml with syntax similar to other Stata commands for linear dynamic panel-data estimation. xtdpdml greatly simplifies the SEM model specification process; makes it possible to test and relax many of the constraints that are typically embodied in dynamic panel models; allows for the inclusion of time-invariant variables in the model, unlike most related methods; and takes advantage of Stata’s ability to use full information maximum likelihood (FIML) for dealing with missing data. The strengths and advantages of xtdpdml are illustrated via examples from both Economics and Sociology.
You can replicate the analysis using xtdpdml_examples.do and xtdpdml_simul.ado. (You may not want to run the simulations as is, because they are extremely time-consuming; 500 replications should get you close enough.) You can see all of the output in xtdpdml_examples.txt.
Moral-Benito, Enrique, Paul D. Allison and Richard Williams. (In progress; last revised January 7, 2017.) “Dynamic Panel Data Modeling using Maximum Likelihood: An Alternative to Arellano-Bond.” Note: This paper is a more technical discussion of the underlying statistical theory. Slides for an earlier version of the paper, presented at the October 2016 Spanish Stata Users Group meeting, are also available.
Abstract: The Arellano and Bond (1991) estimator is widely used among applied researchers when estimating dynamic panels with fixed effects and predetermined regressors. This estimator might behave poorly in finite samples when the cross-section dimension of the data is small (i.e., small N), especially if the variables under analysis are persistent over time. This paper discusses a maximum likelihood estimator that is asymptotically equivalent to Arellano and Bond (1991) but presents better finite sample behavior. Moreover, the estimator is easy to implement in Stata using the xtdpdml command as described in Williams et al. (2016).
You can replicate most of the analysis in the paper with these do files (Tables 1 and 2, Tables 3 and 4, and Table 5) along with panel1.dta.
Other Suggested Readings by the authors.
Allison, Paul. 2015. "Don't Put Lagged Dependent Variables in Mixed Models." http://statisticalhorizons.com/lagged-dependent-variables
Moral-Benito, Enrique. 2013. "Likelihood-based Estimation of Dynamic Panels with Predetermined Regressors." Journal of Business and Economic Statistics 31:4, 451-472.
Williams, Richard. Last revised July 14, 2016. Multiple Imputation & FIML with xtdpdml. Briefly outlines procedures for using MI and FIML with xtdpdml.
Williams, Richard, Enrique Moral-Benito and Paul D. Allison. Last revised December 11, 2016. Dealing with non-normality in xtdpdml. By default, xtdpdml assumes variables have a multivariate normal distribution. This short note discusses the strengths and weaknesses of three different approaches that relax that assumption.
Other Suggested Readings.
Ahn, S. C. and Peter Schmidt (1995) “Efficient Estimation of Models for Dynamic Panel Data.” Journal of Econometrics 68: 5-27.
Arellano, M. and S. Bond (1991) “Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations.” The Review of Economic Studies 58: 277-297.
Bai, Jushan (2013). “Fixed effects dynamic panel data models, a factor analytical approach.” Econometrica 81 (1): 285-314.
Baltagi, Badi H. (2013), Econometric Analysis of Panel Data. Fifth Edition. New York: John Wiley & Sons.
Bollen, Kenneth, and Jennie Brand. 2010. "A General Panel Model with Random and Fixed Effects: A Structural Equations Approach." Social Forces 89:1, 1-34. (NOTE: Many, if not most, of the Bollen and Brand models are special cases of the models that can be estimated with xtdpdml, and xtdpdml makes their models far easier to specify. This paper may be of special interest to Sociologists and other non-economists. Several of their models can be replicated using this code. Here is the corresponding log file. If you have both Stata and Mplus 7.4, this code should run much faster. Read the cautions in the do files about being in the right directory and not overwriting files accidentally. If you don't have Mplus or don't want to run it, the output files are re1.out, re2.out, re3.out, fe1.out, fe2.out, fe3.out, m5b2.out, and m5b3.out.)
Hsiao, Cheng (2014) Analysis of Panel Data. Third Edition. London: Cambridge University Press.
Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu. 2002. Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics 109: 107-150.
Kripfganz, S. 2015. xtdpdqml: Quasi-Maximum Likelihood Estimation of Linear Dynamic Panel Data Models in Stata. Manuscript. Goethe University Frankfurt. http://www.kripfganz.de
Wooldridge, Jeffrey M. (2010) Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.