xtdpdml - Dynamic Panel Data Models

Richard Williams, Notre Dame Sociology

Dynamic Panel Data Modeling using Maximum Likelihood

Paul D. Allison, University of Pennsylvania (allison@statisticalhorizons.com)
Enrique Moral-Benito, Banco de Espana, Madrid (enrique.moral@gmail.com)
Richard Williams, University of Notre Dame (rwilliam@nd.edu)

Overview. Paul Allison, Enrique Moral-Benito, and Richard Williams are currently working on a project entitled "Dynamic Panel Data Modeling using Maximum Likelihood." Panel data have many advantages when trying to make causal inferences but can also be difficult to work with. We show that ML provides an alternative to widely used GMM methods such as Arellano-Bond and is superior in many cases. We have prepared a Stata command called xtdpdml that greatly simplifies the process of estimating our models. This page lists the materials that are currently available.

Description. Panel data make it possible both to control for unobserved confounders and to include lagged, endogenous regressors. Trying to do both at the same time, however, leads to serious estimation difficulties. In the econometric literature, these problems have been solved by using lagged instrumental variables together with the generalized method of moments (GMM). In Stata, commands such as xtabond and xtdpdsys have been used for these models. xtdpdml addresses the same problems via maximum likelihood estimation implemented with Stata's structural equation modeling (sem) command. The ML (sem) method is substantially more efficient than the GMM method when the normality assumption is met and suffers less from finite sample biases. xtdpdml greatly simplifies the SEM model specification process; makes it possible to test and relax many of the constraints that are typically embodied in dynamic panel models; unlike most related methods, allows for the inclusion of time-invariant variables in the model; takes advantage of Stata's ability to use full information maximum likelihood (FIML) for dealing with missing data; provides an overall goodness of fit measure by default and provides easy access to others; and can also generate code for use with Mplus.

Accessing the command. xtdpdml has now been officially released on SSC! If you have previously installed the command from this page, uninstall it first:

ado uninstall xtdpdml

To install the official version, type

ssc install xtdpdml, replace

Beta versions may continue to be released on this page before they are sent to SSC.
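
Once installed, the command is called much like other Stata dynamic panel commands. The lines below are a minimal sketch rather than an example taken from the papers: the variable names (id, t, y, x, w, z) are hypothetical, and the options shown (inv() for time-invariant variables, pre() for predetermined variables, and fiml for missing data) should be confirmed against help xtdpdml for the installed version.

```stata
* Hypothetical variable names: id (panel unit), t (time period),
* y (outcome), x (strictly exogenous), w (predetermined), z (time-invariant).
* The data must be in long format and declared as a panel first.
xtset id t

* Basic model: y regressed on its own lag (assumed to be included by default),
* the time-varying x, the predetermined w, and the time-invariant z.
xtdpdml y x, inv(z) pre(w)

* Same model, using full information maximum likelihood for missing data.
xtdpdml y x, inv(z) pre(w) fiml
```

Because the command builds and runs an sem model behind the scenes, estimation can be slow with many time periods, so it may help to start with a small model.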

Main Suggested Readings:

Allison, Paul D., Richard Williams and Enrique Moral-Benito. 2017. Maximum Likelihood for Cross-lagged Panel Models with Fixed Effects. Socius 3: 1-17.

Also available at http://journals.sagepub.com/doi/suppl/10.1177/2378023117710578.

Abstract: Panel data make it possible both to control for unobserved confounders and allow for lagged, reciprocal causation. Trying to do both at the same time, however, leads to serious estimation difficulties. In the econometric literature, these problems have been solved by using lagged instrumental variables together with the generalized method of moments (GMM). Here we show that the same problems can be solved by maximum likelihood (ML) estimation implemented with standard software packages for structural equation modeling (SEM). Monte Carlo simulations show that the ML-SEM method is less biased and more efficient than the GMM method under a wide range of conditions. ML-SEM also makes it possible to test and relax many of the constraints that are typically embodied in dynamic panel models.

NOTE: Jacob Long pointed out that the R code in the Socius article (which uses an alternative parameterization of the model) produces slightly different results than what we present in the paper. The revised code lav_Socius.R now produces the same results as do Stata and SAS. The output is shown in lav_Socius.Rout. The lavaan option in xtdpdml (added after the Socius paper was written) can easily produce the code needed for R. For an example, see the Stata program lav_Socius.do.

Williams, Richard, Paul D. Allison and Enrique Moral-Benito. (Last revised June 1, 2018. The final version is in The Stata Journal, Volume 18, Number 2, pp. 293-326.) "Linear Dynamic Panel-Data Estimation using Maximum Likelihood and Structural Equation Modeling." This paper focuses on how to use the xtdpdml command. These slides (PDF and PowerPoint) summarize the main points of the paper.

Abstract: Panel data make it possible both to control for unobserved confounders and to include lagged, endogenous regressors. Trying to do both at the same time, however, leads to serious estimation difficulties. In the econometric literature, these problems have been addressed by using lagged instrumental variables together with the generalized method of moments (GMM), while in sociology the same problems have been dealt with via maximum likelihood estimation and structural equation modeling. While both approaches have merit, we show that the ML-SEM method is substantially more efficient than the GMM method when the normality assumption is met, and it also suffers less from finite sample biases. We introduce a command named xtdpdml with syntax similar to other Stata commands for linear dynamic panel-data estimation. xtdpdml greatly simplifies the SEM model specification process; makes it possible to test and relax many of the constraints that are typically embodied in dynamic panel models; allows for the inclusion of time-invariant variables in the model, unlike most related methods; and takes advantage of Stata's ability to use full information maximum likelihood for dealing with missing data. The strengths and advantages of xtdpdml are illustrated via examples from both economics and sociology.

You can replicate the analysis using xtdpdml_examples.do. You can see all of the output in xtdpdml_examples.txt.

Moral-Benito, Enrique, Paul D. Allison and Richard Williams. (Last revised February 13, 2018. The final version is in Applied Economics. Published online November 3, 2018.) "Dynamic Panel Data Modeling using Maximum Likelihood: An Alternative to Arellano-Bond." Note: This paper is a more technical discussion of the underlying statistical theory. Also, here are the slides for an earlier version of the paper that was presented at the October 2016 Spanish Stata Users Group meetings.

Abstract: The Arellano and Bond (1991) estimator is widely-used among applied researchers when estimating dynamic panels with fixed effects and predetermined regressors. This estimator might behave poorly in finite samples when the cross-section dimension of the data is small (i.e. small N), especially if the variables under analysis are persistent over time. This paper discusses a maximum likelihood estimator that is asymptotically equivalent to Arellano and Bond (1991) but presents better finite sample behavior. The estimator is based on an alternative parameterization of the likelihood function introduced in Moral-Benito (2013). Moreover, it is easy to implement in Stata using the xtdpdml command as described in the companion paper Williams et al. (2018), which also discusses further advantages of the proposed estimator for practitioners.

You can replicate most of the analysis in the paper with these do files (Tables 1 and 2, Tables 3 and 4, Table 5) along with panel1.dta.

Other Suggested Readings by the authors.

Allison, Paul. 2015. "Don't Put Lagged Dependent Variables in Mixed Models." http://statisticalhorizons.com/lagged-dependent-variables

Moral-Benito, Enrique. 2013. "Likelihood-based Estimation of Dynamic Panels with Predetermined Regressors." Journal of Business and Economic Statistics 31:4, 451-472.

FAQs

Williams, Richard. Last revised July 14, 2016. Multiple Imputation & FIML with xtdpdml. Briefly outlines procedures for using MI and FIML with xtdpdml.

Williams, Richard, Enrique Moral-Benito and Paul D. Allison. Last revised December 11, 2016. Dealing with non-normality in xtdpdml. By default, xtdpdml assumes variables have a multivariate normal distribution. This short note discusses the strengths and weaknesses of three different approaches that relax that assumption.

Williams, Richard. Last revised October 20, 2021. Using Survey Weights with xtdpdml. Survey weights are not officially supported by xtdpdml, so use them at your own risk. However, this FAQ describes BETA procedures that seem to have worked fine in the limited testing we have done.

Other Suggested Readings.

Ahn, S. C. and Peter Schmidt (1995) "Efficient Estimation of Models for Dynamic Panel Data." Journal of Econometrics 68: 5-27.

Arellano, M. and S. Bond (1991) "Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations." The Review of Economic Studies 58: 277-297.

Bai, Jushan (2013). "Fixed effects dynamic panel data models, a factor analytical approach." Econometrica 81 (1): 285-314.

Baltagi, Badi H. (2013), Econometric Analysis of Panel Data. Fifth Edition. New York: John Wiley & Sons.

Bollen, Kenneth, and Jennie Brand. 2010. "A General Panel Model with Random and Fixed Effects: A Structural Equations Approach." Social Forces 89:1, 1-34. (NOTE: Many/most of the Bollen and Brand models are special cases of the models that can be estimated with xtdpdml, and xtdpdml makes it far easier to specify their models. This paper may be of special interest to sociologists and other non-economists. Several of their models can be replicated using this code. Here is the corresponding log file. If you have both Stata and Mplus 7.4, this code should run much faster. Read the cautions in the do files about being in the right directory and not overwriting files accidentally. If you don't have Mplus or don't want to run it, the output files are re1.out, re2.out, re3.out, fe1.out, fe2.out, fe3.out, m5b2.out, and m5b3.out.)

Hsiao, Cheng (2014) Analysis of Panel Data. Third Edition. London: Cambridge University Press.

Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu. 2002. Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics 109: 107-150.

Kripfganz, S. 2015. xtdpdqml: Quasi-Maximum Likelihood Estimation of Linear Dynamic Panel Data Models in Stata. Manuscript. Goethe University Frankfurt. http://www.kripfganz.de

Leszczensky, Lars and Tobias Wolbring. 2018. How to Deal With Reverse Causality Using Panel Data? Recommendations for Researchers Based on a Simulation Study. This working paper uses simulations to compare ML-SEM with several other methods, and the ML-SEM method winds up looking pretty good.

Wooldridge, Jeffrey M. (2010) Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
