EDA for HLM: Visualization when Probabilistic Inference Fails

Published online by Cambridge University Press:  04 January 2017

Jake Bowers*
Department of Political Science, Center for Political Studies, University of Michigan, Ann Arbor, MI 48109
*Corresponding author. e-mail: jwbowers@umich.edu
Katherine W. Drake
Department of Political Science, Center for Political Studies, University of Michigan, Ann Arbor, MI 48109. e-mail: kwdrake@umich.edu

Abstract

Nearly all hierarchical linear models presented to political science audiences are estimated using maximum likelihood, with the results of hypothesis tests given a repeated-sampling interpretation. Maximum likelihood estimators have excellent asymptotic properties but less-than-ideal small-sample properties. Multilevel models common in political science have relatively large samples of units like individuals nested within relatively small samples of units like countries. Often these level-2 samples are so small that inference about level-2 effects becomes uninterpretable within the likelihood framework used to estimate them. When analysts do not have enough data to make a compelling argument for probabilistic inference based on repeated sampling, we show how visualization can allow scientific progress to continue despite the lack of fit between the research design and the asymptotic properties of maximum likelihood estimators.
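The small-sample point about level-2 quantities can be made concrete with a short simulation. The sketch below is an illustration only, not the authors' analysis: it assumes a balanced random-intercept model y_ij = u_j + e_ij with hypothetical settings (true between-group variance τ² = 1, within-group variance σ² = 1, 20 individuals per group), and the function name `tau2_estimates` is invented for this example. It compares the maximum likelihood estimator of τ² with the ANOVA estimator, which coincides with REML in this balanced design. Because ML does not account for the degree of freedom spent estimating the grand mean, the untruncated ML estimator has expected value (1 − 1/J)τ² − σ²/(Jn); with J = 5 groups that is roughly a fifth too small, while with J = 50 groups the shortfall is only about 2%.

```python
import numpy as np


def tau2_estimates(n_groups, n_per_group, tau2=1.0, sigma2=1.0,
                   reps=4000, seed=0):
    """Hypothetical Monte Carlo sketch for a balanced random-intercept model
    y_ij = u_j + e_ij, with u_j ~ N(0, tau2) and e_ij ~ N(0, sigma2).

    Returns one ML estimate and one ANOVA estimate of tau2 per simulated
    data set (the ANOVA estimator equals REML in the balanced case).
    """
    rng = np.random.default_rng(seed)
    J, n = n_groups, n_per_group
    u = rng.normal(0.0, np.sqrt(tau2), size=(reps, J, 1))   # level-2 effects
    e = rng.normal(0.0, np.sqrt(sigma2), size=(reps, J, n))  # level-1 errors
    y = u + e  # grand mean set to 0 without loss of generality

    group_means = y.mean(axis=2)                          # shape (reps, J)
    grand_mean = group_means.mean(axis=1, keepdims=True)  # shape (reps, 1)
    # Between- and within-group mean squares for each replicate.
    msb = n * ((group_means - grand_mean) ** 2).sum(axis=1) / (J - 1)
    msw = ((y - group_means[..., None]) ** 2).sum(axis=(1, 2)) / (J * (n - 1))

    # Closed-form estimators for the balanced one-way design, truncated at 0.
    anova = np.maximum(0.0, (msb - msw) / n)
    ml = np.maximum(0.0, ((1 - 1 / J) * msb - msw) / n)
    return ml, anova


for J in (5, 50):
    ml, anova = tau2_estimates(n_groups=J, n_per_group=20)
    print(f"J={J:2d}: mean ML tau2 = {ml.mean():.3f}, "
          f"mean ANOVA/REML tau2 = {anova.mean():.3f}  (true = 1.0)")
```

The closed forms keep the sketch dependency-light; unbalanced data or added covariates would require iterative fitting with a mixed-model package, which this example deliberately avoids.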

Somewhere along the line in the teaching of statistics in the social sciences, the importance of good judgment got lost amid the minutiae of null hypothesis testing. It is all right, indeed essential, to argue flexibly and in detail for a particular case when you use statistics. Data analysis should not be pointlessly formal. It should make an interesting claim; it should tell a story that an informed audience will care about, and it should do so by intelligent interpretation of appropriate evidence from empirical measurements or observations.

—Abelson, 1995, p. 2

With neither prior mathematical theory nor intensive prior investigation of the data, throwing half a dozen or more exogenous variables into a regression, probit, or novel maximum-likelihood estimator is pointless. No one knows how they are interrelated, and the high-dimensional parameter space will generate a shimmering pseudo-fit like a bright coat of paint on a boat's rotting hull.

—Achen, 1999, p. 26

Type: Research Article
Copyright © The Author 2005. Published by Oxford University Press on behalf of the Society for Political Methodology.

Footnotes

Authors' note: We owe many thanks to James Bowers, Andrew Gelman, Orit Kedar, Burt Monroe, Kevin Quinn, Phil Shively, Laura Stoker, Cara Wong, and the anonymous reviewers of the first version of our manuscript for their many helpful comments.

References

Abelson, Robert. 1995. Statistics as Principled Argument. New York: Lawrence Erlbaum.
Achen, Christopher H. 1999. “Warren Miller and the Future of Political Data Analysis.” Political Analysis 8: 142–146.
Achen, Christopher H., and Shively, W. P. 1995. Cross-Level Inference. Chicago: University of Chicago Press.
Becker, R. A., Cleveland, W. S., and Shyu, M. J. 1996. “The Visual Design and Control of Trellis Display.” Journal of Computational and Graphical Statistics 5: 123–155. (Available from www.stat.purdue.edu/wsc/papers/trellis.design.control.ps.)
Bowers, Jake, and Ensley, Michael. 2003. “Issues in Analyzing Data from the Dual-Mode 2000 American National Election Study.” Technical Report. Ann Arbor, MI: National Election Studies.
Brady, Henry, and Seawright, Jason. 2004. “Framing Social Inquiry: From Models of Causation to Statistically Based Causal Inference.” Working paper.
Buja, Andreas, and Cook, Dianne. 1999. “Inference for Data Visualization.” Presented at the Joint Statistics Meetings, August 1999, Baltimore, MD. (Available from www-stat.wharton.upenn.edu/buja/PAPERS/jsm99.ps.gz.)
Burns, Nancy, Schlozman, Kay L., and Verba, Sidney. 2001. The Private Roots of Public Action: Gender, Equality, and Political Participation. Cambridge, MA: Harvard University Press.
Cleveland, William S. 1993. Visualizing Data. Summit, NJ: Hobart.
Davidson, Russell, and MacKinnon, James G. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
Fox, John. 1997. Applied Regression Analysis, Linear Models, and Related Methods. Thousand Oaks, CA: Sage.
Gelman, Andrew. 2003. “A Bayesian Formulation of Exploratory Data Analysis and Goodness-of-Fit Testing.” International Statistical Review 71: 369–382.
Gelman, Andrew. 2004. “Exploratory Data Analysis for Complex Models (with Discussion by Andreas Buja and Rejoinder).” Journal of Computational and Graphical Statistics 13: 755–787.
Gelman, Andrew, Carlin, John B., Stern, Hal S., and Rubin, Donald B. 2004. Bayesian Data Analysis, 2nd ed. Boca Raton, FL: Chapman and Hall/CRC.
Gelman, Andrew, Pasarica, Cristian, and Dodhia, Rahul. 2002. “Let's Practice What We Preach: Turning Tables into Graphs.” Statistical Computing and Graphics 56: 121–130.
Gill, Jeff. 2002. Bayesian Methods: A Social and Behavioral Sciences Approach. Boca Raton, FL: Chapman and Hall/CRC.
Goldstein, H. 1999. Multilevel Statistical Models. London: Edward Arnold.
Greene, William H. 2002. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81: 945–960.
Hox, J. J., and Maas, C. J. M. 2002. “Sample Sizes for Multilevel Modeling.” In Social Science Methodology in the New Millennium: Proceedings of the Fifth International Conference on Logic and Methodology, eds. Blasius, J., Hox, J., de Leeuw, E., and Schmidt, P. Opladen, Germany: Leske + Budrich Verlag.
Huckfeldt, R. R. 1979. “Political Participation and the Neighborhood Social Context.” American Journal of Political Science 23: 579–592.
Jackman, Simon. 2004. “Bayesian Analysis for Political Research.” Annual Review of Political Science 7: 483–505.
King, Gary. 1989. Unifying Political Methodology: The Likelihood Theory of Statistical Inference. New York: Cambridge University Press.
Kreft, Ita. 1996. “Are Multilevel Techniques Necessary? An Overview, Including Simulation Studies.” Unpublished manuscript.
Kreft, I., and de Leeuw, J. 1998. Introducing Multilevel Modeling. London: Sage.
Langford, Ian H., and Lewis, Toby. 1998. “Outliers in Multilevel Data.” Journal of the Royal Statistical Society A 161: 121–160.
Leisch, Friedrich. 2002. “Dynamic Generation of Statistical Reports Using Literate Data Analysis.” In Compstat 2002: Proceedings in Computational Statistics, eds. Haerdle, W., and Roenz, B. Heidelberg, Germany: Physica-Verlag, pp. 575–580.
Leisch, Friedrich. 2005. “Sweave User Manual.” (Available from www.ci.tuwien.ac.at/leisch/Sweave.)
Lewis, Jeffrey B., and Linzer, Drew A. 2005. “Estimating Regression Models in Which the Dependent Variable Is Based on Estimates.” Political Analysis. doi:10.1093/pan/mpi026.
Longford, N. T. 1993. Random Coefficient Models. Oxford: Clarendon.
Maas, Cora J. M., and Hox, Joop J. 2002. “Robustness of Multilevel Parameter Estimates against Small Sample Sizes.” In Social Science Methodology in the New Millennium, eds. Blasius, J., Hox, J., de Leeuw, E., and Schmidt, P. Opladen, Germany: Leske + Budrich. (Available from www.fss.uu.nl/ms/jh/papers/p090101.pdf.)
Maas, Cora J. M., and Hox, Joop J. 2004. “Robustness Issues in Multilevel Regression Analysis.” Statistica Neerlandica 58: 127–137.
McCulloch, Charles E., and Searle, Shayle R. 2001. Generalized, Linear, and Mixed Models. New York: John Wiley and Sons.
Mundlak, Yair. 1978. “On the Pooling of Time Series and Cross Section Data.” Econometrica 46: 69–85.
Nie, Norman, Junn, Jane, and Barry, Kenneth S. 1996. Education and Democratic Citizenship in America. Chicago: University of Chicago Press.
Pinheiro, José C., and Bates, Douglas M. 2000. Mixed-Effects Models in S and S-PLUS. New York: Springer-Verlag.
R Development Core Team. 2005. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. (Available from www.R-project.org.)
Raudenbush, Stephen W., and Bryk, Anthony S. 2002. Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd ed. Thousand Oaks, CA: Sage.
Rosenbaum, Paul R. 2002. Observational Studies. New York: Springer.
Rosenstone, Steven, and Hansen, John M. 1993. Mobilization, Participation and Democracy in America. New York: MacMillan.
Rubin, Donald B. 1991. “Practical Implications of Modes of Statistical Inference for Causal Effects and the Critical Role of the Assignment Mechanism.” Biometrics 47: 1213–1234.
Sarkar, Deepayan. 2005. Lattice: Lattice Graphics. R Foundation for Statistical Computing [producer and distributor]. (Available from http://cran.r-project.org/src/contrib/Descriptions.lattice.html.)
Singer, Judith D., and Willett, John B. 2003. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. New York: Oxford University Press.
Snijders, T., and Bosker, R. 1999. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. London: Sage.
Steenbergen, Marco R., and Jones, Bradford S. 2002. “Modeling Multilevel Data Structures.” American Journal of Political Science 46: 218–237.
Stoker, Laura, and Bowers, Jake. 2002a. “Designing Multi-level Studies: Sampling Voters and Electoral Contexts.” Electoral Studies 21: 235–267.
Stoker, L., and Bowers, J. 2002b. “Erratum to ‘Designing Multi-level Studies: Sampling Voters and Electoral Contexts’.” Electoral Studies 21: 535–536.
Tenn, Stephen. 2005. “An Alternative Measure of Relative Education to Explain Voter Turnout.” Journal of Politics 67: 271–282.
Tufte, Edward. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics.
Tufte, Edward. 1990. Envisioning Information. Cheshire, CT: Graphics.
Tufte, Edward. 2003. The Cognitive Style of PowerPoint. Cheshire, CT: Graphics.
Tufte, Edward R. 1997. Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, CT: Graphics.
Tukey, John W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.
Venables, W. N., and Ripley, B. D. 2002. Modern Applied Statistics with S, 4th ed. New York: Springer.
Verba, Sidney, Schlozman, Kay L., and Brady, Henry. 1995. Voice and Equality: Civic Voluntarism in American Politics. Cambridge: Harvard University Press.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Yohai, V., Stahel, W. A., and Zamar, R. H. 1991. “A Procedure for Robust Estimation and Inference in Linear Regression.” In Directions in Robust Statistics and Diagnostics, Part II, eds. Stahel, W. A., and Weisberg, S. W. New York: Springer-Verlag.