Walking readers step by step through complex concepts, this book translates missing data techniques into something that applied researchers and graduate students can understand and use in their own research. Enders explains the rationale and procedural details for maximum likelihood estimation, Bayesian estimation, multiple imputation, and models for handling missing not at random (MNAR) data. Easy-to-follow examples and small simulated data sets illustrate the techniques and clarify the underlying principles. The companion website (www.appliedmissingdata.com) includes data files and syntax for the examples in the book as well as up-to-date information on software. The book is accessible to substantive researchers while providing a level of detail that will satisfy quantitative specialists.

This title is part of the Methodology in the Social Sciences Series, edited by Todd D. Little, PhD.

“The book is well written, and successfully achieves the goal, stated in the Preface, of 'translat[ing] the technical missing data literature into an accessible reference text' (p. vii) for the social sciences. The author successfully achieved the goal of helping the reader to become familiar with basic concepts in missing data analysis procedures, and to feel comfortable using these procedures in a variety of practical and social science applications. It contains very useful examples and illustrations in the applied social sciences. In addition, those example and illustration datasets and detailed software implementations are available on the book's website http://www.appliedmissingdata.com, which is invaluable.”

“This is a well-written book that will be particularly useful for analysts who are not PhD statisticians. Enders provides a much-needed overview and explication of the current technical literature on missing data. The book should become a popular text for applied methodologists.”

“Many applied researchers are not trained in statistics to the level that would make the classic sources on missing data accessible. Enders makes a concerted—and successful—attempt to convey the statistical concepts and models that define missing data methods in a way that does not assume high statistical literacy. He writes in a conceptually clear manner, often using a simple example or simulation to show how an equation or procedure works. This book is a refreshing addition to the literature for applied social researchers and graduate students doing quantitative data analysis. It covers the full range of state-of-the-art methods of handling missing data in a clear and accessible manner, making it an excellent supplement or text for a graduate course on advanced, but widely used, statistical methods.”

“A useful overview of missing data issues, with practical guidelines for making decisions about real-world data. This book is all about an issue that is usually ignored in work on OLS regression—but that most of us spend significant time dealing with. The writing is clear and accessible, a great success for a challenging topic. Enders provides useful reminders of what we need to know and why. I appreciated the interpretation of formulas, terms, and output. This book provides comprehensive and vital information in an easy-to-consume style. I learned a great deal reading it.”

“I would certainly recommend this book to anybody who deals with missing data at any level. I have no doubt that this book will serve as a solid reference for quantitative social and behavioral scientists.”

“The chapter on MNAR provides a good overview of the current state of the art. I would recommend it to anyone working with missing data, as well as to developers of multilevel and structural equation modeling software who are interested in adding new features, such as pattern mixture models. The focus is on the 'how-tos' of working with MNAR data. The author illustrates the many pitfalls and how different model assumptions could lead to different parameter estimates and standard error estimates, and hence to different conclusions.”

“I would highly recommend this book to colleagues and will require it in my advanced graduate courses on longitudinal data analysis.”

“The book contains very accessible material on missing data. I would recommend it to colleagues and students, especially those who do not have formal training in mathematical statistics.”

“A needed and valuable addition to the literature on missing data. The simulations are excellent and are a clear strength of the book.”

1. An Introduction to Missing Data

1.1 Introduction

1.2 Chapter Overview

1.3 Missing Data Patterns

1.4 A Conceptual Overview of Missing Data Theory

1.5 A More Formal Description of Missing Data Theory

1.6 Why Is the Missing Data Mechanism Important?

1.7 How Plausible Is the Missing at Random Mechanism?

1.8 An Inclusive Analysis Strategy

1.9 Testing the Missing Completely at Random Mechanism

1.10 Planned Missing Data Designs

1.11 The Three-Form Design

1.12 Planned Missing Data for Longitudinal Designs

1.13 Conducting Power Analyses for Planned Missing Data Designs

1.14 Data Analysis Example

1.15 Summary

1.16 Recommended Readings

2. Traditional Methods for Dealing with Missing Data

2.1 Chapter Overview

2.2 An Overview of Deletion Methods

2.3 Listwise Deletion

2.4 Pairwise Deletion

2.5 An Overview of Single Imputation Techniques

2.6 Arithmetic Mean Imputation

2.7 Regression Imputation

2.8 Stochastic Regression Imputation

2.9 Hot-Deck Imputation

2.10 Similar Response Pattern Imputation

2.11 Averaging the Available Items

2.12 Last Observation Carried Forward

2.13 An Illustrative Simulation Study

2.14 Summary

2.15 Recommended Readings

3. An Introduction to Maximum Likelihood Estimation

3.1 Chapter Overview

3.2 The Univariate Normal Distribution

3.3 The Sample Likelihood

3.4 The Log-Likelihood

3.5 Estimating Unknown Parameters

3.6 The Role of First Derivatives

3.7 Estimating Standard Errors

3.8 Maximum Likelihood Estimation with Multivariate Normal Data

3.9 A Bivariate Analysis Example

3.10 Iterative Optimization Algorithms

3.11 Significance Testing Using the Wald Statistic

3.12 The Likelihood Ratio Test Statistic

3.13 Should I Use the Wald Test or the Likelihood Ratio Statistic?

3.14 Data Analysis Example 1

3.15 Data Analysis Example 2

3.16 Summary

3.17 Recommended Readings

4. Maximum Likelihood Missing Data Handling

4.1 Chapter Overview

4.2 The Missing Data Log-Likelihood

4.3 How Do the Incomplete Data Records Improve Estimation?

4.4 An Illustrative Computer Simulation Study

4.5 Estimating Standard Errors with Missing Data

4.6 Observed Versus Expected Information

4.7 A Bivariate Analysis Example

4.8 An Illustrative Computer Simulation Study

4.9 An Overview of the EM Algorithm

4.10 A Detailed Description of the EM Algorithm

4.11 A Bivariate Analysis Example

4.12 Extending EM to Multivariate Data

4.13 Maximum Likelihood Software Options

4.14 Data Analysis Example 1

4.15 Data Analysis Example 2

4.16 Data Analysis Example 3

4.17 Data Analysis Example 4

4.18 Data Analysis Example 5

4.19 Summary

4.20 Recommended Readings

5. Improving the Accuracy of Maximum Likelihood Analyses

5.1 Chapter Overview

5.2 The Rationale for an Inclusive Analysis Strategy

5.3 An Illustrative Computer Simulation Study

5.4 Identifying a Set of Auxiliary Variables

5.5 Incorporating Auxiliary Variables Into a Maximum Likelihood Analysis

5.6 The Saturated Correlates Model

5.7 The Impact of Non-Normal Data

5.8 Robust Standard Errors

5.9 Bootstrap Standard Errors

5.10 The Rescaled Likelihood Ratio Test

5.11 Bootstrapping the Likelihood Ratio Statistic

5.12 Data Analysis Example 1

5.13 Data Analysis Example 2

5.14 Data Analysis Example 3

5.15 Summary

5.16 Recommended Readings

6. An Introduction to Bayesian Estimation

6.1 Chapter Overview

6.2 What Makes Bayesian Statistics Different?

6.3 A Conceptual Overview of Bayesian Estimation

6.4 Bayes’ Theorem

6.5 An Analysis Example

6.6 How Does Bayesian Estimation Apply to Multiple Imputation?

6.7 The Posterior Distribution of the Mean

6.8 The Posterior Distribution of the Variance

6.9 The Posterior Distribution of a Covariance Matrix

6.10 Summary

6.11 Recommended Readings

7. The Imputation Phase of Multiple Imputation

7.1 Chapter Overview

7.2 A Conceptual Description of the Imputation Phase

7.3 A Bayesian Description of the Imputation Phase

7.4 A Bivariate Analysis Example

7.5 Data Augmentation with Multivariate Data

7.6 Selecting Variables for Imputation

7.7 The Meaning of Convergence

7.8 Convergence Diagnostics

7.9 Time-Series Plots

7.10 Autocorrelation Function Plots

7.11 Assessing Convergence from Alternate Starting Values

7.12 Convergence Problems

7.13 Generating the Final Set of Imputations

7.14 How Many Data Sets Are Needed?

7.15 Summary

7.16 Recommended Readings

8. The Analysis and Pooling Phases of Multiple Imputation

8.1 Chapter Overview

8.2 The Analysis Phase

8.3 Combining Parameter Estimates in the Pooling Phase

8.4 Transforming Parameter Estimates Prior to Combining

8.5 Pooling Standard Errors

8.6 The Fraction of Missing Information and the Relative Increase in Variance

8.7 When Is Multiple Imputation Comparable to Maximum Likelihood?

8.8 An Illustrative Computer Simulation Study

8.9 Significance Testing Using the t Statistic

8.10 An Overview of Multiparameter Significance Tests

8.11 Testing Multiple Parameters Using the D1 Statistic

8.12 Testing Multiple Parameters by Combining Wald Tests

8.13 Testing Multiple Parameters by Combining Likelihood Ratio Statistics

8.14 Data Analysis Example 1

8.15 Data Analysis Example 2

8.16 Data Analysis Example 3

8.17 Summary

8.18 Recommended Readings

9. Practical Issues in Multiple Imputation

9.1 Chapter Overview

9.2 Dealing with Convergence Problems

9.3 Dealing with Non-Normal Data

9.4 To Round or Not to Round?

9.5 Preserving Interaction Effects

9.6 Imputing Multiple-Item Questionnaires

9.7 Alternate Imputation Algorithms

9.8 Multiple Imputation Software Options

9.9 Data Analysis Example 1

9.10 Data Analysis Example 2

9.11 Summary

9.12 Recommended Readings

10. Models for Missing Not at Random Data

10.1 Chapter Overview

10.2 An Ad Hoc Approach to Dealing with MNAR Data

10.3 The Theoretical Rationale for MNAR Models

10.4 The Classic Selection Model

10.5 Estimating the Selection Model

10.6 Limitations of the Selection Model

10.7 An Illustrative Analysis

10.8 The Pattern Mixture Model

10.9 Limitations of the Pattern Mixture Model

10.10 An Overview of the Longitudinal Growth Model

10.11 A Longitudinal Selection Model

10.12 Random Coefficient Selection Models

10.13 Pattern Mixture Models for Longitudinal Analyses

10.14 Identification Strategies for Longitudinal Pattern Mixture Models

10.15 Delta Method Standard Errors

10.16 Overview of the Data Analysis Examples

10.17 Data Analysis Example 1

10.18 Data Analysis Example 2

10.19 Data Analysis Example 3

10.20 Data Analysis Example 4

10.21 Summary

10.22 Recommended Readings

11. Wrapping Things Up: Some Final Practical Considerations

11.1 Chapter Overview

11.2 Maximum Likelihood Software Options

11.3 Multiple Imputation Software Options

11.4 Choosing between Maximum Likelihood and Multiple Imputation

11.5 Reporting the Results from a Missing Data Analysis

11.6 Final Thoughts

11.7 Recommended Readings