The Theory and Practice of Item Response Theory

R. J. de Ayala

Hardcover
December 30, 2008
ISBN 9781593858698
Price: $71.00
448 Pages
Size: 7" x 10"

e-book
October 15, 2013
Price: $71.00
448 Pages

print + e-book
Hardcover + e-Book (PDF)
Price: $142.00 $78.10
448 Pages

Item response theory (IRT) is a latent variable modeling approach used to minimize bias and optimize the measurement power of educational and psychological tests and other psychometric applications. Designed for researchers, psychometric professionals, and advanced students, this book clearly presents both the “how-to” and the “why” of IRT. It describes simple and more complex IRT models and shows how they are applied with the help of widely available software packages. Chapters follow a consistent format and build sequentially, taking the reader from model development through the fit analysis and interpretation phases that one would perform in practice. The use of common empirical data sets across the chapters facilitates understanding of the various models and how they relate to one another.

This title is part of the Methodology in the Social Sciences Series, edited by Todd D. Little, PhD.

“The main strength of the text is in the descriptions and elaborations of the common IRT models....De Ayala also covers fundamental relationships that exist between models, such as the relationships between the parameters of the nominal response model and the partial credit model. In addition, the chapters contain practical advice for sample sizes commonly used with each model and how to interpret the parameters. De Ayala also presents results as statistical indices and graphics for various examples across different contexts, which allows readers the ability to see how the models work from several different perspectives....Does a good job of introducing common estimation strategies employed in IRT software packages. Especially helpful are the illustrations de Ayala includes with the code from IRT software packages.”


“A must read for practitioners who use item response theory to calibrate test data. It also would serve as a tremendous resource for measurement researchers who daily navigate the circuitous paths of various IRT estimation software programs to analyze and understand their assessment data....Each of the 12 chapters is packed with annotated examples of how to use IRT estimation software and the subsequent output....The author does an excellent job of supplementing explanations of various models with calibration examples and output of multiple data sets using several different IRT calibration software programs including BILOG, MULTILOG, BIGSTEPS, and NOHARM....The book is more practitioner-oriented and applied than previous classic books that provide foundational understanding of IRT models and applications....Would be an excellent text for a graduate level IRT class in which the goal of the course would be to review dichotomous, polytomous, and multidimensional IRT models and how to estimate parameters in the various models using a variety of commercially available software....I would encourage all testing practitioners who work with various IRT models, as well as graduate students who plan to go into the measurement field, to seriously consider this book. It is an excellent resource book and one that provides the reader with insight into the rationale and application of the different IRT models. I applaud Dr. de Ayala for all the time and effort he has put into this book. He has clearly done the measurement field a great service.”

Journal of Educational Measurement

“This book is jam-packed with useful information. It includes basic, practical programming examples, with clear explanations of WinSTEPS and BILOG scripts, and step-by-step interpretations of goodness of fit in IRT problems. The author also covers more advanced forms of IRT, including multicategory items, multidimensional latent influences, and advanced multiple-group problems of linking and equating. A tour de force!”

—John J. McArdle, PhD, Head, Quantitative Methods Area, Department of Psychology, University of Southern California

“A very well-organized and useful introduction to IRT. The book has an excellent structure that covers widely used IRT models and most of their major applications. The author has done an outstanding job of balancing the mathematical with the conceptual, and each chapter contains examples of applications to real data using commercially available software. The book is liberally supplemented by the kinds of graphic displays that can help neophytes understand the complexities of IRT. An especially useful feature is the up-front glossary of notation and acronyms. This is an excellent text for a one-semester graduate-level course in IRT, and should provide students with the knowledge they require to delve deeper into IRT models and their applications. It is also a useful reference for psychological and educational researchers who apply IRT in their work.”

—David J. Weiss, PhD, Department of Psychology, University of Minnesota; Editor Emeritus, Applied Psychological Measurement

“This book provides a thorough overview of item response theory methodology, with a nice blend of theoretical psychometrics and practical applications. The coverage is quite complete, including the standard dichotomous and polytomous unidimensional models as well as multidimensional models. The examples are very useful. This book will serve very well as a technical reference and as a text for upper-level psychometric theory courses.”

—Mark D. Reckase, PhD, Department of Counseling, Educational Psychology, and Special Education, Michigan State University

“De Ayala does a masterful job of describing the fundamental theory and the many applications of IRT. I am impressed by the breadth of models he covers and the detail he presents on various estimation methods. Coverage includes the standard Rasch; one-, two-, and three-parameter models; polytomous and multidimensional models; and applications to linking/equating and differential item functioning. This is a well-written book that will be useful for graduate students, researchers, and practicing measurement specialists in education, health, and psychology. The greatest strength of this book is de Ayala's ability to present IRT in an engaging, accessible manner.”

—Bruno D. Zumbo, PhD, Paragon–UBC Professor of Psychometrics and Measurement, University of British Columbia

“Offers a good roadmap to the complex array of IRT model parameters, estimation methods, and readily available IRT programs. By juxtaposing algebraic development of IRT models (and model estimation) alongside annotated results and software output from applied examples, this book provides an excellent resource for both intermediate and advanced IRT practitioners. The applied researcher will find this book to be an excellent practical resource with numerous examples that use multiple software packages to analyze the same datasets.”

—Scott M. Hofer, PhD, Department of Human Development and Family Sciences, Oregon State University

“The book has an excellent balance among the technical, conceptual, and practical aspects of item response theory. It is comprehensive; provides example scripts and output from a variety of popular item response programs; and uses selected data sets throughout the book, making model and program comparisons possible. I also liked the coverage of commonly asked questions related to model fit, item fit, and appropriate sample sizes, which are often missing in item response theory texts.”

—Kevin J. Grimm, PhD, Department of Psychology, Arizona State University

Table of Contents

Symbols and Acronyms

1. Introduction to Measurement

- Measurement

- Some Measurement Issues

- Item Response Theory

- Classical Test Theory

- Latent Class Analysis

- Summary

2. The One-Parameter Model

- Conceptual Development of the Rasch Model

- The One-Parameter Model

- The One-Parameter Logistic Model and the Rasch Model

- Assumptions underlying the Model

- An Empirical Data Set: The Mathematics Data Set

- Conceptually Estimating an Individual's Location

- Some Pragmatic Characteristics of Maximum Likelihood Estimates

- The Standard Error of Estimate and Information

- An Instrument's Estimation Capacity

- Summary

3. Joint Maximum Likelihood Parameter Estimation

- Joint Maximum Likelihood Estimation

- Indeterminacy of Parameter Estimates

- How Large a Calibration Sample?

- Example: Application of the Rasch Model to the Mathematics Data, JMLE

- Summary

4. Marginal Maximum Likelihood Parameter Estimation

- Marginal Maximum Likelihood Estimation

- Estimating an Individual's Location: Expected A Posteriori

- Example: Application of the Rasch Model to the Mathematics Data, MMLE

- Metric Transformation and the Total Characteristic Function

- Summary

5. The Two-Parameter Model

- Conceptual Development of the Two-Parameter Model

- Information for the Two-Parameter Model

- Conceptual Parameter Estimation for the 2PL Model

- How Large a Calibration Sample?

- Metric Transformation, 2PL Model

- Example: Application of the 2PL Model to the Mathematics Data, MMLE

- Information and Relative Efficiency

- Summary

6. The Three-Parameter Model

- Conceptual Development of the Three-Parameter Model

- Additional Comments about the Pseudo-Guessing Parameter, χ

- Conceptual Estimation for the 3PL Model

- How Large a Calibration Sample?

- Assessing Conditional Independence

- Example: Application of the 3PL Model to the Mathematics Data, MMLE

- Assessing Person Fit: Appropriateness Measurement

- Information for the Three-Parameter Model

- Metric Transformation, 3PL Model

- Handling Missing Responses

- Issues to Consider in Selecting among the 1PL, 2PL, and 3PL Models

- Summary

7. Rasch Models for Ordered Polytomous Data

- Conceptual Development of the Partial Credit Model

- Conceptual Parameter Estimation of the PC Model

- Example: Application of the PC Model to a Reasoning Ability Instrument, MMLE

- The Rating Scale Model

- Conceptual Estimation of the RS Model

- Example: Application of the RS Model to an Attitudes toward Condom Scale, JMLE

- How Large a Calibration Sample?

- Information for the PC and RS Models

- Metric Transformation, PC and RS Models

- Summary

8. Non-Rasch Models for Ordered Polytomous Data

- The Generalized Partial Credit Model

- Example: Application of the GPC Model to a Reasoning Ability Instrument, MMLE

- Conceptual Development of the Graded Response Model

- How Large a Calibration Sample?

- Example: Application of the GR Model to an Attitudes toward Condom Scale, MMLE

- Information for Graded Data

- Metric Transformation, GPC and GR Models

- Summary

9. Models for Nominal Polytomous Data

- Conceptual Development of the Nominal Response Model

- How Large a Calibration Sample?

- Example: Application of the NR Model to a Science Test, MMLE

- Example: Mixed Model Calibration of the Science Test—NR and PC Models, MMLE

- Example: NR and PC Mixed Model Calibration of the Science Test, Collapsed Options, MMLE

- Information for the NR Model

- Metric Transformation, NR Model

- Conceptual Development of the Multiple-Choice Model

- Example: Application of the MC Model to a Science Test, MMLE

- Example: Application of the BS Model to a Science Test, MMLE

- Summary

10. Models for Multidimensional Data

- Conceptual Development of a Multidimensional IRT Model

- Multidimensional Item Location and Discrimination

- Item Vectors and Vector Graphs

- The Multidimensional Three-Parameter Logistic Model

- Assumptions of the MIRT Model

- Estimation of the M2PL Model

- Information for the M2PL Model

- Indeterminacy in MIRT

- Metric Transformation, M2PL Model

- Example: Application of the M2PL Model, Normal-Ogive Harmonic Analysis Robust Method

- Obtaining Person Location Estimates

- Summary

11. Linking and Equating

- Equating Defined

- Equating: Data Collection Phase

- Equating: Transformation Phase

- Example: Application of the Total Characteristic Function Equating

- Summary

12. Differential Item Functioning

- Differential Item Functioning and Item Bias

- Mantel–Haenszel Chi-Square

- The TSW Likelihood Ratio Test

- Logistic Regression

- Example: DIF Analysis

- Summary

Appendix A. Maximum Likelihood Estimation of Person Locations

- Estimating an Individual's Location: Empirical Maximum Likelihood Estimation

- Estimating an Individual's Location: Newton's Method for MLE

- Revisiting Zero Variance Binary Response Patterns

Appendix B. Maximum Likelihood Estimation of Item Locations

Appendix C. The Normal Ogive Models

- Conceptual Development of the Normal Ogive Model

- The Relationship between IRT Statistics and Traditional Item Analysis Indices

- Relationship of the Two-Parameter Normal Ogive and Logistic Models

- Extending the Two-Parameter Normal Ogive Model to a Multidimensional Space

Appendix D. Computerized Adaptive Testing

- A Brief History

- Fixed-Branching Techniques

- Variable-Branching Techniques

- Advantages of Variable-Branching over Fixed-Branching Methods

- IRT-Based Variable-Branching Adaptive Testing Algorithm

Appendix E. Miscellanea

- Linear Logistic Test Model (LLTM)

- Using Principal Axis for Estimating Item Discrimination

- Infinite Item Discrimination Parameter Estimates

- Example: NOHARM Unidimensional Calibration

- An Approximate Chi-Square Statistic for NOHARM

- Mixture Models

- Relative Efficiency, Monotonicity, and Information

- FORTRAN Formats

- Example: Mixed Model Calibration of the Science Test—NR and 2PL Models, MMLE

- Example: Mixed Model Calibration of the Science Test—NR and GR Models, MMLE

- Odds, Odds Ratios, and Logits

- The Person Response Function

- Linking: A Temperature Analogy Example

- Should DIF Analyses Be Based on Latent Classes?

- The Separation and Reliability Indices

- Dependency in Traditional Item Statistics and Observed Scores


Author Index

Subject Index

About the Author

R. J. de Ayala is Professor of Educational Psychology at the University of Nebraska-Lincoln. His research interests include psychometrics, item response theory, computerized adaptive testing, applied statistics, and multilevel models. His work has appeared in Applied Psychological Measurement, Applied Measurement in Education, the British Journal of Mathematical and Statistical Psychology, Educational and Psychological Measurement, the Journal of Applied Measurement, and the Journal of Educational Measurement. He is a Fellow of the American Psychological Association’s Division 5: Evaluation, Measurement, and Statistics, as well as of the American Educational Research Association.


Audience

Graduate students and researchers in educational, social, and clinical psychology; public health; management; sociology; and public policy; as well as psychometric professionals employed by testing companies, school districts, and medical schools or organizations.

Course Use

Will serve as a text in advanced graduate seminars such as Item Response Theory, Intermediate/Advanced Psychometrics, Modern Measurement, and Latent Variable Analysis.