Definition:
The difficulty index (also called the p-value or facility index) is the proportion of test-takers who answer a particular test item correctly. It is the simplest item-level statistic in Classical Test Theory. The value ranges from 0.00 (nobody answered correctly) to 1.00 (everybody answered correctly). Counterintuitively, a higher p-value means an easier item. Despite the shared name, this p-value is unrelated to the p-value used in statistical significance testing.
In-Depth Explanation
Formula:
Difficulty Index (p) = Number of correct responses ÷ Total number of test-takers
Example: If 75 out of 100 students answered item #12 correctly, the difficulty index is 0.75.
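A minimal Python sketch of this calculation, assuming each response has already been scored as 1 (correct) or 0 (incorrect). The function name `difficulty_index` and the sample data are illustrative and do not come from any particular library.

```python
def difficulty_index(scored_responses):
    """Return the proportion of correct responses for one item.

    scored_responses: iterable of 1 (correct) / 0 (incorrect),
    one entry per test-taker. This scoring scheme is an assumption.
    """
    responses = list(scored_responses)
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# Item #12 from the example: 75 correct answers out of 100 students.
item_12 = [1] * 75 + [0] * 25
print(difficulty_index(item_12))  # 0.75
```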
Interpreting the p-value:
| p-value Range | Interpretation |
|---|---|
| 0.90–1.00 | Very easy — almost everyone gets it right |
| 0.70–0.89 | Easy |
| 0.30–0.69 | Moderate difficulty — ideal range for most tests |
| 0.10–0.29 | Difficult |
| 0.00–0.09 | Very difficult — almost nobody gets it right |
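The bands in the table can be applied programmatically. The sketch below is one possible encoding: the function name `interpret_p` is hypothetical, and it assumes that values falling between the printed two-decimal boundaries (for example 0.895) belong to the lower band.

```python
def interpret_p(p):
    """Map a difficulty index to the interpretation bands in the table.

    Assumption: each band starts at its printed lower bound, so a value
    such as 0.895 is treated as "easy" rather than "very easy".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0.00 and 1.00")
    if p >= 0.90:
        return "very easy"
    if p >= 0.70:
        return "easy"
    if p >= 0.30:
        return "moderate"
    if p >= 0.10:
        return "difficult"
    return "very difficult"


print(interpret_p(0.75))  # easy
print(interpret_p(0.05))  # very difficult
```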
Optimal difficulty depends on test purpose:
- Achievement tests (checking what students learned): Items should cluster around 0.60–0.80 to confirm learning occurred
- Proficiency/placement tests (maximizing discrimination): Items around 0.40–0.60 provide the most information about ability differences
- Mastery tests (pass/fail): Higher p-values are acceptable if the goal is confirming minimum competence
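One way to see why mid-range items separate test-takers best: for an item scored 0 or 1, the variance of the item scores is p(1 - p), which peaks at p = 0.50 and shrinks toward zero as p approaches 0 or 1. This is a standard property of 0/1 variables rather than something stated above, and the helper name `item_variance` is illustrative.

```python
def item_variance(p):
    """Variance of a 0/1 scored item whose difficulty index is p."""
    return p * (1 - p)


# Variance is largest in the middle of the range and smallest at the extremes.
for p in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(f"p = {p:.2f}  variance = {item_variance(p):.2f}")
# p = 0.10  variance = 0.09
# p = 0.30  variance = 0.21
# p = 0.50  variance = 0.25
# p = 0.70  variance = 0.21
# p = 0.90  variance = 0.09
```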
Limitations:
- The difficulty index is sample-dependent — it changes based on who takes the test. An item might have p = 0.30 for beginners and p = 0.90 for advanced students.
- It doesn't account for guessing. On a 4-option multiple-choice item, blind guessing alone produces an expected p of about 0.25, so an observed value near that level may reflect guessing rather than knowledge.
- A very easy item (p = 0.95) has almost no ability to discriminate between students — see Discrimination Index.
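The first two limitations can be illustrated with a short sketch. The response data below are invented for illustration, and the helper names `difficulty_index` and `chance_level` are hypothetical. The same item yields different p-values depending on who answers it, and the chance level gives a floor against which low p-values can be compared.

```python
def difficulty_index(scored_responses):
    """Proportion of 1s in a list of 0/1 scored responses."""
    responses = list(scored_responses)
    return sum(responses) / len(responses)


def chance_level(n_options):
    """Expected p if every test-taker guesses blindly among n_options."""
    return 1 / n_options


# Hypothetical scored responses to the same item from two groups.
beginners = [1] * 3 + [0] * 7   # 3 of 10 correct
advanced = [1] * 9 + [0] * 1    # 9 of 10 correct
combined = beginners + advanced  # 12 of 20 correct

print(difficulty_index(beginners))  # 0.3
print(difficulty_index(advanced))   # 0.9
print(difficulty_index(combined))   # 0.6
print(chance_level(4))              # 0.25
```

In this invented sample, the beginners' p of 0.30 sits only slightly above the 0.25 chance level for a 4-option item, which is why a low p-value alone cannot show how much of the correct responding reflects knowledge.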
In language testing:
JLPT items are calibrated so that items within each level span a range of difficulties. N5 items are designed to be easier than N1 items, but each level still has a spread: some items are answered correctly by most test-takers at that level, while others are answered correctly only by the strongest.
Research
- Crocker, L., & Algina, J. (2008). Introduction to Classical and Modern Test Theory. Cengage Learning. — Standard reference for difficulty index calculation and interpretation.
- Brown, J. D. (2005). Testing in Language Programs: A Comprehensive Guide to English Language Assessment. McGraw-Hill. — Applied difficulty index analysis for language tests.