
Fairness concerns, which are central to the validity of test scores as articulated in the current professional standards for educational testing, apply to all types of scoring. In this exploratory study, we apply the generalized form of an established method for detecting differential item functioning to examine the extent of differential functioning associated with text features in automated essay scoring. Our primary research question is whether mean feature scores, conditional on the machine score, differ between the two gender groups. We also examine the machine score as a stratification variable at different levels of precision, and the effect of using human scores in place of the machine score as the stratification variable. Results of the analyses and implications for future research are also discussed.
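
The comparison of mean feature scores conditional on the machine score can be illustrated with a minimal sketch. The abstract does not name the specific method, so the code below assumes a standardization-style index: within each machine-score stratum, take the difference between the focal-group and reference-group mean feature scores, then average those differences weighted by the focal-group count. The function name, the group labels "F" and "M", the stratum values, and the toy data are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def conditional_mean_difference(records, focal="F", reference="M"):
    """Weighted average of within-stratum mean differences (focal minus reference).

    records: iterable of (group, stratum, feature_score) tuples, where stratum
    is the (possibly binned) machine score used as the matching variable.
    Assumption: strata are weighted by the focal-group count, as in a
    standardization-style index; the study's actual weighting is not stated.
    """
    # Accumulate [sum of feature scores, count] per (stratum, group) cell.
    cells = defaultdict(lambda: [0.0, 0])
    for group, stratum, score in records:
        cell = cells[(stratum, group)]
        cell[0] += score
        cell[1] += 1

    strata = {stratum for (stratum, _group) in cells}
    weighted_diff = 0.0
    total_weight = 0
    for stratum in strata:
        f_sum, f_n = cells.get((stratum, focal), (0.0, 0))
        r_sum, r_n = cells.get((stratum, reference), (0.0, 0))
        if f_n == 0 or r_n == 0:
            continue  # skip strata where one group is missing
        weighted_diff += f_n * (f_sum / f_n - r_sum / r_n)
        total_weight += f_n

    if total_weight == 0:
        raise ValueError("no stratum contains both groups")
    return weighted_diff / total_weight


# Hypothetical toy data: (group, machine-score stratum, feature score).
toy_records = [
    ("F", 1, 2.0), ("F", 1, 4.0), ("M", 1, 3.0),
    ("F", 2, 6.0), ("M", 2, 4.0), ("M", 2, 6.0),
]
index = conditional_mean_difference(toy_records)
```

A value near zero would suggest that, among essays with the same machine score, the feature behaves similarly for both groups. Coarser or finer binning of the stratum variable, or swapping in human scores as the stratum, corresponds to the additional analyses the abstract mentions.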
