Towards AI Assisted Essay Marking: Developing and Evaluating Automated Writing Scoring Models

  • Published 27 Jul 2024
  • (Michelle Chen)
    This study provides an overview of the stages and considerations in developing automated essay scoring models. The writing test under investigation poses several challenges for machine scoring, including a wide range of possible score points (0-12) and a test-taker population that is diverse in language, cultural, and educational background. Our primary goals are to demonstrate the development steps and to evaluate the performance of the automated scoring models by comparing their results with human scoring. Two models were trained, one for each writing task type, using techniques drawn from artificial intelligence, natural language processing, and statistics. To train the models, we selected writing samples covering a broad range of proficiency levels from a diverse test-taker group. After each model was trained (approximately n=260 per model) and validated on a small sample (n=50 per model), the models were applied to score a larger sample (total n=271) so that their performance could be evaluated. The study finds that automated scoring showed satisfactory overall agreement with human scoring. The machine scores showed a central tendency, and, as a result, agreement levels are lower at the high and low ends of the score distribution. Results are comparable for the full test-taker group and across gender subgroups. Together, the findings of this study show that automated scoring models are a promising solution for marking essays on the general language proficiency test. At the same time, these findings suggest that there is room for the models to be improved and refined with more input data. More studies are needed to evaluate the models and their results to ensure that they are valid and fair to all test takers.
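
    The abstract reports "overall agreement" between machine and human scores on the 0-12 scale but does not name the specific statistics used. The sketch below assumes common choices for this kind of evaluation, namely exact agreement, adjacent agreement (within one score point), and quadratic weighted kappa; the score values in the example are fabricated purely to illustrate the calculation and are not from the study.

```python
"""Hypothetical sketch of human-machine score agreement checks on a 0-12 scale.

The statistics (exact/adjacent agreement, quadratic weighted kappa) are
assumed; the study's abstract does not specify which measures were used.
"""
import numpy as np

SCORE_MIN, SCORE_MAX = 0, 12                 # score range reported in the abstract
N_LEVELS = SCORE_MAX - SCORE_MIN + 1


def quadratic_weighted_kappa(human, machine):
    """Chance-corrected agreement that penalises larger score gaps more heavily."""
    human = np.asarray(human) - SCORE_MIN
    machine = np.asarray(machine) - SCORE_MIN

    # Observed confusion matrix of human vs. machine scores.
    observed = np.zeros((N_LEVELS, N_LEVELS))
    for h, m in zip(human, machine):
        observed[h, m] += 1

    # Expected matrix under independence, scaled to the same total count.
    expected = np.outer(np.bincount(human, minlength=N_LEVELS),
                        np.bincount(machine, minlength=N_LEVELS)) / len(human)

    # Quadratic disagreement weights: zero on the diagonal, growing off-diagonal.
    idx = np.arange(N_LEVELS)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (N_LEVELS - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


def agreement_rates(human, machine):
    """Exact and adjacent (within one score point) agreement proportions."""
    diff = np.abs(np.asarray(human) - np.asarray(machine))
    return (diff == 0).mean(), (diff <= 1).mean()


if __name__ == "__main__":
    # Fabricated example scores, for illustration only.
    human_scores = [4, 7, 9, 6, 11, 3, 8, 5]
    machine_scores = [5, 7, 8, 6, 9, 4, 8, 6]

    exact, adjacent = agreement_rates(human_scores, machine_scores)
    print(f"exact agreement:    {exact:.2%}")
    print(f"adjacent agreement: {adjacent:.2%}")
    print(f"quadratic weighted kappa: "
          f"{quadratic_weighted_kappa(human_scores, machine_scores):.3f}")
```

    In a setup like this, the same statistics could also be computed separately for score-range bands and for gender subgroups to examine the central-tendency and fairness findings mentioned in the abstract.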
