December 2024

IZA DP No. 17511: Man vs Machine: Can AI Grade and Give Feedback Like a Human?

Grading and providing feedback are two of the most time-consuming activities in education. We designed a randomised controlled trial (RCT) to test whether they could be performed by generative artificial intelligence (Gen-AI). We randomly allocated undergraduate students to receive feedback from a human instructor, ChatGPT 3.5, or ChatGPT 4. Our results show that: (i) students treated with the freely accessible ChatGPT 3.5 received lower grades in subsequent assessments than their peers in the control group, who always received human feedback; (ii) no such penalty was observed for ChatGPT 4. Separately, we tested the capacity of Gen-AI to grade student work. Gen-AI grades and ranks were significantly different from human-generated grades. Overall, while the newest LLM supports learning as well as a human does, its ability to grade student work is still inferior.