Grading and providing feedback are two of the most time-consuming activities in education. We designed a randomised controlled trial (RCT) to test whether these tasks could be performed by generative artificial intelligence (Gen-AI). We randomly allocated undergraduate students to receive feedback from either a human instructor, ChatGPT 3.5, or ChatGPT 4. Our results show that: (i) students treated with the freely accessible ChatGPT 3.5 received lower grades in subsequent assessments than their peers in the control group, who always received human feedback; (ii) no such penalty was observed for ChatGPT 4. Separately, we tested the capacity of Gen-AI to grade student work. Gen-AI grades and rankings differed significantly from those generated by humans. Overall, while the newest LLM (ChatGPT 4) supports learning as well as a human instructor does, its ability to grade student work remains inferior.