Quality and Accountability of Large Language Models (LLMs) in Healthcare in Low- And Middle-Income Countries (LMIC): A Simulated Patient Study Using ChatGPT

August 2024

IZA DP No. 17204: Quality and Accountability of Large Language Models (LLMs) in Healthcare in Low- And Middle-Income Countries (LMIC): A Simulated Patient Study Using ChatGPT

Yafei Si, Yuyi Yang, Xi Wang, Ruopeng An, Jiaqi Zu, Xi Chen, Xiaojing Fan, Sen Gong

published as 'Quality and Accountability of ChatGPT in Health Care in Low- and Middle-Income Countries: Simulated Patient Study' in: Journal of Medical Internet Research, 2024, 26, e56121

Using simulated patients to mimic nine established non-communicable and infectious diseases over 27 trials, we assess ChatGPT's effectiveness and reliability in diagnosing and treating common diseases in low- and middle-income countries. We find that ChatGPT's performance varied across trials of the same disease, despite a high level of accuracy in both correct diagnosis (74.1%) and medication prescription (84.5%). Additionally, ChatGPT recommended a concerning level of unnecessary or harmful medications (85.2%) even when the diagnosis was correct. Finally, ChatGPT managed non-communicable diseases better than infectious ones. These results highlight the need for cautious AI integration in healthcare systems to ensure quality and safety.
