Studies based on Reddit communities

Evaluation of Glaucoma Treatment Information on Social Media Using Large Language Models.

Bulusu A, Cotran PR, Alwreikat AM, Jiang Y, Cooper ML, Ramsey KM, Verghese AP, Ramsey DJ

J Glaucoma. 2026;35(3):173-178

Published: 1 March 2026 · PMID: 41401013 · DOI: 10.1097/IJG.0000000000002673

Abstract

PRÉCIS: This study investigates the accuracy, readability, utility, and educational value of glaucoma treatment content on social media platforms and explores how large language models assess the quality of social media posts compared with glaucoma experts.

PURPOSE: To assess the quality of information on glaucoma treatment available on social media platforms.

METHODS: A 30-question survey consisting of the "top posts" from three social media platforms (X, Instagram, and Reddit) was assessed by 5 board-certified glaucoma experts across four domains (readability, utility, educational value, and accuracy) using a 5-point Likert scale. To create a reference standard, the overall quality of each post was calculated as the average of the median scores assigned to each of the four domains. Expert agreement was assessed using Kendall's coefficient of concordance (W). A large language model (LLM), GPT-4 (OpenAI), was then prompted to evaluate the same posts with identical instructions. Agreement with expert consensus was measured using Cohen weighted kappa (κ), and the difference in favorability of each post was assessed using the McNemar exact test.

RESULTS: Fewer than half of social media posts on glaucoma treatment were judged favorably by glaucoma experts (40%). GPT-4 was less critical of social media content and provided a favorable rating nearly twice as often (77%, P = 0.017). Despite this difference, there was moderate agreement between the LLM and the glaucoma experts (κ = 0.421, P = 0.005). The lack of agreement predominantly stemmed from cases where the experts rated the content unfavorably, with disagreement occurring in 56% of those cases, compared with 0% when the experts deemed the content favorable (P = 0.005).

CONCLUSIONS: Although glaucoma experts and artificial intelligence (AI)-based systems were in moderate agreement when evaluating the quality of posts, the LLM was less able to discriminate posts of low quality.
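
ILLUSTRATION: The reference-standard score described in the Methods (the average of the per-domain median ratings) and the McNemar exact test can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function names, the dictionary layout, and every rating and count below are hypothetical and are not taken from the study. The abstract does not state the cutoff used to call a post "favorable", so none is assumed here.

```python
from math import comb
from statistics import mean, median

# The four domains named in the abstract; the key names are illustrative.
DOMAINS = ("readability", "utility", "educational_value", "accuracy")


def overall_quality(ratings):
    """Average of the per-domain median Likert scores for one post.

    `ratings` maps each domain to the list of scores (1 to 5) given by
    the individual raters.
    """
    return mean(median(ratings[d]) for d in DOMAINS)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    `b` and `c` are the two discordant-pair counts (for example, posts
    rated favorable by one rater type but not by the other, and vice
    versa). Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical ratings from five raters for a single post.
example_post = {
    "readability": [4, 4, 5, 3, 4],
    "utility": [3, 2, 3, 3, 4],
    "educational_value": [2, 3, 3, 2, 3],
    "accuracy": [4, 4, 3, 5, 4],
}
score = overall_quality(example_post)

# Hypothetical discordant counts, chosen only to exercise the function.
p_value = mcnemar_exact(7, 1)
```

Using the median within each domain limits the influence of a single outlying rater, and the exact binomial form of the McNemar test suits small samples such as a 30-post survey, where the chi-square approximation can be unreliable.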