Studies Based on Reddit Communities

Patient Experiences in the Cochlear Implant Reddit Community: Comparing Human and Large Language Model Categorization.

Habib DRS, Depala K, Lin J, Le S, McFall N, Dewan SS, Huang J, Habib MWS, Bishay AE, Siebor K, Babaoglu G, Chowdhury NI, Moberly AC

Am J Audiol. 2026;35(2):487-496

📅 02/06/2026 PMID: 41746205 DOI: 10.1044/2025_AJA-25-00216

Abstract

PURPOSE: Although some work has leveraged automated analyses of online communities to gain cochlear implant (CI) patient insights, there remains a gap in comparing human versus automated analysis of the nuanced, real-world experiences patients share outside clinical settings. This study characterizes experiences within the r/Cochlearimplants Reddit community and compares human to large language model (LLM) performance in annotating posts.

METHOD: Using reflexive thematic analysis, 996 publicly available r/Cochlearimplants posts (October 2024 to June 2025) were manually coded and consolidated into themes. Three LLMs (OpenAI o3, Gemini 2.5 Pro, and Claude Sonnet 4) were prompted with the posts and human-generated codebook to perform post categorization. Model performance was evaluated against human coding using Cohen's kappa, percent agreement, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and time.

RESULTS: Five themes emerged. Community engagement and support (n = 944, 94.8%) frequently involved eliciting advice (n = 721, 72.4%), seeking shared experiences (n = 249, 25.0%), and sharing negative experiences (n = 247, 24.8%). Other themes included the medical/surgical journey (n = 463, 46.5%), device/technical issues (n = 343, 34.4%), daily life/adjustments (n = 236, 23.7%), and media/outreach (n = 72, 7.2%). OpenAI o3 and Gemini 2.5 Pro achieved the highest interrater reliability with human annotators (κ = .35 and κ = .34, respectively). OpenAI o3 had higher sensitivity (46.7%) but lower specificity (90.4%) than Gemini 2.5 Pro, which had the highest specificity (93.4%) but lower sensitivity (38.0%). Claude Sonnet 4 showed the lowest agreement (κ = .25) and PPV (30.9%). Compared to human annotation requiring 52 hr across all annotators, each LLM required less than 20 min.

CONCLUSIONS: Reddit posts revealed rich discourse across CI topics. LLMs demonstrated fair agreement with human coders and can quickly aid in large-scale qualitative analysis. Although careful model selection and human expertise remain essential for accurate interpretation, LLM annotation shows potential for real-time monitoring of patient concerns to inform counseling, rehabilitation strategies, and iterative device design.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.31362847
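
For readers unfamiliar with the agreement statistics named in the METHOD section, the following is a minimal sketch of how Cohen's kappa, percent agreement, sensitivity, specificity, PPV, and NPV can be computed for a single binary code, treating the human annotation as the reference. The abstract does not say how the authors aggregated these metrics across codes or which software they used, so the function name `agreement_metrics` and the toy labels below are hypothetical illustrations, not study data.

```python
from dataclasses import dataclass


@dataclass
class AgreementMetrics:
    kappa: float
    percent_agreement: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def agreement_metrics(human: list[int], model: list[int]) -> AgreementMetrics:
    """Compare binary model labels (0/1) against binary human labels (0/1).

    Hypothetical helper: human coding is assumed to be the reference standard.
    """
    if len(human) != len(model) or not human:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(human, model))
    tp = sum(1 for h, m in pairs if h == 1 and m == 1)
    fp = sum(1 for h, m in pairs if h == 0 and m == 1)
    fn = sum(1 for h, m in pairs if h == 1 and m == 0)
    tn = sum(1 for h, m in pairs if h == 0 and m == 0)
    n = tp + fp + fn + tn

    observed = (tp + tn) / n
    # Chance agreement from each rater's marginal rates of 1s and 0s.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    # Kappa is undefined when chance agreement is exactly 1.
    kappa = (observed - expected) / (1 - expected) if expected != 1 else float("nan")

    return AgreementMetrics(
        kappa=kappa,
        percent_agreement=observed,
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


if __name__ == "__main__":
    # Toy labels for one code across ten posts (illustrative only).
    toy_human = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    toy_model = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    print(agreement_metrics(toy_human, toy_model))
```

On the toy labels, observed agreement is 0.70 and chance agreement is 0.54, which gives κ ≈ 0.35. This shows why kappa can be modest even when raw agreement looks high.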