Article

Evaluating the Effectiveness of Artificial Intelligence-powered Large Language Models Application in Disseminating Appropriate and Readable Health Information in Urology

Journal

JOURNAL OF UROLOGY
Volume 210, Issue 4, Pages 688-694

Publisher

LIPPINCOTT WILLIAMS & WILKINS
DOI: 10.1097/JU.0000000000003615

Keywords

artificial intelligence; communication; health; urology; signs and symptoms; therapeutics


This study evaluated the appropriateness and readability of natural language processor-generated responses to urology-related medical inquiries. It found that while natural language processors showed promise in terms of appropriateness, issues remained, such as responses missing vital information. Refinement is therefore needed before natural language processors are adopted as sources of medical information.
Purpose: The Internet is a ubiquitous source of medical information, and natural language processors are gaining popularity as alternatives to traditional search engines. However, the suitability of their generated content for patients is not well understood. We aimed to evaluate the appropriateness and readability of natural language processor-generated responses to urology-related medical inquiries.

Materials and Methods: Eighteen patient questions were developed based on Google Trends and used as inputs in ChatGPT. Three categories were assessed: oncologic, benign, and emergency. Questions in each category were either treatment- or sign/symptom-related. Three native English-speaking board-certified urologists independently assessed the appropriateness of ChatGPT outputs for patient counseling, using accuracy, comprehensiveness, and clarity as proxies for appropriateness. Readability was assessed using the Flesch Reading Ease and Flesch-Kincaid Reading Grade Level formulas. Additional measures were created based on validated tools and assessed by 3 independent reviewers.

Results: Fourteen of 18 (77.8%) responses were deemed appropriate, with clarity having the most 4 and 5 scores (P = .01). There was no significant difference in the appropriateness of responses between treatments and symptoms or between different categories of conditions. The most common reason urologists gave for low scores was that responses lacked information, sometimes vital information. The mean (SD) Flesch Reading Ease score was 35.5 (10.2), and the mean Flesch-Kincaid Reading Grade Level score was 13.5 (1.74). Additional quality assessment scores showed no significant differences between different categories of conditions.

Conclusions: Despite impressive capabilities, natural language processors have limitations as sources of medical information. Refinement is crucial before adoption for this purpose.
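The two readability metrics used in the study follow standard published formulas: Flesch Reading Ease = 206.835 − 1.015 × (words/sentence) − 84.6 × (syllables/word), and Flesch-Kincaid Grade Level = 0.39 × (words/sentence) + 11.8 × (syllables/word) − 15.59. A minimal sketch in Python, assuming a simple vowel-group heuristic for syllable counting (the study's exact tooling is not specified; production tools typically use pronunciation dictionaries), could look like:

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels; crude but serviceable heuristic."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a commonly silent trailing "e" (e.g. "stone"), keeping at least 1
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for `text`."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / sentences
    syllables_per_word = syllables / len(words)
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade
```

On this scale, the study's mean Reading Ease of 35.5 falls in the "difficult" band, and a grade level of 13.5 implies college-level reading, well above the commonly recommended sixth-to-eighth-grade level for patient materials.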
