Health Data Interpretation Ability of Large Language Models: Focusing on Multi-Level Evaluation Items
Information
| Date | May 8, 2025 |
|---|---|
| Authors | 유지원, 신항식 |
| Conference | 65th Spring Conference of the Korean Society of Medical and Biological Engineering |
Overview
Large language models (LLMs) have shown potential for interpreting medical data, but accurately analyzing clinical indicators such as blood pressure, fasting glucose, total cholesterol, and low-density lipoprotein (LDL) cholesterol remains challenging because of their complexity. This study evaluates how well LLMs interpret these indicators and assesses the effect of providing explicit diagnostic criteria. We used 1,000 health checkup records from the Korean National Health Insurance Service and tested four LLMs: Claude 3.5 Sonnet, Gemini 1.5 Pro, GPT-4o, and LLaMA 3.1-70B. Each model was evaluated with two prompt types: a basic prompt and a constraint prompt that stated clear diagnostic criteria. Providing criteria significantly improved accuracy, especially for fasting glucose and cholesterol, where most models achieved accuracy above 0.95. Blood pressure interpretation improved less, which we attribute to its complexity. These results confirm that structured prompts enhance LLM performance, especially for complex clinical data.
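The comparison between a basic prompt and a constraint prompt can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the prompt wording, and the fasting glucose cutoffs (below 100 mg/dL normal, 100 to 125 mg/dL prediabetes, 126 mg/dL or above diabetes), which are commonly cited thresholds and not necessarily the criteria used in the study. The model call itself is left out, so only prompt construction, reference labeling, and accuracy scoring are shown.

```python
from collections.abc import Sequence

# Hypothetical criteria text; the cutoffs are commonly cited values,
# not confirmed as the ones used in the study.
CRITERIA_TEXT = (
    "Criteria for fasting glucose (mg/dL): "
    "normal < 100; prediabetes 100-125; diabetes >= 126."
)


def label_fasting_glucose(mg_dl: float) -> str:
    """Rule-based reference label under the assumed cutoffs."""
    if mg_dl < 100:
        return "normal"
    if mg_dl < 126:
        return "prediabetes"
    return "diabetes"


def build_prompt(mg_dl: float, with_criteria: bool) -> str:
    """Return a basic prompt, or a constraint prompt that prepends the criteria."""
    question = (
        f"Fasting glucose: {mg_dl} mg/dL. "
        "Classify as normal, prediabetes, or diabetes."
    )
    return f"{CRITERIA_TEXT}\n{question}" if with_criteria else question


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one record is required")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

In use, each record's value would be passed through `build_prompt` twice (once per prompt type), the model's answers collected, and `accuracy` computed against the labels from `label_fasting_glucose`, giving one score per prompt type for each model.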