Currently submitted to: Journal of Medical Internet Research

Date Submitted: Feb 2, 2026
Open Peer Review Period: Feb 3, 2026 - Mar 31, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Extracting Quality of Life Information from Forum Posts Using Open-Source Large Language Models: Feasibility Study

  • Karolina Hanna Czok; 
  • David Maria Schmidt; 
  • Brian Chen; 
  • Deborah Kuk; 
  • Joseph L. Smith; 
  • Philipp Cimiano

ABSTRACT

Background:

Quality of Life (QoL) questionnaires are established instruments designed to assess patients' overall wellbeing and quality of life. They are important for predicting disease outcomes and understanding the needs of individual patients. However, their repeated collection imposes a substantial burden on both patients and clinical professionals. Many patients seek emotional support and mutual exchange in online peer-support communities, where they frequently share detailed descriptions of symptoms and treatment experiences that address topics covered in QoL questionnaires. The emergence of large language models (LLMs) opens up the potential for automatic extraction of relevant QoL information from patient-generated text.

Objective:

The aim of this study is to evaluate and compare various open-source LLMs and optimization approaches for automated extraction of QoL information from forum posts.

Methods:

The dataset consisted of 2,683 English-language posts from breast cancer patients recruited from Inspire.com online communities, manually annotated with sentence-level text spans indicating whether and where posts contained information relevant to 53 QoL questions from the EORTC QLQ-C30 and QLQ-BR23 questionnaires. Eleven open-source LLMs (8B-70B parameters) were evaluated in a zero-shot setup, generating 4,452 post-question predictions per model under two input conditions: post-only and post with additional context. For the best-performing model, additional experiments assessed the impact of chain-of-thought prompting, instruction optimization, few-shot prompting, and parameter-efficient fine-tuning. For correctly classified yes/no instances, the overlap between model-generated evidence and human-annotated spans was evaluated.
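
For illustration only, the zero-shot, post-only setup described above could be approximated as in the following sketch. The model checkpoint, prompt wording, JSON output format, and parsing logic are assumptions made for this sketch and are not taken from the manuscript.

    import json
    from transformers import pipeline

    # Hypothetical zero-shot setup: one post-question pair per call,
    # returning a yes/no answer plus a quoted evidence span.
    # The checkpoint and prompt are illustrative, not the study's exact configuration.
    generator = pipeline("text-generation", model="openai/gpt-oss-20b")

    def zero_shot_prediction(post: str, question: str) -> dict:
        """Ask the model whether a forum post contains information relevant to a
        QoL question and, if so, to quote the supporting text span."""
        prompt = (
            "You are given a patient forum post and a quality-of-life question.\n"
            f"Post: {post}\n"
            f"Question: {question}\n"
            'Reply with JSON only: {"answer": "yes" or "no", '
            '"evidence": "quoted span or empty string"}.'
        )
        completion = generator(prompt, max_new_tokens=200)[0]["generated_text"]
        try:
            # The generated text echoes the prompt; parse whatever follows it as JSON.
            return json.loads(completion[len(prompt):].strip())
        except json.JSONDecodeError:
            # Fall back to a negative answer when the output is not valid JSON.
            return {"answer": "no", "evidence": ""}

    print(zero_shot_prediction(
        "Since starting chemotherapy I have been too tired to walk the dog.",
        "Did you need to rest during the past week?",
    ))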

Results:

Across the 11 evaluated LLMs, GPT-OSS 20B achieved the highest macro F1-score (0.79) in the zero-shot post-only setting. Providing additional context consistently reduced the performance of all models. Model size did not correlate with F1-score, with several mid-sized models (14B-30B) outperforming 70B models. For GPT-OSS 20B, chain-of-thought prompting did not improve performance (0.77). Instruction optimization produced results similar to the baseline in both zero-shot and few-shot settings (0.78-0.80). Bootstrap few-shot prompting with random search achieved the highest score overall (0.81). Parameter-efficient fine-tuning decreased performance (0.71). Most classification errors involved semantically broad or ambiguous terms and the fallback question. For correctly predicted yes/no answers, model-generated evidence matched or partially matched human-annotated spans in 89% of cases.
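
For context, macro F1 here is the unweighted average of the per-class F1-scores over the yes and no classes, and evidence overlap can be judged by comparing generated spans against annotated spans. The following is a minimal sketch of such scoring under assumed definitions; it is not necessarily the authors' exact procedure.

    from sklearn.metrics import f1_score

    def macro_f1(gold: list[str], predicted: list[str]) -> float:
        # Unweighted mean of the per-class F1-scores for "yes" and "no".
        return f1_score(gold, predicted, labels=["yes", "no"], average="macro")

    def overlap_category(generated: str, annotated: str) -> str:
        # Crude token-overlap heuristic: match, partial match, or no match.
        gen, ann = set(generated.lower().split()), set(annotated.lower().split())
        shared = gen & ann
        if not shared:
            return "no match"
        return "match" if shared == ann else "partial match"

    print(macro_f1(["yes", "no", "yes"], ["yes", "no", "no"]))  # approx. 0.667
    print(overlap_category(
        "too tired to walk",
        "I have been too tired to walk the dog",
    ))  # "partial match"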

Conclusions:

Open-source LLMs are a promising tool for extracting QoL information from online health forums that aligns with standardized questionnaire responses. Mid-sized models achieved the highest accuracy, particularly in zero-shot, post-only settings. Few-shot prompting can further improve results. Models were also able to generate evidence spans that closely matched human annotations. However, they consistently struggled with ambiguous and semantically overlapping terms. Overall, automated extraction of QoL information from patient-generated content may offer a faster, lower-cost, and low-burden complement to traditional QoL questionnaires, provided that limitations such as symptom ambiguity are addressed in future work.


 Citation

Please cite as:

Czok KH, Schmidt DM, Chen B, Kuk D, Smith JL, Cimiano P

Extracting Quality of Life Information from Forum Posts Using Open-Source Large Language Models: Feasibility Study

JMIR Preprints. 02/02/2026:92716

DOI: 10.2196/preprints.92716

URL: https://preprints.jmir.org/preprint/92716

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.