Comment to "Artificial Intelligence (AI)-Assisted Patient Education and Concerns Following Facelift Surgery: A Study on ChatGPT-4 and Gemini"




Caimi, Edoardo; Vaccari, Stefano; Vinci, Valeriano

2026-01-01

Abstract

Introduction
Artificial intelligence (AI) is increasingly integrated into patient education and postoperative care. Almousa et al. recently evaluated ChatGPT-4 and Gemini for postoperative facelift counseling and reported high accuracy and clarity. Although their study is an important step toward AI-assisted communication in aesthetic surgery, several methodological issues may limit the validity and clinical applicability of their findings.

Methods
We critically appraised the study design, data collection, and analytic methods of Almousa et al., focusing on question selection, evaluation metrics, reproducibility, and statistical robustness, and compared them with established standards for AI evaluation and inter-rater reliability.

Results
The study used ChatGPT-4 itself to generate the five "most common" postoperative questions, introducing circularity and potential selection bias. Five surgeons assessed the responses on a dichotomous (Yes/No) scale; neither inter-rater reliability nor any scaled metric was reported. It was unclear whether prompts were entered sequentially or independently, which raises reproducibility concerns. The small sample (five questions per model, each rated by five surgeons) yielded only 25 binary data points per system, precluding meaningful statistical inference. Furthermore, the AI responses lacked individualized safety guidance and escalation advice, limiting their safety in real-world postoperative settings.

Conclusion
Although the study highlights the promise of large language models (LLMs) in aesthetic surgery, future studies should use patient-derived question sets, graded and reproducible evaluation scales, transparent prompting protocols, and complication-related queries to determine accurately the safety and educational value of AI-generated postoperative information.

Level of Evidence V
This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266
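The two statistical objections raised in the Results (no reported inter-rater reliability, and only 25 binary ratings per system) can be made concrete with a short Python sketch. It assumes Fleiss' kappa as the agreement statistic for five raters and a Wilson score interval for the proportion of "Yes" ratings; the commentary does not prescribe either method, and the function names and rating counts below are hypothetical, chosen only for illustration and not taken from Almousa et al.

```python
import math
from typing import Sequence, Tuple


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for items that are each rated by the same number of raters.

    counts[i][j] is the number of raters who placed item i in category j.
    Returns nan when chance agreement equals 1 (all ratings fall in one
    category), since kappa is undefined in that case.
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of raters")

    n_categories = len(counts[0])
    total = n_items * n_raters
    # Overall share of ratings in each category (e.g. Yes, No).
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-item agreement: fraction of rater pairs that agree on that item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items      # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    if math.isclose(p_e, 1.0):
        return math.nan
    return (p_bar - p_e) / (1.0 - p_e)


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("invalid counts")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials
                                   + z * z / (4 * trials * trials))
    return center - half, center + half


if __name__ == "__main__":
    # HYPOTHETICAL [Yes, No] counts from five raters on five questions;
    # illustrative numbers only, not data from Almousa et al.
    ratings = [[5, 0], [4, 1], [5, 0], [3, 2], [5, 0]]
    yes = sum(row[0] for row in ratings)  # 22 of 25 ratings are "Yes"
    print(f"Fleiss' kappa: {fleiss_kappa(ratings):.3f}")  # 0.053
    low, high = wilson_interval(yes, 25)
    print(f"Yes rate 95% CI: {low:.2f} to {high:.2f}")  # 0.70 to 0.96
```

With these made-up counts, 80% of rater pairs agree, yet kappa is only about 0.05 because "Yes" is so prevalent that most agreement is expected by chance. Meanwhile, the 95% interval for the "Yes" rate spans roughly 0.70 to 0.96. Both effects illustrate why raw agreement over 25 binary ratings says little without a reported reliability statistic.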


Year: 2026

Keywords: Artificial intelligence; Facelift surgery; Large language models; Patient education; Postoperative care

Appears in type: 1.1 Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11699/107732
