Comment to "Artificial Intelligence (AI)-Assisted Patient Education and Concerns Following Facelift Surgery: A Study on ChatGPT-4 and Gemini"
Vinci, Valeriano
2026-01-01
Abstract
Introduction
Artificial intelligence (AI) is increasingly integrated into patient education and postoperative care. Almousa et al. recently evaluated ChatGPT-4 and Gemini for postoperative facelift counseling, reporting high accuracy and clarity. While their study represents an important step toward AI-assisted communication in aesthetic surgery, several methodological issues may limit the validity and clinical applicability of their findings.

Methods
We critically appraised Almousa et al.'s study design, data collection, and analytic methods. Specific attention was given to question selection, evaluation metrics, reproducibility, and statistical robustness, comparing them with established standards for AI evaluation and inter-rater reliability.

Results
The study used ChatGPT-4 itself to generate the five "most common" postoperative questions, introducing circularity and potential selection bias. Responses were assessed on a dichotomous (Yes/No) scale by five surgeons, without reporting inter-rater reliability or use of scaled metrics. It was unclear whether prompts were entered sequentially or independently, raising reproducibility concerns. The limited sample size (five questions per model) provided only 25 binary data points per system, precluding meaningful statistical inference. Furthermore, AI responses lacked individualized safety guidance and escalation advice, limiting clinical safety in real-world postoperative settings.

Conclusion
Although the study highlights the promise of LLMs in aesthetic surgery, future studies should employ patient-derived question sets, graded and reproducible evaluation scales, transparent prompt protocols, and inclusion of complication-related queries to accurately determine the safety and educational value of AI-generated postoperative information.

Level of Evidence V
This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266
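
The two statistical concerns raised in the Results (the small number of binary ratings and the absence of an inter-rater reliability measure) can be illustrated with a minimal Python sketch. It is an illustration only, not the analysis performed in either paper: the Wilson score interval and Fleiss' kappa are chosen here as common options, and the ratings table `hypothetical_ratings` is invented for demonstration and is not data from Almousa et al.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def fleiss_kappa(counts):
    """Fleiss' kappa from counts[item][category] = number of raters.

    Every item must be rated by the same number of raters (at least two).
    Returns None when all ratings fall in one category, because chance
    agreement is then 1 and kappa is undefined.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of raters")
    n_cats = len(counts[0])
    # Overall share of ratings in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_expected = sum(p * p for p in p_cat)
    # Mean observed pairwise agreement across items.
    p_observed = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    if math.isclose(p_expected, 1.0):
        return None
    return (p_observed - p_expected) / (1 - p_expected)


# HYPOTHETICAL ratings: 5 questions x 5 surgeons, columns = [Yes, No].
hypothetical_ratings = [[5, 0], [4, 1], [5, 0], [3, 2], [4, 1]]

if __name__ == "__main__":
    yes_total = sum(row[0] for row in hypothetical_ratings)
    total = sum(sum(row) for row in hypothetical_ratings)
    print("Wilson 95% CI:", wilson_interval(yes_total, total))
    print("Fleiss' kappa:", fleiss_kappa(hypothetical_ratings))
```

Even if all 25 ratings were "Yes", the Wilson interval would still run from roughly 0.87 to 1.00, and Fleiss' kappa would be undefined because every rating falls in a single category. Both points suggest why a dichotomous scale with 25 observations supports only limited inference.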


