TY - JOUR
T1 - A practical guide to the implementation of AI in orthopaedic research, Part 6: How to evaluate the performance of AI research?
AU - Oettl, Felix C
AU - Pareek, Ayoosh
AU - Winkler, Philipp
AU - Zsidai, Balint
AU - Pruneski, James A
AU - Senorski, Eric Hamrin
AU - Kopf, Sebastian
AU - Ley, Christophe
AU - Herbst, Elmar
AU - Oeding, Jacob F
AU - Grassi, Alberto
AU - Hirschmann, Michael T
AU - Musahl, Volker
AU - Samuelsson, Kristian
AU - Tischer, Thomas
AU - Feldt, Robert
PY - 2024/5
Y1 - 2024/5
AB - Artificial intelligence's (AI) accelerating progress demands rigorous evaluation standards to ensure safe, effective integration into healthcare's high-stakes decisions. As AI increasingly enables prediction, analysis and judgement capabilities relevant to medicine, proper evaluation and interpretation are indispensable. Erroneous AI could endanger patients; thus, developing, validating and deploying medical AI demands adhering to strict, transparent standards centred on safety, ethics and responsible oversight. Core considerations include assessing performance on diverse real-world data, collaborating with domain experts, confirming model reliability and limitations, and advancing interpretability. Thoughtful selection of evaluation metrics suited to the clinical context along with testing on diverse data sets representing different populations improves generalisability. Partnerships between software engineers, data scientists and medical practitioners ground assessment in real needs. Journals must uphold reporting standards matching AI's societal impacts. With rigorous, holistic evaluation frameworks, AI can progress towards expanding healthcare access and quality.
UR - https://www.scopus.com/pages/publications/85195108728
U2 - 10.1002/jeo2.12039
DO - 10.1002/jeo2.12039
M3 - Article
C2 - 38826500
SN - 2197-1153
VL - 11
JO - Journal of Experimental Orthopaedics
JF - Journal of Experimental Orthopaedics
IS - 3
M1 - e12039
ER -