Robustness Testing of Medical AI Models: MIMIC-III, eICU, and SEER Datasets

This study evaluates how accurately machine learning models predict three serious disease outcomes: 48-hour in-hospital mortality risk, 5-year breast cancer survivability, and 5-year lung cancer survivability. Three datasets (MIMIC-III, eICU, and SEER) were used with models such as LSTM, MLP, and XGBoost. To test model robustness, several test case generation methods were designed, including attribute-based variations, gradient ascent, and Glasgow Coma Scale-based approaches. Model performance on these challenging cases varied across datasets and methods, which highlights the need for further improvements to reliability.
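To make the gradient ascent idea concrete, the following is a minimal sketch of how a challenging test case might be generated by nudging an input in the direction that increases a model's loss. It is not the study's implementation: a toy logistic model stands in for the LSTM and MLP models, and the names `predict`, `log_loss`, `gradient_ascent_case`, `step`, `n_steps`, and `max_shift`, as well as all weights and feature values, are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Positive-class probability under a toy logistic model (assumed stand-in)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def log_loss(p, y):
    """Binary cross-entropy for a single prediction."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def gradient_ascent_case(weights, bias, x, y, step=0.1, n_steps=20, max_shift=1.0):
    """Perturb x to increase the loss on label y.

    Each feature is kept within max_shift of its original value so the
    generated case stays close to the original record (an assumed constraint).
    """
    x_new = list(x)
    for _ in range(n_steps):
        p = predict(weights, bias, x_new)
        # For logistic regression, d(loss)/d(x_i) = (p - y) * w_i.
        grad = [(p - y) * w for w in weights]
        # Ascent step: move the input in the direction that raises the loss.
        x_new = [xi + step * g for xi, g in zip(x_new, grad)]
        # Project back into the allowed range around the original input.
        x_new = [
            min(max(xn, x0 - max_shift), x0 + max_shift)
            for xn, x0 in zip(x_new, x)
        ]
    return x_new


# Hypothetical model parameters and a hypothetical, already-normalized record.
weights = [1.5, -2.0, 0.5]
bias = -0.2
x = [1.0, -0.5, 0.3]
y = 1

x_hard = gradient_ascent_case(weights, bias, x, y)
loss_before = log_loss(predict(weights, bias, x), y)
loss_after = log_loss(predict(weights, bias, x_hard), y)
print(loss_before, loss_after)
```

Comparing the loss before and after the perturbation gives a rough sense of how sensitive the model is near that record. For a neural model the gradient would come from automatic differentiation rather than a closed form, and tree ensembles such as XGBoost are not differentiable in their inputs, so they would need a different generation method.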