FairLens: Auditing Black-box Clinical Decision Support Systems
Exploratory: Social Impact of AI and Explainable ML
The growing availability of Electronic Health Records (EHR) and the continually increasing predictive power of Machine Learning (ML) models are boosting both research advances and business opportunities to deploy clinical decision support systems (DSS) in healthcare facilities. Since such models still cannot differentiate between correlation and causation, they may leverage spurious correlations and undesired biases to boost their performance. While the AI community shows increasing interest in interdisciplinary efforts to define, investigate, and provide guidelines for tackling biases and fairness-related issues, quantitative and systematic auditing of real-world datasets and ML models is still in its infancy.

This work investigates potential biases in ML models trained on patients' clinical histories represented as diagnostic (ICD) codes. This type of structured data provides a machine-ready representation of the patient's clinical history and is commonly used in longitudinal ML modeling for phenotyping, multi-morbidity diagnosis, and sequential clinical event prediction. The implicit assumption behind the use of ICD codes in ML applications is that they are a good proxy for the patient's actual health status. However, ICD codes can misrepresent that status because of the many potential errors in translating the patient's actual disease into the corresponding code. Data quality assessment for the secondary use of health data is therefore of pivotal importance to ease the transition of ML-based DSS from academic prototypes to real-world clinical practice. This is particularly true when ICD codes are fed into black-box ML models, i.e., models whose internal decision-making process is opaque.

FairLens is a new methodology to discover and explain biases in ML models trained on ICD-structured healthcare data. It takes bias analysis a step further by explaining the reasons behind the model's poor performance on specific subgroups: it embeds explainability techniques to explain the reasons behind model mistakes rather than simply explaining model predictions. The algorithm is designed to be applied to any sequential ML model trained on ICD codes. FairLens first stratifies patients according to attributes of interest such as age, gender, ethnicity, and insurance type; it then applies an appropriate metric to identify patient subgroups where the model performs poorly. Lastly, FairLens identifies the clinical conditions that are most frequently misclassified for the selected subgroup and explains which elements in the patients' clinical histories are influencing the misclassification. The paper also presents a use case for the methodology on the most recent update of one of the largest freely available ICU datasets, MIMIC-IV.
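To make the stratify-and-rank step concrete, below is a minimal Python sketch of how one could group an audit set by demographic attributes and rank the resulting subgroups by a simple disparity score (subgroup mean error minus overall mean error). The DataFrame columns and the disparity function are illustrative assumptions, not the exact metrics used in the FairLens paper.

```python
import pandas as pd

def rank_subgroups(patients: pd.DataFrame, attributes, error_col="error"):
    """Stratify patients by `attributes` and rank subgroups by how much
    their mean error exceeds the overall mean error (placeholder disparity)."""
    overall_error = patients[error_col].mean()
    subgroups = (patients
                 .groupby(attributes)[error_col]
                 .agg(subgroup_error="mean", n_patients="size"))
    subgroups["disparity"] = subgroups["subgroup_error"] - overall_error
    return subgroups.sort_values("disparity", ascending=False)

# Hypothetical audit table: one per-patient error score from the black-box model.
patients = pd.DataFrame({
    "age_group": ["18-45", "18-45", "65+", "65+", "65+", "18-45"],
    "insurance": ["Private", "Medicaid", "Medicare", "Medicare", "Private", "Private"],
    "error":     [0.05, 0.12, 0.40, 0.35, 0.22, 0.07],
})
print(rank_subgroups(patients, ["age_group", "insurance"]))
```

In this toy example the worst-served subgroups float to the top of the ranking, which is the point at which a domain expert would drill down into the specific clinical conditions driving the errors.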
FairLens pipeline: first, patients are grouped based on attributes of interest (age, insurance, ...). A disparity function then ranks these groups. For the highlighted groups, FairLens displays the top three over- and under-represented codes to the domain expert, who can ask for an explanation of the highlighted conditions.
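The code-level inspection described in the pipeline above could look roughly like the following sketch, which compares how often each ICD code is misclassified within a flagged subgroup versus in the whole audit set and returns the three most over- and under-represented codes. The column names (`icd_code`, `misclassified`, `age_group`) and the frequency-ratio score are hypothetical placeholders, not the paper's actual implementation.

```python
import pandas as pd

def over_under_represented_codes(errors: pd.DataFrame, subgroup_mask: pd.Series,
                                 top_k: int = 3):
    """Compare per-code misclassification frequencies in the subgroup vs. the
    whole audit set; return the top_k over- and under-represented codes."""
    overall = errors.loc[errors["misclassified"], "icd_code"].value_counts(normalize=True)
    subgroup = errors.loc[subgroup_mask & errors["misclassified"],
                          "icd_code"].value_counts(normalize=True)
    # Ratio > 1: the code is misclassified more often in the subgroup than overall.
    ratio = (subgroup / overall.reindex(subgroup.index)).dropna()
    return ratio.nlargest(top_k), ratio.nsmallest(top_k)

# Hypothetical audit table: one row per (patient, target ICD code) pair.
errors = pd.DataFrame({
    "icd_code":      ["I10", "E11", "I10", "N18", "E11", "I10", "N18", "E11"],
    "misclassified": [True, True, False, True, True, True, False, True],
    "age_group":     ["65+", "65+", "18-45", "65+", "18-45", "65+", "18-45", "65+"],
})
over, under = over_under_represented_codes(errors, errors["age_group"] == "65+")
print("Over-represented codes:\n", over)
print("Under-represented codes:\n", under)
```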
In this scenario, MIMIC-IV acts as the healthcare facility's historical medical database. The authors show how a domain expert can use FairLens to audit a multilabel clinical DSS acting as a fictional commercial black-box model. They argue that applied research and quantitative tools for systematic audits specific to healthcare data are much needed to establish and reinforce trust in the application of AI-based systems in such a high-stakes domain. FairLens is a first step toward making fairness and bias auditing a standard procedure for clinical DSS; they envision such a procedure monitoring bias and fairness issues at all stages of a clinical DSS's life cycle.
Written by: Francesco Bodria
References:
Cecilia Panigutti, Alan Perotti, André Panisson, Paolo Bajardi, and Dino Pedreschi. "FairLens: Auditing Black-box Clinical Decision Support Systems." arXiv:2011.04049 (2020).