DiabetesSense
93% accurate clinical risk scoring with SHAP interpretability
- Timeline: Jan to Jul 2024
- Role: Solo ML capstone, COMSATS University Islamabad
- Status: Published
- Primary stack: scikit-learn · SHAP · React.js · Flask
What this project tackles
Clinical prediction models live or die by interpretability. A black-box classifier can hit 95% accuracy and still be useless if a clinician can't see why a particular patient was flagged. Diabetes risk already has good baseline accuracy from logistic regression and tree ensembles, so the real research question wasn't 'can we predict?' but 'can we predict and explain in a way clinicians will actually trust?'
Adoption literature on clinical ML points the same way: when clinicians can't trace a prediction back to features they recognise, they tend to reject the tool, even when it is more accurate than their own judgement. The interpretability layer isn't optional polish; it's the load-bearing part.
System design
Built on a public diabetes risk dataset, using stratified k-fold splits so that every fold preserves the dataset's class ratio despite the imbalance. Trained two complementary tree models (Random Forest for variance reduction across heterogeneous feature interactions, Gradient Boosting for sequential refinement on hard examples) and combined them via soft voting, which averages their predicted class probabilities. Hyperparameters were tuned by grid search, scored with cross-validation inside the training data.
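A minimal sketch of that training setup, assuming scikit-learn and a synthetic stand-in dataset. The feature count, class ratio, and parameter grid here are illustrative assumptions, not the project's actual data or search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the diabetes dataset (assumption: 8 numeric
# features, roughly one-third positive class). Not the project's data.
X, y = make_classification(
    n_samples=400,
    n_features=8,
    n_informative=5,
    weights=[0.65, 0.35],
    random_state=0,
)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # average the two models' predicted probabilities
)

# Illustrative grid only; the real search space is not stated in the write-up.
param_grid = {
    "rf__n_estimators": [50, 100],
    "gb__learning_rate": [0.05, 0.1],
}

# Stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(ensemble, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

best_model = search.best_estimator_
proba = best_model.predict_proba(X[:5])  # one row per patient, one column per class
```

Tuning both models inside one `GridSearchCV` keeps every candidate scored on the same folds, at the cost of a grid that grows multiplicatively with each added parameter.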
Wrapped the ensemble in SHAP TreeExplainer, which exploits the structure of decision trees to compute exact Shapley values rather than sampling-based approximations. Per-prediction explanations surface the top contributing features as a horizontal bar chart, colour-coded by sign: positive contributions increase predicted risk, negative contributions decrease it.
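To make "exact Shapley values" concrete, here is a small worked sketch that computes them by brute-force enumeration of feature coalitions. This is not TreeExplainer's algorithm (which reaches the same kind of answer far more efficiently for trees); it only illustrates the quantity being computed. The `toy_risk` function, its thresholds, and the baseline patient are hypothetical and carry no clinical meaning.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(features):
    """Hypothetical risk score with one interaction term (illustration only)."""
    glucose, bmi, age = features
    risk = 0.1
    if glucose > 140:
        risk += 0.4
    if bmi > 30:
        risk += 0.2
    if glucose > 140 and bmi > 30:
        risk += 0.1
    if age < 30:
        risk -= 0.05
    return risk


patient = [160, 33, 25]    # glucose, BMI, age
baseline = [100, 24, 45]   # reference patient the explanation is relative to
phi = shapley_values(toy_risk, patient, baseline)

# Attributions add up to the gap between this patient and the baseline.
gap = toy_risk(patient) - toy_risk(baseline)
```

In this toy case glucose and BMI receive positive attributions (the interaction bonus is split evenly between them) and age receives a negative one, which is the sign convention the bar chart colour-codes. Because nothing is sampled, repeating the call returns the same attributions.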
Deployed the model behind a Flask REST API with a React.js frontend that lets clinicians input patient features and get back a risk score plus the feature attribution chart in real time. The chart is the actual product — the score alone wouldn't have been adopted.
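A hedged sketch of what such an endpoint could look like in Flask. The `/predict` route, the `FEATURE_RANGES` schema, and the `score_and_explain` placeholder are all hypothetical names introduced for illustration; the project's real feature set, ranges, and response shape are not given here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical feature schema: names and plausible ranges are assumptions.
FEATURE_RANGES = {
    "glucose": (0.0, 500.0),
    "bmi": (10.0, 80.0),
    "age": (0.0, 120.0),
}


def validate(payload):
    """Return (features, errors); features follow FEATURE_RANGES order."""
    if not isinstance(payload, dict):
        return None, ["request body must be a JSON object"]
    errors = []
    features = []
    for name, (low, high) in FEATURE_RANGES.items():
        value = payload.get(name)
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: missing or not a number")
        elif not low <= value <= high:
            errors.append(f"{name}: outside [{low}, {high}]")
        else:
            features.append(float(value))
    return (None, errors) if errors else (features, [])


def score_and_explain(features):
    """Placeholder for the trained ensemble and its SHAP explainer."""
    # Fixed dummy output so the sketch runs without a model file.
    return 0.5, {name: 0.0 for name in FEATURE_RANGES}


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)  # None if the body is not JSON
    features, errors = validate(payload)
    if errors:
        return jsonify({"errors": errors}), 400
    risk, attributions = score_and_explain(features)
    return jsonify({"risk": risk, "attributions": attributions})
```

Building the feature vector in one fixed order inside `validate` is one way to guard against encoding mismatches between the frontend form and the model's training columns.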
Co-authored a Springer book chapter and presented the work at ICSMAI 2024 in Casablanca, Morocco.
Key technical decisions
— Ensemble over single model
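Random Forest and Gradient Boosting fail differently. RF averages many decorrelated trees, so it over-fits less on noisy features; GBM fits each new tree to the errors left by the previous ones, so it recovers boundary cases that RF's averaging smooths over. Soft voting picked up the gain from each without the variance hit of stacking.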
— SHAP over LIME
LIME's local linear approximations are fast, but because they are fitted to randomly sampled perturbations they can change from run to run on the same input. SHAP TreeExplainer gives exact Shapley values for tree models, so a clinician asking 'why this score?' gets the same answer twice. Reproducibility is non-negotiable for clinical use.
— TreeExplainer specifically
KernelSHAP is model-agnostic but slow and approximate. TreeExplainer leverages tree structure for exact computation in polynomial time, so per-prediction explanations stay sub-second even at the API layer.
— Flask + React over a notebook prototype
A notebook would have been faster for the conference paper. The deployed API forced production discipline — serialisation, input validation, error handling — that surfaced a feature-encoding bug the notebook had silently absorbed.
What it delivers
~93% classification accuracy under stratified k-fold validation, with sub-second per-prediction SHAP explanations served via the REST API. Feature attributions consistently surfaced clinically meaningful drivers (glucose, BMI, age) as the top contributors, which became the basis for clinician trust during pilot review.
Work was peer-reviewed and presented at ICSMAI 2024 in Casablanca, Morocco, with a Springer book chapter publication tied to the conference.
What I'd do next
For clinical deployment beyond a paper, the next blockers are calibration and population shift: 93% on a single curated dataset doesn't mean 93% on a different hospital's intake. The model needs Platt-scaled probabilities and a population-shift detector before it's safe at the bedside.
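A rough sketch of both pieces, under stated assumptions. Platt scaling is shown through scikit-learn's `CalibratedClassifierCV` with `method="sigmoid"`; the shift detector is a population stability index (PSI) on a single feature, which is one common choice among several and not something this project has implemented. The data, the glucose-like distributions, and any alert threshold are hypothetical.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption), not the project's dataset.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)

# method="sigmoid" is scikit-learn's implementation of Platt scaling:
# a logistic curve fitted to held-out classifier scores.
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    method="sigmoid",
    cv=5,
)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X[:5])[:, 1]  # calibrated positive-class risk


def population_stability_index(reference, incoming, bins=10):
    """PSI between two samples of one feature; larger means more shift."""
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    # Clip incoming values into the reference range so none are dropped.
    incoming = np.clip(incoming, edges[0], edges[-1])
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    new_frac = np.histogram(incoming, bins=edges)[0] / len(incoming)
    # Floor the fractions to avoid log(0) on empty bins.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    new_frac = np.clip(new_frac, 1e-6, None)
    return float(np.sum((new_frac - ref_frac) * np.log(new_frac / ref_frac)))


# Hypothetical glucose-like readings from the training site and two intakes.
rng = np.random.default_rng(0)
reference = rng.normal(120.0, 30.0, size=2000)
same_site = rng.normal(120.0, 30.0, size=2000)
other_site = rng.normal(150.0, 30.0, size=2000)

psi_same = population_stability_index(reference, same_site)
psi_shifted = population_stability_index(reference, other_site)
```

In a real deployment the PSI would be tracked per feature against the training distribution, with the alert threshold chosen together with the clinical team rather than fixed in code.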