Published · ML

DiabetesSense

93% accurate clinical risk scoring with SHAP interpretability

Timeline: Jan — Jul 2024
Role: Solo ML capstone, COMSATS University Islamabad
Status: Published
Primary stack: scikit-learn · SHAP · React.js · Flask

By the numbers

Headline metrics

93% · Classification accuracy
SHAP · Interpretability
ICSMAI · Published & presented

The problem

What this project tackles

Clinical prediction models live or die by interpretability. A black-box classifier can hit 95% accuracy and still be useless if a clinician can't see why a particular patient was flagged. Diabetes risk already has good baseline accuracy from logistic regression and tree ensembles, so the real research question wasn't 'can we predict?' but 'can we predict and explain in a way clinicians will actually trust?'

Adoption literature on clinical ML is consistent: when clinicians can't trace a prediction back to features they recognise, they reject the tool — even when the tool is more accurate than their own judgement. The interpretability layer isn't optional polish; it's the load-bearing part.

Approach

System design

Built on a public diabetes risk dataset, using stratified k-fold splits so that every fold preserves the dataset's class proportions despite the imbalance. Trained two complementary tree models (Random Forest for variance reduction across heterogeneous feature interactions, Gradient Boosting for sequential refinement on hard examples) and combined them via soft voting. Hyperparameters were tuned by grid search, scored with cross-validation inside the training data.
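The training setup described above can be sketched in scikit-learn roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic data from `make_classification`, the estimator names `rf` and `gb`, the tree counts, and the tiny parameter grid are all assumptions chosen to keep the example small and fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the diabetes dataset (assumed: 8 features, imbalanced classes).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, weights=[0.65, 0.35], random_state=0
)

# Soft voting averages the two models' predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)

# Hypothetical grid; nested parameters are addressed as <estimator name>__<parameter>.
param_grid = {
    "rf__max_depth": [3, None],
    "gb__learning_rate": [0.05, 0.1],
}

# Stratified folds keep the class ratio the same in every train/validation split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(ensemble, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_  # mean cross-validated accuracy of the best setting
```

Tuning the ensemble as a single estimator means each grid point is scored on the combined prediction rather than on either model alone.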

Wrapped the ensemble in SHAP TreeExplainer, which exploits the structure of tree models to compute exact Shapley values rather than sampled approximations. Per-prediction explanations surface the top contributing features as a horizontal bar chart, colour-coded by direction: positive contributions increase risk, negative contributions decrease it.
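To make concrete what a Shapley value attribution means, the sketch below computes one by brute force for a toy risk function. It is not TreeExplainer's algorithm and not the project's model: `toy_risk`, the thresholds, the patient, and the single reference patient used as the baseline are all hypothetical. It only illustrates the property that makes the bar chart readable, namely that the contributions add up to the gap between this patient's score and the baseline score.

```python
from itertools import permutations

FEATURES = ("glucose", "bmi", "age")


def toy_risk(x):
    """Hypothetical risk score with an interaction term; not the real model."""
    score = 0.1
    if x["glucose"] > 125:
        score += 0.4
    if x["bmi"] > 30:
        score += 0.2
    if x["glucose"] > 125 and x["age"] > 50:
        score += 0.2
    return score


def shapley_values(model, patient, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    totals = {name: 0.0 for name in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)  # start from the reference patient
        previous = model(current)
        for name in order:
            current[name] = patient[name]  # switch one feature to the patient's value
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


patient = {"glucose": 160, "bmi": 33, "age": 58}
baseline = {"glucose": 100, "bmi": 24, "age": 35}
phi = shapley_values(toy_risk, patient, baseline)
```

For this toy case glucose receives about 0.5, BMI about 0.2, and age about 0.1, which together account for the 0.8 difference between the patient's score and the baseline score. Enumerating every ordering is only feasible for a handful of features, which is why a tree-specific algorithm matters in practice.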

Deployed the model behind a Flask REST API with a React.js frontend that lets clinicians input patient features and get back a risk score plus the feature attribution chart in real time. The chart is the actual product — the score alone wouldn't have been adopted.

Co-authored a Springer book chapter and presented the work at ICSMAI 2024 in Casablanca, Morocco.

Engineering

Key technical decisions

— Ensemble over single model

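Random Forest and Gradient Boosting fail differently. Random Forest averages many decorrelated trees, so it over-fits less on noisy features; Gradient Boosting fits each new tree to the errors of the previous ones, which sharpens decisions on boundary cases that the forest's averaging smooths over. Soft voting averages the two models' predicted probabilities and picks up the gain from each without the variance hit of stacking, which would add a meta-learner to fit and validate.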
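The difference between soft and hard voting is easiest to see with numbers. In this small sketch the two probabilities are made up for illustration and the helper names are hypothetical.

```python
def soft_vote(probabilities):
    """Average the positive-class probabilities across models."""
    return sum(probabilities) / len(probabilities)


def hard_vote(probabilities, threshold=0.5):
    """Fraction of models whose own thresholded prediction is positive."""
    votes = [p >= threshold for p in probabilities]
    return sum(votes) / len(votes)


# Hypothetical outputs for one patient: the forest is unsure, boosting is confident.
model_probabilities = {"random_forest": 0.45, "gradient_boosting": 0.85}
values = list(model_probabilities.values())

soft = soft_vote(values)  # averaged probability, above 0.5 here
hard = hard_vote(values)  # one vote each way, so a tie
```

With hard voting the two models cancel out; with soft voting the confident model carries the decision, and the averaged probability remains usable as a risk score.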

— SHAP over LIME

LIME's local linear approximations are fast, but they are fitted to randomly sampled perturbations, so the explanation can change across runs on the same input. SHAP TreeExplainer gives exact Shapley values for tree models, which means a clinician asking 'why this score?' gets the same answer twice. Reproducibility is non-negotiable for clinical use.

— TreeExplainer specifically

KernelSHAP is model-agnostic, but it estimates Shapley values by sampling feature coalitions, which makes it slow and approximate. TreeExplainer leverages tree structure for exact computation in polynomial time, so per-prediction explanations stay sub-second even at the API layer.

— Flask + React over a notebook prototype

A notebook would have been faster for the conference paper. The deployed API forced production discipline — serialisation, input validation, error handling — that surfaced a feature-encoding bug the notebook had silently absorbed.

Results

What it delivers

~93% classification accuracy on stratified validation, with sub-second per-prediction SHAP explanations served via the REST API. Feature attributions consistently surfaced clinically meaningful drivers (glucose, BMI, age) as the top contributors, which became the basis for clinician trust during pilot review.

Work was peer-reviewed and presented at ICSMAI 2024 in Casablanca, Morocco, with a Springer book chapter publication tied to the conference.

Reflections

What I'd do next

For clinical deployment beyond a paper, the next blockers are calibration and population shift: 93% on a single curated dataset doesn't mean 93% on a different hospital's intake. The model needs Platt-scaled probabilities and a population-shift detector before it's safe at the bedside.
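Platt scaling is available in scikit-learn through `CalibratedClassifierCV` with `method="sigmoid"`. The sketch below shows the shape of that step under stated assumptions: synthetic data, a single Random Forest standing in for the ensemble, and an arbitrary 5-fold setting.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real inputs would be the patient features.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# method="sigmoid" fits a logistic curve to the classifier's scores (Platt scaling).
base = RandomForestClassifier(n_estimators=50, random_state=0)
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

# Calibrated probability of the positive class for held-out patients.
probabilities = calibrated.predict_proba(X_test)[:, 1]
```

Calibration only adjusts how scores map to probabilities on data like the training set; it does not address population shift, which needs its own monitoring.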


