Commit a521a35 (parent acca956): Add Bias & Error Analysis Report

1 file changed, 88 additions, 0 deletions: `notebooks/docs/bias_analysis.md`

# Bias & Error Analysis Report

PneumoDetect: Week 4, Day 4

## Overview

This analysis evaluates potential sources of bias, robustness failures, and subgroup performance variation in the PneumoDetect model (ResNet50). Because the available dataset includes no demographic metadata (age, gender), the analysis focuses on image-derived slices that can plausibly affect model behaviour.

These slices include:

- brightness levels
- image resolution
- model confidence buckets
- misclassified cases with Grad-CAM heatmaps

---

## 1. Image-Derived Subgroup Analysis

### 1.1 Brightness Slices

Images were grouped into low, mid, and high brightness using quantile binning. Overexposed or underexposed images can hide lung structures and influence classifier behaviour.
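
A minimal sketch of the brightness binning, assuming brightness is measured as the mean pixel intensity of a grayscale image and that "quantile binning" means three equal-frequency bins (tertiles). The names `mean_brightness` and `tertile_bins` are hypothetical, not taken from the notebook.

```python
import statistics

def mean_brightness(pixels):
    """Mean intensity of a grayscale image given as a flat sequence of pixel values (assumed 0-255)."""
    return sum(pixels) / len(pixels)

def tertile_bins(values):
    """Label each value 'low', 'mid' or 'high' using the two tertile cut points of the data."""
    # statistics.quantiles with n=3 returns the 1/3 and 2/3 cut points.
    q1, q2 = statistics.quantiles(values, n=3)
    labels = []
    for v in values:
        if v <= q1:
            labels.append("low")
        elif v <= q2:
            labels.append("mid")
        else:
            labels.append("high")
    return labels

# Hypothetical toy "images": flat lists of pixel values.
images = [[10] * 4, [100] * 4, [200] * 4, [50] * 4, [150] * 4, [250] * 4]
brightness = [mean_brightness(img) for img in images]
print(tertile_bins(brightness))  # ['low', 'mid', 'high', 'low', 'mid', 'high']
```

With pandas, `pd.qcut(brightness, q=3, labels=["low", "mid", "high"])` performs a similar equal-frequency split, although its cut points can differ slightly from those of `statistics.quantiles`.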

### 1.2 Resolution Slices

Images were grouped into low, mid, and high resolution by total pixel count (width × height). Low-resolution images tend to be noisier and lose fine diagnostic detail.
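
A minimal sketch of resolution slicing, assuming each image's size is available as a `(width, height)` pair and that low/mid/high means three near-equal groups ranked by pixel count. This rank-based split keeps group sizes balanced but may place images with identical pixel counts in different groups. `resolution_bins` is a hypothetical helper.

```python
def resolution_bins(sizes):
    """Return pixel counts and 'low'/'mid'/'high' labels for (width, height) pairs, split by rank into thirds."""
    counts = [w * h for w, h in sizes]
    # Indices of images ordered from fewest to most pixels.
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    names = ("low", "mid", "high")
    labels = [None] * len(counts)
    for rank, idx in enumerate(order):
        # rank * 3 // n maps ranks 0..n-1 onto groups 0, 1, 2 of near-equal size.
        labels[idx] = names[rank * 3 // len(counts)]
    return counts, labels

# Hypothetical image sizes in pixels (width, height).
sizes = [(1024, 1024), (512, 512), (2048, 2048), (256, 256), (1024, 768), (3000, 2500)]
counts, labels = resolution_bins(sizes)
print(labels)  # ['mid', 'low', 'high', 'low', 'mid', 'high']
```

If the images are read with Pillow, `Image.open(path).size` returns the `(width, height)` pair this sketch expects.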

### 1.3 Confidence Buckets

Predicted probabilities were partitioned into three buckets:

- 0.0–0.65
- 0.65–0.80
- 0.80–1.0

Comparing each bucket's mean confidence with its observed accuracy gives a coarse check of calibration. High-confidence errors were flagged for manual review.
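
A minimal sketch of the bucketing and calibration check, under several assumptions: the model outputs P(pneumonia) per image, predictions use a 0.5 threshold, confidence is the probability of the predicted class (so it never drops below 0.5 and the first bucket effectively covers 0.5–0.65), and bucket edges are closed on the left. `bucket` and `calibration_by_bucket` are hypothetical names.

```python
def bucket(conf):
    """Map a confidence value to one of the report's three buckets (left-closed edges assumed)."""
    if conf < 0.65:
        return "0.0-0.65"
    if conf < 0.80:
        return "0.65-0.80"
    return "0.80-1.0"

def calibration_by_bucket(probs, labels, threshold=0.5):
    """Per-bucket count, mean confidence and accuracy, plus indices of errors in the top bucket.

    probs are P(pneumonia); confidence is the probability of the predicted class."""
    stats = {}
    flagged = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        pred = 1 if p >= threshold else 0
        conf = p if pred == 1 else 1.0 - p
        b = bucket(conf)
        s = stats.setdefault(b, {"n": 0, "conf_sum": 0.0, "correct": 0})
        s["n"] += 1
        s["conf_sum"] += conf
        s["correct"] += int(pred == y)
        if pred != y and b == "0.80-1.0":
            flagged.append(i)  # high-confidence error, queued for manual review
    summary = {
        b: {"n": s["n"], "mean_conf": s["conf_sum"] / s["n"], "accuracy": s["correct"] / s["n"]}
        for b, s in stats.items()
    }
    return summary, flagged

probs = [0.95, 0.90, 0.70, 0.55, 0.10, 0.30]  # hypothetical P(pneumonia)
labels = [1, 0, 1, 0, 0, 1]                   # hypothetical ground truth
summary, flagged = calibration_by_bucket(probs, labels)
for name, row in summary.items():
    print(name, row)
print("flagged:", flagged)  # flagged: [1]
```

For a well-calibrated model, accuracy roughly matches mean confidence in every bucket; accuracy well below mean confidence in the top bucket is the signature of high-confidence errors.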

---

## 2. Findings

### 2.1 AUC Per Slice

(Values depend on the run and are filled in by the notebook; X marks a placeholder.) For the confidence row, Low, Mid, and High correspond to the 0.0–0.65, 0.65–0.80, and 0.80–1.0 buckets.

| Slice Type | Low | Mid | High |
|------------|-----|-----|------|
| Brightness | X | X | X |
| Resolution | X | X | X |
| Confidence | X | X | X |
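
A minimal sketch of how the per-slice AUCs could be computed, assuming a score (P(pneumonia)), a binary label, and a slice name for each image. It uses the rank-probability definition of ROC AUC (the chance that a random positive scores above a random negative, ties counting one half), which is quadratic in slice size but dependency-free. `auc` and `auc_per_slice` are hypothetical helpers. A slice containing only one class has no defined AUC, which can occur in the confidence buckets.

```python
from collections import defaultdict

def auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when a slice contains only one class
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def auc_per_slice(scores, labels, slices):
    """Group (score, label) pairs by slice name and compute AUC within each slice."""
    groups = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, slices):
        groups[g][0].append(s)
        groups[g][1].append(y)
    return {g: auc(s, y) for g, (s, y) in groups.items()}

scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.6, 0.1]        # hypothetical
labels = [1, 1, 0, 0, 1, 0, 0, 1]                        # hypothetical
slices = ["low", "low", "low", "low", "high", "high", "high", "high"]
print(auc_per_slice(scores, labels, slices))  # {'low': 1.0, 'high': 0.5}
```

`sklearn.metrics.roc_auc_score` is the usual library alternative to the hand-written `auc`; it raises a `ValueError` when a slice contains only one class.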

### 2.2 Error Patterns

Across misclassified cases:

- Grad-CAM maps frequently focused on non-lung regions (ribs, borders, artifacts).
- On underexposed or low-resolution images, Grad-CAM activations drifted away from the lung fields.
- Several errors were made with high confidence, indicating miscalibration.
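
For reference when reading the heatmaps, Grad-CAM (Selvaraju et al., 2017) weights each feature map $A^k$ of a chosen convolutional layer by the spatially averaged gradient of the class score $y^c$ (taken before the softmax), then keeps only the positive evidence. Here $Z$ is the number of spatial positions $(i, j)$ in the feature map, and the resulting map is upsampled to the input size for overlay. The layer used by the notebook is not stated; for ResNet50 the last convolutional block is a common choice.

$$
\alpha_k^{c} = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{\text{Grad-CAM}} = \mathrm{ReLU}\Big(\sum_{k}\alpha_k^{c}\,A^{k}\Big)
$$

A heatmap concentrated on ribs or borders therefore means that the weighted feature activations supporting the prediction are strongest there rather than in the lung fields.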

---

## 3. Potential Sources of Bias

### 3.1 Dataset Composition

- The lack of demographic metadata prevents any assessment of age or gender bias.
- Images come from unknown hospital sources, which may introduce cross-institutional variance.
- Some scanner types and exposure conditions have few examples.

### 3.2 Label Noise

Pneumonia labels in public datasets are sometimes derived from radiology reports rather than expert consensus, so some labels may be wrong.

### 3.3 Imaging Artifacts

Variability in brightness, resolution, and cropping affects performance.

---

## 4. Mitigation Strategies

- Add brightness and contrast augmentation.
- Balance the dataset with synthetic oversampling or class weighting.
- Apply histogram equalisation or CLAHE (contrast-limited adaptive histogram equalisation).
- Introduce calibration techniques such as temperature scaling.
- Collect a dataset with demographic metadata to support clinical fairness work.
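
A minimal sketch of temperature scaling for a single-logit (binary) classifier, assuming access to the model's pre-sigmoid logits and labels on a held-out validation set. One scalar $T$ is fitted by minimising the negative log-likelihood of $\sigma(z / T)$; because that loss is convex in $\beta = 1/T$, a golden-section search over $\beta$ suffices. The helper names and the search range are assumptions; a PyTorch pipeline would more often fit $T$ with a gradient-based optimiser.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def nll(logits, labels, beta):
    """Mean binary cross-entropy of sigmoid(beta * z) against 0/1 labels (beta = 1 / T)."""
    # With margin m = (2y - 1) * z, each example contributes log(1 + exp(-beta * m)).
    return sum(softplus(-beta * (2 * y - 1) * z) for z, y in zip(logits, labels)) / len(logits)

def fit_temperature(logits, labels, t_min=0.05, t_max=20.0, iters=100):
    """Fit T in [t_min, t_max] by golden-section search on beta = 1 / T (NLL is convex in beta)."""
    a, b = 1.0 / t_max, 1.0 / t_min
    g = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = nll(logits, labels, c), nll(logits, labels, d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = nll(logits, labels, c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = nll(logits, labels, d)
    return 1.0 / ((a + b) / 2)

def calibrated_prob(z, T):
    """Temperature-scaled probability sigmoid(z / T), computed without overflow."""
    x = z / T
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Hypothetical over-confident validation set: logit 4 for every image, but only 3 of 4 are positive.
val_logits = [4.0, 4.0, 4.0, 4.0]
val_labels = [1, 1, 1, 0]
T = fit_temperature(val_logits, val_labels)
print(round(T, 3), round(calibrated_prob(4.0, T), 3))  # T = 4 / ln 3, so sigmoid(4 / T) = 0.75
```

Because every logit is divided by the same positive $T$, temperature scaling changes confidence but never the predicted class at the 0.5 threshold, and it leaves rank-based metrics such as AUC unchanged.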

---

## 5. Ethical Statement

This model is intended to assist clinicians; it is not a diagnostic device. It must not be used as a replacement for clinical judgement or radiologist evaluation.

---

## 6. Files Generated

- `04_bias_analysis.ipynb`: full notebook
- Grad-CAM overlays for misclassified examples
- Summary metrics table
