# Bias & Error Analysis Report
PneumoDetect: Week 4, Day 4

## Overview
This analysis evaluates potential sources of bias, robustness failures, and variation in subgroup performance for the PneumoDetect model (ResNet50). The available dataset does not include demographic metadata (age, gender). The analysis therefore focuses on image-derived slices that may affect model behaviour.

The analysis covers:
- brightness levels
- image resolution
- model confidence buckets
- misclassified cases, inspected with Grad-CAM heatmaps

---

## 1. Image-Derived Subgroup Analysis

### 1.1 Brightness Slices
Images were grouped into low, mid, and high brightness using quantile binning.
Overexposed or underexposed images can obscure lung structures and shift classifier behaviour.
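
A minimal sketch of the brightness slicing follows. It rests on two assumptions:
- brightness is measured as the mean pixel intensity of a grayscale image;
- "quantile binning" means tertiles, with cut points at the 1/3 and 2/3 quantiles.

The helper names `tertile_bins` and `brightness_slices` are hypothetical and are not taken from the notebook.

```python
import numpy as np


def tertile_bins(values):
    """Label each value 'low', 'mid' or 'high' using its 1/3 and 2/3 quantiles."""
    values = np.asarray(values, dtype=float)
    q1, q2 = np.quantile(values, [1 / 3, 2 / 3])
    # np.digitize gives 0 for v < q1, 1 for q1 <= v < q2, and 2 for v >= q2
    idx = np.digitize(values, [q1, q2])
    return np.array(["low", "mid", "high"])[idx]


def brightness_slices(images):
    """Mean intensity of each grayscale image, plus its tertile label.

    Assumption: brightness = mean pixel value; `images` is an iterable of 2-D arrays.
    """
    means = np.array([np.asarray(img, dtype=float).mean() for img in images])
    return means, tertile_bins(means)


# Example with synthetic constant images (intensities 10..240)
imgs = [np.full((4, 4), v) for v in (10, 20, 30, 100, 120, 140, 200, 220, 240)]
means, labels = brightness_slices(imgs)
print(labels.tolist())  # ['low', 'low', 'low', 'mid', 'mid', 'mid', 'high', 'high', 'high']
```

Tertiles keep the three slices roughly equal in size. Many tied intensity values can still make the bins uneven.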

### 1.2 Resolution Slices
Images were grouped into low, mid, and high resolution by total pixel count.
Low-resolution images typically lose fine diagnostic detail and may appear noisier.
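
Resolution slicing can follow the same pattern. This sketch makes two assumptions:
- image sizes are available as two-element pairs;
- the pixel count is again binned into tertiles.

`resolution_slices` is a hypothetical helper.

```python
import numpy as np


def resolution_slices(sizes):
    """Total pixel count per image and its low/mid/high tertile label.

    Assumption: `sizes` holds (height, width) or (width, height) pairs; the
    product is the same either way.
    """
    pixels = np.array([a * b for a, b in sizes], dtype=float)
    q1, q2 = np.quantile(pixels, [1 / 3, 2 / 3])
    idx = np.digitize(pixels, [q1, q2])  # 0: below q1, 1: between, 2: at or above q2
    return pixels, np.array(["low", "mid", "high"])[idx]


sizes = [(512, 512), (600, 600), (700, 700),
         (1000, 1000), (1100, 1100), (1200, 1200),
         (2000, 2000), (2100, 2100), (2200, 2200)]
pixels, labels = resolution_slices(sizes)
print(labels.tolist())
```

With Pillow, `Image.open(path).size` returns `(width, height)` from the file header without decoding the pixel data. This makes collecting sizes for a large dataset cheap.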

### 1.3 Confidence Buckets
Predicted probabilities were partitioned into three buckets:
- 0.0–0.65
- 0.65–0.80
- 0.80–1.0

Comparing accuracy with mean confidence within each bucket gives a coarse check of calibration. A well-calibrated model is right about as often as its confidence suggests.
High-confidence errors were flagged for manual review.
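
The sketch below computes per-bucket accuracy and mean confidence and flags high-confidence errors. It rests on these assumptions:
- `probs` is a single sigmoid output (the probability of pneumonia);
- the decision threshold is 0.5;
- confidence is the probability assigned to the predicted class, so the lowest bucket effectively spans 0.5 to 0.65;
- values that fall exactly on a boundary go to the higher bucket.

`confidence_report` is a hypothetical name.

```python
import numpy as np

EDGES = [0.65, 0.80]                            # bucket boundaries from Section 1.3
NAMES = ["0.00-0.65", "0.65-0.80", "0.80-1.00"]


def confidence_report(probs, labels, threshold=0.5):
    """Per-bucket accuracy and mean confidence, plus indices of high-confidence errors.

    Assumptions: `probs` is a single sigmoid output (probability of pneumonia) and
    confidence is the probability of the predicted class.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels).astype(int)
    preds = (probs >= threshold).astype(int)
    conf = np.where(preds == 1, probs, 1.0 - probs)
    correct = preds == labels
    bucket = np.digitize(conf, EDGES)           # 0, 1 or 2; boundary values go up
    report = {}
    for b, name in enumerate(NAMES):
        mask = bucket == b
        n = int(mask.sum())
        report[name] = {
            "n": n,
            "accuracy": float(correct[mask].mean()) if n else float("nan"),
            "mean_conf": float(conf[mask].mean()) if n else float("nan"),
        }
    # Wrong predictions in the top bucket are the cases for manual review
    flagged = np.flatnonzero((bucket == len(NAMES) - 1) & ~correct)
    return report, flagged


report, flagged = confidence_report(
    probs=[0.9, 0.95, 0.1, 0.7, 0.3, 0.55, 0.6],
    labels=[1, 0, 0, 1, 0, 1, 0],
)
for name, row in report.items():
    print(name, row)
print("review:", flagged.tolist())
```

A large gap between `accuracy` and `mean_conf` in a bucket points to miscalibration in that confidence range.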

---

## 2. Findings

### 2.1 AUC Per Slice
(Values depend on the run and are filled in by the notebook. For the confidence row, Low, Mid, and High correspond to the 0.0–0.65, 0.65–0.80, and 0.80–1.0 buckets.)

| Slice Type | Low | Mid | High |
|------------|-----|-----|------|
| Brightness | X | X | X |
| Resolution | X | X | X |
| Confidence | X | X | X |
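
The table can be filled with a per-slice ROC AUC. The sketch below is a NumPy-only version of the rank-based (Mann-Whitney) AUC, in which tied scores share their average rank. For binary labels it should agree with `sklearn.metrics.roc_auc_score`. It returns NaN for a slice containing only one class, where AUC is undefined; scikit-learn raises an error in that case. The helper names are hypothetical.

```python
import numpy as np


def roc_auc(labels, scores):
    """Binary ROC AUC via the rank-sum (Mann-Whitney U) formula; ties share ranks."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC undefined with a single class
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    # Tied scores form contiguous groups in sorted order; each gets its mean 1-based rank
    _, first, counts = np.unique(sorted_scores, return_index=True, return_counts=True)
    ranks = np.empty(scores.size, dtype=float)
    ranks[order] = np.repeat(first + (counts + 1) / 2.0, counts)
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def auc_per_slice(labels, scores, slices):
    """Map each slice name (e.g. 'low', 'mid', 'high') to the AUC within it."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    slices = np.asarray(slices)
    return {str(s): roc_auc(labels[slices == s], scores[slices == s])
            for s in np.unique(slices)}


y = [0, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
s = ["low", "low", "low", "low", "high", "high"]
print(auc_per_slice(y, p, s))  # {'high': 1.0, 'low': 0.75}
```

Read AUC within a confidence bucket with care. Each bucket restricts the range of scores, so the confidence row is not directly comparable with the brightness and resolution rows.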

### 2.2 Error Patterns
Across misclassified cases:
- Grad-CAM maps frequently focused on non-lung regions (ribs, image borders, artifacts).
- Underexposed or low-resolution images produced activation drift in the Grad-CAM maps.
- Several errors were made with high confidence, indicating miscalibration.
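
To make the "non-lung focus" pattern measurable, here is a sketch of the Grad-CAM combination step plus a check of how much heatmap mass falls inside a region. It assumes two inputs have already been extracted with the training framework:
- the activations of the last convolutional block (for a torchvision ResNet50 this is typically `layer4`);
- the gradients of the target class score (the pre-sigmoid logit) with respect to those activations.

The report does not describe a source of lung masks, so `region_mask` is a hypothetical input.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one image from a conv layer's activations and gradients.

    activations, gradients: arrays of shape (K, H, W); gradients are the
    derivatives of the target class score with respect to the activations.
    """
    A = np.asarray(activations, dtype=float)
    G = np.asarray(gradients, dtype=float)
    alpha = G.mean(axis=(1, 2))                    # channel weights: spatially averaged gradients
    cam = np.tensordot(alpha, A, axes=1)           # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0.0)                     # ReLU keeps positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam         # scale to [0, 1] for overlays


def fraction_inside(cam, region_mask):
    """Share of total heatmap mass inside a boolean mask (hypothetical lung mask)."""
    total = cam.sum()
    return float(cam[region_mask].sum() / total) if total > 0 else float("nan")


# Toy example: channel 0 fires top-left, channel 1 bottom-right
A = np.zeros((2, 2, 2))
A[0, 0, 0] = 1.0
A[1, 1, 1] = 1.0
G = np.stack([np.ones((2, 2)), -np.ones((2, 2))])  # channel 1 counts against the class
cam = grad_cam(A, G)
lung = np.array([[True, False], [False, False]])
print(cam.tolist(), fraction_inside(cam, lung))
```

The map has the spatial size of the chosen layer (7×7 for a 224×224 input to ResNet50). It must be upsampled to image size for overlays, and the mask must match whichever resolution is used. A consistently low inside-fraction on misclassified cases would quantify the first pattern above.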

---

## 3. Potential Sources of Bias

### 3.1 Dataset Composition
- The absence of demographic metadata prevents any assessment of age or gender bias.
- Images come from unknown hospital sources. This can introduce cross-institutional variation that cannot be measured or controlled.
- Certain scanner or exposure conditions have few examples.

### 3.2 Label Noise
Pneumonia labels in public datasets are sometimes derived from radiology reports rather than expert consensus, so some labels may be incorrect.

### 3.3 Imaging Artifacts
Variability in brightness, resolution, and cropping affects model performance. The brightness and resolution slices in Section 1 measure the first two directly.

---

## 4. Mitigation Strategies
- Add brightness and contrast augmentation during training.
- Balance the dataset through oversampling (including synthetic examples) or class weighting.
- Apply histogram equalisation or CLAHE (contrast-limited adaptive histogram equalisation) during preprocessing.
- Apply post-hoc calibration, such as temperature scaling.
- Collect a dataset with demographic metadata to support clinical fairness analysis.
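
Temperature scaling, listed above, divides the logits by a single scalar T fitted on held-out validation data. The sketch below targets a single-logit (sigmoid) classifier and finds T by grid search to minimise negative log-likelihood. The sigmoid output, the grid range, and the function names are assumptions. If the model exposes only probabilities, logits can be recovered as log(p / (1 - p)).

```python
import numpy as np


def binary_nll(logits, labels, T):
    """Mean negative log-likelihood of sigmoid(logit / T) against 0/1 labels."""
    z = np.clip(np.asarray(logits, dtype=float) / T, -500.0, 500.0)  # avoid exp overflow
    p = np.clip(1.0 / (1.0 + np.exp(-z)), 1e-12, 1.0 - 1e-12)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def fit_temperature(logits, labels, grid=None):
    """Grid-search the temperature T > 0 that minimises validation NLL."""
    if grid is None:
        grid = np.linspace(0.05, 10.0, 400)
    losses = [binary_nll(logits, labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])


# Overconfident toy model: |logit| = 8 everywhere, but only 3 in 4 predictions are right
val_logits = [8, 8, 8, 8, -8, -8, -8, -8]
val_labels = [1, 1, 1, 0, 0, 0, 0, 1]
T = fit_temperature(val_logits, val_labels)
print(round(T, 2))  # close to 8 / ln(3), the T that maps logit 8 to probability 0.75
```

Dividing by a positive T preserves the ordering of scores, so AUC is unchanged. Only the confidence values move, and with them the buckets in Section 1.3. A fitted T > 1 softens an overconfident model.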

---

## 5. Ethical Statement
This model is intended to assist clinicians; it is not a diagnostic device.
It must not be used as a replacement for clinical judgement or radiologist evaluation.

---

## 6. Files Generated
- `04_bias_analysis.ipynb`: full notebook
- Grad-CAM overlays for misclassified examples
- Summary metrics table