Releases: v-dvorak/omr-layout-analysis
NotAX model v2.0 2025-02-09
Model for notehead and accidental detection. Trained with added zoom in/out augmentation. Uses the largest YOLOv11 model available (yolo11x); see the YOLO detection docs for reference.
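The released weights can be loaded directly with the Ultralytics Python API. A minimal inference sketch, assuming the weights file from this release is saved as `notax_v2.pt` (placeholder name) and `page.png` is a scanned page:

```python
# Minimal inference sketch using the Ultralytics API.
# "notax_v2.pt" and "page.png" are placeholder file names.
from ultralytics import YOLO

model = YOLO("notax_v2.pt")              # load the released detection weights
results = model("page.png", conf=0.25)   # run detection on one scanned page

for box in results[0].boxes:
    cls_name = model.names[int(box.cls)]      # e.g. "noteheadFull"
    x1, y1, x2, y2 = box.xyxy[0].tolist()     # pixel coordinates
    print(f"{cls_name}: ({x1:.0f}, {y1:.0f}) - ({x2:.0f}, {y2:.0f})")
```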
NotAX v2.0 F1 scores over multiple IoU thresholds
ID \ IoU | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
---|---|---|---|---|---|---|---|---|---|---|---|
noteheadFull | 0.9873 | 0.9873 | 0.9850 | 0.9816 | 0.9720 | 0.9443 | 0.8759 | 0.7436 | 0.4648 | 0.1838 | 0.0498 |
noteheadHalf | 0.9827 | 0.9827 | 0.9827 | 0.9827 | 0.9827 | 0.9827 | 0.9689 | 0.9550 | 0.8235 | 0.4221 | 0.0277 |
accidentalFlat | 0.5161 | 0.5161 | 0.5161 | 0.5161 | 0.5161 | 0.5161 | 0.5161 | 0.4516 | 0.3226 | 0.1290 | 0.0000 |
accidentalNatural | 0.8980 | 0.8980 | 0.8980 | 0.8571 | 0.8571 | 0.8163 | 0.8163 | 0.7347 | 0.6122 | 0.2449 | 0.0000 |
accidentalSharp | 0.8996 | 0.8996 | 0.8922 | 0.8922 | 0.8848 | 0.8773 | 0.8625 | 0.8030 | 0.6989 | 0.3197 | 0.0149 |
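The scores above are per-class F1 values at increasing IoU thresholds. A minimal sketch of such a computation for one class, assuming greedy one-to-one matching of predicted boxes to ground-truth boxes (an illustration, not the exact evaluation code of this repository):

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def f1_at_iou(predictions, ground_truth, threshold):
    """F1 for one class; each ground-truth box may match at most one prediction."""
    matched_gt = set()
    tp = 0
    for pred in predictions:                 # ideally iterated in confidence order
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(ground_truth):
            if i in matched_gt:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx is not None and best_iou >= threshold:
            matched_gt.add(best_idx)
            tp += 1
    fp = len(predictions) - tp               # unmatched predictions
    fn = len(ground_truth) - tp              # unmatched ground-truth boxes
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```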
NotA model v3.0 2025-02-27
Model for notehead and accidental detection. Trained with added zoom in/out augmentation. Compared to previous models, a new training dataset has been added: MuscimaSharp. TODO: add reference to MuscimaSharp
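Zoom in/out augmentation of this kind can be expressed as scale jitter during training. A hedged sketch using the Ultralytics `scale` hyperparameter; the base model and hyperparameter values are placeholders, not the settings actually used for this release:

```python
# Illustrative training call with scale (zoom in/out) augmentation enabled.
# Base model, dataset config and hyperparameter values are placeholders.
from ultralytics import YOLO

model = YOLO("yolo11x.pt")      # placeholder base model
model.train(
    data="config.yaml",         # dataset definition (images, labels, class names)
    epochs=100,                 # illustrative value
    imgsz=640,                  # illustrative value
    scale=0.5,                  # random zoom in/out by up to +/-50 %
)
```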
NotA v3.0 F1 scores over multiple IoU thresholds
ID \ IoU | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
---|---|---|---|---|---|---|---|---|---|---|---|
noteheadFull | 0.9892 | 0.9887 | 0.9842 | 0.9762 | 0.9570 | 0.9128 | 0.8381 | 0.7102 | 0.4527 | 0.1986 | 0.0470 |
noteheadHalf | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9758 | 0.9204 | 0.8235 | 0.4360 | 0.0277 |
accidentalFlat | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.2963 | 0.0000 | 0.0000 |
accidentalNatural | 0.9565 | 0.9565 | 0.9565 | 0.9130 | 0.9130 | 0.9130 | 0.8261 | 0.7391 | 0.4783 | 0.2609 | 0.0000 |
accidentalSharp | 0.9382 | 0.9382 | 0.9382 | 0.9309 | 0.9091 | 0.9018 | 0.8509 | 0.7491 | 0.6473 | 0.2545 | 0.0145 |
NotA model v2.0 2025-02-07
Model for notehead and accidental detection. Trained on the same data as the previous version, with added zoom in/out augmentation.
NotA v2.0 F1 scores over multiple IoU thresholds
ID \ IoU | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
---|---|---|---|---|---|---|---|---|---|---|---|
noteheadFull | 0.9842 | 0.9831 | 0.9803 | 0.9730 | 0.9623 | 0.9268 | 0.8558 | 0.7290 | 0.4738 | 0.1932 | 0.0400 |
noteheadHalf | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9758 | 0.9481 | 0.8720 | 0.4083 | 0.0554 |
accidentalFlat | 0.5517 | 0.5517 | 0.5517 | 0.5517 | 0.5517 | 0.5517 | 0.4828 | 0.4828 | 0.3448 | 0.2759 | 0.0690 |
accidentalNatural | 0.9787 | 0.9787 | 0.9362 | 0.8936 | 0.8936 | 0.8936 | 0.8085 | 0.7660 | 0.5957 | 0.1702 | 0.0000 |
accidentalSharp | 0.8971 | 0.8971 | 0.8897 | 0.8897 | 0.8897 | 0.8529 | 0.8382 | 0.7794 | 0.6324 | 0.2794 | 0.0074 |
NotA v0.1 F1 scores over multiple IoU thresholds
ID \ IoU | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
---|---|---|---|---|---|---|---|---|---|---|---|
noteheadFull | 0.9687 | 0.9687 | 0.9675 | 0.9631 | 0.9531 | 0.9326 | 0.8860 | 0.7695 | 0.5476 | 0.2424 | 0.0610 |
noteheadHalf | 0.9463 | 0.9463 | 0.9463 | 0.9463 | 0.9463 | 0.9463 | 0.9463 | 0.9396 | 0.8591 | 0.4698 | 0.0470 |
NotA model v0.1 2025-02-04
Model for notehead detection.
NotA evaluation dataset specs
The evaluation dataset contains eight randomly selected images.
Average number of annotations per page
ID | Name | Mean | Stddev |
---|---|---|---|
0 | noteheadFull | 220.50 | 139.19 |
1 | noteheadHalf | 18.00 | 19.71 |
2 | accidentalFlat | 1.00 | 2.14 |
3 | accidentalNatural | 2.88 | 3.09 |
4 | accidentalSharp | 18.00 | 18.02 |
Average number of annotations per size category per page
ID | Name | Mean | Stddev |
---|---|---|---|
0 | small | 225.50 | 141.51 |
1 | medium | 34.88 | 32.05 |
2 | large | 0.00 | 0.00 |
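Statistics like the per-class counts above can be recomputed from the YOLO label files of the evaluation set. A minimal sketch, assuming one labels/*.txt file per evaluation image and the class IDs listed in the first table (the sketch uses the population standard deviation; the original choice is not stated):

```python
# Per-class mean and standard deviation of annotation counts per page,
# computed from YOLO-format label files. Paths are placeholders.
import glob
import statistics
from collections import Counter

class_names = ["noteheadFull", "noteheadHalf", "accidentalFlat",
               "accidentalNatural", "accidentalSharp"]

label_files = sorted(glob.glob("labels/*.txt"))
counts = {name: [] for name in class_names}

for path in label_files:
    per_page = Counter()
    with open(path) as f:
        for line in f:
            per_page[int(line.split()[0])] += 1   # first column is the class ID
    for class_id, name in enumerate(class_names):
        counts[name].append(per_page.get(class_id, 0))

for name, values in counts.items():
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)             # population standard deviation
    print(f"{name}: mean={mean:.2f}, stddev={stdev:.2f}")
```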
Annotation center heat map
Annotation relative width and height heat map
OLA model v1.0 2024-10-10
The model file comes with a set of arguments that were used to create it.
- Model performance evaluation at release
- TODO: demo inference
OLA model v0.9 2024-08-28
The model file comes with a set of arguments that were used to create it.
- TODO: demo inference
Evaluation at release
We compare the YOLOv8m model with the Faster R-CNN model implemented in TensorFlow that was previously used as a measure detector by A. Pacha.
Three out-of-domain tests and one in-domain test (using a 90/10 train/test split) were performed to evaluate the models' performance. The results were measured with three metrics: recall, mAP50 and mAP50-95, calculated with the pycocotools Python library.
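A minimal sketch of how these metrics can be obtained with pycocotools, assuming the ground truth is available as gt.json in COCO format and the detections as predictions.json in the COCO results format (both file names are placeholders):

```python
# Evaluate detections with pycocotools (COCO bbox metrics).
# "gt.json" and "predictions.json" are placeholder file names.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("gt.json")                      # ground-truth annotations
coco_dt = coco_gt.loadRes("predictions.json")  # detections in COCO results format

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()

# evaluator.stats[0] is mAP@[.50:.95], evaluator.stats[1] is mAP@.50,
# evaluator.stats[8] is recall (AR) with up to 100 detections per image.
print("mAP50-95:", evaluator.stats[0])
print("mAP50:   ", evaluator.stats[1])
print("recall:  ", evaluator.stats[8])
```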
Out-of-domain evaluation
In each test, the model is trained on the datasets marked ✅ and evaluated on the dataset marked ❌.
id | AudioLabs v2 | Muscima++ | OSLiC | MZKBlank |
---|---|---|---|---|
1 | ✅ | ✅ | ❌ | ✅ |
2 | ❌ | ✅ | ✅ | ✅ |
3 | ✅ | ❌ | ✅ | ✅ |
In-domain evaluation
id | AudioLabs v2 | Muscima++ | OSLiC | MZKBlank |
---|---|---|---|---|
4 | ✅ | ✅ | ✅ | ✅ |
(90/10 train/test split)
Results
Test 1
Class | Instances | R-CNN Recall | R-CNN mAP50 | R-CNN mAP50-95 | YOLOv8m Recall | YOLOv8m mAP50 | YOLOv8m mAP50-95 |
---|---|---|---|---|---|---|---|
System measures | 72,028 | 0.620 | 0.727 | 0.507 | 0.679 | 0.554 | 0.571 |
Stave measures | 220,868 | 0.278 | 0.678 | 0.204 | 0.336 | 0.580 | 0.249 |
Staves | 55,038 | 0.355 | 0.921 | 0.295 | 0.430 | 0.829 | 0.334 |
Systems | 17,991 | 0.736 | 0.945 | 0.697 | 0.965 | 0.978 | 0.949 |
Grand staff | 17,959 | 0.744 | 0.982 | 0.701 | 0.815 | 0.901 | 0.792 |
All | 383,884 | 0.547 | 0.851 | 0.481 | 0.654 | 0.790 | 0.579 |
Test 2
Class | Instances | R-CNN Recall | R-CNN mAP50 | R-CNN mAP50-95 | YOLOv8m Recall | YOLOv8m mAP50 | YOLOv8m mAP50-95 |
---|---|---|---|---|---|---|---|
System measures | 24,186 | 0.864 | 0.989 | 0.827 | 0.804 | 0.934 | 0.770 |
Stave measures | 50,064 | 0.565 | 0.976 | 0.494 | 0.596 | 0.921 | 0.535 |
Staves | 11,143 | 0.581 | 0.939 | 0.511 | 0.643 | 0.939 | 0.584 |
Systems | 5,376 | 0.873 | 0.989 | 0.832 | 0.892 | 0.960 | 0.860 |
Grand staff | 5,375 | 0.763 | 0.973 | 0.699 | 0.893 | 0.960 | 0.859 |
All | 96,144 | 0.729 | 0.973 | 0.673 | 0.766 | 0.943 | 0.722 |
Test 3
Class | Instances | R-CNN Recall | R-CNN mAP50 | R-CNN mAP50-95 | YOLOv8m Recall | YOLOv8m mAP50 | YOLOv8m mAP50-95 |
---|---|---|---|---|---|---|---|
System measures | 2,888 | 0.217 | 0.256 | 0.140 | 0.166 | 0.153 | 0.123 |
Stave measures | 4,616 | 0.062 | 0.196 | 0.026 | 0.243 | 0.420 | 0.174 |
Staves | 883 | 0.045 | 0.061 | 0.008 | 0.416 | 0.723 | 0.329 |
Systems | 484 | 0.173 | 0.237 | 0.111 | 0.191 | 0.192 | 0.140 |
Grand staff | 94 | 0.369 | 0.393 | 0.164 | 0.889 | 0.758 | 0.747 |
All | 8,965 | 0.173 | 0.229 | 0.090 | 0.381 | 0.449 | 0.303 |
Test 4
Class | Instances | R-CNN Recall | R-CNN mAP50 | R-CNN mAP50-95 | YOLOv8m Recall | YOLOv8m mAP50 | YOLOv8m mAP50-95 |
---|---|---|---|---|---|---|---|
System measures | 9,151 | 0.962 | 0.989 | 0.943 | 0.980 | 0.987 | 0.975 |
Stave measures | 27,294 | 0.876 | 0.979 | 0.831 | 0.946 | 0.989 | 0.930 |
Staves | 6,816 | 0.888 | 0.980 | 0.854 | 0.900 | 0.989 | 0.888 |
Systems | 2,326 | 0.963 | 0.990 | 0.947 | 0.993 | 0.990 | 0.986 |
Grand staff | 2,285 | 0.949 | 0.996 | 0.931 | 0.996 | 1.000 | 0.993 |
All | 47,872 | 0.927 | 0.987 | 0.901 | 0.960 | 0.991 | 0.954 |
Full Changelog: Datasets...evaluation-release
Datasets at release
The final dataset is split into four logical parts:
- AudioLabs v2
- Muscima++
- OSLiC
- MZKBlank
Due to GitHub's restrictions on file size, the OSLiC dataset is split into two parts. OSLiC in COCO format keeps the same folder structure as the original dataset.
Quick Start
To train a YOLO model on the datasets, download all archives that are not tagged with COCO and combine them into one. When setting up the training, pass the config.yaml file as an argument to the script.
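A hedged sketch of such a training run using the Ultralytics API (not necessarily the repository's own training script); the base model and hyperparameters are illustrative:

```python
# Illustrative training run on the combined dataset archive.
# config.yaml is the dataset definition shipped with the archives;
# the base model and hyperparameter values are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8m.pt")      # the architecture used in the evaluation above
model.train(
    data="config.yaml",         # dataset config from the combined archive
    epochs=100,                 # illustrative value
    imgsz=640,                  # illustrative value
    batch=16,                   # illustrative value
)
```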
Dataset Overview
Dataset | images | system measures | stave measures | staves | systems | grand staves |
---|---|---|---|---|---|---|
AudioLabs v2 | 940 | 24 186 | 50 064 | 11 143 | 5 376 | 5 375 |
Muscima++ | 140 | 2 888 | 4 616 | 883 | 484 | 94 |
OSLiC | 4 927 | 72 028 | 220 868 | 55 038 | 17 991 | 17 959 |
MZKBlank | 1 006 | 0 | 0 | 0 | 0 | 0 |
total | 7 013 | 99 102 | 275 548 | 67 064 | 23 851 | 23 428 |
COCO format
zip/
img/ ... all images
json/ ... corresponding labels in COCO format
{
  "width": 3483,
  "height": 1693,
  "system_measures": [
    {
      "left": 211,
      "top": 726,
      "width": 701,
      "height": 120
    },
    ...
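A minimal sketch for reading one of these per-image JSON files; note that each box is given as left/top/width/height in pixels. The file path is a placeholder:

```python
# Read one per-image annotation file from the COCO-format archive and convert
# each box from left/top/width/height to (x1, y1, x2, y2) pixel corners.
# "json/example_page.json" is a placeholder path.
import json

with open("json/example_page.json") as f:
    page = json.load(f)

print("page size:", page["width"], "x", page["height"])

for box in page.get("system_measures", []):
    x1, y1 = box["left"], box["top"]
    x2, y2 = box["left"] + box["width"], box["top"] + box["height"]
    print("system measure:", (x1, y1, x2, y2))
```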
YOLO format
zip/
images/ ... all images
labels/ ... corresponding labels in YOLO format
Each *.txt file contains one row per object in the format `class x_center y_center width height`. Box coordinates are in normalized xywh format (values from 0 to 1).
0 0.163365 0.429003 0.205570 0.090634
0 0.328309 0.429003 0.112834 0.090634
0 0.462245 0.429003 0.138961 0.090634
0 0.598048 0.429003 0.124605 0.090634
0 0.741746 0.429003 0.150158 0.090634
0 0.889176 0.429003 0.136090 0.090634
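A minimal sketch that converts such normalized rows back to pixel-space corner coordinates, given the size of the corresponding image; the file path and image size are placeholders:

```python
# Convert YOLO-format label rows (class x_center y_center width height,
# all normalized to 0..1) into pixel-space corner boxes.
# "labels/example_page.txt" and the image size are placeholders.
img_w, img_h = 3483, 1693          # size of the corresponding image

with open("labels/example_page.txt") as f:
    for line in f:
        class_id, xc, yc, w, h = line.split()
        xc, w = float(xc) * img_w, float(w) * img_w
        yc, h = float(yc) * img_h, float(h) * img_h
        x1, y1 = xc - w / 2, yc - h / 2
        x2, y2 = xc + w / 2, yc + h / 2
        print(f"class {class_id}: ({x1:.0f}, {y1:.0f}) - ({x2:.0f}, {y2:.0f})")
```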