
Releases: v-dvorak/omr-layout-analysis

NotAX model v2.0 2025-02-09


Model for notehead and accidental detection. Trained with added zoom in/out augmentation. Uses the largest available YOLOv11 model (yolo11x); see the YOLO detection docs for reference.
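A minimal usage sketch for a release like this one, assuming the Ultralytics package; the weights and image file names are hypothetical:

```python
# Minimal sketch of running the released detector on a page image.
# Assumes the Ultralytics package; file names are hypothetical.
from ultralytics import YOLO

model = YOLO("notax_v2.pt")                    # released yolo11x-based weights (name is illustrative)
results = model.predict("page.png", imgsz=640)

for box in results[0].boxes:                   # one Results object per input image
    print(int(box.cls), float(box.conf), box.xyxy.tolist())
```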

NotAX v2.0 F1 scores over multiple IoU thresholds

| ID \ IoU | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| noteheadFull | 0.9873 | 0.9873 | 0.9850 | 0.9816 | 0.9720 | 0.9443 | 0.8759 | 0.7436 | 0.4648 | 0.1838 | 0.0498 |
| noteheadHalf | 0.9827 | 0.9827 | 0.9827 | 0.9827 | 0.9827 | 0.9827 | 0.9689 | 0.9550 | 0.8235 | 0.4221 | 0.0277 |
| accidentalFlat | 0.5161 | 0.5161 | 0.5161 | 0.5161 | 0.5161 | 0.5161 | 0.5161 | 0.4516 | 0.3226 | 0.1290 | 0.0000 |
| accidentalNatural | 0.8980 | 0.8980 | 0.8980 | 0.8571 | 0.8571 | 0.8163 | 0.8163 | 0.7347 | 0.6122 | 0.2449 | 0.0000 |
| accidentalSharp | 0.8996 | 0.8996 | 0.8922 | 0.8922 | 0.8848 | 0.8773 | 0.8625 | 0.8030 | 0.6989 | 0.3197 | 0.0149 |

[Figure: large model F1 scores]
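The table reports per-class F1 at a range of IoU thresholds. As an illustration of what such a number means, here is a minimal sketch of computing F1 for one class at one threshold with greedy box matching; the matching strategy is an assumption, not necessarily the one used by the evaluation script:

```python
# Sketch: per-class F1 at a single IoU threshold via greedy matching.
# Boxes are (x1, y1, x2, y2); the greedy strategy is an assumption.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def f1_at_iou(predictions, ground_truth, threshold):
    matched = set()
    tp = 0
    for p in predictions:                    # ideally iterated in descending confidence order
        best, best_iou = None, threshold
        for i, g in enumerate(ground_truth):
            if i in matched:
                continue
            overlap = iou(p, g)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(predictions) - tp
    fn = len(ground_truth) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

The last column (IoU 1.00) only counts pixel-perfect boxes, which is why the scores drop sharply there.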

NotA model v3.0 2025-02-27


Model for notehead and accidental detection. Trained with added zoom in/out augmentation. Compared to previous models, a new training dataset, MuscimaSharp, has been added. TODO: add reference to MuscimaSharp

| ID \ IoU | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| noteheadFull | 0.9892 | 0.9887 | 0.9842 | 0.9762 | 0.9570 | 0.9128 | 0.8381 | 0.7102 | 0.4527 | 0.1986 | 0.0470 |
| noteheadHalf | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9758 | 0.9204 | 0.8235 | 0.4360 | 0.0277 |
| accidentalFlat | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.5185 | 0.2963 | 0.0000 | 0.0000 |
| accidentalNatural | 0.9565 | 0.9565 | 0.9565 | 0.9130 | 0.9130 | 0.9130 | 0.8261 | 0.7391 | 0.4783 | 0.2609 | 0.0000 |
| accidentalSharp | 0.9382 | 0.9382 | 0.9382 | 0.9309 | 0.9091 | 0.9018 | 0.8509 | 0.7491 | 0.6473 | 0.2545 | 0.0145 |

[Figure: F1 scores with the new training dataset]

NotA model v2.0 2025-02-07


Model for notehead and accidental detection. Trained on the same data as the previous version with added zoom in/out augmentation.

NotA v2.0 F1 scores over multiple IoU thresholds

| ID \ IoU | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| noteheadFull | 0.9842 | 0.9831 | 0.9803 | 0.9730 | 0.9623 | 0.9268 | 0.8558 | 0.7290 | 0.4738 | 0.1932 | 0.0400 |
| noteheadHalf | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9896 | 0.9758 | 0.9481 | 0.8720 | 0.4083 | 0.0554 |
| accidentalFlat | 0.5517 | 0.5517 | 0.5517 | 0.5517 | 0.5517 | 0.5517 | 0.4828 | 0.4828 | 0.3448 | 0.2759 | 0.0690 |
| accidentalNatural | 0.9787 | 0.9787 | 0.9362 | 0.8936 | 0.8936 | 0.8936 | 0.8085 | 0.7660 | 0.5957 | 0.1702 | 0.0000 |
| accidentalSharp | 0.8971 | 0.8971 | 0.8897 | 0.8897 | 0.8897 | 0.8529 | 0.8382 | 0.7794 | 0.6324 | 0.2794 | 0.0074 |

NotA v0.1 F1 scores over multiple IoU thresholds

| ID \ IoU | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| noteheadFull | 0.9687 | 0.9687 | 0.9675 | 0.9631 | 0.9531 | 0.9326 | 0.8860 | 0.7695 | 0.5476 | 0.2424 | 0.0610 |
| noteheadHalf | 0.9463 | 0.9463 | 0.9463 | 0.9463 | 0.9463 | 0.9463 | 0.9463 | 0.9396 | 0.8591 | 0.4698 | 0.0470 |

NotA model v0.1 2025-02-04


Model for notehead detection.

NotA evaluation dataset specs


The evaluation dataset contains eight randomly selected images.

Average number of annotations per page

| ID | Name | Mean | Stddev |
|---|---|---|---|
| 0 | noteheadFull | 220.50 | 139.19 |
| 1 | noteheadHalf | 18.00 | 19.71 |
| 2 | accidentalFlat | 1.00 | 2.14 |
| 3 | accidentalNatural | 2.88 | 3.09 |
| 4 | accidentalSharp | 18.00 | 18.02 |

Average number of annotations per page by size category

| ID | Name | Mean | Stddev |
|---|---|---|---|
| 0 | small | 225.50 | 141.51 |
| 1 | medium | 34.88 | 32.05 |
| 2 | large | 0.00 | 0.00 |
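The small/medium/large split appears to follow COCO-style area categories. A minimal sketch of how such per-page counts could be computed, assuming the standard 32² and 96² pixel-area thresholds (an assumption; the actual thresholds may differ):

```python
# Sketch: per-page counts of small / medium / large annotations.
# The area thresholds follow the COCO convention and are an assumption.
from statistics import mean, stdev

def size_bucket(width, height):
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

def per_page_stats(pages):
    """pages: list of lists of (width, height) boxes, one list per image."""
    counts = {"small": [], "medium": [], "large": []}
    for boxes in pages:
        page = {"small": 0, "medium": 0, "large": 0}
        for w, h in boxes:
            page[size_bucket(w, h)] += 1
        for key, value in page.items():
            counts[key].append(value)
    return {k: (mean(v), stdev(v)) for k, v in counts.items()}
```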

Annotation center heat map

[Figure: annotation center heat map]

Annotation relative width and height heat map

[Figure: annotation relative width and height heat map]

OLA model v1.0 2024-10-10


The model file comes with a set of arguments that were used to create it.

OLA model v0.9-2024-08-28


The model file comes with a set of arguments that were used to create it.

  • TODO: demo inference

Evaluation at release


We compare the YOLOv8m model with the Faster R-CNN model (TensorFlow implementation) previously used as a measure detector by A. Pacha.

Three out-of-domain tests and one in-domain test (using a 90/10 train/test split) were performed to evaluate the models' performance. The results were measured with three metrics: recall, mAP50 and mAP50-95. We used the pycocotools Python library to calculate these metrics.
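A minimal sketch of how such metrics can be obtained with pycocotools; the ground-truth and detection file names are hypothetical:

```python
# Sketch of computing mAP50 / mAP50-95 / recall with pycocotools.
# File names are hypothetical placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground_truth.json")            # COCO-format ground truth
coco_dt = coco_gt.loadRes("detections.json")   # model detections

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()            # prints the standard AP / AR summary

map50_95 = evaluator.stats[0]    # mAP averaged over IoU 0.50:0.95
map50 = evaluator.stats[1]       # mAP at IoU 0.50
recall = evaluator.stats[8]      # AR with up to 100 detections, one possible recall figure
```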

Out-of-domain evaluation

| id | AudioLabs v2 | Muscima++ | OSLiC | MZKBlank |
|---|---|---|---|---|
| 1 | | | | |
| 2 | | | | |
| 3 | | | | |
In-domain evaluation

| id | AudioLabs v2 | Muscima++ | OSLiC | MZKBlank |
|---|---|---|---|---|
| 4 | | | | |

(90/10 train/test split)

Results

Test 1

| Class | Instances | Recall (Faster R-CNN) | mAP50 (Faster R-CNN) | mAP50-95 (Faster R-CNN) | Recall (YOLOv8m) | mAP50 (YOLOv8m) | mAP50-95 (YOLOv8m) |
|---|---|---|---|---|---|---|---|
| System measures | 72,028 | 0.620 | 0.727 | 0.507 | 0.679 | 0.554 | 0.571 |
| Stave measures | 220,868 | 0.278 | 0.678 | 0.204 | 0.336 | 0.580 | 0.249 |
| Staves | 55,038 | 0.355 | 0.921 | 0.295 | 0.430 | 0.829 | 0.334 |
| Systems | 17,991 | 0.736 | 0.945 | 0.697 | 0.965 | 0.978 | 0.949 |
| Grand staff | 17,959 | 0.744 | 0.982 | 0.701 | 0.815 | 0.901 | 0.792 |
| All | 383,884 | 0.547 | 0.851 | 0.481 | 0.654 | 0.790 | 0.579 |

Test 2

| Class | Instances | Recall (Faster R-CNN) | mAP50 (Faster R-CNN) | mAP50-95 (Faster R-CNN) | Recall (YOLOv8m) | mAP50 (YOLOv8m) | mAP50-95 (YOLOv8m) |
|---|---|---|---|---|---|---|---|
| System measures | 24,186 | 0.864 | 0.989 | 0.827 | 0.804 | 0.934 | 0.770 |
| Stave measures | 50,064 | 0.565 | 0.976 | 0.494 | 0.596 | 0.921 | 0.535 |
| Staves | 11,143 | 0.581 | 0.939 | 0.511 | 0.643 | 0.939 | 0.584 |
| Systems | 5,376 | 0.873 | 0.989 | 0.832 | 0.892 | 0.960 | 0.860 |
| Grand staff | 5,375 | 0.763 | 0.973 | 0.699 | 0.893 | 0.960 | 0.859 |
| All | 96,144 | 0.729 | 0.973 | 0.673 | 0.766 | 0.943 | 0.722 |

Test 3

| Class | Instances | Recall (Faster R-CNN) | mAP50 (Faster R-CNN) | mAP50-95 (Faster R-CNN) | Recall (YOLOv8m) | mAP50 (YOLOv8m) | mAP50-95 (YOLOv8m) |
|---|---|---|---|---|---|---|---|
| System measures | 2,888 | 0.217 | 0.256 | 0.140 | 0.166 | 0.153 | 0.123 |
| Stave measures | 4,616 | 0.062 | 0.196 | 0.026 | 0.243 | 0.420 | 0.174 |
| Staves | 883 | 0.045 | 0.061 | 0.008 | 0.416 | 0.723 | 0.329 |
| Systems | 484 | 0.173 | 0.237 | 0.111 | 0.191 | 0.192 | 0.140 |
| Grand staff | 94 | 0.369 | 0.393 | 0.164 | 0.889 | 0.758 | 0.747 |
| All | 8,965 | 0.173 | 0.229 | 0.090 | 0.381 | 0.449 | 0.303 |

Test 4

| Class | Instances | Recall (Faster R-CNN) | mAP50 (Faster R-CNN) | mAP50-95 (Faster R-CNN) | Recall (YOLOv8m) | mAP50 (YOLOv8m) | mAP50-95 (YOLOv8m) |
|---|---|---|---|---|---|---|---|
| System measures | 9,151 | 0.962 | 0.989 | 0.943 | 0.980 | 0.987 | 0.975 |
| Stave measures | 27,294 | 0.876 | 0.979 | 0.831 | 0.946 | 0.989 | 0.930 |
| Staves | 6,816 | 0.888 | 0.980 | 0.854 | 0.900 | 0.989 | 0.888 |
| Systems | 2,326 | 0.963 | 0.990 | 0.947 | 0.993 | 0.990 | 0.986 |
| Grand staff | 2,285 | 0.949 | 0.996 | 0.931 | 0.996 | 1.000 | 0.993 |
| All | 47,872 | 0.927 | 0.987 | 0.901 | 0.960 | 0.991 | 0.954 |

Full Changelog: Datasets...evaluation-release

Datasets at release


The final dataset is split into four logical parts:

  • AudioLabs v2
  • Muscima++
  • OSLiC
  • MZKBlank

Due to GitHub's restrictions on file size, the OSLiC dataset is split into two parts. OSLiC in COCO format keeps the same folder structure as the original dataset.

Quick Start

To train a YOLO model on the datasets, download all archives that are not tagged with COCO and combine them into one. When setting up the training, pass the config.yaml file as an argument to the script.
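A minimal training sketch, assuming the Ultralytics package is used; the weights file and hyperparameters are illustrative, only data=config.yaml comes from this release:

```python
# Sketch: train on the combined dataset described by config.yaml.
# Assumes the Ultralytics package; weights and hyperparameters are illustrative.
from ultralytics import YOLO

model = YOLO("yolov8m.pt")
model.train(
    data="config.yaml",   # the config.yaml shipped with the combined (non-COCO) archives
    epochs=100,
    imgsz=640,
)
metrics = model.val()     # evaluate on the validation split defined in config.yaml
```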

Dataset Overview

| Dataset | images | system measures | stave measures | staves | systems | grand staves |
|---|---|---|---|---|---|---|
| AudioLabs v2 | 940 | 24 186 | 50 064 | 11 143 | 5 376 | 5 375 |
| Muscima++ | 140 | 2 888 | 4 616 | 883 | 484 | 94 |
| OSLiC | 4 927 | 72 028 | 220 868 | 55 038 | 17 991 | 17 959 |
| MZKBlank | 1 006 | 0 | 0 | 0 | 0 | 0 |
| total | 7 013 | 99 102 | 275 548 | 67 064 | 23 851 | 23 428 |

COCO format

zip/
    img/    ... all images
    json/   ... corresponding labels in COCO format
{
 "width": 3483,
 "height": 1693,
 "system_measures": [
  {
   "left": 211,
   "top": 726,
   "width": 701,
   "height": 120
  },
...
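The snippet above stores absolute left/top/width/height boxes together with the page size. A minimal sketch of converting one such box to the normalized YOLO format described below; the class id assigned here is illustrative:

```python
# Sketch: convert one annotation from the JSON layout above (absolute
# left / top / width / height) to a normalized YOLO label line.
# The class id is an illustrative assumption.

def coco_box_to_yolo_line(box, page_width, page_height, class_id=0):
    x_center = (box["left"] + box["width"] / 2) / page_width
    y_center = (box["top"] + box["height"] / 2) / page_height
    w = box["width"] / page_width
    h = box["height"] / page_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}"

box = {"left": 211, "top": 726, "width": 701, "height": 120}
print(coco_box_to_yolo_line(box, page_width=3483, page_height=1693))
```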

YOLO format

zip/
    images/    ... all images
    labels/    ... corresponding labels in YOLO format

Each *.txt file contains one row per object in the format class x_center y_center width height. Box coordinates must be in normalized xywh format (from 0 to 1).

0	0.163365	0.429003	0.205570	0.090634
0	0.328309	0.429003	0.112834	0.090634
0	0.462245	0.429003	0.138961	0.090634
0	0.598048	0.429003	0.124605	0.090634
0	0.741746	0.429003	0.150158	0.090634
0	0.889176	0.429003	0.136090	0.090634
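A minimal sketch of reading such a label file back and converting the normalized boxes to pixel corner coordinates; the image size would be taken from the matching image:

```python
# Sketch: parse a YOLO label file and denormalize the boxes to pixel
# corner coordinates (x1, y1, x2, y2).

def read_yolo_labels(path, img_width, img_height):
    boxes = []
    with open(path) as f:
        for line in f:
            class_id, xc, yc, w, h = line.split()
            xc, w = float(xc) * img_width, float(w) * img_width
            yc, h = float(yc) * img_height, float(h) * img_height
            boxes.append((int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2))
    return boxes
```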