KeyError: 85 when training YOLO11n #562

@antoniodourado

Describe the Bug

I'm trying to train the YOLO11n model with Comet ML logging enabled. However, after the first epoch finishes, I get a "KeyError: 85", and the stack trace points to the Comet callback.

Expected behavior

Training should have continued to the next epoch.

Where is the issue?

  • Comet Python SDK
  • Comet UI
  • Third Party Integrations (Hugging Face, TensorboardX, PyTorch Lightning, etc.)

To Reproduce

Here is my Python code:

import comet_ml
from ultralytics import YOLO

comet_ml.login(project_name='yolo11code_teste1')

model = YOLO("yolo11n.pt")

results = model.train(
    data="coco.yaml",
    project="yolo11code_teste1",
    batch=16,
    save_period=1,
    save_json=True,
    epochs=100,
    imgsz=320,
)

Stack Trace

Ultralytics 8.3.59 🚀 Python-3.10.12 torch-2.0.0+cu117 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 5938MiB)
engine/trainer: task=detect, mode=train, model=yolo11n.pt, data=coco.yaml, epochs=100, time=None, patience=100, batch=16, imgsz=320, save=True, save_period=1, cache=False, device=None, workers=8, project=yolo11code_teste1, name=train14, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=True, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=None, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=yolo11code_teste1/train14

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      6640  ultralytics.nn.modules.block.C3k2            [32, 64, 1, False, 0.25]      
  3                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
  4                  -1  1     26080  ultralytics.nn.modules.block.C3k2            [64, 128, 1, False, 0.25]     
  5                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
  6                  -1  1     87040  ultralytics.nn.modules.block.C3k2            [128, 128, 1, True]           
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]              
  8                  -1  1    346112  ultralytics.nn.modules.block.C3k2            [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]                 
 10                  -1  1    249728  ultralytics.nn.modules.block.C2PSA           [256, 256, 1]                 
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 12             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 13                  -1  1    111296  ultralytics.nn.modules.block.C3k2            [384, 128, 1, False]          
 14                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 15             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 16                  -1  1     32096  ultralytics.nn.modules.block.C3k2            [256, 64, 1, False]           
 17                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
 18            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 19                  -1  1     86720  ultralytics.nn.modules.block.C3k2            [192, 128, 1, False]          
 20                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
 21            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 22                  -1  1    378880  ultralytics.nn.modules.block.C3k2            [384, 256, 1, True]           
 23        [16, 19, 22]  1    464912  ultralytics.nn.modules.head.Detect           [80, [64, 128, 256]]          
YOLO11n summary: 319 layers, 2,624,080 parameters, 2,624,064 gradients, 6.6 GFLOPs

Transferred 499/499 items from pretrained weights
COMET WARNING: To get all data logged automatically, import comet_ml before the following modules: torch.
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET INFO: Experiment is live on comet.com https://www.comet.com/antoniodourado/yolo11code-teste1/512d631dea2143288b7b45553c89faa2

COMET INFO: Couldn't find a Git repository in '/home/dourado/ml/yolo11code' nor in any parent directory. Set `COMET_GIT_DIRECTORY` if your Git Repository is elsewhere.
Freezing layer 'model.23.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed
train: Scanning /home/dourado/ml/datasets/coco/labels/train2017.cache... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 [00:00<?, ?it/s]
val: Scanning /home/dourado/ml/datasets/coco/labels/val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 [00:00<?, ?it/s]
Plotting labels to yolo11code_teste1/train14/labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
Image sizes 320 train, 320 val
Using 8 dataloader workers
Logging results to yolo11code_teste1/train14
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      1/100     0.856G      1.298      1.607       1.17        238        320: 100%|██████████| 7393/7393 [07:33<00:00, 16.31it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 157/157 [00:17<00:00,  8.78it/s]
                   all       5000      36335      0.538      0.332      0.345      0.232
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_186493/1161498898.py in <module>
----> 1 results = model.train(
      2     data="coco.yaml",
      3     project="yolo11code_teste1",
      4     batch=16,
      5     save_period=1,

~/.local/lib/python3.10/site-packages/ultralytics/engine/model.py in train(self, trainer, **kwargs)
    804 
    805         self.trainer.hub_session = self.session  # attach optional HUB session
--> 806         self.trainer.train()
    807         # Update model and cfg after training
    808         if RANK in {-1, 0}:

~/.local/lib/python3.10/site-packages/ultralytics/engine/trainer.py in train(self)
    205 
    206         else:
--> 207             self._do_train(world_size)
    208 
    209     def _setup_scheduler(self):

~/.local/lib/python3.10/site-packages/ultralytics/engine/trainer.py in _do_train(self, world_size)
    451                 self.scheduler.last_epoch = self.epoch  # do not move
    452                 self.stop |= epoch >= self.epochs  # stop if exceeded epochs
--> 453             self.run_callbacks("on_fit_epoch_end")
    454             self._clear_memory()
    455 

~/.local/lib/python3.10/site-packages/ultralytics/engine/trainer.py in run_callbacks(self, event)
    166         """Run all existing callbacks associated with a particular event."""
    167         for callback in self.callbacks.get(event, []):
--> 168             callback(self)
    169 
    170     def train(self):

~/.local/lib/python3.10/site-packages/ultralytics/utils/callbacks/comet.py in on_fit_epoch_end(trainer)
    358         _log_confusion_matrix(experiment, trainer, curr_step, curr_epoch)
    359     if _should_log_image_predictions():
--> 360         _log_image_predictions(experiment, trainer.validator, curr_step)
    361 
    362 

~/.local/lib/python3.10/site-packages/ultralytics/utils/callbacks/comet.py in _log_image_predictions(experiment, validator, curr_step)
    261 
    262             image_path = Path(image_path)
--> 263             annotations = _fetch_annotations(
    264                 img_idx,
    265                 image_path,

~/.local/lib/python3.10/site-packages/ultralytics/utils/callbacks/comet.py in _fetch_annotations(img_idx, image_path, batch, prediction_metadata_map, class_label_map)
    192         img_idx, image_path, batch, class_label_map
    193     )
--> 194     prediction_annotations = _format_prediction_annotations_for_detection(
    195         image_path, prediction_metadata_map, class_label_map
    196     )

~/.local/lib/python3.10/site-packages/ultralytics/utils/callbacks/comet.py in _format_prediction_annotations_for_detection(image_path, metadata, class_label_map)
    180         cls_label = prediction["category_id"]
    181         if class_label_map:
--> 182             cls_label = str(class_label_map[cls_label])
    183 
    184         data.append({"boxes": [boxes], "label": cls_label, "score": score})

KeyError: 85
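
For reference, the failing line is a plain dictionary lookup, `class_label_map[cls_label]`. The Detect head in the model summary reports 80 classes, so a label map keyed 0 to 79 would have no entry for 85. Where the 85 comes from is not established here. One possibility, which is an assumption and not something I have confirmed, is that `save_json=True` with the COCO dataset makes the predictions carry the sparse COCO category ids instead of contiguous class indices. Below is a minimal sketch of the lookup failure in isolation. The label map contents and both helper functions are hypothetical stand-ins, not Ultralytics or Comet code.

```python
# Hypothetical stand-in for the class_label_map seen in the traceback:
# 80 labels keyed by contiguous indices 0..79 (the names are placeholders).
class_label_map = {i: f"class_{i}" for i in range(80)}


def label_for(category_id, label_map):
    """Mirror the failing lookup: a plain dict index that raises on unknown keys."""
    return str(label_map[category_id])


def safe_label_for(category_id, label_map):
    """Defensive variant (an assumption, not the library's code): fall back to the raw id."""
    return str(label_map.get(category_id, category_id))


try:
    label_for(85, class_label_map)
except KeyError as err:
    # Same exception type and key as in the report.
    print(f"KeyError: {err}")

# The defensive variant returns the id as a string instead of raising.
print(safe_label_for(85, class_label_map))
```

If the assumption holds, the durable fix would be to convert the category id back to the model's class index before the lookup. The `.get` fallback only avoids the crash and would show a numeric id in place of the class name.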
