Question - Training includes all images instead of only annotated ones (marked as done) + data.yaml path ignored #1340

@H1ghSyst3m

Search before asking

  • I have searched the X-AnyLabeling Docs and issues and found no similar questions.

Question

Hi,

I have two related questions about the built-in Ultralytics trainer:

1. Training uses all images, not just annotated ones

I have a dataset of ~600 images, of which 115 are annotated (visible in the Data tab as "115 Labels in Total" and marked as done in the bottom-right corner). However, after training I noticed that all 600 images were included in the internal dataset generated under xanylabeling_data/trainer/ultralytics/datasets/detect/.

Is there a parameter, option, or workaround to make the trainer only include annotated images? My goal is to train a custom model on the 115 annotated images and use it for auto-labeling the remaining ones.
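One possible workaround, sketched here only as an illustration: build a separate folder that contains just the annotated images before training. The sketch assumes a YOLO-style export in which each image has a same-named `.txt` label file, and treats a missing or empty label file as "not annotated". The function name `copy_annotated_only` and the `images`/`labels` output layout are hypothetical and not part of X-AnyLabeling; whether the built-in trainer would then use only this folder is not confirmed.

```python
import shutil
from pathlib import Path

# Image extensions considered; adjust to match the dataset (assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def copy_annotated_only(images_dir, labels_dir, out_dir):
    """Copy only images that have a non-empty YOLO .txt label file.

    Hypothetical helper: returns the sorted list of image file names kept.
    """
    images_dir, labels_dir, out_dir = Path(images_dir), Path(labels_dir), Path(out_dir)
    out_images = out_dir / "images"
    out_labels = out_dir / "labels"
    out_images.mkdir(parents=True, exist_ok=True)
    out_labels.mkdir(parents=True, exist_ok=True)

    kept = []
    for image in sorted(images_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = labels_dir / (image.stem + ".txt")
        # Skip images with no label file or an empty one (treated as unannotated).
        if not label.is_file() or not label.read_text().strip():
            continue
        shutil.copy2(image, out_images / image.name)
        shutil.copy2(label, out_labels / label.name)
        kept.append(image.name)
    return kept
```

With a dataset like the one described, calling `copy_annotated_only("all_images", "exported_labels", "annotated_subset")` (placeholder paths) should leave only the labeled image/label pairs in `annotated_subset`.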

2. What should the data.yaml contain / what is actually used?

In the Config tab, the Data field requires a data.yaml file. I exported my dataset using "Export YOLO-HBB Annotations" with "Skip empty labels" enabled, and created a data.yaml pointing to the exported folder. However, the trainer ignores the path, train, and val fields and generates its own internal dataset instead.
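For context, a `data.yaml` of the kind described above usually follows the standard Ultralytics dataset layout, with `path`, `train`, `val`, `nc`, and `names` keys. The sketch below writes such a file; the helper name `write_data_yaml`, the `images/train` and `images/val` subfolders, and the example class names are placeholders, not values taken from this dataset. Which of these keys the built-in trainer actually reads is exactly what this question is asking.

```python
from pathlib import Path


def write_data_yaml(dataset_root, class_names, yaml_path):
    """Write a minimal Ultralytics-style dataset YAML and return its text.

    Hypothetical helper; assumes plain class names without YAML special characters.
    """
    lines = [
        f"path: {Path(dataset_root).as_posix()}  # dataset root (placeholder)",
        "train: images/train  # relative to path (assumed layout)",
        "val: images/val  # relative to path (assumed layout)",
        f"nc: {len(class_names)}",
        "names:",
    ]
    # One "index: name" entry per class, in order.
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    text = "\n".join(lines) + "\n"
    Path(yaml_path).write_text(text, encoding="utf-8")
    return text
```

For example, `write_data_yaml("/data/my_dataset", ["cat", "dog"], "data.yaml")` would produce a file with `nc: 2` and two entries under `names`.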

What is the data.yaml actually used for in this context? Are only nc and names relevant? And what is the recommended workflow here?

Thanks!

Additional

No response

Labels: question
