Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dataset names from array to dictionary #9000

Merged
merged 7 commits into from
Aug 17, 2022

Conversation

glenn-jocher
Copy link
Member

@glenn-jocher glenn-jocher commented Aug 17, 2022

Created with

    from utils.general import yaml_load
    for f in list(Path('data').glob('*.yaml')):
        d = yaml_load(f)
        d['names'] = {i: v for i, v in enumerate(d['names'])}
        yaml_save(f.with_stem(f.stem + '_new'), d)

@AyushExel @kalenmike implements names dict in yamls as discussed.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Refinement of class representation and minor code adjustments in Ultralytics YOLOv5.

📊 Key Changes

  • 🔢 YAML files updated from list to dictionary format for class names.
  • 📝 Code modified to handle the YAML changes and ensure compatibility with dictionary format.
  • 🧹 Small tweak in predict.py to convert tensor to list when logging.

🎯 Purpose & Impact

  • 🧑‍💻 Helps developers with more flexible and accessible class indexing.
  • 📈 Ensures that downstream code correctly interprets the new class name format.
  • 🛠 Facilitates potential future extensions to class handling.

These changes create a more intuitive way of handling class names and data, positively impacting maintainability and scalability of the model configurations.

@AyushExel
Copy link
Contributor

Just to confirm, this will be backwards compatible right?

@glenn-jocher
Copy link
Member Author

@AyushExel yeah that's the idea. All data yamls pass through check_dataset(), so I'll add some logic to check for isinstance(names, list) and convert to dict.

@glenn-jocher glenn-jocher changed the title Migrate dataset names to dictionary Migrate dataset names from array to dictionary Aug 17, 2022
@glenn-jocher glenn-jocher changed the title Migrate dataset names from array to dictionary Update dataset names from array to dictionary Aug 17, 2022
@glenn-jocher glenn-jocher merged commit e83b422 into master Aug 17, 2022
@glenn-jocher glenn-jocher deleted the update/data_names_dict branch August 17, 2022 15:52
@glenn-jocher
Copy link
Member Author

@AyushExel @kalenmike PR is merged, backwards compatible with older data yamls and should output the same results for HUB dataset stats.

Let me know if you run into any problems.

@austin1howard
Copy link

FYI @glenn-jocher this doesn't break loading/training, but because of the change at https://github.com/ultralytics/yolov5/pull/9000/files#diff-cfb1ff087a99a34369673c9f34bdcd22f2d429ab3599a89f386c5de1fd9a2566R452 (which sets model.names to the dictionary now) any code (like ours) that was using model.names breaks now. In our case, because we were iterating over it, our iteration went from class1, class2, class3 to 0,1,2.

ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022
* Migrate dataset names to dictionary

* fix check

* backwards compat

* predict fix

* val fix

* Keep dataset stats behavior identical

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@glenn-jocher
Copy link
Member Author

@austin1howard thanks for bringing this to our attention. I've made a note to check for this scenario and will make sure to address it in the next update. We aim to maintain compatibility and apologize for any inconvenience caused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants