Improve performance of dataset Logger #2943
Merged
This PR removes unnecessary operations performed on labels before logging, which should speed things up for large datasets.
@glenn-jocher there's still a bug in this that I couldn't explain properly over email; let me try again in today's meeting.
Before the images are logged, the dataset directory is registered as artifacts, so we need to log the images from the registered path to make sure they don't get duplicated.
The bug I'm seeing is that this logging code works perfectly when the dataset consists entirely of square images (case 1), but the boxes are misplaced when the images are raw/unaugmented (case 2).
See the example output: case 1 is the augmented dataset, case 2 is the raw dataset.
[I have marked the exact logging code L245-L255 in wandb_utils.py]
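To illustrate how a square/non-square difference like this can arise, here is a minimal sketch. It assumes (this is a guess, not a confirmed diagnosis of the bug) that the box coordinates end up expressed relative to a padded square canvas while the logged image is the raw file. The function name and the centered-padding scheme are hypothetical and are not taken from wandb_utils.py.

```python
def box_on_padded_canvas(xc, yc, w, h, img_w, img_h, canvas):
    """Re-express a normalized xywh box relative to a square canvas.

    Assumption: the image is resized so its longer side equals `canvas`
    and is then centered with equal padding on the shorter side.
    """
    scale = canvas / max(img_w, img_h)
    new_w, new_h = img_w * scale, img_h * scale
    pad_x = (canvas - new_w) / 2
    pad_y = (canvas - new_h) / 2
    return (
        (xc * new_w + pad_x) / canvas,
        (yc * new_h + pad_y) / canvas,
        w * new_w / canvas,
        h * new_h / canvas,
    )


# Square image: padding is zero, so the coordinates are unchanged.
square = box_on_padded_canvas(0.5, 0.25, 0.2, 0.2, 640, 640, 640)

# 640x480 image: vertical padding shifts yc and shrinks h, so drawing
# these values on the raw, unpadded image would misplace the box.
raw = box_on_padded_canvas(0.5, 0.25, 0.2, 0.2, 640, 480, 640)
```

Under these assumptions, only non-square images show an offset, which would match the case 1 / case 2 behaviour described above.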
To reproduce:
python utils/wandb_logging/log_dataset.py
This logs the raw coco128 dataset as a W&B Table with slightly misplaced bboxes.

🛠️ PR Summary
Made with ❤️ by Ultralytics Actions

🌟 Summary
Improvement in Weights & Biases (wandb) dataset artifact logging in the YOLOv5 repository.

📊 Key Changes
[…] LoadImagesAndLabels calls to use rect=True and batch_size=1 when creating dataset artifacts for training and validation sets.
[…] scores and domain.

🎯 Purpose & Impact
[…] wandb more efficient and the bounding box data more intuitive.
[…] wandb interface.
[…] wandb will find it simpler to review and understand their object detection dataset metrics due to these modifications. 🚀
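For readers unfamiliar with the W&B bounding-box layout that this kind of logging targets, the following is a minimal sketch of converting YOLO-style normalized labels into the dictionary structure accepted by the boxes argument of wandb.Image. The helper names are hypothetical and this is not the code from this PR. It builds plain dictionaries only, so it runs without wandb installed.

```python
def yolo_label_to_wandb_box(label, class_names):
    """Convert one (cls, xc, yc, w, h) label with normalized coordinates.

    W&B treats coordinates as fractions of the image size by default;
    pixel coordinates would additionally need "domain": "pixel".
    """
    cls, xc, yc, w, h = label
    cls = int(cls)
    return {
        "position": {"middle": [xc, yc], "width": w, "height": h},
        "class_id": cls,
        "box_caption": class_names[cls],
    }


def build_boxes(labels, class_names):
    """Wrap converted labels under a (hypothetical) 'ground_truth' key."""
    return {
        "ground_truth": {
            "box_data": [yolo_label_to_wandb_box(l, class_names) for l in labels],
            "class_labels": class_names,
        }
    }


names = {0: "person", 1: "bicycle"}
boxes = build_boxes([[0, 0.5, 0.5, 0.2, 0.4], [1, 0.3, 0.6, 0.1, 0.1]], names)
# With wandb installed, this could be passed as wandb.Image(path, boxes=boxes).
```

Keeping the coordinates fractional means the boxes do not depend on the displayed image size, provided they are normalized against the same image that is logged.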