
Update selectable device Profile #12353

Merged: 2 commits into ultralytics:master on Jan 3, 2024

Conversation

@Yakuho (Contributor) commented on Nov 9, 2023

When a model runs inference on cuda:1, cuda:2, or any device other than cuda:0, Profile permanently occupies GPU memory on cuda:0. The PyTorch documentation shows that torch.cuda.synchronize accepts a device argument, so it presumably falls back to cuda:0 when that parameter is None.

After this change, GPU memory is no longer held on cuda:0 indefinitely.

The code that uses Profile has been updated, but two call sites (export.py #L124, models/common.py #L680) were left unmodified because I was unsure about them. I expect they could also be given the model.device parameter.

import contextlib
import time

import torch


class Profile(contextlib.ContextDecorator):
    # YOLOv5 Profile class. Usage: @Profile() decorator or 'with Profile():' context manager
    def __init__(self, t=0.0, device: torch.device = None):
        self.t = t
        self.device = device
        self.cuda = bool(device and str(device).startswith("cuda"))

    def __enter__(self):
        self.start = self.time()
        return self

    def __exit__(self, type, value, traceback):
        self.dt = self.time() - self.start  # delta-time
        self.t += self.dt  # accumulate dt

    def time(self):
        # Synchronize the selected CUDA device (not implicitly cuda:0) before reading the clock
        if self.cuda:
            torch.cuda.synchronize(self.device)
        return time.time()

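For readers trying the change locally, a minimal usage sketch is shown below (not part of the diff; the model, im, and cuda:1 names are placeholder assumptions). Passing a non-default CUDA device makes the synchronization, and therefore the timing, target the correct GPU:

device = torch.device("cuda:1")  # hypothetical: the model lives on the second GPU
model = model.to(device)         # placeholder model, not from this PR

with Profile(device=device) as dt:  # synchronizes cuda:1 rather than cuda:0
    pred = model(im)                # placeholder forward pass being timed
print(f"inference: {dt.dt * 1000:.1f} ms (accumulated: {dt.t * 1000:.1f} ms)")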
🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

This PR enhances profiling by incorporating device specificity in performance measurements.

📊 Key Changes

  • Profile class now accepts an optional device argument to enable device-specific synchronization.
  • All instances of the Profile class are now instantiated with the device parameter.

🎯 Purpose & Impact

  • 🎯 Purpose: To provide more accurate performance profiling depending on whether the code runs on CPU or a specific GPU.
  • 💡 Impact: Users can expect more precise measurement of code execution times, especially when using multiple GPUs, leading to better debugging and optimization of model inference and validation.
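To illustrate why the device argument matters, here is a sketch under the assumption that the process's current CUDA device is cuda:0 while the model runs on cuda:1 (all names are placeholders, not from the diff):

t0 = time.time()
y = model(im)  # kernels on cuda:1 are launched asynchronously

# Calling torch.cuda.synchronize() with no argument waits on the *current* device
# (cuda:0 here) and presumably initializes a CUDA context there -- the stray
# cuda:0 memory usage reported above -- while cuda:1 kernels may still be running
# when the clock is read. Passing the device waits on the GPU actually doing the work.
torch.cuda.synchronize(torch.device("cuda:1"))
t1 = time.time()
print(f"forward pass: {(t1 - t0) * 1000:.1f} ms")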

@github-actions bot (Contributor) left a comment

👋 Hello @Yakuho, thank you for submitting a YOLOv5 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with ultralytics/yolov5 master branch. If your PR is behind you can update your code by clicking the 'Update branch' button or by running git pull and git merge master locally.
  • ✅ Verify all YOLOv5 Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." — Bruce Lee

@glenn-jocher (Member) commented

@Yakuho thanks for your contribution! It's great to see your experiment and the changes you've made using torch.cuda.synchronize so that Profile no longer holds cuda:0 GPU memory indefinitely. Your proposed adjustments to export.py #L124 and models/common.py #L680 appear sensible; passing model.device there should resolve the remaining uncertainty and ensure the GPU actually running the model is the one being synchronized. Your continued contributions are highly appreciated!
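For anyone following up on those two call sites, the adjustment would presumably look something like the sketch below (hypothetical, not part of this PR's diff), replacing the bare Profile() construction with a device-aware one using the model's device attribute:

# before: synchronization implicitly targets the current CUDA device
dt = Profile(), Profile(), Profile()

# after (hypothetical): each timer synchronizes the device the model actually runs on
dt = Profile(device=model.device), Profile(device=model.device), Profile(device=model.device)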

@glenn-jocher merged commit 69b0faf into ultralytics:master on Jan 3, 2024
7 checks passed
pleb631 pushed a commit to pleb631/yolov5 that referenced this pull request Jan 6, 2024
* Update selectable device Profile

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>