ClearML experiment tracking integration #8620

Merged
62 commits
- ec7986a Add titles to matplotlib plots (thepycoder, Jul 6, 2022)
- 11da722 Add ClearML Experiment Tracking integration. (thepycoder, Jul 6, 2022)
- d395a28 Add ClearML Data Version Management automatic download when requested (thepycoder, Jul 6, 2022)
- a160dfc Add ClearML Hyperparameter Optimization (thepycoder, Jul 6, 2022)
- 16a1a48 ClearML save period integration (thepycoder, Jul 12, 2022)
- 8e957c9 Fix wandb breaking when used with ClearML dataset (thepycoder, Jul 12, 2022)
- 027ca12 Fix wandb breaking when used with ClearML resume and dataset (thepycoder, Jul 12, 2022)
- a5ae4bb Add ClearML documentation (thepycoder, Jul 7, 2022)
- 29a2686 fixed small bug in clearml integration that misreports epoch number (thepycoder, Jul 11, 2022)
- be45d1b Final ClearMl additions before refactor (thepycoder, Jul 12, 2022)
- bd20628 Add correct epoch reporting (thepycoder, Jul 12, 2022)
- c69d56f Add remote execution and autoscaling docs for ClearML integration (thepycoder, Jul 12, 2022)
- 358354d Added images to clearml integration docs (thepycoder, Jul 12, 2022)
- fd0b10d fixed logo alignment bug and added hpo screenshot clearml (thepycoder, Jul 12, 2022)
- 1cbe74b Fixed small epoch number bug in clearml integration (thepycoder, Jul 12, 2022)
- 51f051d Remove saved model flush clearml (thepycoder, Jul 12, 2022)
- 2557ace Cleanup clearml readme section (thepycoder, Jul 12, 2022)
- 3c9403b Cleaned up clearml logger docstring (thepycoder, Jul 12, 2022)
- 85ac912 Remove resume readme section clearml (thepycoder, Jul 12, 2022)
- a9bb3be Clearml integration cleanup (thepycoder, Jul 12, 2022)
- a29eb1c Updated ClearML documentation (thepycoder, Jul 14, 2022)
- 806c22a Added dark vs light icons ClearML Readme (thepycoder, Jul 14, 2022)
- 194cf62 Clearml Readme styling (thepycoder, Jul 14, 2022)
- 421eb19 Add better gifs (thepycoder, Jul 14, 2022)
- 36ce901 Fixed gif file size (thepycoder, Jul 14, 2022)
- aa36080 Add better images in tutorial notebook (thepycoder, Jul 14, 2022)
- ba99667 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jul 18, 2022)
- c77575f Merge branch 'master' into feature/clearml_integration_no_resume (glenn-jocher, Jul 30, 2022)
- 359d4fa Merge branch 'master' into feature/clearml_integration_no_resume (glenn-jocher, Jul 30, 2022)
- 02bada8 Addressed comments in PR #8620 (thepycoder, Aug 1, 2022)
- eda35ea Merge branch 'feature/clearml_integration_no_resume' of github.com:th… (thepycoder, Aug 1, 2022)
- bae0c2f Fixed circular import (thepycoder, Aug 1, 2022)
- c3c394f Fixed circular import (thepycoder, Aug 1, 2022)
- c370935 Update tutorial.ipynb (glenn-jocher, Aug 1, 2022)
- a770823 Update tutorial.ipynb (glenn-jocher, Aug 1, 2022)
- 8561281 Merge branch 'master' into feature/clearml_integration_no_resume (glenn-jocher, Aug 1, 2022)
- d4bbc92 Inline comment (glenn-jocher, Aug 1, 2022)
- 5d31d2d Merge branch 'master' into feature/clearml_integration_no_resume (glenn-jocher, Aug 1, 2022)
- 40a9616 Merge branch 'master' into feature/clearml_integration_no_resume (glenn-jocher, Aug 2, 2022)
- 0bf8969 Restructured tutorial notebook (thepycoder, Aug 3, 2022)
- c49f827 Add correct ClearML link to README (thepycoder, Aug 3, 2022)
- efa3d09 Merge branch 'master' into feature/clearml_integration_no_resume (glenn-jocher, Aug 3, 2022)
- 28fedbd Merge branch 'master' into feature/clearml_integration_no_resume (glenn-jocher, Aug 4, 2022)
- 0c10d66 Update tutorial.ipynb (glenn-jocher, Aug 4, 2022)
- 424b524 Merge branch 'master' into feature/clearml_integration_no_resume (glenn-jocher, Aug 5, 2022)
- 7bde104 Update general.py (glenn-jocher, Aug 5, 2022)
- 92657cb Update __init__.py (glenn-jocher, Aug 5, 2022)
- a98dc0f Update __init__.py (glenn-jocher, Aug 5, 2022)
- 9c0eab3 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Aug 5, 2022)
- fd0cff6 Update __init__.py (glenn-jocher, Aug 5, 2022)
- c97a2ad [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Aug 5, 2022)
- 8bbf04e Update __init__.py (glenn-jocher, Aug 5, 2022)
- 0930d59 Update README.md (glenn-jocher, Aug 5, 2022)
- 68f9a9c Update __init__.py (glenn-jocher, Aug 5, 2022)
- b618614 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Aug 5, 2022)
- 918d861 spelling (glenn-jocher, Aug 5, 2022)
- 092fe34 Update tutorial.ipynb (glenn-jocher, Aug 5, 2022)
- e6eca5f notebook cutt.ly links (glenn-jocher, Aug 5, 2022)
- b45cac4 Update README.md (glenn-jocher, Aug 5, 2022)
- 40d3b95 Update README.md (glenn-jocher, Aug 5, 2022)
- 5ed6620 cutt.ly links in tutorial (glenn-jocher, Aug 5, 2022)
- 00eda91 Removed labels as they show up on last subplot only (glenn-jocher, Aug 5, 2022)
Add remote execution and autoscaling docs for ClearML integration
thepycoder committed Jul 18, 2022
commit c69d56f5139e1342030d4bd39b53c73ae4305342
tutorial.ipynb: 28 changes (25 additions, 3 deletions)
@@ -575,9 +575,9 @@
"\n",
"[ClearML](https://clear.ml) is completely integrated into YOLOv5 to track your experimentation, manage dataset versions and even remotely execute training runs.\n",
"\n",
"To enable ClearML:\n",
"To enable ClearML (check the cells above):\n",
"- `pip install clearml`\n",
"- run `clearml-init` to connect to a ClearML server (**deploy your own open-source server [here]()**, or use our free hosted server [here]())\n",
"- run `clearml-init` to connect to a ClearML server (**deploy your own open-source server [here](https://github.com/allegroai/clearml-server)**, or use our free hosted server [here](https://app.clear.ml))\n",
"\n",
"You'll get all the features you'd expect from an experiment manager (live updates, model upload, experiment comparison, etc.), but ClearML also tracks things like uncommitted changes and installed packages.\n",
"Thanks to that, ClearML Tasks (which is what we call experiments) are also reproducible on different machines! With only one extra line, we can schedule a YOLOv5 training task on a queue to be executed by any number of ClearML Agents (workers).\n",
@@ -587,7 +587,29 @@
"\n",
"#### Data Versioning\n",
"\n",
"#### Resumable runs"
"<img src=\"https://github.com/thepycoder/clearml_screenshots/raw/main/dataset_version.png\" alt=\"Data versioning UI\" width=\"1280\"/>\n",
"\n",
"#### Resumable runs\n",
"\n",
"### ClearML Agents for remote execution\n",
"If you want to spin up some queues and agents (ClearML workers) yourself to remotely execute the training process of this repository, head over to our resources on the topic:\n",
"\n",
"- [YouTube video](https://youtu.be/MX3BrXnaULs)\n",
"- [Documentation](https://clear.ml/docs/latest/docs/clearml_agent)\n",
"- [Example code](https://clear.ml/docs/latest/docs/guides/advanced/execute_remotely)\n",
"\n",
"In short: every experiment tracked by the experiment manager contains enough information to reproduce it on a different machine (installed packages, uncommitted changes, etc.). A ClearML agent does exactly that: it listens to a queue for incoming tasks and, when it finds one, recreates the environment and runs the task while still reporting scalars, plots, etc. to the experiment manager.\n",
"\n",
"You can turn any machine (a cloud VM, a local GPU machine, your own laptop, ...) into a ClearML agent by simply running:\n",
"```\n",
"clearml-agent daemon --queue <queues_to_listen_to> [--docker]\n",
"```\n",
"Now you can clone a task as explained above, or simply mark your current script by adding `task.execute_remotely()`: on execution, it will be put into a queue for the agent to start working on!\n",
"\n",
"### Autoscaling workers\n",
"ClearML comes with autoscalers too! This tool automatically spins up new remote machines in the cloud of your choice (AWS, GCP, Azure) and turns them into ClearML agents for you whenever experiments are detected in the queue. Once the tasks are processed, the autoscaler automatically shuts down the remote machines, so you stop paying!\n",
"\n",
"Check out the autoscalers [here](https://youtu.be/j4XVMAaUt3E)."
]
},
{
utils/loggers/clearml/README.md: 28 changes (27 additions, 1 deletion)
@@ -1,6 +1,6 @@
![ClearML scalars dashboard](https://github.com/thepycoder/clearml_screenshots/raw/main/experiment_manager.gif)
# ClearML Integration


## About ClearML
ClearML is an [open-source](https://github.com/allegroai/clearml) toolbox designed to save you time. It features (click on the arrow for screenshots):

@@ -25,6 +25,8 @@ ClearML is an [open-source](https://github.com/allegroai/clearml) toolbox design

And so much more. It's up to you how many of these tools you want to use: you can stick to the experiment manager, or chain them all together into an impressive pipeline!

![ClearML scalars dashboard](https://github.com/thepycoder/clearml_screenshots/raw/main/experiment_manager.gif)


## 🦾 Setting things up
To keep track of your experiments and/or data, ClearML needs to communicate with a server. You have 2 options to get one:
@@ -67,9 +69,33 @@ So we can actually clone a task by right clicking it and it will be set to draft

PS: if you want to change the `project_name` or `task_name`, head over to our custom logger in `utils/loggers/clearml/clearml_utils.py`, where you can change them :)

![Experiment Management Interface](https://github.com/thepycoder/clearml_screenshots/raw/main/scalars.png)

### ClearML Agents for remote execution
If you want to spin up some queues and agents (ClearML workers) yourself to remotely execute the training process of this repository, head over to our resources on the topic:

- [YouTube video](https://youtu.be/MX3BrXnaULs)
- [Documentation](https://clear.ml/docs/latest/docs/clearml_agent)
- [Example code](https://clear.ml/docs/latest/docs/guides/advanced/execute_remotely)

In short: every experiment tracked by the experiment manager contains enough information to reproduce it on a different machine (installed packages, uncommitted changes, etc.). A ClearML agent does exactly that: it listens to a queue for incoming tasks and, when it finds one, recreates the environment and runs the task while still reporting scalars, plots, etc. to the experiment manager.

You can turn any machine (a cloud VM, a local GPU machine, your own laptop, ...) into a ClearML agent by simply running:
```
clearml-agent daemon --queue <queues_to_listen_to> [--docker]
```
Now you can clone a task as explained above, or simply mark your current script by adding `task.execute_remotely()`: on execution, it will be put into a queue for the agent to start working on!
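The agent loop described above can be pictured with a small, self-contained toy model. It is not ClearML's implementation or API: `TaskRecord`, `Tracker`, `recreate_environment`, `run_task` and `agent_daemon` are hypothetical names, the queue is an in-process `queue.Queue` rather than a ClearML queue, and both "recreating the environment" and "training" are simulated. It only illustrates the idea that a task carries enough information (packages, uncommitted changes, parameters) for any worker that pulls it from a queue to rebuild and run it, while reporting results back to a central tracker.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from queue import Empty, Queue


@dataclass
class TaskRecord:
    """Hypothetical stand-in for what the experiment manager stores per task."""
    name: str
    packages: list[str]     # installed packages captured when the task was created
    uncommitted_diff: str   # uncommitted changes captured when the task was created
    params: dict            # hyperparameters to run with


@dataclass
class Tracker:
    """Hypothetical stand-in for the experiment manager that receives reports."""
    scalars: dict = field(default_factory=dict)

    def report_scalar(self, task: str, title: str, value: float) -> None:
        self.scalars.setdefault(task, {})[title] = value


def recreate_environment(task: TaskRecord) -> dict:
    # A real agent would install task.packages and apply task.uncommitted_diff
    # on top of the repository; this toy only records what it would set up.
    return {"packages": list(task.packages), "diff_applied": bool(task.uncommitted_diff)}


def run_task(task: TaskRecord, env: dict) -> float:
    # Placeholder for the actual training script; returns a fake metric.
    return min(1.0, task.params["epochs"] / 100)


def agent_daemon(task_queue: Queue, tracker: Tracker, poll_timeout: float = 0.1) -> list[str]:
    """Pull tasks off the queue, rebuild their environment, run them and report results."""
    finished = []
    while True:
        try:
            task = task_queue.get(timeout=poll_timeout)
        except Empty:
            # A real daemon keeps polling forever; the toy stops once the queue is empty.
            return finished
        env = recreate_environment(task)
        tracker.report_scalar(task.name, "packages_installed", len(env["packages"]))
        tracker.report_scalar(task.name, "metric", run_task(task, env))
        finished.append(task.name)


if __name__ == "__main__":
    q: Queue = Queue()
    q.put(TaskRecord("yolov5-train", ["torch", "clearml"], "<uncommitted patch>", {"epochs": 10}))
    tracker = Tracker()
    print(agent_daemon(q, tracker))  # ['yolov5-train']
    print(tracker.scalars)  # {'yolov5-train': {'packages_installed': 2, 'metric': 0.1}}
```

Because the queue holds self-describing task records rather than references to the machine that created them, any worker listening to that queue can pick up any task.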

### Autoscaling workers
ClearML comes with autoscalers too! This tool automatically spins up new remote machines in the cloud of your choice (AWS, GCP, Azure) and turns them into ClearML agents for you whenever experiments are detected in the queue. Once the tasks are processed, the autoscaler automatically shuts down the remote machines, so you stop paying!

Check out the autoscalers [here](https://youtu.be/j4XVMAaUt3E).
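The core scale-up/scale-down decision an autoscaler makes can be sketched as a pure function. This is a hypothetical illustration of the behaviour described above, not the ClearML autoscaler's actual algorithm or configuration: `plan_scaling` and `ScalingDecision` are made-up names, and the sketch assumes that idle machines pick up queued tasks before new ones are launched and that a fixed `max_machines` cap bounds cost.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScalingDecision:
    start: int       # number of new cloud machines to launch
    stop: list[str]  # ids of idle machines to shut down


def plan_scaling(pending: int, machines: dict[str, bool], max_machines: int) -> ScalingDecision:
    """Decide how many workers to launch or stop.

    pending:      number of tasks waiting in the queue
    machines:     machine id -> True if it is currently busy with a task
    max_machines: budget cap on concurrently running machines
    """
    idle = [m for m, busy in machines.items() if not busy]
    if pending == 0:
        # Nothing queued: shut down every idle machine so you stop paying for it.
        return ScalingDecision(start=0, stop=idle)
    # Idle machines take queued work first; only launch machines for the remainder.
    shortfall = max(0, pending - len(idle))
    headroom = max(0, max_machines - len(machines))
    return ScalingDecision(start=min(shortfall, headroom), stop=[])


if __name__ == "__main__":
    # Two queued tasks, "gpu-1" idle, "gpu-2" busy, budget of three machines.
    print(plan_scaling(pending=2, machines={"gpu-1": False, "gpu-2": True}, max_machines=3))
    # ScalingDecision(start=1, stop=[])
```

A real autoscaler would call something like this periodically and also wait for a machine to stay idle for a while before stopping it, to avoid thrashing when tasks arrive in bursts.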

## 🔗 Data versioning
Versioning your data separately from your code is generally a good idea. This repository supports supplying a dataset version ID, and it will make sure to download the data if it's not there yet. In addition, this workflow saves the dataset ID as part of the task parameters, so you will always know exactly which data was used in which experiment!

![ClearML Dataset Interface](https://github.com/thepycoder/clearml_screenshots/raw/main/dataset_version.png)
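The "download only if it's not there yet, and record which version was used" behaviour can be sketched with the standard library. This is a hypothetical illustration, not the code this repository or ClearML actually uses: `ensure_dataset`, the `fetch` callback (standing in for the real ClearML download) and the `"dataset_id"` parameter key are all assumptions.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from typing import Callable


def ensure_dataset(dataset_id: str, cache_root: Path,
                   fetch: Callable[[str, Path], None], task_params: dict) -> Path:
    """Return a local copy of a dataset version, downloading it only if missing.

    The dataset ID is also written into task_params so the experiment records
    exactly which data version it used.
    """
    local = cache_root / dataset_id
    if not local.exists():
        cache_root.mkdir(parents=True, exist_ok=True)
        # Download into a staging folder first, so a failed download never
        # leaves a half-filled folder that looks like a complete cached copy.
        staging = Path(tempfile.mkdtemp(prefix=f".{dataset_id}-staging-", dir=str(cache_root)))
        try:
            fetch(dataset_id, staging)
        except BaseException:
            shutil.rmtree(staging, ignore_errors=True)
            raise
        staging.rename(local)
    task_params["dataset_id"] = dataset_id
    return local


if __name__ == "__main__":
    def fake_fetch(dataset_id: str, dest: Path) -> None:
        # Stand-in for the real download; just writes a small file.
        (dest / "data.yaml").write_text("names: [person]\n")

    params: dict = {}
    with tempfile.TemporaryDirectory() as tmp:
        first = ensure_dataset("abc123", Path(tmp), fake_fetch, params)   # downloads
        second = ensure_dataset("abc123", Path(tmp), fake_fetch, params)  # already cached
        print(first == second, sorted(p.name for p in first.iterdir()), params)
```

The staging-then-rename step is a design choice of this sketch: the cache check only looks for the final folder, so it is only trustworthy if that folder appears after a download has fully succeeded.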

### Prepare Dataset
This repository supports a number of different datasets through YAML files that contain their information. By default, datasets are downloaded to the `../datasets` folder relative to the repository root. So if you downloaded the `coco128` dataset using the link in the YAML file or with the scripts provided by YOLOv5, you get this folder structure:
