Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: split tests-examples #990

Merged
merged 38 commits into from
Mar 25, 2020
Merged

CI: split tests-examples #990

merged 38 commits into from
Mar 25, 2020

Conversation

Borda
Copy link
Member

@Borda Borda commented Mar 1, 2020

What does this PR do?

This is resolving #942 and follow-up work for #986
Also fixes random crashes with CUDA core
moved model's test to one folder

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@Borda Borda added feature Is an improvement or enhancement ci Continuous Integration labels Mar 1, 2020
@Borda Borda requested a review from a team March 1, 2020 18:29
@Borda Borda changed the title CI: split tests-examples [wip] CI: split tests-examples Mar 1, 2020
@Borda Borda marked this pull request as ready for review March 1, 2020 21:03
@Borda
Copy link
Member Author

Borda commented Mar 1, 2020

integrating fix of actions/checkout#23

@Borda
Copy link
Member Author

Borda commented Mar 1, 2020

failing tests because of pytorch/pytorch#20030

@awaelchli awaelchli mentioned this pull request Mar 4, 2020
@Borda Borda added this to the 0.7.2 milestone Mar 12, 2020
@awaelchli
Copy link
Contributor

Looks like the tests related to ddp and amp are failing. Could it be that it is because you changed some of the numbers in tests (max_epoch, min_acc, ...)?

@Borda
Copy link
Member Author

Borda commented Mar 14, 2020

Looks like the tests related to ddp and amp are failing. Could it be that it is because you changed some of the numbers in tests (max_epoch, min_acc, ...)?

I have increased the run to get better acc but I did not touch the functionalities...
@luiscape was looking at the GPU CI and his thought is dome driver error :{

@awaelchli
Copy link
Contributor

did you already try to run the tests without changing the hyperparameters? maybe ddp/amp gives slightly different numbers that don't match the new threshold for acc.

@Borda
Copy link
Member Author

Borda commented Mar 15, 2020

just working on it... lol

@Borda
Copy link
Member Author

Borda commented Mar 16, 2020

@luiscape @PyTorchLightning/core-contributors any idea why GPU test crashes on:

>           ret = torch.addmm(bias, input, weight.t())
--
2284 | E           RuntimeError: CUDA error: no kernel image is available for execution on the device

nut when I run it on the same machine with the mase image it passes...

@Borda
Copy link
Member Author

Borda commented Mar 17, 2020

@jeremyjordan have you seen this, any suggestion?

@pep8speaks
Copy link

pep8speaks commented Mar 19, 2020

Hello @Borda! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-24 20:39:00 UTC

@codecov
Copy link

codecov bot commented Mar 19, 2020

Codecov Report

Merging #990 into master will not change coverage by %.
The diff coverage is n/a.

@@          Coverage Diff          @@
##           master   #990   +/-   ##
=====================================
  Coverage      91%    91%           
=====================================
  Files          61     61           
  Lines        3153   3153           
=====================================
  Hits         2879   2879           
  Misses        274    274           

@Borda
Copy link
Member Author

Borda commented Mar 19, 2020

@awaelchli @williamFalcon @luiscape ^^ pls

pl_examples/basic_examples/lightning_module_template.py Outdated Show resolved Hide resolved
requirements-extra.txt Outdated Show resolved Hide resolved
tests/models/utils.py Show resolved Hide resolved
@Borda Borda added the ready PRs ready to be merged label Mar 19, 2020
@awaelchli
Copy link
Contributor

Awesome! Tests finally pass!

... )
>>> from argparse import Namespace
>>> hparams = Namespace(**params)
>>> model = LightningTemplateModel(hparams)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just pass in a dict now... no need to cast as Namespace

Copy link
Member Author

@Borda Borda Mar 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here it is used in init to get some values to do here has to be Namespace
so I will prepare follow-up pr with this change to have hparams setter/getter to accept

@mergify
Copy link
Contributor

mergify bot commented Mar 24, 2020

This pull request is now in conflict... :(

@williamFalcon
Copy link
Contributor

@Borda this PR is huge lol. was there a merge or something? can we make it smaller?

@Borda
Copy link
Member Author

Borda commented Mar 24, 2020

@williamFalcon the splitting test and examples are quite complex, I could move a few smaller steps elsewhere but this stays huge anyway... and it would take quite some time to debug and duplicate some fixes...

@@ -1,57 +0,0 @@
"""Models for testing."""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just moving old tests.models to tests.base

@Borda Borda requested review from a team and jeremyjordan March 24, 2020 20:58
Copy link
Contributor

@tullie tullie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to review because lots of small changes but lgtm

@@ -5,11 +5,11 @@
import numpy as np
import torch

from pl_examples import LightningTemplateModel
# from pl_examples import LightningTemplateModel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be nice to not check in commented out code

@@ -21,7 +21,7 @@


def run_model_test_no_loggers(trainer_options, model, min_acc=0.50):
save_dir = trainer_options['default_save_path']
# save_dir = trainer_options['default_save_path']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here, it'd be nice to not check in commented out code

@@ -114,7 +114,7 @@ def test_running_test_after_fitting(tmpdir):
trainer.test()

# test we have good test accuracy
tutils.assert_ok_model_acc(trainer)
tutils.assert_ok_model_acc(trainer, thr=0.35)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to define a DESIRED_THRESHOLD var in the base file and import that? rather than referencing a float throughout the tests

@williamFalcon williamFalcon merged commit 45d671a into Lightning-AI:master Mar 25, 2020
@Borda Borda deleted the split-test-examples branch March 25, 2020 13:47
williamFalcon added a commit that referenced this pull request Mar 28, 2020
* Example: Simple RL example using DQN/Lightning

* DQN RL Agent using Lightning

* Uses Iterable Dataset for Replay Buffer

* Buffer is populated by agent as training is carried out, updating the
dataset

* Applied autopep8 fixes

* * Updated line length from 120 to 110

* Update pl_examples/domain_templates/dqn.py

simplify get_device method

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pl_examples/domain_templates/dqn.py

Re-ordered imports

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* CI: split tests-examples (#990)

* CI: split tests-examples

* tests without template

* comment depends

* CircleCI typo

* add doctest

* update test req.

* CI tests

* setup macOS

* longer train

* lover pred acc

* fix model

* rename default model

* lower tests acc

* typo

* imports

* fix test optimizer

* update calls

* fix Win

* lower Drone image

* fix call

* pytorch image

* fix test

* add dev image

* add dev image

* update image

* drone volume

* lint

* update test notes

* rename tests/models >> tests/base

* group models

* conftest

* optim imports

* typos

* fix import

* fix tests

* install AMP

* tests

* fix import

* Clean up

* added module docstring

* renamed variables to be more descriptive

* Added missing docstrings and type annotations

* Added gym to example requirements

* Added note to changelog

* updated example image

* update types

* rename script

* Update CHANGELOG.md

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* another rename

* Disable validation when val_percent_check=0 (#1251)

* fix disable validation

* add test

* update changelog

* update docs for val_percent_check

* make "fast training" docs consistent

* calling self.forward() -> self() (#1211)

* self.forward() -> self()

* update changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Fix requirements-extra.txt Trains package to release version (#1229)

* Fix requirement-extra use released Trains package

* Update README.md add Trains and links to the external Visualization section

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Remove unnecessary parameters to super() in documentation and source code (#1240)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update deprecation warning (#1258)

* update docs for progress bat values (#1253)

* lower timeouts for inactive issues (#1250)

* update contrib list (#1241)

Co-authored-by: William Falcon <waf2107@columbia.edu>

* Fix outdated docs (#1227)

* Fix typo (#1224)

* drop unused Tox (#1242)

* system info (#1234)

* system info

* update big info

* test script

* update config

* rename script

* import path

* Changed smoothing in tqdm to decrease variability of time remaining between training / eval (#1194)

* Example: Simple RL example using DQN/Lightning

* DQN RL Agent using Lightning

* Uses Iterable Dataset for Replay Buffer

* Buffer is populated by agent as training is carried out, updating the
dataset

* Applied autopep8 fixes

* * Updated line length from 120 to 110

* Update pl_examples/domain_templates/dqn.py

simplify get_device method

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pl_examples/domain_templates/dqn.py

Re-ordered imports

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Clean up

* added module docstring

* renamed variables to be more descriptive

* Added missing docstrings and type annotations

* Added gym to example requirements

* Added note to changelog

* update types

* rename script

* Update CHANGELOG.md

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* another rename

Co-authored-by: Donal Byrne <Donal.Byrne@xperi.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com>
Co-authored-by: Tyler Yep <tyep@stanford.edu>
Co-authored-by: Shunta Komatsu <59395084+skmatz@users.noreply.github.com>
Co-authored-by: Jack Pertschuk <jackpertschuk@gmail.com>
alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 3, 2020
* CI: split tests-examples

* tests without template

* comment depends

* CircleCI typo

* add doctest

* update test req.

* CI tests

* setup macOS

* longer train

* lover pred acc

* fix model

* rename default model

* lower tests acc

* typo

* imports

* fix test optimizer

* update calls

* fix Win

* lower Drone image

* fix call

* pytorch image

* fix test

* add dev image

* add dev image

* update image

* drone volume

* lint

* update test notes

* rename tests/models >> tests/base

* group models

* conftest

* optim imports

* typos

* fix import

* fix tests

* install AMP

* tests

* fix import
alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 3, 2020
* Example: Simple RL example using DQN/Lightning

* DQN RL Agent using Lightning

* Uses Iterable Dataset for Replay Buffer

* Buffer is populated by agent as training is carried out, updating the
dataset

* Applied autopep8 fixes

* * Updated line length from 120 to 110

* Update pl_examples/domain_templates/dqn.py

simplify get_device method

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pl_examples/domain_templates/dqn.py

Re-ordered imports

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* CI: split tests-examples (Lightning-AI#990)

* CI: split tests-examples

* tests without template

* comment depends

* CircleCI typo

* add doctest

* update test req.

* CI tests

* setup macOS

* longer train

* lover pred acc

* fix model

* rename default model

* lower tests acc

* typo

* imports

* fix test optimizer

* update calls

* fix Win

* lower Drone image

* fix call

* pytorch image

* fix test

* add dev image

* add dev image

* update image

* drone volume

* lint

* update test notes

* rename tests/models >> tests/base

* group models

* conftest

* optim imports

* typos

* fix import

* fix tests

* install AMP

* tests

* fix import

* Clean up

* added module docstring

* renamed variables to be more descriptive

* Added missing docstrings and type annotations

* Added gym to example requirements

* Added note to changelog

* updated example image

* update types

* rename script

* Update CHANGELOG.md

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* another rename

* Disable validation when val_percent_check=0 (Lightning-AI#1251)

* fix disable validation

* add test

* update changelog

* update docs for val_percent_check

* make "fast training" docs consistent

* calling self.forward() -> self() (Lightning-AI#1211)

* self.forward() -> self()

* update changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Fix requirements-extra.txt Trains package to release version (Lightning-AI#1229)

* Fix requirement-extra use released Trains package

* Update README.md add Trains and links to the external Visualization section

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Remove unnecessary parameters to super() in documentation and source code (Lightning-AI#1240)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update deprecation warning (Lightning-AI#1258)

* update docs for progress bat values (Lightning-AI#1253)

* lower timeouts for inactive issues (Lightning-AI#1250)

* update contrib list (Lightning-AI#1241)

Co-authored-by: William Falcon <waf2107@columbia.edu>

* Fix outdated docs (Lightning-AI#1227)

* Fix typo (Lightning-AI#1224)

* drop unused Tox (Lightning-AI#1242)

* system info (Lightning-AI#1234)

* system info

* update big info

* test script

* update config

* rename script

* import path

* Changed smoothing in tqdm to decrease variability of time remaining between training / eval (Lightning-AI#1194)

* Example: Simple RL example using DQN/Lightning

* DQN RL Agent using Lightning

* Uses Iterable Dataset for Replay Buffer

* Buffer is populated by agent as training is carried out, updating the
dataset

* Applied autopep8 fixes

* * Updated line length from 120 to 110

* Update pl_examples/domain_templates/dqn.py

simplify get_device method

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pl_examples/domain_templates/dqn.py

Re-ordered imports

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Clean up

* added module docstring

* renamed variables to be more descriptive

* Added missing docstrings and type annotations

* Added gym to example requirements

* Added note to changelog

* update types

* rename script

* Update CHANGELOG.md

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* another rename

Co-authored-by: Donal Byrne <Donal.Byrne@xperi.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com>
Co-authored-by: Tyler Yep <tyep@stanford.edu>
Co-authored-by: Shunta Komatsu <59395084+skmatz@users.noreply.github.com>
Co-authored-by: Jack Pertschuk <jackpertschuk@gmail.com>
@Borda Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous Integration feature Is an improvement or enhancement ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants