
[UnitTests] Parametrized Unit Tests #7

Merged 5 commits into apache:main on Aug 31, 2021

Conversation

Lunderberg
Contributor

Migrating earlier conversation from https://discuss.tvm.apache.org/t/rfc-parametrized-unit-tests/9946. RFC should be up to date with all changes/conversations that occurred in that thread.

@areusch Let's continue any discussion here, to avoid having multiple places to check for the same RFC.

@areusch
Contributor

areusch commented Jun 10, 2021

@Lunderberg I think there were two open items from the discuss post:

  1. shadowing local vars with module-level vars containing their fixtures
  2. caching non-serializable data

on 1, it sounds like you had found a solution, so I think we're resolved there.

on 2, you'd suggested to table the discussion of how to cache non-serializable objects and just cache serializable ones. that sounds great to me.

so then, the real reason I was having this debate with you: i'd like to propose that in all cases (cache hot or cold), we always provide test functions with a brand-new deserialized copy of the data. i think that solves all of my concerns around tests poisoning downstream tests with cached data, and hopefully has not-too-significant performance impact. what are your thoughts?

@Lunderberg
Contributor Author

@areusch

I think that makes sense for now, and is a good way to maintain separation between unit tests. Running through copy.deepcopy would be the easiest way to do the serialize/deserialize steps on arbitrary data. It throws a TypeError if a non-serializable value (e.g. tvm.nd.array) is passed, so that gives a good way to have the serializable-only restriction be checked by the CI.

I ran a few quick benchmarks for numpy arrays, since I think those would be the most frequently cached result. copy.deepcopy is a little bit slower than calling dedicated copying functions (e.g. my_numpy_array.copy()), but wouldn't require any type-checking on our part.

[Image: benchmark timings comparing copy.deepcopy against dedicated numpy copy functions]
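As a sketch of the behavior described above, using `threading.Lock` as a stand-in for a non-serializable value such as `tvm.nd.array` (so TVM itself isn't needed to illustrate the point):

```python
import copy
import threading

# Serializable cached values: each test receives an independent copy.
cached = {"shape": (16, 16), "values": [1.0, 2.0, 3.0]}
fresh = copy.deepcopy(cached)
assert fresh == cached
assert fresh is not cached and fresh["values"] is not cached["values"]

# Non-serializable values raise TypeError, letting CI enforce the
# serializable-only restriction on cached fixtures.
try:
    copy.deepcopy(threading.Lock())
    print("unexpectedly copied")
except TypeError:
    print("TypeError: non-serializable value rejected")
```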

@areusch
Contributor

areusch commented Jun 11, 2021

@Lunderberg I was thinking you could just require tests provide bytes to be cached.

@Lunderberg
Contributor Author

How would we deserialize the objects in that case? I thought that __bytes__ was a one-way conversion from an object to a sequence of bytes, and didn't require objects to be deserializable.
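For illustration, a bytes-only cache would need an explicit deserializer paired with each serializer; a `pickle` round-trip is the obvious stdlib example (a sketch of the problem, not the RFC's design):

```python
import pickle

value = {"target": "llvm", "sizes": [8, 16, 32]}

# Serializing to bytes is the easy direction...
cached = pickle.dumps(value)

# ...but restoring the object requires a matching deserializer;
# bytes(obj) / __bytes__ alone provides no way back.
restored = pickle.loads(cached)
assert restored == value and restored is not value
```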

@Lunderberg
Contributor Author

@areusch I've added verbiage to the RFC to indicate use of copy.deepcopy for the results of cached fixtures, and have updated the PR with the copying. The error message if a non-serializable type is cached gives a link to this RFC, to avoid any confusion in the future.

.gitignore Outdated
@@ -0,0 +1,2 @@
# Emacs temporary files
Contributor

i think these should go in your global .gitignore not the project-specific one

Contributor Author

Fair enough, and I can remove it if that is preferable. I try to avoid using the global .gitignore in most cases, so that git add . is safe to use in general, and not just on my environment. The tvm repo already has this line in its .gitignore, so I figured it was good to add here. (Though in retrospect it probably should have been a separate PR.)

Contributor

okay, please remove this from the change then. we can debate it on another RFC

@Lunderberg
Contributor Author

Copying some notes from a 1:1, along with commentary at the bottom.

Caching options listed below, arranged from most flexibility at the top to least flexibility at the bottom.

  1. All caching allowed. A fixture tagged with cache_return_value=True can return any python object. All tests that use the fixture receive the same python object as input.
    • Pros
      • Most potential to speed up running of CI.
      • Most potential to speed up by cross-function/file use of shared setup (e.g. np.random.uniform)
      • Works for any type.
    • Cons
      • Most potential to have interacting unit tests.
      • Could make somebody think that interactions are allowed between unit tests.
  2. Cache only copy-able types. A fixture tagged with cache_return_value=True has cached return values, but each parametrized unit test receives a copy made by copy.deepcopy prior to returning the object. Any types that raise an error during copying result in a failed setup for the test.
    • Pros
      • Very simple to add caching to existing functions.
    • Cons
      • May unintentionally copy global state that is pointed to by local state. Doesn't impact correctness, but would increase runtime.
      • Less flexibility: an object that cannot be deepcopy-ed results in a TypeError.
  3. Cache only copy-able types, but with a custom implementation of copy.deepcopy. Would follow the same rules if a __deepcopy__ or pair of __getstate__/__setstate__ exists, but would otherwise require the type to be in a list of allowed types.
    • Pros
      • Opt-in both by fixture and by data-type.
    • Cons
      • Re-implements a common utility.
  4. Cache using explicitly specified serialize/deserialize functions. A fixture tagged with cache_return_value=True must return three values. The first is the fixture value itself. The second/third are serialize/deserialize functions for that value.
    • Pros
    • Cons
  5. Only caching of bytes type is allowed. A fixture tagged with cache_return_value=True must return a bytes object. This object is deserialized at the start of the test function into the desired type.
    • Pros
    • Cons
  6. No caching. All fixtures are recomputed for each parametrized value.
    • Pros
      • All unit tests are independent by design. Zero interaction possible.
    • Cons
      • Most potential impact to CI runtime.

Overall, @areusch 's concern is avoiding hidden failure modes, especially where there could be tests that only appear to pass due to cross-talk. As such, he is more comfortable with more explicit usage, and would be comfortable with any of implementations 3-5. @Lunderberg 's main concern is usability and ease of enabling, such that a slow test can have caching enabled with minimal effort where it is needed; he would be comfortable with any of implementations 2-3. One key point we agreed on is that we expect fixtures to be uncached by default (@tvm.testing.fixture), with caching enabled on a per-fixture basis (@tvm.testing.fixture(cache_return_value=True)) where there are performance benefits to be gained.

Since #3 is the overlap we both like, giving opt-in caching on both a per-return-value and per-fixture basis, I've put together a test implementation. It calls copy.deepcopy, but passes in a custom memo argument that performs type-checking of the copied objects. Types can be explicitly listed as being safe to copy, but use of the default object.__reduce__ is disabled. This implementation minimizes the amount of re-implementation needed, but does rely on ctypes to identify the object being checked.
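A minimal sketch of that approach (names and the allowlist here are illustrative, not the actual PR implementation): a memo dict handed to `copy.deepcopy` that rejects any memoized object whose type is outside an allowlist, using `ctypes` to recover the object from the `id()` key.

```python
import copy
import ctypes

# Hypothetical allowlist; the real implementation would enumerate the
# types considered safe to cache.
ALLOWED_TYPES = (bool, int, float, str, bytes, list, tuple, dict, set, type(None))

class _TypeCheckMemo(dict):
    """Memo for copy.deepcopy that rejects non-allowlisted types.

    deepcopy records memo[id(original)] = copy for each compound object
    it copies, so intercepting __setitem__ lets us inspect every object
    that would otherwise fall back to object.__reduce__.
    """

    def __setitem__(self, key, value):
        # CPython-specific: recover the original object from its id().
        original = ctypes.cast(key, ctypes.py_object).value
        if not isinstance(original, ALLOWED_TYPES):
            raise TypeError(
                f"Cannot cache fixture value of type {type(original).__name__}"
            )
        super().__setitem__(key, value)

def copy_cached_value(value):
    """Return an independent copy of a cached fixture value."""
    return copy.deepcopy(value, _TypeCheckMemo())
```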

@areusch Can you check this over to make sure I've given an accurate description of your comments?

@areusch
Contributor

areusch commented Jul 7, 2021

@Lunderberg i think that sounds right. i agree #3 is the overlap between our positions. i'd like for others to comment whether it may be too complex to understand in debugging; otherwise i think we can proceed here. to be explicit: we intend to copy only whitelisted types, and not allow copy.deepcopy to work on arbitrary objects.

@Lunderberg
Contributor Author

@areusch Any status changes on this?

Contributor

@areusch areusch left a comment

hi @Lunderberg sorry for the long delay here. I reread this and just have a few minor clarifications I'd like you to make on the wording. Other than that, looks great! Feel free to ping aggressively when these are addressed and we can merge right away.


Some unit tests should be tested along a variety of parameters for
better coverage. For example, a unit test that does not depend on
target-specific features should be tested on all targets that the test
Contributor

nit: "could be tested"

Contributor Author

Good point, and changed. The "should" is a stronger statement, and was what I initially intended, but I'd changed my mind given the current impact to CI runtime.

Comment on lines 22 to 26
The simplest implementation would be to write a test function that
loops over all parameters, throwing an exception if any parameter
fails the test. However, this does not give full information to a
developer, as a failure from any parameter results in the entire test
to be marked as failing. A unit-test that fails for all targets
Contributor

Suggested change
The simplest implementation would be to write a test function that
loops over all parameters, throwing an exception if any parameter
fails the test. However, this does not give full information to a
developer, as a failure from any parameter results in the entire test
to be marked as failing. A unit-test that fails for all targets
The simplest implementation would be to write a test function that
internally loops over all parameters and throws an exception when the
test fails. However, this does not give full information to a
developer because `pytest` does not necessarily include the parameter
value in the test report (even when it does, the value is in a different place
depending on the way the internal loop is written). A unit-test that fails for all targets

Contributor Author

Updated, mostly following your modified wording.

Comment on lines 39 to 40
To make a new parameter for unit tests to use, define it with the
`tvm.testing.parameter` function. For example, the following will
Contributor

Suggested change
To make a new parameter for unit tests to use, define it with the
`tvm.testing.parameter` function. For example, the following will
Before you can use a parameter in a test case, you need to register it with `pytest`.
Do this using the `tvm.testing.parameter` function. For example, the following will

Contributor Author

Updated as suggested.

```

To use a parameter, define a test function that accepts the parameter
as an input. This test will be run once for each value of the
Contributor

Suggested change
as an input. This test will be run once for each value of the
as an input (use the same name for the argument as you used above in the
parameter registration). This test will be run once for each value of the

Contributor Author

Updated, but with a comma instead of a parenthetical.
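The pattern under discussion maps onto upstream pytest's parametrize; a sketch of the equivalent (the test name and values here are illustrative):

```python
import pytest

# The argument name ("shape") must match the registered parameter name;
# pytest runs the test once per value, reporting each case independently.
@pytest.mark.parametrize("shape", [(4, 4), (8, 8), (16, 16)])
def test_total_elements(shape):
    rows, cols = shape
    assert rows * cols == len([0] * rows * cols)
```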

useful to re-run tests that failed, to check whether the failure is
due to modification/re-use of a cached value.

A fixture can depend on parameters, or on other fixtures. This is
Contributor

Suggested change
A fixture can depend on parameters, or on other fixtures. This is
A fixture can also depend on parameters or other fixtures. This is

Contributor Author

Updated as suggested.


## Target/Device Parametrization

The TVM test configuration contains definitions for `target` and
Contributor

Suggested change
The TVM test configuration contains definitions for `target` and
The global TVM test configuration contains definitions for `target` and

Contributor Author

Updated as suggested.


# After
if __name__=='__main__':
sys.exit(pytest.main(sys.argv))
Contributor

we should ideally split this out into a helper function and eventually add a pylint plugin to enforce, but good to document standard right now

Contributor Author

Agreed, and that would help with readability. I've added it to my todo list, though it's low on the priority.
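One way such a helper could look (the name `main` is a hypothetical placeholder, not an existing tvm.testing API):

```python
import sys

import pytest

def main():
    """Run pytest on the current file, forwarding command-line arguments."""
    sys.exit(pytest.main(sys.argv))

# Usage at the bottom of a test file:
# if __name__ == "__main__":
#     main()
```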

# Prior art
[prior-art]: #prior-art

Discuss prior art, both the good and the bad, in relation to this proposal.
Contributor

can you remove the boilerplate and just add a reference to i guess pytest.mark.parametrize and functools.lru_cache or whatever that is now?

Contributor Author

Updated with a reference to pytest.mark.parametrize and pytest.fixture's caching using the scope parameter.
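For reference, pytest's built-in caching via the fixture `scope` parameter looks like this (a sketch; note the cached object is shared between tests, not copied, which is exactly the cross-talk risk discussed earlier in this thread):

```python
import pytest

@pytest.fixture(scope="module")
def expensive_setup():
    # Runs once per module; every test in the module shares this dict.
    return {"computed": sum(range(10_000))}

def test_reads_cached_value(expensive_setup):
    assert expensive_setup["computed"] == 49995000
```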

Following comments, I agree that it doesn't belong in this PR, and may include it in a separate PR.
Contributor

@areusch areusch left a comment

thanks @Lunderberg !

@areusch areusch merged commit fd98681 into apache:main Aug 31, 2021
@Lunderberg Lunderberg deleted the parametrized_unit_tests branch August 31, 2021 10:43
MichaelJKlaiber added a commit to MichaelJKlaiber/tvm-rfcs that referenced this pull request Apr 6, 2022
Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md
Labels: status: need review, status: need update