Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate requirements.txt from Python spec #7289

Merged
merged 15 commits into from
Feb 4, 2021

Conversation

areusch
Copy link
Contributor

@areusch areusch commented Jan 15, 2021

This PR implements the Consolidating Python Dependencies RFC.

As part of this RFC, we decided to try a solution halfway between burdening TVM with a full-blown dependency tool such as poetry and the approach we currently take (scattered Python deps). The initial patchset here contains the tool plus some examples of the dependencies. The remaining things needed to merge this PR:

  • Write tests for the tool
  • Finish consolidating various dependencies from the ci docker containers

I'm posting this now before I do any of the test work to get feedback on the approach now that there's something concrete to look at. I do think this tool is a bit more complex than it could be, and am concerned about maintaining it. In particular the logic needed to lint the dependencies list is complex. I don't see a solution that avoids joining a set of dependencies by feature against a constraints list as very workable given the current extras structure.

Another topic for discussion is the use of semver. Python does contain a compatibility operator we could use. We should probably debate between the two (let's use the forum for this), as it would remove some complexity here.

Comments would be quite welcomed!

@tqchen @tkonolige @mshawcroft @leandron @junrushao1994 @tmoreau89 @jwfromm @jroesch @mdw-octoml @zhiics

python/gen_requirements.py Outdated Show resolved Hide resolved
python/gen_requirements.py Outdated Show resolved Hide resolved
Comment on lines 390 to 391
if __name__ == "__main__":
main()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this script is used to generate the files under ./requirements and also used inline within setup.py. When a dependency is updated, it will immediately reflect in setup.py - this is good.

However, how do we make sure that when a new dependency is introduced/changed/updated/fixed, the txt files are also correctly updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this is a great point. I sort of ran out of time before I posted the initial cut, so i figured i'd post and let people comment. I think the right thing to do is to move join_requirements + the requirements.txt writing into a top-level function and call that function from both gen_requirements.main and setup.py. however, that does mean that if tools like poetry or others call setup.py egg_info, requirements.txt will be rewritten.

a developer might not expect that, though if it makes any appreciable change, they must be either syncing to main or modifying gen_requirements.py. i think that's okay...what does everyone else think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in thinking about this further, i'm starting to lean towards not updating the requirements.txt when setup.py egg_info is invoked. here's why:

  • given the discussion about constraint types in the RFC, the functional constraints aren't themselves particularly useful in pip install -r. you need additional constraints to indicate which version of packages are used by the CI.
  • in order to generate those we need another RFC. presumably, once we have those, we can change the language about installing Python dependencies in the Install From Source tutorial to tell you to use the CI-tested requirements.txt.
  • The CI-tested requirements.txt needs to be generated with slightly different inputs (I.e. the requirements here + the output of pip freeze from various CI containers). I think we would create a slightly different tool or modify this one to include configuration on how to translate pip freeze into looser requirements.txt usable for everyday work.
  • until we've completed all of this, I don't think we'd recommend people use these requirements. if they want them, it's pretty easy to generate by just running python gen_requirements.txt. I don't intend to actually check in the files in python/requirements to the repo (rather add .gitignore), so they wouldn't be out of date for most people.

"psutil",
"scipy",
"synr",
]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need tornado and cloudpickle too. They are requirements for tuning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@areusch
Copy link
Contributor Author

areusch commented Jan 15, 2021

thanks for the reviews so far everyone! one quick note:

the actual list of dependencies (and their version constraints) are still WIP. so definitely feel free to comment on things that are missing, but I know the list is wrong and I plan to fix the list and add tests if everyone's good w/ the approach.

i'll leave this open to collect reviews over the weekend and on Monday (holiday for me) and then i'll work to merge this mid-next-week if we are good w/ it.

@comaniac @tkonolige @leandron


# Maps named TVM piece (see description above) to a list of names of Python packages. Please use
# alphabetical order for each package list, and do not add version constraints here!
REQUIREMENTS_BY_PIECE = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You might consider storing these lists in a .yaml file and loading it, which is easier to edit/maintain than directly editing the Python code itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I generally prefer yaml for a task like this but since this is the deps script, I don't think it should have external dependencies itself.

python/gen_requirements.py Outdated Show resolved Hide resolved
SEMVER_REGEX = re.compile(r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$")


def validate_requirements_by_piece():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For new code I would recommend using Python type annotations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add type annotations before submitting this PR

python/gen_requirements.py Outdated Show resolved Hide resolved
sys.exit(2)

if args.lint:
sys.exit(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend return, not exit, since this is a function that could be (and, in fact, is!) invoked by another Python library -- same for the error cases, raise an Exception, don't exit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually in setup.py we call join_requirements, so I think we are good here.

@@ -0,0 +1,16 @@
numpy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend adding comments at the top of each of these files to indicate what they are used for, since it may not always be obvious from the filename. PIP requirements files support # comments.https://pip.pypa.io/en/latest/reference/pip_install/#requirements-file-format

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a way to specify a comment to each piece and that's now placed at the top of each requirements.txt file

import gen_requirements
sys.path.pop(0)

requirements = gen_requirements.join_requirements()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to print out some details on what the requirements identified are, to help with debugging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean exactly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, just a suggestion that when this is being invoked, and the requirements being generated automatically, it might be useful to have them dumped to stdout during the invocation (perhaps optionally) so someone debugging issues with Python deps can readily see what they were. Not sure if they are exposed in an obvious way elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, I don't typically expect to see the requirements printed when I run python setup.py. are there cases where you do? I could add a comment saying: to produce requirements.txt files, run python gen_requirements.py above this line, which is kind of the typical case i'd expect someone to be looking for them.

@areusch
Copy link
Contributor Author

areusch commented Jan 26, 2021

@comaniac @leandron @tqchen @mdw-octoml @tkonolige please take a look at the latest update on the RFC forum thread.

@areusch areusch marked this pull request as ready for review January 28, 2021 20:59
@areusch
Copy link
Contributor Author

areusch commented Jan 29, 2021

@leandron @mdw-octoml @tkonolige @comaniac @tqchen please take another look when you have a minute

Copy link
Contributor

@tkonolige tkonolige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the hard work @areusch!

This seems like lot of code just to manage dependencies, but I guess we have complicated requirements.

python/.gitignore Outdated Show resolved Hide resolved
setattr(obj, prop_name, value)


PROBLEM_REQUIREMENTS = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this test would be easier to maintain if the expected problem string were part of this list, rather than hardcoded in the test case in the same order below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess then though, you're doing processing to construct the test stimulus, which you could screw up. plus, each line here could create multiple problem strings or contribute to composite problem strings such as the final one.

gen_requirements,
REQUIREMENTS_BY_PIECE=TEST_REQUIREMENTS_BY_PIECE,
CONSTRAINTS=(
("unlisted", "~=3"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above, I think it would be better if the problems were kept along with the input constraints that are being tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same issue though

Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@junrushao
Copy link
Member

Thanks everyone! I am going to merge it in tomorrow if there are no further comments :-)

@junrushao junrushao merged commit 1de98be into apache:main Feb 4, 2021
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 11, 2021
* Generate requirements.txt from Python spec.

* add tests, collect actual requirements (first cut).

* add tornado and cloudpickle

* add xgboost

* add xgboost version restriction

* cleanup and prepare for merge

* black format

* add type annotations and docstrings

* remove example requirements.txt

* fix setup.py extras_require

* use typing. classes for type annotations, python 2 compatible :)

* fix python2 typing.Pattern

* retrigger CI

* address comaniac comments

* retrigger ci
Lokiiiiii pushed a commit to Lokiiiiii/tvm that referenced this pull request Mar 2, 2021
* Generate requirements.txt from Python spec.

* add tests, collect actual requirements (first cut).

* add tornado and cloudpickle

* add xgboost

* add xgboost version restriction

* cleanup and prepare for merge

* black format

* add type annotations and docstrings

* remove example requirements.txt

* fix setup.py extras_require

* use typing. classes for type annotations, python 2 compatible :)

* fix python2 typing.Pattern

* retrigger CI

* address comaniac comments

* retrigger ci
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2021
* Generate requirements.txt from Python spec.

* add tests, collect actual requirements (first cut).

* add tornado and cloudpickle

* add xgboost

* add xgboost version restriction

* cleanup and prepare for merge

* black format

* add type annotations and docstrings

* remove example requirements.txt

* fix setup.py extras_require

* use typing. classes for type annotations, python 2 compatible :)

* fix python2 typing.Pattern

* retrigger CI

* address comaniac comments

* retrigger ci
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants