Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace partial applications of Dataset with Dataset subclasses #1145

Conversation

gnn
Copy link
Collaborator

@gnn gnn commented Aug 9, 2023

This should finally enable documenting those subclasses.

gnn added 6 commits August 9, 2023 18:20
[PEP 8][0] actually specifies a maximum line length of 72 characters for
"flowing long blocks of text with fewer structural restrictions
(docstrings or comments)". I extended that to 76 characters for indented
docstrings, because the reason for having 72 characters is so that the
eyes don't have to travel horizontally too much. Indentation basically
moves the start of the text block to the right, so for text blocks like
this, the line length limit can move with the indentation.
Some of these lines where even beyond 79 characters, so I had to touch
them anyways to make `flake8` happy and while at it, I also changed them
to be (mostly) PEP 8 compliant.

[0]: https://peps.python.org/pep-0008/#maximum-line-length
Moving `OsmBuildingsSynthetic.n_amenities_inside.type` into a variable
is done to work around a bug in `black`. Without this, `black` would
always reformat the line to be exactly 80 characters, even if a limit of
79 is set, making `flake8` complain.
This helps when specifying subclasses of `Dataset`. Since there was
already a `Tasks` class inside the module, that class got renamed to
`Tasks_` because it could not be used for the purposes of the type.
Make these subclasses `@dataclass`es, just like the original `Dataset`,
so that default values for the fields are picked up by the constructor.
That way we don't have to specify a constructor that calls the
superclass constructor, but can rely on the default implementation for
`__init__`.
The `dependencies=[]` line present in nearly all uses of `partial` has
been removed because it was unnecessary. The `dependencies` parameter is
usually specified when the datasets are instantiated and even if not, an
empty list of dependencies is the default anyways. Actually, an empty
tuple is the default, but that doesn't make a difference.
The `tasks=(define_mv_grid_districts)` line has been changed to drop the
parenthesis because the parenthesis where superfluous in this case.
The `tasks=(download_mastr_data,)` line has not been changed, because
even though `Dataset`s should handle a callable and a tuple
containing exactly one callable identically for the tasks attribute, I
didn't want to risk introducing any bugs.
@gnn gnn requested a review from birgits August 9, 2023 22:36
@birgits birgits merged commit ddadce1 into documentation/rli_datasets Aug 10, 2023
2 of 3 checks passed
@gnn gnn deleted the documentation/rli-datasets/replace-partial-with-class branch August 11, 2023 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants