Replace `partial` applications of `Dataset` with `Dataset` subclasses #1145

gnn · 2023-08-09T22:36:30Z

This should finally enable documenting those subclasses.

[PEP 8][0] actually specifies a maximum line length of 72 characters for "flowing long blocks of text with fewer structural restrictions (docstrings or comments)". I extended that to 76 characters for indented docstrings, because the reason for having 72 characters is so that the eyes don't have to travel horizontally too much. Indentation basically moves the start of the text block to the right, so for text blocks like this, the line length limit can move with the indentation. Some of these lines where even beyond 79 characters, so I had to touch them anyways to make `flake8` happy and while at it, I also changed them to be (mostly) PEP 8 compliant. [0]: https://peps.python.org/pep-0008/#maximum-line-length

Moving `OsmBuildingsSynthetic.n_amenities_inside.type` into a variable is done to work around a bug in `black`. Without this, `black` would always reformat the line to be exactly 80 characters, even if a limit of 79 is set, making `flake8` complain.

This helps when specifying subclasses of `Dataset`. Since there was already a `Tasks` class inside the module, that class got renamed to `Tasks_` because it could not be used for the purposes of the type.

Make these subclasses `@dataclass`es, just like the original `Dataset`, so that default values for the fields are picked up by the constructor. That way we don't have to specify a constructor that calls the superclass constructor, but can rely on the default implementation for `__init__`. The `dependencies=[]` line present in nearly all uses of `partial` has been removed because it was unnecessary. The `dependencies` parameter is usually specified when the datasets are instantiated and even if not, an empty list of dependencies is the default anyways. Actually, an empty tuple is the default, but that doesn't make a difference. The `tasks=(define_mv_grid_districts)` line has been changed to drop the parenthesis because the parenthesis where superfluous in this case. The `tasks=(download_mastr_data,)` line has not been changed, because even though `Dataset`s should handle a callable and a tuple containing exactly one callable identically for the tasks attribute, I didn't want to risk introducing any bugs.

gnn added 6 commits August 9, 2023 18:20

Shorten commented line to be below 79 characters

a1423b9

Shorten line to below 79 characters

3aae1d4

Moving `OsmBuildingsSynthetic.n_amenities_inside.type` into a variable is done to work around a bug in `black`. Without this, `black` would always reformat the line to be exactly 80 characters, even if a limit of 79 is set, making `flake8` complain.

Reformat with black

e0a702b

Add a Tasks type alias to the datasets module

4562882

This helps when specifying subclasses of `Dataset`. Since there was already a `Tasks` class inside the module, that class got renamed to `Tasks_` because it could not be used for the purposes of the type.

gnn requested a review from birgits August 9, 2023 22:36

birgits approved these changes Aug 10, 2023

View reviewed changes

birgits merged commit ddadce1 into documentation/rli_datasets Aug 10, 2023
2 of 3 checks passed

gnn deleted the documentation/rli-datasets/replace-partial-with-class branch August 11, 2023 00:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace `partial` applications of `Dataset` with `Dataset` subclasses #1145

Replace `partial` applications of `Dataset` with `Dataset` subclasses #1145

gnn commented Aug 9, 2023

Replace partial applications of Dataset with Dataset subclasses #1145

Replace partial applications of Dataset with Dataset subclasses #1145

Conversation

gnn commented Aug 9, 2023

Replace `partial` applications of `Dataset` with `Dataset` subclasses #1145

Replace `partial` applications of `Dataset` with `Dataset` subclasses #1145