draft for pandas examples support #7

OriolAbril · 2024-06-22T17:54:13Z

First attempt at grammar extension to support
the examples in
https://pandas.pydata.org/docs/development/contributing_docstring.html#section-3-parameters.

The only one that doesn't parse is "float, decimal.Decimal or None". I am not sure if it
is possible to "look ahead" for an "or", or to start from the rightmost comma and try to parse
the text left of it as a type: if it works go ahead, otherwise move one comma to the left and try again.
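For illustration, the "start from the rightmost comma" idea could be sketched in plain Python as below. This is only a rough sketch, not how docstub or its Lark grammar works: `looks_like_type`, `split_type`, and the `RESERVED` set are hypothetical names, and the regular expression is a crude stand-in for "parses as a type".

```python
import re

NAME = r"[A-Za-z_][\w.]*"
# Hypothetical stand-in for "does this parse as a type?":
# dotted names joined by "," or by the word "or".
TYPE_LIST = re.compile(rf"^{NAME}(?:\s*,\s*{NAME}|\s+or\s+{NAME})*$")
# Assumption: these words mark the end of the type part.
RESERVED = {"optional", "default"}


def looks_like_type(text: str) -> bool:
    text = text.strip()
    if TYPE_LIST.match(text) is None:
        return False
    return not RESERVED.intersection(re.findall(NAME, text))


def split_type(doctype: str) -> tuple[str, str]:
    """Return (type part, remainder), trying the rightmost comma first."""
    if looks_like_type(doctype):
        return doctype.strip(), ""
    cut = len(doctype)
    while True:
        # Move one comma to the left on every iteration.
        cut = doctype.rfind(",", 0, cut)
        if cut == -1:
            raise ValueError(f"no type found in {doctype!r}")
        if looks_like_type(doctype[:cut]):
            return doctype[:cut].strip(), doctype[cut + 1:].strip()


print(split_type("float, decimal.Decimal or None, optional, extra info"))
# → ('float, decimal.Decimal or None', 'optional, extra info')
```

The sketch needs a list of reserved words to know where the type ends, which hints at why this kind of backtracking is awkward to express in the grammar itself.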

That being said, dict of {str : int} parses everything but it doesn't take into account
that the types left of the colon are key types and those right of the colon are value types. I have no idea if this
should happen at a grammar level, python processing or both.

Lastly, I did some changes to literals to make sure there can be no confusion between
dict subtypes or literals (colons being inside the curly brackets being the only indicator
seemed like a bad idea). I think this is also a closer match to numpydoc, as from how I understand
the description, {} for literals should only be used when only a handful of options are allowed
and therefore is incompatible with type information of any kind.

lagru

Thanks @OriolAbril!

The only one that doesn't parse is "float, decimal.Decimal or None". I am not sure if it is possible to "look ahead" for an "or", or to start from the rightmost comma and try to parse the text left of it as a type: if it works go ahead, otherwise move one comma to the left and try again.

I don't think I want that to work. Especially since

float or decimal.Decimal or None, optional, extra info

is perfectly "human readable" and something like

float, decimal.Decimal or None, optional, extra info

not so much. 🤔

lagru · 2024-06-23T06:54:28Z

examples/example_pkg/_basic.py

+    Parameters
+    ----------
+    a1 : {"A", "B", "C"}
+    a2 : {0 or "index", 1 or "columns", None}, default None


This pandas type syntax seems a bit dubious. I guess this is equivalent to

Suggested change

-    a2 : {0 or "index", 1 or "columns", None}, default None
+    a2 : {0, "index", 1, "columns", None}, default None

and the alternating "or" is for grouping of equivalent values?

This might be a case I'd leave a third party to configure itself and not support it directly in docstub.

I think they do use the "or" to indicate literals with equivalent meaning and the comma to separate literals with different meanings. I have never used "or" inside literals, though.
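If a third party wanted to support the pandas style itself, one option is to flatten it before type parsing, treating "or" and "," as the same separator. The following is a minimal sketch under that assumption; `flatten_literals` is a hypothetical helper, not part of docstub, and it does not handle string literals that themselves contain commas or the word "or".

```python
import re


def flatten_literals(doctype: str) -> str:
    """Turn '{0 or "index", None}' into 'Literal[0, "index", None]'."""
    text = doctype.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not a literal set: {doctype!r}")
    # Treat "," and the word "or" as plain separators between values.
    items = re.split(r"\s*,\s*|\s+or\s+", text[1:-1].strip())
    return "Literal[" + ", ".join(items) + "]"


print(flatten_literals('{0 or "index", 1 or "columns", None}'))
# → Literal[0, "index", 1, "columns", None]
```

The grouping of equivalent values is lost in the result, which matches the reading that both spellings describe the same set of allowed values.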

lagru · 2024-06-23T07:00:44Z

src/docstub/doctype.lark

 ?start : doctype

-doctype : type_or ("," optional)? ("," extra_info)?
+doctype : (literals | type_or) ("," optional)? ("," extra_info)?


Lastly, I did some changes to literals to make sure there can be no confusion between
dict subtypes or literals (colons being inside the curly brackets being the only indicator
seemed like a bad idea). I think this is also a closer match to numpydoc, as from how I understand
the description, {} for literals should only be used when only a handful of options are allowed
and therefore is incompatible with type information of any kind.

Restricting literals to the top-level is probably sensible? Though, currently it's nice that something like

dict[{"a", "b"}, int] -> dict[Literal["a", "b"], int]

works. Do you find that readable?

Though,

dict of {{"a", "b"}: int} -> dict[Literal["a", "b"], int]

working is something. 😅

See 5a28828.

I had never seen nor considered that option, but now that I think about it there are a couple of places I could use it. If you use it or feel strongly about it, maybe we could use something similar to arrays for mappings in the sense a subset of names are allowed, and only when one of those names is present can curly brackets indicate two subtypes separated by a colon. My guess is dict and mapping alone will cover 90% of the cases; maybe mutablemapping could also be there.

Plus a way to extend those names for both dict and array (to allow tensor for example in projects that use it)

maybe we could use something similar to arrays for mappings in the sense a subset of names are allowed

I think it might be more confusing if we restricted who can use the mapping of {KT: VT} syntax? 🤔

Happy to keep literals as top level option only

lagru · 2024-06-23T07:04:14Z

src/docstub/doctype.lark

-container_of : NAME "of" type_or
+container_of : NAME "of" ( type_or | dict_subtypes )
+
+dict_subtypes : "{" type_or ":" type_or "}"


Actually, you made me realize that we can streamline this and get rid of dict_subtypes and even the existing container_of!

contains: "[" type_or ("," type_or)* "]"
        | "[" type_or "," PY_ELLIPSES "]"
        | "of" type
        | "of" "(" type_or ("," type_or)* ")"
        | "of" "{" type_or ":" type_or "}"

That setup also means one has to enclose the types in (...) to allow multiple types inside the container. That gets rid of the ambiguity with the top-level "or".

(BTW amazing that GitHub highlights Lark syntax!)

See 3908f3f.

That is great, I'll also open an issue or PR to numpydoc itself with these at some point. I have never known how "list of int or float" is supposed to be interpreted: (list of int) or float vs list of (int or float).

Intuitively I'd say (list of int) or float. I don't think numpydoc worries about those yet and maybe they don't need to.
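The two readings can be made concrete with a small recursive-descent sketch in plain Python. It only mimics the rule described in this thread ("of" takes a single type unless the types are wrapped in parentheses); it is not docstub's Lark grammar, `to_annotation` is a hypothetical name, and the tokenizer silently ignores anything except names, parentheses, and commas.

```python
import re

TOKEN = re.compile(r"[A-Za-z_][\w.]*|[(),]")


def to_annotation(doctype: str) -> str:
    tokens = TOKEN.findall(doctype)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of doctype")
        pos += 1
        return tok

    def type_or():
        # type_or: single_type ("or" single_type)*
        parts = [single_type()]
        while peek() == "or":
            take()
            parts.append(single_type())
        return " | ".join(parts)

    def single_type():
        name = take()
        if name in {"or", "of", "(", ")", ","}:
            raise ValueError(f"expected a type name, got {name!r}")
        if peek() != "of":
            return name
        take()  # consume "of"
        if peek() == "(":
            # Parentheses are required for unions or several subtypes.
            take()
            args = [type_or()]
            while peek() == ",":
                take()
                args.append(type_or())
            if take() != ")":
                raise ValueError("expected ')'")
        else:
            # Without parentheses, "of" binds to exactly one type.
            args = [single_type()]
        return f"{name}[{', '.join(args)}]"

    result = type_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


print(to_annotation("list of int or float"))    # → list[int] | float
print(to_annotation("list of (int or float)"))  # → list[int | float]
```

With this rule, the unparenthesized form always resolves to (list of int) or float, so the reader never has to guess.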

Part of the aim behind docstub is also to create some kind of standard, with the understanding that "hey, if you want something more custom you need to configure it yourself".

I don't remember who but someone from NumPyDoc told me at some point they'd be happy to go with whatever recommendation docstub settles on.

lagru · 2024-06-23T07:08:13Z

src/docstub/doctype.lark

-container_of : NAME "of" type_or
+container_of : NAME "of" ( type_or | dict_subtypes )
+
+dict_subtypes : "{" type_or ":" type_or "}"


That being said, dict of {str : int} parses everything but it doesn't take into account that the types left of the colon are key types and those right of the colon are value types. I have no idea if this should happen at a grammar level, python processing or both.

It doesn't have to, because Python's type annotation for dicts, dict[key_type, value_type], only distinguishes whether a type is used for the key or the value by the order in which they appear. So as the order in {key_type : value_type} is the same, we don't have to do anything.
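As a quick illustration of why order alone is enough, here is a minimal sketch that rewrites the "of {KT : VT}" form into a subscripted annotation without ever labelling which side is the key. `dict_annotation` and the regular expression are hypothetical, not docstub code, and nested curly brackets (such as literals inside the key) are deliberately not handled.

```python
import re

# NAME "of" "{" key ":" value "}", with no nested braces or colons.
DICT_OF = re.compile(
    r"^(?P<name>\w+) of \{\s*(?P<key>[^:{}]+?)\s*:\s*(?P<value>[^:{}]+?)\s*\}$"
)


def dict_annotation(doctype: str) -> str:
    match = DICT_OF.match(doctype.strip())
    if match is None:
        raise ValueError(f"not a 'NAME of {{KT: VT}}' doctype: {doctype!r}")
    # Emit the two types in the order they were written; the position in
    # the subscript is what makes them key and value types.
    key = match["key"].replace(" or ", " | ")
    value = match["value"].replace(" or ", " | ")
    return f"{match['name']}[{key}, {value}]"


print(dict_annotation("dict of {str : int}"))  # → dict[str, int]
```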

lagru · 2024-06-23T10:08:17Z

Note, I've opted to incorporate your suggestions and add them to the WIP #2. The classic PR-based contribution workflow may be a bit too clunky while I'm still very much refactoring and extending the prototype.

OriolAbril · 2024-06-23T13:02:52Z

Sounds great, just wanted to get the ball rolling.

I forgot to comment on the parsing of defaults: how would you feel about allowing a space in addition to the colon and equals sign? That would mean changing the rule to "default" ("=" | ":")? literal. All 3 are allowed and equivalent according to numpydoc. I am not sure parsing of defaults plays any role but figured I'd mention it.

lagru · 2024-06-23T14:47:29Z

Happy to use "default" ("=" | ":")? literal. 👍
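The three spellings that rule accepts can be checked with a rough regular-expression equivalent. This is only a sketch to show the intent: `parse_default` is a hypothetical helper, and the pattern accepts any non-empty remainder as the literal rather than validating it the way a grammar would.

```python
import re

# "default", then either "=" / ":" (optionally padded) or plain whitespace,
# then the literal.
DEFAULT = re.compile(r"^default(?:\s*[=:]\s*|\s+)(?P<literal>\S.*)$")


def parse_default(text: str):
    """Return the literal after 'default', or None if the text does not match."""
    match = DEFAULT.match(text.strip())
    return match["literal"] if match else None


for spelling in ("default None", "default: None", "default=None"):
    print(spelling, "->", parse_default(spelling))
# All three print "None" as the parsed literal.
```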

OriolAbril · 2024-06-28T08:35:11Z

I think this can be closed now. Let me know if at some point you want me to test the other PR

draft for pandas examples support

bdaae76

OriolAbril mentioned this pull request Jun 22, 2024

Library scope #6

Open

lagru reviewed Jun 23, 2024

OriolAbril closed this Jun 28, 2024

lagru added the enhancement New feature or functionality label Sep 19, 2024
