draft for pandas examples support #7
Thanks @OriolAbril!
The only one that doesn't parse is
float, decimal.Decimal or None. I'm not sure if it is possible to "look ahead" for an "or", or to start from the rightmost comma and try to parse as a type: if it works, go ahead; otherwise move one comma to the left and try again.
I don't think I want that to work. Especially since
float or decimal.Decimal or None, optional, extra info
is perfectly "human readable" and something like
float, decimal.Decimal or None, optional, extra info
not so much. 🤔
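For illustration, here is a minimal sketch of why the comma-separated form is awkward, assuming the doctype is split at the first top-level comma, roughly as the rule doctype : type_or ("," optional)? ("," extra_info)? suggests. The helper name split_doctype is hypothetical and this is not how docstub is implemented:

```python
def split_doctype(doctype):
    """Split a doctype into (type, is_optional, extra_info).

    Hypothetical helper for illustration only. The type part ends at the
    first comma that is not nested inside brackets, parentheses or braces.
    """
    depth = 0
    cut = len(doctype)
    for index, char in enumerate(doctype):
        if char in "[({":
            depth += 1
        elif char in "])}":
            depth -= 1
        elif char == "," and depth == 0:
            cut = index
            break
    type_part = doctype[:cut].strip()
    rest = doctype[cut + 1:].strip()

    is_optional = rest == "optional" or rest.startswith("optional,")
    if is_optional:
        # Drop the "optional" marker and the comma that follows it.
        rest = rest[len("optional"):].lstrip(",").strip()
    return type_part, is_optional, rest


good = split_doctype("float or decimal.Decimal or None, optional, extra info")
bad = split_doctype("float, decimal.Decimal or None, optional, extra info")
```

With the "or"-separated form the whole union ends up in the type part. With the comma-separated form only float is treated as the type, and the remainder is indistinguishable from free-form extra info.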
| Parameters | ||
| ---------- | ||
| a1 : {"A", "B", "C"} | ||
| a2 : {0 or "index", 1 or "columns", None}, default None |
This pandas type syntax seems a bit dubious. I guess this is equivalent to
| a2 : {0 or "index", 1 or "columns", None}, default None | |
| a2 : {0, "index", 1, "columns", None}, default None |
and the alternating or is for grouping of equivalent values?
This might be a case I'd leave a third party to configure itself and not support it directly in docstub.
I think they do use "or" to separate literals with an equivalent meaning, and the comma to separate literals with different meanings. I have never used "or" inside literals myself, though.
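A minimal sketch of the equivalence discussed above, assuming "or" and "," are both treated as plain separators inside the curly brackets. The function name literals_to_annotation is hypothetical, and the naive split would break on string literals that themselves contain a comma or " or ":

```python
import re


def literals_to_annotation(literals):
    """Turn a `{...}` literal set into a `Literal[...]` annotation string.

    Hypothetical helper for illustration only. The grouping of equivalent
    values expressed by "or" is discarded, since `typing.Literal` has no
    way to represent it.
    """
    body = literals.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"not a literal set: {literals!r}")
    # Split on commas and on the word "or" surrounded by whitespace.
    items = re.split(r"\s*,\s*|\s+or\s+", body[1:-1].strip())
    return f"Literal[{', '.join(items)}]"


with_or = literals_to_annotation('{0 or "index", 1 or "columns", None}')
without_or = literals_to_annotation('{0, "index", 1, "columns", None}')
```

Both spellings produce the same annotation, which is why the "or" only carries meaning for human readers.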
| ?start : doctype | ||
|
|
||
| doctype : type_or ("," optional)? ("," extra_info)? | ||
| doctype : (literals | type_or) ("," optional)? ("," extra_info)? |
Lastly, I did some changes to literals to make sure there can be no confusion between
dict subtypes or literals (colons being inside the curly brackets being the only indicator
seemed like a bad idea). I think this is also a closer match to numpydoc, as from how I understand
the description, {} for literals should only be used when only a handful of options are allowed
and therefore is incompatible with type information of any kind.
Restricting literals to the top-level is probably sensible? Though, currently it's nice that something like
dict[{"a", "b"}, int] -> dict[Literals["a", "b"], int]
works. Do you find that readable?
Though,
dict of {{"a", "b"}: int} -> dict[Literals["a", "b"], int]
working is something. 😅
See 5a28828.
I had never seen nor considered that option, but now that I think about it there are a couple of places where I could use it. If you use it or feel strongly about it, maybe we could do something similar to arrays for mappings: only a fixed subset of names is allowed, and only after one of those names can curly brackets indicate two subtypes separated by a colon. My guess is that dict and mapping alone will cover 90% of the cases; maybe mutablemapping could also be included.
There should also be a way to extend those names for both dict and array (to allow tensor, for example, in projects that use it).
maybe we could use something similar to arrays for mappings in the sense a subset of names are allowed
I think it might be more confusing if we restricted who can use the mapping of {KT: VT} syntax? 🤔
Happy to keep literals as top level option only
| container_of : NAME "of" type_or | ||
| container_of : NAME "of" ( type_or | dict_subtypes ) | ||
|
|
||
| dict_subtypes : "{" type_or ":" type_or "}" |
Actually, you made me realize that we can streamline this and get rid of dict_subtypes and even the existing container_of!
contains: "[" type_or ("," type_or)* "]"
| "[" type_or "," PY_ELLIPSES "]"
| "of" type
| "of" "(" type_or ("," type_or)* ")"
| "of" "{" type_or ":" type_or "}"
That setup also means that multiple types inside the container have to be enclosed in (...). That gets rid of the ambiguity with the top-level "or".
(BTW amazing that GitHub highlights Lark syntax!)
See 3908f3f.
That is great, I'll also open an issue or PR to numpydoc itself with these at some point. I have never known how "list of int or float" is supposed to be interpreted: (list of int) or float vs list of (int or float).
Intuitively I'd say (list of int) or float. I don't think numpydoc worries about those yet and maybe they don't need to.
Part of the aim behind docstub is also to create some kind of standard, with the understanding that "hey, if you want something more custom you need to configure it yourself".
I don't remember who but someone from NumPyDoc told me at some point they'd be happy to go with whatever recommendation docstub settles on.
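To make the two readings concrete, here is a small recursive-descent sketch in plain Python. It assumes the binding discussed in this thread: "of" takes a single type unless the contents are wrapped in (...), and "or" is only a top-level separator. All function names are hypothetical, and this is a simplification, not docstub's Lark grammar:

```python
import re

# A token is a (possibly dotted) name or one of "(", ")", ",".
TOKEN = re.compile(r"\s*([A-Za-z_][\w.]*|[(),])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_type_or(tokens):
    """type_or: type ("or" type)*"""
    parts = [parse_type(tokens)]
    while tokens and tokens[0] == "or":
        tokens.pop(0)
        parts.append(parse_type(tokens))
    return " | ".join(parts)


def parse_type(tokens):
    """type: NAME ("of" type | "of" "(" type_or ("," type_or)* ")")?"""
    if not tokens or tokens[0] in ("(", ")", ",", "or", "of"):
        raise ValueError(f"expected a type name, got {tokens[:1]}")
    name = tokens.pop(0)
    if not tokens or tokens[0] != "of":
        return name
    tokens.pop(0)  # consume "of"
    if tokens and tokens[0] == "(":
        tokens.pop(0)
        args = [parse_type_or(tokens)]
        while tokens and tokens[0] == ",":
            tokens.pop(0)
            args.append(parse_type_or(tokens))
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("expected ')'")
        return f"{name}[{', '.join(args)}]"
    # Without parentheses, "of" binds to exactly one type.
    return f"{name}[{parse_type(tokens)}]"


def translate(doctype):
    tokens = tokenize(doctype)
    result = parse_type_or(tokens)
    if tokens:
        raise ValueError(f"unexpected trailing tokens: {tokens}")
    return result


unparenthesized = translate("list of int or float")
parenthesized = translate("list of (int or float)")
```

Under these assumptions the unparenthesized form is read as (list of int) or float, and the parentheses are what opt in to the other reading.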
| container_of : NAME "of" type_or | ||
| container_of : NAME "of" ( type_or | dict_subtypes ) | ||
|
|
||
| dict_subtypes : "{" type_or ":" type_or "}" |
That being said,
dict of {str : int} parses everything, but it doesn't take into account that the types left of the colon are key types and those right of the colon are value types. I have no idea if this should happen at the grammar level, in Python processing, or both.
It doesn't have to, because Python's type annotation for dicts, dict[key_type, value_type], distinguishes key types from value types only by the order in which they appear. Since the order in {key_type : value_type} is the same, we don't have to do anything.
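A minimal sketch of that point, using a regular expression instead of the real grammar. The pattern and the name mapping_to_annotation are hypothetical, and nested types on either side of the colon are deliberately not handled:

```python
import re

# NAME "of" "{" key ":" value "}" with simple, non-nested key and value types.
MAPPING = re.compile(
    r"^\s*(?P<name>[A-Za-z_][\w.]*)\s+of\s+"
    r"\{\s*(?P<key>[^:{}]+?)\s*:\s*(?P<value>[^:{}]+?)\s*\}\s*$"
)


def mapping_to_annotation(doctype):
    """Rewrite `NAME of {KT : VT}` as `NAME[KT, VT]`, keeping the order."""
    match = MAPPING.match(doctype)
    if match is None:
        raise ValueError(f"not a mapping doctype: {doctype!r}")
    # Key first, value second: the same order Python's dict[KT, VT] expects.
    return f"{match['name']}[{match['key']}, {match['value']}]"


annotation = mapping_to_annotation("dict of {str : int}")
```

No key or value bookkeeping is needed; the translation only has to preserve the left-to-right order.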
Note, I've opted to incorporate your suggestions and add them to the WIP #2. The classic PR-based contribution workflow may be a bit too clunky while I'm still very much refactoring and extending the prototype.
Sounds great, just wanted to get the ball rolling. I forgot to comment on the parsing of defaults: how would you feel about allowing a space in addition to the colon and equals sign? Thus changing to
Happy to use
I think this can be closed now. Let me know if at some point you want me to test the other PR.
First attempt at a grammar extension to support the examples in
https://pandas.pydata.org/docs/development/contributing_docstring.html#section-3-parameters.
The only one that doesn't parse is
float, decimal.Decimal or None. I'm not sure if it is possible to "look ahead" for an "or", or to start from the rightmost comma and try to parse as a type: if it works, go ahead; otherwise move one comma to the left and try again.
That being said,
dict of {str : int} parses everything, but it doesn't take into account that the types left of the colon are key types and those right of the colon are value types. I have no idea if this should happen at the grammar level, in Python processing, or both.
Lastly, I made some changes to literals to make sure there can be no confusion between dict subtypes and literals (a colon inside the curly brackets being the only indicator seemed like a bad idea). I think this is also a closer match to numpydoc: as I understand the description, {} for literals should only be used when only a handful of options are allowed, and it is therefore incompatible with type information of any kind.