Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-128185: Align the grammar of the Decimal string constructor with float's #128315

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

HarryLHW
Copy link
Contributor

@HarryLHW HarryLHW commented Dec 28, 2024

This PR

  • removes the mention of underscores in the text
  • changes digits from digit [digit]... to (digit | "_")* digit (digit | "_")*
  • makes it a .. productionlist::

📚 Documentation preview 📚: https://cpython-previews--128315.org.readthedocs.build/en/128315/library/decimal.html#decimal.Decimal

Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually tested something and saw that Decimal("NaN__") is accepted as being a NaN. So, the grammar of NaN should also be changed. Initially, I thought that the grammar would be improved, but it might be easier to simply keep the mention in the text =/ (sorry @skirpichev, but you were probably right).

However, we can definitely use a production list instead.

Doc/library/decimal.rst Outdated Show resolved Hide resolved
exponentpart: indicator [sign] digits
infinity: "Infinity" | "Inf"
nan: "NaN" [`digits`] | "sNaN" [`digits`]
numericvalue: `decimalpart` [`exponentpart`] | `infinity`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do Sphinx production lists support numeric-value instead or should it be without any weird symbol? if not, use _ instead of - to separate words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx does not support -, but it does support _.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another thing I should perhaps add to my list of things to fix upstream... Thanks for checking.

HarryLHW and others added 2 commits December 29, 2024 00:37
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@HarryLHW
Copy link
Contributor Author

HarryLHW commented Dec 28, 2024

I actually tested something and saw that Decimal("NaN__") is accepted as being a NaN. So, the grammar of NaN should also be changed. Initially, I thought that the grammar would be improved, but it might be easier to simply keep the mention in the text =/ (sorry @skirpichev, but you were probably right).

However, we can definitely use a production list instead.

The implementation is

for (; j < len; j++) {
ch = PyUnicode_READ(kind, data, j);
if (ignore_underscores && ch == '_') {
continue;
}

If I am correct, this ignores every '_' in the string no matter where it is, similar to string.replace('_', ''). Decimal('__I__N__F__') and Decimal('__N__A__N__') are also valid. It is difficult to generalize an expression for NaN or Infinity.

EDIT:
It is not difficult. The expression could be "_"* "N" "_"* "a" "_"* "N" "_"*, but very confusing.

@picnixz
Copy link
Contributor

picnixz commented Dec 28, 2024

If I am correct, this ignores every '' in the string no matter where it is, similar to string.replace('', ''). Decimal('I__N__F') and Decimal('N__A__N') are also valid. It is difficult to generalize an expression for NaN or Infinity.

Considering this, I think we should keep the previous text. Sorry for this blunder but I didn't know that it also applied to NaN. However, we can use a production list.

@HarryLHW
Copy link
Contributor Author

If I am correct, this ignores every '' in the string no matter where it is, similar to string.replace('', ''). Decimal('I__N__F') and Decimal('N__A__N') are also valid. It is difficult to generalize an expression for NaN or Infinity.

Considering this, I think we should keep the previous text. Sorry for this blunder but I didn't know that it also applied to NaN. However, we can use a production list.

OK. Should I keep the underscores in digits?
digits ::= (digit | "_")* digit (digit | "_")*

@picnixz
Copy link
Contributor

picnixz commented Dec 28, 2024

OK. Should I keep the underscores in digits?

No, let's just revert the changes. It would be more confusing if we keep it in the grammar (especially considering how the pattern would actually read). Just transform the current grammar into a production list. However, before that, I'll ask some docs expert for their opinion @ncoghlan @willingc (the problem with me is that I wouldn't mind the change but we're already saying "we trim whitespaces" so we're already "pre-processing" the input to ease the grammar specifications).

Copy link
Member

@skirpichev skirpichev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it was already noted in the issue thread, I don't think there is a bug.

New docs are inaccurate, as already noted by Benedict. While it's possible to "fix" this, I doubt that will be an improvement: new grammar will be too complex, perhaps misleading and reveal implementation details, which should be rather hidden.

@picnixz
Copy link
Contributor

picnixz commented Dec 29, 2024

To summarize (unless our docs experts advise differently):

  • let's not change the grammar or the text
  • but let's change the grammar markup into a production list markup

I would recommend merging #128323 first and then this one or do both in one PR.

Copy link
Member

@skirpichev skirpichev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, except for few grammar fixes.

Though, maybe we want instead reference terms from other grammars, like https://docs.python.org/3/reference/lexical_analysis.html#floating-point-literals does (almost everything is same as for grammar in the float() docs: https://docs.python.org/3/library/functions.html#float).

But lets see what @picnixz think on this first. Maybe for readability it's better to repeat all in decimal docs.

Doc/library/decimal.rst Outdated Show resolved Hide resolved
infinity: "Infinity" | "Inf"
nan: "NaN" [`digits`] | "sNaN" [`digits`]
numeric_value: `decimal_part` [`exponent_part`] | `infinity`
numeric_string: [`sign`] `numeric_value` | [`sign`] `nan`

Other Unicode decimal digits are also permitted where ``digit``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can reference grammar term instead. Though, this may add a conflict with #128323, so - you can include that change here as well.

Doc/library/decimal.rst Outdated Show resolved Hide resolved
HarryLHW and others added 2 commits December 31, 2024 22:35
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
@picnixz
Copy link
Contributor

picnixz commented Dec 31, 2024

Maybe for readability it's better to repeat all in decimal docs.

I don't have a strong preference. For instance, I personally prefer digit [digit]... instead of (digit)+ but it's a matter of taste. Now, since we're far from the page about floats, maybe it's better to keep the verbose 0 | 1 | ... etc?

@skirpichev
Copy link
Member

since we're far from the page about floats, maybe it's better to keep the verbose

Yes that were my thoughts as well. I think it's ok to repeat grammar terms, even if they introduced before on other pages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review docs Documentation in the Doc dir skip news
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

3 participants