
[mypyc] Add str.isdigit() primitive#20893

Open
VaggelisD wants to merge 1 commit into python:master from VaggelisD:str_isdigit


@VaggelisD
Contributor

This has the same issue as the str.isalnum() PR: the primitive is faster in the common cases, but it introduces a perf regression for sufficiently long strings (in the numbers below, the 100-character UCS-2 case drops to 0.8x):


| All-digit strings (100M calls each) | Python (s) | mypyc (s) | Speedup |
| --- | --- | --- | --- |
| length 1 (`'0'`) | 2.089 | 0.656 | 3.2x |
| length 10 (`'1234567890'`) | 2.475 | 1.028 | 2.4x |
| length 100 (`'5' * 100`) | 6.106 | 3.406 | 1.8x |
| length 1 (UCS-2: U+0660 ٠) | 2.110 | 0.734 | 2.9x |
| length 10 (UCS-2: U+0660 * 10) | 2.907 | 2.041 | 1.4x |
| length 100 (UCS-2: U+0660 * 100) | 10.596 | 13.887 | 0.8x |

| Non-digit strings (100M calls each) | Python (s) | mypyc (s) | Speedup |
| --- | --- | --- | --- |
| length 1 (`'a'`) | 2.022 | 0.585 | 3.5x |
| length 100 (`'a' * 100`) | 2.068 | 0.587 | 3.5x |
| length 100 (`'0' * 99 + 'a'`) | 7.304 | 3.474 | 2.1x |



bool CPyStr_IsDigit(PyObject *str) {
Py_ssize_t len = PyUnicode_GET_LENGTH(str);
Contributor Author

There's a recurring pattern across these primitives. Should we try to abstract their codegen?

I gave macros a shot to hide the per-kind for loop, though I guess we could go a step further and do the same for entire functions.
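
For readers following along, here is a rough, self-contained sketch of what "a macro that hides the per-kind loop" can look like. It is not the code from this PR: the `demo_*` and `DEMO_*` names are hypothetical, the buffers are plain arrays rather than `PyObject *` strings, and the predicate is ASCII-only, so unlike `str.isdigit()` it would reject digits such as U+0660.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 1-, 2- and 4-byte string kinds. */
enum demo_kind { DEMO_1BYTE = 1, DEMO_2BYTE = 2, DEMO_4BYTE = 4 };

/* Simplified ASCII-only check (assumption: a real primitive would need
   a Unicode-aware test instead). */
static inline bool demo_is_digit(uint32_t ch) {
    return ch >= '0' && ch <= '9';
}

/* Expands to a typed loop over `len` code units. Note that it returns
   false from the ENCLOSING function at the first failing unit. */
#define DEMO_CHECK_ALL(type, data, len, pred)        \
    do {                                             \
        const type *p_ = (const type *)(data);       \
        for (size_t i_ = 0; i_ < (len); i_++) {      \
            if (!pred(p_[i_]))                       \
                return false;                        \
        }                                            \
    } while (0)

static bool demo_str_isdigit(const void *data, size_t len, enum demo_kind kind) {
    if (len == 0)
        return false; /* matches "".isdigit() being False */
    assert(data != NULL);
    switch (kind) {
    case DEMO_1BYTE: DEMO_CHECK_ALL(uint8_t, data, len, demo_is_digit); break;
    case DEMO_2BYTE: DEMO_CHECK_ALL(uint16_t, data, len, demo_is_digit); break;
    case DEMO_4BYTE: DEMO_CHECK_ALL(uint32_t, data, len, demo_is_digit); break;
    default: return false; /* unknown kind */
    }
    return true; /* every code unit passed the predicate */
}
```

The hidden `return` inside the macro keeps each call site to one line, at the cost of control flow that is not visible where the macro is used.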

Collaborator

One option would be to have an inline function which gets passed a function pointer to represent the variable functionality, on the assumption that C compilers can simplify all the overhead away (not sure if this is the case, but it might well be).

Another idea would be to add a template for all of these functions in a comment at the top of the file, and we could just ask Claude Code or Codex to create another function based on the template for a new use case. And if we update the template, we could use a coding agent to update all instances of the template in the code. The problem with this is that there would be no automatic validation against things being consistent, but we could add some comments warning against manual edits.
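
As an illustration of the first option above (an inline function that receives a function pointer), a minimal sketch might look like the following. Everything named `demo_*` is hypothetical, the buffers are plain arrays instead of Python string objects, and the predicates are ASCII-only simplifications. Whether a given compiler actually inlines the indirect call is not guaranteed and would need to be checked in the generated code or in a benchmark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate type: one check per code point. */
typedef bool (*demo_pred)(uint32_t ch);

/* Read code unit i from a buffer whose units are `kind` bytes wide
   (1, 2 or 4), similar in spirit to CPython's PyUnicode_READ. */
static inline uint32_t demo_read(const void *data, int kind, size_t i) {
    switch (kind) {
    case 1: return ((const uint8_t *)data)[i];
    case 2: return ((const uint16_t *)data)[i];
    default: return ((const uint32_t *)data)[i];
    }
}

/* Shared loop: true only if the buffer is non-empty and every code
   unit satisfies `pred`. */
static inline bool demo_all_match(const void *data, size_t len, int kind,
                                  demo_pred pred) {
    assert(kind == 1 || kind == 2 || kind == 4);
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++) {
        if (!pred(demo_read(data, kind, i)))
            return false;
    }
    return true;
}

/* ASCII-only predicates, used here purely for illustration. */
static bool demo_digit(uint32_t ch) { return ch >= '0' && ch <= '9'; }
static bool demo_lower(uint32_t ch) { return ch >= 'a' && ch <= 'z'; }

/* Each "primitive" becomes a one-line wrapper around the shared loop. */
static bool demo_isdigit(const void *data, size_t len, int kind) {
    return demo_all_match(data, len, kind, demo_digit);
}

static bool demo_all_ascii_lower(const void *data, size_t len, int kind) {
    return demo_all_match(data, len, kind, demo_lower);
}
```

Compared with a macro, this keeps the control flow in ordinary functions that a debugger can step through, but it relies on the optimizer to remove the indirection.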

@VaggelisD
Contributor Author

VaggelisD commented Feb 25, 2026

Not entirely sure why this specific CI/CD job fails here. It seems to be a network issue?

It probably was one: the failure went away after rebasing.

Collaborator

@JukkaL JukkaL left a comment

Thanks for the PR! Looks good, but there are now some merge conflicts. Can you fix them?


