Skip to content

Performance enhancements #73

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 42 commits into from
Jun 2, 2025
Merged

Conversation

ahankinson
Copy link
Contributor

It's great to see that this project has some life in it now. I have been using it for a while and maintaining my own fork with some performance enhancements.

Primarily, the main enhancements have been to enable packratting (which seems to now be enabled) and pre-declaring and compiling the regexes in the natural language processor.

If the current benchmarks are to be believed, this leads to a significant speedup. All the tests pass, so I guess it's OK?

https://rism-digital.github.io/python-edtf/dev/bench/

I've gone through and tried to integrate the latest changes, but that was a big PR so another set of eyes would be helpful.

ahankinson and others added 15 commits April 13, 2021 18:52
Delivers significant performance improvements by caching previously computed results.
# Conflicts:
#	edtf/fields.py
#	edtf/jdutil.py
#	edtf/natlang/en.py
#	edtf/natlang/tests.py
#	edtf/parser/grammar.py
#	edtf/parser/parser_classes.py
#	edtf/parser/tests.py
#	pyproject.toml
#	setup.py
@ahankinson
Copy link
Contributor Author

Not sure why it's failing -- the tests all pass and work on my end.

Also added Andrew Hankinson to the authors list in pyproject.toml
@ahankinson
Copy link
Contributor Author

It occurs to me that a big part of the reason for the speed-up is the use of the functools lru_cache, which will cache the result of the same input string and skip the computation of that value, instead returning only the cached value. While this is useful in real-world contexts, it's probably not so useful in benchmark runs!

@ahankinson
Copy link
Contributor Author

@aweakley does this PR look OK?

Copy link
Member

@aweakley aweakley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thank you very much. I've just put a couple of comments - I think some things were duplicated in a merge at some point maybe? and one regex looks different.

I've had a pass at the Parser Classes file, but there are a lot of problems still to be sorted out.

I've added return types and argument types whereever it makes sense.

The "UncertainOrApproximate" class is a hot mess. There are boolean values with property and method calls associated with them, and I would be surprised if it actually works. However, it doesn't seem to be tested or implemented, so I can't figure out where to go from here.
This wasn't actually used anywhere!

Also removed a redundant regex group
@ahankinson
Copy link
Contributor Author

I've done a bit of cleanup, but the parser_classes file is still quite buggy. Unfortunately I don't quite know how to trigger the bugs, but I'm at least confident that I didn't make it much worse.

@aweakley
Copy link
Member

Thank you. I'll check it out.

@aweakley
Copy link
Member

@ahankinson The tests are passing for me, are there particular bugs you were worried about in your comment above?

@ahankinson
Copy link
Contributor Author

Yeah the tests pass but I don’t think they exercise the parser classes that much. I have the ruff checker running on that and it still highlights a bunch of stuff

@ahankinson
Copy link
Contributor Author

BTW, ruff has also updated some rules so I have updated it accordingly now.

@ahankinson
Copy link
Contributor Author

This seems to be going nowhere. I'm going to be adding more changes on my fork, so to prevent it from running here I'll close it.

@ahankinson ahankinson closed this May 26, 2025
@aweakley
Copy link
Member

Gah! sorry @ahankinson , I missed your comments. I'd like to merge but if I reopen this I think I'll pick up your most recent changes on your branch - are you happy for me to merge them too?

aweakley added a commit that referenced this pull request May 27, 2025
Temporarily ignore UP031 pending merge of #73
@ahankinson
Copy link
Contributor Author

Do you have any preference on whether I keep the lru_cache in the library? The performance improvements are significant, but I can understand not wanting a caching mechanism in a library.

The other changes I've been adding are fairly benign, so I'm happy for them to be merged.

Also removed the one regex and replaced it with the "replace" method on strings.
@aweakley
Copy link
Member

I think let's keep it in please.

@aweakley aweakley reopened this May 31, 2025
@ahankinson
Copy link
Contributor Author

I probably won’t be able to check it over until next week. I’m the meantime if you see something you want removed let me know.

@ahankinson
Copy link
Contributor Author

One thing I did in my branch is remove support in the CI for python 3.9, since a) it's EOL later this year, and b) from 3.10 onwards the type annotation Optional[foo] can be written foo | None, which I prefer.

@ahankinson ahankinson force-pushed the performance-enhancements branch from 4ad6171 to 9bd142d Compare June 2, 2025 10:09
@ahankinson
Copy link
Contributor Author

This should be ready for review now. I'm not sure why it's failing though.

@aweakley
Copy link
Member

aweakley commented Jun 2, 2025

That's great, thank you very much indeed.

@aweakley aweakley merged commit 95ebb72 into ixc:main Jun 2, 2025
0 of 4 checks passed
@aweakley
Copy link
Member

aweakley commented Jun 2, 2025

I think the change to the signature here:

def parse_edtf(
input_string: str,
parse_all: bool = True,
fail_silently: bool = False,
debug: bool | None = None,
):
(parseAll->parse_all) means that we should release this as a new major version, so I'll prepare that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants