Performance enhancements #73

ahankinson · 2024-08-13T15:24:22Z

It's great to see that this project has some life in it now. I have been using it for a while and maintaining my own fork with some performance enhancements.

Primarily, the main enhancements have been to enable packratting (which seems to now be enabled) and pre-declaring and compiling the regexes in the natural language processor.
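
Packrat parsing memoizes the result of each grammar rule at each input position. A backtracking parser therefore never re-runs the same rule on the same span. In pyparsing this is switched on with `ParserElement.enable_packrat()`. Pre-compiling regexes moves the `re.compile` work out of the call path. Python's `re` module already caches recently used patterns, so most of the saving is that cache lookup. The following is a minimal stdlib-only sketch of both ideas. It is not python-edtf's actual grammar: the `YYYY[-MM][/YYYY[-MM]]` toy grammar and every name in it are hypothetical.

```python
import re
from functools import lru_cache

# Compiled once at import time instead of on every call (hypothetical patterns).
YEAR_RE = re.compile(r"\d{4}")
MONTH_RE = re.compile(r"-(\d{2})")


def make_interval_parser(text: str):
    """Toy packrat parser for 'YYYY[-MM]' or 'YYYY[-MM]/YYYY[-MM]'."""
    calls = {"date": 0}  # counts how often the date rule actually runs

    @lru_cache(maxsize=None)  # memo table: start position -> (value, next_pos) or None
    def parse_date(pos: int):
        calls["date"] += 1
        year = YEAR_RE.match(text, pos)
        if year is None:
            return None
        value, pos = year.group(), year.end()
        month = MONTH_RE.match(text, pos)
        if month is not None:
            value, pos = f"{value}-{month.group(1)}", month.end()
        return value, pos

    def parse_interval(pos: int = 0):
        # Alternative 1: date "/" date
        first = parse_date(pos)
        if first is not None and text.startswith("/", first[1]):
            second = parse_date(first[1] + 1)
            if second is not None:
                return (first[0], second[0]), second[1]
        # Alternative 2: a single date. Backtracking revisits position `pos`,
        # but the memo table answers it without running parse_date again.
        return parse_date(pos)

    return parse_interval, calls
```

With the memo table, parsing `"1984-05"` calls `parse_date` only once, even though both alternatives start at position 0. Without `lru_cache`, the fallback alternative would redo that work.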

If the current benchmarks are to be believed, this leads to a significant speedup. All the tests pass, so I guess it's OK?

https://rism-digital.github.io/python-edtf/dev/bench/

I've gone through and tried to integrate the latest changes, but that was a big PR so another set of eyes would be helpful.

ahankinson · 2024-08-14T08:13:26Z

Not sure why it's failing -- the tests all pass and work on my end.

ahankinson · 2024-08-15T09:37:55Z

It occurs to me that a big part of the reason for the speed-up is the use of functools' lru_cache. It caches the result for each input string, so a repeat call with the same string skips the computation and returns the cached value instead. While this is useful in real-world contexts, it probably inflates the benchmark numbers, since benchmark runs tend to parse the same strings over and over!

ahankinson · 2024-09-03T15:49:39Z

@aweakley does this PR look OK?

aweakley

This looks great! Thank you very much. I've left a couple of comments: I think some things may have been duplicated in a merge at some point, and one regex looks different.

edtf/parser/parser_classes.py

edtf/natlang/en.py

I've had a pass at the Parser Classes file, but there are still a lot of problems to sort out. I've added return types and argument types wherever it makes sense. The "UncertainOrApproximate" class is a hot mess: there are boolean values with property and method calls associated with them, and I would be surprised if it actually works. However, it doesn't seem to be tested or implemented, so I can't figure out where to go from here.

ahankinson · 2024-09-12T13:48:01Z

I've done a bit of cleanup, but the parser_classes file is still quite buggy. Unfortunately I don't quite know how to trigger the bugs, but I'm at least confident that I didn't make it much worse.

aweakley · 2024-09-12T23:39:12Z

Thank you. I'll check it out.

aweakley · 2024-09-25T00:36:25Z

@ahankinson The tests are passing for me, are there particular bugs you were worried about in your comment above?

ahankinson · 2024-09-25T11:33:11Z

Yeah, the tests pass, but I don’t think they exercise the parser classes that much. I have the ruff checker running on that file and it still highlights a bunch of stuff.

ahankinson · 2025-01-17T10:13:36Z

BTW, ruff has also changed some of its rules, so I've updated things accordingly.

ahankinson · 2025-05-26T11:49:34Z

This seems to be going nowhere. I'm going to be adding more changes on my fork, so to prevent it from running here I'll close it.

aweakley · 2025-05-26T23:26:47Z

Gah! Sorry @ahankinson, I missed your comments. I'd like to merge, but if I reopen this I think I'll pick up your most recent changes on your branch. Are you happy for me to merge them too?

ahankinson · 2025-05-27T06:43:28Z

Do you have any preference on whether I keep the lru_cache in the library? The performance improvements are significant, but I can understand not wanting a caching mechanism in a library.

The other changes I've been adding are fairly benign, so I'm happy for them to be merged.

aweakley · 2025-05-30T04:56:37Z

I think let's keep it in please.

ahankinson · 2025-05-31T07:59:45Z

I probably won’t be able to check it over until next week. In the meantime, if you see something you want removed, let me know.

ahankinson · 2025-06-02T10:04:40Z

One thing I did in my branch is remove support for Python 3.9 in the CI, since a) it reaches EOL later this year, and b) from 3.10 onwards the type annotation Optional[foo] can be written as foo | None, which I prefer.

ahankinson · 2025-06-02T14:00:36Z

This should be ready for review now. I'm not sure why it's failing though.

aweakley · 2025-06-02T23:34:03Z

That's great, thank you very much indeed.

aweakley · 2025-06-02T23:43:41Z

I think the change to the signature here:

python-edtf/edtf/parser/grammar.py

Lines 346 to 351 in 95ebb72

    def parse_edtf(
        input_string: str,
        parse_all: bool = True,
        fail_silently: bool = False,
        debug: bool | None = None,
    ):

(parseAll->parse_all) means that we should release this as a new major version, so I'll prepare that.

ahankinson and others added 15 commits April 13, 2021 18:52

Enable packratting for pyparser

86b9451

Delivers significant performance improvements by caching previously computed results.

ixc#37 update for Django 3.x compat

7fdf8dd

Minor updates

6e4a627

Update dependency management

80fdd60

Deps

c12d759

Optimized regexes

6e508d0

Package updates

f2252f0

Further optimizations

06ab934

Update gitignore

c9cb56f

Black formatting, updates

9e51373

Update imports

1aa53cf

Merge branch 'main' into performance-enhancements

ddd8f7b

# Conflicts: # edtf/fields.py # edtf/jdutil.py # edtf/natlang/en.py # edtf/natlang/tests.py # edtf/parser/grammar.py # edtf/parser/parser_classes.py # edtf/parser/tests.py # pyproject.toml # setup.py

Merge fixes

8c4f968

ruff formatting

6f08bce

Remove accidentally committed poetry file

973ccf4

Fixed: f-string formatting

ee450a5

Also added Andrew Hankinson to the authors list in pyproject.toml

aweakley requested changes Sep 4, 2024

edtf/parser/parser_classes.py (outdated, resolved)

edtf/parser/parser_classes.py (outdated, resolved)

edtf/parser/parser_classes.py (outdated, resolved)

edtf/natlang/en.py (outdated, resolved)

ahankinson added 3 commits September 12, 2024 15:40

Fixed: return type of statement

46bdce6

Fixed: Remove SHORT_YEAR_RE

add79bd

This wasn't actually used anywhere! Also removed a redundant regex group

ahankinson added 3 commits September 12, 2024 15:50

Problem with f-string

fee0b64

Another f-string fix

89f3692

Fixed: pyproject errors

9da1d94

ahankinson added 2 commits January 21, 2025 10:02

reinstate lru cache

f7aeddb

Updates to typing etc.

4885de5

ahankinson closed this May 26, 2025

ahankinson added 4 commits May 26, 2025 13:49

Update GH actions

98bfe36

New: Add a validator helper function

af98f87

Add validator to init

f97b627

Rename validator

ae82b11

aweakley added a commit that referenced this pull request May 27, 2025

Update pyproject.toml

0cbccef

Temporarily ignore UP031 pending merge of #73
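
UP031 is ruff's rule against printf-style `%` string formatting. The following is a minimal illustration with hypothetical values, since the thread does not show which strings were flagged:

```python
year, month, day = 1984, 5, 7

# printf-style % formatting: the pattern ruff reports as UP031.
old = "%04d-%02d-%02d" % (year, month, day)

# Equivalent f-string using the same format specifiers.
new = f"{year:04d}-{month:02d}-{day:02d}"

assert old == new == "1984-05-07"
```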

ahankinson added 3 commits May 27, 2025 10:58

Annotate appsettings

86b1546

parseString is an alias to parse_string

a771ec2

More fixes for correctness

24a5f60

Also removed the one regex and replaced it with the "replace" method on strings.
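
The following sketches the trade-off. It uses a hypothetical token, since the commit does not say which regex was replaced. For a fixed literal substring, `str.replace` gives the same result as `re.sub` without involving the regex engine. It also avoids the need to `re.escape` characters such as `.` or `?`.

```python
import re


def strip_with_regex(text: str, token: str) -> str:
    # A literal token must be escaped before it is used as a pattern.
    return re.sub(re.escape(token), "", text)


def strip_with_replace(text: str, token: str) -> str:
    # Plain substring replacement: no pattern compilation or matching engine.
    return text.replace(token, "")


assert strip_with_regex("1984?~", "?") == strip_with_replace("1984?~", "?") == "1984~"
# Without escaping, "." means "any character" and removes everything.
assert re.sub(".", "", "19.84") == ""
assert strip_with_replace("19.84", ".") == "1984"
```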

aweakley reopened this May 31, 2025

ahankinson added 3 commits June 2, 2025 11:57

Try 3.10

df15fd8

More type annotations

6a91fa0

Update supported python in pyproject

d3d0cd5

Fixed: UA is a single state, no need for append

9bd142d

ahankinson force-pushed the performance-enhancements branch from 4ad6171 to 9bd142d June 2, 2025 10:09

ahankinson added 2 commits June 2, 2025 12:14

Add mypy and pip to test dependencies

517ba18

Merge branch 'main' into performance-enhancements

1c480b0

# Conflicts: # .github/workflows/ci.yml

aweakley merged commit 95ebb72 into ixc:main Jun 2, 2025
0 of 4 checks passed
