-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
matching.py: Fix incorrect fuzzy amount loop bounds #210
Conversation
It seems like the tests are failing due to unstable ordering of the posting meta keys. I saw the same issue when running locally on master with Python 3.11. |
I think the recent release of Beancount 2.3.6 broke the tests. Here's the diff between releases and the relevant PR: beancount/beancount#726 I'll submit an empty PR to confirm, then a PR to unbreak. |
Looks like it's still failing after the metadata sorting PR got merged. |
e422bda
to
5bb05bb
Compare
beancount_import/matching.py
Outdated
@@ -443,7 +443,8 @@ def _get_matches( | |||
if cur_matches is not None: | |||
lower_bound = bisect.bisect_left(cur_matches, (lower, tuple(), None, None)) | |||
upper_bound = bisect.bisect_right(cur_matches, (upper, (sys.maxsize,), None, None), lo=lower_bound) | |||
for sp in cur_matches[lower_bound-1 if lower_bound > 0 else 0:upper_bound]: | |||
for sp in cur_matches[lower_bound:upper_bound]: | |||
assert abs(sp.number - amount.number) <= self.fuzzy_match_amount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this assert have an easy informative error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the following message:
assert abs(sp.number - amount.number) <= self.fuzzy_match_amount, (
f'Bug in matching algorithm: {sp} is not within '
+ f'{self.fuzzy_match_amount} of {amount}')
5bb05bb
to
273b35f
Compare
I forgot to ask, but do you understand why this off-by-one error slipped through in previous PRs? Is it easy to construct a test that exercises just that logic? |
Good question, I was unclear about that as well. It turns out this was because the existing nonmatch test only exercises the case where the target posting is at the beginning of the list of candidates (i.e. In turn that's because the The new test instead has an You can see this by adding the following print statements: modified beancount_import/matching.py
@@ -444,7 +444,22 @@ class PostingDatabase(object):
if cur_matches is not None:
lower_bound = bisect.bisect_left(cur_matches, (lower, tuple(), None, None))
upper_bound = bisect.bisect_right(cur_matches, (upper, (sys.maxsize,), None, None), lo=lower_bound)
- for sp in cur_matches[lower_bound:upper_bound]:
+
+ print()
+ print(f'__get_matches({account}, {date}, {amount}): '
+ + f'considering {len(cur_matches)} date/currency matches')
+ printer = beancount.parser.printer.EntryPrinter()
+ for i in range(0, lower_bound-1 if lower_bound > 0 else 0):
+ print(f' Not considering match below lower bound, at {i}: '
+ + f'{printer(cur_matches[i].mp.posting)}')
+ for i in range(lower_bound-1 if lower_bound > 0 else 0, upper_bound):
+ print(f' Considering match within bounds, at {i}: '
+ + f'{printer(cur_matches[i].mp.posting)}')
+ for i in range(upper_bound, len(cur_matches)):
+ print(f' Not considering match above upper bound, at {i}: '
+ + f'{printer(cur_matches[i].mp.posting)}')
+
+ for sp in cur_matches[lower_bound-1 if lower_bound > 0 else 0:upper_bound]:
assert abs(sp.number - amount.number) <= self.fuzzy_match_amount, (
f'Bug in matching algorithm: {sp} is not within '
+ f'{self.fuzzy_match_amount} of {amount}') I simplified the new test and documented this in a comment. |
The indexing logic in
matching.py
has an off-by-one bug that manifests as #202.We currently modify the lower bound of the loop to include an element less than the lower bound:
Removing that fixes the problem:
This approach is an alternative to #206.
In addition to the loop bounds fix, this PR adds the following:
test_nonmatch_fuzzy_amount_with_dates
: a repro for transactions matched that shouldn't be #202. It previously failed and now passes.test_match_fuzzy_amount_upper_bound
: exercises the loop's upper bound handling. Injecting an off-by-one error in the upper bound by changingupper_bound
toupper_bound-1
causes this test to fail.Fixes #202.