Skip to content

StatementParser.parse_decimal delocalizes numbers the wrong way #67

Open
@lopter

Description

@lopter

Hello,

I noticed @andrewshadura ended up pulling that patch I wrote last year, which is really cool, thank you.

The reason why I didn't submit my patch back then was because, as Andrew noticed, (all) plugins need to be fixed, and I don't have a solution for that; but I wanna say that a monorepo approach would really help here, and I'm happy to talk about project management (not on this issue though).

Delocalizing shouldn't be the plugin responsibility since as we'll see it's very error prone. Indeed, @andrewshadura modified my patch to try to cope with localized numbers [1] but it doesn't work. For example: in the US a , is used to separate thousands while a . is used to separate the decimal part, therefore with the current code, a properly localized number would be converted to a bogus number with a bunch of dots.

IMO, a better approach would be to get rid of parse_float and parse_decimal, break all plugins on purpose, and introduce parse_amount since it's really what we are doing here:

import contextlib
import locale

from decimal import Decimal, Decimal as D
from typing import (
    Generator,
    Tuple,
)


@contextlib.contextmanager
def _override(categories: Tuple[int], loc: str) -> Generator[None, None, None]:
    saves = [locale.setlocale(c, loc) for c in categories]
    yield
    for i, c in enumerate(categories):
        locale.setlocale(c, saves[i])


def parse_amount(value: str, loc: str) -> Decimal:
    """Parse the given amount of money according to the given locale.

    .. note:: This function isn't thread-safe.
    """

    with _override((locale.LC_NUMERIC, locale.LC_MONETARY), loc):
        lconv = locale.localeconv()
        value = value.replace(lconv["currency_symbol"], "")
        return D(locale.delocalize(value))

Then we could have a some helpers to help plugins ask the user to configure the right locale and encoding for their system, since none of those identifiers are standard; but that's another can of worms.

Let me know what you think and thank you for writing ofxstatement!

[1] src/ofxstatement/parser.py#L63.

edit: I had a few things wrong and the original title sucked.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions