Skip to content

Non-deterministic (and sometimes incorrect) finding of BaseModel classes #117

@slafs

Description

@slafs

When trying to run BP001 (add default None) on our project, I've noticed something interesting.
The part when the tool is "Looking for Pydantic Models" with ClassDefVisitor yields different results when running more than once. I think I've managed to boil it down to a minimal example.

Setup

Lets consider 4 simple files: a.py, common.py, utils.py and z.py like so:
==> common.py <==

from pydantic import BaseModel

class NewBaseModel(BaseModel):
    foo: str | None

==> utils.py <==

from common import NewBaseModel

class CustomBaseModel(NewBaseModel):
    bar: str | None

==> a.py <==

from utils import CustomBaseModel

class AModel(CustomBaseModel):
    a: str | None

==> z.py <==

from utils import CustomBaseModel

class ZModel(CustomBaseModel):
    z: str | None

(apologies for terrible and potentially unintuitive naming 😅)

Requirements

There are 2 important things to note here:

  • There are at least 2 levels of inheritance from pydantic's BaseModel. That is: z.ZModel and a.AModel -> utils.CustomBaseModel -> common.NewBaseModel -> pydantic.BaseModel. Apparently this is important as the problem doesn't seem to occur when we get rid of one level (common.NewBaseModel for example).
  • Two "normal" modules, a and z, are listed respectively before and after the utility modules (common and utils) when processing.

Expected outcome

Running bump-pydantic . on such "project" should add None as a default value for fields in all 4 modules.

Observed outcome

But it turns out it does that in (more or less) half of the runs. I've run the tool 100 times and only 47 of them ended up in modifying all four files, 15 times it modified 3 of them (common.py, utils.py and z.py) and 38 times only 2 of them (common.py and utils.py).

Problem research

I see two issues here:

  1. non-deterministic behaviour and
  2. incorrect finding of BaseModel classes.

Looks like the problem occurs because of the way the queue is being constructed and the way files are being added to it later in

missing_files = set(files) - visited
if not queue and missing_files:
queue.append(next(iter(missing_files)))

since set by nature is unsorted, files will be added to the queue in a "random" order.
Although a.py will be processed first every time.
Fixing that non-determinism should be relatively simple. For example, we could make the queue initially the same as files, appendleft files coming from visitor.next_file and get rid of the missing_files bit.

Now, once we fix that, AModel will never get marked as a pydantic model, at least with the way I've fixed the first issue. Which I haven't figured out why, yet. I mean, it does get marked correctly ~50% of the times currently, so... 🤷. I haven't fully grokked the ClassDefVisitor code yet. Seems like there is some consideration for those cases there already.

For solutions I was considering either going through all of the files twice or allowing callers to specify fqns of custom base models.

I can open a PR to better demonstrate what I mean here.
This is what I mean about that non-determinism fix: 13bfe61 (from #118).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions