-
Notifications
You must be signed in to change notification settings - Fork 28
Description
When trying to run BP001 (add default None) on our project, I've noticed something interesting.
The part when the tool is "Looking for Pydantic Models" with ClassDefVisitor yields different results when running more than once. I think I've managed to boil it down to a minimal example.
Setup
Lets consider 4 simple files: a.py, common.py, utils.py and z.py like so:
==> common.py <==
from pydantic import BaseModel
class NewBaseModel(BaseModel):
foo: str | None==> utils.py <==
from common import NewBaseModel
class CustomBaseModel(NewBaseModel):
bar: str | None==> a.py <==
from utils import CustomBaseModel
class AModel(CustomBaseModel):
a: str | None==> z.py <==
from utils import CustomBaseModel
class ZModel(CustomBaseModel):
z: str | None(apologies for terrible and potentially unintuitive naming 😅)
Requirements
There are 2 important things to note here:
- There are at least 2 levels of inheritance from pydantic's
BaseModel. That is:z.ZModelanda.AModel->utils.CustomBaseModel->common.NewBaseModel->pydantic.BaseModel. Apparently this is important as the problem doesn't seem to occur when we get rid of one level (common.NewBaseModelfor example). - Two "normal" modules,
aandz, are listed respectively before and after the utility modules (commonandutils) when processing.
Expected outcome
Running bump-pydantic . on such "project" should add None as a default value for fields in all 4 modules.
Observed outcome
But it turns out it does that in (more or less) half of the runs. I've run the tool 100 times and only 47 of them ended up in modifying all four files, 15 times it modified 3 of them (common.py, utils.py and z.py) and 38 times only 2 of them (common.py and utils.py).
Problem research
I see two issues here:
- non-deterministic behaviour and
- incorrect finding of BaseModel classes.
Looks like the problem occurs because of the way the queue is being constructed and the way files are being added to it later in
bump-pydantic/bump_pydantic/main.py
Lines 116 to 118 in 393dc89
| missing_files = set(files) - visited | |
| if not queue and missing_files: | |
| queue.append(next(iter(missing_files))) |
since
set by nature is unsorted, files will be added to the queue in a "random" order.Although
a.py will be processed first every time.Fixing that non-determinism should be relatively simple. For example, we could make the queue initially the same as
files, appendleft files coming from visitor.next_file and get rid of the missing_files bit.
Now, once we fix that, AModel will never get marked as a pydantic model, at least with the way I've fixed the first issue. Which I haven't figured out why, yet. I mean, it does get marked correctly ~50% of the times currently, so... 🤷. I haven't fully grokked the ClassDefVisitor code yet. Seems like there is some consideration for those cases there already.
For solutions I was considering either going through all of the files twice or allowing callers to specify fqns of custom base models.
I can open a PR to better demonstrate what I mean here.
This is what I mean about that non-determinism fix: 13bfe61 (from #118).