-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build: simplify _config
attribute data modelling
#10586
Comments
Another important benefit that I'm seeing here about making querying our database easier: while working on the deprecation of config file and readthedocs.org/readthedocs/projects/tasks/utils.py Lines 235 to 251 in 1b74999
Instead, we would be able to do all that in just one query: Project.objects.filter(
builds__version__slug=F("default_version"),
builds__date__gt=timezone.now() - timezone.timedelta(days=365),
builds__success=True,
builds___config__build__image="readthedocs/build:latest",
)
.distinct() That query will give me "projects building with Note that with the current structure, we won't get the accurate data because those builds where the config file is identical to their previous build will have |
Yeah, it seems like our implementation is just a hacky way of doing a m2m, without any of the nice benefits. This seems like an obvious thing to move forward on, I don't really see a downside. I think the biggest issue will be performance on the query, since we have to do a join, but should be fine and similar to our existing approach. |
This is true. The performance will be similar to the |
Plan: Move forward with the refactor, store both the raw user config, and the rendered config. The rendered config is the most important, because it captures the defaults that were defined at the time of the build, which can change over time. |
We are currently using a
models.JSONField
for theBuild._config
attribute. This field can be used in two different ways:This is basically to avoid duplicating the data on our database. However, it only avoids duplicated config on the same projects and only for consecutive builds.
I found this approach pretty confusing when exploring
Build
objects because we have to do extra operations to find out the real YAML used for that build (see for example theconfig @property
atreadthedocs.org/readthedocs/builds/models.py
Lines 313 to 331 in 12cbb3d
Also, it makes hard to answer questions like "What are the projects using
build.os: ubuntu-22.04
?" in an easy way and many other similar questions that require querying the config file.New approach proposal
Due to these limitations and complications, I want to propose creating a new model called
BuildConfig
that's used as a foreign key from theBuild
model:Then the new model would be:
Benefits
BuildConfig.objects.get_or_create(data=...)
).select_related("readthedocs_yaml_data")
BuildConfig.objects.filter(data__build__os="ubuntu-22.04").projects.count()
to answer my previous example question)@property
to get the YAML for a particular build@config.setter
helperBuild.save
self._config_changed
Summarizing, this approach keeps the same features but simplifies the application and modelling and gives us more features to analyze in an easier way platform usage and make better decisions based on them.
Rollout proposal
Build.readthedocs_yaml_data
field and create the newBuildConfig
modelBuild.readthedocs_yaml_data
andBuild._config
while doing the migrationBuild._config
intoBuildConfig
and linkBuild.readthedocs_yaml_data
to themBuild.readthedocs_yaml_data
fieldBuild._config
fieldThe text was updated successfully, but these errors were encountered: