Migrate more model's IDs to bigint #9492

Open
stsewd opened this issue Aug 10, 2022 · 3 comments


stsewd commented Aug 10, 2022

These are the tables that can grow quite large (the percentage is how many IDs we have consumed out of the largest possible value, 2^31 - 1):

  • ImportedFile (22%)
  • PageView (9%)
  • SearchQuery (2%)

The other important tables (projects, versions, etc.) are below 1%, so we are fine there.

We already experienced this for the SphinxDomains table (#9482, #9483). The migration took around 15 min, and we temporarily disabled all access to those models so requests wouldn't hang until the migration completed. The models listed here are still small, so I think we should be fine without temporarily disabling them.

We should also make sure to use a bigint for all new models (Django's `startapp` already does this for new apps). We can't change the global default, since that would change the ID type of existing models and may require some downtime...

And these are the numbers for .com:

  • SphinxDomain (17%)
  • ImportedFile (4%)
  • PageView (1%)
  • SearchQuery (0.18%)
  • Auditlog (0.07%)

humitos commented Aug 11, 2022

Really good description of the issue 💯

@stsewd stsewd added Improvement Minor improvement to code Accepted Accepted issue on our roadmap labels Aug 18, 2022
stsewd added a commit that referenced this issue Oct 18, 2022
We could disable search indexing while we do the migration,
but I don't think that should be required: we have 11M records,
and migrating the SphinxDomain model took 15 min
with ~56M records.

```python
In [7]: ImportedFile.objects.count()
Out[7]: 11527437
```

So some 3 min of not being able to index new versions doesn't seem
bad... There are two things that could happen:

- The query times out and we don't index that version.
- The query waits till the migration is done,
  nothing gets lost.

But if we disable search indexing we definitely
won't index new versions.

We don't use those models outside search indexing,
so doc serving and such shouldn't be affected.

ref #9492
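The ~3 min figure in the commit message above can be reproduced with a rough linear scaling from the SphinxDomain migration. This is only a sketch: it assumes migration time grows proportionally with row count, which ignores row width and indexes, and the function name is made up for illustration.

```python
def estimate_migration_minutes(rows, reference_rows, reference_minutes):
    """Linearly scale a known migration time to a table with `rows` records."""
    return rows * reference_minutes / reference_rows


imported_file_rows = 11_527_437  # ImportedFile.objects.count() quoted above
sphinx_domain_rows = 56_000_000  # the "~56M" quoted above, an approximation
sphinx_domain_minutes = 15       # observed SphinxDomain migration time

estimate = estimate_migration_minutes(
    imported_file_rows, sphinx_domain_rows, sphinx_domain_minutes
)
print(f"{estimate:.1f} min")  # prints "3.1 min"
```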
stsewd added a commit that referenced this issue Oct 18, 2022
How to deploy

We create page views on 404 and on page views (duh),
so while we do the migration this may slow down
doc serving (especially on .com, where we have this feature enabled for
everyone). To avoid that, we need to disable page views
while we do the migration.

Luckily we already have a feature flag for that:

https://github.com/readthedocs/readthedocs.org/blob/a09bc1a976a93bcc3f987fa0a052901f0065619f/readthedocs/projects/models.py#L1897-L1900

ref #9492
stsewd added a commit that referenced this issue Oct 18, 2022
We have 5M records, so the migration shouldn't take that long (1-2 min?),
and we use a task to create the records, so this shouldn't affect
search.

```python
In [1]: SearchQuery.objects.count()
Out[1]: 5062590
```

Ref #9492
stsewd added a commit that referenced this issue May 25, 2023
How to deploy

We create page views on 404 and on page views (duh),
so while we do the migration this may slow down
doc serving (especially on .com, where we have this feature enabled for
everyone). To avoid that, we need to disable page views
while we do the migration.

Luckily we already have a feature flag for that:

https://github.com/readthedocs/readthedocs.org/blob/a09bc1a976a93bcc3f987fa0a052901f0065619f/readthedocs/projects/models.py#L1897-L1900

ref #9492
stsewd added a commit that referenced this issue Sep 26, 2023
* SearchQuery: use BigAutoField for primary key

We have 5M records, so the migration shouldn't take that long (1-2 min?),
and we use a task to create the records, so this shouldn't affect
search.

```python
In [1]: SearchQuery.objects.count()
Out[1]: 5062590
```

Ref #9492

* Linter

stsewd commented Sep 27, 2023

The only "big" table still missing a migration is ImportedFile, currently at 32%. Since we are no longer creating a record for each HTML page, the growth rate should slow down now.

The open PR to migrate that ID is #9669.
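For a rough sense of the remaining headroom, here is a sketch that extrapolates linearly from the two ImportedFile data points in this thread (22% on Aug 10, 2022 and 32% on Sep 27, 2023). It assumes both percentages refer to the same deployment and that the past growth rate continues, so with the slower growth expected now it is more of a pessimistic bound than a forecast. The function name is made up for illustration.

```python
from datetime import date


def days_until_exhaustion(pct_then, date_then, pct_now, date_now):
    """Extrapolate linearly from two usage measurements to 100% usage."""
    elapsed_days = (date_now - date_then).days
    points_per_day = (pct_now - pct_then) / elapsed_days
    return (100 - pct_now) / points_per_day


days = days_until_exhaustion(22, date(2022, 8, 10), 32, date(2023, 9, 27))
print(f"{days / 365.25:.1f} years")  # prints "7.7 years"
```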


stsewd commented Sep 27, 2023

How to calculate the percent:

max_int = 2**31 - 1
current_id = Model.objects.order_by('id').last().id
current_id * 100 / max_int
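The same calculation as a small reusable helper, in case it's useful for checking several tables. This is a sketch: the helper name and the sample ID are hypothetical, and in a Django shell `current_id` would come from `Model.objects.order_by('id').last().id` as shown above.

```python
MAX_INT32 = 2**31 - 1  # largest value of a 32-bit signed integer primary key


def id_space_used_percent(current_id, max_id=MAX_INT32):
    """Return the percentage of the ID space consumed by the highest ID in use."""
    return current_id * 100 / max_id


# Hypothetical highest ID, chosen to land near the 22% reported for ImportedFile.
print(round(id_space_used_percent(472_446_402), 1))  # prints 22.0
```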
