Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation Plan: Copy updates mature -> sensitive #2126

Merged
merged 9 commits into from
Jun 26, 2023

Conversation

AetherUnbound
Copy link
Collaborator

@AetherUnbound AetherUnbound commented May 18, 2023

Due date:

2023-06-08

Assigned reviewers

Description

Resolves: #1965

This is the implementation plan for copy/code changes necessary for updating references from mature to sensitive for various aspects of the frontend and API.

Admittedly I have very little familiarity with the frontend and translations specifically, so my understanding there may be inaccurate. Any guidance in that area would be welcome!

Current round

This discussion is following the Openverse decision-making process. Information
about this process can be found
on the Openverse documentation site.
Requested reviewers or participants will be following this process. If you are
being asked to give input on a specific detail, you do not need to familiarise
yourself with the process and follow it.

This discussion is currently in the Decision round.

The deadline for review of this round is 2023-06-08

@AetherUnbound AetherUnbound requested a review from a team as a code owner May 18, 2023 00:17
@github-actions github-actions bot added the 🧱 stack: documentation Related to Sphinx documentation label May 18, 2023
@github-actions
Copy link

Full-stack documentation: https://docs.openverse.org/_preview/2126

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

@AetherUnbound AetherUnbound added 🟨 priority: medium Not blocking but should be addressed soon 🌟 goal: addition Addition of new feature 📄 aspect: text Concerns the textual material in the repository 🧭 project: implementation plan An implementation plan for a project labels May 18, 2023
Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for writing this proposal. It looks quite detailed, so I just have a question on this round.

Comment on lines 79 to 81
The Elasticsearch model property
[`mature`](https://github.com/WordPress/openverse/blob/2041c5df1e9d5d1f9f37e7c177f2e70f61ea5dba/ingestion_server/ingestion_server/elasticsearch_models.py#L118)
will also not be affected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the ES model property also change later in conjunction with the field addition to the serializer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure! I think sensitive is actually something we might end up deriving from a few different places in Elasticsearch, and might not be something we store on the indexed records directly. @sarayourfriend did you have a different impression?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mature is already a derived property. It theoretically can encode if a provider supplies the fact but also gets flipped to true for confirmed user reports. If we can update it to sensitive that would be ideal. It again would take more careful planning and potentially several PRs to do it in a backwards compatible way (similar to a database column rename, really).

Copy link
Collaborator

@sarayourfriend sarayourfriend May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, even more ideally would be to rename it to (something like) has_confirmed_sensitivity_report and then when we ingest sensitivity from providers in the future to use yet another field for that. If we can make the name more explicit and prevent overloading the meaning in the future, that would be 100% better than the current ambiguous name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense! It seems like it might be outside the scope of this IP though, so I'll leave this as-is for now unless there are any objections.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally, I did not think it was something we would handle in this. Totally fine by me to continue excluding it from the scope.

Comment on lines 55 to 57
The following models will need to be updated. Since we want to avoid a migration
with this work, they will need to
[explicitly reference the old table names using `Meta::db_table`](https://docs.djangoproject.com/en/4.2/ref/models/options/#table-names).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also support a version of this IP with migrations. No need to do it now but it would be a plus to keep the DB model updated too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be better to do sooner than later, if it's decided that that is desirable. It'll only get more tedious as the tables grow in size, right? Specifically with respect to the fact that we'd need to keep two tables for the rename to work with zero-downtime and copying the data will take longer as the tables grow. They're all pretty small (relatively speaking) at the moment, however.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, if we were going to change them now would be the time. I can rework this a bit to include instructions for how to approach that.

Comment on lines 63 to 70
The
[`mature` reason](https://github.com/WordPress/openverse/blob/2041c5df1e9d5d1f9f37e7c177f2e70f61ea5dba/api/api/models/media.py#L22-L21)
(and
[`mature_filtered`](https://github.com/WordPress/openverse/blob/2041c5df1e9d5d1f9f37e7c177f2e70f61ea5dba/api/api/models/media.py#L18-L17)
value) in `AbstractMediaReport` definition will also need to be changed to
`sensitive` and `sensitive_filtered` respectively. In order to remain backwards
compatible with the frontend, incoming references on reports for `mature` will
need to be mapped to the new value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the approach suggested here, this would have us update all the previous report values, right? I think that will require multiple steps including Django migrations (Django insists on encoding "choices" into migrations even though Postgres doesn't know anything about it) and (even if Django migrations aren't needed) a management command to update the old values (as per our documentation on data transformations). Is my understanding of the suggestion correct? If so I think it would be worth outlining the steps for that explicitly.

FWIW, I don't think this violates the no-migrations stipulation from the project proposal as that was aimed primarily at table and column name changes rather than data changes. Table and column name changes are just a lot harder to juggle and can be masked in code and UI using Django's tools. The data inside the columns, however, is easier to update and much peskier if we don't (easy to accidentally mess up queries, etc).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's very true, thank you for mentioning it! I'll add that to this IP 🙂

Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Apart from what seems like a typo, I have nothing more to add after what @krysal and @sarayourfriend have already covered.

@AetherUnbound
Copy link
Collaborator Author

Thanks for the review folks! I'm moving this into the Revision round while I make updates to the document 🚀

@AetherUnbound AetherUnbound force-pushed the projects/mature-to-sensitive-ip branch from e1e2376 to 7b95a4c Compare June 5, 2023 22:58
@AetherUnbound AetherUnbound marked this pull request as ready for review June 5, 2023 23:56
@AetherUnbound
Copy link
Collaborator Author

@krysal and @sarayourfriend - I've added what I believe are the necessary steps for modifying the underlying table names. This seemed like a good opportunity to modify the report table names as well, since we'd also be changing the reason value in the table anyway (and doing this via a different table seemed easiest anyway). Clearly that breaks our intention to not have this IP include migrations :smiling: But I think now is a good time as we have so few records (<700 reports on the image table for instance)! Please let me know if any of these steps could use more clarity; it was a lot to wrap my head around for what needed to be done in which order and get it all into the IP.

@dhruvkb do you mind also taking one more look at the new proposed approach to make sure it seems okay?

Given the revisions I'm moving this into the Decision Round.

Copy link
Collaborator

@sarayourfriend sarayourfriend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One blocker, but just to add a critical step/detail to the migration to cover the ingestion server changes.

Really excited about this work, thanks for the careful planning, Madison!

Comment on lines 85 to 115
1. Add new models with updated names and fields for `SensitiveImage` and
`SensitiveAudio`. Also add new models for `ImageReport` and `AudioReport`.
This will require renaming code references from the existing `ImageReport`
and `AudioReport` models to `NsfwReport` and `NsfwReportAudio` temporarily
while maintaining the reference to the existing table name. Said another way:
- `ImageReport` -> `NsfwReport` _(code only, `db_table` should remain the
same)_
- `AudioReport` -> `NsfwReportAudio` _(code only, `db_table` should remain
the same)_
- Add `ImageReport` _(new model, `db_table` will not need to be overridden)_
- Add `AudioReport` _(new model, `db_table` will not need to be overridden)_
2. Modify the API to write to both the old and new tables for each of the above
instances. On the new report tables, the `mature_filtered` reason should be
written as `sensitive_filtered`. Data should continue to be read from the
original tables during this time.
3. Deploy this version of the API.
4. Create a
[data management command](https://docs.openverse.org/general/zero_downtime_database_management.html#django-management-command-based-data-transformations)
which copies all data from the original tables to the new tables. The
`identifier` field can be used for determining which rows have already been
copied.
5. Deploy this version of the API and run the data management command until no
more rows need to be copied for each of the columns.
6. Remove all code references to the old model from the API, but leave the
models in place for the time being. Data should now only be written to and
read from the new tables.
7. Deploy this version of the API so that the only versions currently deployed
are writing to the new tables.
8. Remove the old models (`MatureImage`, `MatureAudio`, `NsfwReport`,
`NsfwReportAudio`) from the API codebase and deploy this version of the API.
This will remove the old tables from the database.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! This is a clear description of this process and I'm very glad we'll soon have a concrete example of it with real life PRs to point to for future projects to reference.

Comment on lines 207 to 209
This work will include several migrations for the API, which will need to be
deployed one after another in production as described in
[the API section](#api).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need an update to note the ingestion server timed deployment as well.

I suppose we'll need to coordinate around/pause data refresh for that, which would be good to call out, especially if we run any more experiments with the popularity calculations on the real catalog db.

Comment on lines 235 to 237
The frontend code work should wait until the API work is complete so the content
reporting using `sensitive` rather than `mature` is available on the API prior
to the frontend code changes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably needs to take into account the changes that will happen in #2118, primarily to note that we'll need to coordinate around that.

I don't think the frontend copy changes need to be blocked on the API work though, which seems to be what the first sentence in this section says.

Not a blocker, just clarification. I think this coordination will happen naturally as the work on the frontend copy gets done. Maybe just adding a note to the frontend issues to be aware that the other project will also be making related changes and to be aware of potential overlaps/merge conflicts that might arise.

@AetherUnbound
Copy link
Collaborator Author

@sarayourfriend I believe I've addressed your concerns but let me know if there's anything else I'm missing! Thanks for your thorough review 🙌🏼

sarayourfriend
sarayourfriend previously approved these changes Jun 12, 2023
Copy link
Collaborator

@sarayourfriend sarayourfriend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@openverse-bot
Copy link
Collaborator

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@krysal
@dhruvkb
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was ready for review 9 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2.

@AetherUnbound, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pretty comprehensive plan and looks good!

I only have a question about one of the steps for the API, but it shouldn't be a blocker.

[`exists_in_mature` table check](https://github.com/WordPress/openverse/blob/765a24028baa922b27fc0c38b8ae4c69902eec40/ingestion_server/ingestion_server/queries.py#L33-L37)
(this will necessarily involve updating tests and initialization SQL as
well).
7. Remove all code references to the old model from the API, but leave the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which model exactly are we speaking of here? Is it in the code from the Ingestion server?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refers to the MatureImage and MatureAudio models, which are replaced by SensitiveImage and SensitiveAudio`. I'll make that explicit here. All the steps here deal with the API, unless explicitly stated like in step 6.

AetherUnbound and others added 5 commits June 26, 2023 15:38
Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com>
Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com>
@AetherUnbound AetherUnbound merged commit 26c863c into main Jun 26, 2023
@AetherUnbound AetherUnbound deleted the projects/mature-to-sensitive-ip branch June 26, 2023 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📄 aspect: text Concerns the textual material in the repository 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon 🧭 project: implementation plan An implementation plan for a project 🧱 stack: documentation Related to Sphinx documentation
Projects
Status: Accepted
Archived in project
Development

Successfully merging this pull request may close these issues.

Implementation Plan: Copy updates mature -> sensitive
5 participants