-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation Plan: Copy updates mature
-> sensitive
#2126
Conversation
Full-stack documentation: https://docs.openverse.org/_preview/2126 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for writing this proposal. It looks quite detailed, so I just have a question on this round.
The Elasticsearch model property | ||
[`mature`](https://github.com/WordPress/openverse/blob/2041c5df1e9d5d1f9f37e7c177f2e70f61ea5dba/ingestion_server/ingestion_server/elasticsearch_models.py#L118) | ||
will also not be affected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the ES model property also change later in conjunction with the field addition to the serializer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure! I think sensitive
is actually something we might end up deriving from a few different places in Elasticsearch, and might not be something we store on the indexed records directly. @sarayourfriend did you have a different impression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mature
is already a derived property. It theoretically can encode if a provider supplies the fact but also gets flipped to true for confirmed user reports. If we can update it to sensitive that would be ideal. It again would take more careful planning and potentially several PRs to do it in a backwards compatible way (similar to a database column rename, really).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although, even more ideally would be to rename it to (something like) has_confirmed_sensitivity_report
and then when we ingest sensitivity from providers in the future to use yet another field for that. If we can make the name more explicit and prevent overloading the meaning in the future, that would be 100% better than the current ambiguous name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense! It seems like it might be outside the scope of this IP though, so I'll leave this as-is for now unless there are any objections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally, I did not think it was something we would handle in this. Totally fine by me to continue excluding it from the scope.
The following models will need to be updated. Since we want to avoid a migration | ||
with this work, they will need to | ||
[explicitly reference the old table names using `Meta::db_table`](https://docs.djangoproject.com/en/4.2/ref/models/options/#table-names). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also support a version of this IP with migrations. No need to do it now but it would be a plus to keep the DB model updated too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be better to do sooner than later, if it's decided that that is desirable. It'll only get more tedious as the tables grow in size, right? Specifically with respect to the fact that we'd need to keep two tables for the rename to work with zero-downtime and copying the data will take longer as the tables grow. They're all pretty small (relatively speaking) at the moment, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, if we were going to change them now would be the time. I can rework this a bit to include instructions for how to approach that.
The | ||
[`mature` reason](https://github.com/WordPress/openverse/blob/2041c5df1e9d5d1f9f37e7c177f2e70f61ea5dba/api/api/models/media.py#L22-L21) | ||
(and | ||
[`mature_filtered`](https://github.com/WordPress/openverse/blob/2041c5df1e9d5d1f9f37e7c177f2e70f61ea5dba/api/api/models/media.py#L18-L17) | ||
value) in `AbstractMediaReport` definition will also need to be changed to | ||
`sensitive` and `sensitive_filtered` respectively. In order to remain backwards | ||
compatible with the frontend, incoming references on reports for `mature` will | ||
need to be mapped to the new value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding the approach suggested here, this would have us update all the previous report values, right? I think that will require multiple steps including Django migrations (Django insists on encoding "choices" into migrations even though Postgres doesn't know anything about it) and (even if Django migrations aren't needed) a management command to update the old values (as per our documentation on data transformations). Is my understanding of the suggestion correct? If so I think it would be worth outlining the steps for that explicitly.
FWIW, I don't think this violates the no-migrations stipulation from the project proposal as that was aimed primarily at table and column name changes rather than data changes. Table and column name changes are just a lot harder to juggle and can be masked in code and UI using Django's tools. The data inside the columns, however, is easier to update and much peskier if we don't (easy to accidentally mess up queries, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's very true, thank you for mentioning it! I'll add that to this IP 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Apart from what seems like a typo, I have nothing more to add after what @krysal and @sarayourfriend have already covered.
...rust_and_safety/content_report_moderation/20230517-implementation_plan_mature_copy_change.md
Show resolved
Hide resolved
...rust_and_safety/content_report_moderation/20230517-implementation_plan_mature_copy_change.md
Outdated
Show resolved
Hide resolved
Thanks for the review folks! I'm moving this into the Revision round while I make updates to the document 🚀 |
e1e2376
to
7b95a4c
Compare
@krysal and @sarayourfriend - I've added what I believe are the necessary steps for modifying the underlying table names. This seemed like a good opportunity to modify the report table names as well, since we'd also be changing the @dhruvkb do you mind also taking one more look at the new proposed approach to make sure it seems okay? Given the revisions I'm moving this into the Decision Round. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One blocker, but just to add a critical step/detail to the migration to cover the ingestion server changes.
Really excited about this work, thanks for the careful planning, Madison!
1. Add new models with updated names and fields for `SensitiveImage` and | ||
`SensitiveAudio`. Also add new models for `ImageReport` and `AudioReport`. | ||
This will require renaming code references from the existing `ImageReport` | ||
and `AudioReport` models to `NsfwReport` and `NsfwReportAudio` temporarily | ||
while maintaining the reference to the existing table name. Said another way: | ||
- `ImageReport` -> `NsfwReport` _(code only, `db_table` should remain the | ||
same)_ | ||
- `AudioReport` -> `NsfwReportAudio` _(code only, `db_table` should remain | ||
the same)_ | ||
- Add `ImageReport` _(new model, `db_table` will not need to be overridden)_ | ||
- Add `AudioReport` _(new model, `db_table` will not need to be overridden)_ | ||
2. Modify the API to write to both the old and new tables for each of the above | ||
instances. On the new report tables, the `mature_filtered` reason should be | ||
written as `sensitive_filtered`. Data should continue to be read from the | ||
original tables during this time. | ||
3. Deploy this version of the API. | ||
4. Create a | ||
[data management command](https://docs.openverse.org/general/zero_downtime_database_management.html#django-management-command-based-data-transformations) | ||
which copies all data from the original tables to the new tables. The | ||
`identifier` field can be used for determining which rows have already been | ||
copied. | ||
5. Deploy this version of the API and run the data management command until no | ||
more rows need to be copied for each of the columns. | ||
6. Remove all code references to the old model from the API, but leave the | ||
models in place for the time being. Data should now only be written to and | ||
read from the new tables. | ||
7. Deploy this version of the API so that the only versions currently deployed | ||
are writing to the new tables. | ||
8. Remove the old models (`MatureImage`, `MatureAudio`, `NsfwReport`, | ||
`NsfwReportAudio`) from the API codebase and deploy this version of the API. | ||
This will remove the old tables from the database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! This is a clear description of this process and I'm very glad we'll soon have a concrete example of it with real life PRs to point to for future projects to reference.
...rust_and_safety/content_report_moderation/20230517-implementation_plan_mature_copy_change.md
Show resolved
Hide resolved
This work will include several migrations for the API, which will need to be | ||
deployed one after another in production as described in | ||
[the API section](#api). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need an update to note the ingestion server timed deployment as well.
I suppose we'll need to coordinate around/pause data refresh for that, which would be good to call out, especially if we run any more experiments with the popularity calculations on the real catalog db.
The frontend code work should wait until the API work is complete so the content | ||
reporting using `sensitive` rather than `mature` is available on the API prior | ||
to the frontend code changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably needs to take into account the changes that will happen in #2118, primarily to note that we'll need to coordinate around that.
I don't think the frontend copy changes need to be blocked on the API work though, which seems to be what the first sentence in this section says.
Not a blocker, just clarification. I think this coordination will happen naturally as the work on the frontend copy gets done. Maybe just adding a note to the frontend issues to be aware that the other project will also be making related changes and to be aware of potential overlaps/merge conflicts that might arise.
@sarayourfriend I believe I've addressed your concerns but let me know if there's anything else I'm missing! Thanks for your thorough review 🙌🏼 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR: @krysal Excluding weekend1 days, this PR was ready for review 9 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2. @AetherUnbound, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
...rust_and_safety/content_report_moderation/20230517-implementation_plan_mature_copy_change.md
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a pretty comprehensive plan and looks good!
I only have a question about one of the steps for the API, but it shouldn't be a blocker.
...rust_and_safety/content_report_moderation/20230517-implementation_plan_mature_copy_change.md
Outdated
Show resolved
Hide resolved
[`exists_in_mature` table check](https://github.com/WordPress/openverse/blob/765a24028baa922b27fc0c38b8ae4c69902eec40/ingestion_server/ingestion_server/queries.py#L33-L37) | ||
(this will necessarily involve updating tests and initialization SQL as | ||
well). | ||
7. Remove all code references to the old model from the API, but leave the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which model exactly are we speaking of here? Is it in the code from the Ingestion server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refers to the MatureImage
and MatureAudio
models, which are replaced by SensitiveImage
and SensitiveAudio`. I'll make that explicit here. All the steps here deal with the API, unless explicitly stated like in step 6.
...rust_and_safety/content_report_moderation/20230517-implementation_plan_mature_copy_change.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com>
Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com>
Due date:
2023-06-08
Assigned reviewers
Description
Resolves: #1965
This is the implementation plan for copy/code changes necessary for updating references from
mature
tosensitive
for various aspects of the frontend and API.Admittedly I have very little familiarity with the frontend and translations specifically, so my understanding there may be inaccurate. Any guidance in that area would be welcome!
Current round
This discussion is following the Openverse decision-making process. Information
about this process can be found
on the Openverse documentation site.
Requested reviewers or participants will be following this process. If you are
being asked to give input on a specific detail, you do not need to familiarise
yourself with the process and follow it.
This discussion is currently in the Decision round.
The deadline for review of this round is 2023-06-08