Skip to content

[ENG-5862] SPAM - Fix Wiki Spamming #11171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

antkryt
Copy link
Contributor

@antkryt antkryt commented Jun 5, 2025

Purpose

verify if spammy domains are detected

Changes

  • merge check_resource_for_domains_postcommit and check_resource_with_spam_services tasks to avoid race condition
  • compare note to value not enum
  • log detected domains to sentry

QA Notes

You can test it with domain xakw1.com on staging3. Currently project won't be banned with this domain, regardless of whether it's public or not.

Documentation

Side Effects

Ticket

https://openscience.atlassian.net/browse/ENG-5862

@antkryt
Copy link
Contributor Author

antkryt commented Jun 5, 2025

The issue described in the ticket is not entirely accurate. I found the domain "xawk1.com" on staging, and content containing this domain will not be banned even if the project is public (here is an example). And on the other side, if you repeat the steps from the ticket using some other domains, everything works as expected.

However, the spam check will always be triggered because the DomainReference is always created and you can verify it in django admin (the only place it’s created is in the function _check_resource_for_domains). Therefore, only two possibilities remain:

if notable_domain.note == NotableDomain.Note.EXCLUDE_FROM_ACCOUNT_CREATION_AND_CONTENT is not being triggered
or
resource.confirm_spam(save=True, domains=list(spammy_domains)) silently fails (changes are not saved in the database or racing condition and changes are overwritten with some other process).

Probably changes made by check_resource_for_domains_postcommit are overwritten by check_resource_with_spam_services. Both of them start almost at the same time and load same (non-spam) version of the resource at the beginning. Which task is completed last, such changes will be saved in the database (typically check_resource_with_spam_services ends last because we make request to external service)

Copy link
Contributor

@Johnetordoff Johnetordoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antkryt I'm a little confused exactly what this accomplishes, could you write some test cases to illustrate how you are changing the current behavior. I see that the tests are changed, but I don't see new test cases. A good test case here should show the spam content or spam domain of a wiki is checked when a project is made public.

@antkryt
Copy link
Contributor Author

antkryt commented Jun 10, 2025

@Johnetordoff I'm not sure that it's possible to write a test case to illustrate what I'm changing here, so I'll try to explain better.

Both check_resource_for_domains_postcommit and check_resource_with_spam_services are doing the same thing: start after response is sent to client (run_postcommit decorator) -> load node with spam_status=UNKNOWN -> process -> save changes to db.

As you can see, tasks don't know anything about each other and changes that are made (it's two different parallel processes). So, if check_resource_with_spam_services finishes after check_resource_for_domains_postcommit, then changes made by check_resource_for_domains_postcommit will be overwritten (and vice versa). Locally everything works because SPAM_SERVICES_ENABLED is false and only check_resource_for_domains_postcommit is running.

Also, I've changed signature of the def _check_resource_for_domains(guid, content) to def _check_resource_for_domains(resource, content) and removed confirm_spam() call from it, that's why I've updated some tests to make them work as before

As for checking spam during privacy change tests, we have bunch of those (see osf_tests/test_node.py::TestNodeSpam). I'll add a few tests to check if new check_resource_for_spam_postcommit function works properly

Copy link
Contributor

@Johnetordoff Johnetordoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not quite the behavior we are looking for, but correct techniques.


@mock.patch.object(settings, 'SPAM_SERVICES_ENABLED', True)
@mock.patch('osf.external.spam.tasks._check_resource_for_domains')
def test_check_resource_for_spam_postcommit_with_spammy_domains(self, mock_check_domains, project, user):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THe ticket instructions to reproduce the bug are:

  1. Create project
  2. Update wiki with spammy domain
  3. Make project public

These are the steps the tests should follow, you are testing behavior that is too specific we know checking for spam domans works, when a project is saved when public, but we need to check a project's wiki content too, not just it's desciption etc.

Copy link
Contributor

@Johnetordoff Johnetordoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good new tests, thanks!

@brianjgeiger brianjgeiger changed the base branch from feature/pbs-25-10 to feature/pbs-25-13 June 30, 2025 17:22
@brianjgeiger brianjgeiger merged commit 5af4c80 into CenterForOpenScience:feature/pbs-25-13 Jun 30, 2025
6 checks passed
Johnetordoff added a commit to Johnetordoff/osf.io that referenced this pull request Jul 16, 2025
…cience/osf.io into add-brand-to-collection-provider

* 'feature/pbs-25-13' of https://github.com/CenterForOpenScience/osf.io: (25 commits)
  [ENG-8224] Fixed force archive template with registration addons (CenterForOpenScience#11210)
  API: Allow /v2/users/me/preprints list view to filter by title
  [ENG-8325] Public column does not display the visibility status of child nodes on the Nodes page in the Admin App
  [ENG-8246] Fixed deletion of maintenance alerts in admin (CenterForOpenScience#11226)
  add exception handling to /review_actions/ endpoint
  added a route to download node metadata (CenterForOpenScience#11215)
  [ENG-7929] Ability to move registrations to draft state (CenterForOpenScience#11153)
  switch to new UI when user views draft registration file (CenterForOpenScience#11144)
  [ENG-5862] SPAM - Fix Wiki Spamming (CenterForOpenScience#11171)
  improved displaying of stashed urls and approval state in admin (CenterForOpenScience#11193)
  [ENG-7962] Fix User Setting Response Payload async mailchimp perference change issues (CenterForOpenScience#11136)
  fixed children/parent fields in admin templates (CenterForOpenScience#11156)
  don't add multiple group perms for preprint provider (CenterForOpenScience#11159)
  fix content overflow for node page (CenterForOpenScience#11182)
  [ENG-8096] Admins on projects are unable to reject user access requests (CenterForOpenScience#11163)
  added retry to avoid race condition (CenterForOpenScience#11179)
  upgrade django to 4.2.17 (CenterForOpenScience#11173)
  add additional information to user admin (CenterForOpenScience#11184)
  [ENG-8192] Ability to force archive registrations when OSFS Folders have changed (CenterForOpenScience#11194)
  [ENG-8193] Fix issues with Preprint submission via API (CenterForOpenScience#11185)
  ...
Johnetordoff added a commit to Johnetordoff/osf.io that referenced this pull request Jul 29, 2025
…cience/osf.io into refactor-notifications

* 'feature/pbs-25-13' of https://github.com/CenterForOpenScience/osf.io: (33 commits)
  added academiaInstitution in social-schema, fixed True value of 'ongoing', fixed/added tests (CenterForOpenScience#11239)
  [ENG-7979] Registrations pending moderation that have components also pending moderation do not display the children (CenterForOpenScience#11222)
  [ENG-8216] Fixed children deletion on a node page in admin (CenterForOpenScience#11237)
  [ENG-8401] Fixed preprint downloading (CenterForOpenScience#11238)
  fix categories for sendgrid emails (CenterForOpenScience#11236)
  [ENG-8936] API: Allow /v2/users/me/preprints list view to filter by tags (CenterForOpenScience#11232)
  Add collections scopes to FULL_READ and FULL_WRITE
  add brand relationship to collectionprovider
  [ENG-8224] Fixed force archive template with registration addons (CenterForOpenScience#11210)
  API: Allow /v2/users/me/preprints list view to filter by title
  [ENG-8325] Public column does not display the visibility status of child nodes on the Nodes page in the Admin App
  [ENG-8246] Fixed deletion of maintenance alerts in admin (CenterForOpenScience#11226)
  add exception handling to /review_actions/ endpoint
  added a route to download node metadata (CenterForOpenScience#11215)
  [ENG-7929] Ability to move registrations to draft state (CenterForOpenScience#11153)
  switch to new UI when user views draft registration file (CenterForOpenScience#11144)
  [ENG-5862] SPAM - Fix Wiki Spamming (CenterForOpenScience#11171)
  improved displaying of stashed urls and approval state in admin (CenterForOpenScience#11193)
  [ENG-7962] Fix User Setting Response Payload async mailchimp perference change issues (CenterForOpenScience#11136)
  fixed children/parent fields in admin templates (CenterForOpenScience#11156)
  ...

# Conflicts:
#	tests/test_user_profile_view.py
Johnetordoff added a commit to Johnetordoff/osf.io that referenced this pull request Aug 4, 2025
…cience/osf.io into add-new-notifications-data-model

* 'feature/pbs-25-13' of https://github.com/CenterForOpenScience/osf.io: (35 commits)
  [ENG-8462]  Institution setup fixes (CenterForOpenScience#11241)
  [ENG-8401] Earlier preprint versions download the current file (CenterForOpenScience#11245)
  added academiaInstitution in social-schema, fixed True value of 'ongoing', fixed/added tests (CenterForOpenScience#11239)
  [ENG-7979] Registrations pending moderation that have components also pending moderation do not display the children (CenterForOpenScience#11222)
  [ENG-8216] Fixed children deletion on a node page in admin (CenterForOpenScience#11237)
  [ENG-8401] Fixed preprint downloading (CenterForOpenScience#11238)
  fix categories for sendgrid emails (CenterForOpenScience#11236)
  [ENG-8936] API: Allow /v2/users/me/preprints list view to filter by tags (CenterForOpenScience#11232)
  Add collections scopes to FULL_READ and FULL_WRITE
  add brand relationship to collectionprovider
  [ENG-8224] Fixed force archive template with registration addons (CenterForOpenScience#11210)
  API: Allow /v2/users/me/preprints list view to filter by title
  [ENG-8325] Public column does not display the visibility status of child nodes on the Nodes page in the Admin App
  [ENG-8246] Fixed deletion of maintenance alerts in admin (CenterForOpenScience#11226)
  add exception handling to /review_actions/ endpoint
  added a route to download node metadata (CenterForOpenScience#11215)
  [ENG-7929] Ability to move registrations to draft state (CenterForOpenScience#11153)
  switch to new UI when user views draft registration file (CenterForOpenScience#11144)
  [ENG-5862] SPAM - Fix Wiki Spamming (CenterForOpenScience#11171)
  improved displaying of stashed urls and approval state in admin (CenterForOpenScience#11193)
  ...

# Conflicts:
#	admin/nodes/views.py
#	api_tests/users/views/test_user_settings_detail.py
#	framework/email/tasks.py
#	tests/framework_tests/test_email.py
#	tests/test_user_profile_view.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants