Skip to content

Scrubbing and Fix Atomacos #211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jun 9, 2023

Conversation

KrishPatel13
Copy link
Collaborator

@KrishPatel13 KrishPatel13 commented Jun 1, 2023

This PR resolves #47.

Features:

  • Scrubs before Visualization only.
  • Scrubs all Screenshots with a accurate spacy model (en_core_web_trf).
  • Scrubs the dictionary and the data whose keys are in config.SCRUB_KEYS_HTML
  • Scrubs all the fields of "state" field.
  • Offers code reusability by making changes.
  • Scrubs "text" and "canonical_text" both with and without hyphenated form.

Newly Added Features:

  • Added a field in configure_logging called SCRUB_LOGGING that logs if it is True or not.
  • Scrubs during logging in visualize (by default SCRUB_LOGGIN is True).
  • Scrubs task_description in record.py logging. But it ensures that the task_description put in db is not changed/scrubbed (for now).
  • Scrubs "text" both in separated and non-separated form. And also ensures to return the scrubbed_text in the same form.
  • If text was scrubbed then, it ensures that all of its children key_chars, key_vk etc. all are also scrubbed! See: Scrubbing and Fix Atomacos #211 (comment)
  • Scrubs text in diff images as well. See: Scrubbing and Fix Atomacos #211 (comment) It scrubbed my name as it detected as PII.
  • Also, Fixes 2 TODOs:
    - utills.py
    - utils.py

Important Note:

  • The current of scrubbing ignores scrubbing for "text" and "canonical_text" whose value starts with the config.TEXT_PREIX or ends with config.TEXT_SUFFIX. Eg. It ignores scrubbing of , , + c, etc.

Krish Patel added 8 commits May 30, 2023 13:23
Add test_scrub module
Add Scrubbing config in config.py
Add presidio and required packages in requirements.txt
255 -> (255,0,0) in config due to failing test
single element tuple
convert back all hyphenated and scrubbed text back to hyphenated form
@KrishPatel13 KrishPatel13 requested review from abrichr and apgorton June 1, 2023 16:32
@KrishPatel13 KrishPatel13 self-assigned this Jun 1, 2023
@KrishPatel13 KrishPatel13 mentioned this pull request Jun 1, 2023
@KrishPatel13
Copy link
Collaborator Author

@abrichr Ready for review!

Also, We can close the PR: #128 if this one successfully gets merged! :)

@KrishPatel13
Copy link
Collaborator Author

Also look into modifying configure_logging to scrub data before it is logged? If it's a lot of work we can leave it as a TODO for now.

@KrishPatel13 KrishPatel13 changed the title Scrubbing Scrubbing and Fix Atomacos Jun 1, 2023
@KrishPatel13 KrishPatel13 linked an issue Jun 1, 2023 that may be closed by this pull request
@KrishPatel13
Copy link
Collaborator Author

KrishPatel13 commented Jun 1, 2023

This PR also resolves: https://github.com/MLDSAI/OpenAdapt/pull/211/files#r1213492837

If this is merged we can close the issue: #206

Krish Patel added 4 commits June 1, 2023 14:11
from models.py to config
also move language to config
also moves the masking char to scrub
fix typo is_seperated
Moved DIRNAME_PERFORMANCE_PLOTS = "performance" in config
@KrishPatel13
Copy link
Collaborator Author

KrishPatel13 commented Jun 1, 2023

image

Now, scrubbing also scrub the text in diff image if PII is detected ! :)

@KrishPatel13
Copy link
Collaborator Author

image

Now, scrubbing also scrub the children key_chars if the text or canonical_text was scrubbed.

@KrishPatel13
Copy link
Collaborator Author

KrishPatel13 commented Jun 2, 2023

@0dm Any thoughts on it ?
@abrichr ^^

@0dm
Copy link
Collaborator

0dm commented Jun 2, 2023

We can enable / disable it by default, but I think it's a good idea to give the user an option to use scrubbing. Some users may want it off if they know that there's nothing to scrub and want higher performance. We can redefine to visualize(scrubbing=True) and call with visualize(false) if scrubbing has been toggled off.

@KrishPatel13
Copy link
Collaborator Author

Related: #218 (Just keep it in the loop) :)

@KrishPatel13
Copy link
Collaborator Author

@abrichr Ready for Merging!

@KrishPatel13 KrishPatel13 requested review from abrichr and removed request for apgorton June 9, 2023 15:02
@KrishPatel13
Copy link
Collaborator Author

@abrichr Ready for Merging! :)

@abrichr abrichr merged commit d784d37 into OpenAdaptAI:main Jun 9, 2023
@abrichr abrichr deleted the feature/scrub-final branch June 9, 2023 15:39
@KrishPatel13
Copy link
Collaborator Author

KrishPatel13 commented Jun 14, 2023

Here are some screenshots of the html (after Visualization):

After Redaction:
image

Actual:
image

@KrishPatel13
Copy link
Collaborator Author

image

@KrishPatel13
Copy link
Collaborator Author

KrishPatel13 commented Jun 14, 2023

image

Actual:
image

@KrishPatel13
Copy link
Collaborator Author

KrishPatel13 commented Jun 14, 2023

image

Actual:
image

@KrishPatel13
Copy link
Collaborator Author

image

Actual:
image

R-ohit-B-isht pushed a commit to R-ohit-B-isht/OpenAdapt that referenced this pull request Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix atomacos in requirements.txt Implement Scrubbing
3 participants