Scrubbing #128

Closed
wants to merge 60 commits into from
60 commits
8ac5676
Added native_chrome_extension folder
May 5, 2023
5c7f573
Implementing Issue #83
May 5, 2023
1572d45
Minor changes for the .sh installation script
May 6, 2023
224f321
Implementing Scrubbing Issue #47
May 6, 2023
221a09d
Added 2 more unit tests
May 6, 2023
a2b65f0
remove native folder
May 6, 2023
5f3dd57
remove native folder
May 6, 2023
4912397
add native folder
May 6, 2023
114b4cc
Add briwer.py, native-manifest.json,
May 6, 2023
cfc5189
Removed Issues#47 commits
May 7, 2023
ed57f81
Merge pull request #3 from MLDSAI/main
KrishPatel13 May 7, 2023
c4725eb
Merge branch 'MLDSAI:main' into issue47
KrishPatel13 May 7, 2023
a2e3b04
remove uneccessary commits
May 7, 2023
8bb2375
Merge branch 'main' of https://github.com/KrishPatel13/PAT into issue47
May 7, 2023
9b2b7f5
Merge branch 'issue47' of https://github.com/KrishPatel13/PAT into is…
May 7, 2023
34b6874
remove uneccessary commits
May 7, 2023
e7fd98f
Merge branch 'MLDSAI:main' into issue47
KrishPatel13 May 7, 2023
79cfd98
implemented scrub.py for text scrubbing using Presidio
May 9, 2023
6920315
implemented scrub.py for text scrubbing using Presidio
May 9, 2023
ee6df52
Code Clean Up and fixes #47
May 10, 2023
189a42f
Merge branch 'MLDSAI:main' into issue47
KrishPatel13 May 10, 2023
5f9a83d
remove unnecessary lines
May 10, 2023
ee4cd49
Merge branch 'issue47' of https://github.com/KrishPatel13/PAT into is…
May 10, 2023
0c9a787
code clean up
May 10, 2023
90c5a24
scrub with images progress
May 11, 2023
1a4992b
Scrubbing for Images
May 11, 2023
4bca565
Code Clean up
May 11, 2023
2a82e41
Fixed the Permission Error
May 11, 2023
5a343c8
Merge branch 'MLDSAI:main' into issue47
KrishPatel13 May 12, 2023
56f8767
Fixes in other files
May 12, 2023
5fb9943
Merge branch 'issue47' of https://github.com/KrishPatel13/PAT into is…
May 12, 2023
ae92591
code cleanup
May 12, 2023
62f93a6
Removed test bash script merged into existing test
May 13, 2023
b7784d7
Path typo in test_scrub.py
May 16, 2023
9f7c599
Fix visualize.py copy form OpenAdaptAI main branch
May 16, 2023
17f96e4
Puterbot -> openadapt
May 16, 2023
ed1c50f
Merge branch 'main' into issue47
KrishPatel13 May 16, 2023
b7b9868
added scrub_image in record.py
May 17, 2023
c32b87f
adding return fix in utlis.py
May 17, 2023
3656744
Added Scrubbing Feature before visualization
May 17, 2023
ceaac69
remove return from utils.py
May 18, 2023
e6fe0a7
Scrubbing Completed Version 1 (ready for review)
May 19, 2023
b85f7ad
ran black on scrub
May 19, 2023
8e66cb3
Add single element tuple
May 19, 2023
5395596
Remove TODO from scrub
May 22, 2023
7112b20
Code Clean up
May 22, 2023
9b59bd7
Merge branch 'MLDSAI:main' into issue47
KrishPatel13 May 23, 2023
5dbab64
resolved most styling issues
May 26, 2023
55e87dc
fixed all style issues
May 26, 2023
5ae7ae6
Merge branch 'main' of https://github.com/MLDSAI/OpenAdapt into MLDSA…
May 26, 2023
9c95ce6
Merge branch 'MLDSAI-main' into issue47
May 26, 2023
77e832b
merging fixes
May 26, 2023
b6cf3d3
fix merge conflicts
May 26, 2023
74bae49
added scrub before putting in database
May 28, 2023
166e030
ignore .html in gitignore
May 28, 2023
0d092fb
remove image scrubbing before adding into the db
May 28, 2023
4fde144
moved the text_sep 3 lines at top
May 29, 2023
78c50b8
Merge branch 'main' of https://github.com/MLDSAI/OpenAdapt into MLDSA…
May 29, 2023
12c3798
Merge branch 'MLDSAI-main' into issue47
May 29, 2023
7112bab
commented all scrubbing due to errors in record.py
May 29, 2023
10 changes: 9 additions & 1 deletion .gitignore
@@ -21,4 +21,12 @@ cache
performance

# Generated when adding editable dependencies in requirements.txt (-e)
src
src
# VSCode
*.vscode

# venv Folder
venv/*

# HTML
*.html
3 changes: 2 additions & 1 deletion README.md
@@ -31,6 +31,7 @@ source .venv/bin/activate
pip install wheel
pip install -r requirements.txt
pip install -e .
python -m spacy download en_core_web_trf
alembic upgrade head
pytest
```
@@ -209,4 +210,4 @@ Please submit any issues to https://github.com/MLDSAI/openadapt/issues with the
following information:

- Problem description (please include any relevant console output and/or screenshots)
- Steps to reproduce (please help others to help you!)
- Steps to reproduce (please help others to help you!)
Review comment (Member):

Please remove unrelated changes 🙏

2 changes: 1 addition & 1 deletion alembic.ini
@@ -102,4 +102,4 @@ formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
datefmt = %H:%M:%S
2 changes: 1 addition & 1 deletion alembic/env.py
@@ -89,4 +89,4 @@ def run_migrations_online() -> None:
if context.is_offline_mode():
run_migrations_offline()
else:
run_migrations_online()
run_migrations_online()
2 changes: 1 addition & 1 deletion alembic/script.py.mako
@@ -21,4 +21,4 @@ def upgrade() -> None:


def downgrade() -> None:
${downgrades if downgrades else "pass"}
${downgrades if downgrades else "pass"}
Binary file added assets/test_scrub_image.png
57 changes: 57 additions & 0 deletions openadapt/config.py
@@ -4,6 +4,10 @@

from dotenv import load_dotenv
from loguru import logger
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider
from presidio_anonymizer import AnonymizerEngine
from presidio_image_redactor import ImageRedactorEngine, ImageAnalyzerEngine


_DEFAULTS = {
@@ -43,3 +47,56 @@ def getenv_fallback(var_name):
for key, val in locals().items():
    if not key.startswith("_") and key.isupper():
        logger.info(f"{key}={val}")


# SCRUBBING CONFIGURATIONS

# SCRUB_CONFIG = {
# "nlp_engine_name": "spacy",
# "models": [{"lang_code": "en", "model_name": "en_core_web_lg"}],
# }
# SCRUB_PROVIDER = NlpEngineProvider(nlp_configuration=SCRUB_CONFIG)
# NLP_ENGINE = SCRUB_PROVIDER.create_engine()
ANALYZER = AnalyzerEngine(
    # nlp_engine=NLP_ENGINE,
    supported_languages=["en"]
)
ANONYMIZER = AnonymizerEngine()
IMAGE_REDACTOR = ImageRedactorEngine(ImageAnalyzerEngine(ANALYZER))
SCRUB_IGNORE_ENTITIES = [
    # 'US_PASSPORT',
    # 'US_DRIVER_LICENSE',
    # 'CRYPTO',
    # 'UK_NHS',
    # 'PERSON',
    # 'CREDIT_CARD',
    # 'US_BANK_NUMBER',
    # 'PHONE_NUMBER',
    # 'US_ITIN',
    # 'AU_ABN',
    'DATE_TIME',
    # 'NRP',
    # 'SG_NRIC_FIN',
    # 'AU_ACN',
    # 'IP_ADDRESS',
    # 'EMAIL_ADDRESS',
    'URL',
    # 'IBAN_CODE',
    # 'AU_TFN',
    # 'LOCATION',
    # 'AU_MEDICARE',
    # 'US_SSN',
    # 'MEDICAL_LICENSE'
]
SCRUBBING_ENTITIES = [
    entity
    for entity in ANALYZER.get_supported_entities()
    if entity not in SCRUB_IGNORE_ENTITIES
]
SCRUB_KEYS_HTML = [
    'text',
    'canonical_text',
    'title',
    'state'
]
DEFAULT_SCRUB_FILL_COLOR = (255,0,0)
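Note on the entity filter above: `SCRUBBING_ENTITIES` is built by removing every name in `SCRUB_IGNORE_ENTITIES` from whatever the analyzer reports as supported. The sketch below shows that filtering step in isolation. `SUPPORTED_ENTITIES` is a hypothetical, shortened stand-in for the result of `ANALYZER.get_supported_entities()` (the real list depends on the installed Presidio recognizers), and `entities_to_scrub` is an illustrative helper name that does not appear in the PR.

```python
# Hypothetical stand-in for ANALYZER.get_supported_entities(); the real list
# depends on the installed Presidio version and its registered recognizers.
SUPPORTED_ENTITIES = ["PERSON", "EMAIL_ADDRESS", "DATE_TIME", "URL", "PHONE_NUMBER"]

# Same two entries that are left uncommented in the diff above.
SCRUB_IGNORE_ENTITIES = ["DATE_TIME", "URL"]


def entities_to_scrub(supported, ignored):
    """Return the supported entities that are not ignored, preserving order."""
    ignored_set = set(ignored)  # set lookup avoids rescanning the list per entity
    return [entity for entity in supported if entity not in ignored_set]


SCRUBBING_ENTITIES = entities_to_scrub(SUPPORTED_ENTITIES, SCRUB_IGNORE_ENTITIES)
print(SCRUBBING_ENTITIES)
```

With this arrangement, un-commenting a name in the ignore list is enough to stop that entity type from being scrubbed; no other code needs to change.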
12 changes: 7 additions & 5 deletions openadapt/models.py
@@ -6,7 +6,7 @@
import numpy as np
import sqlalchemy as sa

from openadapt import db, utils, window
from openadapt import db, utils, window, scrub


# https://groups.google.com/g/sqlalchemy/c/wlr7sShU6-k
@@ -61,6 +61,9 @@ def processed_action_events(self):

class ActionEvent(db.Base):
    __tablename__ = "action_event"
    _text_sep = "-"
    _text_name_prefix = "<"
    _text_name_suffix = ">"

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)
@@ -161,7 +164,9 @@ def _text(self, canonical=False):
)
else:
text = key_attr
return text

scrubbed_text = scrub.scrub_text(text, is_hyphenated=True)
return scrubbed_text

@property
def text(self):
@@ -201,9 +206,6 @@ def __str__(self):
rval = " ".join(attrs)
return rval

_text_sep = "-"
_text_name_prefix = "<"
_text_name_suffix = ">"

@classmethod
def from_children(cls, children_dicts):
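Note on `scrub.scrub_text(text, is_hyphenated=True)`: the body of `scrub.py` is not part of the diff shown here, so the following is only a minimal sketch of what a hyphen-aware scrub could look like. It assumes that action text arrives as characters joined by `_text_sep` (for example `h-i`), that the separator must be removed before detection and restored afterwards, and it swaps in a simple email regex in place of the Presidio analyzer. The names `scrub_text_sketch`, `TEXT_SEP`, and `_EMAIL_RE` are hypothetical.

```python
import re

TEXT_SEP = "-"  # mirrors ActionEvent._text_sep in the diff above

# Assumption: a simple email regex stands in for Presidio's analyzer here.
_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Matches either a whole placeholder such as <EMAIL_ADDRESS> or one character.
_TOKEN_RE = re.compile(r"<[A-Z_]+>|.", flags=re.DOTALL)


def scrub_text_sketch(text, is_hyphenated=False):
    """Replace email addresses with a placeholder (hypothetical helper)."""
    if text is None:
        return None
    if is_hyphenated:
        # "j-o-e" -> "joe" so the detector sees contiguous text.
        text = "".join(text.split(TEXT_SEP))
    scrubbed = _EMAIL_RE.sub("<EMAIL_ADDRESS>", text)
    if is_hyphenated:
        # Restore the separator, keeping each placeholder as a single token.
        scrubbed = TEXT_SEP.join(_TOKEN_RE.findall(scrubbed))
    return scrubbed


print(scrub_text_sketch("h-i- -j-o-e-@-x-.-c-o", is_hyphenated=True))
print(scrub_text_sketch("mail joe@x.co now"))
```

One limitation of this sketch: a literal `-` keystroke cannot be distinguished from the separator once the text has been joined, so typed hyphens are dropped. A real implementation would need an escaping scheme or would have to scrub before the characters are joined.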
12 changes: 9 additions & 3 deletions openadapt/record.py
@@ -22,7 +22,7 @@
import fire
import mss.tools

from openadapt import config, crud, utils, window
from openadapt import config, crud, utils, window, scrub


EVENT_TYPES = ("screen", "action", "window")
@@ -162,6 +162,7 @@ def write_screen_event(
assert event.type == "screen", event
screenshot = event.data
png_data = mss.tools.to_png(screenshot.rgb, screenshot.size)
# event_data = {"png_data": scrub.scrub_png_data(png_data)}
event_data = {"png_data": png_data}
crud.insert_screenshot(recording_timestamp, event.timestamp, event_data)
perf_q.put((event.type, event.timestamp, utils.get_timestamp()))
@@ -182,6 +183,7 @@ def write_window_event(
"""

assert event.type == "window", event
# scrubbed_window_data = scrub.scrub_dict(event.data)
crud.insert_window_event(recording_timestamp, event.timestamp, event.data)
perf_q.put((event.type, event.timestamp, utils.get_timestamp()))

@@ -352,6 +354,8 @@ def read_screen_events(
if screenshot is None:
logger.warning("screenshot was None")
continue
# Scrubbing a ScreenShot
# scrubbed_screenshot = scrub.scrub_screenshot(screenshot)
event_q.put(Event(utils.get_timestamp(), "screen", screenshot))
logger.info("done")

@@ -522,12 +526,14 @@ def record(
Args:
task_description: a text description of the task that will be recorded
"""

scrubbed_task_description = scrub.scrub_text(task_description)

utils.configure_logging(logger, LOG_LEVEL)

logger.info(f"{task_description=}")
logger.info(f"{scrubbed_task_description=}")

recording = create_recording(task_description)
recording = create_recording(scrubbed_task_description)
recording_timestamp = recording.timestamp

event_q = queue.Queue()
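Note on the commented-out `scrub.scrub_dict(event.data)` call in `write_window_event`: together with `SCRUB_KEYS_HTML` in `config.py`, it suggests that window data is meant to be scrubbed only under selected keys. Since `scrub.py` is not shown in this diff, the following is a sketch under that assumption rather than the PR's implementation. A phone-number regex stands in for Presidio, and `scrub_dict_sketch`, `scrub_value`, and `_PHONE_RE` are hypothetical names.

```python
import re

# Same keys as SCRUB_KEYS_HTML in the config.py diff.
SCRUB_KEYS = ["text", "canonical_text", "title", "state"]

# Assumption: a US-style phone regex stands in for Presidio's analyzer.
_PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def scrub_value(value):
    """Replace phone numbers in a string with a placeholder."""
    return _PHONE_RE.sub("<PHONE_NUMBER>", value)


def scrub_dict_sketch(data, keys=SCRUB_KEYS):
    """Return a scrubbed copy of nested dicts/lists; the input is not mutated."""
    if isinstance(data, dict):
        return {
            key: scrub_value(value)
            if key in keys and isinstance(value, str)
            else scrub_dict_sketch(value, keys)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [scrub_dict_sketch(item, keys) for item in data]
    return data  # numbers, None, and strings under other keys pass through


window_data = {
    "title": "Call 555-123-4567",
    "left": 0,
    "state": {"text": "pin 555-123-4567", "children": [{"title": "ok"}]},
}
print(scrub_dict_sketch(window_data))
```

Returning a new structure instead of editing `event.data` in place keeps the raw event available to the caller, which may matter if scrubbing is toggled on and off as it is in this PR.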