-
Notifications
You must be signed in to change notification settings - Fork 639
Adds a default 1 hour window deduplication check on signal filter flow #3329
Conversation
| else: | ||
| # Apply the default deduplication rule if there's no deduplication rule set on the signal | ||
| # and the signal instance is not snoozed | ||
| if not has_dedup_filter and not filtered: | ||
| default_dedup_window = datetime.now(timezone.utc) - timedelta(hours=1) | ||
| default_dedup_query = ( | ||
| db_session.query(SignalInstance) | ||
| .filter( | ||
| SignalInstance.signal_id == signal_instance.signal_id, | ||
| SignalInstance.created_at >= default_dedup_window, | ||
| SignalInstance.id != signal_instance.id, | ||
| ) | ||
| .order_by(asc(SignalInstance.created_at)) | ||
| .all() | ||
| ) | ||
|
|
||
| if default_dedup_query: | ||
| signal_instance.case_id = default_dedup_query[0].case_id | ||
| signal_instance.filter_action = SignalFilterAction.deduplicate | ||
| filtered = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think indentation for this code block is wrong?
src/dispatch/signal/service.py
Outdated
|
|
||
| # Check if there's a deduplication rule set on the signal | ||
| has_dedup_filter = any( | ||
| f.action == SignalFilterAction.deduplicate for f in signal_instance.signal.filters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to loop through all the signal_instance.signal.filters here if we do it again below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Barring any refactor of this function, I believe the check is necessary to handle the case of deduplication filters that don't match the signal instance (they will be filtered by the default rule, which is not intended). For example:
else:
# Apply the default deduplication rule if there's no deduplication rule set on the signal
# and the signal instance is not snoozed
if not filtered:
default_dedup_window = datetime.now(timezone.utc) - timedelta(hours=1)
default_dedup_query = (
db_session.query(SignalInstance)
.filter(
SignalInstance.signal_id == signal_instance.signal_id,
SignalInstance.created_at >= default_dedup_window,
SignalInstance.id != signal_instance.id,
)
.order_by(asc(SignalInstance.created_at))
.all()
)
if default_dedup_query:
signal_instance.case_id = default_dedup_query[0].case_id
signal_instance.filter_action = SignalFilterAction.deduplicate
filtered = TrueResults in:
=============================================================================================================================================================================================== FAILURES ===============================================================================================================================================================================================
__________________________________________________________________________________________________________________________________________________________________________ test_filter_actions_deduplicate_different_entities __________________________________________________________________________________________________________________________________________________________________________
Traceback (most recent call last):
File "/Users/wshel/Projects/dispatch/tests/signal/test_signal_service.py", line 149, in test_filter_actions_deduplicate_different_entities
assert not filter_signal(db_session=session, signal_instance=signal_instance_1)
AssertionError: assert not True
+ where True = <function filter_signal at 0x166560b80>(db_session=<sqlalchemy.orm.session.Session object at 0x1606eeb90>, signal_instance=<SignalInstance #32345795-3f52-4b17-b71f-3d671ad7ce0e>)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/Users/wshel/.pyenv/versions/3.11.2/envs/dispatch-prod/lib/python3.11/site-packages/sqlalchemy/orm/query.py:2575: SAWarning: Pathed join target SignalInstance.entities has already been joined to; skipping
util.warn(
_______________________________________________________________________________________________________________________________________________________________________ test_filter_actions_deduplicate_different_entities_types _______________________________________________________________________________________________________________________________________________________________________
Traceback (most recent call last):
File "/Users/wshel/Projects/dispatch/tests/signal/test_signal_service.py", line 207, in test_filter_actions_deduplicate_different_entities_types
assert not filter_signal(db_session=session, signal_instance=signal_instance_1)
AssertionError: assert not True
+ where True = <function filter_signal at 0x166560b80>(db_session=<sqlalchemy.orm.session.Session object at 0x1606eeb90>, signal_instance=<SignalInstance #ded5b413-b982-4e3e-99e2-0059988f2b17>)
======================================================================================================================================================================================= short test summary info ========================================================================================================================================================================================
FAILED tests/signal/test_signal_service.py::test_filter_actions_deduplicate_different_entities - assert not True
FAILED tests/signal/test_signal_service.py::test_filter_actions_deduplicate_different_entities_types - assert not True| signal_instance.filter_action = SignalFilterAction.deduplicate | ||
| filtered = True | ||
| break | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| signal_instances = ( | ||
| db_session.query(SignalInstance) | ||
| .filter(SignalInstance.project_id == project.id) | ||
| .filter(SignalInstance.filter_action == None) # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter_action gets set to none as string, but we seem to compare it to the None keyword, which SQLAlchemy converts it to NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change the default value of filter_action in SignalInstanceBase to none?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like filter_action: SignalFilterAction = SignalFilterAction.none?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then update this filter to compare with SignalFilterAction.none?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow it still works with the None here, but it's very confusing, I'll update it to work this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we can't update this without making things complicated. Since we have set un filtered instances to SignalFilterAction.none here
def filter_signal(*, db_session: Session, signal_instance: SignalInstance) -> bool:
...
if not filtered:
signal_instance.filter_action = SignalFilterAction.noneAll of the existing instances in the database (that were not filtered), without a case_id, will be re-processed.
It's probably smart to add a boundary on time created here know that I see this.
No description provided.