-
Notifications
You must be signed in to change notification settings - Fork 3.2k
fix(ingest/datahub): Create Structured property templates in advance and batch processing #13355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found.
❌ Your patch status has failed because the patch coverage (57.97%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage. 📢 Thoughts on this report? Let us know! |
) | ||
|
||
time.sleep( | ||
self.config.structured_properties_template_cache_invalidation_interval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This time should be based on on how many structured properties exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, assuming green CI.
Ideally we could use something like: https://docs.datahub.com/docs/advanced/api-tracing to know when structured properties have been processed.
) | ||
|
||
query_timeout: Optional[int] = Field( | ||
hidden_from_docs=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be hidden from docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
logger.debug("Fetching soft-deleted URNs") | ||
|
||
# Use server-side cursor implementation | ||
rows = self.execute_server_cursor(self.soft_deleted_urns_query, params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@treff7es this query should also take into account the execution dates + limits & offsets. We can have ingestion sources emitting a lot of soft deletes so the disk issue on the server is also applicable here.
🔴 Meticulous spotted visual differences in 36 of 612 screens tested: view and approve differences detected. Meticulous evaluated ~10 hours of user flows against your PR. Last updated for commit 8afa715. This comment will update as new commits are pushed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done!
duplicate_found = True | ||
break | ||
if first_iteration: | ||
start_date = row.get("createdon", start_date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
8afa715
to
3f01d68
Compare
Uh oh!
There was an error while loading. Please reload this page.