
feat: add resource name to logging on incremental extract duplicate thresholds #3162

@and2reak

Feature description

Description

As a technical dlt user who sees multiple duplicate-cursor warnings during extract, I would like the warning to say which resource breached the threshold.

Examples

{"written_at":"2024-01-15T14:18:11.004Z","written_ts":12323743747832468,"component_name":"salesforce","process":12345,"taskName":null,"msg":"Large number of records (201) sharing the same value of cursor field '<yourcursorfield>'. This can happen if the cursor field has a low resolution (e.g., only stores dates without times), causing many records to share the same cursor value. Consider using a cursor column with higher resolution to reduce the deduplication state size.","type":"log","logger":"dlt","thread":"MainThread","level":"WARNING","module":"__init__","line_no":600,"version":{"dlt_version":"1.15.1","pipeline_name":"<pipeline_name>"}}

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

With the Salesforce source, for example, it is hard to tell which resource is being deduplicated or has source data issues, because the warning does not name it.

Slack reference

Proposed solution

Add the name of the affected resource to the logger.warning call in _check_duplicate_cursor_threshold.
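A minimal sketch of what the proposed message could look like. It is not dlt's actual code: the function signature, the `resource_name` parameter, the logger name, and the threshold of 200 are all assumptions made for illustration (the example log above fires at 201 records, but the real value and the real method live in dlt).

```python
import logging

# Hypothetical logger name; dlt uses its own logger.
logger = logging.getLogger("dlt_sketch")

# Assumed value, chosen only because the example warning reports 201 records.
DUPLICATE_CURSOR_WARNING_THRESHOLD = 200


def check_duplicate_cursor_threshold(
    resource_name: str,
    cursor_path: str,
    duplicate_count: int,
    threshold: int = DUPLICATE_CURSOR_WARNING_THRESHOLD,
) -> bool:
    """Warn when too many records share one cursor value.

    Returns True if a warning was emitted. This is a stand-in for
    _check_duplicate_cursor_threshold, not its real signature.
    """
    if duplicate_count <= threshold:
        return False
    logger.warning(
        "Large number of records (%d) sharing the same value of cursor field"
        " '%s' in resource '%s'. This can happen if the cursor field has a low"
        " resolution (e.g., only stores dates without times), causing many"
        " records to share the same cursor value. Consider using a cursor"
        " column with higher resolution to reduce the deduplication state size.",
        duplicate_count,
        cursor_path,
        resource_name,
    )
    return True
```

With something along these lines, a call such as `check_duplicate_cursor_threshold("account", "SystemModstamp", 201)` would log a warning that names the `account` resource, so a multi-resource source like Salesforce no longer needs guesswork to find the offending resource. The resource and cursor names here are placeholders.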

Related issues

No response
