fix: (CDK) (CsvParser) - Fix the `\\` escaping when passing the `delimiter` from Builder's UI #358
Conversation
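📝 Walkthrough
The pull request introduces a new private method, `_get_delimiter`, which `CsvParser.parse` calls to obtain the processed delimiter before initializing `csv.DictReader`.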
Sequence Diagram(s)
sequenceDiagram
participant C as CsvParser.parse
participant D as _get_delimiter
participant R as csv.DictReader
C->>D: Call _get_delimiter()
D-->>C: Return processed delimiter
C->>R: Initialize csv.DictReader with delimiter
R-->>C: Return parsed records
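The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the CDK's implementation: `CsvParserSketch` is a hypothetical stand-in class, and the fallback to `","` when no delimiter is set is an assumption made for the sketch. Only `_get_delimiter`, `parse`, and `csv.DictReader` come from the PR itself.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Optional


@dataclass
class CsvParserSketch:
    """Hypothetical stand-in for CsvParser, used only to illustrate the call flow."""

    encoding: str = "utf-8"
    delimiter: Optional[str] = ","

    def _get_delimiter(self) -> Optional[str]:
        # An escaped delimiter arrives as a backslash followed by a letter
        # (two characters); decode it into the single character it stands for.
        if self.delimiter is not None and self.delimiter.startswith("\\"):
            return self.delimiter.encode("utf-8").decode("unicode_escape")
        return self.delimiter

    def parse(self, data: io.BufferedIOBase) -> Iterator[Dict[str, Any]]:
        # Wrap the byte stream as text, then hand the processed delimiter to DictReader.
        text = io.TextIOWrapper(data, encoding=self.encoding, newline="")
        # Assumption for this sketch: fall back to "," when no delimiter is configured.
        reader = csv.DictReader(text, delimiter=self._get_delimiter() or ",")
        yield from reader


parser = CsvParserSketch(delimiter="\\t")  # backslash + "t", as typed into a form field
rows = list(parser.parse(io.BytesIO(b"id\tname\n1\tada\n")))
```

Here `rows` holds one record with the keys `id` and `name`, because the escaped delimiter was decoded to a real tab before `csv.DictReader` was created.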
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1)
110-118: Consider making the delimiter processing immutable, wdyt?
The method modifies `self.delimiter`, which could lead to unexpected behavior if the method is called multiple times. How about returning a new value instead?

     def _get_delimiter(self) -> Optional[str]:
         """
         Get delimiter from the configuration. Check for the escape character and decode it.
         """
         if self.delimiter is not None:
             if self.delimiter.startswith("\\"):
    -            self.delimiter = self.delimiter.encode("utf-8").decode("unicode_escape")
    +            return self.delimiter.encode("utf-8").decode("unicode_escape")
    +        return self.delimiter
    -    return self.delimiter
    +    return None

unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)
56-72: Consider adding more test cases for delimiter handling, wdyt?
The current test covers the basic case well. Would you like to add tests for:
- Multiple escaped characters (e.g., "\\t")
- Other common escape sequences (e.g., "\n", "\r")
- Empty delimiter
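The suggested cases could be checked along these lines. This is a sketch only: `decode_delimiter` is a hypothetical free function that mirrors the decoding step from the review suggestion, and the expected values are what `unicode_escape` decoding yields for each input, not assertions taken from the PR's test file.

```python
from typing import Optional


def decode_delimiter(delimiter: Optional[str]) -> Optional[str]:
    """Hypothetical helper: decode a backslash-escaped delimiter, leave others as-is."""
    if delimiter is not None and delimiter.startswith("\\"):
        return delimiter.encode("utf-8").decode("unicode_escape")
    return delimiter


# Raw input (as received from a form field) mapped to the delimiter it should become.
cases = {
    "\\t": "\t",  # escaped tab
    "\\n": "\n",  # escaped newline
    "\\r": "\r",  # escaped carriage return
    ",": ",",     # plain delimiter passes through unchanged
    "": "",       # empty delimiter also passes through unchanged
}

for raw, expected in cases.items():
    assert decode_delimiter(raw) == expected
```

An empty delimiter passes through this step untouched, so a real test would also need to decide what the parser should do with it afterwards.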
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (2 hunks)
- unit_tests/sources/declarative/decoders/test_composite_decoder.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1)
127-128: LGTM! Nice use of the new helper method.
The parse method now correctly handles escaped delimiters through the `_get_delimiter` helper.

unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)
65-66: LGTM! Great test coverage for the escape handling.
The test now properly verifies that the parser can handle escaped delimiters, and the comment clearly explains the intention.
Approving as a temporary solution, until we fix the encoding of escape characters in the UI.
Thanks for the fix @bazarnov!
* main:
  fix: update cryptography package to latest version to address CVE (airbytehq#377)
  fix: (CDK) (HttpRequester) - Make the `HttpRequester.path` optional (airbytehq#370)
  feat: improved custom components handling (airbytehq#350)
  feat: add microseconds timestamp format (airbytehq#373)
  fix: Replace Unidecode with anyascii for permissive license (airbytehq#367)
  feat: add IncrementingCountCursor (airbytehq#346)
  feat: (low-code cdk) datetime format with milliseconds (airbytehq#369)
  fix: (CDK) (AsyncRetriever) - Improve UX on variable naming and interpolation (airbytehq#368)
  fix: (CDK) (AsyncRetriever) - Add the `request` and `response` to each `async` operations (airbytehq#356)
  fix: (CDK) (ConnectorBuilder) - Add `auxiliary requests` to slice; support `TestRead` for AsyncRetriever (part 1/2) (airbytehq#355)
  feat(concurrent perpartition cursor): Add parent state updates (airbytehq#343)
  fix: update csv parser for builder compatibility (airbytehq#364)
  feat(low-code cdk): add interpolation for limit field in Rate (airbytehq#353)
  feat(low-code cdk): add AbstractStreamFacade processing as concurrent streams in declarative source (airbytehq#347)
  fix: (CDK) (CsvParser) - Fix the `\\` escaping when passing the `delimiter` from Builder's UI (airbytehq#358)
  feat: expose `str_to_datetime` jinja macro (airbytehq#351)
  fix: update CDK migration for 6.34.0 (airbytehq#348)
  feat: Removes `stream_state` interpolation from CDK (airbytehq#320)
  fix(declarative): Pass `extra_fields` in `global_substream_cursor` (airbytehq#195)
  feat(concurrent perpartition cursor): Refactor ConcurrentPerPartitionCursor (airbytehq#331)
  feat(HttpMocker): adding support for PUT requests and bytes responses (airbytehq#342)
  chore: use certified source for manifest-only test (airbytehq#338)
  feat: check for request_option mapping conflicts in individual components (airbytehq#328)
  feat(file-based): sync file acl permissions and identities (airbytehq#260)
  fix: (CDK) (Connector Builder) - refactor the `MessageGrouper` > `TestRead` (airbytehq#332)
  fix(low code): Fix missing cursor for ClientSideIncrementalRecordFilterDecorator (airbytehq#334)
  feat(low-code): Add API Budget (airbytehq#314)
  chore(decoder): clean decoders and make csvdecoder available (airbytehq#326)
What
Sometimes a `csv` file is delimited with `\t` or another supported delimiter (basically any character) that has to be passed together with its `escape` character. This breaks the `CsvParser` implementation when the input comes from the Builder's UI.
More context here: https://airbytehq-team.slack.com/archives/C02U9R3AF37/p1740003282758399
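A small reproduction of the symptom, as a hedged sketch: it assumes the delimiter reaches the parser as the two literal characters backslash and `t`, which is how the description above reads. `try_parse` is a hypothetical helper for this illustration and uses only the standard-library `csv` module, not the CDK.

```python
import csv
import io

SAMPLE = "id\tname\n1\tada\n"  # tab-separated content


def try_parse(delimiter: str):
    """Hypothetical helper: return parsed rows, or the TypeError raised by csv."""
    try:
        return list(csv.DictReader(io.StringIO(SAMPLE), delimiter=delimiter))
    except TypeError as exc:
        return exc


raw_delimiter = "\\t"  # assumed raw input: backslash + "t", two characters
broken = try_parse(raw_delimiter)  # csv only accepts a 1-character delimiter
working = try_parse("\t")          # a real tab character parses as expected
```

With the raw two-character value, `broken` is the exception raised by `csv`; with a real tab, `working` contains the parsed record.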
How
Process the `delimiter` to decode the `escaping_character` and normalize the input before decoding records.
User Impact
No impact is expected, this is not a Breaking change.