Source: Apify - No properties node in stream schema #24701

Open
tybernstein opened this issue Mar 30, 2023 · 3 comments

Comments

@tybernstein
Contributor

Environment

  • Airbyte version: Cloud
  • Step where error happened: Sync job

Current Behavior

The Replication tab is unable to display the source schema.
The sync fails with the following error message:

2023-03-30 13:09:55 normalization > Traceback (most recent call last):
2023-03-30 13:09:55 normalization >   File "/usr/local/bin/transform-catalog", line 8, in <module>
2023-03-30 13:09:55 normalization >     sys.exit(main())
2023-03-30 13:09:55 normalization >   File "/usr/local/lib/python3.9/site-packages/normalization/transform_catalog/transform.py", line 111, in main
2023-03-30 13:09:55 normalization >     TransformCatalog().run(args)
2023-03-30 13:09:55 normalization >   File "/usr/local/lib/python3.9/site-packages/normalization/transform_catalog/transform.py", line 36, in run
2023-03-30 13:09:55 normalization >     self.process_catalog()
2023-03-30 13:09:55 normalization >   File "/usr/local/lib/python3.9/site-packages/normalization/transform_catalog/transform.py", line 64, in process_catalog
2023-03-30 13:09:55 normalization >     processor.process(catalog_file=catalog_file, json_column_name=json_col, default_schema=schema)
2023-03-30 13:09:55 normalization >   File "/usr/local/lib/python3.9/site-packages/normalization/transform_catalog/catalog_processor.py", line 55, in process
2023-03-30 13:09:55 normalization >     stream_processors = self.build_stream_processor(
2023-03-30 13:09:55 normalization >   File "/usr/local/lib/python3.9/site-packages/normalization/transform_catalog/catalog_processor.py", line 146, in build_stream_processor
2023-03-30 13:09:55 normalization >     properties = get_field(get_field(stream_config, "json_schema", message), "properties", message)
2023-03-30 13:09:55 normalization >   File "/usr/local/lib/python3.9/site-packages/normalization/transform_catalog/catalog_processor.py", line 238, in get_field
2023-03-30 13:09:55 normalization >     raise KeyError(message)
2023-03-30 13:09:55 normalization > KeyError: "'json_schema'.'properties' are not defined for stream DatasetItems"
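
The last frames show what normalization requires: it looks up `json_schema` in the stream config, then `properties` inside it, and raises `KeyError` when either is missing. The sketch below is a hypothetical re-creation based only on the traceback (the real `get_field` in `catalog_processor.py` may differ), applied to a stream config shaped like the Apify `DatasetItems` schema.

```python
from typing import Any, Dict


def get_field(config: Dict[str, Any], key: str, message: str) -> Any:
    """Hypothetical stand-in for normalization's get_field: return config[key] or raise KeyError(message)."""
    if key in config:
        return config[key]
    raise KeyError(message)


def stream_properties(stream_config: Dict[str, Any], stream_name: str) -> Dict[str, Any]:
    """Mirror the nested lookup seen in build_stream_processor in the traceback."""
    message = f"'json_schema'.'properties' are not defined for stream {stream_name}"
    return get_field(get_field(stream_config, "json_schema", message), "properties", message)


# Stream config shaped like the Apify DatasetItems schema: an object with no "properties" node.
dataset_items = {
    "json_schema": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
    }
}

try:
    stream_properties(dataset_items, "DatasetItems")
except KeyError as err:
    # str() of a KeyError is the repr of its message, matching the traceback's last line.
    print(err)
```

A `json_schema` that is a valid object schema but has no `properties` key is enough to trigger the error in this re-creation.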

Expected Behavior

The connector should recognize the dataset schema and sync successfully.

Logs

tyler__airbyte_logs_1689166_txt.txt

Steps to Reproduce

  1. Set up an Apify source connector
  2. Sync to any destination connector
  3. Even if the sync succeeds, no data is synced
@erohmensing
Contributor

The offending area of code:

stream_name = DATASET_ITEMS_STREAM_NAME
json_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
}

It looks like it defines the dataset as an object but doesn't define its properties. It probably needs a smarter rework of dynamic schema discovery: since it could be any dataset, we don't know the shape of the data.
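
One direction for that smarter discovery, sketched under assumptions: read a sample of dataset items and build a `properties` map from the JSON types observed in them, always emitting a `properties` node (possibly empty) plus `additionalProperties: true`. The function name `infer_json_schema` and the sampling approach are hypothetical, not part of the Apify connector.

```python
from typing import Any, Dict, Iterable, List, Set


def json_type(value: Any) -> str:
    """Map a Python value decoded from JSON to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_json_schema(items: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a draft-07 object schema whose properties list every key seen in the sample items."""
    seen: Dict[str, Set[str]] = {}
    for item in items:
        for key, value in item.items():
            seen.setdefault(key, set()).add(json_type(value))
    properties: Dict[str, Any] = {}
    for key, types in seen.items():
        # Items may omit a key, so always allow null; sort for deterministic output.
        type_list: List[str] = sorted(types | {"null"})
        properties[key] = {"type": type_list[0] if len(type_list) == 1 else type_list}
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        # Always present, even when empty, so consumers that require "properties" find it.
        "properties": properties,
        "additionalProperties": True,
    }


sample = [{"url": "https://example.com", "price": 9.5}, {"url": "https://example.org", "inStock": True}]
print(infer_json_schema(sample)["properties"])
```

Marking every field nullable is a deliberate choice here: a sample can't prove that later items will always contain a key.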

@erohmensing
Contributor

erohmensing commented Mar 30, 2023

Interestingly enough, discover is fine with this; it's normalization that breaks, I guess because it doesn't know how to infer the types of the data coming through.

Edit: fails without normalization too:
[screenshot]

@wkargul

wkargul commented Sep 19, 2023

Hey there! 😊

I've encountered a similar issue, but with the Files source connector. Here's what I'm passing as the file:

[["2000-06-05",116],["2000-06-06",129],["2000-06-07",135],["2000-06-08",86]]

I'm getting the same error as mentioned above. Any insights? 🙏
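
A guess worth checking, not a confirmed diagnosis: each row in this file is a JSON array (`["2000-06-05",116]`) rather than an object, so there are no keys from which a `properties` map could be built. A minimal sketch of reshaping the rows into objects before loading them; the column names `date` and `value` and the helper `rows_to_records` are hypothetical.

```python
import json
from typing import Any, Dict, List, Sequence

RAW = '[["2000-06-05",116],["2000-06-06",129],["2000-06-07",135],["2000-06-08",86]]'

# Hypothetical column names: the original file does not name its columns.
COLUMNS = ("date", "value")


def rows_to_records(rows: List[Sequence[Any]], columns: Sequence[str]) -> List[Dict[str, Any]]:
    """Turn positional rows into objects keyed by column name."""
    records = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}: {row!r}")
        records.append(dict(zip(columns, row)))
    return records


records = rows_to_records(json.loads(RAW), COLUMNS)
print(json.dumps(records))
```

Whether the File source then discovers `date` and `value` as properties is an assumption to verify; the point is only that objects carry field names and arrays do not.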
