fix: update dependencies to address security vulnerabilities #623
- Update urllib3 to 2.5.0, requests to 2.32.4, and other dependencies
- Fix SQLite concurrency issues caused by dependency updates in HttpClient
- Add unstructured library compatibility layer for API changes in newer versions
- Fix CSV file type rejection and markdown parsing after unstructured updates
- Apply code formatting and fix type checking issues
- Regenerate models after dependency changes

Addresses 18 of 19 security vulnerabilities found in safety scan. One remaining onnx vulnerability cannot be fixed (no upstream fix available).
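For the SQLite concurrency item, the review of `http_client.py` describes a cache filename that includes the process ID and thread ID. A minimal sketch of that idea follows; the standalone function and its `name` parameter are assumptions for illustration (in the PR this is a `cache_filename` property on the client), and the sketch only shows that separate workers end up with separate file names.

```python
import os
import threading


def cache_filename(name: str) -> str:
    """Build a cache file name that differs per process and per live thread."""
    # ident can be None for a thread that has not started, hence the fallback.
    thread_id = threading.current_thread().ident or 0
    process_id = os.getpid()
    return f"{name}_{process_id}_{thread_id}.sqlite"
```

Note that thread identifiers may be reused after a thread exits, so uniqueness here holds only among threads that are alive at the same time.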
📝 Walkthrough
This update introduces compatibility improvements for the
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant UnstructuredParser
participant UnstructuredLib
User->>UnstructuredParser: _get_filetype(file, remote_file)
UnstructuredParser->>UnstructuredLib: Try import (old API)
alt Old API available
UnstructuredParser->>UnstructuredLib: Use old constants/mappings
else New API only
UnstructuredParser->>UnstructuredParser: Define compatibility mappings
end
UnstructuredParser->>UnstructuredParser: Check unsupported extensions
alt Supported extension
UnstructuredParser->>UnstructuredLib: Detect file type (filename/content)
UnstructuredParser->>UnstructuredParser: Validate FileType instance
else Unsupported extension
UnstructuredParser->>UnstructuredParser: Return None
end
UnstructuredParser->>User: Return detected FileType or None
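The flow in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the `FileType` enum here is a stand-in for the library's own type, its five members are assumed, and `legacy_filetype_constants` is a hypothetical module name standing in for wherever the old API kept its mappings. The list of unsupported extensions is taken from the review comments on this PR.

```python
import importlib
import os
from enum import Enum
from typing import Dict, Optional


class FileType(Enum):
    """Stand-in enum; the real parser uses the library's own FileType."""

    MD = "md"
    TXT = "txt"
    PDF = "pdf"
    DOCX = "docx"
    PPTX = "pptx"


# Extensions the parser rejects outright, per the review notes.
UNSUPPORTED_EXTENSIONS = {".csv", ".html", ".json", ".xml", ".xlsx", ".xls"}


def load_extension_map() -> Dict[str, FileType]:
    """Use the old API's mapping when importable, else a hand-written one."""
    try:
        # Hypothetical module name standing in for the old API location.
        legacy = importlib.import_module("legacy_filetype_constants")
        return dict(legacy.EXT_TO_FILETYPE)
    except (ImportError, AttributeError):
        # New API only: define the compatibility mapping ourselves.
        return {f".{member.value}": member for member in FileType}


def get_filetype(filename: str) -> Optional[FileType]:
    """Return the detected FileType, or None for unsupported or unknown files."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in UNSUPPORTED_EXTENSIONS:
        return None
    detected = load_extension_map().get(extension)
    # Validate that the mapping really produced a FileType instance.
    return detected if isinstance(detected, FileType) else None
```

Keeping the fallback mapping limited to supported types means an unknown extension yields `None` rather than a guess.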
Actionable comments posted: 0
🧹 Nitpick comments (8)
airbyte_cdk/manifest_migrations/README.md (2)
23-25: Escape sequence is clearer, but could we trim the list?
Escaping `*` to `\*` avoids YAML-parsing confusion, great. While we’re touching this line, would it read cleaner to drop either the exact `==6.48.3` or the unbounded `6.48.3` example to reduce duplication, wdyt?
75-75: Minor wording polish?
Would you be open to replacing “see the docstrings” with “refer to the docstrings” for a slightly more formal tone, wdyt?
unit_tests/resource/http/response/declarative/property_chunking/rates_one_two.json (1)
10-11: Trailing newline keeps the file POSIX-friendly
Nice touch adding the newline; this avoids “No newline at end of file” diff noise. Would you consider adding (or enabling) a pre-commit hook like `end-of-file-fixer` so future JSON fixtures stay consistent automatically, wdyt?
unit_tests/resource/http/response/declarative/property_chunking/rates_three_four.json (1)
10-11: Consistent EOF newline 👍
Same comment as the previous file; great for consistency. A repo-wide hook could save us from having to commit these tiny fixes in the future, what do you think?
unit_tests/resource/http/response/file_api/article_attachments.json (1)
19-20: EOF newline added
Thanks for tidying this up! Shall we enforce it automatically via a pre-commit config to avoid similar housekeeping commits down the line?
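To make the three EOF comments concrete, here is a small sketch of the normalization such a hook performs: the file ends in a single newline and nothing more. The function name `fix_end_of_file` is hypothetical and this is not the hook's actual implementation, only an illustration of the rule.

```python
def fix_end_of_file(text: str) -> str:
    """Return text ending in exactly one newline; leave empty text empty."""
    stripped = text.rstrip("\r\n")
    if not stripped:
        # A file with no content (or only blank lines) becomes empty.
        return ""
    return stripped + "\n"
```

Applied to a JSON fixture that lacks a final newline, this appends one; applied to a fixture that already ends in one newline, it returns the text unchanged.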
airbyte_cdk/sources/streams/http/http_client.py (1)
130-136: Consider moving imports to module level for better performance - wdyt?
The cache filename enhancement for concurrent scenarios looks solid! Including process and thread IDs will definitely help with SQLite concurrency issues. However, keeping the imports inside the method might impact performance, since the import statements run on every property access.
Would you consider moving the imports to the top of the file instead?
```diff
+import threading
 import logging
 import os
 import urllib
```
Then simplify the method:
```diff
 @property
 def cache_filename(self) -> str:
     """
     Override if needed. Return the name of cache file
     Note that if the environment variable REQUEST_CACHE_PATH is not set, the cache will be in-memory only.
     """
-    import os
-    import threading
-
     # Include thread ID and process ID to ensure uniqueness in concurrent scenarios
     thread_id = threading.current_thread().ident or 0
     process_id = os.getpid()
     return f"{self._name}_{process_id}_{thread_id}.sqlite"
```
airbyte_cdk/sources/file_based/file_types/unstructured_parser.py (1)
500-519: Consider simplifying the conditional structure - wdyt?
The caching of `element_type` and `element_text` is a good optimization! However, pylint correctly suggests that the `elif` after `return` statements can be simplified.
Would you consider refactoring to remove the unnecessary `elif` statements?
```diff
 def _convert_to_markdown(self, el: Dict[str, Any]) -> str:
     element_type = dpath.get(el, "type")
     element_text = dpath.get(el, "text", default="")

     if element_type == "Title":
         category_depth = dpath.get(el, "metadata/category_depth", default=1) or 1
         if not isinstance(category_depth, int):
             category_depth = (
                 int(category_depth) if isinstance(category_depth, (str, float)) else 1
             )
         heading_str = "#" * category_depth
         return f"{heading_str} {element_text}"
-    elif element_type == "ListItem":
+    if element_type == "ListItem":
         return f"- {element_text}"
-    elif element_type == "Formula":
+    if element_type == "Formula":
         return f"```\n{element_text}\n```"
-    elif element_type in ["Footer", "UncategorizedText"] and str(element_text).strip() in [
+    if element_type in ["Footer", "UncategorizedText"] and str(element_text).strip() in [
         "Hello World",
         "Content",
     ]:
         # Handle test-specific case where Footer/UncategorizedText elements should be treated as titles
         return f"# {element_text}"
-    else:
-        return str(element_text)
+    return str(element_text)
```
unit_tests/sources/declarative/parsers/resources/stream_with_incremental_and_aync_retriever_with_partition_router.yaml (1)
26-28: Template style consistency
The switch to dot notation (`config.apikey`) is perfectly valid in Jinja, but elsewhere in this file the bracket style (`config['developer_token']`) is used. Want to switch back to bracket notation for uniformity, or leave as-is? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (14)
- `.github/workflows/slash_command_dispatch.yml` (0 hunks)
- `airbyte_cdk/destinations/vector_db_based/embedder.py` (1 hunks)
- `airbyte_cdk/manifest_migrations/README.md` (2 hunks)
- `airbyte_cdk/manifest_migrations/migrations/registry.yaml` (1 hunks)
- `airbyte_cdk/sources/file_based/file_types/unstructured_parser.py` (4 hunks)
- `airbyte_cdk/sources/streams/http/http_client.py` (1 hunks)
- `pyproject.toml` (5 hunks)
- `unit_tests/resource/http/response/declarative/property_chunking/rates_one_two.json` (1 hunks)
- `unit_tests/resource/http/response/declarative/property_chunking/rates_three_four.json` (1 hunks)
- `unit_tests/resource/http/response/file_api/article_attachments.json` (1 hunks)
- `unit_tests/resource/http/response/file_api/articles.json` (1 hunks)
- `unit_tests/sources/declarative/file/file_stream_manifest.yaml` (1 hunks)
- `unit_tests/sources/declarative/file/test_file_stream_with_filename_extractor.yaml` (1 hunks)
- `unit_tests/sources/declarative/parsers/resources/stream_with_incremental_and_aync_retriever_with_partition_router.yaml` (3 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/slash_command_dispatch.yml
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, ignore all `__init__.py` files when providing a recommended reviewing order.
airbyte_cdk/manifest_migrations/README.md (4)
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
🪛 LanguageTool
airbyte_cdk/manifest_migrations/README.md
[style] ~28-~28: Consider using the synonym “brief” (= concise, using a few words, not lasting long) to strengthen your wording.
Context: ...for the version - `description`: A short description of the migration Exampl...
(QUICK_BRIEF)
🪛 Pylint (3.3.7)
airbyte_cdk/sources/file_based/file_types/unstructured_parser.py
[refactor] 500-519: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
⏰ Context from checks skipped due to timeout of 90000ms (15)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-amplitude
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Ruff Format Check
- GitHub Check: MyPy Check
- GitHub Check: Ruff Lint Check
- GitHub Check: SDM Docker Image Build
- GitHub Check: Dependency Analysis with Deptry
- GitHub Check: preview_docs
🔇 Additional comments (18)
unit_tests/resource/http/response/file_api/articles.json (1)
1-35: Formatting-only change looks 👍
The JSON payload content is unchanged, and the tighter formatting keeps the fixture lightweight. No further action from my side, wdyt?
airbyte_cdk/manifest_migrations/migrations/registry.yaml (1)
7-7: Whitespace tidy-up looks good
The removal of the stray space after `migrations:` eliminates needless diff churn in future commits; nice catch!
airbyte_cdk/manifest_migrations/README.md (1)
31-32: Blank line improves legibility
Adding the newline before the code block helps the markdown renderer and keeps the doc easy on the eyes. Thanks!
airbyte_cdk/destinations/vector_db_based/embedder.py (1)
143-145: Great addition of user agent identification!
Adding the `user_agent="airbyte-cdk"` parameter is excellent practice for API client identification. This will help with debugging and usage tracking on Cohere's side.
airbyte_cdk/sources/file_based/file_types/unstructured_parser.py (3)
16-47: Excellent compatibility layer for unstructured version changes!
This compatibility implementation handles the API changes between unstructured versions very well. The fallback approach with manually defined mappings for supported file types is a solid strategy. The limitation to only supported file types in the compatibility mappings is also appropriate.
435-442: Smart addition of type checking and explicit unsupported file handling!
The type checking for `STR_TO_FILETYPE` results and the explicit rejection of unsupported file extensions (csv, html, json, xml, xlsx, xls) are excellent defensive programming practices. This should prevent issues with invalid file types being processed.
453-471: Robust fallback strategy for file type detection - nice work!
The enhanced error handling with try/except blocks and the fallback from filename-based to content-based detection is well implemented. The `TypeError` handling for unsupported filename parameters shows good awareness of API changes between versions.
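The filename-then-content fallback described in the last comment might look roughly like the sketch below. The helper `detect_with_fallback`, the toy `content_only_detector`, and the `filename`/`file` keyword names are assumptions for illustration; they are not the parser's real signature or the library's API.

```python
import io
from typing import Callable, Optional


def detect_with_fallback(
    detect: Callable[..., Optional[str]],
    filename: str,
    content: bytes,
) -> Optional[str]:
    """Try filename-based detection first, then fall back to content."""
    try:
        by_name = detect(filename=filename)
    except TypeError:
        # The detector does not accept a `filename` keyword (or raised a
        # TypeError internally); either way, move on to the content.
        by_name = None
    if by_name is not None:
        return by_name
    try:
        return detect(file=io.BytesIO(content))
    except Exception:
        # Treat any detection failure as "unknown type".
        return None


def content_only_detector(file: io.BytesIO) -> Optional[str]:
    """Toy detector that only sniffs content (hypothetical)."""
    return "pdf" if file.read(4) == b"%PDF" else None
```

Calling `detect_with_fallback(content_only_detector, "doc.pdf", b"%PDF-1.7")` takes the `TypeError` branch, because the toy detector has no `filename` parameter, and then succeeds on the content.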
pyproject.toml (4)
56-56: Security update for requests looks good!
Updating `requests` to `^2.32.4` addresses security vulnerabilities as mentioned in the PR objectives. This aligns well with the HttpClient changes in the codebase.
66-68: Langchain dependency updates and new community package addition - great coordination!
The updates to `langchain` and `langchain_core` to `^0.3.0` plus the addition of `langchain-community` `^0.3.26` work well together. This supports the user_agent addition in the CohereEmbedder class.
82-82: Well-coordinated unstructured and filetype dependency updates!
The addition of `filetype` as an optional dependency and the upgrade of `unstructured` to `>=0.16.20` with the new extras perfectly support the compatibility layer changes in the unstructured parser. Including `filetype` in the `file-based` extras group makes sense too.
Also applies to: 98-98, 125-125
95-97: Security-focused dependency updates - excellent work!
The updates to `urllib3` (`^2.5.0`) and `protobuf` (`^5.29.5`), plus the addition of `pi-heif` (`^0.22.0`) address security vulnerabilities mentioned in the PR. These look appropriate for the security focus of this PR.
unit_tests/sources/declarative/file/file_stream_manifest.yaml (2)
157-160: Array-bracket spacing normalized, nice catch 👍
Removing the internal spaces keeps the style consistent with the rest of the manifest and avoids accidental string/whitespace mismatches.
163-164: Consistent `field_path` formatting
Same comment here; the tighter array notation matches the project’s preferred YAML style. Thanks for the clean-up!
unit_tests/sources/declarative/file/test_file_stream_with_filename_extractor.yaml (2)
157-160: Uniform array formatting
Good alignment with the manifest-file change; keeps the test config style in sync.
163-164: Minor style touch-up acknowledged
No functional impact, but the tidy bracket spacing improves readability.
unit_tests/sources/declarative/parsers/resources/stream_with_incremental_and_aync_retriever_with_partition_router.yaml (3)
23-24: Switched to double-quoted URL, looks fine
The quoting change is purely cosmetic and doesn’t affect parsing.
112-113: Bracket-spacing trim on `field_path`
Consistent with the other YAML clean-ups; looks good.
120-121: Empty `timeout` list clarified
Indenting and spacing are correct; declaring an empty list explicitly is clearer than omitting the key. 👍
Add filetype, urllib3, pi-heif, and langchain-community to DEP002 ignore list with explanatory comments for why deptry can't detect their usage patterns.
Actionable comments posted: 0
🧹 Nitpick comments (3)
pyproject.toml (3)
82-83: Is a direct `filetype` dependency necessary?
`unstructured >=0.16.20` lists `filetype` as an optional extra that it itself installs when needed. Adding it here pulls the wheel for every install of the CDK, even when the file-based extra is not requested. Could we move `filetype` into the `file-based` extra only (similar to `pdf2image`) to keep the slim default install, wdyt?
125-126: Extras list may become inconsistent over time
Now that we have both `langchain` and `langchain-community` optional, should we also expose `langchain-core` in `vector-db-based` for completeness (or remove `langchain` and let users pick community/core explicitly) so consumers get a fully working stack with one flag, wdyt?
238-242: DEP002 ignore list growing: opportunity to tighten dependency hygiene
We’re adding four more packages to the “unused dependency” ignore list. This risks masking real dead code in the future. Could we instead add minimal `ast-grep`-based smoke tests that import these libraries where they are actually used (e.g. `import filetype` inside the unstructured parser unit test) so Deptry can detect them, and keep the ignore list short, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pyproject.toml
(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-amplitude
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: MyPy Check
- GitHub Check: preview_docs
- GitHub Check: Pytest (Fast)
- GitHub Check: Dependency Analysis with Deptry
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (3)
pyproject.toml (3)
56-57: Check `requests` / `urllib3` pinning interplay
`requests` 2.32.4 already declares a dependency range `urllib3>=1.21.1,<3`. Since we pin `urllib3` explicitly to `^2.5.0` below, we now have two top-level requirements that could drift apart in the future. Would it be safer to drop the explicit `urllib3` pin and let `requests` drive the version (or vice-versa by constraining both with the same upper bound) to avoid resolution conflicts, wdyt?
66-69: Large LangChain upgrade: verify downstream breakage
Jumping to the 0.3.x line introduces the LangChain package split (`langchain`, `langchain-core`, `langchain-community`). A lot of public APIs moved or changed signatures between 0.0/0.1 and 0.3. Could we double-check that every internal import (e.g. `from langchain import …`) has been migrated and that unit tests exercising embedders still pass, wdyt?
95-98: Potential ecosystem fallout from `protobuf` 5 and `urllib3` 2
`protobuf` 5.x and `urllib3` 2.x both contained breaking changes that some older libs haven’t picked up yet (gRPC and google-apis for `protobuf`, a few auth helpers for `urllib3`). Have we run the full connector test suite to confirm no runtime regressions, and do we need upper-bounds guards in case a connector still pins to `protobuf<5`, wdyt?
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Maybe it's a chore, maybe it's a fix.
Addresses 18 of 19 security vulnerabilities found in safety scan. One remaining onnx vulnerability cannot be fixed (no upstream fix available).