[python] Fix pyarrow module missing attribute#7258
Thanks! Can we reproduce the issue with a small test case and add it to CI to avoid such issues in the future?
It may be hard to cover this issue with a reliable test when running the full test suite. A static analysis check (e.g. mypy) could be a proper way to catch it. However, introducing mypy into CI is a broader change that may surface many existing type issues across the codebase.
I see your point. Can we use a subprocess to test the issue, or should we just introduce
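One way to make the subprocess idea concrete is to run the check in a fresh interpreter, so that submodules imported as a side effect by other tests cannot mask the bug. The sketch below is a minimal illustration, not part of Paimon's test suite; the helper name `attribute_visible_after_import` is hypothetical.

```python
import subprocess
import sys


def attribute_visible_after_import(package: str, submodule: str) -> bool:
    """Return True if `import <package>` alone exposes `<package>.<submodule>`.

    The check runs in a fresh interpreter, so modules already imported by the
    current process (e.g. by other tests) cannot leak into the result.
    """
    code = (
        f"import {package}\n"
        f"print(hasattr({package}, {submodule!r}))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # fail loudly if the package itself cannot be imported
    )
    return result.stdout.strip() == "True"
```

A regression test in the same spirit would run the format table read/write code path in such a subprocess, rather than only checking the attribute, so the result does not depend on what the rest of the suite has already imported.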
Additionally, AI helped find other "optional modules" in pyarrow:

```python
# refer to pyarrow/__init__.py#show_info()
print("\nOptional modules:")
modules = ["csv", "cuda", "dataset", "feather", "flight", "fs", "gandiva", "json",
           "orc", "parquet"]
```
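The module list above could also drive a lightweight static check, as a narrower alternative to enabling mypy for the whole codebase. This is a minimal sketch using the standard `ast` module; the function name `find_unimported_submodule_uses` is hypothetical, it only recognizes names bound by `import pyarrow [as ...]`, and it checks one file at a time without following imports across files.

```python
import ast

# Optional pyarrow submodules, as listed in pyarrow/__init__.py#show_info().
OPTIONAL_MODULES = {"csv", "cuda", "dataset", "feather", "flight", "fs",
                    "gandiva", "json", "orc", "parquet"}


def find_unimported_submodule_uses(source: str) -> list:
    """Return sorted (lineno, "pyarrow.<mod>") pairs for optional submodules
    accessed as attributes of pyarrow without being imported in this file."""
    tree = ast.parse(source)
    aliases = set()   # local names bound to the pyarrow package itself
    imported = set()  # optional submodules this file imports explicitly

    # Pass 1: collect imports anywhere in the file.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "pyarrow":
                    aliases.add(alias.asname or "pyarrow")
                elif alias.name.startswith("pyarrow."):
                    imported.add(alias.name.split(".")[1])
                    if alias.asname is None:
                        # `import pyarrow.parquet` also binds the name `pyarrow`.
                        aliases.add("pyarrow")
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module == "pyarrow":
                # `from pyarrow import parquet` imports the submodule.
                imported.update(alias.name for alias in node.names)
            elif node.module.startswith("pyarrow."):
                imported.add(node.module.split(".")[1])

    # Pass 2: flag `<alias>.<optional module>` that was never imported here.
    problems = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in aliases
                and node.attr in OPTIONAL_MODULES
                and node.attr not in imported):
            problems.append((node.lineno, f"pyarrow.{node.attr}"))
    return sorted(problems)
```

Run over each source file, such a check would flag `pa.parquet` in a file that only does `import pyarrow as pa`, which is the pattern behind this bug, regardless of what other modules happen to import first.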
Hi @tonymtu @XiaoHongbo-Hope, maybe
I don't have any good suggestions. I'm not very familiar with testing practices for Python projects. Perhaps this is a gap in our CI testing?
Hi @JingsongLi @XiaoHongbo-Hope. I'd prefer to split this into two separate efforts:
I noticed there's already a [mypy] section in dev/cfg.ini but it's not wired into lint-python.sh, presumably because enabling it across the full codebase would surface too many existing issues at once.
Are there any better ideas?
apache#7258) This commit applies similar optimizations to the Python version as implemented in the Java version (commit b0dad7c). Changes include:
- Implement enhanced nonce generation combining UUID, timestamp and thread ID
- Add comprehensive parameter validation for sign_headers and authorization methods
- Optimize date formatting to ensure GMT/UTC timezone handling
- Add thread-safe nonce generation tests
- Add parameter validation tests
- Simplify comments and remove unnecessary blank lines to comply with Apache project standards

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
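The nonce and date handling described in that commit message can be sketched as follows. This is an illustration only, not the code from that commit; the function names `generate_nonce` and `format_gmt_date` and the `-` separator are assumptions, and the date helper relies on the standard library's `email.utils.formatdate`.

```python
import email.utils
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


def generate_nonce() -> str:
    """Combine a random UUID, a nanosecond timestamp and the current thread ID.

    The random UUID already makes collisions very unlikely; the timestamp and
    thread ID further distinguish nonces generated at the same moment by
    different threads.
    """
    return f"{uuid.uuid4().hex}-{time.time_ns()}-{threading.get_ident()}"


def format_gmt_date(timestamp: float) -> str:
    """Format a POSIX timestamp as an RFC 2822 date string in GMT."""
    # usegmt=True emits the literal "GMT" zone instead of a numeric offset.
    return email.utils.formatdate(timestamp, usegmt=True)


# Thread-safety check: nonces generated concurrently should all be distinct.
with ThreadPoolExecutor(max_workers=8) as pool:
    nonces = list(pool.map(lambda _: generate_nonce(), range(1000)))
assert len(set(nonces)) == len(nonces)
```

Formatting from a timestamp in UTC, rather than from local time, keeps the date header independent of the host's timezone setting.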
Sounds good to me. I will merge this PR.
* upstream/master:
  - [test] Remove the S3 dependency from paimon-hive and paimon-spark (apache#7279)
  - [python] Introduce Iceberg Table in Rest Catalog (apache#7280)
  - [fs] Replace AWS SDK v2 bundle with explicit module dependencies (apache#7285)
  - [docs] fix doc: fileformat.md (apache#7278)
  - [python] Improve error response parsing in HTTP client (apache#7276)
  - [core] Optimize between in Between
  - [spark] supports converting some SparkPredicate to Paimon between LeafPredicate (apache#7265)
  - [spark] Introduce TrimTransform for trim/ltrim/rtrim functions to pushdown (apache#7273)
  - [python] Fix row_id_range method invoking
  - [core] Fix performance issue in FileStoreCommitImpl#filterCommitted (apache#7275)
  - [python] Fix pyarrow module missing attribute (apache#7258)
Purpose
Fix `AttributeError: module 'pyarrow' has no attribute 'parquet'` in format table read/write.
`pyarrow.parquet` is a submodule that must be imported explicitly: `import pyarrow` alone does not load it. This was probably not caught earlier because other modules imported `pyarrow.parquet` as a side effect, which binds `parquet` as an attribute of the `pyarrow` package for the rest of the process. When those modules are not imported, or the import order changes, the error surfaces.
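The side effect can be reproduced without pyarrow. This minimal sketch assumes a throwaway package named `demo_pkg` (hypothetical) created in a temporary directory: `import demo_pkg` alone does not expose `demo_pkg.sub`, but once any code imports `demo_pkg.sub`, the attribute is bound on the package for the whole process. The robust fix is the one this PR applies: import the submodule explicitly where it is used (for pyarrow, e.g. `import pyarrow.parquet as pq`).

```python
import importlib
import pathlib
import sys
import tempfile


def make_package(root: pathlib.Path) -> None:
    """Create demo_pkg/ with an __init__.py that does NOT import .sub."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "sub.py").write_text("def read():\n    return 'ok'\n")


with tempfile.TemporaryDirectory() as tmp:
    make_package(pathlib.Path(tmp))
    importlib.invalidate_caches()  # files were created at runtime
    sys.path.insert(0, tmp)
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        # Only the package is loaded: the submodule is not an attribute yet.
        before = hasattr(demo_pkg, "sub")

        # Some *other* module imports the submodule as a side effect...
        importlib.import_module("demo_pkg.sub")
        # ...and now demo_pkg.sub resolves everywhere in this process.
        after = demo_pkg.sub.read()
    finally:
        sys.path.remove(tmp)
        for name in ("demo_pkg.sub", "demo_pkg"):
            sys.modules.pop(name, None)

print(before, after)  # False ok
```

Whether `demo_pkg.sub` works therefore depends on whether something else imported it first, which is why the bug only appears for certain module sets or import orders.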
Tests
API and Format
Documentation
Generative AI tooling
No