
[python] Fix pyarrow module missing attribute #7258

Merged
JingsongLi merged 3 commits into apache:master from tonymtu:fix-pyarrow-missing-attribute-parquet
Feb 12, 2026

Conversation

@tonymtu
Contributor

@tonymtu tonymtu commented Feb 11, 2026

Purpose

Fix AttributeError: module 'pyarrow' has no attribute 'parquet' in format table read/write.

pyarrow.parquet is a submodule that must be imported explicitly. This probably went unnoticed because other modules imported pyarrow.parquet as a side effect, which made it reachable as an attribute of pyarrow everywhere in the process. When those modules are absent or the import order changes, the error surfaces.
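
To illustrate the side-effect behaviour described above without depending on pyarrow, here is a minimal sketch using a throwaway package. The names `demo_pkg`, `demo_pkg.sub`, and `demo_other` are hypothetical stand-ins for `pyarrow`, `pyarrow.parquet`, and "some other module that happens to import it"; none of them come from this PR.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def demo_side_effect_import() -> Tuple[bool, bool]:
    """Return (visible_before, visible_after) for the attribute demo_pkg.sub."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "demo_pkg"
        pkg.mkdir()
        # The package __init__ deliberately does not import its submodule.
        (pkg / "__init__.py").write_text("")
        (pkg / "sub.py").write_text("VALUE = 42\n")
        # An unrelated module that imports the submodule for its own use.
        (root / "demo_other.py").write_text("import demo_pkg.sub\n")

        sys.path.insert(0, str(root))
        try:
            importlib.invalidate_caches()
            demo_pkg = importlib.import_module("demo_pkg")
            # Importing only the parent package does not load the submodule.
            before = hasattr(demo_pkg, "sub")
            # Importing the unrelated module loads demo_pkg.sub as a side
            # effect and binds it as an attribute on the parent package.
            importlib.import_module("demo_other")
            after = hasattr(demo_pkg, "sub")
            return before, after
        finally:
            # Undo the changes to interpreter-wide import state.
            sys.path.remove(str(root))
            for name in ("demo_pkg", "demo_pkg.sub", "demo_other"):
                sys.modules.pop(name, None)
```

Calling `demo_side_effect_import()` returns `(False, True)`: the attribute only exists once something in the process has imported the submodule, which is why an explicit import in the module that uses it is the robust fix.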

Tests

API and Format

Documentation

Generative AI tooling

No

@XiaoHongbo-Hope
Contributor

Thanks! Can we use a small case to reproduce the issue and add it to CI to avoid such issues in the future?

@tonymtu
Contributor Author

tonymtu commented Feb 11, 2026

> Thanks! Can we use a small case to reproduce the issue and add it to CI to avoid such issues in the future?

It may be hard to cover this issue with a reliable test. When the full test suite runs, imports made by earlier tests can silently mask it.

A static analysis check (e.g. mypy) would be a proper way to catch this. However, introducing mypy into CI is a broader change that could surface many existing type issues across the codebase.

@XiaoHongbo-Hope
Contributor

> > Thanks! Can we use a small case to reproduce the issue and add it to CI to avoid such issues in the future?
>
> It may be hard to cover this issue with a reliable test. When the full test suite runs, imports made by earlier tests can silently mask it.
>
> A static analysis check (e.g. mypy) would be a proper way to catch this. However, introducing mypy into CI is a broader change that could surface many existing type issues across the codebase.

Got your point. Can we use a subprocess to test the issue, or should we just introduce mypy into CI? We should try our best to catch these issues in CI rather than on the user side. Can you give us some suggestions? @JingsongLi
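
The subprocess idea raised above could look roughly like the following sketch. It runs a snippet in a fresh interpreter, so imports made by earlier tests in the same process cannot mask a missing submodule import. The helper name `run_isolated` is hypothetical and not part of the project.

```python
import subprocess
import sys
import textwrap


def run_isolated(snippet: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a fresh interpreter and capture its output.

    The child process starts with its own empty import state, so nothing
    imported by the calling test process is visible to the snippet.
    """
    return subprocess.run(
        [sys.executable, "-c", textwrap.dedent(snippet)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

A regression test would pass a snippet that imports only the module under test and exercises the Parquet read/write path, then assert that `returncode` is 0 and that `stderr` does not contain `AttributeError`. Because the failure here happens at call time rather than import time, the snippet has to execute the affected code path, not just import the module.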

@tonymtu tonymtu changed the title from "[python] Fix pyarrow module missing attribute parquet" to "[python] Fix pyarrow module missing attribute" Feb 11, 2026
@tonymtu
Contributor Author

tonymtu commented Feb 11, 2026

Additionally, an AI assistant helped find other pyarrow "optional modules" used across the codebase that need an explicit import.

# See show_info() in pyarrow/__init__.py
print("\nOptional modules:")
modules = ["csv", "cuda", "dataset", "feather", "flight", "fs", "gandiva", "json",
           "orc", "parquet"]

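A lightweight way to look for this pattern across a codebase, short of a full type checker, is a small AST scan. The sketch below is an illustration only, not the check used in this PR: it flags attribute access such as `pa.parquet` in a source file that never imports the matching submodule. The function name is hypothetical, the module list is copied from the excerpt above, and the scan only looks at imports within the same file.

```python
import ast
from typing import List, Set, Tuple

# Optional pyarrow submodules, as listed in pyarrow's show_info().
OPTIONAL_MODULES = {"csv", "cuda", "dataset", "feather", "flight", "fs",
                    "gandiva", "json", "orc", "parquet"}


def find_unimported_submodule_uses(source: str) -> List[Tuple[int, str]]:
    """Return (line, submodule) pairs for pyarrow.<submodule> attribute
    access in a file that has no explicit import of that submodule."""
    tree = ast.parse(source)
    aliases: Set[str] = set()   # local names bound to the pyarrow package
    imported: Set[str] = set()  # submodules explicitly imported in this file

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                parts = alias.name.split(".")
                if parts[0] != "pyarrow":
                    continue
                if len(parts) == 1:
                    aliases.add(alias.asname or "pyarrow")
                else:
                    imported.add(parts[1])
                    if alias.asname is None:
                        # `import pyarrow.x` binds the name `pyarrow`.
                        aliases.add("pyarrow")
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            parts = node.module.split(".")
            if parts[0] == "pyarrow":
                if len(parts) > 1:
                    imported.add(parts[1])
                else:
                    # `from pyarrow import parquet` loads the submodule too.
                    imported.update(alias.name for alias in node.names)

    findings = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in aliases
                and node.attr in OPTIONAL_MODULES
                and node.attr not in imported):
            findings.append((node.lineno, node.attr))
    return sorted(findings)
```

For a file containing `import pyarrow as pa` followed by `pa.parquet.read_table(path)`, the function reports the line of the attribute access; adding `import pyarrow.parquet` to the same file clears the finding.
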
@tonymtu tonymtu force-pushed the fix-pyarrow-missing-attribute-parquet branch from fe729c7 to 367043a Compare February 11, 2026 08:17
@JingsongLi
Contributor

Hi @tonymtu @XiaoHongbo-Hope , maybe

> > > Thanks! Can we use a small case to reproduce the issue and add it to CI to avoid such issues in the future?
> >
> > It may be hard to cover this issue with a reliable test. When the full test suite runs, imports made by earlier tests can silently mask it.
> > A static analysis check (e.g. mypy) would be a proper way to catch this. However, introducing mypy into CI is a broader change that could surface many existing type issues across the codebase.
>
> Got your point. Can we use a subprocess to test the issue, or should we just introduce mypy into CI? We should try our best to catch these issues in CI rather than on the user side. Can you give us some suggestions? @JingsongLi

I don't have any good suggestions. I'm not very familiar with Python engineering testing. Perhaps it's a problem with our CI testing?

@tonymtu
Contributor Author

tonymtu commented Feb 12, 2026

Hi @JingsongLi @XiaoHongbo-Hope. I'd prefer to split this into two separate efforts:

  • This PR: fix the immediate bug (add explicit import pyarrow.xxx statements).
  • A follow-up effort: enable mypy checking in CI to prevent this class of issues.

I noticed there's already a [mypy] section in dev/cfg.ini but it's not wired into lint-python.sh, presumably because enabling it across the full codebase would surface too many existing issues at once.

Suggestions on enabling [mypy], from Claude Opus:
flake8 (our current linter) cannot catch this: it is a style checker, not a type checker. A type checker such as mypy can detect unresolved submodule attribute access. A practical approach for the follow-up would be to run mypy only on the files changed in each PR (git diff --name-only), so we can enforce type checking incrementally without a massive one-time fix.

Are there any better ideas?
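
The incremental approach suggested above could be sketched as follows. This is an assumption-laden illustration, not the project's CI script: the function names are hypothetical, `origin/master` is an assumed base ref, and `dev/cfg.ini` is used as the mypy config only because the comment above says it already contains a [mypy] section.

```python
import subprocess
import sys
from typing import List


def select_python_files(diff_output: str) -> List[str]:
    """Keep only the .py paths from `git diff --name-only` output."""
    paths = (line.strip() for line in diff_output.splitlines())
    return [path for path in paths if path.endswith(".py")]


def build_mypy_command(files: List[str], config_file: str = "dev/cfg.ini") -> List[str]:
    """Build the mypy invocation for the given files (config path is an assumption)."""
    return [sys.executable, "-m", "mypy", "--config-file", config_file, *files]


def run_mypy_on_changed_files(base_ref: str = "origin/master") -> int:
    """Type-check only the Python files changed relative to base_ref."""
    diff = subprocess.run(
        # ACMR = added, copied, modified, renamed; deleted files are skipped.
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base_ref}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    files = select_python_files(diff.stdout)
    if not files:
        return 0  # nothing to check
    return subprocess.run(build_mypy_command(files)).returncode
```

A lint script could call `run_mypy_on_changed_files()` and exit with its return value. One caveat: mypy follows imports by default, so errors from unchanged modules may still be reported unless the configuration restricts that.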

sundapeng added a commit to sundapeng/paimon that referenced this pull request Feb 12, 2026
apache#7258)

This commit applies similar optimizations to the Python version as implemented
in the Java version (commit b0dad7c). Changes include:

- Implement enhanced nonce generation combining UUID, timestamp and thread ID
- Add comprehensive parameter validation for sign_headers and authorization methods
- Optimize date formatting to ensure GMT/UTC timezone handling
- Add thread-safe nonce generation tests
- Add parameter validation tests
- Simplify comments and remove unnecessary blank lines to comply with Apache project standards

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
@JingsongLi
Contributor

Sounds good to me. I will merge this PR.

@JingsongLi JingsongLi merged commit a1a75f5 into apache:master Feb 12, 2026
6 checks passed
jerry-024 added a commit to jerry-024/paimon that referenced this pull request Feb 14, 2026
* upstream/master:
  [test] Remove the S3 dependency from paimon-hive and paimon-spark (apache#7279)
  [python] Introduce Iceberg Table in Rest Catalog (apache#7280)
  [fs] Replace AWS SDK v2 bundle with explicit module dependencies (apache#7285)
  [docs] fix doc: fileformat.md (apache#7278)
  [python] Improve error response parsing in HTTP client (apache#7276)
  [core] Optimize between in Between
  [spark] supports converting some SparkPredicate to Paimon between LeafPredicate (apache#7265)
  [spark] Introduce TrimTransform for trim/ltrim/rtrim functions to pushdown (apache#7273)
  [python] Fix row_id_range method invoking
  [core] Fix performance issue in FileStoreCommitImpl#filterCommitted (apache#7275)
  [python] Fix pyarrow module missing attribute (apache#7258)
3 participants