[python] Fix pyarrow module missing attribute by tonymtu · Pull Request #7258 · apache/paimon

tonymtu · 2026-02-11T03:19:08Z

Purpose

Fix AttributeError: module 'pyarrow' has no attribute 'parquet' in format table read/write.

pyarrow.parquet is a submodule that must be imported explicitly. The error probably went unnoticed because other modules imported pyarrow.parquet as a side effect, which made the attribute available on the pyarrow package. When those modules are absent or the import order changes, the error surfaces.
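The behaviour described above can be reproduced without pyarrow. The sketch below is only an illustration: `demo_pkg` is a hypothetical throwaway package created in a temporary directory, not part of Paimon or pyarrow. It shows that a submodule becomes an attribute of its parent package only after something has imported it.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_submodule_access() -> tuple[bool, bool]:
    """Return (visible_before, visible_after) for a submodule attribute."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical package "demo_pkg" with an empty __init__.py and a
        # submodule named "parquet" (a stand-in for pyarrow.parquet).
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "parquet.py").write_text("VALUE = 1\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            parent = importlib.import_module("demo_pkg")
            # The parent is imported, but nothing has loaded the submodule yet.
            before = hasattr(parent, "parquet")
            # An explicit import binds the submodule onto the parent package.
            importlib.import_module("demo_pkg.parquet")
            after = hasattr(parent, "parquet")
        finally:
            # Undo the changes to interpreter state made for the demo.
            sys.path.remove(tmp)
            for name in ("demo_pkg.parquet", "demo_pkg"):
                sys.modules.pop(name, None)
        return before, after


if __name__ == "__main__":
    print(demo_submodule_access())
```

Any import of the submodule anywhere in the process flips the second value, which is why an unrelated module can hide the missing import.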

Tests

API and Format

Documentation

Generative AI tooling

No

XiaoHongbo-Hope · 2026-02-11T03:31:10Z

Thanks! Can we use a small case to reproduce the issue, and add it in CI to avoid such issues in future?

tonymtu · 2026-02-11T06:06:27Z

It may be hard to cover this with a reliable test. When the full test suite runs, imports in earlier tests can silently mask the issue.

A static analysis check (e.g. mypy) would be a proper way to catch this. However, introducing mypy into CI is a broader change that could surface many existing type issues across the codebase.

XiaoHongbo-Hope · 2026-02-11T06:16:20Z

Got your point. Can we use a subprocess to test the issue, or should we just introduce mypy into CI? We should try our best to catch these issues in CI rather than on the user side. Can you give us some suggestions? @JingsongLi
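One way the subprocess idea could look, as a minimal sketch rather than a proposal for the actual test layout: run a snippet in a fresh interpreter so that imports made by earlier tests cannot mask a missing submodule import. `run_in_fresh_interpreter` is a hypothetical helper name, and the snippets below are stand-ins for code that would import and exercise the module under test.

```python
import subprocess
import sys


def run_in_fresh_interpreter(snippet: str) -> subprocess.CompletedProcess:
    """Run `snippet` in a new Python process with a clean import state."""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=60,
    )


if __name__ == "__main__":
    # Stand-in for a failing case: attribute access on a module object
    # that has no such attribute raises AttributeError in the child.
    bad = run_in_fresh_interpreter(
        "import types; types.ModuleType('pkg').parquet"
    )
    print(bad.returncode, "AttributeError" in bad.stderr)

    # Stand-in for a passing case.
    ok = run_in_fresh_interpreter("import json; json.dumps({})")
    print(ok.returncode)
```

A CI test built this way would assert that the return code is zero for a snippet that imports only the module being checked and then calls the affected read/write path.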

tonymtu · 2026-02-11T06:54:00Z

Additionally, AI helped find other "optional modules" in pyarrow that need an explicit import across the codebase.

# refer to pyarrow/__init__.py#show_info()
print("\nOptional modules:")
modules = ["csv", "cuda", "dataset", "feather", "flight", "fs", "gandiva", "json",
           "orc", "parquet"]
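A search like the one described could also be scripted. The following is a rough, file-local heuristic built on the standard `ast` module, assuming the list of optional modules above; `find_unimported_submodules` is a hypothetical helper, not an existing tool. It reports optional submodules that a source file reaches through the package name (for example `pa.parquet`) without importing them in that same file.

```python
import ast

OPTIONAL_MODULES = {"csv", "cuda", "dataset", "feather", "flight", "fs",
                    "gandiva", "json", "orc", "parquet"}


def find_unimported_submodules(source: str, package: str = "pyarrow") -> set[str]:
    """Return optional submodules used as attributes but not imported here."""
    tree = ast.parse(source)
    aliases = set()   # local names bound to the top-level package
    imported = set()  # submodules this file imports explicitly
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == package:
                    aliases.add(alias.asname or package)
                elif alias.name.startswith(package + "."):
                    imported.add(alias.name.split(".")[1])
                    if alias.asname is None:
                        # "import pyarrow.parquet" also binds "pyarrow".
                        aliases.add(package)
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            if node.module == package:
                # "from pyarrow import parquet" loads the submodule too.
                imported.update(a.name for a in node.names)
            elif node.module and node.module.startswith(package + "."):
                imported.add(node.module.split(".")[1])
    used = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in aliases
                and node.attr in OPTIONAL_MODULES):
            used.add(node.attr)
    return used - imported


if __name__ == "__main__":
    sample = "import pyarrow as pa\npa.parquet.write_table(table, 'out.parquet')\n"
    print(find_unimported_submodules(sample))
```

Because it looks at one file at a time, the check deliberately ignores imports made elsewhere, which are exactly the side effects that hid the original error.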

JingsongLi · 2026-02-11T22:56:06Z

Hi @tonymtu @XiaoHongbo-Hope , maybe

I don't have any good suggestions. I'm not very familiar with Python engineering testing. Perhaps it's a problem with our CI testing?

tonymtu · 2026-02-12T05:12:44Z

Hi @JingsongLi @XiaoHongbo-Hope . I'd prefer to split this into two separate efforts:

This PR: fix the immediate bug (explicit import pyarrow.xxx).
A follow-up effort: enable mypy checking in CI to prevent this class of issues.

I noticed there's already a [mypy] section in dev/cfg.ini but it's not wired into lint-python.sh, presumably because enabling it across the full codebase would surface too many existing issues at once.

Suggestions on enabling [mypy] by Claude Opus:
flake8 (our current linter) cannot catch this: it is a style checker, not a type checker. Detecting unresolved submodule attribute access needs a static type checker such as mypy. A practical approach for the follow-up would be to run mypy only on the files changed in each PR (git diff --name-only), so we can enforce type checking incrementally without a massive one-time fix.

Are there any better ideas?
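The changed-files approach might be wired up roughly as follows. This is a sketch under stated assumptions: the base ref `origin/master`, the `paimon-python/` path prefix, and passing `dev/cfg.ini` via `--config-file` are illustrative guesses about the repository layout, and the function names are hypothetical.

```python
import subprocess


def changed_files(base: str = "origin/master") -> list[str]:
    """List files added, copied, modified or renamed relative to `base`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def select_python_files(paths: list[str], prefix: str = "paimon-python/") -> list[str]:
    """Keep only Python sources under the assumed Python module directory."""
    return [p for p in paths if p.startswith(prefix) and p.endswith(".py")]


def mypy_command(files: list[str], config: str = "dev/cfg.ini") -> list[str]:
    """Build the mypy invocation for the selected files."""
    return ["mypy", "--config-file", config, *files]


if __name__ == "__main__":
    files = select_python_files(changed_files())
    if files:
        # A non-zero exit code from mypy fails the lint step.
        raise SystemExit(subprocess.run(mypy_command(files)).returncode)
```

Keeping the file selection separate from the git and mypy calls makes the filtering logic easy to unit test without a repository checkout.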

JingsongLi · 2026-02-12T08:28:00Z

Sounds good to me. I will merge this PR.

fix pyarrow module missing attribute parquet

21a223b

tonymtu changed the title ~~[python] Fix pyarrow module missing attribute parquet~~ [python] Fix pyarrow module missing attribute Feb 11, 2026

fix pyarrow module missing attributes

b27051c

fix lint

367043a

tonymtu force-pushed the fix-pyarrow-missing-attribute-parquet branch from fe729c7 to 367043a Compare February 11, 2026 08:17

JingsongLi merged commit a1a75f5 into apache:master Feb 12, 2026
6 checks passed

JingsongLi merged 3 commits into apache:master from tonymtu:fix-pyarrow-missing-attribute-parquet