
@kennedynguyen1

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

zanmato1984 and others added 30 commits June 2, 2025 15:11
…ent Clang (#46509)

### Rationale for this change

A warning is introduced in Clang 19.1.0 and AppleClang 17.0.0 that won't be demoted by `-Wno-error`, causing opentelemetry-cpp build failure.

### What changes are included in this PR?

Upgrade opentelemetry-cpp (and opentelemetry-proto correspondingly) to the most recent version, which has addressed this issue in open-telemetry/opentelemetry-cpp#3133.

With this upgrade, we found several false positives in sanitizers. The reason seems to be that we build bundled third-party dependencies as static libraries and don't instrument their code, which is known to cause false positives as per https://github.com/google/sanitizers/wiki/threadsanitizercppmanual#non-instrumented-code . So this PR also adds some suppressions and disablements.

### Are these changes tested?

Manually built; the build passes.

### Are there any user-facing changes?

None.
* GitHub Issue: #46508

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Rossi Sun <zanmato1984@gmail.com>
### Rationale for this change

We want to avoid depending on external CI such as ursacomputing/crossbow with Crossbow as much as possible, for ease of maintenance.

### What changes are included in this PR?

Use `CI: Extra` label to run C++ tests on Alpine Linux in apache/arrow.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #46665

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

We want to migrate to pre-commit from `archery lint`.

### What changes are included in this PR?

Use pre-commit for styler against `r/`.

This also fixes styles of existing files.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #46645

Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…rapper (#46680)

### Rationale for this change

By using the WrapDB entry for gflags, we can use a Meson-native solution for wrapping that project without requiring CMake.

### What changes are included in this PR?

Switched to using the WrapDB entry for gflags

### Are these changes tested?

Yes

### Are there any user-facing changes?

No
* GitHub Issue: #46679

Authored-by: Will Ayd <william.ayd@icloud.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

This fixes an issue with the Meson configuration on Windows

### What changes are included in this PR?

Fix up incorrect variable name usage

### Are these changes tested?

Yes

### Are there any user-facing changes?

No

* GitHub Issue: #46684

Authored-by: Will Ayd <william.ayd@icloud.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### What changes are included in this PR?

The `<ciso646>` inclusion is deprecated in C++20 and doesn't actually serve a purpose, so remove it.
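For context, here is a minimal sketch. The function name `ExactlyOne` is hypothetical. In standard C++, `and`, `or`, and `not` are built-in alternative tokens for `&&`, `||`, and `!`, so code like the following compiles without including `<ciso646>`:

```cpp
#include <cassert>

// No #include <ciso646> needed: `and`, `or`, and `not` are alternative
// tokens built into the C++ language itself.
bool ExactlyOne(bool a, bool b) {
  return (a or b) and not (a and b);
}
```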

### Are these changes tested?

Compilation passes on CI.

### Are there any user-facing changes?

No.

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change

The C++ implementation supports a filter while performing a hash join, but it wasn't exposed to Python. It's good to have this, so that users can avoid adding an explicit filter operation on their side.

### What changes are included in this PR?

Support filter expression in python binding.

### Are these changes tested?

Yes, added new test `test_hash_join_with_filter`.

### Are there any user-facing changes?

This exposes one more argument to users, i.e., `filter_expression` for `Table.join` and `Dataset.join`.

* GitHub Issue: #46572

Lead-authored-by: Xingyu Long <xingyulong97@gmail.com>
Co-authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: AlenkaF <frim.alenka@gmail.com>
### Rationale for this change

Arrow C++ slices arrays by bumping the top-level `offset` value.
However, Arrow Rust slices list arrays by slicing the `value_offsets`
buffer. When receiving a Rust Arrow Array in C++ (via the C data
interface), its IPC serialization fails to notice that the
`value_offsets` buffer needed to be updated, but it still updates the
`values` buffer. This leads to a corrupt array on deserialization, with
a `value_offsets` buffer that points past the end of the values array.

This PR fixes the IPC serialization by also looking at `value_offset(0)` to
determine whether the `value_offsets` buffer needs reconstructing,
instead of only looking at `offset()`.
This works because `value_offset(int)` is the offsets buffer, shifted by the top-level offset.
We still need to check `offset()` to account for arrays starting with an empty list (multiple
zeroes at the start of the offsets buffer).
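The check described above can be modeled with plain vectors. This is a minimal sketch and not Arrow's IPC writer. `ListArrayModel`, `NeedsOffsetRebase`, and `RebaseOffsets` are hypothetical names, and the model assumes 32-bit list offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a list array received via the C data interface:
// `offsets` is the (possibly already sliced) offsets buffer and `offset`
// is the top-level array offset.
struct ListArrayModel {
  std::vector<int32_t> offsets;
  int64_t offset = 0;
  int64_t length = 0;

  // Mirrors value_offset(i): the offsets buffer shifted by the top-level offset.
  int32_t value_offset(int64_t i) const { return offsets[offset + i]; }
};

// Rebase when the top-level offset is non-zero (C++-style slicing, including
// slices that start with empty lists) OR when the first visible offset is
// non-zero (Rust-style slicing of the offsets buffer itself).
bool NeedsOffsetRebase(const ListArrayModel& a) {
  return a.offset != 0 || a.value_offset(0) != 0;
}

// Rebased offsets start at zero; the values would then be sliced starting
// at value_offset(0).
std::vector<int32_t> RebaseOffsets(const ListArrayModel& a) {
  std::vector<int32_t> out(a.length + 1);
  const int32_t start = a.value_offset(0);
  for (int64_t i = 0; i <= a.length; ++i) out[i] = a.value_offset(i) - start;
  return out;
}
```

The Rust-style case (`offsets = {2, 5, 9}`, `offset = 0`) is the one that a check on `offset()` alone misses.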

### What changes are included in this PR?

The fix and nothing else

### Are these changes tested?

Yes

### Are there any user-facing changes?

No (well, unless they are affected by the bug)

**This PR contains a "Critical Fix".** (the changes fix (b) a bug that caused incorrect or invalid data to be produced) : valid operations on valid data produce invalid data.

* GitHub Issue: #46407

Lead-authored-by: Bruno Cauet <brunocauet@gmail.com>
Co-authored-by: Bruno Cauet <bruno.cauet@qube-rt.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…19 to windows-2022 (#46694)

### Rationale for this change

GitHub is deprecating windows-2019 hosted runners, see:
- actions/runner-images#12045

### What changes are included in this PR?

Update our images to use windows-2022 and Visual Studio 17 2022 (where necessary)

### Are these changes tested?

Yes via CI.

### Are there any user-facing changes?

No

* GitHub Issue: #46693

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

This PR fixes the bug introduced in #46527.

If the environment variable `INSTALL_ARGS` isn't set,
the R command should execute
`CMD INSTALL arrow*tar.gz`.

After the #46527 change, it executed
`CMD INSTALL '' arrow*tar.gz`.

### What changes are included in this PR?

Split `INSTALL_ARGS` to `R_INSTALL_ARGS` as array and use `"${R_INSTALL_ARGS[@]}"`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #46673

Lead-authored-by: Hiroyuki Sato <hiroysato@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

There is a typo.

FYI: This is not critical because this typo exists only in code that is executed in an error case.

### What changes are included in this PR?

Fix a typo.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #46688

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…l.h (#46695)

### Rationale for this change

Addresses #46180 (comment). One of three PRs to resolve #46439.

### What changes are included in this PR?

- Remove unneeded namespace prefix in test_util_internal.h.

### Are these changes tested?

Yes. Impacted tests still pass.

### Are there any user-facing changes?

No.
* GitHub Issue: #46439

Authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…6697)

### Rationale for this change

In #45908 we moved the Converter class in from_string.cc to the internal namespace to avoid a symbol clash with the Converter class defined in arrow/util. It's better to keep the class in an anonymous namespace since it's internal to the file. This reverts the previous change and just renames the class.

### What changes are included in this PR?

- Removed namespace, use anonymous namespace instead like before
- Renamed from Converter to JSONConverter
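Here is a small sketch of why an anonymous namespace avoids the symbol clash. The class body and `ParseIntForTest` are hypothetical; only the namespace layout reflects the change. Everything inside `namespace { ... }` has internal linkage, so it cannot collide with another translation unit's class of the same name.

```cpp
#include <cassert>
#include <string>

namespace arrow {
namespace {  // anonymous namespace: internal linkage, private to this file

// Hypothetical stand-in for the file-local JSON helper. The distinct name
// also avoids shadowing confusion with arrow's other Converter classes.
class JSONConverter {
 public:
  int Convert(const std::string& text) const { return std::stoi(text); }
};

}  // namespace

// Hypothetical file-level entry point that uses the helper.
int ParseIntForTest(const std::string& text) { return JSONConverter{}.Convert(text); }

}  // namespace arrow
```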

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No. These changes are internal-only.
* GitHub Issue: #46439

Authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…46618)

### Rationale for this change

When we added Float16 we did not update pyarrow to be able to convert from Python objects to Arrow. Float16 required numpy and it crashed if numpy was not present.

### What changes are included in this PR?

Allow generating float16 scalars and arrays in pyarrow without numpy, and do not fail if numpy is not present.

### Are these changes tested?

Yes, new tests have been added

### Are there any user-facing changes?

No changes to existing functionality. Users can now use float16 directly from Python objects, without needing `np.float16`.

* GitHub Issue: #46611

Lead-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
### Rationale for this change

We moved `js/` in apache/arrow to apache/arrow-js.

### What changes are included in this PR?

Remove `js/` in apache/arrow.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #46702

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
…#46653)

### Rationale for this change

The docstrings for row_group_size could be clearer both in terms of (1) whether the value is rows instead of byte size and (2) use of unit prefixes. See #46652. 

My idea here was that just saying "64 * 1024 * 1024" is probably more easily understood than using Mi (mebi). The existing text may be just fine so I'm happy to close this if others like how it reads now.

### What changes are included in this PR?

- Updated language in docstrings for row_group_size
- Add missing `, default None` to docstring for top-level `write_table`

### Are these changes tested?

No.

### Are there any user-facing changes?

No.
* GitHub Issue: #46652

Authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: AlenkaF <frim.alenka@gmail.com>
…argeList directly (#46678)

### Rationale for this change

When reading a Parquet LIST logical type (or a repeated field without a logical type), Parquet C++ automatically reads it as an Arrow List array.

However, this can in some cases run into the 32-bit offsets limit. We'd like to be able to choose to read as LargeList instead, even if there is no serialized Arrow schema in the Parquet file.

### What changes are included in this PR?

* Add a Parquet read option `list_type` to select which Arrow type to read LIST / repeated Parquet columns into
* Fix an index truncation bug when writing a huge single chunk of data to Parquet
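The 32-bit limit mentioned above comes from List using signed 32-bit offsets, so the total number of child values must stay at or below `INT32_MAX`. LargeList uses 64-bit offsets. This is a minimal sketch with hypothetical helper names, which are not part of the Parquet C++ API. It shows the limit check and a checked narrowing, which is one common way to avoid index truncation. It is not necessarily the form of the fix in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Arrow List offsets are int32, so a List array can reference at most
// INT32_MAX child values; LargeList (int64 offsets) has no such limit in practice.
bool FitsInListOffsets(int64_t total_child_values) {
  return total_child_values <= std::numeric_limits<int32_t>::max();
}

// A plain static_cast from an out-of-range int64_t does not fail: it wraps
// modulo 2^32 (guaranteed since C++20, implementation-defined before).
// Checking first surfaces the problem instead of silently truncating.
bool CheckedNarrow(int64_t value, int32_t* out) {
  if (value < std::numeric_limits<int32_t>::min() ||
      value > std::numeric_limits<int32_t>::max()) {
    return false;
  }
  *out = static_cast<int32_t>(value);
  return true;
}
```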

### Are these changes tested?

Yes, the functionality is tested. However, I wasn't able to write a unit test that wouldn't consume a horrendous amount of time or memory writing/reading a list with offsets larger than 2**32.

### Are there any user-facing changes?

No, only an API improvement.

* GitHub Issue: #46676

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…g view storage (#46660)

### Rationale for this change

Extension arrays exported with binary view/string view storage did not export the variadic sizes buffer which resulted in crashes when reimporting.

### What changes are included in this PR?

The expression that controlled whether the variadic sizes buffer was written was updated.
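In the Arrow C data interface, a binary view or string view array exports its validity buffer, its views buffer, each variadic data buffer, and then one trailing buffer of int64 data-buffer sizes. The sketch below shows one plausible shape of this bug class: deciding based on the outer type instead of the extension's storage type. The type tags and function names are hypothetical, and the real expression changed in this PR may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified type tags. An extension type wraps a storage
// type, and only the storage type determines the physical buffer layout.
enum class TypeId { kInt32, kBinaryView, kStringView, kExtension };

struct TypeDesc {
  TypeId id;
  TypeId storage_id;  // only meaningful when id == TypeId::kExtension
};

TypeId PhysicalTypeId(const TypeDesc& type) {
  return type.id == TypeId::kExtension ? type.storage_id : type.id;
}

// Buggy shape: inspects the outer type id, so an extension type with
// view storage skips the variadic sizes buffer.
bool NeedsVariadicSizesNaive(const TypeDesc& type) {
  return type.id == TypeId::kBinaryView || type.id == TypeId::kStringView;
}

// Fixed shape: looks through the extension type to its storage type.
bool NeedsVariadicSizes(const TypeDesc& type) {
  const TypeId physical = PhysicalTypeId(type);
  return physical == TypeId::kBinaryView || physical == TypeId::kStringView;
}

// validity + views + each data buffer + one int64 buffer of data buffer sizes.
int64_t ViewArrayBufferCount(int64_t num_data_buffers) {
  return 2 + num_data_buffers + 1;
}
```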

### Are these changes tested?

Yes, a test was added

### Are there any user-facing changes?

No
* GitHub Issue: #46659

Lead-authored-by: Dewey Dunnington <dewey@wherobots.com>
Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Dewey Dunnington <dewey@wherobots.com>
…6696)

### Rationale for this change

#45908 brought these helpers into the public API but didn't consider changes to their API. This PR makes all the helpers use the standard Result-pattern to make them more ergonomic. We can do this now without a breaking change because this and #45908 will be part of Arrow 21.

### What changes are included in this PR?

- Refactored all FromJSONString helpers to use the Result pattern (instead of using outparams)
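To illustrate the difference between out-parameters and the Result pattern, here is a tiny stand-alone sketch. `Result`, `Error`, and `ParseDigits` are hypothetical stand-ins. They are not `arrow::Result` (which carries an `arrow::Status`) or the actual `FromJSONString` signatures.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error payload; arrow::Result carries an arrow::Status instead.
struct Error {
  std::string message;
};

// Minimal Result<T>: holds either a value or an Error.
template <typename T>
class Result {
 public:
  Result(T value) : data_(std::move(value)) {}      // success
  Result(Error error) : data_(std::move(error)) {}  // failure
  bool ok() const { return std::holds_alternative<T>(data_); }
  const T& ValueOrDie() const { return std::get<T>(data_); }  // throws on error
  const std::string& error() const { return std::get<Error>(data_).message; }

 private:
  std::variant<T, Error> data_;
};

// Out-parameter style would look like:
//   bool ParseDigits(const std::string& text, int* out);
// Result style returns either the value or the error in one object:
Result<int> ParseDigits(const std::string& text) {
  if (text.empty()) return Error{"empty input"};
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') return Error{"not a digit"};
    value = value * 10 + (c - '0');  // no overflow check in this sketch
  }
  return value;
}
```

With the Result style, the caller no longer declares an output variable up front. Arrow code typically unwraps such results with `ARROW_ASSIGN_OR_RAISE`.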

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #46439

Lead-authored-by: Bryce Mecum <petridish@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…sh (#46700)

### Rationale for this change

`ci/scripts/cpp_test.sh` violates two shellcheck rules.

* SC2071: `< is for string comparisons. Use -lt instead.`
* SC2086: `Double quote to prevent globbing and word splitting.`

```
./ci/scripts/cpp_test.sh

In ./ci/scripts/cpp_test.sh line 22:
if [[ $# < 2 ]]; then
         ^-- SC2071 (error): < is for string comparisons. Use -lt instead.

In ./ci/scripts/cpp_test.sh line 87:
pushd ${build_dir}
      ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
pushd "${build_dir}"

In ./ci/scripts/cpp_test.sh line 103:
    --parallel ${n_jobs} \
               ^-------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
    --parallel "${n_jobs}" \

In ./ci/scripts/cpp_test.sh line 105:
    --timeout ${ARROW_CTEST_TIMEOUT:-300} \
              ^-------------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
    --timeout "${ARROW_CTEST_TIMEOUT:-300}" \

In ./ci/scripts/cpp_test.sh line 111:
    examples=$(find ${binary_output_dir} -executable -name "*example")
                    ^------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
    examples=$(find "${binary_output_dir}" -executable -name "*example")

In ./ci/scripts/cpp_test.sh line 129:
    ${binary_output_dir}/arrow-ipc-stream-fuzz ${ARROW_TEST_DATA}/arrow-ipc-stream/crash-*
    ^------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                               ^----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
    "${binary_output_dir}"/arrow-ipc-stream-fuzz "${ARROW_TEST_DATA}"/arrow-ipc-stream/crash-*

In ./ci/scripts/cpp_test.sh line 130:
    ${binary_output_dir}/arrow-ipc-stream-fuzz ${ARROW_TEST_DATA}/arrow-ipc-stream/*-testcase-*
    ^------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                               ^----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
    "${binary_output_dir}"/arrow-ipc-stream-fuzz "${ARROW_TEST_DATA}"/arrow-ipc-stream/*-testcase-*

In ./ci/scripts/cpp_test.sh line 131:
    ${binary_output_dir}/arrow-ipc-file-fuzz ${ARROW_TEST_DATA}/arrow-ipc-file/*-testcase-*
    ^------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                             ^----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
    "${binary_output_dir}"/arrow-ipc-file-fuzz "${ARROW_TEST_DATA}"/arrow-ipc-file/*-testcase-*

In ./ci/scripts/cpp_test.sh line 132:
    ${binary_output_dir}/arrow-ipc-tensor-stream-fuzz ${ARROW_TEST_DATA}/arrow-ipc-tensor-stream/*-testcase-*
    ^------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                                      ^----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
    "${binary_output_dir}"/arrow-ipc-tensor-stream-fuzz "${ARROW_TEST_DATA}"/arrow-ipc-tensor-stream/*-testcase-*

In ./ci/scripts/cpp_test.sh line 134:
      ${binary_output_dir}/parquet-arrow-fuzz ${ARROW_TEST_DATA}/parquet/fuzzing/*-testcase-*
      ^------------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                              ^----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
      "${binary_output_dir}"/parquet-arrow-fuzz "${ARROW_TEST_DATA}"/parquet/fuzzing/*-testcase-*

For more information:
  https://www.shellcheck.net/wiki/SC2071 -- < is for string comparisons. Use ...
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
```

### What changes are included in this PR?

* Use `-lt` instead of `<`
* Quote variables.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #46699

Authored-by: Hiroyuki Sato <hiroysato@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
… to match newest auditwheel naming (#46705)

### Rationale for this change

The new version of auditwheel has added improvements to detect libc / platform on the wheels:
- pypa/auditwheel#548

This has changed the ordering of the platform tags for some of the generated wheels: our manylinux_2014 / libc 2.17 wheels, but only for Python 3.13 amd64 and 3.13t arm64. The rest still use the old order.

### What changes are included in this PR?

Force the newest version and update to new order of platform tags.

### Are these changes tested?

Via archery.

### Are there any user-facing changes?

No

* GitHub Issue: #46691

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

Asks users to specify language when using the kapa.ai bot

### What changes are included in this PR?

Update user instructions

### Are these changes tested?

Nah, but I'll build the docs here to take a look

### Are there any user-facing changes?

Yeah, the instructions

Authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
…nment in `case_when()` (#46667)

### Rationale for this change

When a script is called in an environment that isn't the global environment (for instance with `source("my-script.R", local = new.env())`), `case_when()` would fail to detect external objects used in conditions.

This PR fixes this behavior.

Fixes #46636 

### What changes are included in this PR?

When evaluating expressions in `dplyr` functions, `eval_tidy()` now takes into account `mask` as an environment where it should look for external objects.

@ thisisnic suggested in #46636 that the bug might be due to https://github.com/apache/arrow/blob/main/r/R/dplyr-funcs-conditional.R#L116 but I couldn't find a way to fix it there.

### Are these changes tested?

I added a test for this scenario. I ensured it failed before the change and succeeds after.

### Are there any user-facing changes?

There is one user-facing, non-breaking change, illustrated both in the related issue and in the new test.

* GitHub Issue: #46636

Authored-by: etiennebacher <etienne.bacher@protonmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
…tion (#46722)

### Rationale for this change

pkgdown generation was failing due to a function not being included in the list of functions to document

### What changes are included in this PR?

Update roxygen header to not generate that function or trigger the pkgdown check

### Are these changes tested?

Nah, but I'll trigger CI to check

### Are there any user-facing changes?

No
* GitHub Issue: #46717

Authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…hReader (#46731)

### Rationale for this change

Our docs say you can construct a Dataset from a RecordBatchReader but you can't. While we can't pass the actual RecordBatchReader to the Dataset as a source (AFAIK), we can at least consume the reader immediately and create an InMemoryDataset from the batches.

### What changes are included in this PR?

- Tweaked type checks so this now works (both from ds.dataset and ds.InMemoryDataset)
- Test case extended to cover the new behavior
- Tweaked error message just to use proper case

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #46729

Authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: Bryce Mecum <petridish@gmail.com>
### Rationale for this change

Continues building out support for Meson as a build system generator

### What changes are included in this PR?

This adds the flight directory to the Meson configuration

### Are these changes tested?

Locally

### Are there any user-facing changes?

No

* GitHub Issue: #46141

Authored-by: Will Ayd <william.ayd@icloud.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…::ArrayStatistics::Equals() (#46422)

### Rationale for this change

`arrow::ArrayStatistics::Equals()` does not handle double values for `ArrayStatistics::ValueType` correctly.

### What changes are included in this PR?

- Add `arrow::EqualOptions` to `arrow::ArrayStatistics::Equals()`
- Add `arrow::ArrayStatisticsEquals()`
- Add `EqualOptions::use_atol_`
- Add `EqualOptions::use_atol()`
- Add `EqualOptions::use_atol(bool v)`

### Are these changes tested?

Yes, I ran the relevant unit tests.

### Are there any user-facing changes?

Yes.

- Add `arrow::ArrayStatisticsEquals()`
- Add `EqualOptions::use_atol()`
- Add `EqualOptions::use_atol(bool v)`
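The idea behind the `use_atol` option is that exact types compare exactly, while doubles can optionally compare within an absolute tolerance. This is a minimal sketch of that idea. `StatValue`, `EqualOptionsSketch`, and `StatValueEquals` are hypothetical names, and Arrow's real `EqualOptions` and `ArrayStatistics::ValueType` may differ in their details.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical mirror of a statistics value type.
using StatValue = std::variant<bool, int64_t, uint64_t, double, std::string>;

// Hypothetical options: tolerance comparison is opt-in.
struct EqualOptionsSketch {
  bool use_atol = false;
  double atol = 1e-5;
};

bool StatValueEquals(const StatValue& a, const StatValue& b,
                     const EqualOptionsSketch& options = {}) {
  if (a.index() != b.index()) return false;  // different alternatives never match
  if (const double* x = std::get_if<double>(&a)) {
    const double y = std::get<double>(b);
    if (options.use_atol) return std::fabs(*x - y) <= options.atol;
    return *x == y;  // exact comparison by default
  }
  return a == b;  // non-double alternatives compare exactly
}
```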

* GitHub Issue: #46395

Authored-by: Arash Andishgar <arashandishgar1@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

Historically, we've been lax about selecting which APIs are public. A lot of internal APIs are exposed publicly.

### What changes are included in this PR?

Make some headers in `arrow/util` internal. They won't be installed and so won't be available for third-party usage.

Note that this represents a subset of all internal APIs in `arrow/util`, as some of them are included in other public headers.

### Are these changes tested?

Yes, by existing CI configurations.

### Are there any user-facing changes?

Unless the user was relying on internal APIs, there should not be any change.
* GitHub Issue: #46459

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change
Arrow deals with secrets like encryption / decryption keys, which must be kept private. One way of leaking such secrets is through memory allocation: another process may allocate memory that previously held the secret, because that memory was not cleared before being freed.

### What changes are included in this PR?
Uses various implementations of securely clearing memory, notably:
- `SecureZeroMemory` (Windows)
- `memset_s` (STDC)
- `OPENSSL_cleanse` (OpenSSL >= 3)
- `explicit_bzero` (glibc 2.25+)
- volatile `memset` (fallback).
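The volatile fallback from that list can be sketched as follows. This is a minimal illustration, not Arrow's `SecureString`. `SecureZeroFallback` and `SecretBuffer` are hypothetical names. Platform primitives such as `explicit_bzero` are still preferable where they are available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Writes through a volatile pointer are observable side effects, so the
// compiler must perform them, unlike a plain memset on memory that is
// about to be freed, which it may elide as a dead store.
void SecureZeroFallback(void* data, std::size_t size) {
  volatile unsigned char* p = static_cast<volatile unsigned char*>(data);
  while (size--) *p++ = 0;
}

// Hypothetical fixed-size secret holder that wipes itself on destruction.
class SecretBuffer {
 public:
  explicit SecretBuffer(const char* secret) {
    std::strncpy(buf_, secret, sizeof(buf_) - 1);  // last byte stays '\0'
  }
  ~SecretBuffer() { SecureZeroFallback(buf_, sizeof(buf_)); }
  SecretBuffer(const SecretBuffer&) = delete;             // no stray copies
  SecretBuffer& operator=(const SecretBuffer&) = delete;
  const char* data() const { return buf_; }

 private:
  char buf_[64] = {};
};
```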

### Are these changes tested?
Unit tests.

### Are there any user-facing changes?
This only adds the `SecureString` class and tests. Using this new infrastructure is done in follow-up pull requests.

* GitHub Issue: #31603

Lead-authored-by: Enrico Minack <github@enrico.minack.dev>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change

PR #46408 included a typo that changed list-view IPC tests to use the same data as list tests. This was detected as a duplicate corpus file by the OSS-Fuzz CI build.

### What changes are included in this PR?

Undo mistake that led to using the same test data for lists and list-views. Also fix a regression in the CUDA tests, due to reading non-CPU memory when fetching the first offset in a list/binary array.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: #46704

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
sgilmore10 and others added 10 commits August 7, 2025 09:51
…ay` class (#47264)

### Rationale for this change

This is a followup to #38422. Now that `NumNulls` is a property on `arrow.array.Array`, we should add `NumNulls` as a property on `arrow.array.ChunkedArray`.

### What changes are included in this PR?

Added `NumNulls` as a property to `arrow.array.ChunkedArray`.

**Example**:
```matlab
>> a1 = arrow.array(1:10);
>> a2 = arrow.array([11 12 NaN 14 NaN]);
>> a3 = arrow.array([16 17 NaN 18 19]);
>> a4 = arrow.array(20:30);

>> C1 = arrow.array.ChunkedArray.fromArrays(a1, a2, a3, a4)

C1 = 

  ChunkedArray with properties:

           Type: [1×1 arrow.type.Float64Type]
      NumChunks: 4
    NumElements: 31
       NumNulls: 3

>> C1.NumNulls

ans =

  int64

   3
```
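The displayed values follow from per-chunk sums. The chunks have 10, 5, 5, and 11 elements, and 0, 2, 1, and 0 nulls (the `NaN` values), which gives 31 elements and 3 nulls. This is a language-neutral sketch in C++ with hypothetical names. It is not the MATLAB implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-chunk summary.
struct ChunkInfo {
  int64_t length;
  int64_t null_count;
};

// A chunked array's element count and null count are sums over its chunks.
int64_t TotalLength(const std::vector<ChunkInfo>& chunks) {
  int64_t total = 0;
  for (const ChunkInfo& chunk : chunks) total += chunk.length;
  return total;
}

int64_t TotalNulls(const std::vector<ChunkInfo>& chunks) {
  int64_t total = 0;
  for (const ChunkInfo& chunk : chunks) total += chunk.null_count;
  return total;
}
```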

### Are these changes tested?

1. Added a `NumNullsNoSetter` test case to `tChunkedArray.m`.
2. Updated `tChunkedArray/verifyChunkedArray` helper method to verify the `NumNulls` property value is set to the expected value.

### Are there any user-facing changes?

Yes. `arrow.array.ChunkedArray` has a new public property called `NumNulls`.

* GitHub Issue: #47263

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

We can't use `Time` to refer to `::Time` inside the `Arrow` namespace because `Arrow::Time` exists.

### What changes are included in this PR?

Use `::Time` to refer to the top-level `Time`.
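The change itself is in Ruby, but C++ name lookup follows the same rule, so here is a C++ analog for illustration. Both `Time` classes and the helper functions are hypothetical. Inside `Arrow`, an unqualified `Time` finds `Arrow::Time`, and the leading `::` is needed to reach the top-level class.

```cpp
#include <cassert>
#include <string>

// Top-level class, analogous to Ruby's ::Time.
struct Time {
  std::string Origin() const { return "top-level"; }
};

namespace Arrow {

// Same-named class inside the namespace, analogous to Arrow::Time.
struct Time {
  std::string Origin() const { return "Arrow::Time"; }
};

// Unqualified lookup stops at the innermost scope that declares `Time`.
inline std::string UnqualifiedOrigin() { return Time{}.Origin(); }
// A leading `::` starts lookup at the global scope.
inline std::string GlobalOrigin() { return ::Time{}.Origin(); }

}  // namespace Arrow
```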

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #47265

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

R 4.0 jobs are failing because purrr 1.1.0 requires R 4.1 or later

### What changes are included in this PR?

Drop 4.0 support

### Are these changes tested?

Will fire off the CI

### Are there any user-facing changes?

Only if you're on R 4.0
* GitHub Issue: #47096

Authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
### Rationale for this change

We want to add support for C++23.

### What changes are included in this PR?

Add a CI job for C++23 as an extra job. Its failure is ignored for now because we aren't C++23-ready yet.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47208

Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

Typo.

### What changes are included in this PR?

`if` -> `is`

### Are these changes tested?

No need.

### Are there any user-facing changes?

None.

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change
Please see Issue #45382 

### What changes are included in this PR?
Add support for pandas' attributes in metadata when writing to or reading from .parquet

### Are these changes tested?
Yes, though the current implementation depends on pandas, which has similar functionality

### Are there any user-facing changes?
Pandas will no longer need to work around the metadata handling on their side

* GitHub Issue: #45382

Authored-by: Bogdan Romenskii <rmnsk@seznam.cz>
Signed-off-by: Rok Mihevc <rok@mihevc.org>
…lity with old compiler (#47299)

### Rationale for this change

The `r-binary-packages` job (specifically [C++ Binary Linux OpenSSL 1.0](https://github.com/ursacomputing/crossbow/actions/runs/16314662786/job/46077638701#logs) and 1.1) has been failing since 15th July with this error:

```
/arrow/cpp/src/parquet/encryption/encryption.h:175:62: error: use of deleted function 'arrow::util::SecureString::SecureString()'
         : column_path_(std::move(path)), encrypted_(encrypted) {}
                                                              ^
In file included from /arrow/cpp/src/parquet/encryption/encryption.h:26,
                 from /arrow/cpp/src/parquet/properties.h:30,
                 from /arrow/cpp/src/parquet/arrow/path_internal.cc:109:
```

### What changes are included in this PR?

Remove the unconditional `noexcept` from `SecureString`’s default constructor so its exception specification now matches `std::string`’s on all libstdc++ versions.
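As we understand it, older compilers following the pre-C++20 rules treat an explicitly defaulted constructor as deleted if its `noexcept` does not match the implicitly deduced specification. That is consistent with the "use of deleted function" error above. In this minimal sketch, the class name `SecureStringSketch` is hypothetical. Leaving out `noexcept` lets the specification follow the members, so it always agrees with `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

class SecureStringSketch {
 public:
  // No explicit `noexcept`: the defaulted constructor's exception
  // specification is deduced from the members (here, std::string).
  SecureStringSketch() = default;
  explicit SecureStringSketch(std::string s) : secret_(std::move(s)) {}
  std::size_t size() const { return secret_.size(); }

 private:
  std::string secret_;
};

// Holds on any standard library, whether or not std::string's default
// constructor is noexcept there.
static_assert(std::is_nothrow_default_constructible<SecureStringSketch>::value ==
                  std::is_nothrow_default_constructible<std::string>::value,
              "exception specification follows std::string");
```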

### Are these changes tested?

No but I'll run the failing CI job

### Are there any user-facing changes?

No
* GitHub Issue: #47277

Authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
…ild_emscripten.sh (#47290)

### Rationale for this change

* SC1090: Can't follow non-constant source. Use a directive to specify location
* SC2086: Double quote to prevent globbing and word splitting

```
In ci/scripts/python_build_emscripten.sh line 26:
source ~/emsdk/emsdk_env.sh
       ^------------------^ SC1090 (warning): ShellCheck can't follow non-constant source. Use a directive to specify location.

In ci/scripts/python_build_emscripten.sh line 31:
rm -rf ${python_build_dir}
       ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
rm -rf "${python_build_dir}"

In ci/scripts/python_build_emscripten.sh line 32:
cp -aL ${source_dir} ${python_build_dir}
       ^-----------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                     ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
cp -aL "${source_dir}" "${python_build_dir}"

In ci/scripts/python_build_emscripten.sh line 38:
pushd ${python_build_dir}
      ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
pushd "${python_build_dir}"

For more information:
  https://www.shellcheck.net/wiki/SC1090 -- ShellCheck can't follow non-const...
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
```  

### What changes are included in this PR?

* SC1090: disable source file check.
* SC2086: Quote variables.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47289

Authored-by: Hiroyuki Sato <hiroysato@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

We may need to build bundled Apache Thrift. It requires CMake 3.26 or later.

### What changes are included in this PR?

Require CMake 3.26 or later.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #47213

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

Both of

https://github.com/apache/arrow/blob/97c9bfcdf8a9b864414fb5457a1c3f7a5747a3f1/cpp/src/arrow/CMakeLists.txt#L845-L866

and

https://github.com/apache/arrow/blob/97c9bfcdf8a9b864414fb5457a1c3f7a5747a3f1/cpp/src/arrow/compute/CMakeLists.txt#L22-L25

install `arrow-compute.pc`.

We don't need to install `arrow-compute.pc` multiple times.

### What changes are included in this PR?

Remove `arrow_add_pkg_config("arrow-compute")`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47303

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
BwL1289 pushed a commit that referenced this pull request Aug 11, 2025
…n timezone (apache#45051)

### Rationale for this change

If the timezone database is present on the system, but does not contain a timezone referenced in an ORC file, the ORC reader will crash with an uncaught C++ exception.

This can happen, for example, on Ubuntu 24.04, where some timezone aliases have been moved from the main `tzdata` package to a `tzdata-legacy` package. If `tzdata-legacy` is not installed, trying to read an ORC file that references e.g. the "US/Pacific" timezone would crash.
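One general way to turn such a crash into a reportable error is to catch the exception at the library boundary. This is a minimal sketch under stated assumptions. `LoadTimezoneOffsetHours` is a hypothetical loader with a tiny built-in table (fixed offsets, DST ignored), and `TryLoadTimezone` is a hypothetical wrapper. Neither is the ORC or Arrow API, and this is not necessarily how the fix is implemented.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical loader: throws for names missing from the database, as a
// zone lookup might when tzdata-legacy is not installed.
int LoadTimezoneOffsetHours(const std::string& name) {
  static const std::map<std::string, int> kZones = {
      {"UTC", 0}, {"America/Los_Angeles", -8}};  // fixed offsets, DST ignored
  auto it = kZones.find(name);
  if (it == kZones.end()) throw std::runtime_error("unknown timezone: " + name);
  return it->second;
}

// Catch at the boundary so a missing zone becomes an error value instead of
// an exception escaping to std::terminate().
std::optional<int> TryLoadTimezone(const std::string& name, std::string* error) {
  try {
    return LoadTimezoneOffsetHours(name);
  } catch (const std::exception& e) {
    *error = e.what();
    return std::nullopt;
  }
}
```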

Here is a backtrace excerpt:
```
#12 0x00007f1a3ce23a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1a3ce39391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f1a3f4accc4 in orc::loadTZDB(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#15 0x00007f1a3f4ad392 in std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#16 0x00007f1a4298bec3 in __pthread_once_slow (once_control=0xa5ca7c8, init_routine=0x7f1a3ce69420 <__once_proxy>) at ./nptl/pthread_once.c:116
#17 0x00007f1a3f4a9ad0 in orc::LazyTimezone::getEpoch() const ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#18 0x00007f1a3f4e76b1 in orc::TimestampColumnReader::TimestampColumnReader(orc::Type const&, orc::StripeStreams&, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#19 0x00007f1a3f4e84ad in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#20 0x00007f1a3f4e8dd7 in orc::StructColumnReader::StructColumnReader(orc::Type const&, orc::StripeStreams&, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#21 0x00007f1a3f4e8532 in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#22 0x00007f1a3f4925e9 in orc::RowReaderImpl::startNextStripe() ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#23 0x00007f1a3f492c9d in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#24 0x00007f1a3e6b251f in arrow::adapters::orc::ORCFileReader::Impl::ReadBatch(orc::RowReaderOptions const&, std::shared_ptr<arrow::Schema> const&, long) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
```

### What changes are included in this PR?

Catch C++ exceptions when iterating ORC batches instead of letting them slip through.
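A minimal sketch of the general pattern, assuming a hypothetical `Status` struct and hypothetical functions `CatchingExceptions` and `ReadBatchThatThrows`. It is loosely modeled on Arrow's status-based error reporting but is not the actual ORC adapter code: any exception thrown while reading is caught at the boundary and turned into an error value.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal status type (NOT arrow::Status), used only to show
// how a thrown exception can be turned into an error value.
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Error(std::string msg) { return {false, std::move(msg)}; }
};

// Runs `fn` (e.g. one step of iterating ORC batches) and converts any C++
// exception it throws into an error Status instead of letting it escape
// uncaught.
template <typename Fn>
Status CatchingExceptions(Fn&& fn) {
  try {
    std::forward<Fn>(fn)();
    return Status::OK();
  } catch (const std::exception& e) {
    // Keep the original message so the caller sees what went wrong.
    return Status::Error(std::string("ORC error: ") + e.what());
  } catch (...) {
    // Fallback for exceptions not derived from std::exception.
    return Status::Error("ORC error: unknown exception");
  }
}

// Hypothetical stand-in for a reader step that throws because the timezone
// database lacks an alias such as "US/Pacific". The message is illustrative,
// not the text ORC actually produces.
void ReadBatchThatThrows() {
  throw std::runtime_error("timezone US/Pacific not found (illustrative)");
}
```

With this shape, `CatchingExceptions(ReadBatchThatThrows)` returns a non-OK `Status` whose message mentions the missing timezone, so the caller can report an error instead of the process reaching `std::terminate()`.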

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#40633

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
WillAyd and others added 2 commits August 12, 2025 10:55
…47298)

### Rationale for this change

The Meson configuration is missing some Flight symbols that appear to come from the testing library.

### What changes are included in this PR?

Updates to the Meson configuration

### Are these changes tested?

Yes

### Are there any user-facing changes?

No
* GitHub Issue: #47283

Authored-by: Will Ayd <william.ayd@icloud.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4.3.0 to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/actions/download-artifact/releases">actions/download-artifact's releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update README.md by <a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a href="https://redirect.github.com/actions/download-artifact/pull/407">actions/download-artifact#407</a></li>
<li>BREAKING fix: inconsistent path behavior for single artifact downloads by ID by <a href="https://github.com/GrantBirki"><code>@​GrantBirki</code></a> in <a href="https://redirect.github.com/actions/download-artifact/pull/416">actions/download-artifact#416</a></li>
</ul>
<h2>v5.0.0</h2>
<h3>🚨 Breaking Change</h3>
<p>This release fixes an inconsistency in path behavior for single artifact downloads by ID. <strong>If you're downloading single artifacts by ID, the output path may change.</strong></p>
<h4>What Changed</h4>
<p>Previously, <strong>single artifact downloads</strong> behaved differently depending on how you specified the artifact:</p>
<ul>
<li><strong>By name</strong>: <code>name: my-artifact</code> → extracted to <code>path/</code> (direct)</li>
<li><strong>By ID</strong>: <code>artifact-ids: 12345</code> → extracted to <code>path/my-artifact/</code> (nested)</li>
</ul>
<p>Now both methods are consistent:</p>
<ul>
<li><strong>By name</strong>: <code>name: my-artifact</code> → extracted to <code>path/</code> (unchanged)</li>
<li><strong>By ID</strong>: <code>artifact-ids: 12345</code> → extracted to <code>path/</code> (fixed - now direct)</li>
</ul>
<h4>Migration Guide</h4>
<h5>✅ No Action Needed If:</h5>
<ul>
<li>You download artifacts by <strong>name</strong></li>
<li>You download <strong>multiple</strong> artifacts by ID</li>
<li>You already use <code>merge-multiple: true</code> as a workaround</li>
</ul>
<h5>⚠️ Action Required If:</h5>
<p>You download <strong>single artifacts by ID</strong> and your workflows expect the nested directory structure.</p>
<p><strong>Before v5 (nested structure):</strong></p>
<pre lang="yaml"><code>- uses: actions/download-artifact@v4
  with:
    artifact-ids: 12345
    path: dist
# Files were in: dist/my-artifact/
</code></pre>
<blockquote>
<p>Where <code>my-artifact</code> is the name of the artifact you previously uploaded</p>
</blockquote>
<p><strong>To maintain old behavior (if needed):</strong></p>
<pre lang="yaml"><code>&lt;/tr&gt;&lt;/table&gt; 
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/actions/download-artifact/commit/634f93cb2916e3fdff6788551b99b062d0335ce0"><code>634f93c</code></a> Merge pull request <a href="https://redirect.github.com/actions/download-artifact/issues/416">#416</a> from actions/single-artifact-id-download-path</li>
<li><a href="https://github.com/actions/download-artifact/commit/b19ff4302770b82aa4694b63703b547756dacce6"><code>b19ff43</code></a> refactor: resolve download path correctly in artifact download tests (mainly ...</li>
<li><a href="https://github.com/actions/download-artifact/commit/e262cbee4ab8c473c61c59a81ad8e9dc760e90db"><code>e262cbe</code></a> bundle dist</li>
<li><a href="https://github.com/actions/download-artifact/commit/bff23f9308ceb2f06d673043ea6311519be6a87b"><code>bff23f9</code></a> update docs</li>
<li><a href="https://github.com/actions/download-artifact/commit/fff8c148a8fdd56aa81fcb019f0b5f6c65700c4d"><code>fff8c14</code></a> fix download path logic when downloading a single artifact by id</li>
<li><a href="https://github.com/actions/download-artifact/commit/448e3f862ab3ef47aa50ff917776823c9946035b"><code>448e3f8</code></a> Merge pull request <a href="https://redirect.github.com/actions/download-artifact/issues/407">#407</a> from actions/nebuk89-patch-1</li>
<li><a href="https://github.com/actions/download-artifact/commit/47225c44b359a5155efdbbbc352041b3e249fb1b"><code>47225c4</code></a> Update README.md</li>
<li>See full diff in <a href="https://github.com/actions/download-artifact/compare/v4.3.0...v5.0.0">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/download-artifact&package-manager=github_actions&previous-version=4.3.0&new-version=5.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

</details>

Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
dependabot bot and others added 12 commits August 12, 2025 10:58
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/actions/checkout/releases">actions/checkout's releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update actions checkout to use node 24 by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
<li>Prepare v5.0.0 release by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2238">actions/checkout#2238</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a href="https://github.com/actions/runner/releases/tag/v2.327.1">Release Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this release.</p>
<p><strong>Full Changelog</strong>: <a href="https://github.com/actions/checkout/compare/v4...v5.0.0">https://github.com/actions/checkout/compare/v4...v5.0.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update README.md by <a href="https://github.com/motss"><code>@​motss</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a href="https://github.com/benwells"><code>@​benwells</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
<li>Prepare release v4.3.0 by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2237">actions/checkout#2237</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/motss"><code>@​motss</code></a> made their first contribution in <a href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li><a href="https://github.com/mouismail"><code>@​mouismail</code></a> made their first contribution in <a href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li><a href="https://github.com/benwells"><code>@​benwells</code></a> made their first contribution in <a href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made their first contribution in <a href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> made their first contribution in <a href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/actions/checkout/compare/v4...v4.3.0">https://github.com/actions/checkout/compare/v4...v4.3.0</a></p>
<h2>v4.2.2</h2>
<h2>What's Changed</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment variables by <a href="https://github.com/jww3"><code>@​jww3</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a href="https://github.com/jww3"><code>@​jww3</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/actions/checkout/compare/v4.2.1...v4.2.2">https://github.com/actions/checkout/compare/v4.2.1...v4.2.2</a></p>
<h2>v4.2.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/Jcambass"><code>@​Jcambass</code></a> made their first contribution in <a href="https://redirect.github.com/actions/checkout/pull/1919">actions/checkout#1919</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/actions/checkout/compare/v4.2.0...v4.2.1">https://github.com/actions/checkout/compare/v4.2.0...v4.2.1</a></p>

</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/actions/checkout/commit/08c6903cd8c0fde910a37f88322edcfb5dd907a8"><code>08c6903</code></a> Prepare v5.0.0 release (<a href="https://redirect.github.com/actions/checkout/issues/2238">#2238</a>)</li>
<li><a href="https://github.com/actions/checkout/commit/9f265659d3bb64ab1440b03b12f4d47a24320917"><code>9f26565</code></a> Update actions checkout to use node 24 (<a href="https://redirect.github.com/actions/checkout/issues/2226">#2226</a>)</li>
<li>See full diff in <a href="https://github.com/actions/checkout/compare/v4...v5">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=4&new-version=5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

We can remove a patch by updating to aws-c-common 0.12.4.

### What changes are included in this PR?

Update to 0.12.4 and remove a patch.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47291

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

CRAN reports GNU variadic macro warnings.

### What changes are included in this PR?

Stop using the GNU variadic macro extension.
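A minimal sketch of the difference, assuming a hypothetical `FORMAT_TEXT` macro and `FormatText` helper (these are not the macros actually changed in this PR). The GNU form relies on `, ##__VA_ARGS__` to swallow the trailing comma when no extra arguments are passed; the portable form folds the format string into `__VA_ARGS__`, so the variadic part always receives at least one argument.

```cpp
#include <cstdio>
#include <string>

// Non-portable (GNU extension), flagged by Clang under
// -Wgnu-zero-variadic-macro-arguments:
//   #define FORMAT_TEXT(fmt, ...) FormatText(fmt, ##__VA_ARGS__)

// Portable alternative (C++11): the format string is part of the variadic
// pack, so `...` is never empty and no comma needs to be swallowed.
#define FORMAT_TEXT(...) FormatText(__VA_ARGS__)

// Hypothetical helper used only for this illustration; output longer than
// the buffer is truncated by snprintf.
template <typename... Args>
std::string FormatText(const char* fmt, Args... args) {
  char buf[256];
  std::snprintf(buf, sizeof(buf), fmt, args...);
  return std::string(buf);
}
```

C++20's `__VA_OPT__` is another standard way to handle an empty variadic part, but it is not available in C++17.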

### Are these changes tested?

Yes.

```bash
archery docker run \
  -e CC=clang \
  -e CXX=clang++ \
  -e Thrift_SOURCE=BUNDLED \
  -e CXXFLAGS="-Wgnu-zero-variadic-macro-arguments -Wno-variadic-macro-arguments-omitted" \
  fedora-cpp
```

### Are there any user-facing changes?

No.
* GitHub Issue: #47205

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

Alpine Linux 3.18 is currently deprecated.

### What changes are included in this PR?

Update the Alpine Linux version.

### Are these changes tested?

Via CI

* GitHub Issue: #47052

Lead-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…ild.sh (#47307)

### Rationale for this change

This is a sub-issue of #44748.

* SC1091: Not following: ./bin/activate: openBinaryFile: does not exist
* SC2034: foo appears unused
* SC2086: Double quote to prevent globbing and word splitting
* SC2223: This default assignment may cause DoS due to globbing. Quote it.
* SC2236: Use `-n` instead of `! -z`

```
shellcheck ci/scripts/python_build.sh

In ci/scripts/python_build.sh line 28:
: ${BUILD_DOCS_PYTHON:=OFF}
  ^-----------------------^ SC2223 (info): This default assignment may cause DoS due to globbing. Quote it.

In ci/scripts/python_build.sh line 31:
  git config --global --add safe.directory ${arrow_dir}
                                           ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  git config --global --add safe.directory "${arrow_dir}"

In ci/scripts/python_build.sh line 35:
  . "${ARROW_PYTHON_VENV}/bin/activate"
    ^-- SC1091 (info): Not following: ./bin/activate: openBinaryFile: does not exist (No such file or directory)

In ci/scripts/python_build.sh line 53:
if [ ! -z "${CONDA_PREFIX}" ]; then
     ^-- SC2236 (style): Use -n instead of ! -z.

In ci/scripts/python_build.sh line 77:
: ${CMAKE_PREFIX_PATH:=${ARROW_HOME}}
  ^-- SC2223 (info): This default assignment may cause DoS due to globbing. Quote it.

In ci/scripts/python_build.sh line 85:
rm -rf ${python_build_dir}
       ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
rm -rf "${python_build_dir}"

In ci/scripts/python_build.sh line 86:
cp -aL ${source_dir} ${python_build_dir}
       ^-----------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                     ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
cp -aL "${source_dir}" "${python_build_dir}"

In ci/scripts/python_build.sh line 87:
pushd ${python_build_dir}
      ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
pushd "${python_build_dir}"

In ci/scripts/python_build.sh line 101:
  rm -rf ${python_build_dir}/docs/source
         ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  rm -rf "${python_build_dir}"/docs/source

In ci/scripts/python_build.sh line 102:
  mkdir -p ${python_build_dir}/docs
           ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  mkdir -p "${python_build_dir}"/docs

In ci/scripts/python_build.sh line 103:
  cp -a ${arrow_dir}/docs/source ${python_build_dir}/docs/
        ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                 ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  cp -a "${arrow_dir}"/docs/source "${python_build_dir}"/docs/

In ci/scripts/python_build.sh line 104:
  rm -rf ${python_build_dir}/format
         ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  rm -rf "${python_build_dir}"/format

In ci/scripts/python_build.sh line 105:
  cp -a ${arrow_dir}/format ${python_build_dir}/
        ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                            ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  cp -a "${arrow_dir}"/format "${python_build_dir}"/

In ci/scripts/python_build.sh line 106:
  rm -rf ${python_build_dir}/cpp/examples
         ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  rm -rf "${python_build_dir}"/cpp/examples

In ci/scripts/python_build.sh line 107:
  mkdir -p ${python_build_dir}/cpp
           ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  mkdir -p "${python_build_dir}"/cpp

In ci/scripts/python_build.sh line 108:
  cp -a ${arrow_dir}/cpp/examples ${python_build_dir}/cpp/
        ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                  ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  cp -a "${arrow_dir}"/cpp/examples "${python_build_dir}"/cpp/

In ci/scripts/python_build.sh line 109:
  rm -rf ${python_build_dir}/ci
         ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  rm -rf "${python_build_dir}"/ci

In ci/scripts/python_build.sh line 110:
  cp -a ${arrow_dir}/ci/ ${python_build_dir}/
        ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                         ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  cp -a "${arrow_dir}"/ci/ "${python_build_dir}"/

In ci/scripts/python_build.sh line 111:
  ncpus=$(python -c "import os; print(os.cpu_count())")
  ^---^ SC2034 (warning): ncpus appears unused. Verify use (or export if used externally).

In ci/scripts/python_build.sh line 113:
  pushd ${build_dir}
        ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
  pushd "${build_dir}"

In ci/scripts/python_build.sh line 116:
    ${python_build_dir}/docs/source \
    ^-----------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
    "${python_build_dir}"/docs/source \

In ci/scripts/python_build.sh line 117:
    ${build_dir}/docs
    ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean:
    "${build_dir}"/docs

For more information:
  https://www.shellcheck.net/wiki/SC2034 -- ncpus appears unused. Verify use ...
  https://www.shellcheck.net/wiki/SC1091 -- Not following: ./bin/activate: op...
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
```

### What changes are included in this PR?

* SC1091: Skip the source file check
* SC2034: Remove the unused variable
* SC2086: Quote variables
* SC2223: Quote variables
* SC2236: Use `-n` instead of `! -z`

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47306

Authored-by: Hiroyuki Sato <hiroysato@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

In #47311, Dependabot bumped `actions/checkout` to v5.0.0, but it didn't update some of the version comments.

Change made by that PR:

```diff
-        uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.0.0
+        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v4.0.0
```

Correct change:

```diff
-        uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.0.0
+        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
```

### What changes are included in this PR?

Write the correct version number, `v5.0.0`, in the version comments.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47319

Authored-by: Hiroyuki Sato <hiroysato@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

`gandiva::Cache` requires a pointer type as its value type.

### What changes are included in this PR?

Use `std::shared_ptr<std::string>` instead of `std::string` for `ValueType`.
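A much-simplified sketch of why a pointer-like value type matters, assuming (as one plausible reason) that the cache reports a miss by returning `nullptr`. `SimpleCache` is hypothetical and is not `gandiva::Cache`; it only shows that a value type such as `std::shared_ptr<std::string>` can represent "not found", while a plain `std::string` cannot.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical thread-safe cache. A miss is signaled by returning nullptr,
// so ValueType must be able to hold nullptr (e.g. std::shared_ptr<T>).
template <typename KeyType, typename ValueType>
class SimpleCache {
 public:
  ValueType Get(const KeyType& key) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = map_.find(key);
    return it != map_.end() ? it->second : nullptr;  // nullptr means "miss"
  }

  void Put(const KeyType& key, ValueType value) {
    std::lock_guard<std::mutex> lock(mutex_);
    map_[key] = std::move(value);
  }

 private:
  std::mutex mutex_;
  std::unordered_map<KeyType, ValueType> map_;
};
```

For example, `SimpleCache<std::string, std::shared_ptr<std::string>>` returns `nullptr` for an unknown key and a shared pointer to the stored string after `Put`. Sharing the value through `std::shared_ptr` also avoids copying potentially large cached strings on every lookup.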

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47317

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
… JNI on macOS (#47305)

### Rationale for this change

The static JNI build on macOS is failing in apache/arrow-java. We should catch this kind of breakage in apache/arrow.

See also: apache/arrow-java#799

### What changes are included in this PR?

* Add a CI job for JNI build on macOS
* Fix build problems for bundled AWS SDK for C++

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #47222

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

Testing with [Numba-CUDA](https://github.com/NVIDIA/numba-cuda), which uses the [NVIDIA CUDA Python bindings](https://github.com/NVIDIA/cuda-python) by default, identified an incompatibility between PyArrow's Numba interop and Numba / Numba-CUDA when the NVIDIA bindings are used. See Issue #47128.

### What changes are included in this PR?

The fix is to get device pointer values from their `device_pointer_value` property, which is consistent across the ctypes and NVIDIA bindings in Numba.

I also attempted to update the CI config to install Numba-CUDA. Some of the comments in `docker-compose.yml` seemed out of sync with earlier changes to it, so I also updated the relevant comments to reflect what I had to run locally. I may have gotten the CI changes wrong; I'm happy to change them, as they're not the critical part of this PR.

Fixes #47128.

### Are these changes tested?

Yes, by the existing `test_cuda_numba_interop.py` and the CI changes in this PR.

### Are there any user-facing changes?

No.
* GitHub Issue: #47128

Authored-by: Graham Markall <gmarkall@nvidia.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
… call become invalid before wrapping results (#47333)

### Rationale for this change

An obviously wrong use of `std::move()`.

### What changes are included in this PR?

Remove it.
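A hypothetical illustration of this class of bug (the `Array`, `Result`, and `Compute` names are made up, and this is not the code changed in the PR): once a `std::shared_ptr` argument is moved into the call, it is guaranteed to be empty, so wrapping the result with it afterwards silently captures `nullptr`.

```cpp
#include <memory>
#include <utility>

// Hypothetical types for illustration only (not Arrow's API).
struct Array {
  int length;
};

struct Result {
  std::shared_ptr<Array> input;  // the argument the result refers back to
  int output_length;
};

int Compute(std::shared_ptr<Array> arg) { return arg->length * 2; }

// Buggy: `arg` is moved into Compute(), so it is empty when the result is
// wrapped afterwards.
Result CallAndWrapBuggy(std::shared_ptr<Array> arg) {
  int out = Compute(std::move(arg));
  return {arg, out};  // arg is now nullptr (moved-from shared_ptr)
}

// Fixed: drop std::move() and pass a copy, keeping `arg` valid for wrapping.
Result CallAndWrapFixed(std::shared_ptr<Array> arg) {
  int out = Compute(arg);
  return {arg, out};
}
```

Passing the `shared_ptr` by copy costs only a reference-count increment, which is usually the simplest fix when the argument is still needed after the call.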

### Are these changes tested?

Yes, by extending the existing test.

### Are there any user-facing changes?

None.
* GitHub Issue: #47332

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Rossi Sun <zanmato1984@gmail.com>
### Rationale for this change
The `Rat` check was changed in #46541, but the corresponding `.gitignore` was not updated.

### What changes are included in this PR?
Explicitly ignore `apache-arrow.tar.gz` so that anyone who needs to fix licence headers doesn't accidentally commit the tar file.

### Are these changes tested?
Locally yes.

### Are there any user-facing changes?
No.

* GitHub Issue: #47143

Authored-by: Patrick J. Roddy <patrickjamesroddy@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

Update NEWS for the 21.0.0.1 release.

### What changes are included in this PR?

Update NEWS for the 21.0.0.1 release.

### Are these changes tested?

No

### Are there any user-facing changes?

No

Lead-authored-by: Nic Crane <thisisnic@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
BwL1289 closed this Aug 15, 2025