POC: datafusion-cli instrumented object store #17266

BlakeOrth · 2025-08-20T21:45:36Z

Which issue does this PR close?

N/A - POC to recieve initial implementation feedback related to

[datafusion-cli] Add a way to see what object store requests are made #17207

Rationale for this change

A WIP/POC of instrumenting the object store backing datafusion-cli operations

What changes are included in this PR?

Initial implementation instrumenting basic object store calls
Initial implementation incorporating the instrumented object store into datafusion-cli

Are these changes tested?

No - This is a POC

cc @alamb

- A WIP/POC of instrumenting the object store backing datafusion-cli operations

alamb · 2025-08-22T18:52:13Z

Thank you @BlakeOrth -- I tried this out and it looks very nice 👌

> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';

0 row(s) fetched.
Elapsed 9.274 seconds.

2025-08-22T18:48:50.357439+00:00 operation=Get duration=0.618644s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-08-22T18:48:50.977619+00:00 operation=Get duration=8.439665s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-08-22T18:48:59.417902+00:00 operation=Get duration=0.060293s path=hits_compatible/athena_partitioned/hits_1.parquet
Total duration: 9.118602
>

I have some ideas on how to make it configurable and easy to see. BRB

BlakeOrth · 2025-08-22T20:17:30Z

@alamb Awesome, I'm looking forward to hearing your feedback.

Just so you can understand where my head was at for this (currently very rough) implementation in terms of configuration:

I thought it would be beneficial to be able to entirely disable object store profiling and use the default, non-instrumented, implementation by default. Collecting profiling information takes time, additional allocations etc. and in many cases it may not be desirable to incur the (likely quite small, but non-zero) overhead from the profiling. I was planning on adding a CLI flag to enable the instrumented store.
I haven't added any of the code around it yet, but I was going to the UX from Add memory profiling support to DataFusion CLI and memory pool metrics #17021 as you suggested in the issue for this ticket and allow users to apply either \object_store_profiling summary or \object_store_profiling trace. The first would print just a simple summary of calls, similar to my suggestion in the initial issue, and the latter would print both the summary and the individual call details as we see here.

Any feedback on the above, or general code structure/implementation is obviously welcome as well, so let me know your thoughts. This code is obviously a minimally functional example, but I'd rather incorporate feedback early than need to re-work a bunch of stuff!

alamb · 2025-08-22T21:33:33Z

I thought it would be beneficial to be able to entirely disable object store profiling and use the default, non-instrumented, implementation by default.

Yes, I agree with this

I was planning on adding a CLI flag to enable the instrumented store.

I think it would be ok to always use the "Instrumented Object REgistry" and then only pass back an instrumented object store if the profiling was enabled

I haven't added any of the code around it yet, but I was going to the UX from [

That is perfect!

\object_store_profiling summary or \object_store_profiling trace.
The first would print just a simple summary of calls, similar to my suggestion in the initial issue, and the latter would print both the summary and the individual call details as we see here.

how about something like this (maybe you can implement one of these, perhaps summary, as the intial PR and we can add the others afterwards)

# Prints an object store summary at the end of all results, after the query executed
\profile objectstore summary

# prints one line per object store request at the end of all results, after the query executed 
\profile objectstore trace

# prints  one line per object store request when it happens (potentially in the middle of results)
# this can be useful to understand how the requests happen over time
\profile objectstore inline

Any feedback on the above, or general code structure/implementation is obviously welcome as well, so let me know your thoughts. This code is obviously a minimally functional example, but I'd rather incorporate feedback early than need to re-work a bunch of stuff!

Indeed -- makes total sense

I

alamb · 2025-08-22T21:34:39Z

I have some ideas on how to make it configurable and easy to see. BRB

In my mind I was going to have time to work on this myself, but I fear that is not likely to be the case for a while (and I will be out for the next week or so on vacation, though I will be reviewing PRs as much as possible).

If you are willing to help push this forward that would be most appreciated

BlakeOrth · 2025-08-22T22:39:28Z

@alamb

In my mind I was going to have time to work on this myself, but I fear that is not likely to be the case for a while (and I will be out for the next week or so on vacation, though I will be reviewing PRs as much as possible).

If you are willing to help push this forward that would be most appreciated

I'm happy to keep driving this effort as long as it sounds like it's moving in the right direction. My interpretation of your above comments is that we currently are moving along at least mostly the right path, so I should be able to generally make progress. (And do your best to enjoy your vacation! The code will be here when you return)

I just pushed some changes the implement the summary output. I get the feeling we're about to learn quite a lot...

DataFusion CLI v49.0.1
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
0 row(s) fetched.
Elapsed 2.587 seconds.

List Summary:
  count: 1
Get Summary:
  count: 288
  duration min: 0.058361s
  duration max: 0.374491s
  duration avg: 0.122724s
  size min: 8
  size max: 44247
  size avg: 18870
  size sum: 5434702
List Summary:
  count: 1
> select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/' where vendor_id='CMT';
+-----------+
| count(*)  |
+-----------+
| 505603754 |
+-----------+
1 row(s) fetched.
Elapsed 56.057 seconds.

Get Summary:
  count: 1126
  duration min: 0.062342s
  duration max: 1.831455s
  duration avg: 0.397414s
  size min: 47
  size max: 112
  size avg: 69
  size sum: 78422
List Summary:
  count: 4

(caveat: this is a debug build, not release-nonlto, but interesting data nonetheless)

I think it would be ok to always use the "Instrumented Object REgistry" and then only pass back an instrumented object store if the profiling was enabled

I think this is an interesting thought and I'll take some time to explore this as a solution. Are you thinking the InstrumentedObjectStoreRegistry would just pass back the provided inner object store when profiling is disabled? This seems a bit more complicated to me than either using the instrumented registry or not depending on the CLI flag. Can you help me understand the benefit you think it brings?

alamb · 2025-08-23T09:43:15Z

I just pushed some changes the implement the summary output. I get the feeling we're about to learn quite a lot...

Yes indeed -- I think this will be super valuable

Are you thinking the InstrumentedObjectStoreRegistry would just pass back the provided inner object store when profiling is disabled? This seems a bit more complicated to me than either using the instrumented registry or not depending on the CLI flag. Can you help me understand the benefit you think it brings?

Yes that is what I was thinking

I agree the implementation would be more complicated, but it would be easier to use (people wouldn't have to run datafusion-cli with a flag and then also have to enable/disable profiling)

- Introduces object_store_profiling as a command and cli argument, with options of disabled, summary, and trace - Cleans up hacked in println output in favor of carrying the necessary objects in print_options and using the expected Writer for output

BlakeOrth · 2025-08-28T18:40:24Z

datafusion-cli/src/print_options.rs

    pub maxrows: MaxRows,
    pub color: bool,
+    pub object_store_profile_mode: InstrumentedObjectStoreMode,
+    pub instrumented_registry: Arc<InstrumentedObjectStoreRegistry>,


I don't love carrying this as a public member of this struct, but I didn't want to go making too many unnecessary changes without some additional reviewer input. I feel like PrintOptions might be to the point where it needs a builder or perhaps a dedicated new() method to better encapsulate its behavior.

Yes, I agree it is time to encapsulate it

BlakeOrth · 2025-08-28T18:43:33Z

datafusion-cli/src/object_storage.rs

    Ok(store)
 }

+pub mod instrumented {


I've put this into its own module because with this addition I feel like object_storage has probably grown past the point of being a single file. It seems like it should probably become a directory with 2 sub-modules, but this change set is already getting large enough I didn't want to go making large organizational changes prior to review.

Yes, I agree that putting the instrumented object store into object_storage/instrumented.rs seems like a good idea

BlakeOrth · 2025-08-28T18:54:11Z

@alamb I've made some changes that I think get the functional side of the code where it needs to be and pushed those for review. This is still lacking tests and docs, but I thought it would be good to more or less have the architecture and functional code settled before wrapping everything up and marking it as ready for review.

Here's a little demo of the output from the current code:

$ ./datafusion-cli --object-store-profiling summary

DataFusion CLI v49.0.1
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
0 row(s) fetched.
Elapsed 2.573 seconds.

Object Store Profiling
List Summary:
  count: 1

Get Summary:
  count: 288
  duration min: 0.059328s
  duration max: 0.714468s
  duration avg: 0.128311s
  size min: 8 B
  size max: 44247 B
  size avg: 18870 B
  size sum: 5434702 B

List Summary:
  count: 1

> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
+------------+
| count(*)   |
+------------+
| 1310903963 |
+------------+
1 row(s) fetched.
Elapsed 0.577 seconds.

Object Store Profiling
2025-08-28T18:49:22.066425183+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-08-28T18:49:22.303706449+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-08-28T18:49:22.419878784+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-08-28T18:49:22.493157995+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
List Summary:
  count: 4

> \object_store_profiling disabled
ObjectStore Profile mode set to Disabled
> select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/' where vendor_id='CMT';
+-----------+
| count(*)  |
+-----------+
| 505603754 |
+-----------+
1 row(s) fetched.
Elapsed 45.531 seconds.

>

alamb · 2025-09-05T20:55:21Z

Thanks @BlakeOrth -- I am back and catching up on reviews. I will review this one shortly

…ject_store

alamb

Thank you @BlakeOrth -- I think this is looking good and we should proceed with implementation. Now that I am back, I am ready/willing to help do so as well.

Also, FYI @nuno-faria as I think you are interested in this area too.

I view this as an important first step towards being able to implement listing file caching reasonably:

#17211

Shall we use this PR ?

I think it is possible to break this into a few smaller PRs which might be faster to review:

Add basic object store instrumentation and plumbing, but only instrument one operation (like get or list), and set the pattern
Other PRs to fill out the rest of the object store methods.

Also, BTW tried it out but it doesn't seem to be working anymore:

> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
0 row(s) fetched.
Elapsed 2.000 seconds.

Object Store Profiling

> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
0 row(s) fetched.
Elapsed 0.688 seconds.

Object Store Profiling
>

alamb · 2025-09-06T11:28:49Z

datafusion-cli/src/print_options.rs

    pub maxrows: MaxRows,
    pub color: bool,
+    pub object_store_profile_mode: InstrumentedObjectStoreMode,
+    pub instrumented_registry: Arc<InstrumentedObjectStoreRegistry>,


Yes, I agree it is time to encapsulate it

alamb · 2025-09-06T11:34:45Z

datafusion-cli/src/object_storage.rs

+        requests: Mutex<Vec<RequestDetails>>,
+    }
+
+    impl fmt::Display for InstrumentedObjectStore {


I found it a bit confusing that Display also modified the object store -- maybe we could have an explicit take/display function 🤔

This is a really good point I hadn't even considered when writing this. Having the behavior of Display be dependent on underlying object state and when you called it is weird implicit behavior.

I'm open to implementation recommendations here. I could see keeping the method as-is but changing the method signature to something that better communicates the underlying state modification and dropping the Display impl. Another path I can see working is making the Display impl not modify the underlying state, but push the state modification into another method a user has to call in order to clear the existing entries to be displayed. I'm sure there are other paths as well. One thing I liked about the existing implementation is it's more difficult for a user of the API to accidentally accumulate object store operations from multiple queries because removing them does not require the user to call an additional method.

alamb · 2025-09-06T11:37:06Z

datafusion-cli/src/object_storage.rs

    Ok(store)
 }

+pub mod instrumented {


Yes, I agree that putting the instrumented object store into object_storage/instrumented.rs seems like a good idea

alamb · 2025-09-06T11:44:42Z

It does work when I ran it with the CLI flag:

> select * from nyc_taxi_rides limit 1;
+-------------+----+-----------+----------------------+----------------------+-----------------+---------------+------------------+-----------------+--------------+--------------------+-------------------+------------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+--------------------+---------------------+-------+-------+
| pickup_date | id | vendor_id | pickup_datetime      | dropoff_datetime     | passenger_count | trip_distance | pickup_longitude | pickup_latitude | rate_code_id | store_and_fwd_flag | dropoff_longitude | dropoff_latitude | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | pickup_location_id | dropoff_location_id | junk1 | junk2 |
+-------------+----+-----------+----------------------+----------------------+-----------------+---------------+------------------+-----------------+--------------+--------------------+-------------------+------------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+--------------------+---------------------+-------+-------+
| 2009-01-31  | 0  | VTS       | 2009-01-31T14:25:00Z | 2009-01-31T14:42:00Z | 4               | 6.12          | 0.0              | 0.0             |              |                    | 0.0               | 0.0              | CASH         | 16.5        | 0     | 0.0     | 0.0        | 0.0          | 0.0                   | 16.5         | 0                  | 0                   |       |       |
+-------------+----+-----------+----------------------+----------------------+-----------------+---------------+------------------+-----------------+--------------+--------------------+-------------------+------------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+--------------------+---------------------+-------+-------+
1 row(s) fetched.
Elapsed 0.623 seconds.

Object Store Profiling
2025-09-06T11:44:04.745828+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-09-06T11:44:04.845685+00:00 operation=Get duration=0.104646s size=1117048 range: bytes=46706370-47823417 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845785+00:00 operation=Get duration=0.183020s size=12170750 range: bytes=50081390-62252139 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845679+00:00 operation=Get duration=0.199387s size=1101744 range: bytes=15763528-16865271 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.846051+00:00 operation=Get duration=0.200877s size=12058769 range: bytes=34647550-46706318 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845679+00:00 operation=Get duration=0.204372s size=1203241 range: bytes=4-1203244 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845676+00:00 operation=Get duration=0.205906s size=1112050 range: bytes=77700284-78812333 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845899+00:00 operation=Get duration=0.212092s size=12167360 range: bytes=81041692-93209051 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845861+00:00 operation=Get duration=0.212435s size=12273554 range: bytes=3489923-15763476 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845854+00:00 operation=Get duration=0.219370s size=14064310 range: bytes=194476813-208541122 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845899+00:00 operation=Get duration=0.219971s size=14078825 range: bytes=180397934-194476758 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845753+00:00 operation=Get duration=0.231013s size=14060573 range: bytes=166337307-180397879 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845864+00:00 operation=Get duration=0.232125s size=14256532 range: bytes=152080721-166337252 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.846071+00:00 operation=Get duration=0.232305s size=12218068 range: bytes=19095708-31313775 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845676+00:00 operation=Get duration=0.232844s size=1116382 range: bytes=31313827-32430208 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845677+00:00 operation=Get duration=0.233984s size=79963 range: bytes=93209103-93289065 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.846217+00:00 operation=Get duration=0.233612s size=12176018 range: bytes=65524215-77700232 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845676+00:00 operation=Get duration=0.235468s size=14167756 range: bytes=123907252-138075007 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.846110+00:00 operation=Get duration=0.235193s size=322523 range: bytes=94555506-94878028 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845678+00:00 operation=Get duration=0.236193s size=746872 range: bytes=109680418-110427289 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.846358+00:00 operation=Get duration=0.235518s size=12293949 range: bytes=97386418-109680366 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845745+00:00 operation=Get duration=0.240246s size=175763 range: bytes=208541177-208716939 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.846173+00:00 operation=Get duration=0.240324s size=12185791 range: bytes=111721410-123907200 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845678+00:00 operation=Get duration=0.247862s size=1052092 range: bytes=62252191-63304282 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-06T11:44:04.845676+00:00 operation=Get duration=0.300687s size=14005605 range: bytes=138075062-152080666 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
Get Summary:
  count: 24
  duration min: 0.104646s
  duration max: 0.300687s
  duration avg: 0.222060s
  size min: 79963 B
  size max: 14256532 B
  size avg: 7925230 B
  size sum: 190205538 B

List Summary:
  count: 1

😍

nuno-faria · 2025-09-07T12:30:48Z

This is pretty cool!

We can easily see the effects of the metadata caching:

> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.106 seconds.

Object Store Profiling
2025-09-07T12:17:04.082191800+00:00 operation=Head duration=0.056605s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:04.138839400+00:00 operation=Get duration=0.019972s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:04.158853600+00:00 operation=Get duration=0.019854s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
Get Summary:
  count: 2
  duration min: 0.019854s
  duration max: 0.019972s
  duration avg: 0.019913s
  size min: 8 B
  size max: 34322 B
  size avg: 17165 B
  size sum: 34330 B

Head Summary:
  count: 1
  duration min: 0.056605s
  duration max: 0.056605s
  duration avg: 0.056605s

With the metadata cache, three requests are done:

> select * from hits limit 1;
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| WatchID             | JavaEnable | Title                                                                        | GoodEvent | EventTime  | EventDate | CounterID | ClientIP   | RegionID | UserID              | CounterClass | OS | UserAgent | URL                                                                                                                            | Referer                                              | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset    | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID             | OriginalURL | HID     | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP   | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash          | URLHash             | CLID |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| 8374547729199360385 | 1          | d0a2d0b5d181d1822028d0a0d0bed181d181d0b8d18f29202d20d0afd0bdd0b4d0b5d0bad181 | 1         | 1373893805 | 15901     | 62        | 1388530699 | 229      | 3217804679217022550 | 0            | 2  | 5         | 687474703a2f2f6972722e72752f696e6465782e7068703f73686f77616c62756d2f6c6f67696e2d6c656e697961373737373239342c393338333033313330 | 687474703a2f2f6b696e6f706f69736b2e72752f3f7374617465 | 0         | 10813             | 952             | 9500          | 520         | 1638            | 1658             | 37              | 15         | 7          | 373030      | 0        | 0        | 22             | 44efbfbd       | 1            | 1                | 0        | 0           |                  |        | 3830428     | -1             | 0              |              | 0           | 0           | 1654              | 936                | 135            | 1373857827      | 4                   | 1                   | 16561               | 0                   | 77696e646f7773 | 1601        | 0      | 0          | 0           | 8731137316151599477 |             | 4563091 | 0            | 0       | 0           | 0             | 0        | 35       | 1373847066     | 0   | 0   | 0      | 0         | 0         | 1547096432 | -1         | -1         | -1            | 5330            | efbfbd0c       |               |              | 0         | 0          | 0         | 190           | 987                 | 55                | 35          | 0                     |                  | 0          |              | 4e481c        | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -1172318462146836803 | 2868770270353813622 | 0    |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
1 row(s) fetched.
Elapsed 0.405 seconds.

Object Store Profiling
2025-09-07T12:17:16.132117800+00:00 operation=Head duration=0.023943s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:16.157671300+00:00 operation=Get duration=0.028673s size=8134104 range: bytes=4-8134107 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:16.157998800+00:00 operation=Get duration=0.068516s size=97589907 range: bytes=77340716-174930622 path=hits_compatible/athena_partitioned/hits_1.parquet
Get Summary:
  count: 2
  duration min: 0.028673s
  duration max: 0.068516s
  duration avg: 0.048595s
  size min: 8134104 B
  size max: 97589907 B
  size avg: 52862005 B
  size sum: 105724011 B

Head Summary:
  count: 1
  duration min: 0.023943s
  duration max: 0.023943s
  duration avg: 0.023943s

After disabling the metadata cache, a lot more requests will be done:

> set datafusion.runtime.metadata_cache_limit = '0M';
0 row(s) fetched.
Elapsed 0.000 seconds.

Object Store Profiling
> select * from hits limit 1;
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| WatchID             | JavaEnable | Title                                                                        | GoodEvent | EventTime  | EventDate | CounterID | ClientIP   | RegionID | UserID              | CounterClass | OS | UserAgent | URL                                                                                                                            | Referer                                              | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset    | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID             | OriginalURL | HID     | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP   | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash          | URLHash             | CLID |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| 8374547729199360385 | 1          | d0a2d0b5d181d1822028d0a0d0bed181d181d0b8d18f29202d20d0afd0bdd0b4d0b5d0bad181 | 1         | 1373893805 | 15901     | 62        | 1388530699 | 229      | 3217804679217022550 | 0            | 2  | 5         | 687474703a2f2f6972722e72752f696e6465782e7068703f73686f77616c62756d2f6c6f67696e2d6c656e697961373737373239342c393338333033313330 | 687474703a2f2f6b696e6f706f69736b2e72752f3f7374617465 | 0         | 10813             | 952             | 9500          | 520         | 1638            | 1658             | 37              | 15         | 7          | 373030      | 0        | 0        | 22             | 44efbfbd       | 1            | 1                | 0        | 0           |                  |        | 3830428     | -1             | 0              |              | 0           | 0           | 1654              | 936                | 135            | 1373857827      | 4                   | 1                   | 16561               | 0                   | 77696e646f7773 | 1601        | 0      | 0          | 0           | 8731137316151599477 |             | 4563091 | 0            | 0       | 0           | 0             | 0        | 35       | 1373847066     | 0   | 0   | 0      | 0         | 0         | 1547096432 | -1         | -1         | -1            | 5330            | efbfbd0c       |               |              | 0         | 0          | 0         | 190           | 987                 | 55                | 35          | 0                     |                  | 0          |              | 4e481c        | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -1172318462146836803 | 2868770270353813622 | 0    |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
1 row(s) fetched.
Elapsed 0.637 seconds.

Object Store Profiling
2025-09-07T12:17:31.352590200+00:00 operation=Head duration=0.020476s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373586300+00:00 operation=Get duration=0.017040s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373610700+00:00 operation=Get duration=0.036302s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373727700+00:00 operation=Get duration=0.056047s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373893500+00:00 operation=Get duration=0.056901s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373614300+00:00 operation=Get duration=0.059208s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373669600+00:00 operation=Get duration=0.061699s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373679500+00:00 operation=Get duration=0.062037s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373698900+00:00 operation=Get duration=0.064385s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373688700+00:00 operation=Get duration=0.064557s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.390647700+00:00 operation=Get duration=0.050088s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373669100+00:00 operation=Get duration=0.071315s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373625700+00:00 operation=Get duration=0.071612s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.409928500+00:00 operation=Get duration=0.035328s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.430810100+00:00 operation=Get duration=0.019518s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373877300+00:00 operation=Get duration=0.076532s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.429786100+00:00 operation=Get duration=0.021806s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.432826500+00:00 operation=Get duration=0.022541s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.435378800+00:00 operation=Get duration=0.020154s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.435720300+00:00 operation=Get duration=0.021886s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.438249100+00:00 operation=Get duration=0.020374s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.438117600+00:00 operation=Get duration=0.021973s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.445244800+00:00 operation=Get duration=0.019013s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.444999300+00:00 operation=Get duration=0.020614s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.441698900+00:00 operation=Get duration=0.024848s size=8134104 range: bytes=4-8134107 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.450412700+00:00 operation=Get duration=0.023832s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.465032100+00:00 operation=Get duration=0.021047s size=97589907 range: bytes=77340716-174930622 path=hits_compatible/athena_partitioned/hits_1.parquet
Get Summary:
  count: 26
  duration min: 0.017040s
  duration max: 0.076532s
  duration avg: 0.040025s
  size min: 8 B
  size max: 97589907 B
  size avg: 4082152 B
  size sum: 106135971 B

Head Summary:
  count: 1
  duration min: 0.020476s
  duration max: 0.020476s
  duration avg: 0.020476s

Its also cool to see that it works on the local object store, but the output appears duplicated:

> select * from t limit 1;
+---+---+
| k | v |
+---+---+
| 1 | 1 |
+---+---+
1 row(s) fetched.
Elapsed 0.002 seconds.

Object Store Profiling
2025-09-07T12:26:34.064444500+00:00 operation=Head duration=0.000192s path=17266/datafusion-cli/t.parquet
2025-09-07T12:26:34.064800300+00:00 operation=Get duration=0.000099s size=72166 range: bytes=4-72169 path=17266/datafusion-cli/t.parquet
Get Summary:
  count: 1
  duration min: 0.000099s
  duration max: 0.000099s
  duration avg: 0.000099s
  size min: 72166 B
  size max: 72166 B
  size avg: 72166 B
  size sum: 72166 B

Head Summary:
  count: 1
  duration min: 0.000192s
  duration max: 0.000192s
  duration avg: 0.000192s

2025-09-07T12:26:34.064443400+00:00 operation=Head duration=0.000194s path=17266/datafusion-cli/t.parquet
2025-09-07T12:26:34.064799400+00:00 operation=Get duration=0.000100s size=72166 range: bytes=4-72169 path=17266/datafusion-cli/t.parquet
Get Summary:
  count: 1
  duration min: 0.000100s
  duration max: 0.000100s
  duration avg: 0.000100s
  size min: 72166 B
  size max: 72166 B
  size avg: 72166 B
  size sum: 72166 B

Head Summary:
  count: 1
  duration min: 0.000194s
  duration max: 0.000194s
  duration avg: 0.000194s

BlakeOrth · 2025-09-08T17:21:01Z

@alamb Thanks for the review! I'll take a look into why it's suddenly stopped working (or perhaps it's a "works on my machine" situation, which is also never good).

I think it is possible to break this into a few smaller PRs which might be faster to review:

Add basic object store instrumentation and plumbing, but only instrument one operation (like get or list), and set the pattern

Other PRs to fill out the rest of the object store methods.

I'm happy to split and divide up the work in whatever manner you think will be best for reviews. I know that review bandwidth is almost always strained, so let me know how we can make that process the smoothest and I'm happy to facilitate as much as I can.

I will note that splitting up the actual instrumentation of the object_store methods might end up being a bit awkward because the different methods communicate somewhat different data. As an example, since list returns a stream of futures collecting a duration for it (at least in this simple instrumentation) doesn't make much sense because it's effectively instant. get, however, can be awaited in the instrumented call and as such the duration is an accurate representation of the duration of the get call. I guess my concern here is mostly that the final structure of the instrumented object store and its metadata might not make sense if the context is an individual method.

Looking at the changes here, and the comments you left, I can see two easy PRs that can be done immediately to help streamline the implementation. They'd likely be a good first code contribution if any community members are looking for a simple task to pick up!

Turn object_storage.rs into a directory and named module to prepare for object_storage/instrumented.rs to be introduced
Implement a builder for print_options so it can be better encapsulated and has better ergonomics now that additional options are being added

BlakeOrth · 2025-09-08T19:50:22Z

Its also cool to see that it works on the local object store, but the output appears duplicated:

@nuno-faria thanks for giving this a test run! Looking at this output I initially thought you were correct that it was duplicated, but on closer inspection the timestamps and duration of the operations is different between the first and the 2nd case. This leads me to believe the object_store operations may actually get executed twice for some reason. Generating the text output currently leverages a drain iterator behind a mutex on the underlying data, so getting any duplication seems unlikely (although if someone can identify how this might be possible that would be an excellent find). Perhaps we've just uncovered some curious behavior in DataFusion?

- Fixes an issue where the first profiling command would not be recognized if the CLI was not started with profiling enabled

BlakeOrth · 2025-09-09T00:02:49Z

Also, BTW tried it out but it doesn't seem to be working anymore

@alamb I've found the bug and fixed this behavior. Although this is one of those scenarios where I'm somewhat questioning how it ever worked correctly in terms of switching modes using commands. Regardless, I'm glad you chose to run a functional test case that I clearly had not run! The issue seems to have been with the initial command to change the state. Curiously, once the profiling had been enabled one time the changes thereafter seemed to work as expected. With the bug fix, the initial command now seems to be working though.

/datafusion-cli$ ../target/debug/datafusion-cli

DataFusion CLI v49.0.2
> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
0 row(s) fetched.
Elapsed 1.906 seconds.

Object Store Profiling
2025-09-08T23:55:35.057244770+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
List Summary:
  count: 1

2025-09-08T23:55:35.395591891+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-09-08T23:55:35.630754482+00:00 operation=Get duration=0.100976s size=8 range: bytes=222192975-222192982 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-08T23:55:35.731796892+00:00 operation=Get duration=0.105280s size=38976 range: bytes=222153999-222192974 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-08T23:55:35.635551741+00:00 operation=Get duration=0.220758s size=8 range: bytes=217303101-217303108 path=nyc_taxi_rides/data/tripdata_parquet/data-200908.parquet
2025-09-08T23:55:35.632923076+00:00 operation=Get duration=0.238810s size=8 range: bytes=225659957-225659964 path=nyc_taxi_rides/data/tripdata_parquet/data-200904.parquet
2025-09-08T23:55:35.633575925+00:00 operation=Get duration=0.240542s size=8 range: bytes=232847298-232847305 path=nyc_taxi_rides/data/tripdata_parquet/data-200905.parquet
2025-09-08T23:55:35.638022172+00:00 operation=Get duration=0.237329s size=8 range: bytes=235166917-235166924 path=nyc_taxi_rides/data/tripdata_parquet/data-201001.parquet
2025-09-08T23:55:35.634262563+00:00 operation=Get duration=0.244533s size=8 range: bytes=224226567-224226574 path=nyc_taxi_rides/data/tripdata_parquet/data-200906.parquet

. . . Truncated for brevity . . .

2025-09-08T23:55:36.814991494+00:00 operation=Get duration=0.073694s size=19872 range: bytes=214807880-214827751 path=nyc_taxi_rides/data/tripdata_parquet/data-201603.parquet
2025-09-08T23:55:36.774456617+00:00 operation=Get duration=0.124367s size=15508 range: bytes=158722835-158738342 path=nyc_taxi_rides/data/tripdata_parquet/data-201612.parquet
2025-09-08T23:55:36.837998603+00:00 operation=Get duration=0.064300s size=18219 range: bytes=200688011-200706229 path=nyc_taxi_rides/data/tripdata_parquet/data-201602.parquet
List Summary:
  count: 1

Get Summary:
  count: 288
  duration min: 0.057396s
  duration max: 0.357809s
  duration avg: 0.108023s
  size min: 8 B
  size max: 44247 B
  size avg: 18870 B
  size sum: 5434702 B

>

alamb · 2025-09-09T14:01:31Z

@alamb I've found the bug and fixed this behavior.

Thank you!

@BlakeOrth I am very interested in getting this PR merged in as a precursor to some other work we are doing. Would you be willing to accept patches / commits to this branch (I could add some tests, for example)?

BlakeOrth · 2025-09-09T16:12:26Z

Would you be willing to accept patches / commits to this branch (I could add some tests, for example)?

@alamb Please do! My interest lies in keeping the code moving forward, not hoarding tasks!

alamb · 2025-09-12T14:18:43Z

I have not forgotten about this PR -- I am just busy with other projects now. I will come back to this soon (TM)

BlakeOrth · 2025-10-02T16:15:20Z

@alamb It's not entirely clear to me how we should proceed to keep this effort moving. There's been some discussion of using this PR and polishing it up with tests/docs etc vs splitting it up into several smaller PRs. I'm happy to help facilitate either approach. Do you have an idea on how you'd like to move forward?

BlakeOrth · 2025-10-02T17:26:06Z

@alamb OK, great. I completely understand the desire to have smaller reviews. Would you prefer to reduce the scope of this PR for the first iteration to better preserve the discussion/progression and how it's associated with the work, or should I close this PR and a new one can be opened with reduced scope?

alamb · 2025-10-02T21:12:48Z

How about we close this one and start making smaller PRs to build up to it

alamb · 2025-10-02T21:12:55Z

Thank you for your patience @BlakeOrth

BlakeOrth · 2025-10-02T22:06:54Z

@alamb Perfect, I will consider this POC complete at this point and will carve out some time to submit an initial, smaller PR to get us started. As I noted in #17214 (comment) some other efforts that are high priority for me need to be addressed first. My hope is those will be resolved within a week or so, at which point I will be able to get back to these changes with a more concerted effort. I will make sure to tag you once I get the first PR ready and submitted!

alamb · 2025-10-03T17:17:36Z

Thank you @BlakeOrth 🙏

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - #17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - #17266 ## Rationale for this change Continued progress filling out the methods that are instrumented for the instrumented object store. ## What changes are included in this PR? - Adds instrumentation around basic list operations into the instrumented object store - Adds test cases for new code ## Are these changes tested? Yes. Example output: ```sql DataFusion CLI v50.2.0 > \object_store_profiling trace ObjectStore Profile mode set to Trace > CREATE EXTERNAL TABLE nyc_taxi_rides STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet'; 0 row(s) fetched. Elapsed 2.679 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.512970085+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet Summaries: List count: 1 Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.929709943+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.106757629+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.220555058+00:00 operation=Get duration=0.230604s size=8 range: bytes=222192975-222192982 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet 2025-10-16T18:53:10.226399832+00:00 operation=Get duration=0.263826s size=8 range: bytes=233123927-233123934 path=nyc_taxi_rides/data/tripdata_parquet/data-201104.parquet 2025-10-16T18:53:10.226194195+00:00 operation=Get duration=0.269754s size=8 range: bytes=252843253-252843260 path=nyc_taxi_rides/data/tripdata_parquet/data-201103.parquet . . . 2025-10-16T18:53:11.928787014+00:00 operation=Get duration=0.072248s size=18278 range: bytes=201384109-201402386 path=nyc_taxi_rides/data/tripdata_parquet/data-201509.parquet 2025-10-16T18:53:11.933475464+00:00 operation=Get duration=0.068880s size=17175 range: bytes=195411804-195428978 path=nyc_taxi_rides/data/tripdata_parquet/data-201601.parquet 2025-10-16T18:53:11.949629591+00:00 operation=Get duration=0.065645s size=19872 range: bytes=214807880-214827751 path=nyc_taxi_rides/data/tripdata_parquet/data-201603.parquet Summaries: List count: 2 Get count: 288 duration min: 0.060930s duration max: 0.444601s duration avg: 0.133339s size min: 8 B size max: 44247 B size avg: 18870 B size sum: 5434702 B > ``` ## Are there any user-facing changes? No-ish ## cc @alamb

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - #17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - #17266 ## Rationale for this change Continued progress filling out methods that are instrumented by the instrumented object store ## What changes are included in this PR? - Adds instrumentation around delimited list operations into the instrumented object store - Adds test cases for the new code ## Are these changes tested? Yes, unit tests have been added. Example output: ```sql DataFusion CLI v50.2.0 > CREATE EXTERNAL TABLE overture_partitioned STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/theme=addresses/'; 0 row(s) fetched. Elapsed 2.307 seconds. > \object_store_profiling trace ObjectStore Profile mode set to Trace > select count(*) from overture_partitioned; +-----------+ | count(*) | +-----------+ | 446544475 | +-----------+ 1 row(s) fetched. Elapsed 1.932 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(overturemaps-us-west-2) 2025-10-17T17:05:27.922724180+00:00 operation=List duration=0.132154s path=release/2025-09-24.0/theme=addresses 2025-10-17T17:05:28.054894440+00:00 operation=List duration=0.049048s path=release/2025-09-24.0/theme=addresses/type=address 2025-10-17T17:05:28.104233937+00:00 operation=Get duration=0.053522s size=8 range: bytes=1070778162-1070778169 path=release/2025-09-24.0/theme=addresses/type=address/part-00000-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet 2025-10-17T17:05:28.106862343+00:00 operation=Get duration=0.108103s size=8 range: bytes=1017940335-1017940342 path=release/2025-09-24.0/theme=addresses/type=address/part-00003-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet ... 2025-10-17T17:05:28.589084204+00:00 operation=Get duration=0.084737s size=836971 range: bytes=1112791717-1113628687 path=release/2025-09-24.0/theme=addresses/type=address/part-00009-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet Summaries: List count: 2 duration min: 0.049048s duration max: 0.132154s duration avg: 0.090601s Get count: 33 duration min: 0.045500s duration max: 0.162114s duration avg: 0.089775s size min: 8 B size max: 917946 B size avg: 336000 B size sum: 11088026 B > ``` Note that a `LIST` report showing a duration must be a `list_with_delimiter()` call because a standard `list` call does not currently report a duration. ## Are there any user-facing changes? No-ish cc @alamb

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - #17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - #17266 ## Rationale for this change Further fills out the missing methods that have yet to be instrumented in the instrumented object store. ## What changes are included in this PR? - Adds instrumentation around put_opts - Adds instrumentation around put_multipart - Adds tests for newly instrumented methods ## Are these changes tested? Yes. Unit tests have been added for the new methods Example output: ```sql DataFusion CLI v50.2.0 > CREATE EXTERNAL TABLE test(a bigint, b bigint) STORED AS parquet LOCATION '../../test_table/'; 0 row(s) fetched. Elapsed 0.003 seconds. > \object_store_profiling trace ObjectStore Profile mode set to Trace > INSERT INTO test values (1, 2), (3, 4); +-------+ | count | +-------+ | 2 | +-------+ 1 row(s) fetched. Elapsed 0.007 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: LocalFileSystem(file:///) 2025-10-17T19:02:15.440246215+00:00 operation=List path=home/blake/open_source_src/datafusion-BlakeOrth/test_table 2025-10-17T19:02:15.444096012+00:00 operation=Put duration=0.000249s size=815 path=home/blake/open_source_src/datafusion-BlakeOrth/test_table/a9pjKBxSOtXZobJO_0.parquet Summaries: List count: 1 Put count: 1 duration min: 0.000249s duration max: 0.000249s duration avg: 0.000249s size min: 815 B size max: 815 B size avg: 815 B size sum: 815 B > ``` (note: I have no idea how to exercise/show a multi-part put operation, or if DataFusion even utilizes multipart puts for large files) ## Are there any user-facing changes? No-ish cc @alamb --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - #17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - #17266 ## Rationale for this change Further fills out method instrumentation ## What changes are included in this PR? - Adds instrumentation to head requests in the instrumented object store - Adds instrumentatin to delete requests in the instrumented object store - Adds tests for new code ## Are these changes tested? Yes. New unit tests have been added. ## Are there any user-facing changes? No-ish ## cc @alamb

* Refactor: split test_window_partial_constant_and_set_monotonicity into multiple tests (#17952) * fix: Ensure ListingTable partitions are pruned when filters are not used (#17958) * fix: Prune partitions when no filters are defined * fix: Formatting * chore: Cargo fmt * chore: Clippy * Push Down Filter Subexpressions in Nested Loop Joins as Projections (#17906) * Check-in NestedLoopJoinProjectionPushDown * Update Cargo.lock * Add some comments * Update slts that are affected by the nl-join-projection-push-down * please lints * Move code into projection_pushdown.rs * Remove explicit coalesce batches * Docs * feat: support Spark `concat` string function (#18063) * chore: Extend backtrace coverage * fmt * part2 * feedback * clippy * feat: support Spark `concat` * clippy * comments * test * doc * Add independent configs for topk/join dynamic filter (#18090) * Add independent configs for topk/join dynamic filter * fix ci * update doc * fix typo * Adds Trace and Summary to CLI instrumented stores (#18064) - Adds the ability for a user to choose a summary only output for an instrumented object store when using the CLI - The existing "enabled" setting that displays both a summary and a detailed usage for each object store call has been renamed to `Trace` to improve clarity - Adds additional test cases for summary only and modifies existing tests to use trace - Updates user guide docs to reflect the CLI flag and command line changes * fix: Improve null handling in array_to_string function (#18076) * fix: Improve null handling in array_to_string function * chore * feat: update .asf.yaml configuration settings (#18027) * Fix extended tests on main to get CI green (#18096) ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/18084 ## Rationale for this change Some of the extended tests are failing because we have fixed case conditional evaluation and queries that (incorrectly) previously did not pass are now.  ## What changes are included in this PR? Update datafusion-testing pin ## Are these changes tested? I tested locally with: ```shell INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests ``` ## Are there any user-facing changes? No * chore(deps): bump taiki-e/install-action from 2.62.29 to 2.62.31 (#18094) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.29 to 2.62.31. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p> <blockquote> <h2>2.62.31</h2> <ul> <li> <p>Update <code>protoc@latest</code> to 3.33.0.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.9.3.</p> </li> <li> <p>Update <code>syft@latest</code> to 1.34.1.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.9.</p> </li> <li> <p>Update <code>cargo-shear@latest</code> to 1.6.0.</p> </li> </ul> <h2>2.62.30</h2> <ul> <li> <p>Update <code>vacuum@latest</code> to 0.18.6.</p> </li> <li> <p>Update <code>zizmor@latest</code> to 1.15.2.</p> </li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p> <blockquote> <h1>Changelog</h1> <p>All notable changes to this project will be documented in this file.</p> <p>This project adheres to <a href="https://semver.org">Semantic Versioning</a>.</p>  <h2>[Unreleased]</h2> <h2>[2.62.31] - 2025-10-16</h2> <ul> <li> <p>Update <code>protoc@latest</code> to 3.33.0.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.9.3.</p> </li> <li> <p>Update <code>syft@latest</code> to 1.34.1.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.9.</p> </li> <li> <p>Update <code>cargo-shear@latest</code> to 1.6.0.</p> </li> </ul> <h2>[2.62.30] - 2025-10-15</h2> <ul> <li> <p>Update <code>vacuum@latest</code> to 0.18.6.</p> </li> <li> <p>Update <code>zizmor@latest</code> to 1.15.2.</p> </li> </ul> <h2>[2.62.29] - 2025-10-14</h2> <ul> <li> <p>Update <code>zizmor@latest</code> to 1.15.1.</p> </li> <li> <p>Update <code>cargo-nextest@latest</code> to 0.9.106.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.8.</p> </li> <li> <p>Update <code>ubi@latest</code> to 0.8.1.</p> </li> </ul> <h2>[2.62.28] - 2025-10-11</h2> <ul> <li> <p>Update <code>release-plz@latest</code> to 0.3.148.</p> </li> <li> <p>Update <code>cargo-sort@latest</code> to 2.0.2.</p> </li> <li> <p>Update <code>cargo-binstall@latest</code> to 1.15.7.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.9.2.</p> </li> </ul>  </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="/taiki-e/install-action/commit/0005e0116e92d8489d8d96fbff83f061c79ba95a"><code>0005e01</code></a> Release 2.62.31</li> <li><a href="/taiki-e/install-action/commit/6936d999d90424ed013e4f325d91e14d7ddba27f"><code>6936d99</code></a> Update <code>protoc@latest</code> to 3.33.0</li> <li><a href="/taiki-e/install-action/commit/ac7ad6efa1b1bb919bcaa357eb1873f328ee07f7"><code>ac7ad6e</code></a> Update <code>uv@latest</code> to 0.9.3</li> <li><a href="/taiki-e/install-action/commit/005833aaf18c1621513995406c3bc0397747afc2"><code>005833a</code></a> Update <code>syft@latest</code> to 1.34.1</li> <li><a href="/taiki-e/install-action/commit/2b32ff6f3dc99bc9fa6647cbc9f7da71cf979b65"><code>2b32ff6</code></a> Update <code>mise@latest</code> to 2025.10.9</li> <li><a href="/taiki-e/install-action/commit/74c0274864f156f487aee04623a20b315fb2125a"><code>74c0274</code></a> Update <code>cargo-shear@latest</code> to 1.6.0</li> <li><a href="/taiki-e/install-action/commit/f13d8e15c52b25c79b608d399cc802adc73d83da"><code>f13d8e1</code></a> Release 2.62.30</li> <li><a href="/taiki-e/install-action/commit/1034dc55996706645239db97d3ea04f42a708f22"><code>1034dc5</code></a> Update <code>vacuum@latest</code> to 0.18.6</li> <li><a href="/taiki-e/install-action/commit/55b5d509b8761e9696e1cfec0d6f66f0655e8fff"><code>55b5d50</code></a> Update <code>zizmor@latest</code> to 1.15.2</li> <li>See full diff in <a href="/taiki-e/install-action/compare/5b5de1b4da26ad411330c0454bdd72929bfcbeb2...0005e0116e92d8489d8d96fbff83f061c79ba95a">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=2.62.29&new-version=2.62.31)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: run extended suite on PRs for critical areas (#18088) ## Which issue does this PR close?  - Closes #. Related to https://github.com/apache/datafusion/issues/18084 ## Rationale for this change Run extended suite on PRs for critical areas, to avoid post merge bugfixing  ## What changes are included in this PR?  ## Are these changes tested?  ## Are there any user-facing changes?   --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: add dialect enum (#18043) ## Which issue does this PR close? - Closes #18042 ## Rationale for this change This PR introduces a new dialect enum to improve type safety and code maintainability when handling different SQL dialects in DataFusion 1. Provide compile-time guarantees for dialect handling 2. Improve code readability and self-documentation 3. Enable better IDE support and autocomplete ## What changes are included in this PR? - Added a new `Dialect` enum to represent supported SQL dialects - Refactored existing code to use the new enum instead of previous representations - Modified tests to work with the new enum-based approach ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, this is an API change: the type of the `dialect` field changed from `String` to `Dialect` * #17982 Make `nvl` a thin wrapper for `coalesce` (#17991) ## Which issue does this PR close? - Closes #17982 ## Rationale for this change By making `NVLFunc` a wrapper for `CoalesceFunc` with a more restrictive signature the implementation automatically benefits from any optimisation work related to `coalesce`. ## What changes are included in this PR? - Make `NVLFunc` a thin wrapper of `CoalesceFunc`. This seemed like the simplest way to reuse the coalesce logic, but keep the stricter signature of `nvl`. - Add `ScalarUDF::conditional_arguments` as a more precise complement to `ScalarUDF::short_circuits`. By letting each function expose which arguments are eager and which are lazy, we provide more precise information to the optimizer which may enable better optimisation. ## Are these changes tested? Assumed to be covered by sql logic tests. Unit tests for the custom implementation were removed since those are no longer relevant. ## Are there any user-facing changes? The rewriting of `nvl` to `case when ... then ... else ... end` is visible in the physical query plan. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * minor: fix incorrect deprecation version & window docs (#18093) * chore: use `NullBuffer::union` for Spark `concat` (#18087) ## Which issue does this PR close?  - Closes #. Followup on https://github.com/apache/datafusion/pull/18063#pullrequestreview-3341818221 ## Rationale for this change Use cheaper `NullBuffer::union` to apply null mask instead of iterator approach  ## What changes are included in this PR?  ## Are these changes tested?  ## Are there any user-facing changes?   * feat: support `null_treatment`, `distinct`, and `filter` for window functions in proto (#18024) ## Which issue does this PR close?  - Closes #17417. ## Rationale for this change  - Support `null_treatment`, `distinct`, and `filter` for window function in proto. - Support `null_treatment` for aggregate udf in proto. ## What changes are included in this PR?  - [x] Add `null_treatment`, `distinct`, `filter` fields to `WindowExprNode` message and handle them in `to/from_proto.rs`. - [x] Add `null_treatment` field to `AggregateUDFExprNode` message and handle them in `to/from_proto.rs`. - [ ] Docs update: I'm not sure where to add docs as declared in the issue description. ## Are these changes tested?  - Add tests to `roundtrip_window` for respectnulls, ignorenulls, distinct, filter. - Add tests to `roundtrip_aggregate_udf` for respectnulls, ignorenulls. ## Are there any user-facing changes?   N/A --------- Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * feat: Add percentile_cont aggregate function (#17988) ## Summary Adds exact `percentile_cont` aggregate function as the counterpart to the existing `approx_percentile_cont` function. ## What changes were made? ### New Implementation - Created `percentile_cont.rs` with full implementation - `PercentileCont` struct implementing `AggregateUDFImpl` - `PercentileContAccumulator` for standard aggregation - `DistinctPercentileContAccumulator` for DISTINCT mode - `PercentileContGroupsAccumulator` for efficient grouped aggregation - `calculate_percentile` function with linear interpolation ### Features - **Exact calculation**: Stores all values in memory for precise results - **WITHIN GROUP syntax**: Supports `WITHIN GROUP (ORDER BY ...)` - **Interpolation**: Uses linear interpolation between values - **All numeric types**: Works with integers, floats, and decimals - **Ordered-set aggregate**: Properly marked as `is_ordered_set_aggregate()` - **GROUP BY support**: Efficient grouped aggregation via GroupsAccumulator ### Tests Added comprehensive tests in `aggregate.slt`: - Error conditions validation - Basic percentile calculations (0.0, 0.25, 0.5, 0.75, 1.0) - Comparison with `median` function - Ascending and descending order - GROUP BY aggregation - NULL handling - Edge cases (empty sets, single values) - Float interpolation - Various numeric data types ## Example Usage ```sql -- Basic usage with WITHIN GROUP syntax SELECT percentile_cont(0.75) WITHIN GROUP (ORDER BY column_name) FROM table_name; -- With GROUP BY SELECT category, percentile_cont(0.95) WITHIN GROUP (ORDER BY value) FROM sales GROUP BY category; -- Compare with median (percentile_cont(0.5) == median) SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY price) FROM products; ``` ## Performance Considerations Like `median`, this function stores all values in memory before computing results. For large datasets or when approximation is acceptable, use `approx_percentile_cont` instead. ## Related Issues Closes #6714 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com> * fix: Re-bump latest datafusion-testing module so extended tests succeed (#18110) Looks like #17988 accidentally reverted the bump from #18096 * chore(deps): bump taiki-e/install-action from 2.62.31 to 2.62.33 (#18113) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.31 to 2.62.33. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p> <blockquote> <h2>2.62.33</h2> <ul> <li>Update <code>mise@latest</code> to 2025.10.10.</li> </ul> <h2>2.62.32</h2> <ul> <li> <p>Update <code>syft@latest</code> to 1.34.2.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.18.7.</p> </li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p> <blockquote> <h1>Changelog</h1> <p>All notable changes to this project will be documented in this file.</p> <p>This project adheres to <a href="https://semver.org">Semantic Versioning</a>.</p>  <h2>[Unreleased]</h2> <h2>[2.62.33] - 2025-10-17</h2> <ul> <li>Update <code>mise@latest</code> to 2025.10.10.</li> </ul> <h2>[2.62.32] - 2025-10-16</h2> <ul> <li> <p>Update <code>syft@latest</code> to 1.34.2.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.18.7.</p> </li> </ul> <h2>[2.62.31] - 2025-10-16</h2> <ul> <li> <p>Update <code>protoc@latest</code> to 3.33.0.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.9.3.</p> </li> <li> <p>Update <code>syft@latest</code> to 1.34.1.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.9.</p> </li> <li> <p>Update <code>cargo-shear@latest</code> to 1.6.0.</p> </li> </ul> <h2>[2.62.30] - 2025-10-15</h2> <ul> <li> <p>Update <code>vacuum@latest</code> to 0.18.6.</p> </li> <li> <p>Update <code>zizmor@latest</code> to 1.15.2.</p> </li> </ul> <h2>[2.62.29] - 2025-10-14</h2> <ul> <li> <p>Update <code>zizmor@latest</code> to 1.15.1.</p> </li> <li> <p>Update <code>cargo-nextest@latest</code> to 0.9.106.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.10.8.</p> </li> <li> <p>Update <code>ubi@latest</code> to 0.8.1.</p> </li> </ul>  </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="/taiki-e/install-action/commit/e43a5023a747770bfcb71ae048541a681714b951"><code>e43a502</code></a> Release 2.62.33</li> <li><a href="/taiki-e/install-action/commit/2ae4258c3daeaf460c202b95aa4272c1f594d78e"><code>2ae4258</code></a> Update <code>mise@latest</code> to 2025.10.10</li> <li><a href="/taiki-e/install-action/commit/e79914c740f0acf092c59adfa2a61d3d2266b6bf"><code>e79914c</code></a> Release 2.62.32</li> <li><a href="/taiki-e/install-action/commit/40168eab5f259c94f094865825dbdefd1cf31bbf"><code>40168ea</code></a> Update <code>syft@latest</code> to 1.34.2</li> <li><a href="/taiki-e/install-action/commit/6d89b16c494331f0cdbca002e68ea5ab4fa8e3f6"><code>6d89b16</code></a> Update <code>vacuum@latest</code> to 0.18.7</li> <li>See full diff in <a href="/taiki-e/install-action/compare/0005e0116e92d8489d8d96fbff83f061c79ba95a...e43a5023a747770bfcb71ae048541a681714b951">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=2.62.31&new-version=2.62.33)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Adding hiop as known user (#18114) ## Which issue does this PR close?  - Doesn't close an issue. ## Rationale for this change  Hi we are hiop, a Serverless Data Logistic Platform. We use DataFusion as a core part of our backend engine, and it plays a crucial role in our data infrastructure. Our team members are passionate about the project and actively try contribute to its development (@dariocurr). We’d love to have Hiop listed among the Known Users to show our support and help the DataFusion community continue to grow. ## What changes are included in this PR?  Just adding hiop as known user ## Are these changes tested?  ## Are there any user-facing changes?   * chore: remove unnecessary `skip_failed_rules` config in slt (#18117) ## Which issue does this PR close?  - Closes #3695 - Closes #3797 ## Rationale for this change  Was looking at above issues and I don't believe we skip the failed rules for any tests anymore (default for the config is also `false`), apart from this cleanup, so filing this PR so we can close the issues. Seems we only do in this `window.slt` test after this fix: https://github.com/apache/datafusion/blob/621a24978a7a9c6d2b27973d1853dbc8776a56b5/datafusion/sqllogictest/test_files/window.slt#L2587-L2611 Which seems intentional. ## What changes are included in this PR?  Remove unnecessary `skip_failed_rules` config. ## Are these changes tested?  Existing tests. ## Are there any user-facing changes?  No.  * move repartition to insta (#18106) Related https://github.com/apache/datafusion/pull/16324 https://github.com/apache/datafusion/pull/16617 almost there! * refactor: move ListingTable over to the catalog-listing-table crate (#18080) ## Which issue does this PR close? - This addresses part of https://github.com/apache/datafusion/issues/17713 - Closes https://github.com/apache/datafusion/issues/14462 ## Rationale for this change In order to remove the `datafusion` core crate from `proto` as a dependency, we need to access `ListingTable` but it is within the `core` crate. There already exists a `datafusion-catalog-listing` which is bare and appears to be the place this should exist. ## What changes are included in this PR? Move `ListingTable` and some of its dependent structs over to the `datafusion-catalog-listing` crate. There is one dependency I wasn't able to remove from the `core` crate, which is inferring the listing table configuration options. That is because within this method it downcasts `Session` to `SessionState`. If a downstream user ever attempts to implement `Session` themselves, these methods also would not work. Because it would cause a circular dependency, we cannot also lift the method we need out of `SessionState` to `Session`. Instead I took the approach of splitting off the two methods that require `SessionState` as an extension trait for the listing table config. From the git diff this appears to be a large change (+1637/-1519) however the *vast* majority of that is copying the code from one file into another. I have added a comment on the significant change. ## Are these changes tested? Existing unit tests show no regression. This is just a code refactor. ## Are there any user-facing changes? Users may need to update their use paths. * refactor: move arrow datasource to new `datafusion-datasource-arrow` crate (#18082) ## Which issue does this PR close? - This addresses part of https://github.com/apache/datafusion/issues/17713 but it does not close it. ## Rationale for this change In order to remove `core` from `proto` crate, we need `ArrowFormat` to be available. Similar to the other datasource types (csv, avro, json, parquet) this splits the Arrow IPC file format into its own crate. ## What changes are included in this PR? This is a straight refactor. Code is merely moved around. The size of the diff is the additional files that are required (cargo.toml, readme.md, etc) ## Are these changes tested? Existing unit tests. ## Are there any user-facing changes? Users that include `ArrowSource` may need to update their include paths. For most, the reexports will cover this need. * Adds instrumentation to LIST operations in CLI (#18103) ## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - https://github.com/apache/datafusion/issues/17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - https://github.com/apache/datafusion/pull/17266 ## Rationale for this change Continued progress filling out the methods that are instrumented for the instrumented object store. ## What changes are included in this PR? - Adds instrumentation around basic list operations into the instrumented object store - Adds test cases for new code ## Are these changes tested? Yes. Example output: ```sql DataFusion CLI v50.2.0 > \object_store_profiling trace ObjectStore Profile mode set to Trace > CREATE EXTERNAL TABLE nyc_taxi_rides STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet'; 0 row(s) fetched. Elapsed 2.679 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.512970085+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet Summaries: List count: 1 Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.929709943+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.106757629+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.220555058+00:00 operation=Get duration=0.230604s size=8 range: bytes=222192975-222192982 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet 2025-10-16T18:53:10.226399832+00:00 operation=Get duration=0.263826s size=8 range: bytes=233123927-233123934 path=nyc_taxi_rides/data/tripdata_parquet/data-201104.parquet 2025-10-16T18:53:10.226194195+00:00 operation=Get duration=0.269754s size=8 range: bytes=252843253-252843260 path=nyc_taxi_rides/data/tripdata_parquet/data-201103.parquet . . . 2025-10-16T18:53:11.928787014+00:00 operation=Get duration=0.072248s size=18278 range: bytes=201384109-201402386 path=nyc_taxi_rides/data/tripdata_parquet/data-201509.parquet 2025-10-16T18:53:11.933475464+00:00 operation=Get duration=0.068880s size=17175 range: bytes=195411804-195428978 path=nyc_taxi_rides/data/tripdata_parquet/data-201601.parquet 2025-10-16T18:53:11.949629591+00:00 operation=Get duration=0.065645s size=19872 range: bytes=214807880-214827751 path=nyc_taxi_rides/data/tripdata_parquet/data-201603.parquet Summaries: List count: 2 Get count: 288 duration min: 0.060930s duration max: 0.444601s duration avg: 0.133339s size min: 8 B size max: 44247 B size avg: 18870 B size sum: 5434702 B > ``` ## Are there any user-facing changes? No-ish ## cc @alamb * feat: spark udf array shuffle (#17674) ## Which issue does this PR close? ## Rationale for this change support shuffle udf ## What changes are included in this PR? support shuffle udf ## Are these changes tested? UT ## Are there any user-facing changes? No * make Union::try_new pub (#18125) ## Which issue does this PR close? - Closes #18126. ## Rationale for this change It's a useful constructor for users manipulating logical plans where they know the schemas will match exactly. We already expose other constructors for Union and constructors for logical plans. ## What changes are included in this PR? Makes `Union::try_new` a public function. ## Are these changes tested? Seems unnecessary. ## Are there any user-facing changes? The function is now public. Not a breaking change, but going forward changes to it would breaking changes to users of the logical plan API. * fix: window unparsing (#17367) ## Which issue does this PR close? - Closes #17360. ## Rationale for this change in LogicalPlan::Filter unparsing, if there's a window expr, it should be converted to quailify. postgres must has an alias for derived table. otherwise it will complain: ``` ERROR: subquery in FROM must have an alias. ``` fixed this issue at the same time. ## What changes are included in this PR? If window expr is found, convert filter to quailify. ## Are these changes tested? UT ## Are there any user-facing changes? No --------- Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * feat: Support configurable `EXPLAIN ANALYZE` detail level (#18098) ## Which issue does this PR close?  - Closes #. ## Rationale for this change  `EXPLAIN ANALYZE` can be used for profiling and displays the results alongside the EXPLAIN plan. The issue is that it currently shows too many low-level details. It would provide a better user experience if only the most commonly used metrics were shown by default, with more detailed metrics available through specific configuration options. ### Example In `datafusion-cli`: ``` > CREATE EXTERNAL TABLE IF NOT EXISTS lineitem STORED AS parquet LOCATION '/Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem'; 0 row(s) fetched. Elapsed 0.000 seconds. explain analyze select * from lineitem where l_orderkey = 3000000; ``` The parquet reader includes a large number of low-level details: ``` metrics=[output_rows=19813, elapsed_compute=14ns, batches_split=0, bytes_scanned=2147308, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=18, num_predicate_creation_errors=0, page_index_rows_matched=19813, page_index_rows_pruned=729088, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=21.997µs, metadata_load_time=273.83µs, page_index_eval_time=29.915µs, row_pushdown_eval_time=42ns, statistics_eval_time=76.248µs, time_elapsed_opening=4.02146ms, time_elapsed_processing=24.787461ms, time_elapsed_scanning_total=24.17671ms, time_elapsed_scanning_until_data=23.103665ms] ``` I believe only a subset of it is commonly used, for example `output_rows`, `metadata_load_time`, and how many file/row-group/pages are pruned, and it would better to only display the most common ones by default. ### Existing `VERBOSE` keyword There is a existing verbose keyword in `EXPLAIN ANALYZE VERBOSE`, however it's turning on per-partition metrics instead of controlling detail level. I think it would be hard to mix this partition control and the detail level introduced in this PR, so they're separated: the following config will be used for detail level and the semantics of `EXPLAIN ANALYZE VERBOSE` keep unchanged. ### This PR: configurable explain analyze level 1. Introduced a new config option `datafusion.explain.analyze_level`. When set to `dev` (default value), all existing metrics will be shown. If set to `summary`, only `BaselineMetrics` will be displayed (i.e. `output_rows` and `elapsed_compute`). Note now we only include `BaselineMetrics` for simplicity, in the follow-up PRs we can figure out what's the commonly used metrics for each operator, and add them to `summary` analyze level, finally set the `summary` analyze level to default. 2. Add a `MetricType` field associated with `Metric` for detail level or potentially category in the future. For different configurations, a certain `MetricType` set will be shown accordingly. #### Demo ``` -- continuing the above example > set datafusion.explain.analyze_level = summary; 0 row(s) fetched. Elapsed 0.000 seconds. > explain analyze select * from lineitem where l_orderkey = 3000000; +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=5, elapsed_compute=25.339µs] | | | FilterExec: l_orderkey@0 = 3000000, metrics=[output_rows=5, elapsed_compute=81.221µs] | | | DataSourceExec: file_groups={14 groups: [[Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:0..11525426], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:11525426..20311205, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:0..2739647], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:2739647..14265073], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:14265073..20193593, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:0..5596906], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:5596906..17122332], ...]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], file_type=parquet, predicate=l_orderkey@0 = 3000000, pruning_predicate=l_orderkey_null_count@2 != row_count@3 AND l_orderkey_min@0 <= 3000000 AND 3000000 <= l_orderkey_max@1, required_guarantees=[l_orderkey in (3000000)], metrics=[output_rows=19813, elapsed_compute=14ns] | | | | +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row(s) fetched. Elapsed 0.025 seconds. ``` Only `BaselineMetrics` are shown. ## What changes are included in this PR?  ## Are these changes tested?  UT ## Are there any user-facing changes? No   --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: remove unused `type_coercion/aggregate.rs` functions (#18091) ## Which issue does this PR close?  N/A ## Rationale for this change  There's a few functions in `datafusion/expr-common/src/type_coercion/aggregates.rs` that are unused elsewhere in the codebase, likely a remnant before the refactor to UDF, so removing them. Some are still used (`coerce_avg_type()` and `avg_return_type()`) so these are inlined into the Avg aggregate function (similar to Sum). Also refactor some window functions to use already available macros. ## What changes are included in this PR?  - Remove some unused functions - Inline avg coerce & return type logic - Refactor Spark Avg a bit to remove unnecessary code - Refactor ntile & nth window functions to use available macros ## Are these changes tested?  Existing tests. ## Are there any user-facing changes? Yes as these functions were publicly exported; however I'm not sure they were meant to be used by users anyway, given what they do.   * Add extra case_when benchmarks (#18097) ## Which issue does this PR close? None ## Rationale for this change More microbenchmarks make it easier to asses the performance impact of `CaseExpr` implementation changes. ## What changes are included in this PR? Add microbenchmarks for `case` expressions that are a bit more representative for real world queries. ## Are these changes tested? n/a ## Are there any user-facing changes? no * fix: Add dictionary coercion support for numeric comparison operations (#18099) ## Which issue does this PR close?  Fixes comparison errors when using dictionary-encoded types with comparison functions like NULLIF. ## Rationale for this change When using dictionary-encoded columns (e.g., Dictionary(Int32, Utf8)) in comparison operations with literals or other types, DataFusion would throw an error stating the types are not comparable. This was particularly problematic for functions like NULLIF which rely on comparison coercion. The issue was that comparison_coercion_numeric didn't handle dictionary types, even though the general comparison_coercion function did have dictionary support.  ## What changes are included in this PR? 1. Refactored dictionary comparison logic: Extracted common dictionary coercion logic into dictionary_comparison_coercion_generic to avoid code duplication. 2. Added numeric-specific dictionary coercion: Introduced dictionary_comparison_coercion_numeric that uses numeric-preferring comparison rules when dealing with dictionary value types. 3. Updated comparison_coercion_numeric: Added a call to dictionary_comparison_coercion_numeric in the coercion chain to properly handle dictionary types. 4. Added sqllogictest cases demonstrating the fix works for various dictionary comparison scenarios.  ## Are these changes tested? Yes, added tests in datafusion/sqllogictest/test_files/nullif.slt covering: - Dictionary type compared with string literal - String compared with dictionary type - Dictionary compared with dictionary All tests pass with the fix and would fail without it.  ## Are there any user-facing changes? This is a bug fix that enables previously failing queries to work correctly. No breaking changes or API modifications.   * Adds instrumentation to delimited LIST operations in CLI (#18134) ## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - https://github.com/apache/datafusion/issues/17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - https://github.com/apache/datafusion/pull/17266 ## Rationale for this change Continued progress filling out methods that are instrumented by the instrumented object store ## What changes are included in this PR? - Adds instrumentation around delimited list operations into the instrumented object store - Adds test cases for the new code ## Are these changes tested? Yes, unit tests have been added. Example output: ```sql DataFusion CLI v50.2.0 > CREATE EXTERNAL TABLE overture_partitioned STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/theme=addresses/'; 0 row(s) fetched. Elapsed 2.307 seconds. > \object_store_profiling trace ObjectStore Profile mode set to Trace > select count(*) from overture_partitioned; +-----------+ | count(*) | +-----------+ | 446544475 | +-----------+ 1 row(s) fetched. Elapsed 1.932 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(overturemaps-us-west-2) 2025-10-17T17:05:27.922724180+00:00 operation=List duration=0.132154s path=release/2025-09-24.0/theme=addresses 2025-10-17T17:05:28.054894440+00:00 operation=List duration=0.049048s path=release/2025-09-24.0/theme=addresses/type=address 2025-10-17T17:05:28.104233937+00:00 operation=Get duration=0.053522s size=8 range: bytes=1070778162-1070778169 path=release/2025-09-24.0/theme=addresses/type=address/part-00000-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet 2025-10-17T17:05:28.106862343+00:00 operation=Get duration=0.108103s size=8 range: bytes=1017940335-1017940342 path=release/2025-09-24.0/theme=addresses/type=address/part-00003-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet ... 2025-10-17T17:05:28.589084204+00:00 operation=Get duration=0.084737s size=836971 range: bytes=1112791717-1113628687 path=release/2025-09-24.0/theme=addresses/type=address/part-00009-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet Summaries: List count: 2 duration min: 0.049048s duration max: 0.132154s duration avg: 0.090601s Get count: 33 duration min: 0.045500s duration max: 0.162114s duration avg: 0.089775s size min: 8 B size max: 917946 B size avg: 336000 B size sum: 11088026 B > ``` Note that a `LIST` report showing a duration must be a `list_with_delimiter()` call because a standard `list` call does not currently report a duration. ## Are there any user-facing changes? No-ish cc @alamb * feat: add fp16 support to Substrait (#18086) ## Which issue does this PR close? - Closes #16298 ## Rationale for this change Float16 is an Arrow type. Substrait serialization for the type is defined in https://github.com/apache/arrow/blame/main/format/substrait/extension_types.yaml as part of Arrow. We should support it. This picks up where https://github.com/apache/datafusion/pull/16793 leaves off. ## What changes are included in this PR? Support for converting DataType::Float16 to/from Substrait. Support for converting ScalarValue::Float16 to/from Substrait. ## Are these changes tested? Yes ## Are there any user-facing changes? Yes. The `SubstraitProducer` trait received a new method (`register_type`) which downstream implementors will need to provide an implementation for. The example custom producer has been updated with a default implementation. One public method that changed is [`datafusion_substrait::logical_plan::producer::from_empty_relation`](https://docs.rs/datafusion-substrait/50.2.0/datafusion_substrait/logical_plan/producer/fn.from_empty_relation.html). I'm not sure if that is meant to be part of the public API (for one thing, it is undocumented, though maybe this is because it serves an obvious purpose. It also returns a `Rel` which is a pretty internal structure). * fix(substrait): schema errors for Aggregates with no groupings (#17909) ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/16590 ## Rationale for this change When consuming Substrait plans containing aggregates with no groupings, we would see the following error ``` Error: Substrait("Named schema must contain names for all fields") ``` The Substrait plan had one _less_ field than DataFusion expected because DataFusion was adding an extra "__grouping_id" to the output of the Aggregate node. This happens when the https://github.com/apache/datafusion/blob/daeb6597a0c7344735460bb2dce13879fd89d7bd/datafusion/expr/src/logical_plan/plan.rs#L3418 condition is true. A natural followup question to this is "Why are we creating an Aggregate with a single empty GroupingSet for the group by, instead of just leaving the group by entirely?". ## What changes are included in this PR? Instead of setting group_exprs to a vector with a single empty grouping set, let's just leave group_exprs empty entirely. This means that the `is_grouping_set` is not triggered, so the Datafusion schema matches the Substrait schema. ## Are these changes tested? Yes I have added direct tests via example Substrait plans ## Are there any user-facing changes? Substrait plans that were not consumable before are now consumable. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Improve datafusion-cli object store profiling summary display (#18085) ## Which issue does this PR close? - part of https://github.com/apache/datafusion/issues/17207 ## Rationale for this change As suggested by @BlakeOrth in https://github.com/apache/datafusion/pull/18045#issuecomment-3403692516 here is an attempt to improve the output of datafusion object store trace profiling: ## What changes are included in this PR? Update the output format when `\object_store_profiling summary` is set Current format (on main, before this PR): ```sql Summaries: Get count: 2 duration min: 0.024603s duration max: 0.031946s duration avg: 0.028274s size min: 8 B size max: 34322 B size avg: 17165 B size sum: 34330 B ``` New format (after this PR): ```sql DataFusion CLI v50.2.0 > \object_store_profiling summary ObjectStore Profile mode set to Summary > select count(*) from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet'; +----------+ | count(*) | +----------+ | 1000000 | +----------+ 1 row(s) fetched. Elapsed 6.754 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Summary, inner: HttpStore Summaries: +-----------+----------+-----------+-----------+-----------+-----------+-------+ | Operation | Metric | min | max | avg | sum | count | +-----------+----------+-----------+-----------+-----------+-----------+-------+ | Get | duration | 0.031645s | 0.047780s | 0.039713s | 0.079425s | 2 | | Get | size | 8 B | 34322 B | 17165 B | 34330 B | 2 | +-----------+----------+-----------+-----------+-----------+-----------+-------+ ``` ## Are these changes tested? Yes ## Are there any user-facing changes? Nicer datafusion-cli output * test: `to_timestamp(double)` for vectorized input (#18147) ## Which issue does this PR close?  - Closes #16678. ## Rationale for this change  The issue has been fixed in #16639, this PR just adds a testcase for it. ## What changes are included in this PR?  Add a test case for `to_timestamp(double)` with vectorized input. Similar to the one presented in the issue. ## Are these changes tested?  Yes ## Are there any user-facing changes?   No * Fix `concat_elements_utf8view` capacity initialization. (#18003) ## Which issue does this PR close? - Relates to #17857 (See https://github.com/apache/datafusion/issues/17857#issuecomment-3368519097) ## Rationale for this change The capacity calculation replaced with `left.len()` (assuming `left.len()` and `right.len()` are the same). As the `with_capacity` refers to the length of the views (or strings), not to the length of the bytes ## Are these changes tested? The function is already covered by tests. ## Are there any user-facing changes? No * Use < instead of = in case benchmark predicates, use Integers (#18144) ## Which issue does this PR close? - Followup to #18097 ## Rationale for this change The last benchmark was incorrectly essentially indentical to the second to last one. The actual predicate was using `=` instead of `<`. ## What changes are included in this PR? - Adjust the operator in the case predicates to `<` - Adds two additional benchmarks covering `case x when ...` ## Are these changes tested? Verified with debugger. ## Are there …

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - apache#17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - apache#17266 ## Rationale for this change Continued progress filling out the methods that are instrumented for the instrumented object store. ## What changes are included in this PR? - Adds instrumentation around basic list operations into the instrumented object store - Adds test cases for new code ## Are these changes tested? Yes. Example output: ```sql DataFusion CLI v50.2.0 > \object_store_profiling trace ObjectStore Profile mode set to Trace > CREATE EXTERNAL TABLE nyc_taxi_rides STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet'; 0 row(s) fetched. Elapsed 2.679 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.512970085+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet Summaries: List count: 1 Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.929709943+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.106757629+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.220555058+00:00 operation=Get duration=0.230604s size=8 range: bytes=222192975-222192982 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet 2025-10-16T18:53:10.226399832+00:00 operation=Get duration=0.263826s size=8 range: bytes=233123927-233123934 path=nyc_taxi_rides/data/tripdata_parquet/data-201104.parquet 2025-10-16T18:53:10.226194195+00:00 operation=Get duration=0.269754s size=8 range: bytes=252843253-252843260 path=nyc_taxi_rides/data/tripdata_parquet/data-201103.parquet . . . 2025-10-16T18:53:11.928787014+00:00 operation=Get duration=0.072248s size=18278 range: bytes=201384109-201402386 path=nyc_taxi_rides/data/tripdata_parquet/data-201509.parquet 2025-10-16T18:53:11.933475464+00:00 operation=Get duration=0.068880s size=17175 range: bytes=195411804-195428978 path=nyc_taxi_rides/data/tripdata_parquet/data-201601.parquet 2025-10-16T18:53:11.949629591+00:00 operation=Get duration=0.065645s size=19872 range: bytes=214807880-214827751 path=nyc_taxi_rides/data/tripdata_parquet/data-201603.parquet Summaries: List count: 2 Get count: 288 duration min: 0.060930s duration max: 0.444601s duration avg: 0.133339s size min: 8 B size max: 44247 B size avg: 18870 B size sum: 5434702 B > ``` ## Are there any user-facing changes? No-ish ## cc @alamb

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - apache#17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - apache#17266 ## Rationale for this change Continued progress filling out methods that are instrumented by the instrumented object store ## What changes are included in this PR? - Adds instrumentation around delimited list operations into the instrumented object store - Adds test cases for the new code ## Are these changes tested? Yes, unit tests have been added. Example output: ```sql DataFusion CLI v50.2.0 > CREATE EXTERNAL TABLE overture_partitioned STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/theme=addresses/'; 0 row(s) fetched. Elapsed 2.307 seconds. > \object_store_profiling trace ObjectStore Profile mode set to Trace > select count(*) from overture_partitioned; +-----------+ | count(*) | +-----------+ | 446544475 | +-----------+ 1 row(s) fetched. Elapsed 1.932 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(overturemaps-us-west-2) 2025-10-17T17:05:27.922724180+00:00 operation=List duration=0.132154s path=release/2025-09-24.0/theme=addresses 2025-10-17T17:05:28.054894440+00:00 operation=List duration=0.049048s path=release/2025-09-24.0/theme=addresses/type=address 2025-10-17T17:05:28.104233937+00:00 operation=Get duration=0.053522s size=8 range: bytes=1070778162-1070778169 path=release/2025-09-24.0/theme=addresses/type=address/part-00000-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet 2025-10-17T17:05:28.106862343+00:00 operation=Get duration=0.108103s size=8 range: bytes=1017940335-1017940342 path=release/2025-09-24.0/theme=addresses/type=address/part-00003-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet ... 2025-10-17T17:05:28.589084204+00:00 operation=Get duration=0.084737s size=836971 range: bytes=1112791717-1113628687 path=release/2025-09-24.0/theme=addresses/type=address/part-00009-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet Summaries: List count: 2 duration min: 0.049048s duration max: 0.132154s duration avg: 0.090601s Get count: 33 duration min: 0.045500s duration max: 0.162114s duration avg: 0.089775s size min: 8 B size max: 917946 B size avg: 336000 B size sum: 11088026 B > ``` Note that a `LIST` report showing a duration must be a `list_with_delimiter()` call because a standard `list` call does not currently report a duration. ## Are there any user-facing changes? No-ish cc @alamb

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - apache#17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - apache#17266 ## Rationale for this change Further fills out the missing methods that have yet to be instrumented in the instrumented object store. ## What changes are included in this PR? - Adds instrumentation around put_opts - Adds instrumentation around put_multipart - Adds tests for newly instrumented methods ## Are these changes tested? Yes. Unit tests have been added for the new methods Example output: ```sql DataFusion CLI v50.2.0 > CREATE EXTERNAL TABLE test(a bigint, b bigint) STORED AS parquet LOCATION '../../test_table/'; 0 row(s) fetched. Elapsed 0.003 seconds. > \object_store_profiling trace ObjectStore Profile mode set to Trace > INSERT INTO test values (1, 2), (3, 4); +-------+ | count | +-------+ | 2 | +-------+ 1 row(s) fetched. Elapsed 0.007 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: LocalFileSystem(file:///) 2025-10-17T19:02:15.440246215+00:00 operation=List path=home/blake/open_source_src/datafusion-BlakeOrth/test_table 2025-10-17T19:02:15.444096012+00:00 operation=Put duration=0.000249s size=815 path=home/blake/open_source_src/datafusion-BlakeOrth/test_table/a9pjKBxSOtXZobJO_0.parquet Summaries: List count: 1 Put count: 1 duration min: 0.000249s duration max: 0.000249s duration avg: 0.000249s size min: 815 B size max: 815 B size avg: 815 B size sum: 815 B > ``` (note: I have no idea how to exercise/show a multi-part put operation, or if DataFusion even utilizes multipart puts for large files) ## Are there any user-facing changes? No-ish cc @alamb --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - apache#17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - apache#17266 ## Rationale for this change Further fills out method instrumentation ## What changes are included in this PR? - Adds instrumentation to head requests in the instrumented object store - Adds instrumentatin to delete requests in the instrumented object store - Adds tests for new code ## Are these changes tested? Yes. New unit tests have been added. ## Are there any user-facing changes? No-ish ## cc @alamb

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - apache#17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - apache#17266 ## Rationale for this change Continued progress filling out the methods that are instrumented for the instrumented object store. ## What changes are included in this PR? - Adds instrumentation around basic list operations into the instrumented object store - Adds test cases for new code ## Are these changes tested? Yes. Example output: ```sql DataFusion CLI v50.2.0 > \object_store_profiling trace ObjectStore Profile mode set to Trace > CREATE EXTERNAL TABLE nyc_taxi_rides STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet'; 0 row(s) fetched. Elapsed 2.679 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.512970085+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet Summaries: List count: 1 Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.929709943+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.106757629+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.220555058+00:00 operation=Get duration=0.230604s size=8 range: bytes=222192975-222192982 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet 2025-10-16T18:53:10.226399832+00:00 operation=Get duration=0.263826s size=8 range: bytes=233123927-233123934 path=nyc_taxi_rides/data/tripdata_parquet/data-201104.parquet 2025-10-16T18:53:10.226194195+00:00 operation=Get duration=0.269754s size=8 range: bytes=252843253-252843260 path=nyc_taxi_rides/data/tripdata_parquet/data-201103.parquet . . . 2025-10-16T18:53:11.928787014+00:00 operation=Get duration=0.072248s size=18278 range: bytes=201384109-201402386 path=nyc_taxi_rides/data/tripdata_parquet/data-201509.parquet 2025-10-16T18:53:11.933475464+00:00 operation=Get duration=0.068880s size=17175 range: bytes=195411804-195428978 path=nyc_taxi_rides/data/tripdata_parquet/data-201601.parquet 2025-10-16T18:53:11.949629591+00:00 operation=Get duration=0.065645s size=19872 range: bytes=214807880-214827751 path=nyc_taxi_rides/data/tripdata_parquet/data-201603.parquet Summaries: List count: 2 Get count: 288 duration min: 0.060930s duration max: 0.444601s duration avg: 0.133339s size min: 8 B size max: 44247 B size avg: 18870 B size sum: 5434702 B > ``` ## Are there any user-facing changes? No-ish ## cc @alamb

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - apache#17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - apache#17266 ## Rationale for this change Continued progress filling out methods that are instrumented by the instrumented object store ## What changes are included in this PR? - Adds instrumentation around delimited list operations into the instrumented object store - Adds test cases for the new code ## Are these changes tested? Yes, unit tests have been added. Example output: ```sql DataFusion CLI v50.2.0 > CREATE EXTERNAL TABLE overture_partitioned STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/theme=addresses/'; 0 row(s) fetched. Elapsed 2.307 seconds. > \object_store_profiling trace ObjectStore Profile mode set to Trace > select count(*) from overture_partitioned; +-----------+ | count(*) | +-----------+ | 446544475 | +-----------+ 1 row(s) fetched. Elapsed 1.932 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(overturemaps-us-west-2) 2025-10-17T17:05:27.922724180+00:00 operation=List duration=0.132154s path=release/2025-09-24.0/theme=addresses 2025-10-17T17:05:28.054894440+00:00 operation=List duration=0.049048s path=release/2025-09-24.0/theme=addresses/type=address 2025-10-17T17:05:28.104233937+00:00 operation=Get duration=0.053522s size=8 range: bytes=1070778162-1070778169 path=release/2025-09-24.0/theme=addresses/type=address/part-00000-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet 2025-10-17T17:05:28.106862343+00:00 operation=Get duration=0.108103s size=8 range: bytes=1017940335-1017940342 path=release/2025-09-24.0/theme=addresses/type=address/part-00003-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet ... 2025-10-17T17:05:28.589084204+00:00 operation=Get duration=0.084737s size=836971 range: bytes=1112791717-1113628687 path=release/2025-09-24.0/theme=addresses/type=address/part-00009-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet Summaries: List count: 2 duration min: 0.049048s duration max: 0.132154s duration avg: 0.090601s Get count: 33 duration min: 0.045500s duration max: 0.162114s duration avg: 0.089775s size min: 8 B size max: 917946 B size avg: 336000 B size sum: 11088026 B > ``` Note that a `LIST` report showing a duration must be a `list_with_delimiter()` call because a standard `list` call does not currently report a duration. ## Are there any user-facing changes? No-ish cc @alamb

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - apache#17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - apache#17266 ## Rationale for this change Further fills out the missing methods that have yet to be instrumented in the instrumented object store. ## What changes are included in this PR? - Adds instrumentation around put_opts - Adds instrumentation around put_multipart - Adds tests for newly instrumented methods ## Are these changes tested? Yes. Unit tests have been added for the new methods Example output: ```sql DataFusion CLI v50.2.0 > CREATE EXTERNAL TABLE test(a bigint, b bigint) STORED AS parquet LOCATION '../../test_table/'; 0 row(s) fetched. Elapsed 0.003 seconds. > \object_store_profiling trace ObjectStore Profile mode set to Trace > INSERT INTO test values (1, 2), (3, 4); +-------+ | count | +-------+ | 2 | +-------+ 1 row(s) fetched. Elapsed 0.007 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: LocalFileSystem(file:///) 2025-10-17T19:02:15.440246215+00:00 operation=List path=home/blake/open_source_src/datafusion-BlakeOrth/test_table 2025-10-17T19:02:15.444096012+00:00 operation=Put duration=0.000249s size=815 path=home/blake/open_source_src/datafusion-BlakeOrth/test_table/a9pjKBxSOtXZobJO_0.parquet Summaries: List count: 1 Put count: 1 duration min: 0.000249s duration max: 0.000249s duration avg: 0.000249s size min: 815 B size max: 815 B size avg: 815 B size sum: 815 B > ``` (note: I have no idea how to exercise/show a multi-part put operation, or if DataFusion even utilizes multipart puts for large files) ## Are there any user-facing changes? No-ish cc @alamb --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

@alamb

## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - apache#17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - apache#17266 ## Rationale for this change Further fills out method instrumentation ## What changes are included in this PR? - Adds instrumentation to head requests in the instrumented object store - Adds instrumentatin to delete requests in the instrumented object store - Adds tests for new code ## Are these changes tested? Yes. New unit tests have been added. ## Are there any user-facing changes? No-ish ## cc @alamb

POC: datafusion-cli instrumented object store

f40b923

- A WIP/POC of instrumenting the object store backing datafusion-cli operations

BlakeOrth mentioned this pull request Aug 20, 2025

[datafusion-cli] Add a way to see what object store requests are made #17207

Closed

BlakeOrth marked this pull request as draft August 20, 2025 22:20

alamb mentioned this pull request Aug 21, 2025

Consolidate Parquet Metadata handling into its own module and struct DFParquetMetadata #17127

Merged

Adds basic summary support and output

d4ad285

Adds CLI/Command options and cleanup

14fa0c8

- Introduces object_store_profiling as a command and cli argument, with options of disabled, summary, and trace - Cleans up hacked in println output in favor of carrying the necessary objects in print_options and using the expected Writer for output

BlakeOrth commented Aug 28, 2025

View reviewed changes

Merge remote-tracking branch 'apache/main' into feature/instrument_ob…

b4add9c

…ject_store

alamb reviewed Sep 6, 2025

View reviewed changes

alamb mentioned this pull request Sep 8, 2025

Enable the ListFilesCache to be available for partitioned tables #17211

Closed

Fixes proiling command not being recognized

d193ee9

- Fixes an issue where the first profiling command would not be recognized if the CLI was not started with profiling enabled

Merge branch 'apache:main' into feature/instrument_object_store

7addf0a

BlakeOrth closed this Oct 2, 2025

BlakeOrth mentioned this pull request Oct 16, 2025

Adds instrumentation to LIST operations in CLI #18103

Merged

BlakeOrth mentioned this pull request Oct 17, 2025

Adds instrumentation to delimited LIST operations in CLI #18134

Merged

BlakeOrth mentioned this pull request Oct 17, 2025

Adds instrumentation to PUT ops in the CLI #18139

Merged

BlakeOrth mentioned this pull request Oct 21, 2025

Adds DELETE and HEAD instrumentation to CLI #18206

Merged

POC: datafusion-cli instrumented object store #17266

POC: datafusion-cli instrumented object store #17266

Uh oh!

Conversation

BlakeOrth commented Aug 20, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Uh oh!

alamb commented Aug 22, 2025

Uh oh!

BlakeOrth commented Aug 22, 2025

Uh oh!

alamb commented Aug 22, 2025

Uh oh!

alamb commented Aug 22, 2025

Uh oh!

BlakeOrth commented Aug 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alamb commented Aug 23, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BlakeOrth commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alamb commented Sep 5, 2025

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alamb commented Sep 6, 2025

Uh oh!

nuno-faria commented Sep 7, 2025

Uh oh!

BlakeOrth commented Sep 8, 2025

Uh oh!

BlakeOrth commented Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BlakeOrth commented Sep 9, 2025

Uh oh!

alamb commented Sep 9, 2025

Uh oh!

BlakeOrth commented Sep 9, 2025

Uh oh!

alamb commented Sep 12, 2025

Uh oh!

BlakeOrth commented Oct 2, 2025

Uh oh!

BlakeOrth commented Oct 2, 2025

Uh oh!

alamb commented Oct 2, 2025

Uh oh!

alamb commented Oct 2, 2025

Uh oh!

BlakeOrth commented Oct 2, 2025

Uh oh!

alamb commented Oct 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

BlakeOrth commented Aug 22, 2025 •

edited

Loading

BlakeOrth commented Aug 28, 2025 •

edited

Loading

BlakeOrth commented Sep 8, 2025 •

edited

Loading