Clarification regarding missing query metrics #84
Comments
Q1: Q2: Here's a comparison of PyAthena's Cursor and PandasCursor: I'm interested in the performance of the JDBC version.
The benchmark results have been added to the following branches:
I have ported my case into your benchmark script; here are the results, in case you want to add it. Regarding the data, it is stored in ORC format, compressed with Snappy, and partitioned by day. And again, thanks for the awesome work.
Hello laughingman7743,
First of all, awesome work you have here! Your implementation is quite handy.
I was looking to upgrade from version 1 to version 2 because of the new driver released by AWS, and I noticed that the query execution information (e.g. data_scanned_in_bytes) is no longer available.
I looked into the code to try to recover this information, since it is relevant to me, but without success. As a workaround, I also tried to find the query id so that I could fetch the execution properties offline.
Q1: Therefore, my question is: is this a limitation of the driver itself?
Your help is highly appreciated!
Moreover, I noticed that your PyAthena implementation exposes the information I am looking for, since it uses the Athena API. Do you have any benchmarks, or any concerns, about using PyAthena instead of PyAthenaJDBC?
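For reference, the offline workaround mentioned above (looking up the execution properties by query id) could look roughly like the sketch below when done directly against the Athena API. It assumes a boto3 Athena client and relies on the `get_query_execution` call and its `Statistics` fields; the helper name `fetch_query_stats`, the region, and the placeholder query id are made up for illustration and are not part of PyAthena or PyAthenaJDBC.

```python
def fetch_query_stats(client, query_execution_id):
    """Return selected execution statistics for one Athena query.

    `client` is expected to behave like boto3.client("athena").
    This helper is hypothetical and only illustrates the lookup.
    """
    response = client.get_query_execution(QueryExecutionId=query_execution_id)
    # Statistics may be missing for queries that have not finished yet.
    stats = response["QueryExecution"].get("Statistics", {})
    return {
        "data_scanned_in_bytes": stats.get("DataScannedInBytes"),
        "engine_execution_time_in_millis": stats.get("EngineExecutionTimeInMillis"),
    }


# Usage (needs AWS credentials; region and id are placeholders):
# import boto3
# client = boto3.client("athena", region_name="us-west-2")
# print(fetch_query_stats(client, "<query-execution-id>"))
```

This only helps if the query id can be obtained from the JDBC side in the first place, which is the open point of Q1.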
Up to now I have only run a simple test, "select * from mytable limit 1000", and PyAthenaJDBC consistently performs better than PyAthena with either the simple cursor or the pandas cursor.
Q2: Did you find the same results?
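To make a comparison like this repeatable, a small timing helper along the following lines could be used. It is only a sketch: `time_query` is a hypothetical helper, not part of either library, and the commented usage assumes PyAthena's `connect(...).cursor()` with a placeholder staging bucket and region.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Call `run_query` several times and summarise wall-clock timings in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


# Usage (placeholders for bucket and region; requires AWS credentials):
# from pyathena import connect
# cursor = connect(s3_staging_dir="s3://YOUR_BUCKET/path/",
#                  region_name="us-west-2").cursor()
# sql = "select * from mytable limit 1000"
# print(time_query(lambda: cursor.execute(sql).fetchall()))
```

Running the same callable shape against a PyAthenaJDBC cursor would keep the measurement identical on both sides; note that Athena queue time and result caching can still skew individual runs.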
Cheers!
Thanks in advance