Skip to content
This repository has been archived by the owner on Sep 20, 2023. It is now read-only.

Return the Query ID after execute query in Athena (in 2020) #105

Closed
isabelleoliveira3000 opened this issue May 5, 2020 · 6 comments
Closed

Comments

@isabelleoliveira3000
Copy link

There is this already closed issue #9, created and solved in 2017.

It is about get query ID from cursor, and at that time there was a property cursor.query_id that solved this issue. But this property was removed from the PyAthenaJDBC code in 2018 (in this commit 8d9b5de#diff-d6a2ffe1ed91aa544f98e96db3d239ca)

So, my issue is pretty the same as his :) There is a way to get the query ID in cursor?

  • Purpose: Get the query ID to search for the file in S3, download that file and add data to our DB.
  • Current situation: We have no way to retrieve the query ID after execute the query. We need to search for the latest file in S3 and download that file.
  • Proposed approach: Have a function, maybe called getQueryID in cursor to return the executed query ID.
@laughingman7743
Copy link
Owner

The Athena JDBC driver 2.0.9 allows you to retrieve the QueryID.
https://github.com/laughingman7743/PyAthenaJDBC/blob/master/jdbc/release-notes.txt#L21-L25

You can retrieve the QueryID in the following way:

import jpype

from pyathenajdbc import connect

conn = connect(s3_staging_dir='s3://YOUR_BUCKET/path/to/',
               region_name='us-west-2')
try:
    with conn.cursor() as cursor:
        cursor.execute(
            """
            SELECT * FROM YOUR_TABLE
            """)
        print(cursor._statement.unwrap(jpype.JClass(
            "com.interfaces.core.IStatementQueryInfoProvider")).getQueryId())
finally:
    conn.close()

Please refer to the following document on how to use the JDBC driver.
https://athena-downloads.s3.amazonaws.com/drivers/JDBC/athena-preview/SimbaAthenaJDBC_2.0.11_preview/docs/Simba+Athena+JDBC+Driver+Install+and+Configuration+Guide.pdf
There is also a description of how to retrieve the QueryID.

@laughingman7743
Copy link
Owner

I'd like to see you use the Boto3 version if possible.
https://github.com/laughingman7743/PyAthena

@isabelleoliveira3000
Copy link
Author

Thank you for answering me, @laughingman7743 ! :)

I read the links you recommended and some jpype docs and examples, but I am still struggling with java class unwrapped...

My code is something like you said:

import jpype
import pyathenajdbc
...
self._client = pyathenajdbc.connect(
                s3_staging_dir=self.S3_DIR,
                region_name=region_name,
                access_key=aws_access_key,
                secret_key=aws_secret_key
            )
...
if not jpype.isJVMStarted():
        jpype.startJVM(jpype.getDefaultJVMPath())

cursor = self._client.cursor()
cursor.execute("my query", params)
print(cursor._statement.unwrap(jpype.JClass(
        "com.interfaces.core.IStatementQueryInfoProvider")).getQueryId())

if jpype.isJVMStarted():
        jpype.shutdownJVM()

But I got this error, like it is not found this IStatementQueryInfoProvider class:

Traceback (most recent call last):
  File "filenameeeeee", line 41, in query
   print(cursor._statement.unwrap(jpype.JClass("com.interfaces.core.IStatementQueryInfoProvider")).getQueryId())
  File "/opt/anaconda3/envs/pier-data-airflow/lib/python3.7/site-packages/jpype/_jclass.py", line 130, in __new__
    return _JClassNew(args[0], **kwargs)
  File "/opt/anaconda3/envs/pier-data-airflow/lib/python3.7/site-packages/jpype/_jclass.py", line 228, in _JClassNew
    javaClass = _jpype.PyJPClass(arg)
jpype._jclass.NoClassDefFoundError: java.lang.NoClassDefFoundError: com/interfaces/core/IStatementQueryInfoProvider

Can you help me again, please?

@isabelleoliveira3000
Copy link
Author

Both my Python environment and my Java is 64 bit.
I've read that different versions could be a problem, but this is not my case :x

@laughingman7743
Copy link
Owner

I think you have an older version of the JDBC driver.
https://github.com/laughingman7743/PyAthenaJDBC/releases/tag/v2.0.6

You can update with pip install -U PyathenaJDBC, or download Jar file of JDBC driver and specify it in driver_path argument.
https://s3.amazonaws.com/athena-downloads/drivers/JDBC/SimbaAthenaJDBC_2.0.9/AthenaJDBC42_2.0.9.jar

conn = connect(s3_staging_dir='s3://YOUR_BUCKET/path/to/',
               region_name='us-west-2',
               driver_path='/path/to/AthenaJDBC42_2.0.9.jar')

@isabelleoliveira3000
Copy link
Author

Thank you again, @laughingman7743 !

We are studying a refactoring to change from PyAthena to Boto3 at my work :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants