HiveServer2Connection.cursor has always user=None when called from ImpalaExecutionContext #473

Open
marqueewinq opened this issue Sep 20, 2021 · 3 comments

@marqueewinq

marqueewinq commented Sep 20, 2021

Steps to reproduce:

  1. Create an Impala + LDAP stack with docker-compose:
version: "3.5"

services:
  impala:
    image: codingtony/impala
    command: /start-bash.sh && /bin/true
    stdin_open: true
    tty: true
    privileged: true
    ports:
      - "9000:9000" 
      - "50010:50010" 
      - "50020:50020" 
      - "50070:50070" 
      - "50075:50075" 
      - "21000:21000" 
      - "21050:21050" 
      - "25000:25000" 
      - "25010:25010" 
      - "25020:25020"
    volumes:
      - ./etc.default.impala:/etc/default/impala
    depends_on:
      - ldap
  ldap:
    image: osixia/openldap:1.5.0
    privileged: true
    ports:
      - "389:389" 
      - "636:636" 
  ldap_admin:
    # access on 6443
    # echo "Login DN: cn=admin,dc=example,dc=org"
    # echo "Password: admin"
    image: osixia/phpldapadmin:0.9.0
    ports:
      - 6443:443
    environment:
      PHPLDAPADMIN_LDAP_HOSTS: ldap 
    depends_on:
      - ldap
  2. Create the following script:
import logging

logging.basicConfig(level=logging.DEBUG)
from impala.dbapi import connect
from sqlalchemy import create_engine, inspect

host = "localhost"
port = "21050"
username = "admin"
password = "admin"
database = "default"
use_ssl = False
# auth_mechanism = "NOSASL" # or "LDAP"
auth_mechanism = "LDAP"

engine = create_engine(
    "impala://",
    connect_args={},
    creator=lambda: connect(
        host=host,
        port=port,
        database=database,
        timeout=5,
        user=username,
        password=password,
        use_ssl=use_ssl,
        auth_mechanism=auth_mechanism,
    ),
)
connection = engine.connect().execution_options(user=username)
inspector = inspect(connection)
table_names = inspector.get_table_names()
print(table_names)
  3. Execute the script with python3 connect.py 2> >(grep 'req=TOpenSessionReq')
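The `2> >(grep ...)` redirection relies on process substitution, which not every shell supports. As a rough alternative, the same filtering can be done inside the script with a standard `logging.Filter`. This is only a sketch: `OpenSessionFilter` is a hypothetical helper name, and it assumes the request is logged through Python's `logging` module, as the DEBUG line in the output suggests.

```python
import logging


class OpenSessionFilter(logging.Filter):
    """Hypothetical helper: keep only log records that mention the OpenSession request."""

    def filter(self, record):
        # getMessage() applies the %-style arguments before we search the text.
        return "req=TOpenSessionReq" in record.getMessage()


# Attach the filter to a stderr handler and install it on the root logger.
handler = logging.StreamHandler()
handler.addFilter(OpenSessionFilter())
logging.basicConfig(level=logging.DEBUG, handlers=[handler])
```

With this in place of the plain `logging.basicConfig(level=logging.DEBUG)` call, running `python3 connect.py` should print only the matching DEBUG line plus the script's own output.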

Expected output:

DEBUG:impala.hiveserver2:OpenSession: req=TOpenSessionReq(client_protocol=5, username='admin', password=None, configuration={})
[]

Actual output (notice the user name):

DEBUG:impala.hiveserver2:OpenSession: req=TOpenSessionReq(client_protocol=5, username='marqueewinq', password=None, configuration={})
[]
@marqueewinq
Author

After a little pdb-ing I found that the cursor method of HiveServer2Connection does not receive the user argument from ImpalaExecutionContext.

I don't see a way to pass the user name from the script to the cursor method of HiveServer2Connection.

I'm not sure what the correct solution would be here; maybe read the user from execution_options and pass it to the cursor method together with the configuration arg.
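To make the mechanism concrete, here is a minimal sketch that reproduces the symptom without a server. `FakeConnection` and `FakeExecutionContext` are hypothetical stand-ins, not impyla's classes, and the fallback to the local login name when `user` is None is an assumption based on the `username='marqueewinq'` seen in the actual output.

```python
class FakeConnection:
    """Stand-in for HiveServer2Connection (hypothetical, not impyla code)."""

    def __init__(self, os_user):
        # Plays the role of the local login name of whoever runs the script.
        self._os_user = os_user

    def cursor(self, user=None, configuration=None):
        # Assumption: without an explicit user, the local login name is used.
        if user is None:
            user = self._os_user
        return {"username": user, "configuration": configuration or {}}


class FakeExecutionContext:
    """Stand-in for ImpalaExecutionContext (hypothetical, not impyla code)."""

    def __init__(self, dbapi_connection, execution_options):
        self._dbapi_connection = dbapi_connection
        self.execution_options = execution_options

    def create_cursor(self):
        # Only the configuration is forwarded; the "user" option is never read.
        configuration = self.execution_options.get("cursor_configuration", {})
        return self._dbapi_connection.cursor(configuration=configuration)


connection = FakeConnection(os_user="marqueewinq")
context = FakeExecutionContext(connection, {"user": "admin"})
session = context.create_cursor()
print(session["username"])  # marqueewinq
```

Even though `execution_options` carries `user="admin"`, the session ends up with the fallback name, because nothing between the execution context and `cursor` passes the option along.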

@marqueewinq
Author

A monkey patch helps:

# connect.py
import logging

logging.basicConfig(level=logging.DEBUG)
from impala.dbapi import connect
from sqlalchemy import create_engine, inspect

from impala.sqlalchemy import ImpalaExecutionContext


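def my_create_cursor(self):
    # Replacement for ImpalaExecutionContext.create_cursor that also forwards the user.
    self._is_server_side = False
    cursor_configuration = self.execution_options.get("cursor_configuration", {})
    # Read the user set via execution_options(user=...); None if it was not set.
    username = self.execution_options.get("user", None)
    return self._dbapi_connection.cursor(
        user=username, configuration=cursor_configuration
    )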


ImpalaExecutionContext.create_cursor = my_create_cursor

host = "localhost"
port = "21050"
username = "admin"
password = "admin"
database = "default"
use_ssl = False
# auth_mechanism = "NOSASL" # or "LDAP"
auth_mechanism = "LDAP"

engine = create_engine(
    "impala://",
    connect_args={},
    creator=lambda: connect(
        host=host,
        port=port,
        database=database,
        timeout=5,
        user=username,
        password=password,
        use_ssl=use_ssl,
        auth_mechanism=auth_mechanism,
    ),
)
connection = engine.connect().execution_options(user=username)
inspector = inspect(connection)
table_names = inspector.get_table_names()
print(table_names)

Output:

$ python3 connect.py 2> >(grep 'req=TOpenSessionReq')
DEBUG:impala.hiveserver2:OpenSession: req=TOpenSessionReq(client_protocol=5, username='admin', password=None, configuration={})
[]
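
A possible alternative that avoids patching the library class is to wrap the DBAPI connection returned by the `creator` callable, so that `cursor()` gets a default user. This is a sketch only: `UserBoundConnection` is a hypothetical name, `RecordingConnection` is a fake used for the demonstration, and the approach is untested against impyla or SQLAlchemy.

```python
class UserBoundConnection:
    """Hypothetical proxy that supplies a default user to cursor()."""

    def __init__(self, connection, user):
        self._connection = connection
        self._user = user

    def cursor(self, user=None, **kwargs):
        # Fall back to the bound user when the caller passes none.
        return self._connection.cursor(user=user or self._user, **kwargs)

    def __getattr__(self, name):
        # Delegate everything else (close, commit, ...) to the wrapped connection.
        return getattr(self._connection, name)


class RecordingConnection:
    """Fake connection that records which user each cursor() call received."""

    def __init__(self):
        self.calls = []

    def cursor(self, user=None, configuration=None):
        self.calls.append(user)
        return object()

    def close(self):
        return "closed"


raw = RecordingConnection()
bound = UserBoundConnection(raw, "admin")
bound.cursor(configuration={})  # caller passes no user, as create_cursor does
print(raw.calls)  # ['admin']
```

In the script above, the `creator` lambda would then return `UserBoundConnection(connect(...), username)` instead of the bare `connect(...)` result.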

@marqueewinq
Author

I would happily create a PR with tests, but I need some advice from the maintainers:

  • how exactly should the system user be provided for hive2 queries?
  • which tests would be acceptable in this case?
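
As a starting point for the second question, a test could check that the user from `execution_options` reaches `cursor()` without needing a running Impala server. The sketch below is one possible shape, not the project's test style: `create_cursor` repeats the logic of `my_create_cursor` from the monkey patch as a plain function, and the context object is a `SimpleNamespace` stand-in rather than a real ImpalaExecutionContext.

```python
from types import SimpleNamespace
from unittest import mock


def create_cursor(context):
    # Same logic as my_create_cursor from the monkey patch.
    context._is_server_side = False
    configuration = context.execution_options.get("cursor_configuration", {})
    user = context.execution_options.get("user", None)
    return context._dbapi_connection.cursor(user=user, configuration=configuration)


def make_context(execution_options):
    # Stand-in context with a mocked DBAPI connection.
    return SimpleNamespace(
        execution_options=execution_options,
        _dbapi_connection=mock.Mock(),
    )


def test_user_is_forwarded():
    context = make_context({"user": "admin"})
    create_cursor(context)
    context._dbapi_connection.cursor.assert_called_once_with(
        user="admin", configuration={}
    )


def test_user_defaults_to_none():
    context = make_context({})
    create_cursor(context)
    context._dbapi_connection.cursor.assert_called_once_with(
        user=None, configuration={}
    )


test_user_is_forwarded()
test_user_defaults_to_none()
```

The second test pins down the current default, so that connections which never set a user keep behaving as they do today.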
