Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection pooling issue Unix #774

Closed
JelleVE opened this issue Jun 3, 2020 · 16 comments
Closed

Connection pooling issue Unix #774

JelleVE opened this issue Jun 3, 2020 · 16 comments

Comments

@JelleVE
Copy link

JelleVE commented Jun 3, 2020

Environment

  • Python: 3.6.8
  • pyodbc: 4.0.30
  • unixodbc: 2.3.7
  • OS: Ubuntu 18.04
  • DB: SQL Server Azure
  • driver: ODBC Driver 17 for SQL Server

Issue

I am refactoring some old code to handle SQL connection in a more consistent manner. Now, I have read about connection pooling and I wanted to check whether this was supported. In order to check this, I have run the following code:

import pyodbc

def runQuery(con, query, parameters=None):
    result = list()

    if parameters is None:
        parameters = ()

    with con.cursor() as cur:
        cur.execute(query, parameters)

        column_names = [col[0] for col in cur.description]
        result = [
            dict(zip(column_names, row))
            for row in cur.fetchall()
        ]

    return result

conn1 = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                            f'SERVER={server};'
                            f'DATABASE={database};'
                            f'UID={username};'
                            f'PWD={password}')

with conn1:
    result1 = runQuery(conn1, 'SELECT @@SPID')
    print(result1)


conn2 = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                            f'SERVER={server};'
                            f'DATABASE={database};'
                            f'UID={username};'
                            f'PWD={password}')

with conn2:
    result2 = runQuery(conn2, 'SELECT @@SPID')
    print(result2)

which results in

[{'': 76}]
[{'': 79}]

Is this the correct way of testing whether connection pooling is being used? If so, does this in fact indicate that this is not the case here?

My odbcinst.ini file contains the following:

[ODBC]
Pooling=Yes

[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.4.so.1.1
UsageCount=1

Any help would be greatly appreciated. Thanks!

@gordthompson
Copy link
Collaborator

Related: #48

@keitherskine
Copy link
Collaborator

Checking the SPIDs is as good a way as any. As @gordthompson hints at though, using a context manager does not close the connection when it exits (contrary to popular belief). Try positively closing the connection after the context manager, e.g.:

with conn1:
    result1 = runQuery(conn1, 'SELECT @@SPID')
    print(result1)
conn1.close()

and then see if the SPIDs are the same. It's possible they might still be different of course, pooling is after all more of a suggestion than a mandate.

FYI, just in case you hadn't seen it, pooling can also be controlled using the pooling pyodbc attribute, but only before any connections have been made.

@JelleVE
Copy link
Author

JelleVE commented Jun 3, 2020

When I explicitly close the first connection, the SPIDs are indeed identical. However, they seem to be identical regardless of whether pooling is enabled or not (through the pooling pyodbc attribute and/or the odbcinst.ini config file).

I realize that this might not be strictly within the scope of pyodbc issues, but is there a better way of testing whether this is working or not? The reason that I am asking is the following. I have run the above code in a loop and the timings for both cases (pooling enabled/disabled) seem very similar, whereas I would expect the pooling variant to be noticeably faster. (Of course, I have tested this over an internet connection, so the latency on my connection might very well be able to explain these results.)

@v-chojas
Copy link
Contributor

v-chojas commented Jun 3, 2020

You can open another connection (or several) to the server from a different machine, between the close and open from Python, which will force a non-pooled connection to not reuse the same SPID (because one of those other connections will have taken it.)

@JelleVE
Copy link
Author

JelleVE commented Jun 3, 2020

When running the example from my original post (but closing the connection explicitly) in a loop, and querying the DB for the active connections using this query:

SELECT
      s.login_name
    , s.host_name
    , s.program_name
    , s.host_process_id
    , COUNT(*) AS connection_count
FROM sys.dm_exec_sessions AS s
GROUP BY
      s.login_name
    , s.host_name
    , s.program_name
    , s.host_process_id;

I get this result:
image

Everything remains fairly constant, except for the single connection on row 6. This line is constantly being removed and being added again over consecutive executions of the query. Maybe my understanding of the query and/or the concept of connection pooling is completely wrong, but this doesn't seem like connection pooling to me?

@JelleVE
Copy link
Author

JelleVE commented Jun 3, 2020

I found out that the CPTimeout parameter was missing from my config. This changes the behavior: now the number of connections keeps growing (1 per iteration), presumably until the timeouts start to occur. Connections are seemingly not being reused.

Testing code:

def runQuery(conn, query, parameters=None):
    result = list()

    if parameters is None:
        parameters = ()

    with conn.cursor() as cur:
        cur.execute(query, parameters)

        column_names = [col[0] for col in cur.description]
        result = [
            dict(zip(column_names, row))
            for row in cur.fetchall()
        ]

    return result


for i in range(5000):
    print(i)
    conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                                    f'SERVER={server};'
                                    f'DATABASE={database};'
                                    f'UID={username};'
                                    f'PWD={password}')

    result = runQuery(conn, 'SELECT @@SPID')
    conn.close()

image

Edit:
The timeout in fact throws an exception.

Traceback (most recent call last):
  File "temp.py", line 93, in <module>
pyodbc.OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 780, in get_code
  File "<frozen importlib._bootstrap_external>", line 832, in get_data
OSError: [Errno 24] Too many open files: '/usr/lib/python3/dist-packages/apport/__init__.py'

Original exception was:
Traceback (most recent call last):
  File "temp.py", line 93, in <module>
pyodbc.OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')

@christian-vorhemus
Copy link

christian-vorhemus commented Feb 2, 2022

The reason that I am asking is the following. I have run the above code in a loop and the timings for both cases (pooling enabled/disabled) seem very similar, whereas I would expect the pooling variant to be noticeably faster.

@JelleVE I tested the "connection speed" with the following code on my Windows 11 as well as my Ubuntu 20.04 with pyodbc 4.0.32 and I can confirm that on Windows, the difference between pyodbc.pooling = True and pyodbc.pooling = False is noticeably, namely 3.73 sec. to 20.52 sec. On Ubuntu with unixODBC 2.3.7 however, in both cases I get a duration of ~21 sec.

import pyodbc
import time

sql_server_url = "[...]"
admin_username = "[...]"
admin_password = "[...]"
database_name = "[...]"
enable_pooling = False

start = time.time()
pyodbc.pooling = enable_pooling
for i in range(0, 50):
    connection = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER="
        + sql_server_url
        + ";PORT=1433;DATABASE="
        + database_name
        + ";UID="
        + admin_username
        + ";PWD="
        + admin_password,
        autocommit=True,
    )
    with connection.cursor() as cursor:
        cursor.execute(f"SELECT @@SPID")
        columns = [column[0] for column in cursor.description]
        result = [dict(zip(columns, row)) for row in cursor.fetchall()]
        print(result)
    connection.close()
end = time.time()
print(end - start)

What pyodbc is doing is setting the SQLSetEnvAttr accordingly, so it seems to be an issue with either unixODBC or the ODBC SQL Server driver on Linux rather than pyodbc.

@jikuja
Copy link

jikuja commented Oct 6, 2022

Unixodbc ignores pyodbc's pooling logic:

    if (bPooling)
    {
        if (!SQL_SUCCEEDED(SQLSetEnvAttr(SQL_NULL_HANDLE, SQL_ATTR_CONNECTION_POOLING, (SQLPOINTER)SQL_CP_ONE_PER_HENV, sizeof(int))))
        {
            PyErr_SetString(PyExc_RuntimeError, "Unable to set SQL_ATTR_CONNECTION_POOLING attribute.");
            return false;
        }
    }
    /*
     * we may do someting with these later
     */

    if ( !environment_handle && 
            ( attribute == SQL_ATTR_CONNECTION_POOLING || 
              attribute == SQL_ATTR_CP_MATCH ))
    {
        return SQL_SUCCESS;
    }

The only way to control pooling with unixodbc is to use odbcinst.ini:

  • Pooling : global
  • CPTimeout: driver
  • PoolMaxSize: global
  • PoolWaitTimeout: global

ref:


The other option to do pooling is to add pooling layer on python. e.g. asyncodbc


The behaviour of unixodbc pooling should be documented on pyodbc somehow.

@greatvovan
Copy link

@jikuja this issue seems to be fixed in unixodbc 2.3.12, which can be seen if you follow your unixODBC repo link again.

A bit of problem is that as of now Debian package repositories still have 2.3.11 as the latest version, and we need to install 2.3.12 manually. Below is a demonstration

check_pooling.py:

import os
import sys
import pyodbc

SERVER = os.environ['SERVER']
DATABASE = os.environ['DATABASE']
USERNAME = os.environ['USERNAME']
PASSWORD = os.environ['PASSWORD']
CONNECTION_STR = f'DRIVER={{ODBC Driver 18 for SQL Server}};SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD}'
QUERY = "select MAX(login_time) as login_time from sys.dm_exec_sessions where login_name = ? and program_name like 'python%'"

pyodbc.pooling = sys.argv[1].lower() == 'true'

for i in range(0, 5):
    connection = pyodbc.connect(CONNECTION_STR)
    cursor = connection.cursor()
    cursor.execute(QUERY, USERNAME)
    row = cursor.fetchone()
    print(row[0])
    connection.close()

Dockerfile

FROM mcr.microsoft.com/azure-functions/python:4-python3.10-core-tools

WORKDIR /root

# Download and install unixodbc 2.3.12
RUN curl -sO https://www.unixodbc.org/unixODBC-2.3.12.tar.gz
RUN gunzip unixODBC*.tar.gz
RUN tar xvf unixODBC*.tar
WORKDIR /root/unixODBC-2.3.12
RUN ./configure
RUN make
RUN make install

WORKDIR /root
RUN apt-get update
# Install dependencies of msodbcsql18 except unixodbc
RUN apt-get install -y libc6 libstdc++6 libkrb5-3 openssl debconf odbcinst
RUN curl -s https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc
RUN curl -s https://packages.microsoft.com/config/debian/11/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
# Install msodbcsql18 ignoring unixodbc dependency
RUN apt-get download msodbcsql18
RUN ACCEPT_EULA=Y dpkg --ignore-depends=unixodbc -i msodbcsql18_18.3.2.1-1_amd64.deb

RUN pip install --upgrade pip --user
RUN pip install pyodbc

ENV PYTHONUNBUFFERED=TRUE
ADD check_pooling.py ./

(My Linux skills are quite sloppy, please let me know if there is more optimal way to install/update to 2.3.12.)

docker build -t pyodbc_cp .
docker run -e ... pyodbc_cp python check_pooling.py true
docker run -e ... pyodbc_cp python check_pooling.py false

You will see that connection timestamps will differ when changing the argument true/false. Also, if you are on a slow/remote connection, you will notice a connection delay with "false", indicating that pooling is disabled, whereas with "true" they are obtained instantly.

For comparison, on version 2.3.11, which is installed through the package manager,

FROM mcr.microsoft.com/azure-functions/python:4-python3.10-core-tools

RUN curl -s https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc
RUN curl -s https://packages.microsoft.com/config/debian/11/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list

RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install -y msodbcsql18

# Uncomment the follwing lines to enable pooling:
#RUN mv /etc/odbcinst.ini /etc/odbcinst.old
#RUN echo "[ODBC]\nPooling=Yes\n" | cat - /etc/odbcinst.old > /etc/odbcinst.ini

RUN pip install --upgrade pip --user
RUN pip install pyodbc

ENV PYTHONUNBUFFERED=TRUE

WORKDIR /app
ADD check_pooling.py ./

you will see that the value of pyodbc.pooling is ignored and pooling unfortunately never works. unixodbc will however respect the setting in odbcinst.ini allowing you to enable pooling, however, in this case, again pyodbc.pooling will be disregarded and pooling will be always on.

With that said, it looks like the current issue may be closed, because with release of version 2.3.12 the behavior of pyodbc.pooling aligns with the documentation (@mkleehammer please validate).

@gordthompson
Copy link
Collaborator

Related unixODBC thread, including discussion about multiprocessing

lurcher/unixODBC#149

@greatvovan
Copy link

^ That's a useful thread for understanding how multiprocessing can interfere with pooling, but IMO it is different from the above issue where unixodbc essentially disobeyed pyodbc's instruction to use pooling.

@aAEeLll
Copy link

aAEeLll commented Apr 16, 2024

Hello @gordthompson thanks for your analysis.
We tried it and it works, however, we use pyodbc with Flask, and each request uses a different thread. As a result, the connection is not reused among different requests.
I also tested using your script, and the connections are not re-used:

import os
import sys
import pyodbc
import threading

SERVER = os.environ['SERVER']
DATABASE = os.environ['DATABASE']
USERNAME = os.environ['USERNAME']
PASSWORD = os.environ['PASSWORD']
CONNECTION_STR = f'DRIVER={{ODBC Driver 18 for SQL Server}};SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD}'
QUERY = "select MAX(login_time) as login_time from sys.dm_exec_sessions where login_name = ? and program_name like 'python%'"

pyodbc.pooling = sys.argv[1].lower() == 'true'


def main_func(iteration=-1):
    connection = pyodbc.connect(CONNECTION_STR)
    cursor = connection.cursor()
    cursor.execute(QUERY, USERNAME)
    row = cursor.fetchone()
    print(row[0], iteration)
    connection.close()


for i in range(0, 5):
    th = threading.Thread(
        target=main_func,
        kwargs={
            "iteration": i,
        },
    )
    th.start()

Do you know how can we make this work?

In pyodbc source code we have a comment saying:

threadsafety
The integer 1, indicating that threads may share the module but not
connections. Note that connections and cursors may be used by different
threads, just not at the same time.

However, we have not found a way of modifying it.
Thanks

@mkleehammer
Copy link
Owner

The threadsafety is a read-only flag required by the DBAPI to indicate pyodbc can be used on multiple threads. The driver needs to also be threadsafe too, but pyodbc itself is.

@greatvovan
Copy link

@aAEeLll if you wanted to tag me, I can respond.

You are creating 5 threads and launch them all simultaneously. What do you expect? As they all demand a connection at the same time, the driver will have to create a connection for each. It does not seem to me a problem with threading.

The connection can be reused from the pool if the previous consumer released it back to the pool.

I tried my test in a scenario close to yours. I used FastAPI, which creates threads, too, in case of synchronous methods. I don't think it will be much different from Flask.

import time
import pyodbc
import fastapi
import uvicorn
from threading import get_ident
from constants import CONNECTION_STR, QUERY, USERNAME

app = fastapi.FastAPI()


@app.get('/cp')
def cp():
    connection = pyodbc.connect(CONNECTION_STR)
    cursor = connection.cursor()
    cursor.execute(QUERY, USERNAME)
    row = cursor.fetchone()
    connection.close()
    time.sleep(3)
    return get_ident(), row[0]


if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0')

Test:

 % curl 'http://localhost:8000/cp' &
[1] 23575
 % curl 'http://localhost:8000/cp' &
[2] 23580
 % curl 'http://localhost:8000/cp' &
[3] 23627
 % curl 'http://localhost:8000/cp' &
[4] 23634
 % [140467339183872,"2024-04-16T21:40:21.900000"]
[1]    done       curl 'http://localhost:8000/cp'
 % [140467347576576,"2024-04-16T21:40:22.360000"]
[2]    done       curl 'http://localhost:8000/cp'
 % [140467355969280,"2024-04-16T21:40:22.360000"]
[3]  - done       curl 'http://localhost:8000/cp'
 % [140467372054272,"2024-04-16T21:40:22.360000"]
[4]  + done       curl 'http://localhost:8000/cp'

I use a pause of 3 s to have time to issue new requests before the previous one ends, so that the framework has to create new threads to serve them (otherwise it will be one thread). I issue 4 request in parallel quite quickly with <Up arrow> and <Enter>.

Note the connection 22.360000, which was reused by threads 140467347576576, 140467355969280, and 140467372054272.

Timing is crucial, though, the repeated request after a release must be within quite a short timeframe, like around 0.5 s or less. Using my previous script posted in February I repeatedly get a new connection every time time if I insert a pause of 0.9 s or more. It looks like the connection timeout is very short in unixODBC.

@aAEeLll
Copy link

aAEeLll commented Apr 17, 2024

Thanks both for the clarification! 😃
Yes, I can confirm the problem was the connection timeout in unixODBC

This is the content of /etc/odbcinst.ini:

[ODBC Driver 18 for SQL Server]
Description=Microsoft ODBC Driver 18 for SQL Server
Driver=/opt/microsoft/msodbcsql18/lib64/libmsodbcsql-18.3.so.3.1
UsageCount=1
CPTimeout=120

It seems CPTimeout is ignored...
The file actually used was /usr/local/etc/odbcinst.ini (For the record, check the output of odbcinst -j).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants