-
Notifications
You must be signed in to change notification settings - Fork 562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connection pooling issue Unix #774
Comments
Related: #48 |
Checking the SPIDs is as good a way as any. As @gordthompson hints at though, using a context manager does not close the connection when it exits (contrary to popular belief). Try positively closing the connection after the context manager, e.g.:
and then see if the SPIDs are the same. It's possible they might still be different of course, pooling is after all more of a suggestion than a mandate. FYI, just in case you hadn't seen it, pooling can also be controlled using the pooling pyodbc attribute, but only before any connections have been made. |
When I explicitly close the first connection, the SPIDs are indeed identical. However, they seem to be identical regardless of whether pooling is enabled or not (through the pooling pyodbc attribute and/or the I realize that this might not be strictly within the scope of pyodbc issues, but is there a better way of testing whether this is working or not? The reason that I am asking is the following. I have run the above code in a loop and the timings for both cases (pooling enabled/disabled) seem very similar, whereas I would expect the pooling variant to be noticeably faster. (Of course, I have tested this over an internet connection, so the latency on my connection might very well be able to explain these results.) |
You can open another connection (or several) to the server from a different machine, between the close and open from Python, which will force a non-pooled connection to not reuse the same SPID (because one of those other connections will have taken it.) |
When running the example from my original post (but closing the connection explicitly) in a loop, and querying the DB for the active connections using this query:
Everything remains fairly constant, except for the single connection on row 6. This line is constantly being removed and being added again over consecutive executions of the query. Maybe my understanding of the query and/or the concept of connection pooling is completely wrong, but this doesn't seem like connection pooling to me? |
I found out that the Testing code:
Edit:
|
@JelleVE I tested the "connection speed" with the following code on my Windows 11 as well as my Ubuntu 20.04 with pyodbc 4.0.32 and I can confirm that on Windows, the difference between import pyodbc
import time
sql_server_url = "[...]"
admin_username = "[...]"
admin_password = "[...]"
database_name = "[...]"
enable_pooling = False
start = time.time()
pyodbc.pooling = enable_pooling
for i in range(0, 50):
connection = pyodbc.connect(
"DRIVER={ODBC Driver 17 for SQL Server};SERVER="
+ sql_server_url
+ ";PORT=1433;DATABASE="
+ database_name
+ ";UID="
+ admin_username
+ ";PWD="
+ admin_password,
autocommit=True,
)
with connection.cursor() as cursor:
cursor.execute(f"SELECT @@SPID")
columns = [column[0] for column in cursor.description]
result = [dict(zip(columns, row)) for row in cursor.fetchall()]
print(result)
connection.close()
end = time.time()
print(end - start) What pyodbc is doing is setting the SQLSetEnvAttr accordingly, so it seems to be an issue with either unixODBC or the ODBC SQL Server driver on Linux rather than pyodbc. |
Unixodbc ignores pyodbc's pooling logic: if (bPooling)
{
if (!SQL_SUCCEEDED(SQLSetEnvAttr(SQL_NULL_HANDLE, SQL_ATTR_CONNECTION_POOLING, (SQLPOINTER)SQL_CP_ONE_PER_HENV, sizeof(int))))
{
PyErr_SetString(PyExc_RuntimeError, "Unable to set SQL_ATTR_CONNECTION_POOLING attribute.");
return false;
}
} /*
* we may do someting with these later
*/
if ( !environment_handle &&
( attribute == SQL_ATTR_CONNECTION_POOLING ||
attribute == SQL_ATTR_CP_MATCH ))
{
return SQL_SUCCESS;
} The only way to control pooling with unixodbc is to use odbcinst.ini:
ref: The other option to do pooling is to add pooling layer on python. e.g. asyncodbc The behaviour of unixodbc pooling should be documented on pyodbc somehow. |
@jikuja this issue seems to be fixed in unixodbc 2.3.12, which can be seen if you follow your unixODBC repo link again. A bit of problem is that as of now Debian package repositories still have 2.3.11 as the latest version, and we need to install 2.3.12 manually. Below is a demonstration check_pooling.py: import os
import sys
import pyodbc
SERVER = os.environ['SERVER']
DATABASE = os.environ['DATABASE']
USERNAME = os.environ['USERNAME']
PASSWORD = os.environ['PASSWORD']
CONNECTION_STR = f'DRIVER={{ODBC Driver 18 for SQL Server}};SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD}'
QUERY = "select MAX(login_time) as login_time from sys.dm_exec_sessions where login_name = ? and program_name like 'python%'"
pyodbc.pooling = sys.argv[1].lower() == 'true'
for i in range(0, 5):
connection = pyodbc.connect(CONNECTION_STR)
cursor = connection.cursor()
cursor.execute(QUERY, USERNAME)
row = cursor.fetchone()
print(row[0])
connection.close() Dockerfile FROM mcr.microsoft.com/azure-functions/python:4-python3.10-core-tools
WORKDIR /root
# Download and install unixodbc 2.3.12
RUN curl -sO https://www.unixodbc.org/unixODBC-2.3.12.tar.gz
RUN gunzip unixODBC*.tar.gz
RUN tar xvf unixODBC*.tar
WORKDIR /root/unixODBC-2.3.12
RUN ./configure
RUN make
RUN make install
WORKDIR /root
RUN apt-get update
# Install dependencies of msodbcsql18 except unixodbc
RUN apt-get install -y libc6 libstdc++6 libkrb5-3 openssl debconf odbcinst
RUN curl -s https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc
RUN curl -s https://packages.microsoft.com/config/debian/11/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
# Install msodbcsql18 ignoring unixodbc dependency
RUN apt-get download msodbcsql18
RUN ACCEPT_EULA=Y dpkg --ignore-depends=unixodbc -i msodbcsql18_18.3.2.1-1_amd64.deb
RUN pip install --upgrade pip --user
RUN pip install pyodbc
ENV PYTHONUNBUFFERED=TRUE
ADD check_pooling.py ./ (My Linux skills are quite sloppy, please let me know if there is more optimal way to install/update to 2.3.12.)
You will see that connection timestamps will differ when changing the argument true/false. Also, if you are on a slow/remote connection, you will notice a connection delay with "false", indicating that pooling is disabled, whereas with "true" they are obtained instantly. For comparison, on version 2.3.11, which is installed through the package manager, FROM mcr.microsoft.com/azure-functions/python:4-python3.10-core-tools
RUN curl -s https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc
RUN curl -s https://packages.microsoft.com/config/debian/11/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install -y msodbcsql18
# Uncomment the follwing lines to enable pooling:
#RUN mv /etc/odbcinst.ini /etc/odbcinst.old
#RUN echo "[ODBC]\nPooling=Yes\n" | cat - /etc/odbcinst.old > /etc/odbcinst.ini
RUN pip install --upgrade pip --user
RUN pip install pyodbc
ENV PYTHONUNBUFFERED=TRUE
WORKDIR /app
ADD check_pooling.py ./ you will see that the value of With that said, it looks like the current issue may be closed, because with release of version 2.3.12 the behavior of |
Related unixODBC thread, including discussion about multiprocessing |
^ That's a useful thread for understanding how multiprocessing can interfere with pooling, but IMO it is different from the above issue where unixodbc essentially disobeyed pyodbc's instruction to use pooling. |
Hello @gordthompson thanks for your analysis. import os
import sys
import pyodbc
import threading
SERVER = os.environ['SERVER']
DATABASE = os.environ['DATABASE']
USERNAME = os.environ['USERNAME']
PASSWORD = os.environ['PASSWORD']
CONNECTION_STR = f'DRIVER={{ODBC Driver 18 for SQL Server}};SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD}'
QUERY = "select MAX(login_time) as login_time from sys.dm_exec_sessions where login_name = ? and program_name like 'python%'"
pyodbc.pooling = sys.argv[1].lower() == 'true'
def main_func(iteration=-1):
connection = pyodbc.connect(CONNECTION_STR)
cursor = connection.cursor()
cursor.execute(QUERY, USERNAME)
row = cursor.fetchone()
print(row[0], iteration)
connection.close()
for i in range(0, 5):
th = threading.Thread(
target=main_func,
kwargs={
"iteration": i,
},
)
th.start() Do you know how can we make this work? In pyodbc source code we have a comment saying:
However, we have not found a way of modifying it. |
The threadsafety is a read-only flag required by the DBAPI to indicate pyodbc can be used on multiple threads. The driver needs to also be threadsafe too, but pyodbc itself is. |
@aAEeLll if you wanted to tag me, I can respond. You are creating 5 threads and launch them all simultaneously. What do you expect? As they all demand a connection at the same time, the driver will have to create a connection for each. It does not seem to me a problem with threading. The connection can be reused from the pool if the previous consumer released it back to the pool. I tried my test in a scenario close to yours. I used FastAPI, which creates threads, too, in case of synchronous methods. I don't think it will be much different from Flask. import time
import pyodbc
import fastapi
import uvicorn
from threading import get_ident
from constants import CONNECTION_STR, QUERY, USERNAME
app = fastapi.FastAPI()
@app.get('/cp')
def cp():
connection = pyodbc.connect(CONNECTION_STR)
cursor = connection.cursor()
cursor.execute(QUERY, USERNAME)
row = cursor.fetchone()
connection.close()
time.sleep(3)
return get_ident(), row[0]
if __name__ == '__main__':
uvicorn.run(app, host='0.0.0.0') Test:
I use a pause of 3 s to have time to issue new requests before the previous one ends, so that the framework has to create new threads to serve them (otherwise it will be one thread). I issue 4 request in parallel quite quickly with Note the connection 22.360000, which was reused by threads 140467347576576, 140467355969280, and 140467372054272. Timing is crucial, though, the repeated request after a release must be within quite a short timeframe, like around 0.5 s or less. Using my previous script posted in February I repeatedly get a new connection every time time if I insert a pause of 0.9 s or more. It looks like the connection timeout is very short in unixODBC. |
Thanks both for the clarification! 😃 This is the content of
|
Environment
Issue
I am refactoring some old code to handle SQL connection in a more consistent manner. Now, I have read about connection pooling and I wanted to check whether this was supported. In order to check this, I have run the following code:
which results in
Is this the correct way of testing whether connection pooling is being used? If so, does this in fact indicate that this is not the case here?
My
odbcinst.ini
file contains the following:Any help would be greatly appreciated. Thanks!
The text was updated successfully, but these errors were encountered: