
Buffer overflow in cursor create_name_map #931

Merged: 13 commits, Aug 16, 2021

Conversation

@juliuszsompolski (Contributor) commented Jul 14, 2021

Function create_name_map used a static buffer szName[300] for the column name in the call to SQLDescribeColW. That function truncates the column name to the size of the provided buffer, but returns the actual length of the column name in the &cchName output parameter.
That cchName output was used to compute the cbName parameter for the call to TextBufferToObject(enc, szName, cbName). When the actual column name was longer than the buffer, cbName was too big, causing a buffer overflow that resulted in crashes, invalid data, or errors.
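To make the failure mode concrete, here is a small self-contained C++ sketch of the pattern described above. describe_col is a stand-in for SQLDescribeColW's truncate-but-report-full-length contract, and the 2-byte ODBCCHAR mirrors pyodbc's wide-character type; this is an illustration, not the actual pyodbc code.

    #include <algorithm>
    #include <cstdio>
    #include <cstring>

    typedef unsigned short ODBCCHAR;   // 2-byte wide char, as in pyodbc

    // Stand-in for SQLDescribeColW's documented contract: copy the name
    // truncated to the buffer, but report the FULL name length.
    static void describe_col(const ODBCCHAR* fullName, int fullLen,
                             ODBCCHAR* buf, int bufLen, int* cchName)
    {
        int n = std::min(fullLen, bufLen - 1);
        std::memcpy(buf, fullName, n * sizeof(ODBCCHAR));
        buf[n] = 0;                    // null-terminate the truncated copy
        *cchName = fullLen;            // actual length, not truncated length
    }

    int main()
    {
        ODBCCHAR fullName[500];        // a 500-character column name
        for (int i = 0; i < 500; ++i)
            fullName[i] = 'a';

        ODBCCHAR szName[300];          // the old static buffer
        int cchName = 0;
        describe_col(fullName, 500, szName, 300, &cchName);

        size_t cbName = cchName * sizeof(ODBCCHAR);   // 1000 bytes...
        // ...but szName holds only 600 bytes, so decoding cbName bytes from
        // szName (as the TextBufferToObject call did) reads 400 bytes past
        // the end of the buffer.
        std::printf("buffer: %zu bytes, decode request: %zu bytes\n",
                    sizeof(szName), cbName);
        return 0;
    }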

Fix this by making szName a dynamically allocated buffer. This buffer overflow is very similar to the one fixed in #881. I checked through the rest of the code base and don't see any other places with static buffers that could have a similar issue.

Other systems do not support such long column names. Because I couldn't test with them, I adapted the python3 test suite to run on Spark and added the test there.

$ python tests3/sparktests.py DSN=DatabricksAzureWestEurope
Library: /home/julek/dev/pyodbc/build/lib.linux-x86_64-3.7/pyodbc.cpython-37m-x86_64-linux-gnu.so
----------------------------------------------------------------------
Ran 58 tests in 557.160s

OK

@juliuszsompolski (Contributor, Author)

cc @mkleehammer @keitherskine This is a very similar buffer overflow to the one fixed in #881. We encountered it with the Simba Spark ODBC driver.

Comment on the diff:

    PyErr_NoMemory();
    goto done;
    }
    goto retry;
@juliuszsompolski (Contributor, Author):

I think goto is justified here to avoid repeating that logic...
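For context, a minimal self-contained sketch of the shape under discussion, with hypothetical names (grow stands in for pyodbc_realloc; the real diff differs): both the failure path and the retry jump to labels, so the cleanup and re-describe logic is written only once.

    #include <cstdio>
    #include <cstdlib>

    // Stand-in for pyodbc_realloc: grow *buf, return false on failure.
    static bool grow(char** buf, size_t newSize)
    {
        char* p = (char*)std::realloc(*buf, newSize);
        if (!p)
            return false;
        *buf = p;
        return true;
    }

    int main()
    {
        size_t cap  = 8;
        size_t need = 0;
        char* buf = (char*)std::malloc(cap);
        if (!buf)
            return 1;

    retry:
        // "Describe" step: the driver reports that 32 bytes are really needed.
        need = 32;
        if (need >= cap) {
            if (!grow(&buf, need + 1)) {
                std::puts("out of memory");   // PyErr_NoMemory() in the real code
                goto done;                    // single cleanup path
            }
            cap = need + 1;
            goto retry;                       // redo the describe with a larger buffer
        }
        std::printf("name fits in %zu-byte buffer\n", cap);

    done:
        std::free(buf);
        return 0;
    }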

Comment on lines +185 to +186:

    nameLen = cchName + 1;
    if (!pyodbc_realloc((BYTE**) &szName, (nameLen + 1) * sizeof(ODBCCHAR))) {
@juliuszsompolski (Contributor, Author):

As in my previous PR, I prefer a defensive double +1 here: better a slightly-too-big buffer, with space for an extra character (such as a termination character), than a too-small one.
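For concreteness, the sizing arithmetic from the two lines above, assuming ODBCCHAR is 2 bytes and the driver reports a 500-character name:

    // cchName = 500                      (reported name length, no terminator)
    // nameLen = cchName + 1 = 501        (first +1: room for a terminator)
    // alloc   = (nameLen + 1) * sizeof(ODBCCHAR)
    //         = 502 * 2 = 1004 bytes     (second +1: one extra slack character)
    //
    // At most two ODBCCHARs are wasted; the buffer can never be too small.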

@v-chojas (Contributor)

I am curious to learn what databases will let you use such long column names, and what demand there is for that capability in the first place -- 300 already seems very generous. I can understand that error messages can be extremely verbose, but a column name so long doesn't seem like it would make sense.

SQL Server has a maximum of 128. The first paragraph of my comment is 308 characters long.

@juliuszsompolski (Contributor, Author) commented Jul 14, 2021

@v-chojas Apache Spark allows it, and the Simba ODBC driver for Spark allows it.
We are hitting it in our internal testing of Simba ODBC, and Redash, which uses pyodbc, hits it too.
When a complex expression is specified in Spark and no alias is given to it, Spark autogenerates the column name from the full expression text (e.g. an un-aliased CASE WHEN ... END yields a column named after the entire expression), which can easily overflow 300 characters.
It can be argued that this behaviour of Spark is stupid, but changing it there would be a bigger change.

@juliuszsompolski (Contributor, Author)

@v-chojas the tests I added show that MS Access, Informix, SQLite and SQL DW also allow it.
SQL Server, MySQL and PostgreSQL did not allow it, so I removed the tests for them.

@v-chojas (Contributor)

MS Access has a limit of 64, and SQL DW is based on SQL Server, so I would expect the same limit to apply there.

My research shows that Informix also has 128, and SQLite is the only one that claims "no limit".

juliuszsompolski added a commit to databricks/pyodbc that referenced this pull request Jul 15, 2021
@juliuszsompolski (Contributor, Author)

@v-chojas The tests added here suggest that they are nevertheless happy to create a table with column names 500 and 600 characters long, query it, and return such long column names.
I am not proposing this change because I'm on a nitpicking crusade to hunt irrelevant buffer overflows, but because our users have been hitting this and experiencing crashes, mangled column names, or UTF encoding/decoding errors. We were asked for an urgent fix, so tentatively I forked pyodbc and created https://github.com/databricks/pyodbc/releases/tag/v20210715. We do, however, want to work with upstream to get our changes and fixes into the official pyodbc releases.

@v-chojas (Contributor)

I don't know how you tested, but I have access to a DW instance and...

>>> conn = pyodbc.connect('Driver=ODBC Driver 17 for SQL Server;Server=<my DW>;...')
>>> stmt = conn.cursor()
>>> colname = 'col' + ('1234567890' * 60)
>>> stmt.execute('create table test931(' + colname + ' int)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Parse Error: Identiier 'col1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890...' exceeded the maximum length of 128. (104307) (SQLExecDirectW)")
>>> stmt.execute('select @@version').fetchall()
[('Microsoft Azure SQL Data Warehouse  10.0.16003.0 Apr 28 2021 04:55:16 Copyright (c) Microsoft Corporation', )]

Comment on lines 902 to 914:

    def test_long_column_name(self):
        "ensure super long column names are handled correctly."
        c1 = 'abcdefghij' * 50
        c2 = 'klmnopqrst' * 60
        self.cursor = self.cnxn.cursor()

        self.cursor.execute("create table t1({} int, {} int)".format(c1, c2))
        self.cursor.execute("select * from t1")

        names = [ t[0] for t in self.cursor.description ]
        names.sort()

        self.assertEqual(names, [ c1, c2 ])
@juliuszsompolski (Contributor, Author):

@v-chojas I don't have access to a SQL DW instance, so I depended on the CI...
Oh, I see: https://app.travis-ci.com/github/mkleehammer/pyodbc/jobs/524518887 The CI actually runs only the SQL Server, PostgreSQL and MySQL tests...

@juliuszsompolski (Contributor, Author)

I removed the tests for Access, Informix, and SQL DW.
I tested with SQLite. What SQLite does: if a BufferLength of 300 is passed to the SQLDescribeCol function, it returns the column name truncated to 300 characters, and returns 300 in NameLengthPtr.
This does not follow the spec at https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqldescribecol-function?view=sql-server-ver15, which says that the actual column name length should be returned:

    NameLengthPtr
    [Output] Pointer to a buffer in which to return the total number of characters (excluding the null termination) available to return in *ColumnName. If the number of characters available to return is greater than or equal to BufferLength, the column name in *ColumnName is truncated to BufferLength minus the length of a null-termination character.

Therefore, it seems it can't be tested with the existing set of tests. Thanks @v-chojas for checking all these systems.
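To illustrate, a hedged sketch (not the PR's actual code) of a describe loop that keeps growing the buffer until the reported length is strictly smaller than the buffer. A spec-conformant driver needs at most one realloc, and a SQLite-style driver that reports the truncated length still terminates, because doubling eventually makes the name fit:

    #include <sql.h>
    #include <sqlext.h>
    #include <sqlucode.h>
    #include <cstdlib>

    // Hypothetical helper, not pyodbc's code: return a malloc'ed wide-char
    // column name of any length, or NULL on error.  Caller frees.
    static SQLWCHAR* describe_col_name(SQLHSTMT hstmt, SQLUSMALLINT iCol)
    {
        int bufLen = 300;
        SQLWCHAR* szName = (SQLWCHAR*)std::malloc(bufLen * sizeof(SQLWCHAR));
        if (!szName)
            return NULL;

        for (;;)
        {
            SQLSMALLINT cchName = 0, dataType = 0, decimalDigits = 0, nullable = 0;
            SQLULEN columnSize = 0;
            SQLRETURN ret = SQLDescribeColW(hstmt, iCol, szName, (SQLSMALLINT)bufLen,
                                            &cchName, &dataType, &columnSize,
                                            &decimalDigits, &nullable);
            if (!SQL_SUCCEEDED(ret)) {
                std::free(szName);
                return NULL;
            }
            // Deliberately strict test: with a conformant driver, cchName is
            // the full name length, so cchName + 1 < bufLen guarantees the
            // name and terminator fit.  A SQLite-style driver reports the
            // truncated length (bufLen, or bufLen - 1 with a terminator),
            // which also fails this test, so we keep growing until the name
            // demonstrably fit, at the cost of one extra realloc in rare
            // edge cases.
            if (cchName + 1 < bufLen)
                return szName;

            int newLen = (cchName + 2 > bufLen * 2) ? cchName + 2 : bufLen * 2;
            SQLWCHAR* tmp = (SQLWCHAR*)std::realloc(szName, newLen * sizeof(SQLWCHAR));
            if (!tmp) {
                std::free(szName);
                return NULL;
            }
            szName = tmp;
            bufLen = newLen;
        }
    }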


@mkleehammer merged commit 24e0cca into mkleehammer:master on Aug 16, 2021
@mkleehammer (Owner)

Thanks for all the work you put into this!

@mkleehammer (Owner)

Sorry everyone, I somehow merged this without the rebase. I thought I had configured GitHub to require that, or at least make it the default.
