-
-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid inferring type varchar #1188
Avoid inferring type varchar #1188
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See specific comment regarding the DAG.
db/columns/operations/infer_types.py
Outdated
@@ -15,6 +15,7 @@ | |||
MAX_INFERENCE_DAG_DEPTH = 100 | |||
|
|||
TYPE_INFERENCE_DAG = { | |||
base.PostgresType.TEXT.value: [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this is set up, you'll never infer a column to be any type except TEXT
, if it starts as TEXT
. You need to put TEXT
upstream of all other types (i.e., where String
is now).
You need to change the reverse type map function to avoid an infinite loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this is set up, you'll never infer a column to be any type except TEXT, if it starts as TEXT.
@mathemancer I'm fuzzy about how this works too, and I'm surprised that this didn't trigger a test failure. If I understand this correctly, we should update our test suite to cover this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. So i need to add base.PostgresType.TEXT.value
as first value in base.STRING
and then change the reverse_type_function()
to avoid an infinite loop.
Please confirm it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate what _get_reverse_type_map()
function is doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the type inference algorithm works is:
- reflect a column.
- get its type.
- map that type to a string representation through
_get_reverse_type_map
. - Look in the
TYPE_INFERENCE_DAG
dictionary for that string as a key - Get the list value associated with that key.
- Attempt to cast the column to the type represented by each list value in turn.
- If all fail, you're done (leave the column as-is) if you succeed, go to (1).
@dmos62 Actually, there is a test that checks this, and it's failing, but for the wrong reasons. 💩 I interpreted this failure incorrectly in previous comments. It's actually failing just because it wasn't updated to the new regime where the default column type is TEXT
.
We treat all string-like types as the same, except for the extra-restrictive CHAR
. This is what the little dictionary update in _get_reverse_type_map
is for. It maps all those types to the same string representation.
@vrutik2809 You should try to accomplish the following:
If a column is of type TEXT
, in the database, it should have every type as a downstream option. Columns of type TEXT
in the database will have an SQLAlchemy type of either sqlalchemy.TEXT
or sqlalchemy.Text
. (The former when reflected, and the latter when freshly created in the Python layer). To represent this, I prefer replacing the base.STRING
key with base.PostgresType.TEXT.value
in the TYPE_INFERENCE_DAG
, and then modifying the _get_reverse_type_map
function to point all of the python types Text
, TEXT
, and VARCHAR
(as imported in that file) to base.PostgresType.TEXT.value
in the reverse type map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I will look into that.
Thank you
`String` to `Text`
from: `string` to `text`
…avoid-inferring-type-VARCHAR
…tik2809/mathesar into avoid-inferring-type-VARCHAR
These tests are failing:
The 1st one I got it that it is failing because the default is set to |
@vrutik2809 For many of the tests you listed, the data is being tested against def test_table_patch_columns_invalid_type_with_name(create_data_types_table, client, engine_email_type):
table_name = 'PATCH columns 15'
table = create_data_types_table(table_name)
column_data = _get_data_types_column_data()
column_data[1]['name'] = 'hello'
column_data[3]['type'] = 'BOOLEAN'
body = {
'columns': column_data
}
response = client.patch(f'/api/db/v0/tables/{table.id}/', body)
response_json = response.json()
assert response.status_code == 400
assert 'Pizza is not a boolean' in response_json[0]['message']
current_table_response = client.get(f'/api/db/v0/tables/{table.id}/')
# The table should not have changed
original_column_data = _get_data_types_column_data()
_check_columns(current_table_response.json()['columns'], original_column_data) For example, see the penultimate line of this test, which is getting |
…avoid-inferring-type-VARCHAR
- Change test types from `VARCHAR` to `TEXT`
Codecov Report
@@ Coverage Diff @@
## master #1188 +/- ##
=======================================
Coverage 93.39% 93.39%
=======================================
Files 113 113
Lines 4332 4332
=======================================
Hits 4046 4046
Misses 286 286
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. Thank you for your effort on this, @vrutik2809 !
Fixes #1121
While importing the data from .csv file, previously default datatype for the string was
VARCHAR
but now it is changed toTEXT
(check the Screenshot)Screenshots
Checklist
Update index.md
).master
branch of the repositoryvisible errors.
Developer Certificate of Origin
Developer Certificate of Origin