fix(dataset-api): get_or_create creates a dataset for an existing table_name but different schema #30379
+79
−2
SUMMARY
At Pinterest we were trying to use the `get_or_create` endpoint to automate the integration between our MetricsLayer and Superset. During our tests we encountered the following issue: #30377

The issue happens because, at the moment, the `get_or_create` code does not account for the payload `schema` attribute and searches by `table_name` only.

This PR changes `get_or_create` to take the `schema` into account:
- `get_table_by_schema_and_name` was added to the `DatasetDAO` class
- `get_or_create` now checks if the dataset exists by calling `DatasetDAO.get_table_by_schema_and_name` (instead of `DatasetDAO.get_table_by_name`)

TESTING INSTRUCTIONS
Case 1 - False Positive
- [...] `users` [...] datasets)
- [...] `200` pointing to the new dataset. No false positives anymore.

Case 2 - Internal Server Error
- [...] `table_name` and different `schemas` (either via UI or create dataset API)
- [...] `table_name` but a different schema. Payload example: [...]
- [...] `200` pointing to the new dataset. No 500 errors anymore.

Backward Compatibility
- [...] `200` with the response body pointing to the existing dataset.

ADDITIONAL INFORMATION
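
The lookup change described in the summary can be illustrated with a small, self-contained sketch. This is not the Superset code: `InMemoryDatasetDAO`, the `Dataset` dataclass, the method signatures, and the standalone `get_or_create` function are hypothetical stand-ins, and raising on multiple name-only matches is an assumption about how the 500 in Case 2 could arise. Only the two method names come from this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    # Hypothetical, simplified dataset record (not Superset's model).
    id: int
    database_id: int
    schema: Optional[str]
    table_name: str


class InMemoryDatasetDAO:
    """Hypothetical stand-in for DatasetDAO, backed by a plain list."""

    def __init__(self) -> None:
        self._rows: List[Dataset] = []

    def create(self, database_id: int, schema: Optional[str], table_name: str) -> Dataset:
        row = Dataset(len(self._rows) + 1, database_id, schema, table_name)
        self._rows.append(row)
        return row

    def get_table_by_name(self, database_id: int, table_name: str) -> Optional[Dataset]:
        # Name-only lookup: the schema is ignored entirely.
        matches = [
            r for r in self._rows
            if r.database_id == database_id and r.table_name == table_name
        ]
        if len(matches) > 1:
            # Assumption: an ambiguous match surfaces as a server error.
            raise LookupError("multiple datasets share this table_name")
        return matches[0] if matches else None

    def get_table_by_schema_and_name(
        self, database_id: int, schema: Optional[str], table_name: str
    ) -> Optional[Dataset]:
        # Schema-aware lookup: all three fields must match.
        for r in self._rows:
            if (r.database_id, r.schema, r.table_name) == (database_id, schema, table_name):
                return r
        return None


def get_or_create(
    dao: InMemoryDatasetDAO, database_id: int, schema: Optional[str], table_name: str
) -> Dataset:
    # Return the dataset matching schema and name, or create a new one.
    existing = dao.get_table_by_schema_and_name(database_id, schema, table_name)
    if existing is not None:
        return existing
    return dao.create(database_id, schema, table_name)


if __name__ == "__main__":
    dao = InMemoryDatasetDAO()
    prod = dao.create(1, "prod", "users")
    # A name-only lookup would hand back the "prod" dataset for a "dev" request.
    assert dao.get_table_by_name(1, "users") is prod
    # The schema-aware path creates a separate dataset instead.
    dev = get_or_create(dao, 1, "dev", "users")
    assert dev is not prod
    # Repeating the request returns the existing dataset.
    assert get_or_create(dao, 1, "dev", "users") is dev
```

In this sketch, a request for an existing schema and table pair still returns the existing record, which mirrors the backward-compatibility case above.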