Dataset get_or_create API fails to create a dataset with the same table_name but a different schema #30377

Open
@luizcapu

Bug description

The get_or_create Dataset API endpoint fails with 500 - Internal Server Error when more than one existing dataset has the same table_name (see Case 2), regardless of the schema sent in the request.

This is not the behaviour we observe on other endpoints, such as POST /dataset, where users can create datasets with the same table_name as long as they belong to a different database_id or a different schema.

Furthermore, if a dataset already exists for a given table_name and a user tries to create a new dataset with the same table_name but a different schema, the API returns 200 and points to the existing dataset instead of creating a new one, which is a false positive.

How to reproduce the bug

Case 1 - False Positive

  1. Go to the datasets page
  2. Pick any existing dataset name and prepare a payload as follows (this example uses the users dataset)
{
  "table_name": "users",
  "schema": "other",
  "database_id": 1
}
  3. Submit this payload via a POST request to /api/v1/dataset/get_or_create
  4. Note that the API returns 200 and points to the existing public.users dataset ID. The new dataset is not created.
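
The request in Case 1 can be sketched in Python as follows. This is a minimal sketch that only builds the request and does not send it. The base URL `http://localhost:8088`, the bearer-token authentication, and the helper name `build_get_or_create_request` are assumptions for illustration and are not part of the original report; your deployment may require different authentication or an additional CSRF token.

```python
import json
import urllib.request


def build_get_or_create_request(base_url, access_token, table_name, schema, database_id):
    """Build (but do not send) the POST request described in the steps above.

    Hypothetical helper: the base URL and bearer-token auth are assumptions.
    """
    payload = {
        "table_name": table_name,
        "schema": schema,
        "database_id": database_id,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/dataset/get_or_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# Placeholder values; replace them with a real host and token before sending.
request = build_get_or_create_request(
    "http://localhost:8088", "<access-token>", "users", "other", 1
)

# To actually send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

With a single existing public.users dataset, the report says this request returns 200 with the ID of that dataset rather than creating other.users.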

Case 2 - Internal Server Error

  1. Create two or more datasets with the same table_name and different schemas (via either the UI or the create dataset API)
  2. Try to create another dataset, again with the same table_name but a different schema. Example payload:
{
  "table_name": "users",
  "schema": "any_new_schema_name",
  "database_id": 1
}
  3. Submit this payload via a POST request to /api/v1/dataset/get_or_create
  4. Note that the API returns 500 - Internal Server Error. The new dataset is not created.
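
Both cases are consistent with a lookup that matches on database_id and table_name but ignores schema. The sketch below is not Superset's code; it is a self-contained simulation under that assumption, and every name in it (`Dataset`, `one_or_none`, `find_ignoring_schema`, `find_with_schema`, `MultipleResultsFound`) is hypothetical. It shows how such a lookup would return the wrong dataset when one match exists and raise when several exist, while a lookup that also filters on schema would report that no dataset exists yet.

```python
from typing import List, NamedTuple, Optional


class Dataset(NamedTuple):
    id: int
    database_id: int
    schema: str
    table_name: str


class MultipleResultsFound(Exception):
    """Raised when a lookup expected at most one row but found several."""


def one_or_none(matches: List[Dataset]) -> Optional[Dataset]:
    # Mimics an "at most one result" query helper.
    if len(matches) > 1:
        raise MultipleResultsFound(f"{len(matches)} rows matched")
    return matches[0] if matches else None


def find_ignoring_schema(datasets, database_id, schema, table_name):
    # Assumed buggy behaviour: the schema argument is never used.
    return one_or_none(
        [d for d in datasets
         if d.database_id == database_id and d.table_name == table_name]
    )


def find_with_schema(datasets, database_id, schema, table_name):
    # Lookup keyed on database_id, schema, and table_name together.
    return one_or_none(
        [d for d in datasets
         if d.database_id == database_id
         and d.schema == schema
         and d.table_name == table_name]
    )


# Case 1: one existing dataset, request uses a different schema.
case_1 = [Dataset(1, 1, "public", "users")]
false_positive = find_ignoring_schema(case_1, 1, "other", "users")
# false_positive is the public.users dataset, not a new other.users one.

# Case 2: two existing datasets share the table name.
case_2 = case_1 + [Dataset(2, 1, "other", "users")]
try:
    find_ignoring_schema(case_2, 1, "any_new_schema_name", "users")
    case_2_error = None
except MultipleResultsFound as exc:
    case_2_error = exc  # an unhandled error like this would surface as a 500

# With the schema in the lookup key, neither request matches an existing
# dataset, so a get_or_create flow would go on to create a new one.
expected_1 = find_with_schema(case_1, 1, "other", "users")
expected_2 = find_with_schema(case_2, 1, "any_new_schema_name", "users")
```

If the real cause matches this assumption, including schema in the lookup would address both cases at once.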

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Labels

  • api (Related to the REST API)
  • data:dataset (Related to dataset configurations)