-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Change get_table_names/get_view_names return type #22085
chore: Change get_table_names/get_view_names return type #22085
Conversation
6e49fc4
to
e7c73d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some besserwisser-comments
superset/models/core.py
Outdated
@@ -556,13 +556,17 @@ def get_all_table_names_in_schema( # pylint: disable=unused-argument | |||
:param cache: whether cache is enabled for the function | |||
:param cache_timeout: timeout in seconds for the cache | |||
:param force: whether to force refresh the cache | |||
:return: list of tables | |||
:return: set of tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code style: Some people consider including the type in the return description bad practice if it's already typed in the sig. I'm not sure which camp I belong to (it varies from day to day), but maybe we could rather try to be more expressive regarding the funny Tuple[str, str]
thingy, something like
:return: table name and schema pairs
base_result = BaseEngineSpec.get_table_names( | ||
database=mock.ANY, schema="schema", inspector=inspector | ||
) | ||
self.assertListEqual(base_result_expected, base_result) | ||
self.assertSetEqual(base_result_expected, base_result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we just do this as that's what appears to be preferred these days?
assert base_result_expected == base_result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I didn't know if unittest supported this.
I don't understand why you have to return set. Shouldn't DBAPI return unique table names to begin with? |
@ktmud regarding your comment,
Yes the DB-API will return a unique set of table names so the set logic isn't for deduping purposes (except for when we need to do set differing for determining the tables vs. views). Personally I think—from reading the code—that the type hints help one grok the logic of the code better, i.e., when I see |
Codecov Report
@@ Coverage Diff @@
## master #22085 +/- ##
===========================================
+ Coverage 52.50% 67.02% +14.51%
===========================================
Files 1818 1819 +1
Lines 69632 69555 -77
Branches 7496 7496
===========================================
+ Hits 36562 46618 +10056
+ Misses 31132 20999 -10133
Partials 1938 1938
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Set has the additional memory cost of keeping the hashtable of elements: https://towardsdatascience.com/memory-efficiency-of-common-python-data-structures-88f0f720421 Maybe not a big deal in this case but in general I don't think we should sacrifice performance for an implied convention for readability that is not very obvious. Regarding readability, in the case where sets are used, people would wonder "why this HAS to be a set" (like I did), which actually makes the code more confusing to them. Also, using sets to collect data that is already supposed to be unique may hide duplicates that were erroneously generated, hiding a potential performance issue or bug. |
SUMMARY
After working on #21794 I realized that the
get_table_names
andget_view_names
methods should really return a set of names rather than an ordered list. The results can then be sorted—if needed—within the API response. Hopefully this refactor simplifies the code logic.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
CI.
ADDITIONAL INFORMATION