fix(security): enforce datasource access control in get_samples() #36550
SUMMARY
This PR fixes a security vulnerability (issue #31944) where users with the "can samples on Datasource" permission could read data samples from datasets they do not have access to.
Root Cause:
The get_samples() function in superset/views/datasource/utils.py was creating QueryContext instances and calling get_payload() directly without first enforcing access control through raise_for_access(). This allowed users who only had the "can samples on Datasource" permission to bypass datasource-level security checks and read samples from datasets they shouldn't have access to.
Fix:
Added raise_for_access() calls on both the samples and count_star query contexts before fetching any data. This ensures users must have proper datasource access (schema access, datasource access, or ownership) before samples can be retrieved.
Code Change:
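The check-then-fetch ordering described in the fix can be illustrated with a minimal, self-contained sketch. This is not the PR's actual diff: FakeQueryContext, AccessDenied, and get_samples_sketch are hypothetical stand-ins, and only the method names raise_for_access() and get_payload() come from the description above.

```python
class AccessDenied(Exception):
    """Hypothetical stand-in for the security exception Superset would raise."""


class FakeQueryContext:
    """Hypothetical stand-in for QueryContext; not Superset's real class."""

    def __init__(self, name, user_has_access, calls):
        self.name = name
        self.user_has_access = user_has_access
        self.calls = calls  # shared log so the call order can be inspected

    def raise_for_access(self):
        self.calls.append(f"{self.name}.raise_for_access")
        if not self.user_has_access:
            raise AccessDenied(f"no datasource access for {self.name} query")

    def get_payload(self):
        self.calls.append(f"{self.name}.get_payload")
        return {"queries": [{"data": [{"col": 1}]}]}


def get_samples_sketch(samples_context, count_star_context):
    """Enforce access on both contexts before any data is fetched."""
    samples_context.raise_for_access()
    count_star_context.raise_for_access()

    # Only reached when both access checks passed.
    samples = samples_context.get_payload()
    count = count_star_context.get_payload()
    return {"samples": samples, "count": count}
```

In this sketch, a failed check raises before either get_payload() call runs, so no sample data is produced for a user without datasource access.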
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A - This is a security fix with no UI changes.
TESTING INSTRUCTIONS
/datasource/samples?datasource_id=<id>&datasource_type=table and retrieve samples
Unit Tests:
The PR includes comprehensive unit tests that verify:
raise_for_access() is called before fetching data
ADDITIONAL INFORMATION
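For the ordering check listed under Unit Tests, a test along the following lines is one possibility. It is a sketch using the standard library's unittest.mock, not the PR's actual test code: fetch_with_access_check is a hypothetical helper that mirrors the check-then-fetch order, and PermissionError stands in for whatever exception the real access check raises.

```python
from unittest.mock import MagicMock, call


def fetch_with_access_check(query_context):
    """Hypothetical helper mirroring the check-then-fetch order."""
    query_context.raise_for_access()
    return query_context.get_payload()


def check_access_is_enforced_first():
    ctx = MagicMock()
    ctx.get_payload.return_value = {"queries": []}
    result = fetch_with_access_check(ctx)
    # mock_calls records method calls on the mock in the order they happened.
    assert ctx.mock_calls == [call.raise_for_access(), call.get_payload()]
    return result


def check_denied_access_skips_fetch():
    ctx = MagicMock()
    # Assumption: a denied check surfaces as an exception (PermissionError here).
    ctx.raise_for_access.side_effect = PermissionError("denied")
    try:
        fetch_with_access_check(ctx)
    except PermissionError:
        pass
    else:
        raise AssertionError("expected the access check to raise")
    # The payload must never be fetched when the check fails.
    ctx.get_payload.assert_not_called()
    return True
```

Comparing against mock_calls checks both that the access check ran and that it ran before the fetch, which a plain assert_called_once() on each method would not establish.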
🤖 Generated with Claude Code