Efficiently calculate list things a user can do / list of users who can do a thing #1152

Open
Tracked by #2333
simonw opened this issue Dec 18, 2020 · 16 comments

Comments

@simonw
Owner

simonw commented Dec 18, 2020

The homepage currently performs a massive flurry of permission checks - one for each database, table and view: https://github.com/simonw/datasette/blob/0.53/datasette/views/index.py#L21-L75

A paginated version of this is a little daunting, as the permission checks would have to be carried out against every single table just to calculate the count that will be paginated.

Originally posted by @simonw in #1150 (comment)

UPDATE 25th April 2024: This should also cover efficient lookup of "what users are allowed to view this table / do this thing?"

@simonw
Owner Author

simonw commented Dec 18, 2020

This is a classic challenge in permissions systems. If I want Datasette to be able to handle thousands of tables I need a reasonable solution for it.

Twitter conversation: https://twitter.com/simonw/status/1339791768842248192

@simonw
Owner Author

simonw commented Dec 18, 2020

One enormous advantage I have is that after #1150 I will have a database table full of databases and tables that I can execute queries against.

This means I could calculate visible tables using SQL where clauses, which should be easily fast enough even against ten thousand plus tables.
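
A minimal sketch of that idea, using Python's sqlite3 module. The catalog_tables table, its columns and the visible_tables() / count_visible_tables() helpers are hypothetical stand-ins for illustration, not the schema or API that #1150 introduces:

```python
import sqlite3

# Hypothetical stand-in for the in-memory catalog of databases and tables.
# Table and column names are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE catalog_tables (database_name TEXT, table_name TEXT);
    INSERT INTO catalog_tables VALUES
        ('public', 'articles'),
        ('public', 'authors'),
        ('private', 'salaries');
    """
)


def visible_tables(conn, where_sql, params=()):
    # where_sql is assumed to come from trusted permission configuration,
    # never from user input; values are passed as bound parameters.
    sql = (
        "SELECT database_name, table_name FROM catalog_tables "
        f"WHERE {where_sql} ORDER BY database_name, table_name"
    )
    return conn.execute(sql, params).fetchall()


def count_visible_tables(conn, where_sql, params=()):
    # The count needed for pagination is a single query,
    # rather than one permission check per table.
    sql = f"SELECT count(*) FROM catalog_tables WHERE {where_sql}"
    return conn.execute(sql, params).fetchone()[0]


rows = visible_tables(conn, "database_name = ?", ("public",))
total = count_visible_tables(conn, "database_name = ?", ("public",))
```

The same where clause feeds both the page of results and the count, which is the part that per-table checks make expensive.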

The catch is the permissions hooks. Since I haven't hit Datasette 1.0 yet maybe I should redesign those hooks to work against the new in-memory database schema stuff?

@simonw
Owner Author

simonw commented Dec 18, 2020

What would Datasette's permission hooks look like if they all dealt with sets of items rather than individual items? So plugins could return a set of items that the user has permission to access, or even a WHERE clause?

@simonw
Owner Author

simonw commented Dec 18, 2020

Perhaps this can be solved by keeping the existing plugin hooks and adding new, optional ones for bulk lookups.

If your plugin doesn't implement the bulk lookup hooks Datasette will do an inefficient loop through everything checking permissions on each one.

If you DO implement it you can speed things up dramatically.

Not sure if this would solve the homepage problem though, where you might need to run 1,000 table permission checks. That's more a case where you want to think in terms of a SQL where clause.
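
The fallback described above could look roughly like this. It is a sketch only: the plugin classes and the allowed() / allowed_bulk() method names are hypothetical, not existing Datasette hooks.

```python
RESOURCES = [
    ("public", "articles"),
    ("public", "authors"),
    ("private", "salaries"),
]


class PerItemPlugin:
    """Hypothetical plugin that only implements the one-at-a-time check."""

    def __init__(self):
        self.calls = 0

    def allowed(self, actor, action, resource):
        self.calls += 1
        database, _table = resource
        return database == "public"


class BulkPlugin:
    """Hypothetical plugin that also implements the optional bulk lookup."""

    def __init__(self):
        self.calls = 0

    def allowed(self, actor, action, resource):
        self.calls += 1
        return resource[0] == "public"

    def allowed_bulk(self, actor, action, resources):
        self.calls += 1
        return {r for r in resources if r[0] == "public"}


def filter_allowed(plugin, actor, action, resources):
    resources = list(resources)
    bulk = getattr(plugin, "allowed_bulk", None)
    if bulk is not None:
        # Fast path: one call covers every resource.
        return set(bulk(actor, action, resources))
    # Slow path: loop through everything, checking each resource in turn.
    return {r for r in resources if plugin.allowed(actor, action, r)}


slow, fast = PerItemPlugin(), BulkPlugin()
slow_result = filter_allowed(slow, None, "view-table", RESOURCES)
fast_result = filter_allowed(fast, None, "view-table", RESOURCES)
```

Both plugins produce the same set here, but the per-item plugin is called once per resource while the bulk plugin is called once in total.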

@simonw
Owner Author

simonw commented Dec 18, 2020

I want to keep the existing metadata.json "allow" blocks mechanism working. Note that if you have 1,000 tables and a permissions policy you won't be using "allow" blocks, you'll be using a more sophisticated permissions plugin instead.

@simonw
Owner Author

simonw commented Dec 18, 2020

Could I solve this using a configured canned query against the _internal tables with the actor's properties as inputs?

@simonw
Owner Author

simonw commented Dec 18, 2020

Redefining all Datasette permissions in terms of SQL queries that return the set of databases and tables that the user is allowed to interact with does feel VERY Datasette-y.

@simonw
Owner Author

simonw commented Dec 18, 2020

It's also a really good fit for the new mechanism that's coming together in #1150.

@simonw
Owner Author

simonw commented Dec 18, 2020

Another permissions thought: what if ALL Datasette permissions were default-deny, and plugins could only grant permission to things, not block permission?

Right now a plugin can reply False to block, True to allow or None for "I have no opinion on this, ask someone else" - but even I'm confused by the interactions between block and allow and I implemented the system!

If everything in Datasette was default-deny then the user could use --public-view as an option when starting the server to default-allow view actions.

More importantly: plugins could return SQL statements that select a list of databases/tables the user is allowed access to. These could then be combined with UNION to create a full list of available resources.
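
A rough sketch of the UNION approach under default-deny. Everything named here (catalog_tables, table_owners, the is_public column, the :actor_id parameter and the allowed_tables() helper) is an assumption made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema, not Datasette's actual internal tables.
    CREATE TABLE catalog_tables (
        database_name TEXT, table_name TEXT, is_public INTEGER
    );
    CREATE TABLE table_owners (
        database_name TEXT, table_name TEXT, actor_id TEXT
    );
    INSERT INTO catalog_tables VALUES
        ('content', 'articles', 1),
        ('content', 'drafts', 0),
        ('hr', 'salaries', 0);
    INSERT INTO table_owners VALUES ('content', 'drafts', 'alice');
    """
)

# Each hypothetical plugin contributes one SELECT that returns
# (database_name, table_name) rows it wants to grant access to.
PLUGIN_SQL = [
    "SELECT database_name, table_name FROM catalog_tables WHERE is_public = 1",
    "SELECT database_name, table_name FROM table_owners "
    "WHERE actor_id = :actor_id",
]


def allowed_tables(conn, fragments, actor_id):
    if not fragments:
        # Default-deny: with no grants from any plugin, nothing is visible.
        return []
    # UNION removes duplicates when several plugins grant the same table.
    sql = " UNION ".join(fragments) + " ORDER BY 1, 2"
    return conn.execute(sql, {"actor_id": actor_id}).fetchall()


alice_tables = allowed_tables(conn, PLUGIN_SQL, "alice")
bob_tables = allowed_tables(conn, PLUGIN_SQL, "bob")
```

Because plugins can only add rows to the union, there is no block-versus-allow interaction to reason about.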

@simonw
Owner Author

simonw commented Dec 22, 2020

#1150 has landed now, which means there's a new, hidden _internal SQLite in-memory database containing all of the tables and databases.

@simonw
Owner Author

simonw commented Jan 4, 2021

I think the way to do this is to have a new plugin hook that returns two SQL where clauses: one returning a list of resources that the user should be able to access (the allow-list) and one returning a list of resources they are explicitly forbidden from accessing (the deny-list). Either of these can be blank.

Datasette can then combine those into a full SQL query and use it to answer the question "show me a list of resources that the user is allowed to perform action X on". It can also answer the existing question: "is user X allowed to perform action Y on resource Z?"
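
One way the two where clauses might be combined, as a sketch. The catalog_tables schema and the helper names are hypothetical, and treating a blank allow-list as "grants nothing" is an assumption of this example rather than something decided above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical catalog; names are assumptions for this sketch.
    CREATE TABLE catalog_tables (database_name TEXT, table_name TEXT);
    INSERT INTO catalog_tables VALUES
        ('content', 'articles'),
        ('content', 'drafts'),
        ('hr', 'salaries');
    """
)


def build_query(allow_where, deny_where):
    # Assumption: a blank allow-list grants nothing and a blank
    # deny-list forbids nothing. "0" is SQL for "matches no rows".
    allow = allow_where or "0"
    deny = deny_where or "0"
    return (
        "SELECT database_name, table_name FROM catalog_tables "
        f"WHERE ({allow}) AND NOT ({deny})"
    )


def list_allowed(conn, allow_where, deny_where, params):
    sql = build_query(allow_where, deny_where) + " ORDER BY 1, 2"
    return conn.execute(sql, params).fetchall()


def is_allowed(conn, allow_where, deny_where, params, database, table):
    # The single-resource check reuses the same query as a subquery.
    sql = (
        "SELECT EXISTS (SELECT 1 FROM ("
        + build_query(allow_where, deny_where)
        + ") WHERE database_name = :check_db AND table_name = :check_table)"
    )
    bound = {**params, "check_db": database, "check_table": table}
    return bool(conn.execute(sql, bound).fetchone()[0])


ALLOW = "database_name = :allowed_db"
DENY = "table_name = 'drafts'"
PARAMS = {"allowed_db": "content"}

allowed = list_allowed(conn, ALLOW, DENY, PARAMS)
can_view_articles = is_allowed(conn, ALLOW, DENY, PARAMS, "content", "articles")
can_view_drafts = is_allowed(conn, ALLOW, DENY, PARAMS, "content", "drafts")
```

The deny-list wins over the allow-list in this sketch, so a resource matching both clauses is excluded.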

@simonw
Owner Author

simonw commented Dec 27, 2021

Another option: rethink permissions to always work in terms of where clauses used as part of a SQL query that returns the overall allowed set of databases or tables. This would require rethinking existing permissions but it might be worthwhile prior to 1.0.

@simonw
Owner Author

simonw commented Apr 25, 2024

One option: keep the existing permission_allowed() hook (since dozens of plugins use it already), but offer a new plugin hook which uses the SQL-level permissions instead. If your plugin implements that hook then it will be shown in lists of users-who-can-do-x and things-user-x-can-do AND you'll get the permission_allowed() behavior for free.

That way plugins that don't upgrade will remain secure, while plugins that DO upgrade will gain extra functionality.

@simonw
Owner Author

simonw commented Apr 25, 2024

Relevant quote: https://simonwillison.net/2024/Apr/16/wkirby-on-hacker-news/

Permissions have three moving parts, who wants to do it, what do they want to do, and on what object. Any good permission system has to be able to efficiently answer any permutation of those variables. Given this person and this object, what can they do? Given this object and this action, who can do it? Given this person and this action, which objects can they act upon?

wkirby on Hacker News
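
To make the three permutations concrete, here is a small sketch against a hypothetical grants table of (actor, action, resource) rows. Materializing permissions in a single table like this is an assumption of the example, not a design settled in this issue:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table: one row per (who, what action, which object).
    CREATE TABLE grants (
        actor_id TEXT,
        action TEXT,
        resource TEXT,
        PRIMARY KEY (actor_id, action, resource)
    );
    INSERT INTO grants VALUES
        ('alice', 'view-table', 'content/articles'),
        ('alice', 'insert-row', 'content/articles'),
        ('bob', 'view-table', 'content/articles'),
        ('alice', 'view-table', 'hr/salaries');
    """
)


def actions_for(conn, actor_id, resource):
    # Given this person and this object, what can they do?
    sql = (
        "SELECT action FROM grants "
        "WHERE actor_id = ? AND resource = ? ORDER BY action"
    )
    return [row[0] for row in conn.execute(sql, (actor_id, resource))]


def actors_for(conn, action, resource):
    # Given this object and this action, who can do it?
    sql = (
        "SELECT actor_id FROM grants "
        "WHERE action = ? AND resource = ? ORDER BY actor_id"
    )
    return [row[0] for row in conn.execute(sql, (action, resource))]


def resources_for(conn, actor_id, action):
    # Given this person and this action, which objects can they act upon?
    sql = (
        "SELECT resource FROM grants "
        "WHERE actor_id = ? AND action = ? ORDER BY resource"
    )
    return [row[0] for row in conn.execute(sql, (actor_id, action))]


alice_actions = actions_for(conn, "alice", "content/articles")
article_viewers = actors_for(conn, "view-table", "content/articles")
alice_viewable = resources_for(conn, "alice", "view-table")
```

Each question becomes a filter on two of the three columns, which is why it needs a known set of actors to query against.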

@simonw changed the title from "Efficiently calculate list of databases/tables a user can view" to "Efficiently calculate list things a user can do / list of users who can do a thing" on Apr 25, 2024

@simonw
Owner Author

simonw commented Apr 25, 2024

I think the _internal database is going to need to grow an actors table for this - without that I don't know how we would efficiently answer the "who can view this table" question.

This itself is tricky though, because right now actors are entirely decoupled - which is useful, because it means you can implement things like API tokens which act-as a specific actor ID but come with an extra set of restrictions, as seen in datasette-auth-tokens.

Is it dangerously misleading to offer a page of "people who can do this thing" which omits API tokens or actors that might come in through some other mechanism?

@simonw
Owner Author

simonw commented Dec 19, 2024

This can be influenced by explorations I did for https://github.com/datasette/datasette-acl - which is also the prime example of why this stuff is important.
