
Datasette with many and large databases > Memory use #1880

Open
amitkoth opened this issue Nov 2, 2022 · 4 comments

amitkoth commented Nov 2, 2022

Datasette maintains an in-memory SQLite database with details of the databases, tables and columns for all of the attached databases.

The above is from the docs ^. There are two problems here - the number of datasette "instances" in a single server/VM and the size of the database itself. We want the opposite of in-memory, including what happens on SQLite - documented in https://www.sqlite.org/inmemorydb.html

From the context in #1150 - does it mean datasette is memory-bound to the size of the dataset - which might be a deal-breaker for many large-scale use cases?

In an extreme case - let's say a single server had 100 SQLite databases, which would enable 100 "instances" of datasette to run, one per client (e.g. in a SaaS multi-tenant environment). How could we achieve all these goals:

  1. Allow any one of these 100 databases to grow to, say, 2 TB in size
  2. Have one datasette instance, which connects to 1 of the 100 databases, based on incoming credentials/tenant ID
  3. Minimize memory use entirely - both by datasette and SQLite, such that almost all operations are executed in real time on-disk, with little to no memory consumption per tenant or per database.

Any ideas appreciated - we're looking to use this in a SaaS type of setting - many instances, single server.

@simonw great work on datasette, in general! Possibly related to #1480, but we don't want to use any kind of serverless infra - this is a long-running VM/server.

@amitkoth amitkoth changed the title Datasette with many and large databases Datasette with many and large databases > Memory use Nov 2, 2022
amitkoth (Author) commented Nov 2, 2022

Follow-on question @simonw - is all memory use, for both datasette and SQLite, confined to "query time" itself? i.e. is the memory relevant only to a particular transaction or query, and then subsequently released?

simonw (Owner) commented Nov 11, 2022

I think you may have misunderstood this feature. This is talking about the _internal in-memory database, which maintains a set of tables that list the databases and tables that are attached to Datasette.

They're not a copy of the data itself - just a list of table names, column names and database names.

You can see what that database looks like by signing in as root - running datasette --root and clicking the link. Or you can see an example here:

For the example instance that looks like this:

[screenshot: the _internal database index page]

The two most interesting tables in there are these ones:

[screenshots: two of the _internal tables, listing table names and their columns]

As you can see, it's just the table schema itself and the columns that make up the tables. Even if you have hundreds of databases connected each with hundreds of tables this should still only add up to a few MB of RAM.
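To get a feel for the scale, here's a sketch - the table names and shapes below are illustrative, not Datasette's actual _internal schema - that builds a metadata-only catalog like the one described and measures how big it gets:

```python
import sqlite3

# Hypothetical catalog: just names, never the data itself.
catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE tables (database_name TEXT, table_name TEXT)")
catalog.execute(
    "CREATE TABLE columns (database_name TEXT, table_name TEXT, column_name TEXT)"
)

# Simulate 100 attached databases, each with 100 tables of 10 columns.
for db in range(100):
    for tbl in range(100):
        catalog.execute(
            "INSERT INTO tables VALUES (?, ?)", (f"db{db}", f"t{tbl}")
        )
        catalog.executemany(
            "INSERT INTO columns VALUES (?, ?, ?)",
            [(f"db{db}", f"t{tbl}", f"c{c}") for c in range(10)],
        )

page_count = catalog.execute("PRAGMA page_count").fetchone()[0]
page_size = catalog.execute("PRAGMA page_size").fetchone()[0]
print(f"{page_count * page_size / 1024 / 1024:.1f} MB")  # a few MB at most
```

Even at 10,000 tables and 100,000 columns tracked, the catalog stays in the low single-digit megabytes, because it stores only names.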

simonw (Owner) commented Nov 11, 2022

The _internal database is intended to help Datasette handle much larger attached databases. Right now Datasette attempts to show every database on the https://latest.datasette.io/ index page and every table on the https://latest.datasette.io/fixtures database index page - but these are not paginated. If you had a database containing 1,000 tables the database index page would get pretty slow.

So I want to be able to paginate (and search) those. But to paginate them it's useful to have them in a database table itself, since then I can paginate using SQL.
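That idea can be sketched with plain sqlite3 - the `tables` catalog here is a stand-in, not _internal's real schema - using keyset pagination so even thousands of table names page cheaply:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tables (table_name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO tables VALUES (?)",
    [(f"table_{i:04d}",) for i in range(1000)],
)

def table_page(conn, after=None, page_size=20):
    """Keyset-paginate table names: each page is one cheap indexed query."""
    if after is None:
        rows = conn.execute(
            "SELECT table_name FROM tables ORDER BY table_name LIMIT ?",
            (page_size,),
        )
    else:
        rows = conn.execute(
            "SELECT table_name FROM tables WHERE table_name > ? "
            "ORDER BY table_name LIMIT ?",
            (after, page_size),
        )
    return [r[0] for r in rows]

page1 = table_page(conn)
page2 = table_page(conn, after=page1[-1])
print(page1[0], page2[0])  # → table_0000 table_0020
```

Search works the same way: add a `WHERE table_name LIKE ?` clause and the catalog answers it without touching any attached database.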

My plan for _internal is to use it to implement those advanced browsing features. I've not completed this work yet though. See this issue for more details on that:

amitkoth (Author) commented

I appreciate your response @simonw - thanks!

I'll clarify what we need further - let's imagine we have 2000 SQLite databases (for 2000 tenants), but we want to run only one datasette instance, which each of those tenants uses to query their own database only. This means the "connection" between datasette and the SQLite database would be dynamic, based on the tenant ID that's required on an incoming request.

Are there any specific config options or other considerations in this use case, to minimize memory use on a single, efficient VM and serve queries to all these tenants?

cc @MuAdham
