Mechanism for storing metadata in _metadata tables #1168
Also: in #188 I proposed bundling metadata in the SQLite database itself alongside the data. This is a great way of ensuring metadata travels with the data when it is downloaded as a SQLite
A database that exposes metadata will have the same restriction as the new
As such, I'd rather bundle any metadata tables into the existing
What if metadata was stored in a JSON text column in the existing
The downside of JSON columns generally is that they're harder to run indexed queries against. For metadata I don't think that matters - even with 10,000 tables each with their own metadata a SQL query asking for e.g. "everything that has Apache 2 as the license" would return in just a few ms.
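As a rough sketch of the JSON-column idea: the table name `example_metadata`, its columns, and the helper `tables_with_license` are all hypothetical, invented for illustration, and the query assumes the SQLite build includes the JSON functions (`json_extract`).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, chosen only for this illustration.
conn.execute(
    "CREATE TABLE example_metadata (table_name TEXT PRIMARY KEY, metadata TEXT)"
)
rows = [
    ("trees", {"license": "Apache 2", "title": "Street trees"}),
    ("parks", {"license": "CC BY 4.0"}),
    ("roads", {"license": "Apache 2"}),
]
conn.executemany(
    "INSERT INTO example_metadata VALUES (?, ?)",
    [(name, json.dumps(meta)) for name, meta in rows],
)


def tables_with_license(conn, license_name):
    """Return the names of tables whose JSON metadata has the given license."""
    sql = """
        SELECT table_name FROM example_metadata
        WHERE json_extract(metadata, '$.license') = ?
        ORDER BY table_name
    """
    return [row[0] for row in conn.execute(sql, (license_name,))]


apache_tables = tables_with_license(conn, "Apache 2")
```

This is an unindexed scan over the JSON text, which is the trade-off described above.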
So what if the
These columns could be populated by Datasette on startup through reading the
One possibility: plugins could write directly to that in-memory database table. But how would they know to write again should the server restart? Maybe they would write to it once when called by the
Also: if I want to support metadata optionally living in a
Here are the requirements I'm currently trying to satisfy:
The sticking point here seems to be the plugin hook. Allowing plugins to override the way the question "give me the metadata for this database/table/column" is answered makes the database-backed metadata mechanisms much more complicated to think about. What if plugins didn't get to override metadata in this way, but could instead update the metadata in a persistent Datasette-managed storage mechanism? Then maybe Datasette could do the following:
If database files were optionally allowed to store metadata about tables that live in another database file this could perhaps solve the plugin needs - since an "edit metadata" plugin would be able to edit records in a separate, dedicated
Some SQLite databases include SQL comments in the schema definition which tell you what each column means:

    CREATE TABLE User
    -- A table comment
    (
        uid INTEGER, -- A field comment
        flags INTEGER -- Another field comment
    );

The problem with these is that they're not exposed to SQLite in any mechanism other than parsing the
I had an idea to build a plugin that could return these. That would be easy with a "get metadata for this column" plugin hook - in the absence of one a plugin could still run that reads the schemas on startup and uses them to populate a metadata database table somewhere.
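A minimal sketch of how such schema comments could be recovered, reading the stored `CREATE TABLE` text from `sqlite_master`. The helper `column_comments` and its regular expression are hypothetical and deliberately naive: they assume one column definition per line with a trailing `--` comment, and are not a general SQL parser.

```python
import re
import sqlite3

# Naive pattern: a column name, the rest of its definition, then "-- comment".
# Assumes one column per line; this is an illustration, not a real SQL parser.
COLUMN_COMMENT = re.compile(r"^\s*(\w+)\s+[^-\n]*--\s*(.+?)\s*$")


def column_comments(create_sql):
    """Map column name -> trailing comment found in a CREATE TABLE statement."""
    comments = {}
    for line in create_sql.splitlines():
        match = COLUMN_COMMENT.match(line)
        if match:
            comments[match.group(1)] = match.group(2)
    return comments


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE User
-- A table comment
(
  uid INTEGER, -- A field comment
  flags INTEGER -- Another field comment
)"""
)
# SQLite keeps the original statement text, comments included, in sqlite_master.
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'User'"
).fetchone()[0]
comments = column_comments(schema_sql)
```

A startup routine could then write `comments` into whichever metadata table is chosen; the table-level comment is ignored by this sketch.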
The direction I'm leaning in now is the following:
Plugins that want to provide metadata can do so by populating a table. They could even maintain their own in-memory database for this, or they could write to the
So what would the database schema for this look like? I'm leaning towards a single table called
If it's just a single
If the
The alternative to the
Could this use a compound primary key on
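To make the compound-primary-key question concrete, here is one possible shape. Every name in it (`example_metadata`, `database_name`, `table_name`, `column_name`, `key`, `value`, `set_metadata`) is an assumption for illustration, not the schema proposed in this thread. The sketch uses empty strings rather than NULL for "not applicable", because in an ordinary SQLite rowid table NULLs in primary key columns are treated as distinct and would not enforce uniqueness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: '' in table_name/column_name means "applies to the
# whole database" or "applies to the whole table" respectively.
conn.execute(
    """
    CREATE TABLE example_metadata (
        database_name TEXT NOT NULL,
        table_name TEXT NOT NULL DEFAULT '',
        column_name TEXT NOT NULL DEFAULT '',
        key TEXT NOT NULL,
        value TEXT,
        PRIMARY KEY (database_name, table_name, column_name, key)
    )
    """
)


def set_metadata(conn, key, value, database, table="", column=""):
    """Insert a metadata value, replacing any existing row with the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO example_metadata "
        "(database_name, table_name, column_name, key, value) "
        "VALUES (?, ?, ?, ?, ?)",
        (database, table, column, key, value),
    )


set_metadata(conn, "title", "Fixtures", "fixtures")
set_metadata(conn, "title", "Trees", "fixtures", table="trees")
# Same compound key as the previous call, so this replaces that row.
set_metadata(conn, "title", "Street trees", "fixtures", table="trees")

row_count = conn.execute("SELECT count(*) FROM example_metadata").fetchone()[0]
tree_title = conn.execute(
    "SELECT value FROM example_metadata "
    "WHERE database_name = 'fixtures' AND table_name = 'trees' AND key = 'title'"
).fetchone()[0]
```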
Would also need to figure out the precedence rules:
From an implementation perspective, I think the way this works is that SQL queries read the relevant metadata from ALL available metadata tables, then Python code applies the precedence rules to produce the final, combined metadata for a database/table/column.
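A sketch of that read-everything-then-resolve approach, under stated assumptions: each source is a separate SQLite connection holding a hypothetical `example_metadata` table, and sources are passed lowest precedence first so later ones win. The actual precedence order is not settled here; the ordering below is purely illustrative.

```python
import sqlite3


def make_source(rows):
    """Build an in-memory database holding a hypothetical metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE example_metadata (table_name TEXT, key TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO example_metadata VALUES (?, ?, ?)", rows)
    return conn


def metadata_for_table(sources, table_name):
    """Query every source, then let later sources override earlier ones."""
    combined = {}
    for conn in sources:  # assumed order: lowest precedence first
        query = "SELECT key, value FROM example_metadata WHERE table_name = ?"
        for key, value in conn.execute(query, (table_name,)):
            combined[key] = value
    return combined


low = make_source([("trees", "title", "Trees"), ("trees", "license", "Apache 2")])
high = make_source([("trees", "title", "Street trees")])
resolved = metadata_for_table([low, high], "trees")
```

Keys that only appear in a lower-precedence source survive, while conflicting keys take the higher-precedence value.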
Also: probably load column metadata as part of the table metadata rather than loading column metadata individually, since it's going to be rare to want the metadata for a single column rather than for an entire table full of columns.
Precedence idea:
I need to prototype this. Could I do that as a plugin? I think so - I could try out the algorithm for loading metadata and display it on pages using some custom templates.
One catch: solving the "show me all metadata for everything in this Datasette instance" problem. Ideally there would be a SQLite table that can be queried for this. But the need to resolve the potentially complex set of precedence rules means that table would be difficult if not impossible to provide at run-time. Ideally a denormalized table would be available that featured the results of running those precedence rule calculations. But how to handle keeping this up-to-date? It would need to be recalculated any time a
This is a much larger problem - but one potential fix would be to use triggers to maintain a "version number" for the
Such a mechanism would have applications outside of just this
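The trigger idea could look roughly like the following. The table names `example_metadata` and `example_versions`, the trigger names, and the helper `current_version` are hypothetical; the point is only that any write bumps a counter, so a cached denormalized copy can be recalculated when the counter changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables, for illustration only.
    CREATE TABLE example_metadata (key TEXT PRIMARY KEY, value TEXT);
    CREATE TABLE example_versions (
        table_name TEXT PRIMARY KEY,
        version INTEGER NOT NULL
    );
    INSERT INTO example_versions VALUES ('example_metadata', 0);

    CREATE TRIGGER example_metadata_insert AFTER INSERT ON example_metadata
    BEGIN
        UPDATE example_versions SET version = version + 1
        WHERE table_name = 'example_metadata';
    END;
    CREATE TRIGGER example_metadata_update AFTER UPDATE ON example_metadata
    BEGIN
        UPDATE example_versions SET version = version + 1
        WHERE table_name = 'example_metadata';
    END;
    CREATE TRIGGER example_metadata_delete AFTER DELETE ON example_metadata
    BEGIN
        UPDATE example_versions SET version = version + 1
        WHERE table_name = 'example_metadata';
    END;
    """
)


def current_version(conn):
    """Return the change counter maintained by the triggers."""
    return conn.execute(
        "SELECT version FROM example_versions WHERE table_name = 'example_metadata'"
    ).fetchone()[0]


v0 = current_version(conn)
conn.execute("INSERT INTO example_metadata VALUES ('title', 'Trees')")
v1 = current_version(conn)
conn.execute("UPDATE example_metadata SET value = 'Street trees' WHERE key = 'title'")
v2 = current_version(conn)
conn.execute("DELETE FROM example_metadata WHERE key = 'title'")
v3 = current_version(conn)
```

Code holding a cached copy would compare its remembered version with `current_version(conn)` and rebuild only when they differ. The triggers fire once per affected row, so a multi-row statement advances the counter by more than one.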
Idea: version the metadata scheme. If the table is called
Related: Here's an implementation of a
Here's a plugin that implements metadata-within-DBs: next-LI/datasette-live-config
How it works: If a database has a
Datasette Cloud really wants this.
Original title: Perhaps metadata should all live in a _metadata in-memory database
Inspired by #1150 - metadata should be exposed as an API, and for large Datasette instances that API may need to be paginated. So why not expose it through an in-memory database table?
One catch to this: plugins. #860 aims to add a plugin hook for metadata. But if the metadata comes from an in-memory table, how do the plugins interact with it?
The need to paginate over metadata does make a plugin hook that returns metadata for an individual table seem less wise, since we don't want to have to do 10,000 plugin hook invocations to show a list of all metadata.
If those plugins write directly to the in-memory table how can their contributions survive the server restarting?