Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow creation of virtual tables at startup #657

Open
dazzag24 opened this issue Jan 12, 2020 · 4 comments
Open

Allow creation of virtual tables at startup #657

dazzag24 opened this issue Jan 12, 2020 · 4 comments

Comments

@dazzag24
Copy link

Hi,

I've been experimenting with SQLite reading from huge datasets using this excellent Parquet extension from @cldellow.
https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html
https://github.com/cldellow/sqlite-parquet-vtable

This works really well, but I was keen to see if I could combine datasette with this. Having previously experimented with the spatialite extension I knew that datasette supports loading extensions in the underlying sqlite instance. However I hit a blocker as the current design only allows SELECT statements to be executed and so I am unable to execute the crucial

CREATE VIRTUAL TABLE .........

command that is required to load the data from the parquet file into the table.

It seems like this would be a simple-ish change, but I don't know enough about the architecture of datasette to start implementing this myself? Could this be done as a datasette plugin? or would this require more fundamental changes at initialisation time?

My thoughts are that something at init time could detect that the user was loading a *.parquet file and then switch to a mode were it loads that via the "CREATE VIRTUAL TABLE..." rather than loading the *.db file in the default case??

I'm happy to contribute code and testing, I just need some pointers on the best approach.

Thanks
Darren

@dazzag24 dazzag24 changed the title Allow creation of vtables at startup Allow creation of virtual tables at startup Jan 12, 2020
@dazzag24
Copy link
Author

I have successfully tested datasette using a parquet VIRTUAL TABLE. In the first terminal:

datasette airports.db --load-extension=libparquet

In another terminal I load the same sqlite db file using the sqlite3 cli client.

$ sqlite3 airports.db

and then load the parquet extension and create the virtual table.

sqlite> .load /home/darreng/metars/libparquet
sqlite> CREATE VIRTUAL TABLE mytable USING parquet('/home/xx/data.parquet');

Now the parquet virtual table is usable by the datasette web UI.

Its not an ideal solution but is a proof that datasette works the parquet extension.

@dazzag24
Copy link
Author

Hi,

I've completed some changes to my fork of datasette that allows it to automatically create the parquet virtual table when you supply it with a filename that has the ".parquet" extension.

I had to figure out how to make the "CREATE VIRTUAL TABLE" statement only be applied to the fake in memory parquet database and not to any others that were also being loaded. Thus it supports mixed mode databases e.g

datasette my_test.parquet normal_sqlite_file.db  --load-extension=libparquet.so --load-extensio
n=mod_spatialite.so

Please see my changes here:
dazzag24@8e18394

Thanks

@simonw
Copy link
Owner

simonw commented Jan 15, 2021

I think the startup() plugin hook at https://docs.datasette.io/en/stable/plugin_hooks.html#startup-datasette should be able to fit this. You can write a plugin which uses that hook to execute CREATE VIRTUAL TABLE against one or more databases when Datasette first starts running.

Would that work here?

@simonw
Copy link
Owner

simonw commented Jan 15, 2021

I'm not sure how I missed this issue but it's almost a year later and I'm finally taking a look at your Parquet work.

This is yet more evidence that allowing plugins to provide their own custom Database objects would be a good idea.

I started exploring what Datasette would like on PostgreSQL in #670 - my concern was that I would need to add a large amount of database abstraction code which would dramatically increase the complexity of the core project, but my thinking now is that it might be tractable - Datasette doesn't actually construct SQL in complex ways anywhere outside of the TableView class so abstracting away just that bit should be feasible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants