Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading and writing to SQL database #158

Open
slwu89 opened this issue Nov 30, 2024 · 10 comments
Open

Reading and writing to SQL database #158

slwu89 opened this issue Nov 30, 2024 · 10 comments

Comments

@slwu89
Copy link
Member

slwu89 commented Nov 30, 2024

We would like to read/write from an SQL database, we can rely on the https://github.com/MechanicalRabbit/FunSQL.jl package to help automate the writing of SQL code.

@slwu89
Copy link
Member Author

slwu89 commented Dec 2, 2024

@TheCedarPrince since you are the local FunSQL expert, I wanted to ask, does FunSQL have a SQLNode object to do CREATE TABLE? The feature that this issue is tracking is the very simple read/write functionality, essentially just read_json_acset and read_json_acset as exists in the current acset serialization functionality but for a general SQL DB connection. We'd need to use CREATE TABLE for each object in an acset.

@slwu89
Copy link
Member Author

slwu89 commented Dec 3, 2024

One thing we will need to think about is how to handle writing of attribute data. SQLite, for example, has a rather restricted set of data types it supports (https://www.sqlite.org/datatype3.html). Unless we use BLOB, we are basically restricted to:

julia_to_sql_type(::Type{T}) where {T} = error("type $(T) not supported for SQL conversion")
julia_to_sql_type(::Type{T}) where {T<:Real} = "REAL"
julia_to_sql_type(::Type{T}) where {T<:AbstractString} = "TEXT"
julia_to_sql_type(::Type{T}) where {T<:Integer} = "INTEGER"

So any attributes that are unions, or vectors, etc, are out. One way to go about this is to let the user provide, if they want, for specific AttrTypes, an SQL target type and a conversion function (e.g. to convert a Vector{String} into a string delimited by commas. Otherwise we use the fallback above.

@slwu89
Copy link
Member Author

slwu89 commented Dec 3, 2024

For acsets, we can more or less name the obs, homs, etc anything we want. But in SQLite (and other implementations), there are quite a few reserved words we cannot use for tables (see https://sqlite.org/lang_keywords.html). So by default we should name the table associated with an object the same name, unless the user gives an override.

Also, in SQL (in yet another departure of the SQL implementation from the relational model it supposedly "implements"), the order in which columns are specified in CREATE TABLE is important. Not sure if this will complicate things in practice, but should be aware that subtle bugs might arise if it is not taken into account.

@slwu89
Copy link
Member Author

slwu89 commented Dec 4, 2024

Apparently SQLite does not allow multiple tables to be generated in a single statement, the code below (silently, of course) only generates the artist table and then ignores the remaining statements. This is a tricky problem for writing an arbitrary acset to a SQL database. The reason is that foreign keys play the role of Homs in acsets, and in SQL if you declare a column to be a foreign key to the primary key of another table, that other table had better exist already.

using SQLite

function get_table(db, table)
    query = DBInterface.execute(db, "SELECT * FROM $table")
    return DataFrames.DataFrame(query)
end

db = SQLite.DB()

DBInterface.execute(db, """
    CREATE TABLE artist(
        artistid    INTEGER PRIMARY KEY, 
        artistname  TEXT
    );
    CREATE TABLE track(
        trackid     INTEGER, 
        trackname   TEXT, 
        trackartist INTEGER,
        FOREIGN KEY(trackartist) REFERENCES artist(artistid)
    );
""")

get_table(db, "artist")
get_table(db, "track")

One thing to do to get around this is to first create each table "individually" and later add the foreign key columns. Of course here there is another problem, in that FOREIGN KEY keyword does not seem to be compatible with ALTER TABLE. The best I can get working is below. This is less than ideal, as REFERENCES is a "column level" foreign key, meaning that the column in the target table it points to doesn't necessarily have to be that table's primary key. FOREIGN KEY is what declares that column to be pointing to another table's primary key (this is not documented clearly anywhere, I read it at https://stackoverflow.com/questions/8595695/what-is-difference-between-foreign-key-and-reference-key). So we still don't have proper foreign keys. @TheCedarPrince any thoughts?

using SQLite

function get_table(db, table)
    query = DBInterface.execute(db, "SELECT * FROM $table")
    return DataFrames.DataFrame(query)
end

db = SQLite.DB()

DBInterface.execute(db, """
    CREATE TABLE artist(
        artistid    INTEGER PRIMARY KEY, 
        artistname  TEXT
    );
""")

DBInterface.execute(db, """
    CREATE TABLE track(
        trackid     INTEGER, 
        trackname   TEXT
    );
""")

get_table(db, "artist")
get_table(db, "track")

DBInterface.execute(db, """
    ALTER TABLE track
        ADD COLUMN trackartist INTEGER DEFAULT NULL
        REFERENCES artist(artistid);
""")

@TheCedarPrince
Copy link

TheCedarPrince commented Dec 4, 2024

Hey @slwu89 , here are some out of order thoughts on this!

@TheCedarPrince since you are the local FunSQL expert, I wanted to ask, does FunSQL have a SQLNode object to do CREATE TABLE?

FunSQL.jl does not have support for CREATE TABLE for I think precisely the reason we are running into -- all the variance between table creation in SQL flavors. If you want to get more clarification, the FunSQL.jl folks are very active on the Julia Zulip here. Otherwise, what they do support doing is that one can create a table first using whatever mechanism the SQL flavor we are using requires, and then we can insert the result of a FunSQL query into the table like this:

fun_sql = From(...) # Some FunSQL DSL statement

catalog = reflect(
    db_conn;
    schema = "",
    dialect=:sqlite
);

sql = render(catalog, fun_sql);

res = execute(db_conn,
    """
    INSERT INTO
        table
    SELECT
        *
    FROM
        ($sql) AS foo;
    """
)

But in SQLite (and other implementations), there are quite a few reserved words we cannot use for tables (see sqlite.org/lang_keywords.html). So by default we should name the table associated with an object the same name, unless the user gives an override.

Hm. We may need to create a small sanitizer package or routine that checks to make sure no statements are using reserved language per flavors we want to support.

One thing we will need to think about is how to handle writing of attribute data.

So one thing I am a strong advocate for here is to use DBInterface.jl interfaces to the DBs we are interested in using. I know that most DB packages within Julia support DBInterface interfaces which is great and should help a lot with this.

So we still don't have proper foreign keys. @TheCedarPrince any thoughts?

Is there any particular reason why we are sticking to SQLite? Could we perhaps use something like DuckDB which acts very similarly to PostgreSQL, supports in-memory databases, and should be fit for high performance use? I've encountered some of the roadblocks you were describing in SQLite and it is not a fault of FunSQL here but rather SQLite being strange in this regard. Here's the DuckDB.jl package and info about constraints.

This is just my thought on SQLite as we could switch to something else that "just does" what we want for a proof of concept. Or is SQLite on your side a design requirement?

@slwu89
Copy link
Member Author

slwu89 commented Dec 4, 2024

Hey @TheCedarPrince, thanks!

Is there any particular reason why we are sticking to SQLite?

It's not a hard design requirement. I just started with it since of previous familiarity (though, obviously, not that familiar as I forgot about all these foot guns). It would be nice to support it nontheless given how widely it's used (and for AlgebraicJulia/AlgebraicRelations.jl#33). But yes for a proof of concept it may be better to switch to something more "fully featured". I'll check out DuckDB!

I've encountered some of the roadblocks you were describing in SQLite and it is not a fault of FunSQL here but rather SQLite being strange in this regard.

Yes, that makes sense. I haven't used FunSQL yet in my investigations as I think for this first feature addition I'll just be looking at writing an acset to an SQL db of some sort and reading in from a db a schema and an acset. I think I'll mostly rely on DBInterface to talk to whatever backend is used as the proof of concept.

@TheCedarPrince
Copy link

Here is an example for creating a connection to DuckDB.jl:

using DBInterface
using DuckDB

db = DBInterface.connect(DuckDB.DB, ":memory:")

Let me know what else you need!

@TheCedarPrince
Copy link

OH! Forgot to mention, if you want to write this to a file (like database.duckdb) all you need is actually DBInterface.connect(DuckDB.DB, "database.duckdb") then you should be fine.

What's neat is that starting as of, if I recall, v0.10.X, all versions of duckdb files are now backwards compatible across DuckDB. Generated duckdb files should also be able to be accessed from teh DuckDB CLI.

@slwu89
Copy link
Member Author

slwu89 commented Dec 9, 2024

@TheCedarPrince testing out DuckDB a bit! Unfortunately it also has the Adding columns with constraints not yet supported notice. It's a problem because in general, there's nothing stopping someone from making an acset with a schema where each object has a hom to some other object, so there isn't a general way to "start with the tables that don't point to anything" and work backwards.

using DuckDB
using DataFrames
using Catlab

# create a new in-memory database
con = DBInterface.connect(DuckDB.DB, ":memory:")

julia_to_sql_type(::Type{T}) where {T} = error("type $(T) not supported for SQL conversion")
julia_to_sql_type(::Type{T}) where {T<:Real} = "REAL"
julia_to_sql_type(::Type{T}) where {T<:AbstractString} = "TEXT"
julia_to_sql_type(::Type{T}) where {T<:Symbol} = "TEXT"
julia_to_sql_type(::Type{T}) where {T<:Integer} = "INTEGER"

@present SchWeightedLabeledGraph <: SchLabeledGraph begin
    Weight::AttrType
    weight::Attr(E,Weight)
end

# to_graphviz(SchWeightedLabeledGraph)

@acset_type WeightedLabeledGraph(SchWeightedLabeledGraph, index=[:src, :tgt]) <: AbstractLabeledGraph

g = erdos_renyi(WeightedLabeledGraph{Symbol,Float64}, 10, 0.25)
g[:, :label] = Symbol.(collect('a':'z')[1:nv(g)])
g[:, :weight] = floor.(rand(ne(g)) .* 100)

# first make tables of each object and its attributes
for o in objects(acset_schema(g))
    o_attrs = attrs(acset_schema(g); from=o)
    stmt = Vector{String}()
    push!(stmt, "CREATE OR REPLACE TABLE $(o)(_id INTEGER PRIMARY KEY")
    for (col, _, type) in o_attrs
        push!(stmt, ", $(col) $(julia_to_sql_type(subpart_type(g, type)))")
    end
    push!(stmt, ");")
    
    DBInterface.execute(con, join(stmt))
end

# second add all homs (REFERENCE)
for (h, h_dom, h_codom) in homs(acset_schema(g))
    DBInterface.execute(con, """
        ALTER TABLE $(h_dom) ADD $(h) INTEGER REFERENCES $(h_codom)(_id);
    """)
end

To be exactly clear, the issue is that it's possible to have an acset like this, which is quite difficult to instantiate a database for, it seems.

@present SillySch(FreeSchema) begin
    (X,Y)::Ob
    x::Hom(X,Y)
    y::Hom(Y,X)
end

@acset_type SillyType(SillySch)

x = @acset SillyType begin
    X=2
    Y=3
    x=[1,2]
    y=[1,1,2]
end

@slwu89
Copy link
Member Author

slwu89 commented Dec 10, 2024

Update:

Unsure where to go from here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants