-
Notifications
You must be signed in to change notification settings - Fork 3
Description
For some applications it is desirable to to declare primary keys for tables. For example, sqlalchemy requires all mapped tables to have a PK if the ORM layer is to be used.
If we were to declare a primary key for statements we would need to either
- Add an additional statement id column and declare it PK
- Use a unique combination such as s,p,o,v as a composite PK
The first has disadvantages, e.g. overhead of adding SEQUENCEs, divergence from RDF model
The second seems reasonable. However, many RDBMs do not allow nulls in a PK. sqlite is more lax here though. However, even though sqlite allows it, certain tools such as sqlalchemy ORM will not allow nulls in PK fields. By design, with statements
, EITHER object OR value MUST be NULL.
e.g. given:
class Statements(Base):
"""
Represents an RDF triple
"""
__tablename__ = 'statements'
stanza = Column(Text, primary_key=True)
subject = Column(Text, primary_key=True)
predicate = Column(Text, primary_key=True)
object = Column(Text, primary_key=True)
value = Column(Text, primary_key=True)
datatype = Column(Text, primary_key=True)
language = Column(Text, primary_key=True)
if we try and do an insert with an rdfs:label (which has object NULL) we get:
sqlalchemy.orm.exc.FlushError: Can't update table statements using NULL for primary key value on column statements.object
This is OK if one doesn't want to use the sqlalchemy ORM layer (the lower-level base layer should work just fine), but it might be good to be aware of this limitation for future evolution
Aside: I used to be more of a computer science purist and decried ORMs as evil, impedance mismatch yadda yadda. But the sqlalchemy ORM is quite nice and I see the advantages of ORMs in solving more pedestrian problems...
I don't think we should change anything as I think the existing structure is nice and simple and good for certain kinds of queries.
I think the solution here is: if you want to do CRUD with an ORM, then define a separate schema in which object and value are combined into one field (e..g a basic "triples" table), and define a trivial bridge layer between rdftab.statements and your.triple (e.g populate one from the other via INSERT INTO .. AS SELECT ...). This is a separate concern from rdftab.
This issue can be immediately closed, just wanted to flag this if useful for planning future evolution.