Skip to content

statements table cannot have a primary key defined over it due to nulls #16

@cmungall

Description

@cmungall

For some applications it is desirable to to declare primary keys for tables. For example, sqlalchemy requires all mapped tables to have a PK if the ORM layer is to be used.

If we were to declare a primary key for statements we would need to either

  1. Add an additional statement id column and declare it PK
  2. Use a unique combination such as s,p,o,v as a composite PK

The first has disadvantages, e.g. overhead of adding SEQUENCEs, divergence from RDF model

The second seems reasonable. However, many RDBMs do not allow nulls in a PK. sqlite is more lax here though. However, even though sqlite allows it, certain tools such as sqlalchemy ORM will not allow nulls in PK fields. By design, with statements, EITHER object OR value MUST be NULL.

e.g. given:

class Statements(Base):
    """
    Represents an RDF triple
    """
    __tablename__ = 'statements'
    stanza = Column(Text, primary_key=True)
    subject = Column(Text, primary_key=True)
    predicate = Column(Text, primary_key=True)
    object = Column(Text, primary_key=True)
    value = Column(Text, primary_key=True)
    datatype = Column(Text, primary_key=True)
    language = Column(Text, primary_key=True)

if we try and do an insert with an rdfs:label (which has object NULL) we get:

sqlalchemy.orm.exc.FlushError: Can't update table statements using NULL for primary key value on column statements.object

This is OK if one doesn't want to use the sqlalchemy ORM layer (the lower-level base layer should work just fine), but it might be good to be aware of this limitation for future evolution

Aside: I used to be more of a computer science purist and decried ORMs as evil, impedance mismatch yadda yadda. But the sqlalchemy ORM is quite nice and I see the advantages of ORMs in solving more pedestrian problems...

I don't think we should change anything as I think the existing structure is nice and simple and good for certain kinds of queries.

I think the solution here is: if you want to do CRUD with an ORM, then define a separate schema in which object and value are combined into one field (e..g a basic "triples" table), and define a trivial bridge layer between rdftab.statements and your.triple (e.g populate one from the other via INSERT INTO .. AS SELECT ...). This is a separate concern from rdftab.

This issue can be immediately closed, just wanted to flag this if useful for planning future evolution.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions