
Spike: Versioning

Erik Hetzner edited this page Apr 20, 2018 · 1 revision

Versioning Spike

Background

Aperta has long lacked a coherent concept and model of versions for papers. We have several different versioning models, depending on whether we are storing versions of a paper's content, its attachments, or the answers to questions. Some things are not versioned at all, and for some that are, the old version data is stored in opaque JSON blobs. As a result, retrieving the state of a paper at a past point in time is difficult.

This spike is an attempt to arrive at a sensible mental model, an appropriate data model, and a set of steps for reaching that data model incrementally, bringing the existing versioning models into one coherent model piece by piece.

Technical goals

  • A consistent data model for versions
  • Retrieving information about an older version should be as easy as retrieving data about the current version: they should use the same structure
  • It should be possible to lock down older version data
  • While our current use case covers numbered versions (e.g. v0.0, v0.1, v1.0), we should leave open the possibility of storing intermediary versions in the future

What I did

I created a testbed Rails application with the data models we can use: https://github.com/Tahi-project/versioning-spike

The idea is to have a one-to-many relationship between a paper and its versions. Each row in the versions table represents exactly one version, and each version is represented by exactly one row. Everything else that belongs to a version relates back to this table, so all the information about a version can be pulled together by following relations from its row.

We can then use a many-to-many or many-to-one intermediary model between this versions model and anything we want to version.

Model notes

  • Guidelines for database design
    • If the thing (some piece of information about a paper) never changes, add a column to the papers table (e.g. creator).
    • If there can be multiple things on a given version, e.g. answers, create a join table from versions to answers to represent a many-to-many relationship
      • This is necessary because all, any, or none of the items could change between versions. A many-to-many join means that, e.g., an author can be shared across multiple versions (because it did not change) and any version can have many authors
      • See the answer model in the spike repository
    • If the thing to be versioned can have multiple “versions” within a version, e.g. we want to keep every docx that is uploaded, we can define a many-to-many join table with a “sub_version” (better names requested) integer column
      • When a new version is created, we simply copy the latest sub_version over to the new version and give it sub_version: 1
      • Each additional uploaded docx gets an incremented sub_version
      • This allows us to save every attachment uploaded, associate each with a version, and start new versions from a good starting point
      • I have not yet spiked this out, but it should be easy to do
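
Since the sub_version idea is not in the spike repository, here is a minimal plain-Ruby sketch of how the numbering could behave. It uses an in-memory array in place of the join table; `VersionUpload`, `add_upload`, and `carry_over` are hypothetical names, not code from Aperta or the spike.

```ruby
# Hypothetical stand-in for a row in the sub-versioned join table.
VersionUpload = Struct.new(:version_id, :attachment_id, :sub_version)

# Records a newly uploaded attachment on a version, using the next sub_version.
def add_upload(rows, version_id, attachment_id)
  current = rows.select { |row| row.version_id == version_id }
                .map(&:sub_version)
                .max || 0
  row = VersionUpload.new(version_id, attachment_id, current + 1)
  rows << row
  row
end

# Seeds a new version with the newest upload of the previous version,
# restarting the numbering at sub_version 1.
def carry_over(rows, from_version_id, to_version_id)
  newest = rows.select { |row| row.version_id == from_version_id }
               .max_by(&:sub_version)
  rows << VersionUpload.new(to_version_id, newest.attachment_id, 1) if newest
end
```

In a real schema the uniqueness of (version_id, sub_version) would probably want a database constraint; that is left out of this sketch.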

How it works

To create a new version of an article:

  • Create a new row in the versions table
  • For every row in the join tables where version_id == the previous version
    • Create a new row in the versioned thing join table pointing at the new version and the existing thing
  • For every row in a sub-versioned join table where version_id == the previous version
    • Create a new row with sub_version 1 in the sub-versioned join table pointing at the new version and the existing thing
  • Mark the new version as the latest version
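
The version-creation steps above can be sketched in plain Ruby, with arrays standing in for the tables. All names (`Store`, `Version`, `VersionThing`, `VersionUpload`, `create_version`) are hypothetical. The sketch also assumes that only the row with the highest sub_version is carried over, following the model notes, rather than every sub-versioned row.

```ruby
# In-memory stand-ins for the tables; every name here is hypothetical.
Version       = Struct.new(:id, :paper_id, :latest)
VersionThing  = Struct.new(:version_id, :thing_id)               # plain join row
VersionUpload = Struct.new(:version_id, :thing_id, :sub_version) # sub-versioned join row

class Store
  attr_reader :versions, :version_things, :version_uploads

  def initialize
    @versions = []
    @version_things = []
    @version_uploads = []
  end

  # Creates the next version of a paper by copying join rows, not the things.
  def create_version(paper_id)
    previous = @versions.find { |v| v.paper_id == paper_id && v.latest }
    fresh = Version.new(@versions.size + 1, paper_id, false)
    @versions << fresh

    if previous
      # Plain join tables: point the new version at every existing thing.
      @version_things
        .select { |row| row.version_id == previous.id }
        .each { |row| @version_things << VersionThing.new(fresh.id, row.thing_id) }

      # Sub-versioned join table (assumption: carry over only the newest row).
      newest = @version_uploads
        .select { |row| row.version_id == previous.id }
        .max_by(&:sub_version)
      @version_uploads << VersionUpload.new(fresh.id, newest.thing_id, 1) if newest

      previous.latest = false
    end

    # Mark the new version as the latest one.
    fresh.latest = true
    fresh
  end
end
```

Because only join rows are written, the cost of creating a version in this sketch grows with the number of things attached to the previous version, not with their size.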

To edit a deduplicated versioned has-many thing:

  • If this thing belongs to old versions only, prevent the change
  • If this thing belongs to the latest version:
    • If this thing is shared by multiple versions (that is, if the versioned thing join table contains more than one row where thing_id == this thing's id)
      • Create a copy of this thing and update the thing_id of the old versions' join rows to point at the copy
      • Then continue with the next step, since this thing now belongs only to the latest version
    • If this thing is used only by the latest version (that is, if the versioned thing join table contains exactly one row where thing_id == this thing's id)
      • Modify the row in the thing table
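
The editing rules above amount to copy-on-write. A minimal plain-Ruby sketch follows, with a hash and an array standing in for the thing table and the join table. `Thing`, `VersionThing`, `LockedVersionError`, and `edit_thing` are hypothetical names; in the real application this logic would presumably live behind an Active Record hook.

```ruby
# Hypothetical stand-ins for a versioned thing and its join row.
Thing        = Struct.new(:id, :body)
VersionThing = Struct.new(:version_id, :thing_id)

class LockedVersionError < StandardError; end

# things:    Hash of id => Thing (the "thing" table)
# joins:     Array of VersionThing (the versioned thing join table)
# latest_id: id of the paper's latest version
def edit_thing(things, joins, latest_id, thing_id, new_body)
  rows = joins.select { |row| row.thing_id == thing_id }

  # A thing that belongs to old versions only must not change.
  unless rows.any? { |row| row.version_id == latest_id }
    raise LockedVersionError, "thing #{thing_id} belongs to old versions only"
  end

  if rows.size > 1
    # Shared with older versions: freeze a copy for them first.
    copy = Thing.new(things.keys.max + 1, things[thing_id].body)
    things[copy.id] = copy
    rows.each { |row| row.thing_id = copy.id unless row.version_id == latest_id }
  end

  # The thing is now used only by the latest version, so edit it in place.
  things[thing_id].body = new_body
  things[thing_id]
end
```

Note that the copy goes to the old versions and the original row stays with the latest version, so the id seen by the latest version does not change when it is edited.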

To find an article's deduplicated versioned has-many things for a given version:

  • Query the version join table where version_id == the version you want
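
The lookup is the same for any version, old or latest. A small plain-Ruby sketch, again with in-memory stand-ins and hypothetical names (`Thing`, `VersionThing`, `things_for_version`):

```ruby
# Hypothetical stand-ins for a versioned thing and its join row.
Thing        = Struct.new(:id, :body)
VersionThing = Struct.new(:version_id, :thing_id)

# Returns the things attached to one version by following the join rows.
def things_for_version(things, joins, version_id)
  joins.select { |row| row.version_id == version_id }
       .map { |row| things.fetch(row.thing_id) }
end
```

In a Rails model this would most likely be expressed as a `has_many ... through:` association on the version rather than a hand-written query, though that is an assumption about how the production models would be wired.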

Advantages:

  • Fast versioning (creating a version only adds join rows)
  • Fast saves once the copy is made
  • Reasonably fast saves even when we need to create a new "thing" row
  • Possible to prevent changes to previous versions
  • Reasonably easy reconstitution of a previous version (as easy as finding the latest version)
  • Extendable to any existing data model (we can put a join table in between any of them and a version)

Disadvantages:

  • Complicates the data model
  • Will require hooking into before_update or similar Active Record callbacks, possibly in a complex way

Steps to migrate our existing code

  • New infrastructure
    • Rename “versioned_text” to “paper_version”
      • We will need to migrate S3 asset locations
  • Funder migration
    • Move funders model through versions model
      • Backfill old versions with snapshots
      • Continue creating snapshots, but also use new versions+funder model
    • Serialize new versioned funder to frontend
    • Change frontend diffing to use new funder version model
  • Refine migration based on lessons learned from funder migration
  • Author migration
    • Consolidate author/group author models, moving position column to authors model
    • Repeat steps of funder migration
  • Repeat for answers
  • Repeat for figures
  • Repeat for reviewer recommendations
  • Clean up
    • Remove old snapshot code
  • Attachment migrations
    • Create new intermediary join for version_attachments
      • Move latest attachments to that
      • Migrate snapshots to the new model
    • Flatten out `versions` (paper_trail) into the new model
    • This work is probably going to be pretty complicated, since we have paper_trail, snapshots, and our existing deduplication system for attachments all in play

Attachments:

  • Versioned Thing Schema (application/gliffy+json)
  • Versioned Thing Schema.png (image/png)
