Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frijya and Frigg Adapters for Migrating Collections, AdminSets and Works to Postgres or Fcrepo 6 #6221

Merged
merged 317 commits into from
Jun 25, 2024

Conversation

orangewolf
Copy link
Member

@orangewolf orangewolf commented Aug 25, 2023

Overview

First, apologies. The nature of the work has been that we've been upgrading Hyku and testing Hyku against this branch. As such, it's a larger PR than we might normally prefer.

Type of change (for release notes)

This PR includes two primary considerations:

  1. Code changes that allow for a "lazy migration" of data (see discussion below).
  2. Refactorings that allow for downstream applications to configure behavior instead of override entire files.

Further Discussion

The primary work introduces the Frigga and Freyja metadata adapter; these adapters are configured to:

  • Read first from one storage location, and failing to find something read from another (e.g. storage_first.find || storage_second.find if you will)
  • Write to the first storage location, ignoring the second storage location.

Through configuration and convention, the Frigga and Freyja adapters write similar looking Solr documents as their ActiveFedora based counterparts.

Frigga's adapter's first storage location is Postgres via Valkyrie, and it's second storage location is fcrepo via ActiveFedora.

Freyja's adapter's first storage location is fcrepo via Valkyrie, and it's second storage location is an fcrepo via ActiveFedora. Note, this is likely Fedora 6 and Fedora 4 respectively.

What this means is that these adapters provide a pathway for minimal downtime during a storage migration; something demonstrated by GBH's AMS project and presented on at Samvera Connect 2023 by @orangewolf.

Guidance for testing, such as acceptance criteria or new user interface behaviors:

There are no known new user behaviors. Instead we're relying on the test suite as well as downstream testing of Hyku migrations to help verify.

@samvera/hyrax-code-reviewers

# frozen_string_literal: true

module Freyja
class QueryService
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd 👍 adding a composite query service like this in Valkyrie if you wanted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a good plan to get through it in Hyrax land and then port it. Since I'm modifying a valkyrie branch to make the shared examples flexible enough for this use case, it should slot over super easy if we get to that point.

Prior to this commit, when we had an event with payload keys of `:id`
and `:user`, accessing the `event[:object]` raised a KeyNotFound error.

This coercion of event to a hash helps with expectations.
This spec doesn't test methods in the `described_class`, but instead
asserts behavior in a delegated method's low-level implementation.

The spec could be more appropriate in the delegated object's unit tests.
@jeremyf
Copy link
Contributor

jeremyf commented Jan 22, 2024

The goal of double_combo approach is to minimize downtime by allowing for in-place migrations.

Note: In this scenario, I’m using Fedora4 as the name for the legacy persistence layer, and Postgres as a symbolic name for the metadata adapter’s persistence layer; we could just as easily use Fedora4 but for clarifying purposes I’m separating these concepts.

This is done by leveraging a copy on write (CoW) type strategy. Below are two examples:

Given I wrote Work A to Fedora4 via ActiveFedora
And wrote Work A to Solr via ActiveFedora
And did not write Work A to Postgres
When I use Valkyrie to find Work A
Then Valkyrie will first check Postgress for Work A
And Valkyrie will next check Fedora4 for Work A

From the above scenario:

Given I have found Work A via Valkyrie
When I use Valkyrie to save Work A
Then Valkyrie will write to Postgres
And Valkyrie will update the Solr document originally written by ActiveFedora

At the point of completing the write scenario, when we go to find “Work A” via Valkyrie, we’ll first check and find the record in Postgres.

Consider that the Fedora4 repository also has works B, C, D, and E. And all of those are part of Collection Alpha. When rendering the contents of the collection, we leverage the Solr document representations of each of the works. This is done by creating presenters. To avoid presentation drift, we would like for A, B, C, D, and E to have consistent presenters regardless of whether they were persisted via ActiveFedora or Valkyrie.

Put another way, the Presenter for Work A’s solr document written by ActiveFedora and the Presenter for Work A’s solr document written by Valkyrie should have the same methods and each of those methods used for rendering should return the same values.

To reduce duplication, it would be ideal for both of those presenters to in fact be the same presenter. It then follows that we would want to minimize branching logic between documents written via ActiveFedora and those written by Valkyrie; the greatest minimization would be for ActiveFedora and Valkyrie to generate logically equivalent solr documents. I say logically equivalent in that there are administrative benefits for tracking in Solr which documents have been valkyrized (see valkyrie_bsi).

This leads to the aspiration that the has_model_ssim should be stable between a document written via ActiveFedora and one written via Valkyrie.

The has_model_ssim is a foundational attribute in that presenters use this for switching logic and we use it for Solr querying logic. See Hyrax::SolrDocumentBehavior#hydra_model and it’s uses.

jeremyf added a commit to scientist-softserv/iiif_print that referenced this pull request Jan 22, 2024
Fundamental assumption, based on the `double_combo` branch of Hyrax is
that we will be able to use Valkyrie queries to Solr and Fedora for all
objects in the repository.

See samvera/hyrax#6221 (comment)
* main:
  Marks ActiveFedora tests in spec/services/hyrax/workflow/status_list_service_spec.rb.
  Marks spec/lib/hyrax/resource_sync/resource_list_writer_spec.rb as ActiveFedora-only.
  Changes enum to array in spec/forms/hyrax/forms/administrative_set_form_spec.rb.
  Converts collections in spec/tasks/rake_spec.rb.
  Adds creator to spec/features/lease_spec.rb.
  Marks ActiveFedora tests in spec/models/sipity_spec.rb.
  Marks spec/services/hyrax/import_url_failure_service_spec.rb as ActiveFedora-only. (#6617)
  generalize Sipity::Entity() test for solr documents over classes
  Valkyrizes spec/features/read_only_mode_spec.rb.
Prior to this commit, `valkyrie?` was always returning the array
`['valkyrie_bsi']`.  The intention, however, was to look at the
`SolrDocument#[]` method.

Perhaps Rubocop had opinions about not needing a self method?
Regardless, a few simple tests later, and we have what is the intended
behavior.
The only place leveraging the `Hyrax::ResourceIndexer` mixin was the the
`Hyrax::Indexers::ResourceIndexer`.  Further we had a case where the
Solr Document's `has_model_ssim` was trampled by the mixin's behavior.

Both settings of the `has_model_ssim` likely set the same value, but did
so using two different methods.  And given that, my preference would be
to favor a single dot `.` call (e.g. `resource.internal_resource`)
instead of a double dot (e.g. `resource.class.name`).
Instead of the dubious internal_resource; which might be adequate.  But
we'll need wisdom of the team to understand how we should be working on
things.
* main:
  Fixes flaky test in spec/jobs/embargo_expiry_job_spec.rb.
  Valkyrizes spec/support/features/batch_edit_actions.rb.
  Valkyrizes spec/services/hyrax/statistics/works/count_spec.rb.
  Edit lease_expiry_job_spec.rb
Prior to this change, we were asserting specific counts.  But the nature
of the spec was to in fact assert that changes were and were not made to
the respective adapters.
Sad panda, as the prior version was visually elegant but implementation
details became a problem.  Perhaps this is indicative of a problem we've
introduced.  But for now, I'm looking to put this forward as "still
passing" so we can then have a conversation about what this means.
@dlpierce
Copy link
Contributor

dlpierce commented Jun 4, 2024

I think we're getting close. I feel like the changes in dassie to enable Freyja need an on/off env toggle.

dlpierce and others added 14 commits June 4, 2024 17:09
* add fallback support to find_count_by

* call the cops

* call the cops
* Add toggle to dassie to turn freyja on or off

* turn valkyrie transition off by default
Module form_fields defaulted terms to `display: true` but sometimes
terms are needed without showing on the form. This allows gems to
define custom terms in a yaml, without them appearing on a form.
dlpierce
dlpierce previously approved these changes Jun 21, 2024
orangewolf and others added 5 commits June 24, 2024 10:10
Wings now assigns and indexes a Hydra::AccessControl object that throws a wrench into the counts of this spec.
The queries here now only look for the expected model type.

Also cleaned up a couple lets that appear unused.
@dlpierce dlpierce merged commit d4e646c into main Jun 25, 2024
21 of 22 checks passed
@dlpierce dlpierce deleted the double_combo branch June 25, 2024 03:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notes-major Release Notes: Potentially breaking changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants