
Performance bottleneck in nested model conversion for Fedora persister #992

@dolsysmith


Cross-referencing this issue on the Samvera/Hyrax repo.

When using the Fedora persister with Hyrax 5/Fedora 6, I'm seeing a bottleneck when saving ACL objects, and performance degrades as the repository grows. I believe the root cause lies in the behavior of the nested_graph method of Fedora::Persister::ModelConverter::NestedProperty.

Because Hyrax::AccessControl objects nest Hyrax::Permission objects, saving an ACL triggers the creation of a nested graph. However, the ModelConverter instance created to populate the nested graph receives a subject_uri value that causes the graph to be populated with all of the objects under the top-level container in the repository, which in turn degrades performance.

Here is the code I'm referring to:

@nested_graph ||= ModelConverter.new(resource: Valkyrie::Types::Anything[value.value], adapter: value.adapter, subject_uri: subject_uri).convert.graph

And subject_uri comes from this method:

    def subject_uri
      @subject_uri ||= ::RDF::URI(RDF::Node.new.to_s.gsub("_:", "#"))
    end

As a result, the subject_uri passed to the ModelConverter instance for each Hyrax::Permission resource has a value like #<RDF::URI:0x927e8 URI:#g600020>. This truncated, fragment-only URI is then passed to the Fedora API, which returns all objects directly under the base path (e.g., http://localhost:8080/fcrepo/rest/development).
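One plausible reading of why a fragment-only URI lands on the base path can be sketched with Ruby's standard `URI` library (not the `rdf` gem the persister actually uses). The sketch assumes the blank-node label has the `_:g600020` form seen in the logs and that the resulting reference is resolved against the repository base URL; both are assumptions, not confirmed persister behavior. Under RFC 3986, a reference that consists only of a fragment resolves to the base URI itself, and the fragment is never part of the HTTP request target.

```ruby
require "uri"

# Assumed example label in the "_:<id>" form that a blank node's to_s produces.
blank_node_label = "_:g600020"

# Same transformation as subject_uri: swap the "_:" prefix for "#".
fragment_ref = blank_node_label.gsub("_:", "#")

# Base path from the issue; treating it as the resolution base is an assumption.
base = "http://localhost:8080/fcrepo/rest/development"

relative = URI(fragment_ref)
resolved = URI.join(base, fragment_ref)

puts fragment_ref          # prints #g600020
puts relative.relative?    # prints true (no scheme, host, or path of its own)
puts resolved              # prints http://localhost:8080/fcrepo/rest/development#g600020
# The fragment is dropped from the request target, so a GET would hit the base container.
puts resolved.request_uri  # prints /fcrepo/rest/development
```

If this reading is right, every nested Permission converter would issue what amounts to a request for the base container, which would match the growth in cost as more objects accumulate under it.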

This behavior, which occurs only for nested properties, differs from the non-nested case, where instances of ModelConverter are created with an empty RDF::URI.

To demonstrate, I added logging to the initializer of the ModelConverter class. Here are the calls to this initializer when persisting an update to a single ACL:

[{'model': '#<GwEtd',
  'id': '"8aad0784-3e7a-4a9a-9348-c72fb3a0b80e">',
  'subject_uri': '#<RDF::URI:0x92608 URI:>'},
 {'model': '#<Hyrax::AccessControl',
  'id': '"7a20d68e-dd96-4a56-8215-9fb7b7a6f5bc">',
  'subject_uri': '#<RDF::URI:0x926f8 URI:>'},
 {'model': '#<Hyrax::Permission',
  'subject_uri': '#<RDF::URI:0x92748 URI:#g599860>'},
 {'model': '#<Hyrax::Permission',
  'subject_uri': '#<RDF::URI:0x92798 URI:#g599940>'},
 {'model': '#<Hyrax::Permission',
  'subject_uri': '#<RDF::URI:0x927e8 URI:#g600020>'}]

(The model and id values have been extracted from the resource argument to the initializer.)
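Logging like the above can be reproduced without editing the gem's source by prepending a module that wraps the initializer. This is a minimal sketch and not necessarily how the logs above were produced: the `ModelConverter` class below is a hypothetical stand-in so the example runs without Valkyrie, and `ConverterLogging`, `CONVERTER_LOG`, and the `Permission` struct are illustrative names. In a real application the module would be prepended to the persister's own ModelConverter class instead.

```ruby
# Hypothetical stand-in for the persister's ModelConverter; only the keyword
# arguments visible in the issue (resource:, adapter:, subject_uri:) are modeled.
class ModelConverter
  attr_reader :resource, :subject_uri

  def initialize(resource:, adapter:, subject_uri: "")
    @resource = resource
    @adapter = adapter
    @subject_uri = subject_uri
  end
end

# One entry per initializer call.
CONVERTER_LOG = []

module ConverterLogging
  # Accept and forward keyword arguments unchanged, so the wrapper does not
  # depend on the exact initializer signature or its defaults.
  def initialize(**kwargs)
    CONVERTER_LOG << {
      model: kwargs[:resource].class.name,
      # "nil" when the caller relied on the default subject_uri.
      subject_uri: kwargs[:subject_uri].inspect
    }
    super
  end
end

# Prepending runs ConverterLogging#initialize before ModelConverter#initialize.
ModelConverter.prepend(ConverterLogging)

# Usage with a hypothetical Permission struct.
Permission = Struct.new(:agent, :mode)
ModelConverter.new(resource: Permission.new("group/public", "read"), adapter: nil, subject_uri: "#g599860")
ModelConverter.new(resource: Permission.new("user/alice", "edit"), adapter: nil)
CONVERTER_LOG.each { |entry| p entry }
```

Forwarding `**kwargs` with a bare `super` keeps the wrapper transparent: the original initializer still receives exactly what the caller passed.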

Can the model converter for the nested property be initialized with an empty URI, just like the other resources? Is there a need for the anonymous node when working with nested resources?
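One consideration behind the second question can be sketched in plain Ruby. Assuming the nested triples are written into the same graph as the parent resource, whose subject is the empty URI, giving every Hyrax::Permission the empty URI as well would place all of their triples under a single subject. The agent and mode values and the simplified predicate names below are hypothetical, not Hyrax's actual vocabulary.

```ruby
# Hypothetical nested permissions; values are illustrative only.
PERMISSIONS = [
  { node: "#g599860", agent: "group/public", mode: "read" },
  { node: "#g599940", agent: "user/alice", mode: "edit" }
].freeze

# Serialize each nested permission as [subject, predicate, object] triples,
# using whatever subject the given block assigns to it.
def nested_triples(permissions, &subject_for)
  permissions.flat_map do |perm|
    subject = subject_for.call(perm)
    [[subject, "agent", perm[:agent]], [subject, "mode", perm[:mode]]]
  end.uniq
end

# Group triples by subject to see which nested resources remain distinguishable.
def resources_by_subject(triples)
  triples.group_by(&:first).transform_values do |group|
    group.map { |_subject, predicate, object| [predicate, object] }
  end
end

# Variant A: a distinct fragment subject per permission (the anonymous-node approach).
with_nodes = resources_by_subject(nested_triples(PERMISSIONS) { |perm| perm[:node] })

# Variant B: every permission shares the empty subject, i.e. the parent resource.
with_empty = resources_by_subject(nested_triples(PERMISSIONS) { |_perm| "" })

puts with_nodes.size # prints 2: each agent stays paired with its mode
puts with_empty.size # prints 1: all agents and modes hang off one subject
```

Because an RDF graph is an unordered set of triples, the merged variant would no longer record which agent was granted which mode. This suggests an empty URI alone may not be a drop-in replacement, although a different distinct identifier per nested resource might avoid the expensive request.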
