Skip to content

Bottleneck in saving ACL with Fedora persister #7405

@dolsysmith

Description

@dolsysmith

Descriptive summary

As repository size grows with Hyrax 5/Fedora 6, time to save a new work becomes unmanageable (>2 minutes). This is not counting the time to ingest the file set, create the derivative, etc. Hyrax spends minutes just creating the metadata for the work itself.

Steps to reproduce the behavior in User Interface (UI)

  1. Populate a Fedora 6 repository with a few thousand works.
  2. Create a new work and save.
  3. Note that the UI times out.

Actual behavior (include screenshots if available)

Hyrax 5 (hyrax-81f74086cf84), Fedora 6.5.1 (Docker image: fcrepo/fcrepo:6.5.1-tomcat9), Dockerized application.

Using rbspy and TracePoint, I've determined that the app spends a lot of time in this Valkyrie method which calls this method of the ruby-rdf gem. The bottlenecking behavior occurs during the saving of the ACL object associated with a new work.

I patched Valkyrie::Persistence::Fedora::Persister::ModelConverter.convert to log some information at the completion of the method. Below are the logs from creating a single work in the UI (without attached files). A few things jump out at me:

  • The method is called first with the work model (GwEtd) metadata, then once for each instance of a Hyrax::Permission applied to the work (for the public/read and depositor/edit permissions) and once for an instance of Hyrax::AccessControl that owns those two permission objects.
    • Iterations 3 and 4 are "children" of iteration 2, which makes sense -- the permissions are saved then saved to the ACL.
    • All of this takes less than 1 second (the duration values are in seconds).
    • For this phase, graph_size is 0 -- this is a metric I added, a result of calling graph_resource.graph.count within the convert method (but after calculating the duration).
  • Starting with iteration 5, the bottleneck occurs.
    • The convert method is called twice with (presumably) the same instances of Hyrax::Permission as above, but this time the size of the graph has exploded.
    • As best I can tell, these calls coincide with a call to the Fedora API for the URI at rest/development, which returns all top-level containers on the graph. Probably as a result of processing so many nodes, each of these two iterations costs 60 seconds.
  • Iteration 8 is another "normal" (duration-wise) invocation, with an instance of Hyrax::Permission that sets the admin/edit permission.
  • Iteration 5 is the "parent" of iterations 6, 7, and 8. Its extraordinarily long duration is, I believe, partly an artifact of my patching, since it includes the time both of calling graph_resource.graph.delete in iterations 6 and 7, and of calling graph_resource.graph.count within those same two iterations.
{'method_iteration': 1,
 'resource_type': '#<GwEtd',
 'attributes': {'id': None,
                'internal_resource': 'GwEtd',
                'created_at': '2026-03-27T13:47:47.683Z',
                'updated_at': '2026-03-27T13:47:47.684Z',
                'new_record': True,
                'alternate_ids': [],
                'embargo_id': None,
                'lease_id': None,
                'title': ['IGNORE manually created'],
                'date_modified': '2026-03-27T13:47:47.339+00:00',
                'date_uploaded': '2026-03-27T13:47:47.339+00:00',
                'depositor': 'admin@example.com',
                'admin_set_id': {'id': 'acf631b0-5611-42b8-84ee-52cecc81e9b2'},
                'member_ids': [],
                'member_of_collection_ids': [],
                'on_behalf_of': None,
                'proxy_depositor': None,
                'state': {'@id': 'http://fedora.info/definitions/1/0/access/ObjState#active'},
        # .... additional attributes               
 },
 'graph_size': 0,
 'start_time': '2026-03-27T13:47:47.685+00:00',
 'end_time': '2026-03-27T13:47:47.707+00:00',
 'duration': 0.022
}

{'method_iteration': 2,
 'resource_type': '#<Hyrax::AccessControl',
 'attributes': {'id': None,
                'internal_resource': 'Hyrax::AccessControl',
                'created_at': '2026-03-27T13:47:49.784Z',
                'updated_at': '2026-03-27T13:47:49.784Z',
                'new_record': True,
                'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                'permissions': [{'id': None,
                                 'internal_resource': 'Hyrax::Permission',
                                 'created_at': None,
                                 'updated_at': None,
                                 'new_record': True,
                                 'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                                 'agent': 'group/public',
                                 'mode': 'read'},
                                {'id': None,
                                 'internal_resource': 'Hyrax::Permission',
                                 'created_at': None,
                                 'updated_at': None,
                                 'new_record': True,
                                 'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                                 'agent': 'admin@example.com',
                                 'mode': 'edit'}]},
 'graph_size': 0,
 'start_time': '2026-03-27T13:47:49.785+00:00',
 'end_time': '2026-03-27T13:47:49.823+00:00',
 'duration': 0.038
}

{'method_iteration': 3,
 'resource_type': '#<Hyrax::Permission',
 'attributes': {'id': None,
                'internal_resource': 'Hyrax::Permission',
                'created_at': None,
                'updated_at': None,
                'new_record': True,
                'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                'agent': 'group/public',
                'mode': 'read'},
 'graph_size': 0,
 'start_time': '2026-03-27T13:47:49.798+00:00',
 'end_time': '2026-03-27T13:47:49.808+00:00',
 'duration': 0.01
}

{'method_iteration': 4,
 'resource_type': '#<Hyrax::Permission',
 'attributes': {'id': None,
                'internal_resource': 'Hyrax::Permission',
                'created_at': None,
                'updated_at': None,
                'new_record': True,
                'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                'agent': 'admin@example.com',
                'mode': 'edit'},
 'graph_size': 0,
 'start_time': '2026-03-27T13:47:49.812+00:00',
 'end_time': '2026-03-27T13:47:49.820+00:00',
 'duration': 0.008}

{'method_iteration': 5,
 'resource_type': '#<Hyrax::AccessControl',
 'attributes': {'id': {'id': '2a337d6c-0302-4cee-a55a-1dfcdaed95f9'},
                'internal_resource': 'Hyrax::AccessControl',
                'created_at': '2026-03-27T13:47:49.784+00:00',
                'updated_at': '2026-03-27T13:47:52.006Z',
                'new_record': False,
                'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                'permissions': [{'id': {'id': '#g599160'},
                                 'internal_resource': 'Hyrax::Permission',
                                 'created_at': None,
                                 'updated_at': None,
                                 'new_record': False,
                                 'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                                 'agent': 'group/public',
                                 'mode': 'read'},
                                {'id': {'id': '#g599220'},
                                 'internal_resource': 'Hyrax::Permission',
                                 'created_at': None,
                                 'updated_at': None,
                                 'new_record': False,
                                 'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                                 'agent': 'admin@example.com',
                                 'mode': 'edit'},
                                {'id': None,
                                 'internal_resource': 'Hyrax::Permission',
                                 'created_at': None,
                                 'updated_at': None,
                                 'new_record': True,
                                 'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                                 'agent': 'group/admin',
                                 'mode': 'edit'}]},
 'graph_size': 24,
 'start_time': '2026-03-27T13:47:52.127+00:00',
 'end_time': '2026-03-27T13:51:53.041+00:00',
 'duration': 240.914
}

{'method_iteration': 6,
 'resource_type': '#<Hyrax::Permission',
 'attributes': {'id': {'id': '#g599160'},
                'internal_resource': 'Hyrax::Permission',
                'created_at': None,
                'updated_at': None,
                'new_record': False,
                'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                'agent': 'group/public',
                'mode': 'read'},
 'graph_size': 27362,
 'start_time': '2026-03-27T13:48:54.350+00:00',
 'end_time': '2026-03-27T13:49:56.201+00:00',
 'duration': 61.851
}

{'method_iteration': 7,
 'resource_type': '#<Hyrax::Permission',
 'attributes': {'id': {'id': '#g599220'},
                'internal_resource': 'Hyrax::Permission',
                'created_at': None,
                'updated_at': None,
                'new_record': False,
                'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                'agent': 'admin@example.com',
                'mode': 'edit'},
 'graph_size': 27362,
 'start_time': '2026-03-27T13:50:46.324+00:00',
 'end_time': '2026-03-27T13:51:53.038+00:00',
 'duration': 66.714
}

{'method_iteration': 8,
 'resource_type': '#<Hyrax::Permission',
 'attributes': {'id': None,
                'internal_resource': 'Hyrax::Permission',
                'created_at': None,
                'updated_at': None,
                'new_record': True,
                'access_to': {'id': 'c7e10119-26cc-4726-ba10-c733124bf635'},
                'agent': 'group/admin',
                'mode': 'edit'},
 'graph_size': 0,
 'start_time': '2026-03-27T13:51:53.038+00:00',
 'end_time': '2026-03-27T13:51:53.039+00:00',
 'duration': 0.001
}

I am not sure why those two additional calls (iterations 6 and 7) to the convert method occur, nor why they pass in the full repository graph, as it were. That doesn't seem like it should be necessary to persist an ACL that points to a single work. If it weren't for this bottleneck, I think the step of saving a work would complete in a reasonable amount of time.

Acceptance Criteria/Expected Behavior

  • Saving a new work to a repository containing numerous works takes a reasonable amount of time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions