Skip to content

Conversation

@eilmiv
Copy link
Collaborator

@eilmiv eilmiv commented Nov 25, 2025

Summary of changes

  • Add ingestor for OAI-PMH endpoints that ingests materials and events based on Dublin Core metadata
  • Add ingestor for OAI-PMH endpoints that ingests materials and events based on Bioschemas RDF metadata

Motivation and context

This is a relevant step in the mTeSS-X project. In particular, it is MR11. See #1179.

Checklist

  • I have read and followed the CONTRIBUTING guide.
  • I confirm that I have the authority necessary to make this contribution on behalf of its copyright owner and agree
    to license it to the TeSS codebase under the
    BSD license.

@eilmiv eilmiv requested a review from fbacall February 5, 2026 15:27
@eilmiv eilmiv marked this pull request as ready for review February 5, 2026 15:27
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new OAI-PMH ingestion pathway to TeSS to import Events and Materials from remote OAI-PMH endpoints, with support for both Dublin Core and Bioschemas RDF metadata (MR11 / #1179).

Changes:

  • Introduce Ingestors::OaiPmhIngestor with RDF-first (Bioschemas) parsing and Dublin Core fallback.
  • Register the new ingestor in IngestorFactory.
  • Add unit tests covering empty endpoints, Dublin Core materials/events, mixed records, and Bioschemas RDF parsing.
  • Fix HTML body tag class attribute quoting in the application layout.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
lib/ingestors/oai_pmh_ingestor.rb New OAI-PMH ingestor implementation for RDF (Bioschemas) and Dublin Core ingestion.
lib/ingestors/ingestor_factory.rb Registers Ingestors::OaiPmhIngestor as an available ingestor.
test/unit/ingestors/oai_pmh_test.rb Adds unit tests for OAI-PMH ingestion behavior and parsing outcomes.
app/views/layouts/application.html.erb Quotes body class attribute to produce valid HTML when multiple classes are emitted.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


def read_oai_dublin_core(client)
count = 0
client.list_records.full.each do |record|
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read_oai_dublin_core calls client.list_records without a metadata_prefix. In OAI-PMH, ListRecords requests require metadataPrefix, and many servers/gems will return a badArgument / raise when it’s missing. Consider explicitly requesting Dublin Core (typically oai_dc) here so this works against spec-compliant endpoints.

Suggested change
client.list_records.full.each do |record|
client.list_records(metadata_prefix: 'oai_dc').full.each do |record|

Copilot uses AI. Check for mistakes.
end

def read(source_url)
client = OAI::Client.new source_url, headers: { 'From' => config[:mail] }
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config[:user_agent] is defined but not used when instantiating OAI::Client. Consider passing it via headers (alongside From) so requests identify TeSS consistently, matching how other ingestors set User-Agent on outbound HTTP requests.

Suggested change
client = OAI::Client.new source_url, headers: { 'From' => config[:mail] }
client = OAI::Client.new source_url, headers: { 'From' => config[:mail], 'User-Agent' => config[:user_agent] }

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,208 @@
require 'open-uri'
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require 'open-uri' doesn’t appear to be used in this ingestor (it doesn’t call open_url or URI.open). Consider removing the unused require to keep dependencies minimal.

Suggested change
require 'open-uri'

Copilot uses AI. Check for mistakes.
Comment on lines +25 to +30
def list_records(metadata_prefix: nil)
if metadata_prefix == 'rdf'
@rdf_response
else
@dc_response
end
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FakeClient#list_records treats a missing metadata_prefix (nil) as Dublin Core. Since OAI-PMH ListRecords requests normally require an explicit metadataPrefix, consider making the mock require/handle the explicit DC prefix (e.g., oai_dc) so the tests exercise the same call pattern as production code.

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +46
assert_equal @ingestor.materials, []
assert_equal @ingestor.events, []
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions have the expected/actual arguments reversed. Swapping them to assert_equal [], @ingestor.materials (and similarly for events) will produce clearer failure messages and matches the assertion style used elsewhere in the test suite.

Suggested change
assert_equal @ingestor.materials, []
assert_equal @ingestor.events, []
assert_equal [], @ingestor.materials
assert_equal [], @ingestor.events

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant