Merged
CONTRIBUTORS.md: 15 changes (8 additions, 7 deletions)
@@ -2,16 +2,17 @@

The following people have contributed to the TeSS codebase:

* Niall Beard
* Aleksandra Nenadic
* Milo Thurston
* Finn Bacall <finn.bacall@manchester.ac.uk>
* Servilio Afre Puentes <afrepues@sharcnet.ca>
* Aitor Apaolaza
* Finn Bacall <finn.bacall@manchester.ac.uk>
* Niall Beard
* Chris Child
* Ivan Kuzmin <ivan.kuzmin@ut.ee>
* Nick May
* Daan van Vugt <dvanvugt@ignitioncomputing.com>
* Aleksandra Nenadic
* Xènia Pérez Sitjà <https://orcid.org/0000-0002-7166-0183>
* Ivan Kuzmin <ivan.kuzmin@ut.ee>
* Servilio Afre Puentes <afrepues@sharcnet.ca>
* Kenneth Rioja <kenneth.brian.rioja@cern.ch>
* Mike Sanders <msanders@ignitioncomputing.com>
* Milo Thurston
* Daan van Vugt <dvanvugt@ignitioncomputing.com>
* Martin Voigt <m.voigt@hzdr.de>
lib/ingestors/bioschemas_ingestor.rb: 12 changes (11 additions, 1 deletion)
@@ -19,7 +19,7 @@ def read(source_url)
sitemap_regex = nil
@verbose = false
sources = if source_url.downcase.match?(/sitemap(.*)?.xml\Z/)
- sitemap_message = "Parsing sitemap: #{source_url}\n"
+ sitemap_message = "Parsing .xml sitemap: #{source_url}\n"
urls = SitemapParser.new(source_url, {
recurse: true,
url_regex: sitemap_regex,
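The `sources = if ... elsif ... else` chain in `read` decides how to expand `source_url` purely from its downcased suffix. The sketch below isolates that decision as a standalone Ruby method so the patterns can be exercised on their own; `source_kind` and the returned symbols are hypothetical names, not part of the TeSS codebase, while the two regexes are copied from the diff unchanged.

```ruby
# Regexes copied unchanged from BioschemasIngestor#read in this diff.
XML_SITEMAP = /sitemap(.*)?.xml\Z/
TXT_SITEMAP = /sitemap(.*)?.txt\Z/

# Hypothetical helper mirroring the if/elsif/else that builds `sources`.
def source_kind(source_url)
  url = source_url.downcase # matching is case-insensitive via downcase
  if url.match?(XML_SITEMAP)
    :xml_sitemap # the ingestor expands this with SitemapParser (recurse: true)
  elsif url.match?(TXT_SITEMAP)
    :txt_sitemap # read as plain text, one URL per line
  else
    :single_page # the URL itself is scraped for Bioschemas markup
  end
end

p source_kind('https://example.org/sitemap.xml')        # prints :xml_sitemap
p source_kind('https://example.org/Sitemap-events.TXT') # prints :txt_sitemap
p source_kind('https://example.org/events')             # prints :single_page
```

Because the `.` before `xml` and `txt` is unescaped, it matches any character, so a URL ending in `sitemap_xml` is also treated as an XML sitemap; escaping it (`\.xml\Z`) would be the stricter reading.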
@@ -28,13 +28,20 @@ def read(source_url)
sitemap_message << "\n - #{urls.count} URLs found"
@messages << sitemap_message
urls
+ elsif source_url.downcase.match?(/sitemap(.*)?.txt\Z/)
+ sitemap_message = "Parsing .txt sitemap: #{source_url}\n"
+ urls = open_url(source_url).to_a.uniq.map(&:strip)
+ sitemap_message << "\n - #{urls.count} URLs found"
+ @messages << sitemap_message
+ urls
else
[source_url]
end

provider_events = []
provider_materials = []
totals = Hash.new(0)
+ no_bioschema_urls = "Bioschemas not found in:\n"
sources.each do |url|
source = open_url(url)
output = read_content(source, url: url)
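The new `.txt` branch turns the fetched sitemap into a URL list with `open_url(source_url).to_a.uniq.map(&:strip)`. The sketch below reproduces that chain on a `StringIO`, which stands in for whatever `open_url` returns (assumed here to be IO-like, since the diff calls `to_a` on it), and contrasts it with a hypothetical variant that strips and drops blank lines before deduplicating.

```ruby
require 'stringio'

# Stand-in for the (assumed IO-like) object returned by open_url: a plain-text
# sitemap lists one URL per line. Note the duplicate entry, the blank line and
# the missing trailing newline on the last line.
body = "https://example.org/events/1\n" \
       "https://example.org/events/2\n" \
       "\n" \
       "https://example.org/events/1"

# Same chain as the diff: uniq runs before strip, so ".../1\n" and ".../1" are
# still distinct, and the blank line becomes an empty string.
as_in_diff = StringIO.new(body).to_a.uniq.map(&:strip)

# Hypothetical variant: normalise each line, drop blanks, then deduplicate.
normalised = StringIO.new(body).each_line.map(&:strip).reject(&:empty?).uniq

p as_in_diff.size # prints 4
p normalised      # prints ["https://example.org/events/1", "https://example.org/events/2"]
```

For this input, the diff's ordering keeps both the repeated URL and the empty string, so both would be included in the count reported by `sitemap_message` and passed to `open_url` in the `sources.each` loop.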
@@ -44,6 +51,7 @@ def read(source_url)
output[:totals].each do |key, value|
totals[key] += value
end
+ no_bioschema_urls << "\n - #{url}" if !source.nil? && output[:totals].values.sum.zero?
end
end

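In this hunk each page's counts are folded into `totals`, and a page whose counts sum to zero is appended to `no_bioschema_urls`. The sketch below replays that loop on hand-written data; the shape of `output[:totals]` (resource type => count) and the example keys and URLs are assumptions, and the diff's `!source.nil?` guard is left out because nothing is fetched here.

```ruby
# Hypothetical per-URL results, assumed to be shaped like output[:totals]:
# resource type => number of Bioschemas items found on that page.
results = {
  'https://example.org/events'    => { 'Event' => 3, 'Course' => 1 },
  'https://example.org/about'     => { 'Event' => 0, 'Course' => 0 },
  'https://example.org/materials' => { 'TrainingMaterial' => 2 }
}

totals = Hash.new(0) # missing keys start at 0, as in the ingestor
no_bioschema_urls = +"Bioschemas not found in:\n" # unary + keeps it mutable

results.each do |url, page_totals|
  # Accumulate per-type counts across all pages.
  page_totals.each { |key, value| totals[key] += value }
  # Same zero test as the diff: every per-type count on the page is zero.
  no_bioschema_urls << "\n - #{url}" if page_totals.values.sum.zero?
end

p totals['Event'] # prints 3
puts no_bioschema_urls
```

An empty totals hash also sums to zero, so a page that yields no counts at all would be reported the same way.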
@@ -55,6 +63,8 @@ def read(source_url)
@messages << bioschemas_summary
end

+ @messages << no_bioschema_urls
+
deduplicate(provider_events).each do |event_params|
add_event(event_params)
end
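As far as this hunk shows, `no_bioschema_urls` is appended to `@messages` without a guard, so the `Bioschemas not found in:` header would appear even when every URL had markup. A hypothetical variant, sketched below under that reading, keeps the URLs in an Array and only adds the report when at least one is present; `append_missing_report` is an illustrative name, not a TeSS method.

```ruby
HEADER = "Bioschemas not found in:\n" # same header text as the diff

# Hypothetical variant: build and append the report only when at least one
# page lacked Bioschemas markup; otherwise leave the messages untouched.
def append_missing_report(messages, missing_urls)
  return messages if missing_urls.empty?

  messages << HEADER + missing_urls.map { |url| "\n - #{url}" }.join
end

messages = []
append_missing_report(messages, [])
p messages.size # prints 0
append_missing_report(messages, ['https://example.org/about'])
p messages.last # prints "Bioschemas not found in:\n\n - https://example.org/about"
```

The report text matches what the diff's string concatenation produces for the same URL; the only difference is that nothing is emitted when the list is empty.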