This tool scrapes the public metadata sources from a set of stratum0 and stratum1 servers. From each server it fetches:
- cvmfs/info/v1/repositories.json
Then, for every repository it finds that it has not been told to ignore, it fetches:
- cvmfs/<repo>/.cvmfs_status.json
- cvmfs/<repo>/.cvmfspublished
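The paths above can be combined with a server name to form the URLs that get fetched. The following is a minimal sketch for illustration, not the scraper's own code: the helper name `metadata_urls` is hypothetical, and the use of plain `http://` on the default port is an assumption.

```python
from __future__ import annotations


def metadata_urls(server: str, repos: list[str]) -> list[str]:
    """Build the metadata URLs for one server (hypothetical helper)."""
    base = f"http://{server}"  # assumption: plain HTTP, default port
    # The repository listing is fetched once per server.
    urls = [f"{base}/cvmfs/info/v1/repositories.json"]
    # The status and manifest files are fetched once per repository.
    for repo in repos:
        urls.append(f"{base}/cvmfs/{repo}/.cvmfs_status.json")
        urls.append(f"{base}/cvmfs/{repo}/.cvmfspublished")
    return urls
```

For one server with two repositories this yields five URLs: one listing plus two files per repository.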
#!/usr/bin/env python3
from cvmfsscraper import scrape, scrape_server

# To scrape a single server instead:
# server = scrape_server("aws-eu-west1.stratum1.cvmfs.eessi-infra.org")

# Scrape two servers, skipping one repository on each of them.
servers = scrape(
    servers=[
        "aws-eu-west1.stratum1.cvmfs.eessi-infra.org",
        "bgo-no.stratum1.cvmfs.eessi-infra.org",
    ],
    ignore_repos=[
        "ci.eessi-hpc.org",
    ],
)

print(servers[0])

# Print a few attributes of every repository on the first server.
# f-strings are used so that non-string values are formatted safely.
for repo in servers[0].repositories:
    print(f"Repo: {repo.name}")
    print(f"Root size: {repo.root_size}")
    print(f"Revision: {repo.revision}")
    print(f"Revision timestamp: {repo.revision_timestamp}")
    print(f"Last snapshot: {repo.last_snapshot}")
A server object, representing a specific server that has been scraped.
servers = scrape(...)
server = servers[0]
server.name
The name of the server, usually its fully qualified domain name.
server.geoapi_status
An integer value, one of 0, 1, 2, or 9, with the following meaning:
- 0 : OK
- 1 : GeoApi gives wrong location
- 2 : No response
- 9 : The server has no repository available so the GeoApi cannot be tested
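When consuming `geoapi_status`, it can help to give the integer codes readable names. The sketch below is one way to do that on the caller's side; the `GeoApiStatus` enum and the `describe_geoapi_status` helper are hypothetical names introduced here and are not claimed to be part of the library.

```python
from enum import IntEnum


class GeoApiStatus(IntEnum):
    """Readable names for the documented codes (hypothetical enum)."""

    OK = 0
    WRONG_LOCATION = 1  # GeoApi gives wrong location
    NO_RESPONSE = 2
    NOT_TESTABLE = 9  # no repository available, so GeoApi cannot be tested


def describe_geoapi_status(code: int) -> str:
    """Return the symbolic name for a code; raises ValueError if unknown."""
    return GeoApiStatus(code).name
```

Because `IntEnum` members compare equal to plain integers, a check such as `server.geoapi_status == GeoApiStatus.OK` works without converting the attribute first.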
server.repositories
A list of repository objects, empty if no repositories were scraped on the server.
server.ignored_repositories
A list of repository names that the scraper ignores.
server.forced_repositories
A list of repository names that the server is forced to scrape. If a repo name exists in both ignored_repositories and forced_repositories, it will be scraped.
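The precedence between the two lists can be illustrated with a small selection function. This is a sketch of the rule as described, not the library's implementation: `repos_to_scrape` is a hypothetical name, and treating forced repositories as scraped even when they were not discovered on the server is an assumption.

```python
def repos_to_scrape(discovered, ignored, forced):
    """Select repository names to scrape; forced wins over ignored."""
    selected = []
    # Assumption: forced repositories are considered even if the server
    # did not list them among the discovered ones.
    for name in list(discovered) + list(forced):
        if name in selected:
            continue  # avoid duplicates, keep first-seen order
        if name in ignored and name not in forced:
            continue  # ignored and not overridden by the forced list
        selected.append(name)
    return selected
```

With `discovered=["a", "b", "c"]`, `ignored=["b", "c"]`, and `forced=["c", "d"]`, the repository `b` is dropped while `c` is kept because it is also forced.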
A repository object, representing a single repository on a scraped server.
servers = scrape(...)
repo_one = servers[0].repositories[0]
repo_one.name
The fully qualified name of the repository.
repo_one.server
The server object to which the repository belongs.
repo_one.path
The path for the repository on the server. May differ from the name. To get a complete URL, one can do:
url = "http://" + repo_one.server.name + repo_one.path
These attributes are populated from .cvmfs_status.json:
Attribute | Value |
---|---|
last_gc | Timestamp of last garbage collection |
last_snapshot | Timestamp of the last snapshot |
Information from .cvmfspublished is also provided. For explanations of these keys, please see the official CVMFS documentation. The Field column in the table below gives the corresponding field key in .cvmfspublished.
Attribute | Field |
---|---|
alternative_name | A |
full_name | N |
is_garbage_collectable | G |
metadata_cryptographic_hash | M |
micro_catalogues | L |
reflog_checksum_cryptographic_hash | Y |
revision_timestamp | T |
root_catalogue_ttl | D |
root_cryptographic_hash | C |
root_size | B |
root_path_hash | R |
signature | The end signature blob |
signing_certificate_cryptographic_hash | X |
tag_history_cryptographic_hash | H |
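The mapping in the table can be sketched as a small parser. This is an illustration only and not the scraper's own code: it assumes the manifest holds one field per line with the first character as the key, and that a line containing only `--` separates the fields from the trailing signature blob. The names `FIELD_TO_ATTRIBUTE` and `parse_cvmfspublished` are hypothetical, only a subset of the table is mapped, and all values are kept as strings.

```python
# Subset of the table above: field key -> attribute name.
FIELD_TO_ATTRIBUTE = {
    "N": "full_name",
    "G": "is_garbage_collectable",
    "T": "revision_timestamp",
    "D": "root_catalogue_ttl",
    "C": "root_cryptographic_hash",
    "B": "root_size",
    "R": "root_path_hash",
    "X": "signing_certificate_cryptographic_hash",
    "H": "tag_history_cryptographic_hash",
}


def parse_cvmfspublished(raw: bytes) -> dict:
    """Parse manifest bytes into an attribute dict (hypothetical helper)."""
    # Assumption: everything after the "--" line is the signature blob,
    # which may be binary and is therefore kept as bytes.
    header, _, signature = raw.partition(b"\n--\n")
    attributes = {"signature": signature}
    for line in header.decode("utf-8").splitlines():
        if not line:
            continue
        key, value = line[0], line[1:]
        name = FIELD_TO_ATTRIBUTE.get(key)
        if name is not None:  # keys outside the subset are skipped
            attributes[name] = value
    return attributes
```

Working on bytes rather than text keeps the signature intact, since only the header part is decoded.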