Update Science Museum ingester with API changes #4105
This looks good, thanks so much for jumping on that so quickly! I was able to run an ingestion locally; however, it looks like the URLs that we ingested are all giving me AccessDenied errors 😞 Here's a sample:
openledger> select title, url, foreign_landing_url from image where provider = 'sciencemuseum' limit 10;
| title | url | foreign_landing_url |
|---|---|---|
| Excavated neolithic flint scraper | https://coimages.sciencemuseumgroup.org.uk/images/330/750/large_a634842__0003_.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co106398/excavated-neolithic-flint-scraper-scrapers |
| Votive intestine | https://coimages.sciencemuseumgroup.org.uk/images/709/335/large_a635751__0005_.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co83260/votive-intestine-votive-viscera |
| Roughly cylindrical sandstone mortar | https://coimages.sciencemuseumgroup.org.uk/images/442/761/large_smg00201374.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co131060/roughly-cylindrical-sandstone-mortar-mortars |
| Votive right hand | https://coimages.sciencemuseumgroup.org.uk/images/458/333/large_a73036__0002_.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co82968/votive-right-hand-votive-hand |
| Cautery, bronze, Roman, from Sforza collection | https://coimages.sciencemuseumgroup.org.uk/images/347/896/large_smg00190800.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co87137/cautery-bronze-roman-from-sforza-collection-cautery |
| Bronze coin | https://coimages.sciencemuseumgroup.org.uk/images/661/559/smg00015483__0001_.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co83841/bronze-coin-coins |
| Glass unguent bottle, Roman, 151 to 300 AD | https://coimages.sciencemuseumgroup.org.uk/images/362/329/large_smg00187927.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co90128/glass-unguent-bottle-roman-151-to-300-ad-unguent-bottle |
| Probe with flat end and olive end, bronze | https://coimages.sciencemuseumgroup.org.uk/images/347/880/large_smg00190784.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co88326/probe-with-flat-end-and-olive-end-bronze-probe-medical |
| Votive heart(?), terracotta, probably Roman | https://coimages.sciencemuseumgroup.org.uk/images/458/341/large_a635759__0001_.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co83268/votive-heart-terracotta-probably-roman-votive-viscera |
| Votive placenta | https://coimages.sciencemuseumgroup.org.uk/images/237/659/large_a114889__0001_.jpg | https://collection.sciencemuseumgroup.org.uk/objects/co83676/votive-placenta-votive-viscera |
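For anyone wanting to repeat this spot check without clicking each link, here is a minimal sketch that requests each ingested URL and reports the ones that do not come back with a 200. The helper names (`check_url`, `find_broken`) and the injectable `opener` parameter are illustrative, not part of the catalog code, and it assumes the image host answers `HEAD` requests and that the AccessDenied response arrives with a non-200 status.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def check_url(url, opener=urlopen, timeout=10):
    """Return the HTTP status code for `url`, or None if it is unreachable.

    `opener` is injectable so the function can be exercised without network
    access; by default it is urllib's `urlopen`.
    """
    request = Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # HTTPError must be caught before URLError, since it is a subclass.
        return err.code
    except URLError:
        return None


def find_broken(urls, opener=urlopen):
    """Return (url, status) pairs for every URL that did not return 200."""
    broken = []
    for url in urls:
        status = check_url(url, opener=opener)
        if status != 200:
            broken.append((url, status))
    return broken
```

Passing the `url` column from the query above to `find_broken` would list any rows whose image link is still rejected.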
@AetherUnbound It looks like the URL format changed, so it was breaking only in the instances where we were building the full URL ourselves. This means most of our production URLs are currently broken as well, which I was able to confirm 😬 However, since this is not … However, I noticed #4013 again while testing this locally, so I've reopened that issue.
Confirmed the new URLs work!
LGTM. Good catch on the failed URLs!
I left other suggestions but am not blocking on them, since this PR is mainly aimed at reactivating the DAG and, as it stands, it achieves that.
if not (maker := attributes.get("creation", {}).get("maker", [])):
    return None

return maker[0].get("summary", {}).get("title", None)
There are a lot of "Unknown maker" values in the creator column (the test data is a good example); I think it would be more accurate to leave them as NULL instead.
- if not (maker := attributes.get("creation", {}).get("maker", [])):
-     return None
- return maker[0].get("summary", {}).get("title", None)
+ if not (maker := attributes.get("creation", {}).get("maker", [])):
+     return None
+ creator = maker[0].get("summary", {}).get("title", None)
+ return creator if creator != "Unknown maker" else None
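To make the effect of the suggestion concrete, here is a self-contained sketch of the same logic wrapped in a standalone function. The function name `get_creator`, the `UNKNOWN_MAKER` constant, and the sample records are made up for illustration; only the `creation` / `maker` / `summary` / `title` lookup path comes from the snippet above.

```python
from typing import Optional

# Placeholder string the source uses when the maker is not known.
UNKNOWN_MAKER = "Unknown maker"


def get_creator(attributes: dict) -> Optional[str]:
    """Return the first maker's title, or None if absent or a placeholder."""
    if not (maker := attributes.get("creation", {}).get("maker", [])):
        return None
    creator = maker[0].get("summary", {}).get("title", None)
    # Treat the placeholder the same as missing data so the column stays NULL.
    return creator if creator != UNKNOWN_MAKER else None


# Hypothetical records shaped like the lookup path above.
named = {"creation": {"maker": [{"summary": {"title": "Example Maker"}}]}}
unknown = {"creation": {"maker": [{"summary": {"title": "Unknown maker"}}]}}
missing = {"creation": {}}

print(get_creator(named))    # Example Maker
print(get_creator(unknown))  # None
print(get_creator(missing))  # None
```

Note that with this change a record with no maker data and a record explicitly labelled "Unknown maker" become indistinguishable downstream.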
This makes sense, because most other creators are also unknown here :)
Hm, I can't decide -- theoretically there could be a difference between "no maker information was provided by the source" and "an authoritative (museum) source confirmed the maker is unknown". But maybe that's not a useful distinction. I would be curious if we do something similar for any of our other sources 🤔
I'll make a separate issue for this, mostly because it looks like we do have "Unknown maker" in production data at the moment.
#4145 created!
I'm going to re-enable the Science Museum DAG and kick off a new DAG run if one doesn't start automatically. I'll also make an issue to re-enable the provider once a full DAG run has succeeded and a data refresh has completed.
Fixes
Fixes #4092 by @AetherUnbound
Description
Updates the ScienceMuseum ingester class to work with the changed API.
Testing Instructions
Tests should pass. Run the Science Museum DAG locally and observe that records are ingested.
I also downloaded the TSV from MinIO and compared it to the last pre-changes production TSV from January to make sure the data looks good.
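One way to make that TSV comparison a little more systematic is to compare how often a given column is empty in each file. The sketch below is only an illustration: the function name `null_rate` is hypothetical, and it assumes the files are headerless, tab-delimited, and use either an empty field or `\N` for missing values, which should be checked against the actual files.

```python
import csv

# Assumption: missing values appear as an empty field or a literal \N.
NULL_MARKERS = {"", "\\N"}


def null_rate(tsv_path, column_index):
    """Return the fraction of rows whose given column is empty or null."""
    total = 0
    nulls = 0
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters in the data from being interpreted.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            total += 1
            value = row[column_index] if column_index < len(row) else ""
            if value in NULL_MARKERS:
                nulls += 1
    return nulls / total if total else 0.0
```

Running it for the same column index on the new and the pre-change file gives two rates to compare; a large jump in either direction is a hint that a field mapping changed.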