Description
We will need a one-off crawling script which will run against the data that we have stored in the catalog from the provider waltersartmuseum to acquire image height, width, filesize, and filetype information from each of the still-available images. This information will be used to update the records in the catalog, and is necessary for the data normalization steps we wish to perform (see #1545 and #1485).
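As a rough sketch, the per-image record such a script would assemble might look like the following; the class and field names here are illustrative only and are not the catalog's actual column names:

from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageMetadata:
    """Fields to backfill for each Walters image; names are illustrative only."""

    url: str
    width: Optional[int]  # pixels
    height: Optional[int]  # pixels
    filesize: Optional[int]  # bytes
    filetype: Optional[str]  # e.g. "jpeg"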
This should be able to be accomplished using a polite crawler which visits all of the url values we have for Walters data. I don't believe we'll be able to use HEAD requests only, since we will actually need the image data itself to determine height and width:
In [1]: import requests
In [2]: r = requests.head("https://static.thewalters.org/images/PS1_37.1513_Fnt_DD_T11.jpg")
In [4]: r.headers
Out[4]: {'Content-Length': '471138', 'Content-Type': 'image/jpeg', 'Last-Modified': 'Mon, 18 Nov 2013 23:36:26 GMT', 'Accept-Ranges': 'bytes', 'ETag': '"1cc97fffb6e4ce1:0"', 'Server': 'Microsoft-IIS/7.5', 'X-Powered-By': 'ASP.NET', 'Date': 'Wed, 05 Oct 2022 21:26:36 GMT'}
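As a rough sketch of what such a crawler could look like (not a final implementation), the snippet below downloads each image with a GET request, reads the dimensions from the image bytes, and falls back to the downloaded byte count when Content-Length is missing. It assumes the requests and Pillow libraries; the function names, User-Agent string, and one-second delay are placeholders rather than project conventions.

import io
import time

import requests
from PIL import Image

# Hypothetical User-Agent; a real run should identify the project and a contact point.
HEADERS = {"User-Agent": "walters-metadata-backfill (one-off crawl)"}


def fetch_image_metadata(url):
    """Download one image and return its width, height, filesize, and filetype."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()

    # Width and height require the image bytes themselves, hence GET rather than HEAD.
    image = Image.open(io.BytesIO(response.content))
    width, height = image.size

    return {
        "url": url,
        "width": width,
        "height": height,
        # Prefer the server-reported size; fall back to the bytes we actually received.
        "filesize": int(response.headers.get("Content-Length", len(response.content))),
        # Naive parse of e.g. "image/jpeg" -> "jpeg".
        "filetype": response.headers.get("Content-Type", "").partition("/")[2] or None,
    }


def crawl(urls, delay=1.0):
    """Visit each URL politely, pausing between requests and skipping dead links."""
    for url in urls:
        try:
            yield fetch_image_metadata(url)
        except requests.RequestException:
            pass  # image no longer available; leave the record untouched
        time.sleep(delay)

For a one-off script, a full GET per image keeps things simple; if bandwidth became a concern, streaming only enough bytes to read the image header could be explored instead.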
Additional context
The rationale for this decision can be found on our Make WP blog: Next steps for Walters Art Museum data.
Implementation
- 🙋 I would be interested in implementing this feature.