One-off backfill script for Walters Museum data

## Description

We need a one-off crawling script that runs against the data stored in the catalog for the provider `waltersartmuseum` and acquires the `height`, `width`, `filesize`, and `filetype` of each still-available image. This information will be used to update the records in the catalog, and is necessary for the data normalization steps we wish to perform (see WordPress/openverse#1545 and WordPress/openverse#1485).

This should be achievable with a polite crawler that visits every `url` value we have for Walters data. I don't believe `HEAD` requests alone will suffice: they return `Content-Length` and `Content-Type`, but we need the image data itself to determine `height` and `width`:

```python
[ins] In [1]: import requests

[ins] In [2]: r = requests.head("https://static.thewalters.org/images/PS1_37.1513_Fnt_DD_T11.jpg")

[ins] In [4]: r.headers
Out[4]: {'Content-Length': '471138', 'Content-Type': 'image/jpeg', 'Last-Modified': 'Mon, 18 Nov 2013 23:36:26 GMT', 'Accept-Ranges': 'bytes', 'ETag': '"1cc97fffb6e4ce1:0"', 'Server': 'Microsoft-IIS/7.5', 'X-Powered-By': 'ASP.NET', 'Date': 'Wed, 05 Oct 2022 21:26:36 GMT'}
```
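As a rough illustration of the approach described above, here is a minimal, standard-library-only sketch of a polite crawler that issues a full `GET`, reads the dimensions from the JPEG header, and skips images that are no longer available. Everything named here (`jpeg_dimensions`, `fetch_image`, `crawl`, the `USER_AGENT` string, the one-second delay, and taking `filetype` from the `Content-Type` subtype) is an assumption made for the example, not existing Openverse code, and it only handles JPEG files. A real script would more likely lean on an imaging library and the catalog's own request helpers.

```python
from __future__ import annotations

import struct
import time
from typing import Callable, Iterable, Iterator
from urllib.request import Request, urlopen

# Hypothetical identifier; a real crawler should send a descriptive one.
USER_AGENT = "walters-backfill-sketch/0.1"

# SOF markers hold the frame size. 0xC4, 0xC8 and 0xCC fall in the same
# numeric range but are not SOF markers.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the first SOF segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed JPEG segment")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in SOF_MARKERS:
            if len(data) < pos + 9:
                raise ValueError("truncated SOF segment")
            # Layout after the marker: length (2), precision (1),
            # height (2), width (2), all big-endian.
            height, width = struct.unpack(">HH", data[pos + 5 : pos + 9])
            return width, height
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        pos += 2 + length  # skip the marker and its payload
    raise ValueError("no SOF segment found")


def fetch_image(url: str) -> tuple[bytes, str]:
    """GET the whole image; a HEAD response carries no dimensions."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        return response.read(), response.headers.get("Content-Type", "")


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], tuple[bytes, str]] = fetch_image,
    delay: float = 1.0,
) -> Iterator[dict]:
    """Yield one metadata record per image that could be fetched and parsed."""
    for url in urls:
        try:
            data, content_type = fetch(url)
            width, height = jpeg_dimensions(data)
        except (OSError, ValueError):
            # Network/HTTP failure or unparseable file: skip this record.
            continue
        finally:
            time.sleep(delay)  # stay polite: pause after every request
        yield {
            "url": url,
            "width": width,
            "height": height,
            "filesize": len(data),
            "filetype": content_type.split("/")[-1],  # assumed mapping
        }
```

Passing `fetch` as a parameter keeps the loop testable without network access. The `delay` value is a placeholder and would need to be tuned to whatever request rate is acceptable for the Walters servers.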

## Additional context

The rationale for this decision can be found on our Make WP blog: [Next steps for Walters Art Museum data](https://make.wordpress.org/openverse/2022/09/30/next-steps-for-walters-art-museum-data/).

## Implementation

- [ ] 🙋 I would be interested in implementing this feature.

One-off backfill script for Walters Museum data #1416

AetherUnbound opened on Oct 5, 2022
