Use alternate method for getting fast subset of rows

## Problem

PR WordPress/openverse-api#474 introduced an approach to creating a pseudo-random subset by ordering the primary query on `identifier`. Although I expected the index on `identifier` to help, the query still takes an incredibly long time to return results.

## Description

We don't really care about true randomness or even an exact number of records selected, so we could potentially use an approach involving `TABLESAMPLE SYSTEM` to get a fast subset: https://stackoverflow.com/a/8675160/3277713. One thing to consider is making sure this stays robust during integration testing and still copies sufficient data in that case. It may be necessary to base the sampling estimate on the table's row count and enforce a bare minimum number of rows.
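As a rough sketch of the "estimate from table count with a bare minimum" idea, the percentage passed to `TABLESAMPLE SYSTEM` could be derived as below. The function names, the `min_rows` default, and the `image` table name are hypothetical and not taken from the existing code; the row estimate is assumed to come from somewhere cheap such as `pg_class.reltuples` rather than a full `COUNT(*)`.

```python
def sample_percentage(estimated_rows: int, target_rows: int, min_rows: int = 1000) -> float:
    """Return a TABLESAMPLE percentage expected to yield roughly target_rows.

    Hypothetical helper: min_rows is an assumed floor so that small tables
    (for example during integration testing) are still copied in full.
    """
    wanted = max(target_rows, min_rows)
    # If the table is smaller than what we want (or the estimate is missing
    # or non-positive), sample everything.
    if estimated_rows <= wanted:
        return 100.0
    return 100.0 * wanted / estimated_rows


def build_sample_query(table: str, percentage: float) -> str:
    """Build the sampling query for a trusted, hard-coded table name.

    SYSTEM sampling picks whole pages, so the number of rows returned is
    approximate, which matches the "no exact count needed" requirement.
    """
    # The table name is interpolated directly: never pass user input here.
    return f"SELECT * FROM {table} TABLESAMPLE SYSTEM ({percentage:.6f})"


if __name__ == "__main__":
    # Example: about 10,000 rows out of an estimated 1,000,000.
    pct = sample_percentage(estimated_rows=1_000_000, target_rows=10_000)
    print(build_sample_query("image", pct))
```

With a small test table the helper falls back to 100%, so integration runs would copy every row instead of risking an empty sample.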

## Alternatives


## Additional context


## Implementation

- [ ] 🙋 I would be interested in implementing this feature.

Use alternate method for getting fast subset of rows #736

AetherUnbound opened on Jan 21, 2022