Use alternate method for getting fast subset of rows #736

Closed as not planned

Description

Problem

PR WordPress/openverse-api#474 introduced an approach for creating a pseudo-random subset by ordering the primary query on identifier. I expected the index on identifier to make this fast, but the query still takes an extremely long time to return results.

Description

We don't need true randomness or an exact number of selected records, so we could use an approach based on PostgreSQL's TABLESAMPLE SYSTEM to get a fast subset: https://stackoverflow.com/a/8675160/3277713. One thing to consider is that this must stay robust during integration testing and still copy sufficient data in that case. It may be necessary to base the sampling percentage on an estimate of the table's row count and to enforce a bare minimum number of rows.
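As a rough illustration of the idea above, here is a minimal sketch of how the sampling percentage could be derived from a row-count estimate with a minimum floor. The function names, the `min_rows` default, and the table name are hypothetical and not taken from the Openverse codebase. The row estimate is assumed to come from something like `pg_class.reltuples`, which can be zero or negative for a table that has never been analyzed, so that case falls back to copying the whole table.

```python
def sample_percentage(estimated_rows: int, target_rows: int, min_rows: int = 1000) -> float:
    """Return the percentage to pass to TABLESAMPLE SYSTEM.

    estimated_rows: approximate table size (assumed to come from a cheap
        estimate such as pg_class.reltuples, not an exact COUNT(*)).
    target_rows: roughly how many rows we would like in the subset.
    min_rows: hypothetical floor so small test tables still yield data.
    """
    wanted = max(target_rows, min_rows)
    # Unknown or tiny tables: sample everything rather than risk an empty subset.
    if estimated_rows <= 0 or wanted >= estimated_rows:
        return 100.0
    return 100.0 * wanted / estimated_rows


def build_sample_query(table: str, percentage: float) -> str:
    """Build the sampling query.

    The table name is interpolated directly, so it is assumed to be a
    trusted, hard-coded identifier and never user input.
    """
    return f"SELECT * FROM {table} TABLESAMPLE SYSTEM ({percentage:.6f})"


if __name__ == "__main__":
    pct = sample_percentage(estimated_rows=1_000_000, target_rows=10_000)
    print(build_sample_query("image", pct))
```

Because SYSTEM sampling picks whole pages rather than individual rows, the number of rows returned is only approximately the requested amount, which matches the stated tolerance for an inexact count.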

Alternatives

Additional context

Implementation

  • 🙋 I would be interested in implementing this feature.

Metadata

Assignees

No one assigned

    Labels

    • ✨ goal: improvement (Improvement to an existing user-facing feature)
    • 💻 aspect: code (Concerns the software code in the repository)
    • 🟩 priority: low (Low priority and doesn't need to be rushed)
    • 🧱 stack: api (Related to the Django API)

    Type

    No type

    Projects

    • Status

      🗑 Discarded

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests