Convert search endpoints to asynchronous

## Problem


Since we have converted the application to ASGI, we can now benefit from the async Elasticsearch client. Requests to Elasticsearch are some of the longest-running blocking operations in our application. By using the async client, we can avoid blocking while we wait for queries to come back from Elasticsearch.

## Description


The primary thing to convert in the search route is the Elasticsearch client usage. [The asynchronous client swaps the underlying "node" (the request engine the ES Client uses) with aiohttp](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/config.html#_node_implementations). We can't swap out every usage of the Elasticsearch client for the async client at once, however, so we'll need to maintain both the synchronous and asynchronous clients for a period of time while we switch over from sync to async in all our usage of the client.

To do this, update the `_elasticsearch_connect` function to return both a synchronous and asynchronous client. Rename the existing synchronous `ES` client to `SYNC_ES` and add a new `ASYNC_ES` assigned to the asynchronous client. Update all usages of `settings.ES` to `settings.SYNC_ES`.
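A minimal sketch of that change, to make the shape concrete. The two small classes are stand-ins for `elasticsearch.Elasticsearch` and `elasticsearch.AsyncElasticsearch` so the snippet runs without the package or a cluster. The `es_url` parameter and the localhost URL are assumptions for illustration; the real `_elasticsearch_connect` builds its connection arguments from our own settings.

```python
from dataclasses import dataclass


# Stand-ins for `elasticsearch.Elasticsearch` and
# `elasticsearch.AsyncElasticsearch`. The real clients also accept `hosts`.
@dataclass
class Elasticsearch:
    hosts: str


@dataclass
class AsyncElasticsearch:
    hosts: str


def _elasticsearch_connect(es_url: str):
    """Return a (sync, async) pair of clients built from the same arguments."""
    client_kwargs = {"hosts": es_url}
    return Elasticsearch(**client_kwargs), AsyncElasticsearch(**client_kwargs)


# Hypothetical URL; the settings module would pass its configured value.
SYNC_ES, ASYNC_ES = _elasticsearch_connect("http://localhost:9200")
```

Building both clients from one set of keyword arguments keeps their configuration from drifting apart while both exist.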

Now for the complex part of this issue: the Elasticsearch DSL library, which we use for our normal search routine, [does not yet support the async Elasticsearch client](https://github.com/elastic/elasticsearch-dsl-py/issues/1355). We can work around this, however, by not using `Search::execute` and changing the `get_es_response` function to use `ASYNC_ES.search` directly instead.

https://github.com/WordPress/openverse/blob/7bb42984ea69c8ac8ce3f77d49a3fabc03c3068a/api/api/controllers/elasticsearch/helpers.py#L50

Something like this might work:

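```py
import pprint

from django.conf import settings
from elasticsearch import BadRequestError, NotFoundError
from elasticsearch_dsl import Search

# `log` and `log_timing_info` are assumed to be defined elsewhere in this module.


@log_timing_info
async def get_es_response(s: Search, *args, **kwargs):
    if settings.VERBOSE_ES_RESPONSE:
        # pformat returns a string; pprint.pprint prints and returns None.
        log.info(pprint.pformat(s.to_dict()))

    if not hasattr(s, "_response"):
        try:
            # Mirrors `Search.execute`, but awaits the async client directly.
            raw_response = await settings.ASYNC_ES.search(
                index=s._index,
                body=s.to_dict(),
                **s._params,  # `s`, not `self`: this is a plain function
            )

            s._response = s._response_class(s, raw_response.body)

            if settings.VERBOSE_ES_RESPONSE:
                log.info(pprint.pformat(s._response.to_dict()))
        except (BadRequestError, NotFoundError) as e:
            raise ValueError(e) from e

    return s._response
```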

That's basically an adaptation of the Elasticsearch DSL's `Search::execute` method into our `get_es_response` function.
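One detail worth checking is the `@log_timing_info` decorator on `get_es_response`. If it is a plain synchronous wrapper (an assumption, since its implementation is not shown here), it would only time the creation of the coroutine, not the awaited request. The sketch below shows one way an async-aware timing decorator could look. The body of `log_timing_info` and the `fake_query` function are hypothetical, not our actual code.

```python
import asyncio
import functools
import logging
import time

log = logging.getLogger(__name__)


def log_timing_info(func):
    """Hypothetical async-aware variant: times the awaited call itself."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = await func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        log.info("%s took %.1f ms", func.__name__, elapsed_ms)
        return result

    return wrapper


@log_timing_info
async def fake_query():
    await asyncio.sleep(0.01)  # simulated Elasticsearch round trip
    return {"hits": []}


result = asyncio.run(fake_query())
```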

All functions that call `get_es_response` will need to `await` it, which means they must become `async def` themselves.

After updating `get_es_response`, we'll also want to make `check_dead_links` an asynchronous function rather than having it call an `async_to_sync`-wrapped function. We'll need to follow the chain of callers all the way up until the route endpoints, and every function they use that interacts with Elasticsearch, are `async def`.
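To illustrate what "following the chain" means, here is a rough sketch in which every layer between the route and Elasticsearch is `async def` and awaits the layer below it. All function bodies are stand-ins. The `query_media` and `search_endpoint` names, their signatures, and the sample data are hypothetical and only show the shape of the call chain.

```python
import asyncio


async def get_es_response(query: dict) -> list[dict]:
    """Stand-in for the real helper."""
    await asyncio.sleep(0)  # where `await settings.ASYNC_ES.search(...)` would go
    return [
        {"id": 1, "url": "https://example.com/a"},
        {"id": 2, "url": "https://example.com/b"},
    ]


async def check_dead_links(results: list[dict]) -> list[dict]:
    """Stand-in: awaited directly instead of wrapping with `async_to_sync`."""
    await asyncio.sleep(0)  # where the link checks would be awaited
    return [r for r in results if r["id"] != 2]  # pretend id 2 is a dead link


async def query_media(query: dict) -> list[dict]:
    """Hypothetical intermediate layer: awaits both helpers in turn."""
    results = await get_es_response(query)
    return await check_dead_links(results)


async def search_endpoint(params: dict) -> dict:
    """Hypothetical route handler at the top of the chain."""
    results = await query_media({"q": params["q"]})
    return {"result_count": len(results), "results": results}


response = asyncio.run(search_endpoint({"q": "cat"}))
```

If any function in the chain stays synchronous, it has to go back through a sync/async bridge such as `async_to_sync`, which is what this issue aims to remove.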

## Additional context

Marked staff only due to complexity.

Convert search endpoints to asynchronous #3449

sarayourfriend opened on Dec 4, 2023
