-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add catalog indexer worker #4330
Conversation
@@ -0,0 +1,71 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from ingestion_server
without changes to logic.
@@ -0,0 +1,58 @@ | |||
import logging as log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from ingestion_server
without changes to logic.
@@ -0,0 +1,335 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from ingestion_server
without changes to logic. This provides the mappings from db record to ES document.
@@ -0,0 +1,51 @@ | |||
import logging as log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from ingestion_server
without changes to logic.
12a58df
to
a331cb3
Compare
@stacimc I've opened a PR targeting this branch that switches Pipenv out for PDM, add a I didn't see a separate issue for implanting CI for the new project, so I wasn't sure where they would go, if not in this PR; but if you want to keep it as a fast-follow, no worries, it doesn't need to be merged into this PR, can be on its own if you'd prefer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM! On #4497 I can run just indexer_worker/test
and all the tests pass. On this branch, just indexer_worker/test-local
also passes after running through the local venv installation.
I have not spent much time reviewing the code line by line given how much of it is copy/paste. That's fine by me, as well as the decision to push tests off into integration style tests in later issues 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I was able to test this locally and it worked as expected. A couple of comments here but nothing blocking, if anything we can make issues going forward for addressing them as non-urgent follow ups (and possibly good first issues!).
return None | ||
return str(datetime.datetime.utcfromtimestamp(timestamp)) | ||
|
||
task_info = self.tasks[task_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is something we can make an issue for, particularly given this might be longer-term code now: it would be great if this could be a dataclass or namedtuple, rather than a dictionary! And then we could have some typing around the self.tasks
dictionary as well, which would be neat 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course! Sounds like a good first issue as well. Issue: #4534 👍
Fixes
Fixes #4147 by @stacimc
Description
This PR adds the catalog indexer worker. It does not actually connect to it in the data refresh yet.
Some unit tests are added, but because of the nature of the worker the bulk of the tests will be integration tests modeled after the existing ingestion server integration tests. I was able to get these mostly working locally but pulled them out of this PR as I think it will make more sense to add the integration tests once the data refresh is fully implemented, possibly reusing issue #4340.
Almost all of this code is just a refactor of existing code. Where possible I've noted when files are almost entirely copied over from the ingestion server.
Note that the service is called
catalog_indexer_worker
to distinguish it from the existingindexer_worker
that belongs to the ingestion server, which we must keep around for now.Testing Instructions
just build && just up
Use elasticvue to
Delete All Documents
from one of your existing indices. I usedaudio-init-filtered
. The index should now have 0 documents.Now run
just catalog/shell
, and you should be able to curl the indexer worker.You should get back a response like:
Now curl the status_check endpoint that was just returned:
And you should see something like:
In elasticvue, you should now see 100 docs in your target index.
If you used a filtered index, you can now query the API (http://localhost:50281/v1/audio) and see that the results are returned as expected.⚠️ Important: In Elasticvue you must first refresh the index to make the documents available. Then you can inspect them in Elasticvue, or check the API. This will be handled by the DAG later.
You can also try running the
just
commands for the indexer worker.Checklist
Update index.md
).main
) or a parent feature branch.just catalog/generate-docs
for catalogPRs) or the media properties generator (
just catalog/generate-docs media-props
for the catalog or
just api/generate-docs
for the API) where applicable.Developer Certificate of Origin
Developer Certificate of Origin