-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a dataclass for indexer worker TaskStatus #4601
Create a dataclass for indexer worker TaskStatus #4601
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @akshay-km, thanks for your help with this. In general, it looks good to me! You have a small failure in CI due to linting; it should be fixed once you run ./ov just lint
and commit the changes.
Also, I'm unsure if @stacimc would want the active
variable to have more direct access from the dataclass. Let's see what she was thinking about it.
task: Any | ||
start_time: Any | ||
model: str | ||
target_index: str | ||
finish_time: Any | ||
progress: Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The types can be more precise here. Below, the times are passed to the _time_fmt
function that accepts integers, and probably progress
is also an integer; @stacimc can confirm.
task: Any | |
start_time: Any | |
model: str | |
target_index: str | |
finish_time: Any | |
progress: Any | |
task: Any | |
start_time: int | |
model: str | |
target_index: str | |
finish_time: int | |
progress: Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @krysal , finish_time
and progress
has the same type: <class 'multiprocessing.sharedctypes.Synchronized'> .
But the value of progress
in the api response is 100.0
. So should it be set as a float
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float
would be appropriate for the timestamps (and you can change the type of the timestamp
parameter in _time_fmt
to match, actually!)
progress
and finish_time
are shared variables using multiprocessing.Value
so I believe the correct type would be Synchronized[float]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI @stacimc When using Synchronized[float]
, I am getting a TypeError: type 'Synchronized' is not subscriptable
because Synchronized
is not a type that can be parameterized in the typing module
. So I have used Synchronized
as type of progress
and finish_time
. Also used float
as type for timestamps including for the parameter in _time_fmt_
function.
Have made a new commit with these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting! It looks like Synchronized
is in the typeshed package/not available at runtime. I think the fix should be to add from __future__ import annotations
to the top of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Learned something new :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More accurate:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akshay-km I noted and applied the @stacimc's suggestion so this looks good to me! Thanks for you help.
Edir: also great job including complete testing instructions. It's appreciated :)
Thanks @krysal |
Based on the contributor urgency of this PR, the following reviewers are being gently reminded to review this PR: @stacimc Excluding weekend1 days, this PR was ready for review 10 day(s) ago. PRs labelled with contributor urgency are expected to be reviewed within 3 weekday(s)2. @akshay-km, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic, thanks @akshay-km!
Happy to help :) Thanks for the feedback. |
Fixes
Fixes #4534 by @stacimc
Description
This PR adds a dataclass with type hints for each field for index worker task status. The task status was stored as dictionary before.
Testing Instructions
just build && just up
Use elasticvue to Delete All Documents from one of your existing indices. Eg:
audio-init-filtered
. The index should now have 0 documents.Now run just
catalog/shell
, and you should be able to curl the indexer worker.Replace the target_index name if you don't have audio-init-filtered
Note the start_id/end_id we're using, to make it reindex only 100 records
curl -X POST -d '{"model_name": "audio", "table_name": "audio", "start_id": 0, "end_id": 100, "target_index": "audio-init-filtered"}' -H "Content-Type: application/json" 'http://catalog_indexer_worker:8003/task'
You should get back a response like:
Now curl the status_check endpoint that was just returned:
curl 'http://catalog_indexer_worker:8003/task/acfb9919708c4ef2a505fe5f8c6a10a8
And you should see something like:
Which is similar to the response returned before adding dataclass for the task status.
Checklist
Update index.md
).main
) or a parent feature branch../ov just catalog/generate-docs
for catalogPRs) or the media properties generator (
./ov just catalog/generate-docs media-props
for the catalog or
./ov just api/generate-docs
for the API) where applicable.Developer Certificate of Origin
Developer Certificate of Origin