Add support for asynchronous embeddings export by ntamas92 · Pull Request #394 · scaleapi/nucleus-python-client

ntamas92 · 2023-09-18T17:30:56Z

No description provided.

jean-lucas · 2023-09-18T17:42:39Z

nucleus/async_job.py

+        if status["status"] != "Completed":
+            raise JobError(status, self)
+


Why raise a JobError if the job is not completed? Perhaps it's still running?

My thought process was that the usage pattern would be the following:

export_job = dataset.export_embeddings()
export_job.sleep_until_complete(False)
result = export_job.result_urls()

We could just wait for the result urls inside result_urls() also, but then I'd highlight it somehow that obtaining the results could run for a long time.

Alright, that makes sense, I didn't notice the AsyncJob inheritance.
This is a neat idea, to have customized job result classes.

Maybe we should add a wait_for_completion parameter. It might even be the default to wait for the job to complete.

Good idea, let's do that
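The wait_for_completion idea agreed on in this thread can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the AsyncJob and JobError classes here are simplified stand-ins, statuses are replayed from a list instead of being fetched from the API, and the `verbose_std_out` parameter name and the "Running" status string are assumptions.

```python
from typing import Any, Dict, List


class JobError(Exception):
    """Stand-in for the client's JobError, constructed from (status, job)."""

    def __init__(self, status: Dict[str, Any], job: "AsyncJob") -> None:
        super().__init__(f"Job {job.job_id} is in state {status['status']!r}")
        self.status = status


class AsyncJob:
    """Simplified stand-in: statuses come from a list, not from the server."""

    def __init__(self, job_id: str, statuses: List[str]) -> None:
        self.job_id = job_id
        self._statuses = list(statuses)

    def status(self) -> Dict[str, Any]:
        # Return the next simulated status; the last one repeats forever.
        if len(self._statuses) > 1:
            current = self._statuses.pop(0)
        else:
            current = self._statuses[0]
        return {"job_id": self.job_id, "status": current}

    def sleep_until_complete(self, verbose_std_out: bool = True) -> None:
        # A real client would sleep between polls; this stand-in only polls.
        while self.status()["status"] == "Running":
            pass


class EmbeddingsExportJob(AsyncJob):
    def __init__(self, job_id: str, statuses: List[str], urls: List[str]) -> None:
        super().__init__(job_id, statuses)
        self._urls = urls  # hypothetical: real URLs would come from the API

    def result_urls(self, wait_for_completion: bool = True) -> List[str]:
        # Block until the job finishes unless the caller opts out.
        if wait_for_completion:
            self.sleep_until_complete(verbose_std_out=False)
        status = self.status()
        # Anything other than "Completed" (still running, errored) is an error.
        if status["status"] != "Completed":
            raise JobError(status, self)
        return list(self._urls)
```

With waiting as the default, a caller gets the URLs in a single call, while `wait_for_completion=False` keeps the original fail-fast behavior for callers that poll on their own.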

jean-lucas

LGTM 👍

gatli

Looks good to me! Let's address the instantiation of the EmbeddingsExportJob (and for that matter any AsyncJob) so that people can trigger a job in one process and poll it in another.

gatli · 2023-09-19T08:25:49Z

.circleci/config.yml

            poetry run black --check .
      - run:
-          name: Ruff Lint Check # See pyproject.tooml [tool.ruff]
+          name: Ruff Lint Check # See pyproject.toml [tool.ruff]


gatli · 2023-09-19T08:32:07Z

nucleus/async_job.py

+        if status["status"] != "Completed":
+            raise JobError(status, self)
+


Maybe we should add a wait_for_completion parameter. It might even be the default to wait for the job to complete.

gatli · 2023-09-19T08:33:34Z

nucleus/async_job.py

+class EmbeddingsExportJob(AsyncJob):
+    def result_urls(self) -> List[str]:


I'm wondering how you would instantiate this in another process. I think we need a classmethod from_id that would allow you to spin this up in one environment and then poll in another just from the job_id

Good point. That would be used through the NucleusClient.list_jobs method though, right? So something like this:

jobs = NucleusClient.list_jobs()
export_job = EmbeddingsExportJob.from_job_id(jobs[0].job_id)

Added a from_id to the AsyncJob, but I couldn't make it more typesafe (e.g. the client argument is still inferred as Any). Do you have any ideas on how to improve?
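One possible way to tighten the typing of from_id is a bound TypeVar on `cls`, so that calling it on a subclass returns that subclass. The sketch below is only an illustration under assumptions: FakeClient and its get_job method are hypothetical stand-ins for the real client, and the AsyncJob fields are reduced to the minimum. In the real package, the client class could be imported under `typing.TYPE_CHECKING` and annotated as a string to avoid a circular import.

```python
from typing import Any, Dict, List, Type, TypeVar

# Bound TypeVar: from_id called on a subclass returns that subclass type.
T = TypeVar("T", bound="AsyncJob")


class FakeClient:
    """Hypothetical stand-in for the client; get_job is an assumed lookup."""

    def __init__(self, jobs: Dict[str, Dict[str, Any]]) -> None:
        self._jobs = jobs

    def get_job(self, job_id: str) -> Dict[str, Any]:
        return self._jobs[job_id]


class AsyncJob:
    def __init__(self, job_id: str, job_type: str, client: FakeClient) -> None:
        self.job_id = job_id
        self.job_type = job_type
        self.client = client

    @classmethod
    def from_id(cls: Type[T], job_id: str, client: FakeClient) -> T:
        # Look the job up by id, then rebuild an instance of the calling class.
        info = client.get_job(job_id)
        return cls(job_id=job_id, job_type=info["job_type"], client=client)


class EmbeddingsExportJob(AsyncJob):
    def result_urls(self) -> List[str]:
        # Placeholder: the real method would fetch the URLs from the API.
        return []


# Usage: recreate the job in another process from only its id.
client = FakeClient({"job_42": {"job_type": "exportEmbeddings"}})
export_job = EmbeddingsExportJob.from_id("job_42", client)
```

Because `cls` is typed as `Type[T]`, a type checker infers `export_job` as EmbeddingsExportJob rather than AsyncJob, and annotating `client` with a concrete class keeps it from being inferred as Any.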

gatli · 2023-09-19T08:36:39Z

We really need to address this flakiness 😓

ntamas92 · 2023-09-19T15:22:38Z

We really need to address this flakiness 😓

Even more so now that it hides tests that are actually failing. Tests were failing because of my changes and I didn't realize it; I just assumed they were flaky 🤦

ntamas92 added 2 commits September 18, 2023 19:11

Add support for asynchronous embeddings export

95ab42a

Add changelog

adbdb57

ntamas92 requested review from gatli and jean-lucas September 18, 2023 17:30

Add changed section to changelog

4374f9b

jean-lucas reviewed Sep 18, 2023

View reviewed changes

jean-lucas approved these changes Sep 19, 2023

View reviewed changes

gatli reviewed Sep 19, 2023

View reviewed changes

ntamas92 added 3 commits September 19, 2023 12:10

Allow waiting for completion in result_urls

47af8e5

Add from_id to AsyncJob

4bd86a1

Add documentation for from_id

76c00a8

Adapt tests

3bdcd37

ntamas92 merged commit cb10ec6 into master Sep 21, 2023

ntamas92 deleted the tamasn/async-embeddings-export branch September 21, 2023 08:05

