Skip to content

Conversation

@mgonnav
Copy link
Contributor

@mgonnav mgonnav commented Oct 23, 2024

Description

Documented estela core tasks and tasks that run on a schedule

Issue

Checklist before requesting a review

  • I have performed a self-review of my code.
  • My code follows the style guidelines of this project.
  • I have made corresponding changes to the documentation.
  • New and existing tests pass locally with my changes.
  • If this change is a core feature, I have added thorough tests.
  • If this change affects or depends on the behavior of other estela repositories, I have created pull requests with the relevant changes in the affected repositories. Please, refer to our official documentation.
  • I understand that my pull request may be closed if it becomes obvious or I did not perform all of the steps above.

@mgonnav mgonnav requested a review from joaquingx October 23, 2024 16:11
@mgonnav mgonnav self-assigned this Oct 23, 2024
Comment on lines +50 to +54
"""
Task to run spider jobs that are in the IN_QUEUE_STATUS.
Selects jobs from the queue and updates their status to WAITING_STATUS.
Initializes job execution using the job manager.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we utilizing this task? It seems like we’re launching jobs immediately when they are created, as seen here: estela-api/api/views/job.py#L136-L146. For cron jobs, we’re using the launch_job task.

If I recall correctly, this task was intended to throttle the number of jobs and prevent overloading the Kubernetes cluster. Could you confirm if this is the case? If so, can you provide more context or explain the reasons behind this task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joaquingx You are right. The task is not actively used since the request to launch jobs is usually sent without an async parameter. However, the logic is present here in case jobs are launched with the async parameter here:

else:
serializer.save(
spider=spider,
status=SpiderJob.IN_QUEUE_STATUS,
data_status=data_status,
data_expiry_days=data_expiry_days,
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add to this documentation that it will be used for asyncs jobs launched please?

Suggested change
"""
Task to run spider jobs that are in the IN_QUEUE_STATUS.
Selects jobs from the queue and updates their status to WAITING_STATUS.
Initializes job execution using the job manager.
"""
"""
Task to run spider jobs that are in the IN_QUEUE_STATUS.
Selects jobs from the queue and updates their status to WAITING_STATUS.
Initializes job execution using the job manager.
Note: This task will be used for async jobs, although job requests are typically sent without the async parameter.
"""

Comment on lines +106 to +115
"""
Task to launch a spider job with the provided data and optional token.
Creates a job using SpiderJobCreateSerializer and passes the job to the job manager.
Args:
sid_ (int): Spider ID.
data_ (dict): Job data to be serialized and saved.
data_expiry_days (int, optional): Number of days before data expiry.
token (str, optional): Authentication token. If not provided, a default token is used.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what scenarios would we want to provide the auth token as an argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joaquingx I'm not sure about it. I don't see anywhere in the code where the code is passed manually to this function. @Adriana618 I remember you created this task, do you remember the purpose for adding the token argument?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants