Skip to content

Conversation

wraymo
Copy link
Member

@wraymo wraymo commented Jan 22, 2024

References

Description

This PR refactors the schedulers by

  • Removing the task consumer thread
  • Refactoring task distribution by using group primitive for compression
  • Removing the compression handler and delegating the work to the scheduler
  • Removing the old search scheduler and executor
  • Modifying associated package scripts
  • Cleaning up the code

Validation performed

  • Validated compression with the hive-24hrs dataset.
  • Validated search with a query "org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties"

@wraymo wraymo marked this pull request as ready for review January 23, 2024 21:04
@wraymo wraymo changed the title Refactor compression scheduler Refactor compression and search scheduler Jan 25, 2024
@wraymo wraymo force-pushed the compression_scheduler branch from c4f9bff to 3bfbe75 Compare January 25, 2024 21:01
@wraymo wraymo requested a review from kirkrodrigues January 25, 2024 23:28
Comment on lines +85 to +87
jobs = fetch_new_jobs(db_cursor)
db_conn.commit()
for job_row in jobs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move db_conn.commit() into fetch_new_jobs (to keep it modular if we ever use the function elsewhere) and then we can simplify the loop to:

Suggested change
jobs = fetch_new_jobs(db_cursor)
db_conn.commit()
for job_row in jobs:
for job_row in fetch_new_jobs(db_conn, db_cursor):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we need to pass db_conn to fetch_new_jobs while other database functions don't require this parameter and all of them commit outside the function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I think the database ops need some refactoring to be less error prone but we can do that in another PR.

kirkrodrigues
kirkrodrigues previously approved these changes Jan 28, 2024
@kirkrodrigues
Copy link
Member

Commit message: Add new compression scheduler implementation for improved simplicity and robustness; Remove old scheduler and consolidate the new schedulers. (#238)?

@wraymo
Copy link
Member Author

wraymo commented Jan 28, 2024

Commit message: Add new compression scheduler implementation for improved simplicity and robustness; Remove old scheduler and consolidate the new schedulers. (#238)?

Sure, but let me resolve conflicts first

@wraymo wraymo merged commit 5be00a2 into y-scope:main Jan 29, 2024
@wraymo wraymo changed the title Refactor compression and search scheduler Add new compression scheduler implementation for improved simplicity and robustness; Remove old scheduler and consolidate the new schedulers Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants