Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: enable ray by default #787

Merged
merged 25 commits into from
Jun 1, 2023
Merged

feat: enable ray by default #787

merged 25 commits into from
Jun 1, 2023

Conversation

xzdandy
Copy link
Collaborator

@xzdandy xzdandy commented May 28, 2023

This pr intends to enable ray by default in EVA.

  • Ray is enabled by default in eva server, test cases, notebooks
  • Remove experimental code directory
  • Introduce a new UDF decorator parralizable and uses it in the optimizer to filter out inexpensive functions.
    • This is not possible now. We need UDF decorator to be associated with the class instead of the setup function, so we can get the UDF properties without initializing the UDF.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@jarulraj jarulraj changed the title Enable ray be default feat: enable ray by default May 29, 2023
@jarulraj jarulraj added High Priority ⚡️ Integrations 🧩 Pull requests that update an integration High Effort 🏋 Difficult solution or problem to solve labels May 29, 2023
@xzdandy
Copy link
Collaborator Author

xzdandy commented May 30, 2023

Currently, the optimizer rule for ray does not work for UDFs in the projection's target list. So only the following two notebooks uses ray. Their results look expected.

The reason is that LogicalGetToSeqScan for ray does not make sense, since function expressions are in the LogicalProject instead of Logical Get. We need to create a LogicalProjectToPhysical for ray.
https://github.com/georgia-tech-db/eva/blob/e26c36dd71242de005e57baa3c4a6c2ab818d2d2/eva/experimental/parallel/optimizer/rules/rules.py#L90

@xzdandy xzdandy marked this pull request as ready for review May 31, 2023 08:42
@xzdandy xzdandy requested review from gaurav274 and jiashenC May 31, 2023 08:42
@xzdandy
Copy link
Collaborator Author

xzdandy commented May 31, 2023

@gaurav274 @jiashenC Any idea, why we have the ray skip tag for chatgpt test cases? Also, this seems to be unit test cases instead of integration test cases.

@gaurav274
Copy link
Member

@gaurav274 @jiashenC Any idea, why we have the ray skip tag for chatgpt test cases? Also, this seems to be unit test cases instead of integration test cases.

It is a unit test because of the missing openai_api_key. Not sure about ray.

@gaurav274
Copy link
Member

@jiashenC testcases are failing because of the coverage issue, right?

@jiashenC
Copy link
Member

jiashenC commented May 31, 2023

@jiashenC testcases are failing because of the coverage issue, right?

Yes, we probably still need to have a separate ray testing without generating coverage reports. Otherwise, it will cause timeout issue because of too many coverage trace to parse which are generated by many ray processes.

@@ -87,45 +86,39 @@ def apply(self, before: LogicalApplyAndMerge, context: OptimizerContext):
yield exchange_plan


class LogicalGetToSeqScan(Rule):
class LogicalProjectToPhysical(Rule):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double check, so we won't get any functional expression in the SeqScan operator anymore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It seems that the current statement to operator generation always creates a project operator.

@jarulraj
Copy link
Member

jarulraj commented Jun 1, 2023

We can get rid of experimental folder?

@xzdandy
Copy link
Collaborator Author

xzdandy commented Jun 1, 2023

@jiashenC testcases are failing because of the coverage issue, right?

Yes, we probably still need to have a separate ray testing without generating coverage reports. Otherwise, it will cause timeout issue because of too many coverage trace to parse which are generated by many ray processes.

It is pending on local also, after all testcases are completed before the coverage report is generated. It seems that there may be some ray related resources are not correctly garbage cleaned.

@xzdandy
Copy link
Collaborator Author

xzdandy commented Jun 1, 2023

We can get rid of experimental folder?

Make sense. We should get rid of it. Will do.

@xzdandy
Copy link
Collaborator Author

xzdandy commented Jun 1, 2023

Update:

  • Rollback the UDF decorator changes for ray. The current UDF decorator design can not support this feature in a proper manner.
  • After removing the experimental directory, the coverage is dropping, given we are collecting coverage with ray disabled. Need to update .coveragerc and add some unit ray testcases even when ray is disabled.

@gaurav274 gaurav274 merged commit 22b5160 into master Jun 1, 2023
@gaurav274 gaurav274 deleted the ray_default branch June 1, 2023 20:05
yulaicui pushed a commit that referenced this pull request Jun 3, 2023
This pr intends to enable ray by default in EVA. 

- [x] Ray is enabled by default in eva server, test cases, notebooks  
- [x] Remove experimental code directory
- [ ] ~~Introduce a new UDF decorator `parralizable` and uses it in the
optimizer to filter out inexpensive functions.~~
- This is not possible now. We need UDF decorator to be associated with
the class instead of the setup function, so we can get the UDF
properties without initializing the UDF.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
High Effort 🏋 Difficult solution or problem to solve High Priority ⚡️ Integrations 🧩 Pull requests that update an integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants