Streamset filter to use dataframe #44

Draft
wants to merge 3 commits into
base: master
from

Conversation

jleifnf
@jleifnf jleifnf commented Aug 11, 2023

Update the StreamSet's filter to remove multiple for-loops by building a hidden metadata table and using a DataFrame.
Tested on a StreamSet of 6471 streams: filtering with streamset.filter(unit=re.compile(r'FREQ|FLAG'), name=re.compile(r'FQ|FLAG')) down to 294 streams originally took ~50 seconds.

With this code change, the same filter took ~10 seconds (about 5x faster).
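
For readers who want a picture of the approach, here is a minimal sketch of filtering through a metadata table, assuming pandas is available. It is not the code in this PR: the `meta` table, its column names, and the `filter_streams` helper are all hypothetical and only illustrate replacing nested per-stream loops with one boolean mask per criterion.

```python
import re

import pandas as pd

# Hypothetical metadata table with one row per stream.
# Column names and values are illustrative, not taken from the PR.
meta = pd.DataFrame(
    {
        "uuid": ["a1", "b2", "c3", "d4"],
        "name": ["FQ_L1", "VOLT_L1", "FLAG_1", "FQ_L2"],
        "unit": ["FREQ", "VOLTS", "FLAG", "AMPS"],
    }
)


def filter_streams(table, **criteria):
    """Return the rows that satisfy every criterion.

    Each criterion is either a compiled regex (searched within the
    column value) or a plain string (compared for equality).
    """
    mask = pd.Series(True, index=table.index)
    for column, wanted in criteria.items():
        values = table[column].astype(str)
        if isinstance(wanted, re.Pattern):
            # One pass over the column per criterion
            mask &= values.map(lambda v: wanted.search(v) is not None)
        else:
            mask &= values == wanted
    return table[mask]


matches = filter_streams(
    meta,
    unit=re.compile(r"FREQ|FLAG"),
    name=re.compile(r"FQ|FLAG"),
)
print(matches["uuid"].tolist())
```

The surviving rows could then be mapped back to stream objects by `uuid`; how the real StreamSet keeps its table in sync with its streams is outside the scope of this sketch.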

@jleifnf jleifnf requested a review from justinGilmer August 11, 2023 00:36
@jleifnf jleifnf self-assigned this Aug 11, 2023
@jleifnf jleifnf marked this pull request as draft August 23, 2023 16:35
@andrewchambers
Don't know if we want to add dask in this PR.

2 participants