Polars is not updated #186

Closed
alippai opened this issue Feb 4, 2021 · 10 comments

@alippai

alippai commented Feb 4, 2021

It's stuck on version 0.4.5. I looked into the code but don't see any version constraints, so a fresh pip install should pick up the new version.
The report claims:

Benchmark run took around 148.3 hours.
Report was generated on: 2021-02-02 01:18:00 PST.

so this might indicate some cache or deploy issues.
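
For anyone wanting to rule out a stale environment, a minimal sketch of checking which version is actually installed is shown here. It assumes Python 3.8+ (for `importlib.metadata`) and that the package is distributed under the name `polars`; the helper names `installed_version` and `parse_version` are hypothetical, made up for this illustration, and are not part of the benchmark's code.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def parse_version(text):
    """Turn a plain dotted version such as '0.4.5' into a comparable tuple.

    Assumption: purely numeric components; non-numeric parts are skipped.
    """
    return tuple(int(part) for part in text.split(".") if part.isdigit())


if __name__ == "__main__":
    reported = "0.4.5"  # the version shown in the report, per this issue
    found = installed_version("polars")  # assumed distribution name
    if found is None:
        print("polars is not installed in this environment")
    elif parse_version(found) <= parse_version(reported):
        print(f"environment still resolves polars {found}")
    else:
        print(f"environment has polars {found}, newer than the report")
```

If the environment already resolves a newer version than the report shows, the report itself is probably just out of date.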

@alippai
Author

alippai commented Feb 4, 2021

cc @ritchie46

@jangorecki
Contributor

Recently I was running the groupby2014 branch, which runs only 3 solutions as of now. I will schedule a run of all solutions today. Thanks for the prompt.

@alippai
Author

alippai commented Feb 4, 2021

That explains it. Thanks for this useful benchmark, it's a really great ecosystem overview.

@jangorecki
Contributor

I just started the run. Multiple solutions got new releases recently, so it will take a couple of days to finish. I will close this issue when it's done.

...
Benchmark solutions to run: data.table, pydatatable, dplyr, dask, juliadf, polars
...

@jangorecki
Contributor

Resolved, report updated.

@impredicative

impredicative commented Apr 10, 2021

Polars is a fairly low quality package that seems more of a perpetual alpha release. I say this because it has numerous serious bugs and multiple segfaults. It even corrupts data. IMO it should be excluded from the benchmark altogether.

@alippai
Author

alippai commented Apr 10, 2021

@impredicative it would be a funny list if we dropped packages because they have serious bugs. Dask, pandas, cudf, and DataFrames.jl have all already burned me in production. If you want to help the open source community, create a project similar to db-benchmark for testing conformance to your expected results using your data and queries. It would undoubtedly be useful. @jangorecki is already doing epic work maintaining this repo, and tracking quality as well is definitely out of scope for it (other OLAP products, like TiDB, Vertica, and MemSQL, are not even added yet). Spamming the issue queues of related projects is unwanted and doesn't create value. Please stop.

@jangorecki
Contributor

jangorecki commented Apr 11, 2021

@impredicative it would make more sense if you linked those bug reports (in a new dedicated issue).
If a project is not maintained and its bugs go unresolved for an extended period of time, then we could eventually think about dropping that solution from the benchmark. In my experience with the solutions in the benchmark, multiple other solutions would be better candidates for dropping.

@ritchie46
Contributor

For context, @impredicative was blocked on the Polars repo for being really rude and complaining that I wouldn't implement a feature, or at least not the way he would like to see it.

Constructive feedback is of course more than welcome, and if there are any issues/bugs that need to be resolved, I will happily do so in discussion with the users. I am afraid that the request above is due more to his relationship with me than to real bugs/segfaults. But if there are any, please let me know. :)

@impredicative
This comment has been minimized.