Polars remove notebook #173

Closed
wants to merge 498 commits into from

armgilles (Owner) commented Nov 7, 2024

Closed #172

This removes every reference to LFS (used for the large notebooks a long time ago) and reworks the git history (these old mistakes are a real pain to deal with).

Signed-off-by: Gillesa <arm.gilles@gmail.com>
Signed-off-by: Gillesa <arm.gilles@gmail.com>
Signed-off-by: Gillesa <arm.gilles@gmail.com>
Signed-off-by: Gillesa <arm.gilles@gmail.com>
)

Signed-off-by: Gillesa <arm.gilles@gmail.com>
)

Signed-off-by: Gillesa <arm.gilles@gmail.com>
Test transformation json to DataFrame
Test transformation json to DataFrame
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
…ect #85

Signed-off-by: Armand <arm.gilles@gmail.com>
…ect #85

Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
* Bench pipeline transf (#155)

* Update for front (#111)

* Fix install in front and got to be in PROD

Signed-off-by: Armand <arm.gilles@gmail.com>

* Just to check in site-package on dir below ROOT_DIR

Signed-off-by: Armand <arm.gilles@gmail.com>

* Update Ruff version in pre commit

Signed-off-by: Armand <arm.gilles@gmail.com>

* Change check for prod with new structure and Front

Signed-off-by: Armand <arm.gilles@gmail.com>

* Remove unneeded reset_index in oslandia API data crunch

Signed-off-by: Armand <arm.gilles@gmail.com>

* Format

Signed-off-by: Armand <arm.gilles@gmail.com>

* Add matplotlib as dep (visualisation.py)

Signed-off-by: Armand <arm.gilles@gmail.com>

* Add seaborn as dep for deploy

Signed-off-by: Armand <arm.gilles@gmail.com>

* Try to fix error install ci watcher 'error: Your local changes to the following files would be overwritten by checkout'

Signed-off-by: Armand <arm.gilles@gmail.com>

* minor update on notebook

Signed-off-by: Armand <arm.gilles@gmail.com>

---------

Signed-off-by: Armand <arm.gilles@gmail.com>

* Bump to version 1.2.2a (#112)

Signed-off-by: Armand <arm.gilles@gmail.com>

* Add benchmark for pipeline transf from json API data #154

Signed-off-by: Armand <arm.gilles@gmail.com>

* Improve speed of creation of simulated data & fix big data creation setting #154

Signed-off-by: Armand <arm.gilles@gmail.com>

* Optimise creation of simulated data and reduce volume test #154

Signed-off-by: Armand <arm.gilles@gmail.com>

* CodSpeed runs forever; run only on small tests #154

Signed-off-by: Armand <arm.gilles@gmail.com>

* CodSpeed runs forever; run only on small tests #154

Signed-off-by: Armand <arm.gilles@gmail.com>

* Codspeed previous commit ok with no new test, check with one

Signed-off-by: Armand <arm.gilles@gmail.com>

* Reduce volume for #154 to run on codspeed

Signed-off-by: Armand <arm.gilles@gmail.com>

* Reduce volume for #154 to run on codspeed

Signed-off-by: Armand <arm.gilles@gmail.com>

* Reduce again volume for codspeed, reduce CI time

Signed-off-by: Armand <arm.gilles@gmail.com>

---------

Signed-off-by: Armand <arm.gilles@gmail.com>

* Specify timezone to match the real data #154

Signed-off-by: Armand <arm.gilles@gmail.com>

---------

Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
* Try lazy + expression function to check perf bench #146

Signed-off-by: Armand <arm.gilles@gmail.com>

* Update code with get_transactions_out expr #146

Signed-off-by: Armand <arm.gilles@gmail.com>

* have to collect with lazy #146

Signed-off-by: Armand <arm.gilles@gmail.com>

* Lazy read_activity_vcub #148 and update notebook transactions_out

Signed-off-by: Armand <arm.gilles@gmail.com>

* Update bench test with lazy vcub_keeper_py312 #148

Signed-off-by: Armand <arm.gilles@gmail.com>

* update docstring #148

Signed-off-by: Armand <arm.gilles@gmail.com>

* collect lazy df to be a fair bench #146

Signed-off-by: Armand <arm.gilles@gmail.com>

* collect lazy df to be a fair bench #146

Signed-off-by: Armand <arm.gilles@gmail.com>

* collect lazy df to be a fair bench #146

Signed-off-by: Armand <arm.gilles@gmail.com>

* Update transactions_in to be lazy and an Expr function #149

Signed-off-by: Armand <arm.gilles@gmail.com>

* Try to improve bench test on big result

Signed-off-by: Armand <arm.gilles@gmail.com>

* lazy and Expr function for transactions_all function #150

Signed-off-by: Armand <arm.gilles@gmail.com>

* try to fix bad perf on big dataset lazy

Signed-off-by: Armand <arm.gilles@gmail.com>

* try to fix bad perf on big dataset lazy

Signed-off-by: Armand <arm.gilles@gmail.com>

* try to fix bad perf on big dataset lazy

Signed-off-by: Armand <arm.gilles@gmail.com>

* Lazy expr for get_consecutive_no_transactions_out #151

Signed-off-by: Armand <arm.gilles@gmail.com>

* Lazy transform_json_api_bdx_station_data_to_df function #152

Signed-off-by: Armand <arm.gilles@gmail.com>

* Encoding time in Expr function and process_data_cluster in lazy mode #153

Signed-off-by: Armand <arm.gilles@gmail.com>

* add todo for ML with pandas

Signed-off-by: Armand <arm.gilles@gmail.com>

* Adapt code for pipeline bench lazy

Signed-off-by: Armand <arm.gilles@gmail.com>

* Try new lazy for pipeline bench

Signed-off-by: Armand <arm.gilles@gmail.com>

* Try new lazy for pipeline bench

Signed-off-by: Armand <arm.gilles@gmail.com>

* process data with with_columns style & lazy #161

Signed-off-by: Armand <arm.gilles@gmail.com>

* forget previous commit

Signed-off-by: Armand <arm.gilles@gmail.com>

* have to collect these tests

Signed-off-by: Armand <arm.gilles@gmail.com>

* Small bench tests run in eager mode, big ones in lazy mode, for comparison

Signed-off-by: Armand <arm.gilles@gmail.com>

* Small bench tests run in eager mode, big ones in lazy mode, for comparison

Signed-off-by: Armand <arm.gilles@gmail.com>

* Update notebook with with_columns style for feature creation

Signed-off-by: Armand <arm.gilles@gmail.com>

* Using pipe style with lazy

Signed-off-by: Armand <arm.gilles@gmail.com>

* cleaning

Signed-off-by: Armand <arm.gilles@gmail.com>

---------

Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
* Update important viz functions to polars #164

Signed-off-by: Armand <arm.gilles@gmail.com>

* Update notebook with viz; some pandas code remains, which is fine #164

Signed-off-by: Armand <arm.gilles@gmail.com>

---------

Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
* Fix bad condition for consecutive_no_transactions_out and available bike in station #170

Signed-off-by: Armand <arm.gilles@gmail.com>

* Update test to check that available_bike less than or equal to 2 gives consecutive_no_transactions_out = 0 #170

Signed-off-by: Armand <arm.gilles@gmail.com>

* Small update to give same type in test #170

Signed-off-by: Armand <arm.gilles@gmail.com>

---------

Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
Signed-off-by: Armand <arm.gilles@gmail.com>
armgilles added the bug label Nov 7, 2024
armgilles added this to the V1.3 milestone Nov 7, 2024
armgilles self-assigned this Nov 7, 2024

codecov bot commented Nov 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.70%. Comparing base (ab1fa8c) to head (a42f2f0).

Additional details and impacted files
@@             Coverage Diff             @@
##           dev_polars     #173   +/-   ##
===========================================
  Coverage       68.70%   68.70%           
===========================================
  Files              10       10           
  Lines             310      310           
===========================================
  Hits              213      213           
  Misses             97       97           

codspeed-hq bot commented Nov 7, 2024

CodSpeed Performance Report

Merging #173 will degrade performance by 10.11%

Comparing polars_remove_notebook (bf50e6a) with polars_remove_notebook (4717996)

Summary

⚡ 1 improvements
❌ 1 regressions
✅ 11 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | polars_remove_notebook | polars_remove_notebook | Change |
|---|---|---|---|
| test_benchmark_get_transaction_all | 1.5 ms | 1.7 ms | -10.11% |
| test_benchmark_get_transaction_in | 1.9 ms | 1.8 ms | +7.17% |
