Andrew Ng once stated that "applied ML is basically just feature engineering." In data science and ML, feature engineering is often the most important, yet most overlooked, piece of the puzzle.
At Rasgo, we are data scientists on a mission to enable the global data science community to generate valuable, trusted insights from data in under five minutes. As we have pursued this mission, we have grown increasingly frustrated by the lack of helpful content and Python functions that target feature engineering. We wrestle with these problems every day, so we wanted to provide a repository of recipes that showcase how to use the best tools available in this space. Additionally, we have built our own feature engineering SDK, PyRasgo, which enables users to automatically track, visualize, and evaluate their feature engineering experiments so they can make more accurate and explainable feature engineering decisions.
In that vein, this repository contains tutorials and code that help data scientists easily create new ML features and evaluate their importance for supervised machine learning. We sincerely hope this is helpful; please leave comments with any questions or suggestions on how we can improve!
Please join us on the following:
- Rasgo Forum, for questions about these recipes and PyRasgo.
- Rasgo User Group Slack to join our community.
- Video Tutorials on YouTube (Coming Soon)
- Feature Profiling
- Data Cleaning
- Feature Transformation
  - Time-series
  - Categorical
  - Numerical
- Model Selection
  - Train-Test Split
    - Time Series Split
    - Train-Test Split
    - K-Fold or Cross-Validation
  - Model Comparison
  - Model Training
    - Catboost
  - Model Metrics
    - Train-Test Split
- Feature Importance
- Feature Selection
  - Model Agnostic
    - Low Variance
    - Univariate Feature Selection
  - Model Based
    - Lasso-based Selection (Coming soon)
    - Feature Importance
  - Sequential Feature Selection
    - Forward Stepwise Selection (Coming soon)
    - Backwards Stepwise Selection
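To give a flavor of two of the techniques listed above, here is a minimal standard-library sketch of a chronological train-test split and a low-variance feature filter. This is illustrative only: the function names are our own for this example and are not part of PyRasgo or any library API; the recipes in the repository show the full tooling.

```python
import statistics

def time_series_split(rows, test_fraction=0.25):
    """Split time-ordered rows chronologically: train on the earliest
    observations, test on the most recent ones (no shuffling)."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def low_variance_filter(features, threshold=0.0):
    """Drop feature columns whose variance is at or below the threshold.
    `features` maps a column name to a list of numeric values."""
    return {
        name: values
        for name, values in features.items()
        if statistics.pvariance(values) > threshold
    }

# Chronological split: the most recent quarter of the data becomes the test set
rows = list(range(2010, 2022))            # 12 yearly observations
train, test = time_series_split(rows)     # train: 2010-2018, test: 2019-2021

# Low-variance filter: constant columns carry no signal and are dropped
features = {
    "constant": [1, 1, 1, 1],             # zero variance -> removed
    "signal":   [0.1, 0.9, 0.4, 0.7],     # nonzero variance -> kept
}
kept = low_variance_filter(features)      # only "signal" remains
```

Keeping the test set strictly after the training set matters for time-series problems, since shuffled splits leak future information into training.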