brunolnetto/Data-science

 
 

Data Science Articles from CodeCut

CodeCut is a platform dedicated to helping busy data scientists write better code through concise, practical tutorials, best practices, and tool recommendations. We focus on open-source tools and techniques that make data science workflows more efficient and maintainable, saving you time and reducing technical debt.

This repository is a curated collection of data science articles from CodeCut, covering topics like MLOps, data management, testing, visualization, and more. Each article comes with practical examples, code repositories, and video tutorials to help you quickly implement these tools and practices in your own projects.

Table of Contents

  1. MLOps
  2. Data Management Tools
  3. Testing
  4. Python Helper Tools
  5. Feature Engineering
  6. Visualization
  7. Python
  8. Logging and Debugging
  9. LLM
  10. Speed-up Tools

MLOps

Title Article Repository Video
Goodbye Pip and Poetry. Why UV Might Be All You Need 🔗
Stop Hard Coding in a Data Science Project – Use Configuration Files Instead 🔗 🔗 🔗
Poetry: A Better Way to Manage Python Dependencies 🔗 🔗
Git for Data Scientists: Learn Git through Practical Examples 🔗 🔗
4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python 🔗 🔗 🔗
How to Structure a Data Science Project for Maintainability 🔗 🔗 🔗
Build Reliable Machine Learning Pipelines with Continuous Integration 🔗 🔗 🔗
Automate Machine Learning Deployment with GitHub Actions 🔗 🔗 🔗
How to Build a Fully Automated Data Drift Detection Pipeline 🔗 🔗 🔗

Data Management Tools

Title Article Repository Video
Version Control for Data and Models Using DVC 🔗 🔗 🔗
What is dbt (data build tool) and When should you use it? 🔗 🔗 🔗
Streamline dbt Model Development with Notebook-Style Workspace 🔗 🔗 🔗

Testing

Title Article Repository Video
Pytest for Data Scientists 🔗 🔗 🔗

Python Helper Tools

Title Article Repository Video
Write Clean Python Code Using Pipes 🔗 🔗 🔗
Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames 🔗 🔗
Fugue and DuckDB: Fast SQL Code in Python 🔗 🔗

Feature Engineering

Title Article Repository Video
Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames 🔗 🔗

Visualization

Title Article Repository Video
Top 6 Python Libraries for Visualization: Which one to Use? 🔗 🔗

Python

Title Article Repository Video
Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable 🔗 🔗 🔗

Logging and Debugging

Title Article Repository Video
Loguru: Simple as Print, Flexible as Logging 🔗 🔗 🔗

LLM

Title Article Repository Video
Enforce Structured Outputs from LLMs with PydanticAI 🔗 🔗

Speed-up Tools

Title Article Repository Video
Writing Safer PySpark Queries with Parameters 🔗 🔗

Contributing

If you're passionate about data science and want to share your knowledge about open-source tools for data processing and LLM applications in Python, we'd love to have you contribute!

To contribute:

  1. Create a GitHub issue:
    • Click on the "Issues" tab
    • Click "New issue"
    • Select "Article Topic Suggestion" template
    • Fill in the template with your article proposal
  2. Read our contribution guidelines

About

Collection of useful data science topics along with articles, videos, and code
