jacobwbengtson/jake_cookiecutter

Cookiecutter Data Science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work at Farmers Edge.

Requirements to use the cookiecutter template:


  • Python 2.7 or 3.5
  • Cookiecutter Python package >= 1.4.0: This can be installed with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter

To start a new project, run:


cookiecutter https://github.com/jacobwbengtson/jake_cookiecutter
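The same template can also be driven from Python instead of the command line, which may be convenient in setup scripts. The following is a minimal sketch: the helper name `new_project` is hypothetical, and any keys passed through `extra_context` depend on the template's own `cookiecutter.json`, which is not shown here.

```python
TEMPLATE_URL = "https://github.com/jacobwbengtson/jake_cookiecutter"


def new_project(output_dir=".", no_input=False, extra_context=None):
    """Generate a new project from the template (hypothetical helper)."""
    # Imported lazily so this file can be loaded even when the
    # cookiecutter package is not installed yet.
    from cookiecutter.main import cookiecutter

    cookiecutter(
        TEMPLATE_URL,
        no_input=no_input,            # True skips the interactive prompts
        extra_context=extra_context,  # overrides for the template defaults
        output_dir=output_dir,        # where the project folder is created
    )


# Example call (needs network access to fetch the template):
# new_project(output_dir="projects")
```

Calling `new_project()` with no arguments should behave like the command above: it prompts for the template values and creates the project in the current directory.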

The resulting directory structure


The directory structure of your new project looks like this:

├── README.md             <- The top-level README for developers using this project.
├── .gitignore            <- A boilerplate version is provided.
├── config.txt            <- Contains passwords and tokens that should not be version controlled.
├── environment.yml       <- The .yml file used to create the environment for the project.
│                            Generate the .yml file using 'conda env export > environment.yml'.
│                            Create the environment from the .yml file using 'conda env create -f environment.yml'.
├── data
│   ├── processed         <- The final, canonical data sets for modeling.
│   ├── interim           <- Data that has been cleansed or altered, but is not in its final state.
│   └── raw               <- The original, immutable data dump.
│
├── models                <- Trained and serialized models, model predictions, or model summaries.
│
├── exploration           <- Jupyter notebooks or Python scripts for EDA. Naming convention is a number
│                            (for ordering), the creator's initials, and a short `_`-delimited description,
│                            e.g. `1.0_jwb_initial_data_exploration.ipynb`.
│
├── experiments           <- Jupyter notebooks or Python scripts for model experimentation. Naming convention is a number
│                            (for ordering), the creator's initials, and a short `_`-delimited description,
│                            e.g. `1.0_jwb_random_forest.py`.
│
├── references            <- Data dictionaries, manuals, and all other explanatory materials.
│
├── main.py               <- Script that runs everything required to generate the best working model for the project,
│                            from data ingestion to model training.
│
│
└── src                   <- Source code for use in this project.
    ├── __init__.py       <- Makes src a Python package.
    │
    ├── data              <- Functions to download, generate, combine, clean, or featurize data.
    │   ├── pull.py       <- Outputs to data/raw
    │   ├── clean.py      <- Outputs to data/interim
    │   └── featurize.py  <- Outputs to either data/interim or data/processed
    │
    ├── models            <- Functions to train/test models, or use trained models for predictions.
    │   ├── train.py
    │   ├── test.py
    │   └── predict.py
    │
    └── visualization     <- Functions to create exploratory and results-oriented visualizations.
        └── visualize.py
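
The naming convention for files in `exploration` and `experiments` can be checked automatically. The sketch below is one possible reading of that convention, not part of the template: the name `parse_name` is hypothetical, and the pattern assumes a numeric prefix such as `1.0`, two to four lowercase initials, and a lowercase description.

```python
import re

# Assumed shape: <number>_<initials>_<description>.<ipynb|py>
# The limits (2-4 lowercase initials, lowercase description) are assumptions.
NAME_PATTERN = re.compile(
    r"^(?P<order>\d+(?:\.\d+)?)_"
    r"(?P<initials>[a-z]{2,4})_"
    r"(?P<description>[a-z0-9]+(?:_[a-z0-9]+)*)"
    r"\.(?P<ext>ipynb|py)$"
)


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not match."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


for name in ("1.0_jwb_initial_data_exploration.ipynb", "exploration.ipynb"):
    print(name, "->", parse_name(name))
```

A check like this could run over both folders before a commit to flag files that would sort or attribute incorrectly.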

Contributing

We welcome contributions! See the docs for guidelines.

Installing Anaconda Environment


conda env create -f environment.yml
