Cookiecutter Data Science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Adapted from https://github.com/drivendata/cookiecutter-data-science

Requirements to use the cookiecutter template:

Python 2.7 or 3.5
Cookiecutter Python package >= 1.4.0: this can be installed with pip or conda, depending on how you manage your Python packages:

$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter
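If you want to confirm from Python that the installed Cookiecutter meets the 1.4.0 minimum, the check can be sketched as follows. This is an illustration under stated assumptions: it needs Python 3.8+ for `importlib.metadata` (newer than the versions listed above), the helper names `parse_release`, `meets_minimum` and `installed_cookiecutter_ok` are hypothetical, and only the numeric release part of the version string is compared, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum Cookiecutter release required by the template.
MINIMUM_COOKIECUTTER = (1, 4, 0)


def parse_release(text):
    """Return the leading numeric release of a version string as a tuple.

    Simplification (assumption): suffixes such as 'rc1' or '.dev0' are
    ignored, so '1.4.0rc1' is treated as (1, 4, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(text, minimum=MINIMUM_COOKIECUTTER):
    """Return True if the version string is at least `minimum`."""
    release = parse_release(text)
    width = max(len(release), len(minimum))
    # Pad both tuples with zeros so that '1.4' compares equal to (1, 4, 0).
    padded = release + (0,) * (width - len(release))
    required = tuple(minimum) + (0,) * (width - len(minimum))
    return padded >= required


def installed_cookiecutter_ok():
    """Check the Cookiecutter version installed in the current environment."""
    try:
        return meets_minimum(version("cookiecutter"))
    except PackageNotFoundError:
        # Cookiecutter is not installed at all.
        return False


if __name__ == "__main__":
    print("cookiecutter >= 1.4.0 installed:", installed_cookiecutter_ok())
```

For example, `meets_minimum("1.3.9")` is `False` and `meets_minimum("1.10.0")` is `True`, because the components are compared as integers rather than as strings.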

To start a new project, run:

cookiecutter https://github.com/drivendata/cookiecutter-data-science.git

The resulting directory structure

The directory structure of your new project looks like this:

├── LICENSE
├── Makefile             <- Contains instructions for command line execution
├── README.md
├── data                 <- Base data directory; no data is held directly under this folder
│   ├── archive          <- Archived data that is not in use, out of date or no longer needed
│   ├── interim          <- Intermediate data that has been transformed, e.g. data that is being staged but is not yet ready for models or for presentation data cuts
│   ├── processed        <- Final data, ready for training or scoring models and for producing data cuts
│   └── raw              <- Original data from clients and/or third parties, in raw form; may include data dictionaries
├── docs                 <- A default Sphinx project; see sphinx-doc.org for details
│   ├── Makefile
│   ├── commands.rst
│   ├── conf.py
│   ├── getting-started.rst
│   ├── index.rst
│   └── make.bat
├── model                <- If only one tool is used, files can be stored directly under this folder
│   ├── llamasoft        <- For Llamasoft models: a self-contained file that holds both code and data
│   ├── python
│   ├── r
│   └── spss
├── output
│   ├── log              <- Log files, if scripts have been developed to capture error messages
│   └── results          <- Model output, such as coefficient values, t-tests, variable importance and accuracy statistics
│       ├── predict      <- Subfolder for predictions
│       └── visuals
│           ├── figures  <- Subfolder for figures created during model development and evaluation, such as diagnostic or variable importance plots
│           └── tableau  <- Dashboards created to show insights on models
├── requirements.txt
├── setup.py
├── src
│   ├── assets           <- Data dictionaries, modality mappings, etc.
│   ├── data             <- Data pipelines: merging, preparation and feature engineering scripts
│   ├── environement     <- Environment definitions: Docker containers for R or Python sandbox environments, Anaconda environments, and R environments (R version and package versions)
│   ├── explore          <- Notebooks/scripts used in the initial exploration and discovery phase
│   ├── main             <- Scripts that spawn other scripts, such as data preparation, model training, inference and testing
│   ├── model            <- Model training, hyperparameter tuning, evaluation and prediction scripts
│   ├── test             <- Test scripts used to run through scenarios to ensure scripts are fit for purpose
│   └── utils            <- Scripts that don't fit into the other categories
├── test_environment.py
└── tox.ini
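To make the folder layout concrete, the directories in the tree can be recreated with `pathlib`. This is only a sketch of the layout, not of how Cookiecutter renders the template: `PROJECT_DIRS` and `create_skeleton` are hypothetical names, the paths are transcribed from the tree (including the `environement` spelling), and files such as the Makefile, setup.py and the Sphinx docs are not generated.

```python
from pathlib import Path

# Directory paths transcribed from the directory tree (folders only).
PROJECT_DIRS = [
    "data/archive",
    "data/interim",
    "data/processed",
    "data/raw",
    "docs",
    "model/llamasoft",
    "model/python",
    "model/r",
    "model/spss",
    "output/log",
    "output/results/predict",
    "output/results/visuals/figures",
    "output/results/visuals/tableau",
    "src/assets",
    "src/data",
    "src/environement",
    "src/explore",
    "src/main",
    "src/model",
    "src/test",
    "src/utils",
]


def create_skeleton(root):
    """Create every folder in PROJECT_DIRS under `root` and return their paths."""
    root = Path(root)
    created = []
    for relative in PROJECT_DIRS:
        path = root / relative
        # parents=True creates intermediate folders (e.g. output/results);
        # exist_ok=True makes repeated runs harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Because `mkdir` is called with `exist_ok=True`, running it again on an existing project leaves the folders untouched. Note that Git does not track empty directories, so a generated project usually needs at least a placeholder file per folder to keep them under version control.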

Contributing

We welcome contributions! See the docs for guidelines.

Installing development requirements

pip install -r requirements.txt

Running the tests

py.test tests
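`py.test tests` collects files named `test_*.py` (or `*_test.py`) under `tests/` and runs the functions whose names start with `test_`. A minimal sketch of such a module is shown here; the file name, `EXPECTED_DIRS` and `missing_dirs` are hypothetical, and a real test for this template would first generate a project from the template rather than build the folders by hand.

```python
# Hypothetical test module, e.g. tests/test_layout.py
from pathlib import Path

# Top-level folders of a generated project, taken from the directory tree.
EXPECTED_DIRS = {"data", "docs", "model", "output", "src"}


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return, sorted, the expected folder names absent from project_root."""
    present = {p.name for p in Path(project_root).iterdir() if p.is_dir()}
    return sorted(set(expected) - present)


def test_reports_absent_folder(tmp_path):
    # tmp_path is pytest's built-in fixture: a fresh temporary directory.
    for name in ("data", "docs", "model", "output"):
        (tmp_path / name).mkdir()
    assert missing_dirs(tmp_path) == ["src"]


def test_complete_layout(tmp_path):
    for name in EXPECTED_DIRS:
        (tmp_path / name).mkdir()
    assert missing_dirs(tmp_path) == []
```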

greghor/cookiecutter-data-science-workflow
