Skip to content

phonosync/templateproject

Repository files navigation

Sample Project

This is a template for a Data Science project using Python, conda for environment management and Quarto for documentation.

To adapt to your individual project change sample to the respective project name in

  • filename of .yml file
  • environment name in .yml-file
  • in the commands below

Adapt the LICENSE as required.

Provide a brief description of the project here.

Project Organisation

According to Is It Ops That Make Data Science Scientific? Archives of Data Science, Series A, vol 8, p. 12, 2022.

The Data Science Process

Code and configurations used in the different project phases are stored in the subfolders

  • data_acquisition
  • eda
  • modelling
  • deployment

Templates for the documentation artefacts from the different project phases are provided in the subfolder docs in the form of a Quarto project:

  • Project charta
  • Data report
  • Modelling report
  • Evaluation decision log

See section Quarto Setup and Usage for instructions on how to build and serve the documentation website from the indvidual reports using Quarto.

Python Environment Setup and Management

Install conda environment:

$ conda env create -f conda.yml

Update the environment with new packages/versions:

  1. modify template.yml
  2. run conda env update:
$ conda env update --name sample --file conda.yml --prune

prune uninstalls dependencies which were removed from sample.yml

Use environment: before working on the project always make sure you have the environment activated:

$ conda activate sample

Check the version of a specific package (e.g. html5lib) in the environment:

$ conda list html5lib

Export an environment file across platforms: Include only the packages that were specifically installed. Dependencies will be resolved upon installation

$ conda env export --from-history > conda.yml

List all installed environments: From the base environment run

$ conda info --envs

Remove environment:

$ conda env remove -n sample

See the complete documentation on managing conda-environments.

Runtime Configuration with Environment Variables

The environment variables are specified in a .env-File, which is never commited into version control, as it may contain secrets. The repo just contains the file .env.template to demonstrate how environment variables are specified.

You have to create a local copy of .env.template in the project root folder and the easiest is to just rename it to .env.

The content of the .env-file is then read by the pypi-dependency: python-dotenv. Usage:

import os
from dotenv import load_dotenv

load_dotenv reads the .env-file and sets the environment variables:

load_dotenv()

which can then be accessed (assuming the file contains a line SAMPLE_VAR=<some value>):

os.environ['SAMPLE_VAR']

Quarto Setup and Usage

If Quarto is used to build a documentation website as described in the Project Organisation section, you need to

  1. Install Quarto
  2. Optional: quarto-extension for VS Code
  3. Adapt the configuration file docs/_quarto.yml as needed.
  4. Build the website by running quarto render from the docs subfolder. This will push all files into the docs/build subfolder.
  5. The you can check the website locally by opening docs/build/index.html in a browser

If you would like to use github pages to serve the documentation website, and at the same time avoid pushing the rendered files into the repo (makes very ugly diffs) but executing embedded code blocks only locally, the initial setup (only needed once) of the github action is according to https://quarto.org/docs/publishing/github-pages.html#github-action as follows:

  1. Add
        execute:
            freeze: auto
    to the _quarto.yml file
  2. execute quarto render from the docs folder
  3. run quarto publish gh-pages (generates and pushes a branch called gh-pages)
  4. make sure that github pages is configured to serve the root of the gh-pages branch
  5. add the definition of the github action workflow .github/workflows/publish.yml (see below)
  6. check all of the newly created files (including the _freeze directory) into the main branch of the repository
  7. docs/build is excluded by the .gitignore
  8. then push to main

Github action workflow configuration file to add in .github/workflows/publish.yml:

on:
  workflow_dispatch:
  push:
    branches: main

name: Quarto Publish

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
    
      - name: Install librsvg
        run: sudo apt-get install librsvg2-bin

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2
        with:
          tinytex: true

      - name: Render and Publish
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          target: gh-pages
          path: docs
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

From now on, every update just needs:

  1. Build the website by running quarto render from the docs subfolder. This will push the rendered files into docs/build (not checked into the repository via .gitignore) and computations in the docs/_freeze (checked in so that github action runners to not need python) subfolder.
  2. Check the website locally by opening the docs/build/index.html
  3. Push all updated files into the main branch. This will trigger a github action that
    • pushes an update to the github-pages branch
    • renders and publishes the site to https://.github.io/sample/

Additional notes:

  • Rendering svg-files requires the librsvg package. The github action (Linux Ubuntu) installs it via sudo apt-get install librsvg2-bin. To render locally, you need to install it on your system as well. On macOS, this can be done via brew install librsvg. On Windows you can use chocholatey to install it: choco install rsvg-convet (https://community.chocolatey.org/packages?&tags=librsvg).

Further Information

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages