Applied Machine Learning in Genomic Data Science

This repository contains the code for the course Applied Machine Learning in Genomic Data Science (AMLG).

This course was held at:

Happy coding! 👩‍💻👨‍💻

How to Work With This Repository

In this course, all exercises are provided as Jupyter notebooks.

A Jupyter notebook is a JSON file that follows a versioned schema and usually has the .ipynb extension. The main part of a Jupyter notebook is a list of cells. There are different cell types for Markdown and code, and each code cell also stores the output it produced.
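Because a notebook is plain JSON, it can be inspected with the Python standard library alone. The following is a minimal sketch, not part of the course material: the function name count_cell_types and the tiny hand-written example notebook are assumptions made for illustration, and the keys used ("cells", "cell_type", "outputs") are those of the nbformat 4 schema.

```python
import json
from collections import Counter


def count_cell_types(notebook_json: str) -> Counter:
    """Count the cells of each type in a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    # Every cell carries a "cell_type" field such as "markdown" or "code".
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


# A hand-written nbformat 4 notebook, used only for illustration.
example_notebook = json.dumps(
    {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": 1,
                "source": ["print(1 + 1)"],
                # Outputs are stored inside the code cell that produced them.
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
                ],
            },
        ],
    }
)

counts = count_cell_types(example_notebook)
print(dict(counts))
```

To inspect a real notebook, read the file contents first, e.g. with pathlib.Path("hic_analysis.ipynb").read_text(encoding="utf-8"), and pass the resulting string to the function.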

The notebooks are organized into demos and exercises. In each exercise folder, you will find two versions of each notebook: the solution, named, e.g., hic_analysis.ipynb, and the assignment, named hic_analysis_assignment.ipynb. Please work in the assignment version. If you get stuck, feel free to take a look at the corresponding solution. Note: The assignment versions are uploaded as the course progresses.
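The naming convention can be used to check which notebooks in a folder already have an assignment version. This is a rough sketch under assumptions: the helper pair_notebooks is hypothetical, the "_assignment" suffix is taken from the example file names above, the notebook without the suffix is assumed to be the solution, and demo.ipynb is a made-up file name. The demo runs on empty files in a temporary directory rather than on the repository itself.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def pair_notebooks(folder: Path) -> Dict[str, Optional[str]]:
    """Map each unsuffixed notebook to its assignment version, if present."""
    suffix = "_assignment"
    stems = {path.stem for path in folder.glob("*.ipynb")}
    pairs: Dict[str, Optional[str]] = {}
    for stem in sorted(stems):
        if stem.endswith(suffix):
            continue  # assignment versions are looked up from their counterpart
        assignment = stem + suffix
        pairs[stem + ".ipynb"] = assignment + ".ipynb" if assignment in stems else None
    return pairs


# Demonstration on placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("hic_analysis.ipynb", "hic_analysis_assignment.ipynb", "demo.ipynb"):
        (folder / name).touch()
    pairs = pair_notebooks(folder)

print(pairs)
```

A value of None means that no assignment version has been uploaded for that notebook yet.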

Locally

You can clone the repository over HTTPS via the command line:

git clone https://github.com/voges/amlg.git

Git is preinstalled on most Linux distributions. On Windows systems, we recommend using the Windows Subsystem for Linux along with any long-term support (LTS) Ubuntu distribution. Please refer to the Ubuntu documentation for installation instructions. On Mac systems, we recommend installing Git using the Xcode command line tools (via xcode-select --install) or via Homebrew.

We recommend Visual Studio Code with its Jupyter extensions.

Online

We recommend using the GWDG Jupyter Cloud. Here, in addition to the terminal, you can use the graphical user interface to clone the repository.

Alternatively, you can use any other online Jupyter server, such as Google Colab.

Data Availability

The data used are available via the Harvard Dataverse under the DOI 10.7910/DVN/ZSVS5X. A copy of the data is also hosted via Seafile at Leibniz University Hannover. Note: It is not necessary to download the data beforehand. The individual notebooks already contain the code to download the necessary data.

Package and Environment Management

We use pip for package management and Python's built-in venv module for environment management.

Follow the steps below to set up your environment using the provided requirements.txt file.

The environment has been used and tested on the following systems:

  • macOS Sonoma 14.6.1 with Python 3.12.6 and pip 24.2
  • Ubuntu 22.04.5 LTS with Python 3.10.12 and pip 22.0.2

Setup Instructions

  1. Create a virtual environment:

    python3 -m venv .venv
  2. Activate the virtual environment:

    source .venv/bin/activate
  3. Install the required packages:

    pip3 install -r requirements.txt
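After these steps, it can be useful to confirm that Python really runs from the virtual environment, for example in the first cell of a notebook. The check below is a minimal sketch; the function name in_virtual_environment is an assumption made for illustration. It relies on the fact that inside a venv, sys.prefix points at the environment while sys.base_prefix points at the base installation.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True if the interpreter runs inside a venv-style environment."""
    # Outside a virtual environment, both prefixes are identical.
    return sys.prefix != sys.base_prefix


print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
print("Interpreter prefix:", sys.prefix)
print("Virtual environment active:", in_virtual_environment())
```

If the last line prints False after activation, the notebook kernel is probably using a different interpreter than the one in .venv.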

Additional Commands

  • Install additional packages:

    pip3 install <package>
  • Update the requirements file:

    pip3 freeze > requirements.txt
  • Deactivate the virtual environment:

    deactivate
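To see from within Python which packages the active environment contains, the standard library module importlib.metadata can list them. This is only a rough approximation of pip3 freeze, not a replacement: it does not handle editable or VCS installs the way pip does, and it also lists packages that pip freeze omits by default, such as pip itself. The function name installed_requirements is an assumption made for illustration.

```python
from importlib.metadata import distributions
from typing import List


def installed_requirements() -> List[str]:
    """Return sorted 'name==version' lines for all installed distributions."""
    lines = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing or broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


for line in installed_requirements():
    print(line)
```

Comparing this output with requirements.txt gives a quick impression of whether the environment matches the pinned versions.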

Code Linting

We use Ruff to check the code for linting issues.

  1. Install Ruff:

    pip3 install ruff
  2. Run the following command from the root of the Git repository:

    ruff check .

License

This project is licensed under the MIT License. See the LICENSE file for details.
