Skip to content

A short tutorial to using DL and keras for genomics with sequence data.

Notifications You must be signed in to change notification settings

rschwess/tutorial_dl_for_genomics

Repository files navigation

DL for Genomics Tutorial

A short tutorial to using DL and keras for genomics with sequence data.

Requirements

  • python 3.5 or 3.6

    • modules:
      • keras
      • numpy
    • I recommend anaconda https://docs.anaconda.com/ or miniconda to manage environments
      • to create a self contained environment after conda install:
        • conda create -n tutorial python=3.6
        • conda activate tutorial (or: source activate tutorial)
        • conda install pip
        • conda install nb_conda (to see virtual environments in jupyter notebooks)
        • pip install numpy tensorflow keras
        • to leave type: source deactivate
  • R v3+

    • packages:
      • ggplot2

Description

This is a short tutorial running through some principles of using DL for genomics, specifically for sequence classification. We will use simulated sequences from 4 different classes that are populated with transcption factor motifs and other DNA patterns. We will build and train a small convolutional neuronal network using keras to learn to classify from the sequence only. We will also apply this network for new predictions and inspect what the network has learned to be important features.

  • Colab Notebook a great way to run the tutorial on google hardware :). The colab notebook actually has some updated visualizations and niceties so I strongly suggest running it.
  • Notebook running through the tutorial
  • python script to adjust and run the entire analysis
  • input data
  • helper scripts for simulating data and visualizing kernel weights

About

A short tutorial to using DL and keras for genomics with sequence data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published