A cancer tumor can have thousands of genetic mutations. But the challenge is distinguishing the mutations that contribute to tumor growth (drivers) from the neutral mutations (passengers).

sandeeppainuly/Personalized-Cancer-Diagnosis

Personalized-Cancer-Diagnosis

The workflow is as follows

  • A molecular pathologist selects a list of genetic variations of interest to analyze.
  • The molecular pathologist searches the medical literature for evidence relevant to those genetic variations.
  • Finally, the molecular pathologist spends a large amount of time analyzing the evidence for each variation in order to classify it.

Our goal here is to replace step 3 with a machine learning model. The molecular pathologist will still have to decide which variations are of interest and collect the relevant evidence for them, but the last step, which is also the most time-consuming, will be fully automated.

Dataset

We have two data files:

  1. One contains the information about the genetic mutations.
  2. The other contains the clinical evidence (text) that human experts/pathologists use to classify the genetic mutations.

Both data files share a common column called ID.

  • Data file columns:

    • training_variants (ID , Gene, Variations, Class)
    • training_text (ID, Text)
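
As a rough illustration of how the two files can be joined on their shared ID column, here is a minimal sketch using only the Python standard library. The in-memory samples, their placeholder values, and the helper names `load_rows` and `merge_on_id` are assumptions made for this example; the real files may use a different delimiter or layout, so the parsing step would need adjusting.

```python
import csv
import io

# Hypothetical stand-ins for training_variants and training_text.
# All values are placeholders, not real data from the dataset.
VARIANTS_CSV = """ID,Gene,Variations,Class
0,GENE_A,VAR_1,1
1,GENE_B,VAR_2,2
2,GENE_B,VAR_3,2
"""

TEXT_CSV = """ID,Text
0,placeholder evidence text for variant 0
1,placeholder evidence text for variant 1
"""


def load_rows(source):
    """Parse comma-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(source)))


def merge_on_id(variants, texts):
    """Left-join the text rows onto the variant rows using the ID column.

    Variants without matching evidence text get an empty Text field.
    """
    text_by_id = {row["ID"]: row["Text"] for row in texts}
    return [{**row, "Text": text_by_id.get(row["ID"], "")} for row in variants]


merged = merge_on_id(load_rows(VARIANTS_CSV), load_rows(TEXT_CSV))
for row in merged:
    print(row["ID"], row["Gene"], row["Variations"], row["Class"], repr(row["Text"]))
```

A left join is used here so that a variant with no evidence text is kept rather than dropped; whether such rows should be kept, filled, or discarded is a modelling choice this sketch does not settle.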

Mapping the real-world problem to an ML problem

Type of Machine Learning Problem

   **A genetic mutation can be classified into one of nine classes, so this is a multi-class classification problem.**

Performance Metrics

Metric(s):

  • Multi-class log-loss
  • Confusion matrix
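
Both metrics can be sketched in a few lines of plain Python. The function names, the 1-based class labels, and the three-class toy data below are assumptions for illustration only; in practice a library implementation would likely be used instead.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log of the probability assigned to the true class.

    y_true holds 1-based class labels; y_prob[i][c - 1] is the predicted
    probability that sample i belongs to class c.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip to avoid log(0) when a model outputs exactly 0 or 1.
        p = min(max(probs[label - 1], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes=9):
    """Rows are true classes, columns are predicted classes (1-based labels)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t - 1][p - 1] += 1
    return matrix


# Toy example with three classes to keep the output readable.
y_true = [1, 2, 3]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],  # true class 3 only gets probability 0.3
]
# Predicted label = class with the highest probability.
y_pred = [probs.index(max(probs)) + 1 for probs in y_prob]

loss = multiclass_log_loss(y_true, y_prob)
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(round(loss, 4))
print(cm)
```

The log-loss summarizes how well calibrated the probabilities are, while the confusion matrix shows which classes are being mistaken for which, so the two are complementary.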

Machine Learning Objectives and Constraints

Objective: Predict the probability that each data point belongs to each of the nine classes.

Constraints:

  • Interpretability is required.
  • Class probabilities are needed.
  • Errors in the class probabilities should be penalized, hence log-loss as the metric.
  • No latency constraints.
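
To make concrete how log-loss penalizes errors in the class probabilities, the sketch below compares the per-sample loss for three cases with nine classes. The helper name `sample_log_loss` and the example probabilities 0.9 and 0.01 are chosen purely for illustration.

```python
import math

N_CLASSES = 9


def sample_log_loss(prob_of_true_class):
    """Log-loss contribution of one sample: -log(p assigned to the true class)."""
    return -math.log(prob_of_true_class)


# A model that spreads probability evenly over all nine classes.
uniform = sample_log_loss(1 / N_CLASSES)  # ln(9), about 2.197

# A model that is confident and right.
confident_right = sample_log_loss(0.9)  # about 0.105

# A model that is confident and wrong (true class gets only 0.01).
confident_wrong = sample_log_loss(0.01)  # about 4.605

print(f"uniform guess:   {uniform:.3f}")
print(f"confident right: {confident_right:.3f}")
print(f"confident wrong: {confident_wrong:.3f}")
```

The uniform value of ln(9) is a useful reference point: a model whose average log-loss is not clearly below it is doing no better than assigning equal probability to every class.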
