Personalized-Cancer-Diagnosis

The workflow is as follows:

1. A molecular pathologist selects a list of genetic variations of interest to analyze.
2. The molecular pathologist searches the medical literature for evidence relevant to those variations.
3. Finally, the molecular pathologist spends a large amount of time analyzing the evidence for each variation in order to classify it.

Our goal here is to replace step 3 with a machine learning model. The molecular pathologist will still have to decide which variations are of interest and collect the relevant evidence for them. But the last step, which is also the most time-consuming, will be fully automated.

Dataset

We have two data files:

- One contains the information about the genetic mutations.
- The other contains the clinical evidence (text) that human experts/pathologists use to classify the genetic mutations.

Both data files share a common column called ID.

Columns in each data file:
- training_variants (ID, Gene, Variations, Class)
- training_text (ID, Text)
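
Because the two files share the ID column, they can be joined into a single table before any modeling. The sketch below is only illustrative: it assumes pandas is available, uses the column names listed above, and builds two tiny in-memory frames with made-up placeholder values instead of reading the real files (whose exact headers and delimiters should be checked before loading).

```python
import pandas as pd

# Placeholder rows standing in for training_variants; values are invented.
variants = pd.DataFrame({
    "ID": [0, 1, 2],
    "Gene": ["GENE_A", "GENE_B", "GENE_A"],
    "Variations": ["VAR_1", "VAR_2", "VAR_3"],
    "Class": [1, 4, 9],
})

# Placeholder rows standing in for training_text; values are invented.
text = pd.DataFrame({
    "ID": [0, 1, 2],
    "Text": ["evidence for ID 0", "evidence for ID 1", "evidence for ID 2"],
})

# Join on the shared ID column. validate="one_to_one" raises an error
# if an ID appears more than once in either frame.
data = variants.merge(text, on="ID", how="left", validate="one_to_one")

print(data.columns.tolist())
```

A left join keeps every variant row even if its text is missing, which makes such gaps visible as empty Text values rather than silently dropping rows.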

Mapping the real-world problem to an ML problem

Type of Machine Learning Problem

   **There are nine different classes a genetic mutation can be classified into => Multi-class classification problem**

Performance Metrics

Metric(s):

- Multi-class log-loss
- Confusion matrix
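
To make the two metrics concrete, here is a minimal sketch in plain Python. It assumes the nine classes are labeled 1 to 9 and that each prediction is a row of nine probabilities; the helper names `multiclass_log_loss` and `confusion_matrix` and the sample labels are hypothetical and are not taken from the notebook.

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log of the probability assigned to the true class.

    y_true: labels in 1..K; y_prob: one K-length probability row per sample.
    """
    total = 0.0
    for label, row in zip(y_true, y_prob):
        p = min(max(row[label - 1], eps), 1.0)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)

def confusion_matrix(y_true, y_pred, n_classes=9):
    """counts[i][j] = samples whose true class is i+1 and predicted class is j+1."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t - 1][p - 1] += 1
    return counts

n_classes = 9
y_true = [1, 4, 9]  # made-up labels

# A model that always predicts 1/9 for every class scores ln(9),
# which is a useful "no information" baseline to compare against.
uniform = [[1.0 / n_classes] * n_classes for _ in y_true]
baseline = multiclass_log_loss(y_true, uniform)

y_pred = [1, 4, 2]  # made-up hard predictions
cm = confusion_matrix(y_true, y_pred)
```

Log-loss scores the predicted probabilities themselves, while the confusion matrix shows which classes are mistaken for which once each sample is assigned its most likely class.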

Machine Learning Objectives and Constraints

Objective: Predict the probability of each data-point belonging to each of the nine classes.

Constraints:

- Interpretability
- Class probabilities are needed.
- Penalize the errors in class probabilities => Metric is log-loss.
- No latency constraints.
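
The reason log-loss suits the "penalize errors in class probabilities" constraint can be seen from the per-sample penalty, which is the negative log of the probability the model gave to the true class. The numbers below are invented purely for illustration, and `sample_log_loss` is a hypothetical helper name.

```python
import math

def sample_log_loss(prob_of_true_class):
    """Penalty for one sample: -ln(probability assigned to the true class)."""
    return -math.log(prob_of_true_class)

confident_right = sample_log_loss(0.95)  # about 0.05
hesitant = sample_log_loss(0.40)         # about 0.92
confident_wrong = sample_log_loss(0.01)  # about 4.61

print(confident_right, hesitant, confident_wrong)
```

A model that puts almost no probability on the true class is penalized far more than one that is merely unsure, which is the behavior wanted when the probabilities will be read by a pathologist.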
