The goal of the project was to create an algorithm for detecting TADs in Hi-C data with the best possible precision.
TADs (Topologically Associating Domains) are self-interacting genomic regions, and detecting them is crucial for further research on complex chromatin structure. Our initial goal was simple: analyze two different algorithms in more depth and possibly modify them. The final result was a modified version of the TopDom algorithm, with an additional sensitivity parameter that allows specifying how precisely TADs are detected.
A short summary can be found in the digest.pdf file and a longer description in summary_record.pdf. This project was part of the 4EU+ Alliance.
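For readers unfamiliar with TopDom, the core idea of the published algorithm can be sketched as follows: for every bin, average the contact frequencies between a few upstream and a few downstream bins, then treat local minima of that signal as candidate TAD boundaries. This is only a minimal illustration on a toy matrix, not the code from this repository. The function names, the indexing convention (bin i counts as upstream of its own boundary) and the toy data are assumptions, and the sensitivity parameter of the modified version is deliberately left out because its definition lives in the summary PDF.

```python
from statistics import mean


def bin_signal(matrix, window):
    """Mean contact frequency between up to `window` bins ending at i
    (upstream) and up to `window` bins starting at i + 1 (downstream)."""
    n = len(matrix)
    signal = []
    for i in range(n):
        upstream = range(max(0, i - window + 1), i + 1)
        downstream = range(i + 1, min(n, i + 1 + window))
        values = [matrix[u][d] for u in upstream for d in downstream]
        # The last bin has no downstream neighbours; use 0.0 as a placeholder.
        signal.append(mean(values) if values else 0.0)
    return signal


def local_minima(signal):
    """Indices of interior bins that are strictly lower than both neighbours."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]
    ]


# Hypothetical toy matrix: two dense 3x3 blocks (bins 0-2 and 3-5)
# with weak contacts between them.
toy = [[10 if (r < 3) == (c < 3) else 1 for c in range(6)] for r in range(6)]

signal = bin_signal(toy, window=2)
boundaries = local_minima(signal)
print(signal)
print(boundaries)
```

On this toy input the signal dips at bin 2, which is the last bin of the first block. The original TopDom method additionally filters candidate boundaries with a statistical test, which this sketch omits.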
This repository was copied from: https://github.com/meet-eu-21/Team-WA1
Note: due to some technical issues I did not make many commits to the original repository. However, the majority of my code was committed by other team members.
This project was created in cooperation with Leszek Troc (https://github.com/Lechuuu000), Sebastian Kot (https://github.com/kot-sebastian) and Ignacy Makowski (https://github.com/Ilidan).
Meet-EU Team WA1
Topic A : Prediction of TADs
- Create GitHub account
- Provide email to get access.
- Create ssh key
cd /home/my-user/.ssh
ssh-keygen -t rsa -b 4096
- provide a key file name and an optional passphrase
cat <given-name>.pub
- copy the retrieved public key
- GitHub settings
- SSH and GPG keys tab
- add ssh key
- type any name
- paste the output of the cat command above
- navigate to the terminal and test the connection
ssh -T git@github.com -i ~/.ssh/<given-name>
- now I can:
git clone git@github.com:<scope-user>/<project-name>.git
- in our case:
git clone git@github.com:meet-eu-21/Team-WA1.git
- HiC - input data for our algorithms
- TAD - additional metadata + results
- All (approx. 25 GB) -
./scripts/download_all.sh
- Only GM12878 (approx. 11 GB) -
./scripts/download_GM12878.sh
- install
python3
- install required packages
- download and unpack data (can use scripts directory)
- in src directory
python3 main.py {args}
(available args and their behaviour can be found in summary_report.pdf)
- example:
python3 main.py --results-path=../results --resolution=100k --data-path=../data/www.lcqb.upmc.fr/meetu/dataforstudent/HiC/GM12878/100kb_resolution_intrachromosomal --run-topdom=True --with-metrics-results=True --with-results-coordinates=True --topdom-sensitivity=0.04 --topdom-window-size=5 --chromosomes=1,22,X
- Results should be available in the {desired directory}/topdom or {desired directory}/arrowhead directory
- Remember to change the --resolution argument if you change the data
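Because the example command is long, it can be convenient to assemble it from a dictionary when running several configurations. The snippet below is a small sketch of that idea; the helper name build_command is hypothetical and not part of this repository, while the flags and values are copied from the example above. It only builds and prints the command, so nothing is executed.

```python
import shlex


def build_command(options, script="main.py"):
    """Turn a {flag: value} mapping into an argument list for main.py.

    Hypothetical helper for illustration; flags are passed through as-is.
    """
    args = ["python3", script]
    for flag, value in options.items():
        args.append(f"--{flag}={value}")
    return args


# Flags and values taken from the example command in this README.
options = {
    "results-path": "../results",
    "resolution": "100k",
    "data-path": (
        "../data/www.lcqb.upmc.fr/meetu/dataforstudent/HiC/"
        "GM12878/100kb_resolution_intrachromosomal"
    ),
    "run-topdom": "True",
    "with-metrics-results": "True",
    "with-results-coordinates": "True",
    "topdom-sensitivity": "0.04",
    "topdom-window-size": "5",
    "chromosomes": "1,22,X",
}

command = build_command(options)
print(shlex.join(command))
```

To actually run the command from the src directory, the list can be passed to subprocess.run(command, check=True).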