Installation

Install EvoDiff by following its installation instructions. The EvoDiff authors recommend Python 3.8.
To get clustering to work with protclust, you will need MMseqs2 installed and available on the command line. If you are using conda or micromamba, this is as simple as running conda install -c bioconda mmseqs2 (micromamba users can substitute micromamba for conda). See the MMseqs2 GitHub repository for other installation options.
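
Before running the clustering step, it can help to confirm that the MMseqs2 binary is visible from Python. The snippet below is a minimal sketch, not part of this repository: find_mmseqs is a hypothetical helper, and it assumes the executable is installed under its usual name, mmseqs.

```python
import shutil
from typing import Optional


def find_mmseqs(executable: str = "mmseqs") -> Optional[str]:
    """Return the full path to the MMseqs2 binary, or None if it is not on PATH.

    The default name "mmseqs" is the usual executable name; the helper itself
    is hypothetical and only wraps shutil.which.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_mmseqs()
    if path is None:
        print("mmseqs not found on PATH; install it before clustering.")
    else:
        print(f"Found mmseqs at {path}")
```

If the check reports that mmseqs is missing, make sure the environment where you installed it is the one that is currently activated.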

Data Collection

Begin with the scripts in the preprocessing directory, starting with download_data.py. By editing its main function, you can change which EC class is downloaded. If you are just getting started, I recommend EC 5, since it is the smallest.
Next, open filter.ipynb. Modify the pandas code at the beginning of the notebook so that it loads your data from wherever you stored it. From there, the notebook combines the data, assigns it labels, and clusters it using protclust.
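
The combine-and-label step can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's actual code: combine_and_label is a hypothetical helper, the column names id and sequence are placeholders, and the toy frames stand in for whatever download_data.py saved. The protclust clustering call is left out because its exact usage is not shown here.

```python
import pandas as pd


def combine_and_label(frames_by_class):
    """Concatenate per-class DataFrames and add 'ec_class' and integer 'label' columns.

    frames_by_class maps a class name (e.g. "EC5") to a DataFrame of sequences.
    Labels are assigned in sorted order of the class names.
    """
    labeled = []
    for label, ec_class in enumerate(sorted(frames_by_class)):
        frame = frames_by_class[ec_class]
        labeled.append(frame.assign(ec_class=ec_class, label=label))
    return pd.concat(labeled, ignore_index=True)


# Toy data standing in for downloaded files (hypothetical column names).
ec4 = pd.DataFrame({"id": ["P3"], "sequence": ["MGG"]})
ec5 = pd.DataFrame({"id": ["P1", "P2"], "sequence": ["MKT", "MAA"]})

combined = combine_and_label({"EC5": ec5, "EC4": ec4})
print(combined)
```

In practice you would replace the toy frames with pd.read_csv (or similar) calls pointing at your own data files.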

Training

At this point, you are ready to train. Run train.py, making sure that the data directory points to your stored data. train_full.py implements several useful features, such as autocast, learning rate warmup and scheduling, and advanced logging.
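
To make "learning rate warmup and scheduling" concrete, here is a minimal sketch of one common scheme: linear warmup followed by cosine decay. The schedule that train_full.py actually uses is not described above, so the shape, the function name lr_at_step, and every default value here are assumptions chosen for illustration.

```python
import math


def lr_at_step(step, base_lr=1e-4, warmup_steps=1000, total_steps=10000, min_lr=0.0):
    """Learning rate at a given optimizer step.

    Linear warmup from near zero to base_lr over warmup_steps, then cosine
    decay from base_lr down to min_lr by total_steps. All defaults are
    illustrative, not values taken from train_full.py.
    """
    if step < warmup_steps:
        # Step 0 gets base_lr / warmup_steps; the last warmup step gets base_lr.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 500, 1000, 5500, 10000):
        print(step, lr_at_step(step))
```

Warmup keeps early updates small while optimizer statistics are still noisy, and the decay phase lowers the rate as training converges.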

michaelscutari/evodiff-conditional-gen
