This repo contains scripts for creating various protein sequence and structure datasets, as well as some guides on how to use them. It covers:
Protein amino acid features.
Working with the cullPDB dataset created in Zhou & Troyanskaya, 2014.
Creating a new protein sequence-structure dataset following the methods used for the cullPDB dataset, referred to as cpdb2.
Scripts for calling NCBI BLAST+ psiblast on large FASTA files from Biopython and handling the results using multiprocessing.
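As a rough illustration of the psiblast workflow, the sketch below splits a multi-record FASTA file into one query per sequence and runs psiblast on each query in a worker pool. It is not the code from this repo: it uses only the standard library and `subprocess` rather than Biopython, and the function names (`read_fasta`, `psiblast_command`, `run_one`, `run_all`), the output file naming, and the default of three iterations are all assumptions made for the example. It also assumes a `psiblast` executable is on the `PATH` and that a formatted BLAST database is available.

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def psiblast_command(query_path, db, out_dir, iterations=3):
    """Build the psiblast argument list for one query file (hypothetical naming)."""
    query_path, out_dir = Path(query_path), Path(out_dir)
    stem = query_path.stem
    return [
        "psiblast",
        "-query", str(query_path),
        "-db", db,
        "-num_iterations", str(iterations),
        "-out", str(out_dir / f"{stem}.txt"),             # alignment report
        "-out_ascii_pssm", str(out_dir / f"{stem}.pssm"),  # PSSM as text
    ]


def run_one(job):
    """Run psiblast for a single (query_path, db, out_dir) job."""
    result = subprocess.run(psiblast_command(*job), capture_output=True, text=True)
    return job[0], result.returncode


def run_all(fasta_path, db, out_dir, workers=4):
    """Write one query file per record, then run the jobs in parallel."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = []
    for i, (header, seq) in enumerate(read_fasta(Path(fasta_path).read_text())):
        query = out_dir / f"seq_{i}.fasta"
        query.write_text(f">{header}\n{seq}\n")
        jobs.append((str(query), db, str(out_dir)))
    with Pool(workers) as pool:
        return pool.map(run_one, jobs)
```

Calling `run_all("proteins.fasta", "nr", "psiblast_out")` would return a list of `(query_path, return_code)` pairs, so failed searches can be retried separately. Running one process per sequence keeps each job independent; for very large inputs, reading the whole file into memory as done here may need to be replaced with a streaming parser.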