Tutorials

Simone Chiarella edited this page Jan 22, 2025 · 7 revisions

Configure your experiment

ProtACon lets you set several parameters in config.txt.

The file has three sections: cutoffs, paths, and proteins.
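
As a rough illustration, a config.txt holding the default values listed on this page might look like the snippet below. The INI-style layout, the lowercase section headers, and the use of Python's configparser to read it are assumptions made for this sketch; check the config.txt shipped with ProtACon for the actual format.

```python
import configparser

# Hypothetical config.txt contents: the INI layout is an assumption,
# while the option names and defaults are the ones described on this page.
EXAMPLE_CONFIG = """
[cutoffs]
ATTENTION_CUTOFF = 0.1
DISTANCE_CUTOFF = 8.0
POSITION_CUTOFF = 6

[paths]
PDB_FOLDER = pdb_files
FILES = files
PLOTS = plots

[proteins]
PROTEIN_CODES =
MIN_LENGTH = 15
MAX_LENGTH = 300
MIN_RESIDUES = 10
SAMPLE_SIZE = 1000
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)

# Typed getters convert the raw strings to numbers.
attention_cutoff = config.getfloat("cutoffs", "ATTENTION_CUTOFF")
position_cutoff = config.getint("cutoffs", "POSITION_CUTOFF")

# An empty PROTEIN_CODES value yields an empty list of codes.
protein_codes = config.get("proteins", "PROTEIN_CODES").split()
```

Reading the values through typed getters keeps a mistyped number from silently propagating as a string.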

Cutoffs

This section configures the thresholding operations performed throughout the pipeline.

  • ATTENTION_CUTOFF (default: 0.1)

    Every attention value below the cutoff is set to zero in each attention matrix produced by the model. This filters out "noisy" attention that may hide some patterns. In particular, attention alignment increases significantly after this thresholding. The two images below show the same attention matrix with a threshold of 0 and of 0.1, respectively.

    [Images: 6NJC_0.0 (threshold 0) and 6NJC_0.1 (threshold 0.1)]

  • DISTANCE_CUTOFF (default: 8.0) and POSITION_CUTOFF (default: 6)

    These parameters define what counts as a "contact" between residues in the run. In other words, they control the binarization of the contact map. With the default values of the two cutoffs, a value of 1 (a contact) is assigned to a pair of residues if:

    • they are less than 8 Å apart in the protein tertiary structure;
    • they are separated in the primary structure (the amino acid sequence) by at least 5 residues, so that contacts between residues that are close only because of their position in the chain are discarded.

    If either requirement is not met, a zero is assigned in the binary contact map (see image below).

    [Images: 6NJC_proximity_map and 6NJC_binary_contact_map]
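
The two thresholding steps described above can be sketched in plain Python as follows. The function names are hypothetical and this is not ProtACon's own code. One assumption deserves attention: "separated by at least 5 residues" is read here as a difference in sequence position of at least POSITION_CUTOFF, which for the default value of 6 leaves 5 residues in between.

```python
def threshold_attention(attention, cutoff=0.1):
    """Set every attention value below the cutoff to zero."""
    return [[a if a >= cutoff else 0.0 for a in row] for row in attention]


def binarize_contacts(distances, distance_cutoff=8.0, position_cutoff=6):
    """Build a binary contact map from a square matrix of distances in Å.

    Assumption: two residues are far enough apart in the sequence when
    abs(i - j) >= position_cutoff.
    """
    n = len(distances)
    return [
        [
            1
            if distances[i][j] < distance_cutoff
            and abs(i - j) >= position_cutoff
            else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy data: every pair of 8 residues is 5 Å apart, except the pair
# (0, 7), which is 9 Å apart and so fails the distance requirement.
toy_distances = [[5.0] * 8 for _ in range(8)]
toy_distances[0][7] = toy_distances[7][0] = 9.0
toy_contacts = binarize_contacts(toy_distances)
```

In the toy map only pairs at least six positions apart and closer than 8 Å receive a 1; everything else, including the diagonal, is 0.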

Paths

The options PDB_FOLDER (default: pdb_files), FILES (default: files) and PLOTS (default: plots) set the folders that store, respectively, the PDB files of the analyzed proteins, the CSV and other files, and the plots generated during the run. Any folder that does not exist is created.
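
The create-if-missing behavior can be pictured with the short sketch below. The helper name ensure_folders and the choice of a base directory are assumptions for illustration; where ProtACon actually resolves these folders is not stated on this page.

```python
from pathlib import Path


def ensure_folders(base_dir, names=("pdb_files", "files", "plots")):
    """Create each folder under base_dir if it is missing; return the paths."""
    folders = [Path(base_dir) / name for name in names]
    for folder in folders:
        # exist_ok=True makes the call a no-op for folders already present.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling the helper twice on the same base directory is harmless, which matches the idea that existing folders are simply reused.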

A complete description of the files downloaded during the run is available on the Guides page of this wiki.

Proteins

The options in this section select the set of proteins to analyze when the on_set command is run.

  • PROTEIN_CODES (default: blank)

    Enter here the PDB codes of the proteins you want to analyze, separated by a single blank space. This option must be empty if you want to search through the RCSB Search API: if PROTEIN_CODES is not empty, the options below are skipped and no API search is performed.

  • MIN_LENGTH (default: 15) and MAX_LENGTH (default: 300)

    These set, respectively, the minimum and maximum number of residues a protein may have to be included in the search through the RCSB Search API. The attribute queried for this is rcsbsearchapi.rcsb_attributes.rcsb_assembly_info.polymer_monomer_count.

  • MIN_RESIDUES (default: 10)

    This is the minimum number of valid residues that a peptide chain must have. The option exists because of ligands, that is, residues that do not belong to the twenty canonical amino acids. Ligands are discarded from the chain, which therefore becomes shorter than its original length. The search is based on MIN_LENGTH, but some of the fetched proteins may contain several ligands. Once all the ligands in a chain have been discarded, the length of the chain is checked, and the chain is skipped if it has fewer residues than MIN_RESIDUES.

  • SAMPLE_SIZE (default: 1000)

    This sets the number of proteins kept from the search through the RCSB Search API. The limit is applied after the search completes, by keeping only the specified number of proteins.
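
The selection logic described in this section can be summarized with the sketch below. All function names are hypothetical, and two points are assumptions: chains are represented as one-letter sequences in which any letter outside the twenty canonical amino acids stands for a ligand, and SAMPLE_SIZE is applied by keeping the first results (the page does not say how the kept proteins are chosen).

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")  # the twenty canonical amino acids


def select_codes(protein_codes, search_results, sample_size=1000):
    """Use the explicit codes if any were given, else trim the search results."""
    codes = protein_codes.split()
    if codes:
        return codes  # a non-empty PROTEIN_CODES skips the API search
    # Assumption: the first sample_size results are the ones kept.
    return list(search_results)[:sample_size]


def valid_residues(sequence):
    """Drop every residue that is not a canonical amino acid."""
    return [residue for residue in sequence if residue in CANONICAL]


def keep_chain(sequence, min_residues=10):
    """Keep a chain only if enough valid residues remain after filtering."""
    return len(valid_residues(sequence)) >= min_residues
```

For example, a chain of ten residues passes the MIN_LENGTH-based search but is skipped by keep_chain if even one of them is a ligand, because only nine valid residues remain.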
