Commit d84cc74

author
Apoorv Malik
committed
update
1 parent e3941f2 commit d84cc74

2 files changed: +22 -8 lines changed

eval/rnastralign/README.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# RNAstralign Evaluation
+This directory contains the structural distance evaluation script and the RNAstralign dataset.
+
+## Data
+Four families (Group I Intron, tmRNA, tRNA, and 5S rRNA) are used for parameter tuning, and another four families (SRP, RNaseP, telomerase, and 16S rRNA) are used for testing. Group I Intron, 5S rRNA, SRP, RNaseP, and 16S rRNA each contain multiple subfamilies, so we chose one specific subfamily for each of these five families (see the table below for details).
+
+| family     | subfamily           | avg. seq. len. | avg. seq. identity |
+|------------|---------------------|----------------|--------------------|
+| Group I    | IC1                 | 428.5          | 0.31               |
+| tmRNA      | -                   | 367.4          | 0.35               |
+| tRNA       | -                   | 77.1           | 0.48               |
+| 5S rRNA    | Bacteria            | 116.2          | 0.61               |
+|------------|---------------------|----------------|--------------------|
+| SRP        | Protozoan           | 285.8          | 0.35               |
+| RNaseP     | Bacterial           | 360.0          | 0.43               |
+| telomerase | -                   | 444.9          | 0.45               |
+| 16S rRNA   | Alphaproteobacteria | 1419.2         | 0.85               |
+
+There are two versions of the data: an aligned version (all the homologs in each sample are aligned) and an unaligned version:
+- Aligned Version: [data/aln/](./rnastralign/data/aln/)
+- Unaligned Version: [data/no-aln/](./rnastralign/data/no-aln/)
+
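The "avg. seq. identity" column in the README table is not defined in this commit, and the body of `utility.calculate_msa_seq_identity` is not shown. As a rough illustration only, the sketch below assumes one common definition: the mean, over all sequence pairs in an alignment, of the fraction of identical residues in columns where neither sequence has a gap. The function names `pairwise_identity` and `average_msa_identity` are hypothetical, and the repository's own helper may use a different denominator or gap convention.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither sequence is gapped.

    Assumption: gap columns are excluded from the denominator. Other
    definitions (e.g. dividing by the alignment length) give different values.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def average_msa_identity(seqs):
    """Mean pairwise identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 1.0  # a single sequence is trivially identical to itself
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment, not taken from the RNAstralign data.
example = ["GGAU-C", "GGAUAC", "GCAU-C"]
example_identity = average_msa_identity(example)
```

Because the result depends on how gaps are counted, values from this sketch are not guaranteed to match the numbers reported in the table.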

eval/rnastralign/get_sequence_identity.py

Lines changed: 0 additions & 8 deletions
@@ -30,14 +30,6 @@ def main(data_path):
         avg_identity = utility.calculate_msa_seq_identity(seqs)
         seq_identities[family].append(avg_identity)
 
-    # Print formatted results
-    # print("Sequence Identity")
-    # print("{:<8}\t{:>10}".format("Family", "Identity"))
-
-    # for family in sorted(seq_identities):
-    #     avg_value = np.mean(seq_identities[family])
-    #     print("{:<8}\t{:>10.2f}".format(family, avg_value))
-
     # Print formatted results
     print("Sequence Identity")
     print("{:<8}\t{:>10}\t{:>10}".format("Family", "Identity", "Length"))
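The hunk ends at the new three-column header, so the loop that prints the per-family rows is not visible here. The sketch below shows one way such rows could be produced with the same format-string style. The `seq_lengths` dictionary and the `format_results` function are hypothetical names introduced for illustration, and the one-decimal length format is an assumption chosen to match the README table.

```python
from statistics import mean


def format_results(seq_identities, seq_lengths):
    """Build the report lines: a title, a header, and one row per family.

    Both arguments map a family name to a list of per-sample values.
    `seq_lengths` is a hypothetical companion to `seq_identities`.
    """
    lines = [
        "Sequence Identity",
        "{:<8}\t{:>10}\t{:>10}".format("Family", "Identity", "Length"),
    ]
    for family in sorted(seq_identities):
        avg_identity = mean(seq_identities[family])
        avg_length = mean(seq_lengths[family])
        lines.append(
            "{:<8}\t{:>10.2f}\t{:>10.1f}".format(family, avg_identity, avg_length)
        )
    return lines


# Toy values, not taken from the dataset.
rows = format_results({"tRNA": [0.50, 0.46]}, {"tRNA": [77.0, 77.2]})
for row in rows:
    print(row)
```

Returning the lines instead of printing them directly keeps the formatting testable; the caller decides where the output goes.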
