@timholy timholy commented May 2, 2025

While we can get this info from running DSSP, it seems reasonable to extract this information from the file when it is present.

One issue is that the annotations used in the file are more detailed than the one-letter codes. I may not have handled this correctly, and it may be worth considering an alternative approach (e.g., an @enum). It might also be nice to include the numeric code, e.g., for TM6. (Presumably the categorization will always switch before the next structure of the same type, so this is less critical.)
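
To make the @enum idea concrete, here is a minimal sketch of one way the detailed `conf_type_id` strings could be collapsed into coarse classes and then into one-letter codes. The names `SSClass`, `ssclass`, and `sscode` are hypothetical and are not part of the package; matching on the `HELX`/`STRN`/`TURN` prefixes is an assumed simplification that discards the finer distinctions the file may carry.

```julia
# Hypothetical enum of coarse secondary-structure classes (illustrative names,
# not part of BioStructures.jl).
@enum SSClass begin
    SS_UNKNOWN
    SS_HELIX
    SS_STRAND
    SS_TURN
end

# Classify an mmCIF `conf_type_id` string by its prefix.
# Assumption: prefix matching is good enough for a coarse classification.
function ssclass(conf_type_id::AbstractString)
    startswith(conf_type_id, "HELX") && return SS_HELIX
    startswith(conf_type_id, "STRN") && return SS_STRAND
    startswith(conf_type_id, "TURN") && return SS_TURN
    return SS_UNKNOWN
end

# Map a coarse class to a DSSP-style one-letter code, '-' when unassigned.
function sscode(c::SSClass)
    c == SS_HELIX  && return 'H'
    c == SS_STRAND && return 'E'
    c == SS_TURN   && return 'T'
    return '-'
end

println(sscode(ssclass("HELX_P")))
```

An enum keeps the set of classes closed and cheap to compare, at the cost of losing the original string; storing both would be another option.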

codecov bot commented May 2, 2025

Codecov Report

Attention: Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.95%. Comparing base (58456ce) to head (4dabaf4).
Report is 1 commit behind head on master.

Files with missing lines | Patch % | Lines
src/mmcif.jl | 93.10% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #65      +/-   ##
==========================================
- Coverage   94.99%   94.95%   -0.04%     
==========================================
  Files          14       14              
  Lines        1919     1944      +25     
==========================================
+ Hits         1823     1846      +23     
- Misses         96       98       +2     

@jgreener64 jgreener64 left a comment

Looks good, just a few comments.

I will also give you write access to the package.

    # Secondary structure assignment
    if haskey(mmcif_dict, "_struct_conf.conf_type_id")
        (run_dssp | run_stride) && @warn "Secondary structure assignment will be overwritten"
        for (i, id) in pairs(mmcif_dict["_struct_conf.conf_type_id"])
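
The loop under review walks the `_struct_conf` columns in parallel, one index per annotated segment. A minimal, self-contained sketch of that pattern follows. It assumes the dictionary maps column names to vectors of strings, uses made-up values, and assumes the residue range comes from the `beg_label_seq_id`/`end_label_seq_id` columns; the helper name `conf_segments` is hypothetical and this is not the code in the PR.

```julia
# Stand-in for a parsed mmCIF dictionary: column name => vector of strings.
# All values are invented for illustration.
mmcif_dict = Dict(
    "_struct_conf.conf_type_id"      => ["HELX_P", "HELX_P", "TURN_P"],
    "_struct_conf.beg_label_asym_id" => ["A", "A", "B"],
    "_struct_conf.end_label_asym_id" => ["A", "B", "B"],
    "_struct_conf.beg_label_seq_id"  => ["3", "20", "5"],
    "_struct_conf.end_label_seq_id"  => ["12", "24", "8"],
)

# Hypothetical helper: collect (chain, residue range, type) for each segment,
# skipping segments whose start and end chains differ.
function conf_segments(d::AbstractDict)
    segments = NamedTuple{(:chain, :range, :type), Tuple{String, UnitRange{Int}, String}}[]
    haskey(d, "_struct_conf.conf_type_id") || return segments
    for (i, id) in pairs(d["_struct_conf.conf_type_id"])
        chainid = d["_struct_conf.beg_label_asym_id"][i]
        d["_struct_conf.end_label_asym_id"][i] == chainid || continue  # mismatch in chain id
        start = parse(Int, d["_struct_conf.beg_label_seq_id"][i])
        stop = parse(Int, d["_struct_conf.end_label_seq_id"][i])
        push!(segments, (chain = chainid, range = start:stop, type = id))
    end
    return segments
end

for seg in conf_segments(mmcif_dict)
    println(seg.chain, " ", seg.range, " ", seg.type)
end
```

With the sample data, the second entry starts in chain A and ends in chain B, so it is skipped.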
Member
Might be worth benchmarking that this doesn't slow down mmCIF reading much.

Collaborator Author
With

git diff ../src/mmcif.jl
diff --git a/src/mmcif.jl b/src/mmcif.jl
index 4dc94d4..7e14271 100644
--- a/src/mmcif.jl
+++ b/src/mmcif.jl
@@ -347,8 +347,9 @@ function MolecularStructure(mmcif_dict::MMCIFDict;
     end

     # Secondary structure assignment
-    if haskey(mmcif_dict, "_struct_conf.conf_type_id")
-        (run_dssp | run_stride) && @warn "Secondary structure assignment will be overwritten"
+    if !(run_dssp | run_stride) && haskey(mmcif_dict, "_struct_conf.conf_type_id")
+        println("parsing secondary structure from mmCIF file")
+
         for (i, id) in pairs(mmcif_dict["_struct_conf.conf_type_id"])
             chainid = mmcif_dict["_struct_conf.beg_label_asym_id"][i]
             mmcif_dict["_struct_conf.end_label_asym_id"][i] == chainid || continue   # mismatch in chain id
@@ -369,12 +370,12 @@ function MolecularStructure(mmcif_dict::MMCIFDict;
     if run_dssp && run_stride
         throw(ArgumentError("run_dssp and run_stride cannot both be true"))
     end
-    if run_dssp
-        rundssp!(struc)
-    end
-    if run_stride
-        runstride!(struc)
-    end
+    # if run_dssp
+    #     rundssp!(struc)
+    # end
+    # if run_stride
+    #     runstride!(struc)
+    # end

     return struc
 end

I get this:

julia> @time read(cif_path, MMCIFFormat; run_dssp=false)
parsing secondary structure from mmCIF file
  0.060777 seconds (780.75 k allocations: 50.053 MiB)
MolecularStructure 1BQ0.cif with 20 models, 1 chains (A), 77 residues, 1244 atoms

julia> @time read(cif_path, MMCIFFormat; run_dssp=true)
  0.058849 seconds (780.61 k allocations: 50.030 MiB)
MolecularStructure 1BQ0.cif with 20 models, 1 chains (A), 77 residues, 1244 atoms
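
A single `@time` call can include compilation and run-to-run noise, so differences of a couple of milliseconds like those above are hard to interpret. A minimal sketch of a steadier comparison using only Base is shown here: warm up once, then take the fastest of several runs. The helper `min_elapsed` is hypothetical, and the workload is a stand-in so that the block runs without a structure file.

```julia
# Hypothetical helper: call `f` once to warm up, then return the minimum
# elapsed time in seconds over `n` further calls.
function min_elapsed(f; n::Integer = 5)
    f()  # warm-up call so compilation is not measured
    return minimum(@elapsed(f()) for _ in 1:n)
end

# Stand-in workload. For the comparison above one would instead pass
#   () -> read(cif_path, MMCIFFormat; run_dssp=false)
# and the corresponding call with run_dssp=true.
t = min_elapsed(() -> sum(abs2, rand(10_000)))
println("fastest of 5 runs: ", t, " s")
```

Taking the minimum rather than the mean reduces the influence of garbage collection and other background activity on the comparison.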

@timholy timholy merged commit b18bec3 into BioJulia:master May 5, 2025
7 of 9 checks passed
@timholy timholy deleted the teh/ss branch May 5, 2025 20:47
@jgreener64
Looks good, thanks.
