Foldingathome #163

Merged
merged 10 commits into from
Oct 12, 2020


mizimmer90
Contributor

Description

Added documentation for NSP3 (pl2pro and macrodomain), NSP5 (monomer and dimer), NSP7, NSP8, NSP9, and NSP10

Status

  • YAML file for each piece of data
  • Ready to go

@mizimmer90
Contributor Author

Many of these datasets are very large, and calculating a size/trajectory length for them is an arduous task. Is there a way I can leave this blank or put a temporary placeholder while I query all the sizes and total them up?

@Lnaden
Collaborator

Lnaden commented Oct 10, 2020

Sure. Simulation length is optional (can be blank) and the size is just a string, so if you give an order-of-magnitude guess (e.g. "100's of GB") or just put something like "---", it should accept it. I think you have some other schema problems, but add a placeholder for now and we can sift through the rest of the log after.

@Lnaden
Collaborator

Lnaden commented Oct 10, 2020

Two changes and then the schema should validate:

  • size has to be some string, can't just be a null field
  • The proteins entry for any nspXX is capitalized, e.g. NSP9

Otherwise, I think it'll work after that!
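
As a rough illustration of those two rules, here is a minimal Python sketch. It is not the repository's actual validator: the function name `check_entry` is hypothetical, and it assumes each entry has already been loaded into a dict with `size` and `proteins` keys.

```python
def check_entry(entry):
    """Return a list of problems found in one simulation entry (a dict).

    Hypothetical helper: it only mirrors the two rules listed above and
    does not reproduce the real schema.
    """
    problems = []

    # Rule 1: size must be a string; a null (None) value is rejected.
    if not isinstance(entry.get("size"), str):
        problems.append("size must be a string, e.g. '---'")

    # Rule 2: any nspXX protein name must be written in capitals (NSP9).
    for name in entry.get("proteins", []):
        if name.lower().startswith("nsp") and not name.startswith("NSP"):
            problems.append(f"protein {name!r} should be written {name.upper()!r}")

    return problems
```

An entry with `size: null` and `proteins: [nsp9]` would report both problems, while `size: "---"` with `proteins: [NSP9]` would report none.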

@jchodera
Contributor

@mizimmer90 : According to @Lnaden, simulations require a model entry to be defined in order to be correctly grouped under the right target. (@Lnaden : Is that correct?)

I know it's more work to define the corresponding simulation models you started from (unless they already have been entered), but it would be awesome if we could correctly link these simulations up to the right targets this way!

@Lnaden
Collaborator

Lnaden commented Oct 10, 2020

I actually merged a PR recently which works out the targets for simulations and models based on any listed proteins in those entries as well. If it's not rendering correctly here, it's because this would need a rebase. But if you just fix the 2 items I listed above and merge, it should just apply and work on the live version, even though the preview here isn't quite right. And if it doesn't, I'm still watching this very closely so I can fix it in the morning if need be.

@jchodera
Contributor

> But, if you just fix the 2 items I listed above and merge, it should just apply and work on the live version, even though the preview here isn't quite right.

@Lnaden: Awesome!

@mizimmer90: Can you fix the remaining issues so the schema validates (rebasing or merging from master as needed) so we can get this in on Monday?

@mizimmer90
Contributor Author

@Lnaden Thanks! Is there a preferred name for some of the proteins over others, e.g. NSP13 vs. helicase? Also, should I specify the subdomains, e.g. NSP3 simulations of PL2pro and the macrodomain? I see an entry for PL2pro but not the macrodomain.

@Lnaden
Collaborator

Lnaden commented Oct 12, 2020

> Is there a preferred name for some of the proteins over others? i.e. NSP13 v helicase?

For the ones with common names, we tended to fall back to the common name over the generic, e.g. helicase over NSP13. In short, the proteins entries have to match one of the name fields of the files in https://github.com/MolSSI/covid/tree/master/data/proteins. We could engineer additional logic to handle the common and general names, but that might not be a good time sink right now.
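
To illustrate the matching rule, a minimal sketch in Python follows. It assumes each protein file has been loaded into a dict with a single `name` key, which is a simplification of whatever the real files contain; `unknown_proteins` is a hypothetical helper, not part of the site's code. The comparison is an exact, case-sensitive string match.

```python
def unknown_proteins(entry, protein_records):
    """List the proteins in a simulation entry that match no known name.

    Assumption: `protein_records` is a list of dicts, one per file in
    data/proteins, each carrying a "name" key.
    """
    known = {record["name"] for record in protein_records}
    # Exact match only: "helicase" does not match "Helicase".
    return [p for p in entry.get("proteins", []) if p not in known]


# Illustrative records only; the real directory holds more entries.
records = [{"name": "Helicase"}, {"name": "PLpro"}, {"name": "RdRP"}]
print(unknown_proteins({"proteins": ["helicase", "PLpro"]}, records))
```

Any name returned by this check would need to be renamed to the common name used in the protein files before the entry links to a target.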

> Also, should I specify the subdomains?

I had been working on logic for directly specifying subdomains and structures, but was never able to finish it, as most of the entries (so far) would have only applied to the spike and its various components. Instead, we opted to keep it simpler and let the description fill in the details.

If you think that we should add more details, we're happy to consider changes! However, that might be best left to a separate issue/PR to not hold this one up.

I think all that is left are the proper names (mostly common names instead of NSP names), and then filling in some string (e.g. "---," "O(100's GB)," etc) for the size and this should be good.

@Lnaden
Collaborator

Lnaden commented Oct 12, 2020

Last few things:

NSP3 -> PLpro
helicase -> Helicase (names are case-sensitive)
NSP12 -> RdRP

Sorry about the particulars with the schema; it's the only way to make sure all of the entries are linked together correctly. It's one of the limitations of the static webpage design.
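
The renames above amount to a small lookup table. The sketch below shows what alias handling might look like if it were ever added; it is hypothetical and not part of the current schema checks. `ALIASES` and `canonical_name` are illustrative names, and NSP3 is left out on purpose because it covers more than one domain.

```python
# Hypothetical alias table: lower-cased input name -> name used in the
# protein files. Only mappings mentioned in this thread are included.
ALIASES = {
    "nsp12": "RdRP",
    "nsp13": "Helicase",
    "helicase": "Helicase",
}


def canonical_name(name):
    """Map a generic or mis-cased name to its common name.

    Names without a known alias are returned unchanged, so the schema's
    own exact-match check would still catch them.
    """
    return ALIASES.get(name.lower(), name)


for raw in ["NSP12", "helicase", "PLpro"]:
    print(raw, "->", canonical_name(raw))
```

A table like this keeps the exact-match rule intact while letting contributors write either the generic or the common name.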

@mizimmer90
Contributor Author

Thanks! I made the change for NSP13 and will update NSP12. For NSP3, it's the macrodomain, not PL2pro. I don't see a specification for this domain of NSP3. Do we need to add it?

@Lnaden
Collaborator

Lnaden commented Oct 12, 2020

If we need a new domain, go for it! Add a YAML file to the protein directory and fill in the entries. You should be able to just copy the PLpro file and replace the entries. Once you do that, I'll come back in real quick and edit the schema file to make sure that all works.

@Lnaden
Collaborator

Lnaden commented Oct 12, 2020

I think this is ready to go from my end. @mizimmer90 Anything else you want to add?

@mizimmer90
Contributor Author

@Lnaden Thanks! I think this is a good addition for now. I will be adding more later in the week!

@Lnaden
Collaborator

Lnaden commented Oct 12, 2020

I'll keep an eye out for them.

@jchodera I think this is the last of the F@H PRs which have been opened recently, if that was blocking anything on your end.

@Lnaden merged commit befc935 into MolSSI:master on Oct 12, 2020

@jchodera
Contributor

> @jchodera I think this is the last of the F@H PR's which have been opened recently if that was blocking anything on your end.

Awesome! I'll add these to the AWS Public Dataset page!

Thanks!

@jchodera
Contributor

@Lnaden : These are all rendering as being the protein 3CLPro, even though they are other proteins like nsp5. Any idea what we're missing?

See, for example:
https://covid.molssi.org//simulations/#foldinghome-simulations-of-nsp5-3clpro-3clpro-protease-activity


@Lnaden
Collaborator

Lnaden commented Oct 13, 2020

3CLpro is nsp5 as I understand the biology. Also called the main protease or sometimes "mpro." This is an instance of one protein having different designations, where we refer to it by its common name.

@jchodera
Contributor

Whoops, you're right!
@mizimmer90 : Did you mean to use the nsp5 terminology, or 3CLpro or Mpro?

@Lnaden
Collaborator

Lnaden commented Oct 13, 2020

Neither is wrong, technically; we even refer to it as "SARS-CoV-2 main protease (3CLpro or NSP5)" in the titles.

@mizimmer90
Contributor Author

@jchodera I am neutral on the name and, as @Lnaden pointed out, either can be used when referencing the protein. There is only one entry for this protein to reference, and it uses the 3CLpro convention.
