Foldingathome #163
Conversation
Many of these datasets are very large, and calculating a size/trajectory length is an arduous task. Is there a way I can leave this blank or put a temporary placeholder while I query all the sizes and total them up? |
Sure. Simulation length is optional (can be blank) and the size is just a string, so if you give an order-of-magnitude guess (e.g. "100's of GB") or just put something like "---", it should accept it. I think you have some other schema problems, but add a placeholder for now and we can sift through the rest of the log after. |
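As a rough illustration of the rule described in the comment above (length optional, size a free-form string), here is a minimal sketch. The field names `size` and `length` and the helper `check_entry` are hypothetical and are not taken from the repository's actual schema; treating a given length as a string is also an assumption.

```python
from typing import Any, Dict, List


def check_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems for a hypothetical dataset entry.

    Assumed rules (not the real schema): `size` is a required,
    non-empty string; `length` may be missing, None, or any string.
    """
    problems: List[str] = []

    # Size is free-form text, so a placeholder such as "---" is accepted.
    size = entry.get("size")
    if not isinstance(size, str) or not size.strip():
        problems.append("size must be a non-empty string")

    # Length is optional: absent, None, or blank are all fine.
    length = entry.get("length")
    if length is not None and not isinstance(length, str):
        problems.append("length, when given, must be a string")

    return problems
```

Under these assumptions, an entry such as `{"size": "---"}` passes, while an entry with no size at all is reported.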
2 changes and then schema should validate:
Otherwise, I think it'll work after that! |
@mizimmer90 : According to @Lnaden, simulations require a I know it's more work to define the corresponding simulation models you started from (unless they already have been entered), but it would be awesome if we could correctly link these simulations up to the right targets this way! |
I actually merged a PR recently which works out the |
@Lnaden: Awesome! @mizimmer90: Can you fix the remaining issues so the schema validates (rebasing or merging from |
@Lnaden Thanks! Is there a preferred name for some of the proteins over others, e.g. NSP13 vs. helicase? Also, should I specify the subdomains, e.g. NSP3 simulations of PL2pro and the macrodomain? I see an entry for PL2pro but not the macrodomain. |
For the ones with common names, we tended to fall back to the common name over the generic, e.g. helicase over NSP13. In short, the
I had been working on logic for directly specifying subdomains and structures, but was never able to finish it, as most of the entries (so far) would have only applied to the If you think that we should add more details, we're happy to consider changes! However, that might be best left to a separate issue/PR to not hold this one up. I think all that is left are the proper names (mostly common names instead of NSP names), and then filling in some string (e.g. "---," "O(100's GB)," etc.) for the |
Last few things:
Sorry about the particulars with the schema; it's the only way to make sure all of the entries are linked together correctly. One of the limitations of the static webpage design. |
Thanks! I made the change for NSP13 and will update NSP12. For NSP3, it's the macrodomain, not PL2pro. I don't see a specification for this domain of NSP3. Do we need to add it? |
If we need a new domain, go for it! Add a YAML file to the |
I think this is ready to go from my end. @mizimmer90 Anything else you want to add? |
@Lnaden Thanks! I think this is a good addition for now. I will be adding more later in the week! |
I'll keep an eye out for them. @jchodera I think this is the last of the F@H PRs which have been opened recently, if that was blocking anything on your end. |
Awesome! I'll add these to the AWS Public Dataset page! Thanks! |
@Lnaden : These are all rendering as being the protein See, for example: |
3CLpro is nsp5 as I understand the biology. Also called the main protease or sometimes "mpro." This is an instance of one protein having different designations, where we refer to it by its common name. |
Whoops, you're right! |
Neither is wrong technically; we even refer to it as "SARS-CoV-2 main protease (3CLpro or NSP5)" in the titles. |
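The naming convention discussed in this thread (prefer the common name over the generic NSP designation) could be sketched as a small alias lookup. This is only an illustration: the `ALIASES` table and `canonical_name` helper are hypothetical, and only the equivalences stated in the conversation (NSP13 is the helicase; NSP5 is 3CLpro, also called the main protease or "mpro") are encoded.

```python
# Hypothetical alias table built only from names mentioned in the thread.
# Keys are lowercased designations; values are the common names.
ALIASES = {
    "nsp5": "3CLpro",
    "3clpro": "3CLpro",
    "main protease": "3CLpro",
    "mpro": "3CLpro",
    "nsp13": "helicase",
    "helicase": "helicase",
}


def canonical_name(name: str) -> str:
    """Map a protein designation to its common name, if one is known.

    Unknown names are returned unchanged (apart from trimming whitespace).
    """
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)
```

With this table, `canonical_name("NSP5")` and `canonical_name("mpro")` both resolve to the same label, which is the behavior that the rendering question above turned on.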
Description
Added documentation for NSP3 (pl2pro and macrodomain), NSP5 (monomer and dimer), NSP7, NSP8, NSP9, and NSP10
Status