
Contributor

@dessygil dessygil commented Apr 22, 2023

Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature and let the community know whether you are planning or have started working on it before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added or an existing one is deleted.
  • Added a news entry.
    • Copy news/TEMPLATE.rst to news/my-feature-or-branch.rst and edit it.

I have spoken to Manu about the different plugins I would be able to add, and I chose to start with MolT5. The PR wasn't discussed in an issue, but it was discussed in the discussion tab and on Slack. The rest seems okay to check off since I'm just preregistering what I want to add.

name: MolT5
inputs: smiles
type: pretrained
group: huggingface
version: 0
submitter: Desmond Gilmour
description: "MolT5 (specifically Smiles2Caption) is a pretrained model based on the Colossal Clean Crawled Corpus for textual descriptions and ZINC-15 for SMILES strings, fine-tuned using ChEBI-20."
representation: text
require_3D: false
tags:
    - smiles
    - huggingface
    - transformers
    - text2text
authors:
    - Tuan Manh Lai
    - Carl Edwards
    - Kevin Ros
    - Garrett Honke
    - Kyunghyun Cho
    - Heng Ji
reference:
    - https://arxiv.org/pdf/2204.11817.pdf

@dessygil dessygil requested a review from maclandrol as a code owner April 22, 2023 21:26
@maclandrol
Member

Thanks @dessygil, I will let this sit for a while and merge when the model card is ready.

@maclandrol maclandrol changed the title Preregistering the plugin I would like to add Adding MolT5 to molfeat Apr 27, 2023
@maclandrol maclandrol requested a review from cwognum April 27, 2023 18:44
env.yml Outdated
- pytorch >=1.10.2
- scikit-learn
- fcd_torch
- sentencepiece
Member

If possible, can you check whether earlier versions of sentencepiece have hidden dependencies that could clash?

Contributor Author

I checked the documentation at https://github.com/google/sentencepiece and didn't find anything that said there were any hidden dependencies. Is there anywhere else I can look to double-check, or is this even the right place to look? I've never considered doing this in any of my previous projects, so thank you for suggesting it.

Contributor

I am not familiar with sentencepiece, so maybe these are naive remarks, but:

Does it make sense to add this dependency if it's only being used by a single model? Is the dependency "stand-alone" (no subdependencies) and thus easy to install? Or is it popular enough that we expect other models to use it too?

Member

We use conda-forge, so having a quick look at the recipe for sentencepiece is enough. Here: https://github.com/conda-forge/sentencepiece-feedstock/blob/main/recipe/meta.yaml

It's always worth checking how any new dependency works or conflicts with existing ones. For example, a new dependency might require a version of a package that is incompatible with an existing dependency.

I don't see anything that would be a problem in this case.
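As a quick complement to checking the conda-forge recipe, the pip-side metadata of an installed distribution can also be inspected with the standard library. A minimal sketch (the function name is illustrative, not from the codebase):

```python
from importlib import metadata


def declared_dependencies(dist_name):
    """Return the dependencies a pip-installed distribution declares,
    or None if the distribution is not installed."""
    try:
        # `requires` reads the Requires-Dist entries from the installed metadata.
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # For sentencepiece this would print its declared pip dependencies,
    # assuming the package is installed in the current environment.
    print(declared_dependencies("sentencepiece"))
```

This only covers the pip wheel's declared requirements; native build-time dependencies still need the conda-forge recipe check described above.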

Member

I posted my comment before seeing yours @cwognum.

The way I would approach it is to add the optional dependency in pyproject.toml for pip and in the env file here, then document the requirement to make the model work.

In the conda recipe we don't need to add that dependency, as users will install that themselves.
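A sketch of what such an optional pip dependency could look like in pyproject.toml (the extras name `molt5` is hypothetical; only `sentencepiece` comes from this PR's env.yml):

```toml
# Hypothetical extras group; the name "molt5" is illustrative only.
[project.optional-dependencies]
molt5 = ["sentencepiece"]
```

Users would then opt in with `pip install "molfeat[molt5]"`, and the docs can note that this extra is required for the MolT5 featurizer.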

Member

@maclandrol maclandrol left a comment

Thanks @dessygil , please see my comments

modelCard.yml Outdated
@@ -0,0 +1,20 @@
- name: laituan245/molt5-large-smiles2caption
Member

Please provide this information in the body of your PR next time, not as a file.

Member

Reference is supposed to be a single string, so please provide a link to the paper or a DOI directly; if there isn't one, then the GitHub repo or any other link would be fine.
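As a single string, the field might read (sketch, using the arXiv link for the paper):

```yaml
reference: https://arxiv.org/abs/2204.11817
```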

Contributor Author

please provide this information in the body of your PR, not as a file next time.

I can delete this file and then edit the PR. Would this be better?

Member

Yes, thanks @dessygil !

modelCard.yml Outdated
- huggingface
- transformers
- text2text
authors:
Member

Please put all the authors of the original paper: https://arxiv.org/abs/2204.11817

Args:
inputs: smiles or seqs
use_encoder: If the model requires encoder featurization
Member

remove this, and put that argument in the init.

attention_mask = None
with torch.no_grad():
out_dict = self.featurizer.model(output_hidden_states=True, **inputs)
if hasattr(self.featurizer.model, 'encoder'):
Member

Use use_encoder to check whether the encoder should be used when the model is a huggingface EncoderDecoder model. Otherwise, default to what was done before.

Contributor Author

I forgot to delete use_encoder, but I figured hasattr() might be better because it avoids adding another init parameter. Is using the init better in this scenario?

Member

It's fine the way you proposed if you delete use_encoder; I wanted to make it more general to support most cases of EncoderDecoder models.
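The hasattr-based dispatch being discussed can be illustrated with plain-Python stand-ins (the class and function names here are hypothetical; the real code operates on HuggingFace/torch models under `torch.no_grad()`):

```python
# Stand-ins for HuggingFace models; names are illustrative only.
class _Encoder:
    def __call__(self, **inputs):
        return {"hidden_states": "encoder-only states"}


class EncoderDecoderStandIn:
    """Mimics a model that exposes a separate `.encoder` submodule."""
    def __init__(self):
        self.encoder = _Encoder()


class PlainModelStandIn:
    """Mimics a model with no separate encoder."""
    def __call__(self, **inputs):
        return {"hidden_states": "full-model states"}


def extract_states(model, **inputs):
    # Dispatch on the presence of an `encoder` attribute, as in the snippet
    # under review: EncoderDecoder models are featurized from encoder states
    # only; everything else falls back to the previous behaviour.
    if hasattr(model, "encoder"):
        return model.encoder(**inputs)
    return model(**inputs)
```

The trade-off discussed above: this needs no extra init parameter, at the cost of assuming every model with an `encoder` attribute should be featurized through it.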

plugin.yaml Outdated
entry_point_prefix: new
home_url: ~
molfeat_version: ~
molfeat-MolT5:
Member

This is a direct PR in molfeat, so no need to fill this since it's not a plugin.

Contributor Author

I deleted this file.

@maclandrol maclandrol closed this May 3, 2023
@maclandrol maclandrol mentioned this pull request May 3, 2023
@maclandrol
Member

@dessygil Thanks again. I made some additional changes, so I am opening a new PR where your work was merged in. See #47
