
Contributor

@dessygil dessygil commented Apr 22, 2023

Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature and let the community know whether you are planning or have started working on it before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added or an existing one is deleted.
  • Added a news entry.
    • Copy news/TEMPLATE.rst to news/my-feature-or-branch.rst and edit it.

I have spoken to Manu about the different plugins I would be able to add, and I chose to start with MolT5. The PR wasn't discussed in an issue, but it was discussed in the discussion tab and on Slack. The rest seems okay to check off since I'm just preregistering what I want to add.

name: MolT5
inputs: smiles
type: pretrained
group: huggingface
version: 0
submitter: Desmond Gilmour
description: "MolT5 (specifically Smiles2Caption) is a pretrained model based on the Colossal Clean Crawled Corpus for textual descriptions and ZINC-15 for SMILES strings, fine-tuned using ChEBI-20."
representation: text
require_3D: false
tags:
    - smiles
    - huggingface
    - transformers
    - text2text
authors:
    - Tuan Manh Lai
    - Carl Edwards
    - Kevin Ros
    - Garrett Honke
    - Kyunghyun Cho
    - Heng Ji
reference:
    - https://arxiv.org/pdf/2204.11817.pdf

@dessygil dessygil requested a review from maclandrol as a code owner April 22, 2023 21:26
@maclandrol
Member

Thanks @dessygil, I will let this sit for a while and merge when the model card is ready.

@maclandrol maclandrol changed the title Preregistering the plugin I would like to add Adding MolT5 to molfeat Apr 27, 2023
@maclandrol maclandrol requested a review from cwognum April 27, 2023 18:44
env.yml Outdated
- pytorch >=1.10.2
- scikit-learn
- fcd_torch
- sentencepiece
Member

If possible, can you check whether earlier versions of sentencepiece have hidden dependencies that could clash?

Contributor Author

I checked the documentation at https://github.com/google/sentencepiece and didn't find anything that said there were any hidden dependencies. Is there anywhere else I can look to double-check, or is this even the right place to look? I've never considered doing this in any of my previous projects, so thank you for suggesting it.

Contributor

I am not familiar with sentencepiece, so maybe these are naive remarks, but:

Does it make sense to add this dependency if it's only being used by a single model? Is the dependency "stand-alone" (no subdependencies) and thus easy to install? Or is it popular enough that we expect other models to use it too?

Member

We use conda-forge, so having a quick look at the recipe for sentencepiece is enough. Here: https://github.com/conda-forge/sentencepiece-feedstock/blob/main/recipe/meta.yaml

It's always worth checking how any new dependency works or conflicts with existing ones. For example, a new dependency might require a version of a package that is incompatible with an existing dependency.

I don't see anything that would be a problem in this case.
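As a quick complement to checking the conda-forge recipe, the pip-side metadata of an installed distribution can also be inspected with the standard library. A minimal sketch (the function name is illustrative, not from the codebase):

```python
from importlib import metadata


def declared_dependencies(dist_name):
    """Return the dependencies a pip-installed distribution declares,
    or None if the distribution is not installed."""
    try:
        # `requires` reads the Requires-Dist entries from the installed metadata.
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # For sentencepiece this would print its declared pip dependencies,
    # assuming the package is installed in the current environment.
    print(declared_dependencies("sentencepiece"))
```

This only covers the pip wheel's declared requirements; native build-time dependencies still need the conda-forge recipe check described above.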

Member

I posted my comment before seeing yours @cwognum.

The way I would approach it is to add the optional dependency in pyproject.toml for pip and in the env file here, then document the requirement to make the model work.

In the conda recipe we don't need to add that dependency, as users will install that themselves.
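A sketch of what such an optional pip dependency could look like in pyproject.toml (the extras name `molt5` is hypothetical; only `sentencepiece` comes from this PR's env.yml):

```toml
# Hypothetical extras group; the name "molt5" is illustrative only.
[project.optional-dependencies]
molt5 = ["sentencepiece"]
```

Users would then opt in with `pip install "molfeat[molt5]"`, and the docs can note that this extra is required for the MolT5 featurizer.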

Member

@maclandrol maclandrol left a comment

Thanks @dessygil , please see my comments

modelCard.yml Outdated
@@ -0,0 +1,20 @@
- name: laituan245/molt5-large-smiles2caption
Member

Please provide this information in the body of your PR next time, not as a file.

Member

Reference is supposed to be a single string, so please provide a link to the paper or a DOI directly; if there isn't one, then the GitHub repo or any other link would be fine.
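As a single string, the field might read (sketch, using the arXiv link for the paper):

```yaml
reference: https://arxiv.org/abs/2204.11817
```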

Contributor Author

please provide this information in the body of your PR, not as a file next time.

I can delete this file and then edit the PR. Would this be better?

Member

Yes, thanks @dessygil !

modelCard.yml Outdated
- huggingface
- transformers
- text2text
authors:
Member

Please put all the authors of the original paper: https://arxiv.org/abs/2204.11817

Args:
inputs: smiles or seqs
use_encoder: If the model requires encoder featurization
Member

remove this, and put that argument in the init.

attention_mask = None
with torch.no_grad():
out_dict = self.featurizer.model(output_hidden_states=True, **inputs)
if hasattr(self.featurizer.model, 'encoder'):
Member

Use use_encoder to check whether the encoder should be used when the model is a huggingface EncoderDecoder model. Otherwise, default to what was done before.

Contributor Author

I forgot to delete use_encoder, but I figured hasattr() might be better because it avoids adding another init parameter. Is using the init better in this scenario?

Member

It's fine the way you proposed if you delete use_encoder; I wanted to make it more general to support most cases of EncoderDecoder models.
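The hasattr-based dispatch being discussed can be illustrated with plain-Python stand-ins (the class and function names here are hypothetical; the real code operates on HuggingFace/torch models under `torch.no_grad()`):

```python
# Stand-ins for HuggingFace models; names are illustrative only.
class _Encoder:
    def __call__(self, **inputs):
        return {"hidden_states": "encoder-only states"}


class EncoderDecoderStandIn:
    """Mimics a model that exposes a separate `.encoder` submodule."""
    def __init__(self):
        self.encoder = _Encoder()


class PlainModelStandIn:
    """Mimics a model with no separate encoder."""
    def __call__(self, **inputs):
        return {"hidden_states": "full-model states"}


def extract_states(model, **inputs):
    # Dispatch on the presence of an `encoder` attribute, as in the snippet
    # under review: EncoderDecoder models are featurized from encoder states
    # only; everything else falls back to the previous behaviour.
    if hasattr(model, "encoder"):
        return model.encoder(**inputs)
    return model(**inputs)
```

The trade-off discussed above: this needs no extra init parameter, at the cost of assuming every model with an `encoder` attribute should be featurized through it.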

plugin.yaml Outdated
entry_point_prefix: new
home_url: ~
molfeat_version: ~
molfeat-MolT5:
Member

This is a direct PR in molfeat, so no need to fill this since it's not a plugin.

Contributor Author

I deleted this file.

@maclandrol maclandrol closed this May 3, 2023
@maclandrol maclandrol mentioned this pull request May 3, 2023
@maclandrol
Member

@dessygil Thanks again. I made some additional changes, so I am opening a new PR where your work was merged in. See #47
