tag:github.com,2008:https://github.com/DeepGraphLearning/torchdrug/releases Release notes from torchdrug 2023-07-16T20:53:17Z tag:github.com,2008:Repository/394517440/v0.2.1 2023-07-16T22:37:17Z 0.2.1 Release <p>The new 0.2.1 release supports PyTorch 2.0 and Python 3.10. We fix quite a few bugs based on the suggestions from the community. Thanks everyone for contributing to this library.</p> <h1>Compatibility Changes</h1> <ul> <li>TorchDrug now supports Python versions from 3.7 to 3.10, and PyTorch versions from 1.8 to 2.0. There is no change in the minimal versions, so you can easily update your previous environment to use TorchDrug 0.2.1.</li> <li>For <code>PropertyPrediction</code> , we change the its <code>predict</code> function to directly output original values rather than standardized values. This is more intuitive for new users. Note this change is not backward compatible. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1295029993" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/109" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/109/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/109">#109</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/kanojikajino/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/kanojikajino">@kanojikajino</a>)</li> </ul> <h1>Improvements</h1> <ul> <li>Add batch normalization and dropout in <code>PropertyPrediction</code></li> <li>Support custom edge feature function in <code>GraphConstruction</code></li> <li>Support ESM-2 models in <code>EvolutionaryScaleModeling</code></li> <li>Add full batch evaluation for <code>KnowledgeGraphCompletion</code></li> <li>Support dict, list and tuple in config dict</li> <li>Add instructions for installation on Apple silicon (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1622930109" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/176" data-hovercard-type="pull_request" data-hovercard-url="/DeepGraphLearning/torchdrug/pull/176/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/pull/176">#176</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/migalkin/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/migalkin">@migalkin</a>)</li> <li>Reduce dependency to <code>matplotlib-base</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1408084970" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/141" data-hovercard-type="pull_request" data-hovercard-url="/DeepGraphLearning/torchdrug/pull/141/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/pull/141">#141</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/jamesmyatt/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/jamesmyatt">@jamesmyatt</a>)</li> </ul> <h1>Bug Fixes</h1> <ul> <li>Fix variable names in <code>NeuralLogicProgramming</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1349264410" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/126" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/126/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/126">#126</a>)</li> <li>Fix interface for the new esm library (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1382500555" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/133" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/133/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/133">#133</a>)</li> <li>Fix the version of <code>AlphaFoldDB</code> to <code>v2</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1390557601" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/137" data-hovercard-type="pull_request" data-hovercard-url="/DeepGraphLearning/torchdrug/pull/137/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/pull/137">#137</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/ShoufaChen/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/ShoufaChen">@ShoufaChen</a>)</li> <li>Fix inconsistent output when using edge features in convolution layers (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1036931298" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/53" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/53/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/53">#53</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1406340339" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/140" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/140/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/140">#140</a>)</li> <li>Handle side cases in property optimization (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1338957661" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/125" data-hovercard-type="pull_request" data-hovercard-url="/DeepGraphLearning/torchdrug/pull/125/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/pull/125">#125</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/jannisborn/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/jannisborn">@jannisborn</a>)</li> <li>Fix a bug when using LR schedulers (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1422319329" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/148" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/148/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/148">#148</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1441738192" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/152" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/152/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/152">#152</a>)</li> <li>Fix a bug in graph construction (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1458966260" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/158" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/158/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/158">#158</a>)</li> <li>Fix a bug in <code>layers.Set2Set</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1667409515" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/185" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/185/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/185">#185</a>)</li> <li>Avoid in-place RDKit operations in <code>data.Molecule.from_molecule</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1408261544" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/142" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/142/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/142">#142</a>)</li> <li>Fix <code>num_class</code> in PropertyPrediction (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1408261544" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/142" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/142/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/142">#142</a>)</li> <li>Fix docker file for new CUDA image (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1762416658" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/207" data-hovercard-type="pull_request" data-hovercard-url="/DeepGraphLearning/torchdrug/pull/207/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/pull/207">#207</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/cscandore/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/cscandore">@cscandore</a>)</li> <li>Fix chain ID in <code>data.Protein.from_molecule</code></li> </ul> <h1>Deprecations</h1> <ul> <li>Deprecate <code>functional._size_to_index</code>. Use <code>torch.repeat_interleave</code> instead.</li> </ul> KiddoZhu tag:github.com,2008:Repository/394517440/v0.2.0 2022-09-19T22:34:00Z 0.2.0 Release <p>V0.2.0 is a major release with a new family member <strong>TorchProtein</strong>, a library for machine-learning-guided protein science. Aiming at simplifying the development of protein methods, TorchProtein encapsulates many complicated yet repetitive subroutines into functional modules, including widely-used datasets, flexible data processing operations, advanced encoding models, and diverse protein tasks.</p> <p>Such comprehensive encapsulation enables users to develop protein machine learning solutions with one easy-to-use library. It avoids the embarrassment of gluing multiple libraries into a pipeline.</p> <p>With TorchProtein, we can rapidly prototype machine learning solutions to various protein applications within 20 lines of codes, and conduct ablation studies by substituting different parts of the solution with off-the-shelf modules. Furthermore, we can easily adapt these modules to our own needs, and make systematic analyses by comparing the new results to a benchmark provided in the library.</p> <p>Additionally, TorchProtein is designed to be accessible to everyone. For inexperienced users, like beginners or biological researchers, TorchProtein provides <a href="https://torchdrug.ai/docs/" rel="nofollow">user-friendly APIs</a> to simplify the development of protein machine learning solutions. Meanwhile, for professional users, TorchProtein also preserves enough flexibility to satisfy their demands, supported by features like modular design of the library and on-the-fly graph construction.</p> <h1>Main Features</h1> <h2>Simplify Data Processing</h2> <ul> <li> <p>It is challenging to transform raw bioinformatic protein datasets into tensor formats for machine learning. To reduce tedious operations, TorchProtein provides us with a data structure <code>data.Protein</code> and its batched extension <code>data.PackedProtein</code> to automate the data processing step.</p> <ul> <li> <p><code>data.Protein</code> and <code>data.PackedProtein</code> automatically gather protein data from various bio-sources and seamlessly switch between data formats like pdb files, RDKit objects and sequences. Please see the section data structures and operations for transforming from and to sequences and RDKit objects.</p> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="# construct a data.Protein instance from a pdb file pdb_file = ... protein = data.Protein.from_pdb(pdb_file, atom_feature=&quot;position&quot;, bond_feature=&quot;length&quot;, residue_feature=&quot;symbol&quot;) print(protein) # write a data.Protein instance back to a pdb file new_pdb_file = ... protein.to_pdb(new_pdb_file)"><pre><span class="pl-c"># construct a data.Protein instance from a pdb file</span> <span class="pl-s1">pdb_file</span> <span class="pl-c1">=</span> ... <span class="pl-s1">protein</span> <span class="pl-c1">=</span> <span class="pl-s1">data</span>.<span class="pl-c1">Protein</span>.<span class="pl-c1">from_pdb</span>(<span class="pl-s1">pdb_file</span>, <span class="pl-s1">atom_feature</span><span class="pl-c1">=</span><span class="pl-s">"position"</span>, <span class="pl-s1">bond_feature</span><span class="pl-c1">=</span><span class="pl-s">"length"</span>, <span class="pl-s1">residue_feature</span><span class="pl-c1">=</span><span class="pl-s">"symbol"</span>) <span class="pl-en">print</span>(<span class="pl-s1">protein</span>) <span class="pl-c"># write a data.Protein instance back to a pdb file</span> <span class="pl-s1">new_pdb_file</span> <span class="pl-c1">=</span> ... <span class="pl-s1">protein</span>.<span class="pl-c1">to_pdb</span>(<span class="pl-s1">new_pdb_file</span>)</pre></div> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="Protein(num_atom=445, num_bond=916, num_residue=57)"><pre>Protein(num_atom=445, num_bond=916, num_residue=57)</pre></div> </li> <li> <p><code>data.Protein</code> and <code>data.PackedProtein</code> automatically pre-process all kinds of features of atoms, bonds and residues, by simply setting up several arguments.</p> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="pdb_file = ... protein = data.Protein.from_pdb(pdb_file, atom_feature=&quot;position&quot;, bond_feature=&quot;length&quot;, residue_feature=&quot;symbol&quot;) # feature print(protein.residue_feature.shape) print(protein.atom_feature.shape) print(protein.bond_feature.shape)"><pre><span class="pl-s1">pdb_file</span> <span class="pl-c1">=</span> ... <span class="pl-s1">protein</span> <span class="pl-c1">=</span> <span class="pl-s1">data</span>.<span class="pl-c1">Protein</span>.<span class="pl-c1">from_pdb</span>(<span class="pl-s1">pdb_file</span>, <span class="pl-s1">atom_feature</span><span class="pl-c1">=</span><span class="pl-s">"position"</span>, <span class="pl-s1">bond_feature</span><span class="pl-c1">=</span><span class="pl-s">"length"</span>, <span class="pl-s1">residue_feature</span><span class="pl-c1">=</span><span class="pl-s">"symbol"</span>) <span class="pl-c"># feature</span> <span class="pl-en">print</span>(<span class="pl-s1">protein</span>.<span class="pl-c1">residue_feature</span>.<span class="pl-c1">shape</span>) <span class="pl-en">print</span>(<span class="pl-s1">protein</span>.<span class="pl-c1">atom_feature</span>.<span class="pl-c1">shape</span>) <span class="pl-en">print</span>(<span class="pl-s1">protein</span>.<span class="pl-c1">bond_feature</span>.<span class="pl-c1">shape</span>)</pre></div> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="torch.Size([57, 21]) torch.Size([445, 3]) torch.Size([916, 1])"><pre>torch.Size([57, 21]) torch.Size([445, 3]) torch.Size([916, 1])</pre></div> </li> <li> <p><code>data.Protein</code> and <code>data.PackedProtein</code> automatically keeps track of numerous attributes associated with atoms, bonds, residues and the whole protein.</p> <ul> <li>For example, reference offers a way to register new attributes as node, edge or graph property, and in this way, the new attributes would automatically go along with the node, edge or graph themself. More in-built attributes are listed in the section data structures and operations.</li> </ul> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="protein = ... with protein.node(): protein.node_id = torch.tensor([i for i in range(0, protein.num_node)]) with protein.edge(): protein.edge_cost = torch.rand(protein.num_edge) with protein.graph(): protein.graph_feature = torch.randn(128)"><pre><span class="pl-s1">protein</span> <span class="pl-c1">=</span> ... <span class="pl-k">with</span> <span class="pl-s1">protein</span>.<span class="pl-c1">node</span>(): <span class="pl-s1">protein</span>.<span class="pl-c1">node_id</span> <span class="pl-c1">=</span> <span class="pl-s1">torch</span>.<span class="pl-c1">tensor</span>([<span class="pl-s1">i</span> <span class="pl-k">for</span> <span class="pl-s1">i</span> <span class="pl-c1">in</span> <span class="pl-en">range</span>(<span class="pl-c1">0</span>, <span class="pl-s1">protein</span>.<span class="pl-c1">num_node</span>)]) <span class="pl-k">with</span> <span class="pl-s1">protein</span>.<span class="pl-c1">edge</span>(): <span class="pl-s1">protein</span>.<span class="pl-c1">edge_cost</span> <span class="pl-c1">=</span> <span class="pl-s1">torch</span>.<span class="pl-c1">rand</span>(<span class="pl-s1">protein</span>.<span class="pl-c1">num_edge</span>) <span class="pl-k">with</span> <span class="pl-s1">protein</span>.<span class="pl-c1">graph</span>(): <span class="pl-s1">protein</span>.<span class="pl-c1">graph_feature</span> <span class="pl-c1">=</span> <span class="pl-s1">torch</span>.<span class="pl-c1">randn</span>(<span class="pl-c1">128</span>)</pre></div> <ul> <li>Even more, reference can be utilized to maintain the correspondence between two well related objects. For example, the mapping <code>atom2residue</code> maintains relationship between atoms and residues, and enables indexing on either of them.</li> </ul> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="protein = ... # create a mask indices for atoms in a glutamine (GLN) is_glutamine = protein.residue_type[protein.atom2residue] == protein.residue2id[&quot;GLN&quot;] mask_indices = is_glutamine.nonzero().squeeze(-1) print(mask_indices) # map the masked atoms back to the glutamine residue residue_type = protein.residue_type[protein.atom2residue[mask_indices]] print([protein.id2residue[r] for r in residue_type.tolist()])"><pre><span class="pl-s1">protein</span> <span class="pl-c1">=</span> ... <span class="pl-c"># create a mask indices for atoms in a glutamine (GLN)</span> <span class="pl-s1">is_glutamine</span> <span class="pl-c1">=</span> <span class="pl-s1">protein</span>.<span class="pl-c1">residue_type</span>[<span class="pl-s1">protein</span>.<span class="pl-c1">atom2residue</span>] <span class="pl-c1">==</span> <span class="pl-s1">protein</span>.<span class="pl-c1">residue2id</span>[<span class="pl-s">"GLN"</span>] <span class="pl-s1">mask_indices</span> <span class="pl-c1">=</span> <span class="pl-s1">is_glutamine</span>.<span class="pl-c1">nonzero</span>().<span class="pl-c1">squeeze</span>(<span class="pl-c1">-</span><span class="pl-c1">1</span>) <span class="pl-en">print</span>(<span class="pl-s1">mask_indices</span>) <span class="pl-c"># map the masked atoms back to the glutamine residue</span> <span class="pl-s1">residue_type</span> <span class="pl-c1">=</span> <span class="pl-s1">protein</span>.<span class="pl-c1">residue_type</span>[<span class="pl-s1">protein</span>.<span class="pl-c1">atom2residue</span>[<span class="pl-s1">mask_indices</span>]] <span class="pl-en">print</span>([<span class="pl-s1">protein</span>.<span class="pl-c1">id2residue</span>[<span class="pl-s1">r</span>] <span class="pl-k">for</span> <span class="pl-s1">r</span> <span class="pl-c1">in</span> <span class="pl-s1">residue_type</span>.<span class="pl-c1">tolist</span>()])</pre></div> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="tensor([ 26, 27, 28, 29, 30, 31, 32, 33, 34, 307, 308, 309, 310, 311, 312, 313, 314, 315, 384, 385, 386, 387, 388, 389, 390, 391, 392]) ['GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN']"><pre>tensor([ 26, 27, 28, 29, 30, 31, 32, 33, 34, 307, 308, 309, 310, 311, 312, 313, 314, 315, 384, 385, 386, 387, 388, 389, 390, 391, 392]) [<span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>GLN<span class="pl-pds">'</span></span>]</pre></div> </li> </ul> </li> <li> <p>It is useful to augment protein data by modifying protein graphs or constructing new ones. With the protein operations and the graph construction layers provided in TorchProtein,</p> <ul> <li> <p>we can easily modify proteins on the fly by batching, slicing sequences, masking out side chains, etc. Please see the tutorials for more details on masking.</p> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="pdb_file = ... protein = data.Protein.from_pdb(pdb_file, atom_feature=&quot;position&quot;, bond_feature=&quot;length&quot;, residue_feature=&quot;symbol&quot;) # batch proteins = data.Protein.pack([protein, protein, protein]) # slice sequences # use indexing to extract consecutive residues of a particular protein two_residues = protein[[0,2]] two_residues.visualize()"><pre><span class="pl-s1">pdb_file</span> <span class="pl-c1">=</span> ... <span class="pl-s1">protein</span> <span class="pl-c1">=</span> <span class="pl-s1">data</span>.<span class="pl-c1">Protein</span>.<span class="pl-c1">from_pdb</span>(<span class="pl-s1">pdb_file</span>, <span class="pl-s1">atom_feature</span><span class="pl-c1">=</span><span class="pl-s">"position"</span>, <span class="pl-s1">bond_feature</span><span class="pl-c1">=</span><span class="pl-s">"length"</span>, <span class="pl-s1">residue_feature</span><span class="pl-c1">=</span><span class="pl-s">"symbol"</span>) <span class="pl-c"># batch</span> <span class="pl-s1">proteins</span> <span class="pl-c1">=</span> <span class="pl-s1">data</span>.<span class="pl-c1">Protein</span>.<span class="pl-c1">pack</span>([<span class="pl-s1">protein</span>, <span class="pl-s1">protein</span>, <span class="pl-s1">protein</span>]) <span class="pl-c"># slice sequences</span> <span class="pl-c"># use indexing to extract consecutive residues of a particular protein</span> <span class="pl-s1">two_residues</span> <span class="pl-c1">=</span> <span class="pl-s1">protein</span>[[<span class="pl-c1">0</span>,<span class="pl-c1">2</span>]] <span class="pl-s1">two_residues</span>.<span class="pl-c1">visualize</span>()</pre></div> <p><a target="_blank" rel="noopener noreferrer" href="/DeepGraphLearning/torchdrug/blob/v0.2.0/asset/graph/residues.png"><img src="/DeepGraphLearning/torchdrug/raw/v0.2.0/asset/graph/residues.png" alt="two residues" style="max-width: 100%;"></a></p> </li> <li> <p>we can construct protein graphs on the fly with GPU acceleration, which offers users flexible choices rather than using fixed pre-processed graphs. Below is an example to build a graph with only alpha carbon atoms, please check tutorials for more cases, such as adding spatial / KNN / sequential edges.</p> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="protein = ... # transfer from CPU to GPU protein = protein.cuda() print(protein) # build a graph with only alpha carbon (CA) atoms node_layers = [geometry.AlphaCarbonNode()] graph_construction_model = layers.GraphConstruction(node_layers=node_layers) original_protein = data.Protein.pack([protein]) CA_protein = graph_construction_model(_protein) print(&quot;Graph before:&quot;, original_protein) print(&quot;Graph after:&quot;, CA_protein)"><pre><span class="pl-s1">protein</span> <span class="pl-c1">=</span> ... <span class="pl-c"># transfer from CPU to GPU</span> <span class="pl-s1">protein</span> <span class="pl-c1">=</span> <span class="pl-s1">protein</span>.<span class="pl-c1">cuda</span>() <span class="pl-en">print</span>(<span class="pl-s1">protein</span>) <span class="pl-c"># build a graph with only alpha carbon (CA) atoms</span> <span class="pl-s1">node_layers</span> <span class="pl-c1">=</span> [<span class="pl-s1">geometry</span>.<span class="pl-c1">AlphaCarbonNode</span>()] <span class="pl-s1">graph_construction_model</span> <span class="pl-c1">=</span> <span class="pl-s1">layers</span>.<span class="pl-c1">GraphConstruction</span>(<span class="pl-s1">node_layers</span><span class="pl-c1">=</span><span class="pl-s1">node_layers</span>) <span class="pl-s1">original_protein</span> <span class="pl-c1">=</span> <span class="pl-s1">data</span>.<span class="pl-c1">Protein</span>.<span class="pl-c1">pack</span>([<span class="pl-s1">protein</span>]) <span class="pl-v">CA_protein</span> <span class="pl-c1">=</span> <span class="pl-en">graph_construction_model</span>(<span class="pl-s1">_protein</span>) <span class="pl-en">print</span>(<span class="pl-s">"Graph before:"</span>, <span class="pl-s1">original_protein</span>) <span class="pl-en">print</span>(<span class="pl-s">"Graph after:"</span>, <span class="pl-v">CA_protein</span>)</pre></div> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="Protein(num_atom=445, num_bond=916, num_residue=57, device='cuda:0') Graph before: PackedProtein(batch_size=1, num_atoms=[2639], num_bonds=[5368], num_residues=[350]) Graph after: PackedProtein(batch_size=1, num_atoms=[350], num_bonds=[0], num_residues=[350])"><pre>Protein(num_atom=445, num_bond=916, num_residue=57, device=<span class="pl-s"><span class="pl-pds">'</span>cuda:0<span class="pl-pds">'</span></span>) Graph before: PackedProtein(batch_size=1, num_atoms=[2639], num_bonds=[5368], num_residues=[350]) Graph after: PackedProtein(batch_size=1, num_atoms=[350], num_bonds=[0], num_residues=[350])</pre></div> </li> </ul> </li> </ul> <h2>Easy to Prototype Solutions</h2> <p>With TorchProtein, common protein tasks can be finished within 20 lines of codes, such as <strong>sequence-based protein property prediction task</strong>. Below is an example and more examples of different popular protein tasks and models can be found in Protein Tasks, Models and Tutorials.</p> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="import torch from torchdrug import datasets, transforms, models, tasks, core truncate_transform = transforms.TruncateProtein(max_length=200, random=False) protein_view_transform = transforms.ProteinView(view=&quot;residue&quot;) transform = transforms.Compose([truncate_transform, protein_view_transform]) dataset = datasets.BetaLactamase(&quot;~/protein-datasets/&quot;, residue_only=True, transform=transform) train_set, valid_set, test_set = dataset.split() model = models.ProteinCNN(input_dim=21, hidden_dims=[1024, 1024], kernel_size=5, padding=2, readout=&quot;max&quot;) task = tasks.PropertyPrediction(model, task=dataset.tasks, criterion=&quot;mse&quot;, metric=(&quot;mae&quot;, &quot;rmse&quot;, &quot;spearmanr&quot;), normalization=False, num_mlp_layer=2) optimizer = torch.optim.Adam(task.parameters(), lr=1e-4) solver = core.Engine(task, train_set, valid_set, test_set, optimizer, gpus=[0], batch_size=64) solver.train(num_epoch=10) solver.evaluate(&quot;valid&quot;)"><pre><span class="pl-k">import</span> <span class="pl-s1">torch</span> <span class="pl-k">from</span> <span class="pl-s1">torchdrug</span> <span class="pl-k">import</span> <span class="pl-s1">datasets</span>, <span class="pl-s1">transforms</span>, <span class="pl-s1">models</span>, <span class="pl-s1">tasks</span>, <span class="pl-s1">core</span> <span class="pl-s1">truncate_transform</span> <span class="pl-c1">=</span> <span class="pl-s1">transforms</span>.<span class="pl-c1">TruncateProtein</span>(<span class="pl-s1">max_length</span><span class="pl-c1">=</span><span class="pl-c1">200</span>, <span class="pl-s1">random</span><span class="pl-c1">=</span><span class="pl-c1">False</span>) <span class="pl-s1">protein_view_transform</span> <span class="pl-c1">=</span> <span class="pl-s1">transforms</span>.<span class="pl-c1">ProteinView</span>(<span class="pl-s1">view</span><span class="pl-c1">=</span><span class="pl-s">"residue"</span>) <span class="pl-s1">transform</span> <span class="pl-c1">=</span> <span class="pl-s1">transforms</span>.<span class="pl-c1">Compose</span>([<span class="pl-s1">truncate_transform</span>, <span class="pl-s1">protein_view_transform</span>]) <span class="pl-s1">dataset</span> <span class="pl-c1">=</span> <span class="pl-s1">datasets</span>.<span class="pl-c1">BetaLactamase</span>(<span class="pl-s">"~/protein-datasets/"</span>, <span class="pl-s1">residue_only</span><span class="pl-c1">=</span><span class="pl-c1">True</span>, <span class="pl-s1">transform</span><span class="pl-c1">=</span><span class="pl-s1">transform</span>) <span class="pl-s1">train_set</span>, <span class="pl-s1">valid_set</span>, <span class="pl-s1">test_set</span> <span class="pl-c1">=</span> <span class="pl-s1">dataset</span>.<span class="pl-c1">split</span>() <span class="pl-s1">model</span> <span class="pl-c1">=</span> <span class="pl-s1">models</span>.<span class="pl-c1">ProteinCNN</span>(<span class="pl-s1">input_dim</span><span class="pl-c1">=</span><span class="pl-c1">21</span>, <span class="pl-s1">hidden_dims</span><span class="pl-c1">=</span>[<span class="pl-c1">1024</span>, <span class="pl-c1">1024</span>], <span class="pl-s1">kernel_size</span><span class="pl-c1">=</span><span class="pl-c1">5</span>, <span class="pl-s1">padding</span><span class="pl-c1">=</span><span class="pl-c1">2</span>, <span class="pl-s1">readout</span><span class="pl-c1">=</span><span class="pl-s">"max"</span>) <span class="pl-s1">task</span> <span class="pl-c1">=</span> <span class="pl-s1">tasks</span>.<span class="pl-c1">PropertyPrediction</span>(<span class="pl-s1">model</span>, <span class="pl-s1">task</span><span class="pl-c1">=</span><span class="pl-s1">dataset</span>.<span class="pl-c1">tasks</span>, <span class="pl-s1">criterion</span><span class="pl-c1">=</span><span class="pl-s">"mse"</span>, <span class="pl-s1">metric</span><span class="pl-c1">=</span>(<span class="pl-s">"mae"</span>, <span class="pl-s">"rmse"</span>, <span class="pl-s">"spearmanr"</span>), <span class="pl-s1">normalization</span><span class="pl-c1">=</span><span class="pl-c1">False</span>, <span class="pl-s1">num_mlp_layer</span><span class="pl-c1">=</span><span class="pl-c1">2</span>) <span class="pl-s1">optimizer</span> <span class="pl-c1">=</span> <span class="pl-s1">torch</span>.<span class="pl-c1">optim</span>.<span class="pl-c1">Adam</span>(<span class="pl-s1">task</span>.<span class="pl-c1">parameters</span>(), <span class="pl-s1">lr</span><span class="pl-c1">=</span><span class="pl-c1">1e-4</span>) <span class="pl-s1">solver</span> <span class="pl-c1">=</span> <span class="pl-s1">core</span>.<span class="pl-c1">Engine</span>(<span class="pl-s1">task</span>, <span class="pl-s1">train_set</span>, <span class="pl-s1">valid_set</span>, <span class="pl-s1">test_set</span>, <span class="pl-s1">optimizer</span>, <span class="pl-s1">gpus</span><span class="pl-c1">=</span>[<span class="pl-c1">0</span>], <span class="pl-s1">batch_size</span><span class="pl-c1">=</span><span class="pl-c1">64</span>) <span class="pl-s1">solver</span>.<span class="pl-c1">train</span>(<span class="pl-s1">num_epoch</span><span class="pl-c1">=</span><span class="pl-c1">10</span>) <span class="pl-s1">solver</span>.<span class="pl-c1">evaluate</span>(<span class="pl-s">"valid"</span>)</pre></div> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="mean absolute error [scaled_effect1]: 0.249482 root mean squared error [scaled_effect1]: 0.304326 spearmanr [scaled_effect1]: 0.44572"><pre>mean absolute error [scaled_effect1]: 0.249482 root mean squared error [scaled_effect1]: 0.304326 spearmanr [scaled_effect1]: 0.44572</pre></div> <h2>Compatible with Existing Molecular Models in TorchDrug</h2> <ul> <li> <p>TorchProtein follows the scientific fact that proteins are macromolecules. The core data structures <code>data.Protein</code> and <code>data.PackedProtein</code> inherit from <code>data.Molecule</code> and <code>data.PackedMolecule</code> respectively. Therefore, we can apply any existing molecule model in TorchDrug to proteins</p> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="import torch from torchdrug import layers, datasets, transforms, models, tasks, core from torchdrug.layers import geometry truncate_transform = transforms.TruncateProtein(max_length=200, random=False) protein_view_transform = transforms.ProteinView(view=&quot;residue&quot;) transform = transforms.Compose([truncate_transform, protein_view_transform]) dataset = datasets.EnzymeCommission(&quot;~/protein-datasets/&quot;, transform=transform) train_set, valid_set, test_set = dataset.split() model = models.GIN(input_dim=21, hidden_dims=[256, 256, 256, 256], batch_norm=True, short_cut=True, concat_hidden=True) graph_construction_model = layers.GraphConstruction( node_layers=[geometry.AlphaCarbonNode()], edge_layers=[geometry.SpatialEdge(radius=10.0, min_distance=5), geometry.KNNEdge(k=10, min_distance=5), geometry.SequentialEdge(max_distance=2)], edge_feature=&quot;residue_type&quot; ) task = tasks.MultipleBinaryClassification(model, graph_construction_model=graph_construction_model, num_mlp_layer=3, task=list(range(len(dataset.tasks))), criterion=&quot;bce&quot;, metric=(&quot;auprc@micro&quot;, &quot;f1_max&quot;)) optimizer = torch.optim.Adam(task.parameters(), lr=1e-4) solver = core.Engine(task, train_set, valid_set, test_set, optimizer, gpus=[0], batch_size=4) solver.train(num_epoch=10) solver.evaluate(&quot;valid&quot;)"><pre><span class="pl-k">import</span> <span class="pl-s1">torch</span> <span class="pl-k">from</span> <span class="pl-s1">torchdrug</span> <span class="pl-k">import</span> <span class="pl-s1">layers</span>, <span class="pl-s1">datasets</span>, <span class="pl-s1">transforms</span>, <span class="pl-s1">models</span>, <span class="pl-s1">tasks</span>, <span class="pl-s1">core</span> <span class="pl-k">from</span> <span class="pl-s1">torchdrug</span>.<span class="pl-s1">layers</span> <span class="pl-k">import</span> <span class="pl-s1">geometry</span> <span class="pl-s1">truncate_transform</span> <span class="pl-c1">=</span> <span class="pl-s1">transforms</span>.<span class="pl-c1">TruncateProtein</span>(<span class="pl-s1">max_length</span><span class="pl-c1">=</span><span class="pl-c1">200</span>, <span class="pl-s1">random</span><span class="pl-c1">=</span><span class="pl-c1">False</span>) <span class="pl-s1">protein_view_transform</span> <span class="pl-c1">=</span> <span class="pl-s1">transforms</span>.<span class="pl-c1">ProteinView</span>(<span class="pl-s1">view</span><span class="pl-c1">=</span><span class="pl-s">"residue"</span>) <span class="pl-s1">transform</span> <span class="pl-c1">=</span> <span class="pl-s1">transforms</span>.<span class="pl-c1">Compose</span>([<span class="pl-s1">truncate_transform</span>, <span class="pl-s1">protein_view_transform</span>]) <span class="pl-s1">dataset</span> <span class="pl-c1">=</span> <span class="pl-s1">datasets</span>.<span class="pl-c1">EnzymeCommission</span>(<span class="pl-s">"~/protein-datasets/"</span>, <span class="pl-s1">transform</span><span class="pl-c1">=</span><span class="pl-s1">transform</span>) <span class="pl-s1">train_set</span>, <span class="pl-s1">valid_set</span>, <span class="pl-s1">test_set</span> <span class="pl-c1">=</span> <span class="pl-s1">dataset</span>.<span class="pl-c1">split</span>() <span class="pl-s1">model</span> <span class="pl-c1">=</span> <span class="pl-s1">models</span>.<span class="pl-c1">GIN</span>(<span class="pl-s1">input_dim</span><span class="pl-c1">=</span><span class="pl-c1">21</span>, <span class="pl-s1">hidden_dims</span><span class="pl-c1">=</span>[<span class="pl-c1">256</span>, <span class="pl-c1">256</span>, <span class="pl-c1">256</span>, <span class="pl-c1">256</span>], <span class="pl-s1">batch_norm</span><span class="pl-c1">=</span><span class="pl-c1">True</span>, <span class="pl-s1">short_cut</span><span class="pl-c1">=</span><span class="pl-c1">True</span>, <span class="pl-s1">concat_hidden</span><span class="pl-c1">=</span><span class="pl-c1">True</span>) <span class="pl-s1">graph_construction_model</span> <span class="pl-c1">=</span> <span class="pl-s1">layers</span>.<span class="pl-c1">GraphConstruction</span>( <span class="pl-s1">node_layers</span><span class="pl-c1">=</span>[<span class="pl-s1">geometry</span>.<span class="pl-c1">AlphaCarbonNode</span>()], <span class="pl-s1">edge_layers</span><span class="pl-c1">=</span>[<span class="pl-s1">geometry</span>.<span class="pl-c1">SpatialEdge</span>(<span class="pl-s1">radius</span><span class="pl-c1">=</span><span class="pl-c1">10.0</span>, <span class="pl-s1">min_distance</span><span class="pl-c1">=</span><span class="pl-c1">5</span>), <span class="pl-s1">geometry</span>.<span class="pl-c1">KNNEdge</span>(<span class="pl-s1">k</span><span class="pl-c1">=</span><span class="pl-c1">10</span>, <span class="pl-s1">min_distance</span><span class="pl-c1">=</span><span class="pl-c1">5</span>), <span class="pl-s1">geometry</span>.<span class="pl-c1">SequentialEdge</span>(<span class="pl-s1">max_distance</span><span class="pl-c1">=</span><span class="pl-c1">2</span>)], <span class="pl-s1">edge_feature</span><span class="pl-c1">=</span><span class="pl-s">"residue_type"</span> ) <span class="pl-s1">task</span> <span class="pl-c1">=</span> <span class="pl-s1">tasks</span>.<span class="pl-c1">MultipleBinaryClassification</span>(<span class="pl-s1">model</span>, <span class="pl-s1">graph_construction_model</span><span class="pl-c1">=</span><span class="pl-s1">graph_construction_model</span>, <span class="pl-s1">num_mlp_layer</span><span class="pl-c1">=</span><span class="pl-c1">3</span>, <span class="pl-s1">task</span><span class="pl-c1">=</span><span class="pl-en">list</span>(<span class="pl-en">range</span>(<span class="pl-en">len</span>(<span class="pl-s1">dataset</span>.<span class="pl-c1">tasks</span>))), <span class="pl-s1">criterion</span><span class="pl-c1">=</span><span class="pl-s">"bce"</span>, <span class="pl-s1">metric</span><span class="pl-c1">=</span>(<span class="pl-s">"auprc@micro"</span>, <span class="pl-s">"f1_max"</span>)) <span class="pl-s1">optimizer</span> <span class="pl-c1">=</span> <span class="pl-s1">torch</span>.<span class="pl-c1">optim</span>.<span class="pl-c1">Adam</span>(<span class="pl-s1">task</span>.<span class="pl-c1">parameters</span>(), <span class="pl-s1">lr</span><span class="pl-c1">=</span><span class="pl-c1">1e-4</span>) <span class="pl-s1">solver</span> <span class="pl-c1">=</span> <span class="pl-s1">core</span>.<span class="pl-c1">Engine</span>(<span class="pl-s1">task</span>, <span class="pl-s1">train_set</span>, <span class="pl-s1">valid_set</span>, <span class="pl-s1">test_set</span>, <span class="pl-s1">optimizer</span>, <span class="pl-s1">gpus</span><span class="pl-c1">=</span>[<span class="pl-c1">0</span>], <span class="pl-s1">batch_size</span><span class="pl-c1">=</span><span class="pl-c1">4</span>) <span class="pl-s1">solver</span>.<span class="pl-c1">train</span>(<span class="pl-s1">num_epoch</span><span class="pl-c1">=</span><span class="pl-c1">10</span>) <span class="pl-s1">solver</span>.<span class="pl-c1">evaluate</span>(<span class="pl-s">"valid"</span>)</pre></div> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="auprc@micro: 0.187884 f1_max: 0.231008"><pre>auprc@micro: 0.187884 f1_max: 0.231008</pre></div> </li> <li> <p>In <strong>Protein-Ligand Interaction (PLI) prediction task</strong>, we can utilize a molecular encoder module to extract the representations of molecules. Please check <a href="https://torchprotein.ai/tutorial_2" rel="nofollow">tutorial 2</a> for more details.</p> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="train_set, valid_set, test_set = ... # protein encoder model = models.ProteinCNN(input_dim=21, hidden_dims=[1024, 1024], kernel_size=5, padding=2, readout=&quot;max&quot;) # molecule encoder model2 = models.GIN(input_dim=66, hidden_dims=[256, 256, 256, 256], batch_norm=True, short_cut=True, concat_hidden=True) task = tasks.InteractionPrediction(model, model2=model2, task=dataset.tasks, criterion=&quot;mse&quot;, metric=(&quot;mae&quot;, &quot;rmse&quot;, &quot;spearmanr&quot;), normalization=False, num_mlp_layer=2) optimizer = torch.optim.Adam(task.parameters(), lr=1e-4) solver = core.Engine(task, train_set, valid_set, test_set, optimizer, gpus=[0], batch_size=16) solver.train(num_epoch=5) solver.evaluate(&quot;valid&quot;)"><pre><span class="pl-s1">train_set</span>, <span class="pl-s1">valid_set</span>, <span class="pl-s1">test_set</span> <span class="pl-c1">=</span> ... <span class="pl-c"># protein encoder</span> <span class="pl-s1">model</span> <span class="pl-c1">=</span> <span class="pl-s1">models</span>.<span class="pl-c1">ProteinCNN</span>(<span class="pl-s1">input_dim</span><span class="pl-c1">=</span><span class="pl-c1">21</span>, <span class="pl-s1">hidden_dims</span><span class="pl-c1">=</span>[<span class="pl-c1">1024</span>, <span class="pl-c1">1024</span>], <span class="pl-s1">kernel_size</span><span class="pl-c1">=</span><span class="pl-c1">5</span>, <span class="pl-s1">padding</span><span class="pl-c1">=</span><span class="pl-c1">2</span>, <span class="pl-s1">readout</span><span class="pl-c1">=</span><span class="pl-s">"max"</span>) <span class="pl-c"># molecule encoder</span> <span class="pl-s1">model2</span> <span class="pl-c1">=</span> <span class="pl-s1">models</span>.<span class="pl-c1">GIN</span>(<span class="pl-s1">input_dim</span><span class="pl-c1">=</span><span class="pl-c1">66</span>, <span class="pl-s1">hidden_dims</span><span class="pl-c1">=</span>[<span class="pl-c1">256</span>, <span class="pl-c1">256</span>, <span class="pl-c1">256</span>, <span class="pl-c1">256</span>], <span class="pl-s1">batch_norm</span><span class="pl-c1">=</span><span class="pl-c1">True</span>, <span class="pl-s1">short_cut</span><span class="pl-c1">=</span><span class="pl-c1">True</span>, <span class="pl-s1">concat_hidden</span><span class="pl-c1">=</span><span class="pl-c1">True</span>) <span class="pl-s1">task</span> <span class="pl-c1">=</span> <span class="pl-s1">tasks</span>.<span class="pl-c1">InteractionPrediction</span>(<span class="pl-s1">model</span>, <span class="pl-s1">model2</span><span class="pl-c1">=</span><span class="pl-s1">model2</span>, <span class="pl-s1">task</span><span class="pl-c1">=</span><span class="pl-s1">dataset</span>.<span class="pl-c1">tasks</span>, <span class="pl-s1">criterion</span><span class="pl-c1">=</span><span class="pl-s">"mse"</span>, <span class="pl-s1">metric</span><span class="pl-c1">=</span>(<span class="pl-s">"mae"</span>, <span class="pl-s">"rmse"</span>, <span class="pl-s">"spearmanr"</span>), <span class="pl-s1">normalization</span><span class="pl-c1">=</span><span class="pl-c1">False</span>, <span class="pl-s1">num_mlp_layer</span><span class="pl-c1">=</span><span class="pl-c1">2</span>) <span class="pl-s1">optimizer</span> <span class="pl-c1">=</span> <span class="pl-s1">torch</span>.<span class="pl-c1">optim</span>.<span class="pl-c1">Adam</span>(<span class="pl-s1">task</span>.<span class="pl-c1">parameters</span>(), <span class="pl-s1">lr</span><span class="pl-c1">=</span><span class="pl-c1">1e-4</span>) <span class="pl-s1">solver</span> <span class="pl-c1">=</span> <span class="pl-s1">core</span>.<span class="pl-c1">Engine</span>(<span class="pl-s1">task</span>, <span class="pl-s1">train_set</span>, <span class="pl-s1">valid_set</span>, <span class="pl-s1">test_set</span>, <span class="pl-s1">optimizer</span>, <span class="pl-s1">gpus</span><span class="pl-c1">=</span>[<span class="pl-c1">0</span>], <span class="pl-s1">batch_size</span><span class="pl-c1">=</span><span class="pl-c1">16</span>) <span class="pl-s1">solver</span>.<span class="pl-c1">train</span>(<span class="pl-s1">num_epoch</span><span class="pl-c1">=</span><span class="pl-c1">5</span>) <span class="pl-s1">solver</span>.<span class="pl-c1">evaluate</span>(<span class="pl-s">"valid"</span>)</pre></div> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="mean absolute error [scaled_effect1]: 0.249482 root mean squared error [scaled_effect1]: 0.304326 spearmanr [scaled_effect1]: 0.44572"><pre>mean absolute error [scaled_effect1]: 0.249482 root mean squared error [scaled_effect1]: 0.304326 spearmanr [scaled_effect1]: 0.44572</pre></div> </li> </ul> <h2>Support From the Developer (<a class="team-mention js-team-mention notranslate" data-error-text="Failed to load team members" data-id="5023292" data-permission-text="Team members are private" data-url="/orgs/DeepGraphLearning/teams/torchdrug-maintainers/members" data-hovercard-type="team" data-hovercard-url="/orgs/DeepGraphLearning/teams/torchdrug-maintainers/hovercard" href="https://github.com/orgs/DeepGraphLearning/teams/torchdrug-maintainers">@DeepGraphLearning/torchdrug-maintainers</a>)</h2> <p>There is always an active supporting team to answer questions and provide helps. Feedbacks of use experience and contributions for development are welcomed.</p> <h1>New Modules</h1> <h2>Data Structures and Operations</h2> <h3><code>data.Protein</code></h3> <ul> <li>Representative attributes: <ul> <li><code>data.Protein.edge_list</code>: list of edges and each edge is represented by a tuple (node_in, node_out, bond_type)</li> <li><code>data.Protein.atom_type</code>: atom types</li> <li><code>data.Protein.bond_type</code>: bond types</li> <li><code>data.Protein.residue_type</code>: residue types</li> <li><code>data.Protein.view</code>: default view for this protein. Can be “atom” or “residue”</li> <li><code>data.Protein.atom_name</code>: atom names in each residue</li> <li><code>data.Protein.atom2residue</code>: atom id to residue id mapping</li> <li><code>data.Protein.is_hetero_atom</code>: hetero atom indicator</li> <li><code>data.Protein.occupancy</code>: protein occupancy</li> <li><code>data.Protein.b_factor</code>: temperature factors</li> <li><code>data.Protein.residue_number</code>: residue numbers</li> <li><code>data.Protein.insertion_code</code>: insertion codes</li> <li><code>data.Protein.chain_id</code>: chain ids</li> </ul> </li> <li>Representative Methods: <ul> <li><code>data.Protein.from_molecule</code>: create a protein from an RDKit object.</li> <li><code>data.Protein.from_sequence</code>: create a protein from a sequence.</li> <li><code>data.Protein.from_sequence_fast</code>: a faster version of creating a protein from a sequence.</li> <li><code>data.Protein.from_pdb</code>: create a protein from a PDB file.</li> <li><code>data.Protein.to_molecule</code>: return an RDKit object of this protein.</li> <li><code>data.Protein.to_sequence</code>: return a sequence of this protein.</li> <li><code>data.Protein.to_pdb</code>: write this protein to a pdb file.</li> <li><code>data.Protein.split</code>: split this protein graph into multiple disconnected protein graphs.</li> <li><code>data.Protein.pack</code>: batch a list of <code>data.Protein</code> into <code>data.PackedProtein</code>.</li> <li><code>data.Protein.repeat</code>: repeat this protein.</li> <li><code>data.Protein.residue2atom</code>: map residue id to atom ids.</li> <li><code>data.Protein.residue_mask</code>: return a masked protein based on the specified residues.</li> <li><code>data.Protein.subresidue</code>: return a subgraph based on the specified residues.</li> <li><code>data.Protein.residue2graph</code>: residue id to protein id mapping.</li> <li><code>data.Protein.node_mask</code>: return a masked protein based on the specified nodes.</li> <li><code>data.Protein.edge_mask</code>: return a masked protein based on the specified edges.</li> <li><code>data.Protein.compact</code>: remove isolated nodes and compact node ids.</li> </ul> </li> </ul> <h3><code>data.PackedProtein</code></h3> <ul> <li>Representative attributes: <ul> <li><code>data.PackedProtein.edge_list</code>: list of edges and each edge is represented by a tuple (node_in, node_out, bond_type)</li> <li><code>data.PackedProtein.atom_type</code>: atom types</li> <li><code>data.PackedProtein.bond_type</code>: bond types</li> <li><code>data.PackedProtein.residue_type</code>: residue types</li> <li><code>data.PackedProtein.view</code>: default view for this protein. Can be “atom” or “residue”</li> <li><code>data.PackedProtein.num_nodes</code>: number of nodes in each protein graph</li> <li><code>data.PackedProtein.num_edges</code>: number of edges in each protein graph</li> <li><code>data.PackedProtein.num_residues</code>: number of residues in each protein graph</li> <li><code>data.PackedProtein.offsets</code>: node id offsets in different proteins</li> </ul> </li> <li>Representative methods: <ul> <li><code>data.PackedProtein.node_mask</code>: return a masked packed protein based on the specified nodes.</li> <li><code>data.PackedProtein.edge_mask</code>: return a masked packed protein based on the specified edges.</li> <li><code>data.PackedProtein.residue_mask</code>: return a masked packed protein based on the specified residues.</li> <li><code>data.PackedProtein.graph_mask</code>: return a masked packed protein based on the specified protein graphs.</li> <li><code>data.PackedProtein.from_molecule</code>: create a protein from a list of RDKit objects.</li> <li><code>data.PackedProtein.from_sequence</code>: create a protein from a list of sequences.</li> <li><code>data.PackedProtein.from_sequence_fast</code>: a faster version of creating a protein from a list of sequences.</li> <li><code>data.PackedProtein.from_pdb</code>: create a protein from a list of PDB files.</li> <li><code>data.PackedProtein.to_molecule</code>: return a list of RDKit objects of this packed protein.</li> <li><code>data.PackedProtein.to_sequence</code>: return a list of sequences of this packed protein.</li> <li><code>data.PackedProtein.to_pdb</code>: write this packed protein to a list of pdb files.</li> <li><code>data.PackedProtein.merge</code>: merge multiple packed proteins into a single packed protein.</li> <li><code>data.PackedProtein.repeat</code>: repeat this packed protein.</li> <li><code>data.PackedProtein.repeat_interleave</code>: repeat this packed protein, behaving similarly to <code>torch.repeat_interleave</code>_.</li> <li><code>data.PackedProtein.residue2graph</code>: residue id to graph id mapping.</li> </ul> </li> </ul> <h2>Models</h2> <ul> <li><code>GearNet</code>: Geometry Aware Relational Graph Neural Network.</li> <li><code>ESM</code>: Evolutionary Scale Modeling (ESM).</li> <li><code>ProteinCNN</code>: protein shallow CNN.</li> <li><code>ProteinResNet</code>: protein ResNet.</li> <li><code>ProteinLSTM</code>: protein LSTM.</li> <li><code>ProteinBERT</code>: protein BERT.</li> <li><code>Statistic</code>: the statistic feature engineering for protein sequence.</li> <li><code>Physicochemical</code>: the physicochemical feature engineering for protein sequence.</li> </ul> <h2>Protein Tasks</h2> <h3>Sequence-based Protein Property Prediction:</h3> <ul> <li><code>tasks.PropertyPrediction</code> predicts some property of each protein, such as Beta-lactamase activity, stability and solubility for proteins.</li> <li><code>tasks.NodePropertyPrediction</code> predicts some property of each residue in proteins, such as the secondary structure (coil, strand or helix) of each residue.</li> <li><code>tasks.ContactPrediction</code> predicts whether any pair of residues contact or not in the folded structure.</li> <li><code>tasks.InteractionPrediction</code> predicts the binding affinity of two interacting proteins or of a protein and a ligand, <em>i.e.</em> performing PPI affinity prediction or PLI affinity prediction.</li> </ul> <h3>Structure-based Protein Property Prediction:</h3> <ul> <li><code>tasks.MultipleBinaryClassification</code> predicts whether a protein owns several specific functions or not with binary labels.</li> </ul> <h3>Pre-trained Protein Structure Representations:</h3> <ul> <li><strong>Self-Supervised Protein Structure Pre-training</strong>: acquires informative protein representations from massive unlabeled protein structures, such as <code>tasks.EdgePrediction</code>, <code>tasks.AttributeMasking</code>, <code>tasks.ContextPrediction</code>, <code>tasks.DistancePrediction</code>, <code>tasks.AnglePrediction</code>, <code>tasks.DihedralPrediction</code> .</li> <li><strong>Fine-tuning on Downstream Task</strong>: fine-tunes the pre-trained protein encoder on downstream tasks, such as any property prediction task mentioned above.</li> </ul> <h2>Protein Datasets</h2> <h3>Protein Property Prediction Datasets</h3> <ul> <li><code>BetaLactamase</code> : protein sequences with activity labels</li> <li><code>Fluorescence</code>: protein sequences with fitness labels</li> <li><code>Stability</code>: protein sequences with stability labels</li> <li><code>Solubility</code>: protein sequences with solubility labels</li> <li><code>BinaryLocalization</code>: protein sequences with membrane-bound or soluble labels</li> <li><code>SubcellularLocalization</code>: protein sequences with natural cell location labels</li> <li><code>EnzymeCommission</code>: protein sequences and 3D structures with EC number labels for catalysis in biochemical reactions</li> <li><code>GeneOntology</code>: protein sequences and 3D structures with GO term labels, including molecular function (MF), biological process (BP) and cellular component (CC)</li> <li><code>AlphaFoldDB</code>: protein sequences and 3D structures predicted by AlphaFold</li> </ul> <h3>Protein Structure Prediction Datasets</h3> <ul> <li><code>Fold</code>: protein sequences and 3D structures with fold labels determined by the global structural topology</li> <li><code>SecondaryStructure</code>: protein sequences and 3D structures with secondary structure labels determined by the local structures</li> <li><code>ProteinNet</code>: protein sequences and 3D structures for the contact prediction task</li> </ul> <h3>Protein-Protein Interaction Prediction Datasets</h3> <ul> <li> <code>HumanPPI</code>: protein sequences with binary interaction labels for human proteins</li> <li> <code>YeastPPI</code>: protein sequences with binary interaction labels for yeast proteins</li> <li> <code>PPIAffinity</code>: protein sequences with binding affinity values measured by <math-renderer class="js-inline-math" style="display: inline-block" data-static-url="https://github.githubassets.com/static" data-run-id="e7723f849fc59607b60fb9077984241c">$p_{K_d}$</math-renderer> </li> </ul> <h3>Protein Ligand Interaction Prediction Datasets</h3> <ul> <li><code>BindingDB</code>: protein sequences and molecule graphs with binding affinity between pairs of protein and ligand</li> <li><code>PDBBind</code>: protein sequences and molecule graphs with binding affinity between pairs of protein and ligand</li> </ul> <h2>Data Transform Modules</h2> <ul> <li><code>TruncateProtein</code>: truncate over long protein sequences into a fixed length</li> <li><code>ProteinView</code>: convert proteins to a specific view</li> </ul> <h2>Graph Construction Layers</h2> <ul> <li><code>SubsequenceNode</code>: take a protein subsequence of a specific length</li> <li><code>SubspaceNode</code>: extract a subgraph by only keeping neighboring nodes in a spatial ball for each centered node</li> <li><code>RandomEdgeMask</code>: mask out some edges randomly from the protein graph</li> </ul> <h1>Tutorials</h1> <p>To help users gain a comprehensive understanding of TorchProtein, we recommend some user-friendly tutorials for its basic usage and examples to various protein-related tasks. These tutorials may also serve as boilerplate codes for users to develop their own applications.</p> <ul> <li><a href="https://torchprotein.ai/tutorial_1" rel="nofollow">Tutorial 1 - Protein Data Structure</a> for basic usage of TorchProtein, like how to represent proteins and what operations are feasible.</li> <li><a href="https://torchprotein.ai/tutorial_2" rel="nofollow">Tutorial 2 - Sequence-based Protein Property Prediction</a> for 5 types of protein sequence understanding tasks provided in TorchProtein.</li> <li><a href="https://torchprotein.ai/tutorial_3" rel="nofollow">Tutorial 3 - Structure-based Protein Property Prediction</a> for property prediction tasks based on protein structure representations.</li> <li><a href="https://torchprotein.ai/tutorial_4" rel="nofollow">Tutorial 4 - Pre-trained Protein Structure Representations</a> for self-supervised pre-training of protein structure encoders and its fine-tuning on downstream tasks.</li> </ul> <h1>Bug Fixes</h1> <ul> <li>Fix an error in the decorator <code>@utils.cached</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1320033980" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/118" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/118/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/118">#118</a>)</li> <li>Fix an index error in <code>data.Graph.split()</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1317770753" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/115" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/115/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/115">#115</a>)</li> <li>Fix setting attribute <code>node_feature</code> , <code>edge_feature</code> and <code>graph_feature</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1317773479" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/116" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/116/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/116">#116</a>)</li> <li>Fix incorrect node feature shape for the synthon dataset <code>USPTO50k</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1317773479" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/116" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/116/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/116">#116</a>)</li> <li>Fix a compatible issue when adding node/edge/graph reference and changing node/edge to atom/bond (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1317773479" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/116" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/116/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/116">#116</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1320011872" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/117" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/117/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/117">#117</a>)</li> </ul> KiddoZhu tag:github.com,2008:Repository/394517440/v0.1.3 2022-06-04T04:39:54Z 0.1.3 Release <p>TorchDrug 0.1.3 release introduces new features like W&amp;B intergration and index reference. It also provides new functions and metrics for common development need. Note 0.1.3 has some compatibility changes and be careful when you update your TorchDrug from an older version.</p> <ul> <li>W&amp;B Integration</li> <li>Index Reference</li> <li>New Functions</li> <li>New Metrics</li> <li>Improvements</li> <li>Bug Fixes</li> <li>Compatibility Changes</li> </ul> <h1>W&amp;B Integration</h1> <p>Tracking experiment progress is one of the most important demand from ML researchers and developers. For TorchDrug users, we provide a native integration of W&amp;B platform. By adding only one argument in <code>core.Engine</code>, TorchDrug will automatically copy every hyperparameter and training log to your W&amp;B database (thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/manangoel99/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/manangoel99">@manangoel99</a>).</p> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="solver = core.Engine(task, train_set, valid_set, test_set, optimizer, logger=&quot;wandb&quot;)"><pre><span class="pl-s1">solver</span> <span class="pl-c1">=</span> <span class="pl-s1">core</span>.<span class="pl-c1">Engine</span>(<span class="pl-s1">task</span>, <span class="pl-s1">train_set</span>, <span class="pl-s1">valid_set</span>, <span class="pl-s1">test_set</span>, <span class="pl-s1">optimizer</span>, <span class="pl-s1">logger</span><span class="pl-c1">=</span><span class="pl-s">"wandb"</span>)</pre></div> <p>Now you can track your training and validation performance in your browser, and compare them across different experiments.</p> <p><a target="_blank" rel="noopener noreferrer" href="/DeepGraphLearning/torchdrug/blob/v0.1.3/asset/model/wandb_demo.png"><img src="/DeepGraphLearning/torchdrug/raw/v0.1.3/asset/model/wandb_demo.png" alt="Wandb demo" style="max-width: 100%;"></a></p> <h1>Index Reference</h1> <p>Maintaining node and edge attributes could be painful when one applies a lot of transformations to a graph. TorchDrug aims to eliminate such tedious steps by registering custom attributes. This update extends the capacity of custom attributes to index reference. That means, we allow attributes to refer to indexes of nodes, edges or graphs, and they will be automatically maintained in any graph operation.</p> <p>To use index reference, simply add a context manager when we define the attributes.</p> <div class="highlight highlight-source-python notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="with graph.edge(), graph.edge_reference(): graph.inv_edge_index = torch.tensor(inv_edge_index)"><pre><span class="pl-k">with</span> <span class="pl-s1">graph</span>.<span class="pl-c1">edge</span>(), <span class="pl-s1">graph</span>.<span class="pl-c1">edge_reference</span>(): <span class="pl-s1">graph</span>.<span class="pl-c1">inv_edge_index</span> <span class="pl-c1">=</span> <span class="pl-s1">torch</span>.<span class="pl-c1">tensor</span>(<span class="pl-s1">inv_edge_index</span>)</pre></div> <p>Foor more details on index reference, please take a look at <a href="https://torchdrug.ai/docs/notes/reference.html" rel="nofollow">our notes</a>. Typical use cases include</p> <ul> <li>A pointer to the inverse edge of each edge.</li> <li>A pointer to the parent node of each node in a tree.</li> <li>A pointer to the incoming tree edge of each node in a DFS.</li> </ul> <p>Let us know if you find more interesting usage of index reference!</p> <h1>New Functions</h1> <p>Message passing over line graphs is getting more and more popular in the recent years. This version provides <code>data.Graph.line_graph</code> to efficiently construct line graphs on GPUs. It supports both a single graph or a batch of graphs.</p> <p>We are constantly focusing on better batching of irregular structures, and the variadic functions in TorchDrug are an efficient way to process batch of variadic-sized tensors without padding. This update introduces 3 new variadic functions.</p> <ul> <li><code>variadic_meshgrid</code> generates a meshgrid from two variadic tensors. Useful for implementing pairwise operations.</li> <li><code>variadic_to_padded</code> converts a variadic tensor to a padded tensor.</li> <li><code>padded_to_variadic</code> converts a padded tensor to a variadic tensor.</li> </ul> <h1>New Metrics</h1> <p>New metrics include <code>accuracy</code>, <code>matthews_corrcoef</code>, <code>pearsonr</code>, <code>spearmanr</code>. All the metrics are the same as their counterparts in scipy, but they are implemented in PyTorch and support auto differentiation.</p> <h1>Improvements</h1> <ul> <li>Add <code>data.Graph.to</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1127351379" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/70" data-hovercard-type="pull_request" data-hovercard-url="/DeepGraphLearning/torchdrug/pull/70/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/pull/70">#70</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/cthoyt/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/cthoyt">@cthoyt</a>)</li> <li>Extend <code>tasks.SynthonCompletion</code> for arbitrary atom features (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1106746939" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/62" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/62/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/62">#62</a>)</li> <li>Speed up lazy data loading (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1073783912" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/58" data-hovercard-type="pull_request" data-hovercard-url="/DeepGraphLearning/torchdrug/pull/58/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/pull/58">#58</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/wconnell/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/wconnell">@wconnell</a>)</li> <li>Speed up rspmm cuda kernels</li> <li>Add docker support</li> <li>Add more documentation for <code>data.Graph</code> and <code>data.Molecule</code></li> </ul> <h1>Bug Fixes</h1> <ul> <li>Fix computation of output dimension in several GNNs (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1216662617" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/92" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/92/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/92">#92</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/kanojikajino/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/kanojikajino">@kanojikajino</a>)</li> <li>Fix <code>data.PackedGraph.__getitem__</code> when the batch is empty (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1190693549" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/83" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/83/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/83">#83</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/jannisborn/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/jannisborn">@jannisborn</a>)</li> <li>Fix patched modules for PyTorch&gt;=1.6.0 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1166853569" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/77" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/77/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/77">#77</a>)</li> <li>Fix <code>make_configurable</code> for <code>torch.utils.data</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1190808676" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/85" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/85/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/85">#85</a>)</li> <li>Fix <code>multi_slice_mask</code>, <code>variadic_max</code> for multi-dimensional input</li> <li>Fix <code>variadic_topk</code> for input containing infinite values</li> </ul> <h1>Compatibility Changes</h1> <p>TorchDrug now supports Python 3.7/3.8/3.9. Starting from this version, TorchDrug requires a minimal PyTorch version of 1.8.0 and a minimal RDKit version of 2020.09.</p> <p>Argument <code>node_feature</code> and <code>edge_feature</code> are renamed to <code>atom_feature</code> and <code>bond_feature</code> in <code>data.Molecule.from_smiles</code> and <code>data.Molecule.from_molecule</code>. The old interface is still supported with deprecated warnings.</p> KiddoZhu tag:github.com,2008:Repository/394517440/v0.1.2 2021-10-26T09:25:38Z 0.1.2 Release <h1>0.1.2 Release Notes</h1> <p>The recent 0.1.2 release of TorchDrug is an update on Colab tutorials, data structures, functions, datasets and bug fixes. We are grateful to see growing interests and involvement from the community, especially on the retrosynthesis task. Welcome more in the future!</p> <ul> <li>Colab Tutorials</li> <li>New Data Structures</li> <li>New Functions</li> <li>New Datasets</li> <li>Bug Fixes</li> </ul> <h1>Colab Tutorials</h1> <p>To familiarize users with the logic and capacity of TorchDrug, we compile a full set of Colab tutorials, covering from basic usage to different drug discovery tasks. All the tutorials are fully interactive and may serve as boilerplate code for your own applications.</p> <ul> <li><a href="https://colab.research.google.com/drive/1Tbnr1Fog_YjkqU1MOhcVLuxqZ4DC-c8-?usp=sharing" rel="nofollow">Basic Usage and Pipeline</a> shows the manipulation of data structures like <code>data.Graph</code> and <code>data.Molecule</code>, as well as the training and evaluation pipelines for property prediction models.</li> <li><a href="https://colab.research.google.com/drive/10faCIVIfln20f2h1oQk2UrXiAMqZKLoW?usp=sharing#forceEdit=true&amp;sandboxMode=true" rel="nofollow">Pretrained Molecular Representations</a> demonstrates the steps for self-supervised pretraining of a molecular representation model and finetuning it on downstream tasks.</li> <li><a href="https://colab.research.google.com/drive/1JEMiMvSBuqCuzzREYpviNZZRVOYsgivA?usp=sharing#forceEdit=true&amp;sandboxMode=true" rel="nofollow">De novo Molecule Design</a> illustrates the routine of training generative models for molecule generation and finetuning them with reinforcement learning for property optimization. Two popular models, GCPN and GraphAF, are covered in the tutorial.</li> <li><a href="https://colab.research.google.com/drive/1IH1hk7K3MaxAEe5m6CFY7Eyej3RuiEL1?usp=sharing#forceEdit=true&amp;sandboxMode=true" rel="nofollow">Retrosynthesis</a> shows how to use the state-of-the-art model, G2Gs, to predict a set reactants for synthesizing a target molecule.</li> <li><a href="https://colab.research.google.com/drive/1-sjqQZhYrGM0HiMuaqXOiqhDNlJi7g_I?usp=sharing#forceEdit=true&amp;sandboxMode=true" rel="nofollow">Knowledge Graph Reasoning</a> goes through the steps of training and evaluating models for knowledge graph completion, including both knowledge graph embeddings and neural inductive logic programming.</li> </ul> <h1>New Data Structures</h1> <ul> <li>A new data structure <code>data.Dictionary</code> that stores key-value mapping of PyTorch tensors on either CPUs or GPUs. It enjoys O(n) memory consumption and O(1) query time, and supports parallelism over batch of queries. This API provides a great opportunity for implementing sparse lookup tables or set operations in a PyTorchic style.</li> <li>A new method <code>data.Graph.match</code> to efficiently retrieve all edges of specific patterns on either CPUs or GPUs. It scales linearly w.r.t. the number of patterns plus the number of retrieved edges, regardless the size of the graph. Typical usage of this method includes querying the existence of edges, generating random walks or even extracting ego graphs.</li> </ul> <h1>New Functions</h1> <p>Batching irregular structures, such as graphs, sets or sequences with different sizes, is a common demand in drug discovery. Instead of clumsy padding-based implementation, TorchDrug provides a family of functions that efficiently manipulate batch of variadic-sized tensors without padding. The update contains the following new variadic functions.</p> <ul> <li><code>variadic_arange</code> returns a 1-D tensor that contains integer intervals of variadic sizes.</li> <li><code>variadic_softmax</code> computes softmax over categories with variadic sizes.</li> <li><code>variadic_sort</code> sorts elements in sets with variadic sizes.</li> <li><code>variadic_randperm</code> returns random permutations for sets with variadic sizes, where the <code>i</code>-th permutation contains integers from 0 to <code>size[i] - 1</code>.</li> <li><code>variadic_sample</code> draws samples with replacement from sets with variadic sizes.</li> </ul> <h1>New Datasets</h1> <ul> <li>PCQM4M: A large-scale molecule property prediction dataset, originally used in OGB-LSC (thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/OPAYA/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/OPAYA">@OPAYA</a> )</li> </ul> <h1>Bug Fixes</h1> <ul> <li>Fix import of sascorer in plogp evaluation (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="980140451" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/18" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/18/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/18">#18</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="992925822" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/31" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/31/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/31">#31</a>)</li> <li>Fix atoms with stereo bonds in retrosynthesis (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1007009131" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/42" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/42/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/42">#42</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1010562013" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/43" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/43/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/43">#43</a>)</li> <li>Fix lazy construction for molecule datasets (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="992913005" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/30" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/30/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/30">#30</a>, thanks to <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/DaShenZi721/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/DaShenZi721">@DaShenZi721</a> )</li> <li>Fix ChEMBLFiltered dataset (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="995012855" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/36" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/36/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/36">#36</a>)</li> <li>Fix ZINC2m dataset (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="994187008" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/33" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/33/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/33">#33</a>)</li> <li>Fix USPTO50k dataset (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="993261477" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/32" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/32/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/32">#32</a>)</li> <li>Fix bugs in core.Configurable (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="985267357" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/26" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/26/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/26">#26</a>)</li> <li>Fix/improve documentation (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="979782930" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/16" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/16/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/16">#16</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="991216065" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/28" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/28/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/28">#28</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1006029801" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/41" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/41/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/41">#41</a>)</li> <li>Fix installation on macOS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="992622930" data-permission-text="Title is private" data-url="https://github.com/DeepGraphLearning/torchdrug/issues/29" data-hovercard-type="issue" data-hovercard-url="/DeepGraphLearning/torchdrug/issues/29/hovercard" href="https://github.com/DeepGraphLearning/torchdrug/issues/29">#29</a>)</li> </ul> KatarinaYuan tag:github.com,2008:Repository/394517440/v0.1.1 2021-09-14T08:55:45Z v0.1.1 <p>release v0.1.1</p> KiddoZhu tag:github.com,2008:Repository/394517440/v0.1.0 2021-08-11T09:40:39Z v0.1.0 <p>release v0.1.0</p> KiddoZhu