Merge branch 'normalization_resolver' of github.com:pyg-team/pytorch_geometric into normalization_resolver
pyg-team · Jul 7, 2022 · a608932
2 parents 142c2fc + e02bd10
Showing 21 changed files with 160 additions and 144 deletions.
diff --git a/.github/workflows/testing.yml b/.github/workflows/testing.yml
@@ -37,7 +37,7 @@ jobs:
       - name: Install internal dependencies
         run: |
           pip install torch-scatter -f https://data.pyg.org/whl/torch-${{ matrix.torch-version }}+cpu.html
-          pip install torch-sparse -f https://data.pyg.org/whl/torch-${{ matrix.torch-version }}+cpu.html
+          pip install torch-sparse==0.6.13 -f https://data.pyg.org/whl/torch-${{ matrix.torch-version }}+cpu.html
           pip install torch-cluster -f https://data.pyg.org/whl/torch-${{ matrix.torch-version }}+cpu.html
           pip install torch-spline-conv -f https://data.pyg.org/whl/torch-${{ matrix.torch-version }}+cpu.html
 

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 ## [2.0.5] - 2022-MM-DD
 ### Added
 - Added support for `normalization_resolver` ([#4926](https://github.com/pyg-team/pytorch_geometric/pull/4926))
+- Added notebook tutorial for `torch_geometric.nn.aggr` package to documentation ([#4927](https://github.com/pyg-team/pytorch_geometric/pull/4927))
 - Added support for `follow_batch` for lists or dictionaries of tensors ([#4837](https://github.com/pyg-team/pytorch_geometric/pull/4837))
 - Added `Data.validate()` and `HeteroData.validate()` functionality ([#4885](https://github.com/pyg-team/pytorch_geometric/pull/4885))
 - Added `LinkNeighborLoader` support to `LightningDataModule` ([#4868](https://github.com/pyg-team/pytorch_geometric/pull/4868))
@@ -27,7 +28,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added the `bias` vector to the `GCN` model definition in the "Create Message Passing Networks" tutorial ([#4755](https://github.com/pyg-team/pytorch_geometric/pull/4755))
 - Added `transforms.RootedSubgraph` interface with two implementations: `RootedEgoNets` and `RootedRWSubgraph` ([#3926](https://github.com/pyg-team/pytorch_geometric/pull/3926))
 - Added `ptr` vectors for `follow_batch` attributes within `Batch.from_data_list` ([#4723](https://github.com/pyg-team/pytorch_geometric/pull/4723))
-- Added `torch_geometric.nn.aggr` package ([#4687](https://github.com/pyg-team/pytorch_geometric/pull/4687), [#4721](https://github.com/pyg-team/pytorch_geometric/pull/4721), [#4731](https://github.com/pyg-team/pytorch_geometric/pull/4731), [#4762](https://github.com/pyg-team/pytorch_geometric/pull/4762), [#4749](https://github.com/pyg-team/pytorch_geometric/pull/4749), [#4779](https://github.com/pyg-team/pytorch_geometric/pull/4779), [#4863](https://github.com/pyg-team/pytorch_geometric/pull/4863), [#4864](https://github.com/pyg-team/pytorch_geometric/pull/4864), [#4865](https://github.com/pyg-team/pytorch_geometric/pull/4865), [#4866](https://github.com/pyg-team/pytorch_geometric/pull/4866), [#4872](https://github.com/pyg-team/pytorch_geometric/pull/4872))
+- Added `torch_geometric.nn.aggr` package ([#4687](https://github.com/pyg-team/pytorch_geometric/pull/4687), [#4721](https://github.com/pyg-team/pytorch_geometric/pull/4721), [#4731](https://github.com/pyg-team/pytorch_geometric/pull/4731), [#4762](https://github.com/pyg-team/pytorch_geometric/pull/4762), [#4749](https://github.com/pyg-team/pytorch_geometric/pull/4749), [#4779](https://github.com/pyg-team/pytorch_geometric/pull/4779), [#4863](https://github.com/pyg-team/pytorch_geometric/pull/4863), [#4864](https://github.com/pyg-team/pytorch_geometric/pull/4864), [#4865](https://github.com/pyg-team/pytorch_geometric/pull/4865), [#4866](https://github.com/pyg-team/pytorch_geometric/pull/4866), [#4872](https://github.com/pyg-team/pytorch_geometric/pull/4872), [#4934](https://github.com/pyg-team/pytorch_geometric/pull/4934), [#4935](https://github.com/pyg-team/pytorch_geometric/pull/4935))
 - Added the `DimeNet++` model ([#4432](https://github.com/pyg-team/pytorch_geometric/pull/4432), [#4699](https://github.com/pyg-team/pytorch_geometric/pull/4699), [#4700](https://github.com/pyg-team/pytorch_geometric/pull/4700), [#4800](https://github.com/pyg-team/pytorch_geometric/pull/4800))
 - Added an example of using PyG with PyTorch Ignite ([#4487](https://github.com/pyg-team/pytorch_geometric/pull/4487))
 - Added `GroupAddRev` module with support for reducing training GPU memory ([#4671](https://github.com/pyg-team/pytorch_geometric/pull/4671), [#4701](https://github.com/pyg-team/pytorch_geometric/pull/4701), [#4715](https://github.com/pyg-team/pytorch_geometric/pull/4715), [#4730](https://github.com/pyg-team/pytorch_geometric/pull/4730))
@@ -49,6 +50,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added support for graph-level outputs in `to_hetero` ([#4582](https://github.com/pyg-team/pytorch_geometric/pull/4582))
 - Added `CHANGELOG.md` ([#4581](https://github.com/pyg-team/pytorch_geometric/pull/4581))
 ### Changed
+- `len(batch)` will now return the number of graphs inside the batch, not the number of attributes ([#4931](https://github.com/pyg-team/pytorch_geometric/pull/4931))
+- Fixed `data.subgraph` generation for 0-dim tensors ([#4932](https://github.com/pyg-team/pytorch_geometric/pull/4932))
 - Removed unnecessary inclusion of self-loops when sampling negative edges ([#4880](https://github.com/pyg-team/pytorch_geometric/pull/4880))
 - Fixed `InMemoryDataset` inferring wrong `len` for lists of tensors ([#4837](https://github.com/pyg-team/pytorch_geometric/pull/4837))
 - Fixed `Batch.separate` when using it for lists of tensors ([#4837](https://github.com/pyg-team/pytorch_geometric/pull/4837))

diff --git a/docker/Dockerfile b/docker/Dockerfile
@@ -1,5 +1,12 @@
 FROM ubuntu:18.04
 
+# metainformation
+LABEL org.opencontainers.image.version="2.0.4"
+LABEL org.opencontainers.image.authors="Matthias Fey"
+LABEL org.opencontainers.image.source="https://github.com/pyg-team/pytorch_geometric"
+LABEL org.opencontainers.image.licenses="MIT"
+LABEL org.opencontainers.image.base.name="docker.io/library/ubuntu:18.04"
+
 RUN apt-get update && apt-get install -y apt-transport-https ca-certificates && \
     rm -rf /var/lib/apt/lists/*
 

diff --git a/docs/source/notes/colabs.rst b/docs/source/notes/colabs.rst
@@ -9,6 +9,7 @@ We have prepared a list of colab notebooks that practically introduces you to th
 4. `Scaling Graph Neural Networks <https://colab.research.google.com/drive/1XAjcjRHrSR_ypCk_feIWFbcBKyT4Lirs?usp=sharing>`__
 5. `Point Cloud Classification with Graph Neural Networks <https://colab.research.google.com/drive/1D45E5bUK3gQ40YpZo65ozs7hg5l-eo_U?usp=sharing>`__
 6. `Explaining GNN Model Predictions using Captum <https://colab.research.google.com/drive/1fLJbFPz0yMCQg81DdCP5I8jXw9LoggKO?usp=sharing>`__
+7. `Customizing Aggregations within Message Passing <https://colab.research.google.com/drive/1KKw-VUDQuHhMo7sCd7ZaRROza3leBjRR?usp=sharing>`__
 
 **Stanford CS224W Graph ML Tutorials:**
 

diff --git a/test/data/test_batch.py b/test/data/test_batch.py
@@ -52,8 +52,7 @@ def test_batch():
     assert str(batch) == ('DataBatch(x=[3], edge_index=[2, 4], y=[1], '
                           'x_sp=[3, 1, nnz=3], adj=[3, 3, nnz=4], s=[1], '
                           'array=[1], num_nodes=3, batch=[3], ptr=[2])')
-    assert batch.num_graphs == 1
-    assert len(batch) == 10
+    assert batch.num_graphs == len(batch) == 1
     assert batch.x.tolist() == [1, 2, 3]
     assert batch.y.tolist() == [1]
     assert batch.x_sp.to_dense().view(-1).tolist() == batch.x.tolist()
@@ -72,8 +71,7 @@ def test_batch():
                           'x_sp=[9, 1, nnz=9], adj=[9, 9, nnz=12], s=[3], '
                           's_batch=[3], s_ptr=[4], array=[3], num_nodes=9, '
                           'batch=[9], ptr=[4])')
-    assert batch.num_graphs == 3
-    assert len(batch) == 12
+    assert batch.num_graphs == len(batch) == 3
     assert batch.x.tolist() == [1, 2, 3, 1, 2, 1, 2, 3, 4]
     assert batch.y.tolist() == [1, 2, 3]
     assert batch.x_sp.to_dense().view(-1).tolist() == batch.x.tolist()
@@ -174,7 +172,7 @@ def __cat_dim__(self, key, value, *args, **kwargs):
 
     assert str(batch) == ('MyDataBatch(x=[5], y=[2], foo=[2, 4], batch=[5], '
                           'ptr=[3])')
-    assert len(batch) == 5
+    assert batch.num_graphs == len(batch) == 2
     assert batch.x.tolist() == [1, 2, 3, 1, 2]
     assert batch.foo.size() == (2, 4)
     assert batch.foo[0].tolist() == foo1.tolist()
@@ -208,7 +206,7 @@ def test_pickling():
     assert batch.num_nodes == 20
 
     assert batch.__class__.__name__ == 'DataBatch'
-    assert len(batch) == 3
+    assert batch.num_graphs == len(batch) == 4
 
     os.remove(path)
 
@@ -230,8 +228,7 @@ def test_recursive_batch():
 
     batch = Batch.from_data_list([data1, data2])
 
-    assert len(batch) == 5
-    assert batch.num_graphs == 2
+    assert batch.num_graphs == len(batch) == 2
     assert batch.num_nodes == 90
 
     assert torch.allclose(batch.x['1'],
@@ -267,7 +264,7 @@ def test_batching_of_batches():
     batch = Batch.from_data_list([data, data])
 
     batch = Batch.from_data_list([batch, batch])
-    assert len(batch) == 2
+    assert batch.num_graphs == len(batch) == 2
     assert batch.x[0:2].tolist() == data.x.tolist()
     assert batch.x[2:4].tolist() == data.x.tolist()
     assert batch.x[4:6].tolist() == data.x.tolist()
@@ -296,8 +293,7 @@ def test_hetero_batch():
 
     batch = Batch.from_data_list([data1, data2])
 
-    assert len(batch) == 5
-    assert batch.num_graphs == 2
+    assert batch.num_graphs == len(batch) == 2
     assert batch.num_nodes == 450
 
     assert torch.allclose(batch['p'].x[:100], data1['p'].x)

diff --git a/test/datasets/test_enzymes.py b/test/datasets/test_enzymes.py
@@ -22,25 +22,24 @@ def test_enzymes(get_dataset):
     assert len(dataset[mask]) == 100
 
     loader = DataLoader(dataset, batch_size=len(dataset))
-    for data in loader:
-        assert data.num_graphs == 600
+    for batch in loader:
+        assert batch.num_graphs == len(batch) == 600
 
-        avg_num_nodes = data.num_nodes / data.num_graphs
+        avg_num_nodes = batch.num_nodes / batch.num_graphs
         assert pytest.approx(avg_num_nodes, abs=1e-2) == 32.63
 
-        avg_num_edges = data.num_edges / (2 * data.num_graphs)
+        avg_num_edges = batch.num_edges / (2 * batch.num_graphs)
         assert pytest.approx(avg_num_edges, abs=1e-2) == 62.14
 
-        assert len(data) == 5
-        assert list(data.x.size()) == [data.num_nodes, 3]
-        assert list(data.y.size()) == [data.num_graphs]
-        assert data.y.max() + 1 == 6
-        assert list(data.batch.size()) == [data.num_nodes]
-        assert data.ptr.numel() == data.num_graphs + 1
+        assert list(batch.x.size()) == [batch.num_nodes, 3]
+        assert list(batch.y.size()) == [batch.num_graphs]
+        assert batch.y.max() + 1 == 6
+        assert list(batch.batch.size()) == [batch.num_nodes]
+        assert batch.ptr.numel() == batch.num_graphs + 1
 
-        assert data.has_isolated_nodes()
-        assert not data.has_self_loops()
-        assert data.is_undirected()
+        assert batch.has_isolated_nodes()
+        assert not batch.has_self_loops()
+        assert batch.is_undirected()
 
     loader = DataListLoader(dataset, batch_size=len(dataset))
     for data_list in loader:
@@ -49,7 +48,6 @@ def test_enzymes(get_dataset):
     dataset.transform = ToDense(num_nodes=126)
     loader = DenseDataLoader(dataset, batch_size=len(dataset))
     for data in loader:
-        assert len(data) == 4
         assert list(data.x.size()) == [600, 126, 3]
         assert list(data.adj.size()) == [600, 126, 126]
         assert list(data.mask.size()) == [600, 126]

diff --git a/test/datasets/test_planetoid.py b/test/datasets/test_planetoid.py
@@ -8,25 +8,24 @@ def test_citeseer(get_dataset):
     assert len(dataset) == 1
     assert dataset.__repr__() == 'CiteSeer()'
 
-    for data in loader:
-        assert data.num_graphs == 1
-        assert data.num_nodes == 3327
-        assert data.num_edges / 2 == 4552
-
-        assert len(data) == 8
-        assert list(data.x.size()) == [data.num_nodes, 3703]
-        assert list(data.y.size()) == [data.num_nodes]
-        assert data.y.max() + 1 == 6
-        assert data.train_mask.sum() == 6 * 20
-        assert data.val_mask.sum() == 500
-        assert data.test_mask.sum() == 1000
-        assert (data.train_mask & data.val_mask & data.test_mask).sum() == 0
-        assert list(data.batch.size()) == [data.num_nodes]
-        assert data.ptr.tolist() == [0, data.num_nodes]
-
-        assert data.has_isolated_nodes()
-        assert not data.has_self_loops()
-        assert data.is_undirected()
+    for batch in loader:
+        assert batch.num_graphs == len(batch) == 1
+        assert batch.num_nodes == 3327
+        assert batch.num_edges / 2 == 4552
+
+        assert list(batch.x.size()) == [batch.num_nodes, 3703]
+        assert list(batch.y.size()) == [batch.num_nodes]
+        assert batch.y.max() + 1 == 6
+        assert batch.train_mask.sum() == 6 * 20
+        assert batch.val_mask.sum() == 500
+        assert batch.test_mask.sum() == 1000
+        assert (batch.train_mask & batch.val_mask & batch.test_mask).sum() == 0
+        assert list(batch.batch.size()) == [batch.num_nodes]
+        assert batch.ptr.tolist() == [0, batch.num_nodes]
+
+        assert batch.has_isolated_nodes()
+        assert not batch.has_self_loops()
+        assert batch.is_undirected()
 
 
 def test_citeseer_with_full_split(get_dataset):

diff --git a/test/loader/test_dataloader.py b/test/loader/test_dataloader.py
@@ -39,7 +39,7 @@ def test_dataloader(num_workers):
     assert len(loader) == 2
 
     for batch in loader:
-        assert len(batch) == 8
+        assert batch.num_graphs == len(batch) == 2
         assert batch.batch.tolist() == [0, 0, 0, 1, 1, 1]
         assert batch.ptr.tolist() == [0, 3, 6]
         assert batch.x.tolist() == [[1], [1], [1], [1], [1], [1]]
@@ -58,7 +58,7 @@ def test_dataloader(num_workers):
     assert len(loader) == 2
 
     for batch in loader:
-        assert len(batch) == 10
+        assert batch.num_graphs == len(batch) == 2
         assert batch.edge_index_batch.tolist() == [0, 0, 0, 0, 1, 1, 1, 1]
 
 
@@ -72,10 +72,10 @@ def test_multiprocessing():
         queue.put(batch)
 
     batch = queue.get()
-    assert len(batch) == 3
+    assert batch.num_graphs == len(batch) == 2
 
     batch = queue.get()
-    assert len(batch) == 3
+    assert batch.num_graphs == len(batch) == 2
 
 
 def test_pin_memory():
@@ -104,7 +104,7 @@ def test_heterogeneous_dataloader(num_workers):
     assert len(loader) == 2
 
     for batch in loader:
-        assert len(batch) == 5
+        assert batch.num_graphs == len(batch) == 2
         assert batch.num_nodes == 600
 
         for store in batch.stores:

diff --git a/test/loader/test_shadow.py b/test/loader/test_shadow.py
@@ -20,7 +20,7 @@ def test_shadow_k_hop_sampler():
     assert len(loader) == 1
 
     batch1 = next(iter(loader))
-    assert len(batch1) == 7
+    assert batch1.num_graphs == len(batch1) == 2
 
     assert batch1.batch.tolist() == [0, 0, 0, 0, 1, 1, 1]
     assert batch1.ptr.tolist() == [0, 4, 7]
@@ -42,7 +42,7 @@ def test_shadow_k_hop_sampler():
     assert len(loader) == 1
 
     batch2 = next(iter(loader))
-    assert len(batch2) == 6
+    assert batch2.num_graphs == len(batch2) == 2
 
     assert batch1.batch.tolist() == batch2.batch.tolist()
     assert batch1.ptr.tolist() == batch2.ptr.tolist()

diff --git a/test/nn/aggr/test_scaler.py b/test/nn/aggr/test_scaler.py
@@ -10,15 +10,16 @@ def test_degree_scaler_aggregation():
     ptr = torch.tensor([0, 2, 5, 6])
     deg = torch.tensor([0, 3, 0, 1, 1, 0])
 
-    aggrs = ['mean', 'sum', 'max']
-    scalers = [
+    aggr = ['mean', 'sum', 'max']
+    scaler = [
         'identity', 'amplification', 'attenuation', 'linear', 'inverse_linear'
     ]
-    aggr = DegreeScalerAggregation(aggrs, scalers, deg)
+    aggr = DegreeScalerAggregation(aggr, scaler, deg)
     assert str(aggr) == 'DegreeScalerAggregation()'
 
     out = aggr(x, index)
     assert out.size() == (3, 240)
+    assert torch.allclose(torch.jit.script(aggr)(x, index), out)
 
     with pytest.raises(NotImplementedError):
         aggr(x, ptr=ptr)
diff --git a/test/transforms/test_rooted_subgraph.py b/test/transforms/test_rooted_subgraph.py
@@ -73,7 +73,7 @@ def test_rooted_subgraph_minibatch():
     loader = DataLoader([data, data], batch_size=2)
     batch = next(iter(loader))
     batch = batch.map_data()
-    assert len(batch) == 6
+    assert batch.num_graphs == len(batch) == 2
 
     assert batch.x.size() == (14, 8)
     assert batch.edge_index.size() == (2, 16)

diff --git a/test/transforms/test_to_superpixels.py b/test/transforms/test_to_superpixels.py
@@ -57,13 +57,13 @@ def test_to_superpixels():
     assert y == 7
 
     loader = DataLoader(dataset, batch_size=2, shuffle=False)
-    for data, y in loader:
-        assert len(data) == 4
-        assert data.pos.dim() == 2 and data.pos.size(1) == 2
-        assert data.x.dim() == 2 and data.x.size(1) == 1
-        assert data.batch.dim() == 1
-        assert data.ptr.dim() == 1
-        assert data.pos.size(0) == data.x.size(0) == data.batch.size(0)
+    for batch, y in loader:
+        assert batch.num_graphs == len(batch) == 2
+        assert batch.pos.dim() == 2 and batch.pos.size(1) == 2
+        assert batch.x.dim() == 2 and batch.x.size(1) == 1
+        assert batch.batch.dim() == 1
+        assert batch.ptr.dim() == 1
+        assert batch.pos.size(0) == batch.x.size(0) == batch.batch.size(0)
         assert y.tolist() == [7, 2]
         break
 
@@ -81,15 +81,15 @@ def test_to_superpixels():
     assert y == 7
 
     loader = DataLoader(dataset, batch_size=2, shuffle=False)
-    for data, y in loader:
-        assert len(data) == 6
-        assert data.pos.dim() == 2 and data.pos.size(1) == 2
-        assert data.x.dim() == 2 and data.x.size(1) == 1
-        assert data.batch.dim() == 1
-        assert data.ptr.dim() == 1
-        assert data.pos.size(0) == data.x.size(0) == data.batch.size(0)
-        assert data.seg.size() == (2, 28, 28)
-        assert data.img.size() == (2, 1, 28, 28)
+    for batch, y in loader:
+        assert batch.num_graphs == len(batch) == 2
+        assert batch.pos.dim() == 2 and batch.pos.size(1) == 2
+        assert batch.x.dim() == 2 and batch.x.size(1) == 1
+        assert batch.batch.dim() == 1
+        assert batch.ptr.dim() == 1
+        assert batch.pos.size(0) == batch.x.size(0) == batch.batch.size(0)
+        assert batch.seg.size() == (2, 28, 28)
+        assert batch.img.size() == (2, 1, 28, 28)
         assert y.tolist() == [7, 2]
         break
 

diff --git a/torch_geometric/data/batch.py b/torch_geometric/data/batch.py
@@ -180,6 +180,9 @@ def num_graphs(self) -> int:
         else:
             raise ValueError("Can not infer the number of graphs")
 
+    def __len__(self) -> int:
+        return self.num_graphs
+
     def __reduce__(self):
         state = self.__dict__.copy()
         return DynamicInheritanceGetter(), self.__class__.__bases__, state
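
Note on the `__len__` hunk above: the new method only delegates to `num_graphs`, which is why the updated tests can assert `batch.num_graphs == len(batch)`. The following is a minimal, dependency-free sketch of that delegation. `ToyBatch` and its list-based `ptr`/`batch` fields are hypothetical stand-ins for illustration, not the PyG class, and its `num_graphs` covers only two simplified inference paths.

```python
from typing import List, Optional


class ToyBatch:
    """Hypothetical stand-in for a batch object (not the PyG `Batch` class)."""

    def __init__(self, ptr: Optional[List[int]] = None,
                 batch: Optional[List[int]] = None):
        self.ptr = ptr      # cumulative node offsets, e.g. [0, 3, 6]
        self.batch = batch  # graph index of each node, e.g. [0, 0, 0, 1, 1, 1]

    @property
    def num_graphs(self) -> int:
        # Simplified inference: prefer `ptr`, fall back to `batch`.
        if self.ptr is not None:
            return len(self.ptr) - 1
        if self.batch is not None:
            return max(self.batch) + 1
        raise ValueError("Can not infer the number of graphs")

    def __len__(self) -> int:
        # Same delegation as in the hunk: the length is the number of graphs.
        return self.num_graphs


toy = ToyBatch(ptr=[0, 3, 6], batch=[0, 0, 0, 1, 1, 1])
print(len(toy), toy.num_graphs)
```

Code that previously read `len(batch)` as the number of stored attributes would need to count attributes some other way after this change.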
diff --git a/torch_geometric/data/collate.py b/torch_geometric/data/collate.py
@@ -142,7 +142,12 @@ def _collate(
             # Write directly into shared memory to avoid an extra copy:
             numel = sum(value.numel() for value in values)
             storage = elem.storage()._new_shared(numel)
-            out = elem.new(storage)
+            shape = list(elem.size())
+            if cat_dim is None or elem.dim() == 0:
+                shape = [len(values)] + shape
+            else:
+                shape[cat_dim] = int(slices[-1])
+            out = elem.new(storage).resize_(*shape)
         else:
             out = None
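
Note on the `_collate` hunk above: the shared-memory tensor is now resized to the collated shape before use. The shape rule can be sketched in plain Python as below. `shared_out_shape` and its parameter names are hypothetical; `total_along_cat_dim` plays the role of `slices[-1]` in the hunk, and how `out` is consumed afterwards is not shown in this diff.

```python
from typing import List, Optional, Sequence


def shared_out_shape(elem_shape: Sequence[int], num_values: int,
                     cat_dim: Optional[int],
                     total_along_cat_dim: int) -> List[int]:
    """Mirror the shape computation from the hunk using plain lists."""
    shape = list(elem_shape)
    if cat_dim is None or len(shape) == 0:
        # No concatenation dimension, or 0-dim elements:
        # the values are stacked along a new leading dimension.
        return [num_values] + shape
    # Otherwise only the concatenation dimension grows to the summed size.
    shape[cat_dim] = int(total_along_cat_dim)
    return shape


print(shared_out_shape((3, 4), num_values=2, cat_dim=0, total_along_cat_dim=7))
print(shared_out_shape((), num_values=5, cat_dim=0, total_along_cat_dim=5))
```

The first call models two `[*, 4]` tensors with 7 rows in total; the second models five 0-dim tensors, which end up in a 1-dim output of length 5.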