PyG 2.2.0: Accelerations and Scalability

We are excited to announce the release of PyG 2.2 🎉🎉🎉

PyG 2.2 is the culmination of work from 78 contributors who have worked on features and bug-fixes for a total of over 320 commits since torch-geometric==2.1.0.

Highlights

`pyg-lib` Integration

We are proud to release and integrate pyg-lib==0.1.0 into PyG, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG (#5330, #5347, #5384, #5388).

You can install pyg-lib as described in our README.md:

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

import pyg_lib

Once pyg-lib is installed, it will get automatically picked up by PyG, e.g., to accelerate neighborhood sampling routines or to accelerate heterogeneous GNN execution:

pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and substantially improves upon the neighborhood sampling techniques previously used in PyG.

pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types.
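As a rough illustration of what "iteratively sampling neighbors" means, here is a minimal pure-Python sketch. It is not pyg-lib's implementation; the function name `sample_neighbors`, the adjacency-dict input, and the toy graph are assumptions made purely for illustration. The `num_neighbors` list plays the same role as in the loaders: one fan-out per hop.

```python
import random


def sample_neighbors(adj, seeds, num_neighbors, rng):
    """Sample up to `num_neighbors[i]` neighbors per frontier node in hop `i`.

    `adj` maps a node to the list of its neighbors (hypothetical input format).
    Returns the discovered nodes and the sampled (neighbor, target) edges.
    """
    nodes = list(seeds)      # nodes in order of discovery, seeds first
    seen = set(seeds)
    edges = []
    frontier = list(seeds)
    for fanout in num_neighbors:
        next_frontier = []
        for node in frontier:
            neighbors = adj.get(node, [])
            k = min(fanout, len(neighbors))
            for nbr in rng.sample(neighbors, k):  # sample without replacement
                edges.append((nbr, node))
                if nbr not in seen:
                    seen.add(nbr)
                    nodes.append(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier  # only newly discovered nodes are expanded
    return nodes, edges


# Toy undirected graph, stored as adjacency lists.
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 4], 4: [1, 3]}
nodes, edges = sample_neighbors(adj, seeds=[0], num_neighbors=[2, 1],
                                rng=random.Random(0))
```

The result is a small subgraph around the seed node whose size is bounded by the fan-outs rather than by the size of the full graph, which is what makes mini-batch training on large graphs feasible.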

`GraphStore` and `FeatureStore` Abstractions

PyG 2.2 includes numerous primitives to easily integrate with simple paradigms for scalable graph machine learning, enabling users to train GNNs on graphs far larger than the size of their machine's available memory. It does so by introducing simple, easy-to-use, and extensible abstractions of a FeatureStore and a GraphStore that plug directly into existing familiar PyG interfaces (see here for the accompanying tutorial).

feature_store = CustomFeatureStore()
feature_store['paper', 'x', None] = ...  # Add paper features
feature_store['author', 'x', None] = ...  # Add author features

graph_store = CustomGraphStore()
graph_store['edge', 'coo'] = ...  # Add edges in "COO" format

# `CustomGraphSampler` knows how to sample on `CustomGraphStore`:
graph_sampler = CustomGraphSampler(
    graph_store=graph_store,
    num_neighbors=[10, 20],
    ...
)

from torch_geometric.loader import NodeLoader
loader = NodeLoader(
    data=(feature_store, graph_store),
    node_sampler=graph_sampler,
    batch_size=20,
    input_nodes='paper',
)

for batch in loader:
    pass

Data loading and sampling routines are refactored and decomposed into torch_geometric.loader and torch_geometric.sampler modules, respectively (#5563, #5820, #5456, #5457, #5312, #5365, #5402, #5404, #5418).
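The separation of concerns behind these abstractions can be sketched in plain Python. The classes below are hypothetical stand-ins and do not reproduce PyG's actual `FeatureStore` and `GraphStore` interfaces; the method names (`put`, `get`, `in_neighbors`) and the helper `iterate_batches` are assumptions for illustration. The point is that the loop only ever fetches the feature rows it needs for the current mini-batch, so the backing storage could just as well be a database or a remote service.

```python
class InMemoryFeatureStore:
    """Toy key-value store: (node_type, attr_name) -> list of feature rows."""

    def __init__(self):
        self._data = {}

    def put(self, node_type, attr_name, rows):
        self._data[(node_type, attr_name)] = rows

    def get(self, node_type, attr_name, index):
        rows = self._data[(node_type, attr_name)]
        return [rows[i] for i in index]  # fetch only the requested rows


class InMemoryGraphStore:
    """Toy store: edge_type -> (row, col) lists in COO format."""

    def __init__(self):
        self._edges = {}

    def put(self, edge_type, row, col):
        self._edges[edge_type] = (row, col)

    def in_neighbors(self, edge_type, node):
        row, col = self._edges[edge_type]
        return [r for r, c in zip(row, col) if c == node]


def iterate_batches(feature_store, graph_store, edge_type, input_nodes,
                    batch_size):
    """Yield mini-batches of seed nodes plus their one-hop in-neighbors."""
    src_type, _, dst_type = edge_type
    for start in range(0, len(input_nodes), batch_size):
        seeds = input_nodes[start:start + batch_size]
        nbrs = sorted({n for s in seeds
                       for n in graph_store.in_neighbors(edge_type, s)})
        yield {
            'seeds': seeds,
            'seed_x': feature_store.get(dst_type, 'x', seeds),
            'neighbors': nbrs,
            'neighbor_x': feature_store.get(src_type, 'x', nbrs),
        }


feature_store = InMemoryFeatureStore()
feature_store.put('paper', 'x', [[0.0], [1.0], [2.0]])
feature_store.put('author', 'x', [[10.0], [11.0]])

graph_store = InMemoryGraphStore()
graph_store.put(('author', 'writes', 'paper'), row=[0, 0, 1], col=[0, 1, 2])

batches = list(iterate_batches(feature_store, graph_store,
                               ('author', 'writes', 'paper'),
                               input_nodes=[0, 1, 2], batch_size=2))
```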

Optimized and Fused Aggregations

PyG 2.2 further accelerates scatter aggregations on both CPU and GPU, with and without backward computation (requires torch>=1.12.0 and torch-scatter>=2.1.0) (#5232, #5241, #5353, #5386, #5399, #6051, #6052).

We also optimized the usage of nn.aggr.MultiAggregation by fusing the computation of multiple aggregations together (see here for more details) (#6036, #6040).

Here are some benchmarking results on PyTorch 1.12 (summed over 1000 runs):

| Aggregators | Vanilla | Fusion |
|---|---|---|
| `[sum, mean]` | 0.3325s | 0.1996s |
| `[sum, mean, min, max]` | 0.7139s | 0.5037s |
| `[sum, mean, var]` | 0.6849s | 0.3871s |
| `[sum, mean, var, std]` | 1.0955s | 0.3973s |
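One way to see why fusing helps: `sum`, `mean`, `var` and `std` can all be derived from the same three per-group reductions (count, sum, and sum of squares), so the input only needs to be scanned once. The following pure-Python sketch shows that idea under stated assumptions; `fused_aggregate` is a hypothetical name, it uses the biased (population) variance, and it is not a description of how PyG implements its fused path.

```python
import math


def fused_aggregate(values, index, num_groups):
    """Compute sum, mean, var and std per group from shared reductions."""
    count = [0] * num_groups
    total = [0.0] * num_groups
    total_sq = [0.0] * num_groups
    # Single pass over the input: every aggregation below reuses these.
    for v, i in zip(values, index):
        count[i] += 1
        total[i] += v
        total_sq[i] += v * v

    out = {'sum': total, 'mean': [], 'var': [], 'std': []}
    for n, s, sq in zip(count, total, total_sq):
        mean = s / n if n > 0 else 0.0
        # Biased variance: E[x^2] - E[x]^2, clamped against rounding errors.
        var = max(sq / n - mean * mean, 0.0) if n > 0 else 0.0
        out['mean'].append(mean)
        out['var'].append(var)
        out['std'].append(math.sqrt(var))
    return out


# Five values scattered into two groups.
result = fused_aggregate([1.0, 3.0, 2.0, 2.0, 2.0], [0, 0, 1, 1, 1],
                         num_groups=2)
```

A non-fused variant would run one full scatter per aggregation, which is consistent with the gap in the table growing as more statistics-style aggregators are combined.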

Lastly, we have incorporated "fused" GNN operators via the dgNN package, starting with a FusedGATConv implementation (#5140).

Community Sprint: Type Hints and TorchScript Support

We are running regular community sprints to get our community more involved in building PyG. Whether you are just beginning to use graph learning or have been leveraging GNNs in research or production, the community sprints welcome members of all levels with different types of projects.

We held our first community sprint on 10/12 to fully incorporate type hints and TorchScript support across the entire codebase, with the goal of improving its usability and cleanliness. 20 contributors participated, contributing 120 type hints within 2 weeks and adding around 2400 lines of code (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5673, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852).

Explainability

Our second community sprint began on 11/15 with the goal of improving the explainability capabilities of PyG. With this, we introduce the torch_geometric.explain module to provide a unified set of tools to explain the predictions of a PyG model or to explain the underlying phenomenon of a dataset.

Some of the features developed in the sprint are incorporated into this release:

Added the torch_geometric.explain module (#5804, #6054, #6089)
Moved and adapted the GNNExplainer module to torch_geometric.explain (#5967, #6065). See here and here for the accompanying examples.
Extended GNNExplainer to support edge level explanations (#6056)
Added explainability support for heterogeneous GNNs via to_captum_model and to_captum_input (#5886, #5934)

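from captum.attr import IntegratedGradients
from torch_geometric.data import HeteroData
from torch_geometric.nn import to_captum_input, to_captum_model

data = HeteroData(...)
model = HeteroGNN(...)

# Explain predictions on heterogeneous graphs for output node 10: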
captum_model = to_captum_model(model, mask_type, output_idx, metadata)
inputs, additional_forward_args = to_captum_input(data.x_dict, data.edge_index_dict, mask_type)

ig = IntegratedGradients(captum_model)
ig_attr = ig.attribute(
    inputs=inputs,
    target=int(y[output_idx]),
    additional_forward_args=additional_forward_args,
    internal_batch_size=1,
)
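For readers unfamiliar with the attribution method used above, integrated gradients averages the gradient of the model along a straight path from a baseline input to the actual input and scales it by the input difference. The sketch below approximates this numerically for a toy scalar function; it is not Captum's implementation, and the names `integrated_gradients` and `model`, the finite-difference gradient, and the midpoint rule are assumptions chosen to keep the example dependency-free.

```python
def integrated_gradients(f, x, baseline, steps=50, eps=1e-6):
    """Midpoint-rule approximation of integrated gradients for scalar `f`."""
    dim = len(x)
    avg_grad = [0.0] * dim
    for k in range(1, steps + 1):
        alpha = (k - 0.5) / steps  # midpoint of the k-th path segment
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for d in range(dim):
            up = list(point)
            down = list(point)
            up[d] += eps
            down[d] -= eps
            # Central finite difference stands in for autograd here.
            avg_grad[d] += (f(up) - f(down)) / (2 * eps) / steps
    # Attribution = (input - baseline) * average gradient along the path.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def model(v):
    # Toy differentiable function standing in for a GNN output.
    return v[0] ** 2 + 3.0 * v[1]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
attr = integrated_gradients(model, x, baseline)
```

A useful sanity check is the completeness property: the attributions should sum to approximately `model(x) - model(baseline)`.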

Breaking Changes

Renamed drop_unconnected_nodes to drop_unconnected_node_types and drop_orig_edges to drop_orig_edge_types in AddMetapaths (#5490)

Deprecations

The usage of nn.models.GNNExplainer is now deprecated in favor of explain.GNNExplainer
The usage of utils.dropout_adj is now deprecated in favor of utils.dropout_edge
The usage of loader.RandomNodeSampler is now deprecated in favor of loader.RandomNodeLoader
The usage of to_captum is now deprecated in favor of to_captum_model.

Features

Layers, Models and Examples

Added a "Link Prediction on MovieLens" Colab notebook (#5823)
Added a bipartite link-prediction example (#5834)
Added the SSGConv layer (#5599)
Added the WLConvContinuous layer for performing WL-refinement with continuous attributes (#5316)
Added the PositionalEncoding module (#5381)
Added a node classification example instrumented with Weights and Biases (#5192)

Data Loaders

Added support for triplet sampling in LinkNeighborLoader (#6004)
Added temporal_strategy = uniform/last option to NeighborLoader and LinkNeighborLoader (#5576)
Added a disjoint option to NeighborLoader and LinkNeighborLoader (#5717, #5775)
Added HeteroData support in RandomNodeLoader (#6007)
Added int32-based edge_index support in NeighborLoader (#5948)
Added support for input_time in NeighborLoader (#5763)
Added np.memmap support in NeighborLoader (#5696)
Added CPU affinitization support to NeighborLoader (#6005)
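To illustrate the difference between the `uniform` and `last` temporal strategies, here is a small pure-Python sketch. It assumes that `last` keeps the most recent neighbors that satisfy the temporal constraint and that the constraint is "timestamp not later than the seed time"; both are assumptions about the semantics, and `sample_temporal_neighbors` is a hypothetical helper rather than a PyG function.

```python
import random


def sample_temporal_neighbors(neighbors, seed_time, num_neighbors, strategy,
                              rng):
    """Pick neighbors whose timestamp does not exceed `seed_time`.

    `neighbors` is a list of (node, timestamp) pairs sorted by timestamp.
    """
    valid = [(n, t) for n, t in neighbors if t <= seed_time]
    if len(valid) <= num_neighbors:
        return valid
    if strategy == 'last':
        return valid[-num_neighbors:]  # most recent valid neighbors
    if strategy == 'uniform':
        return rng.sample(valid, num_neighbors)
    raise ValueError(f"Unknown temporal strategy: {strategy!r}")


neighbors = [(10, 1), (11, 2), (12, 5), (13, 9)]  # sorted by timestamp
last = sample_temporal_neighbors(neighbors, seed_time=6, num_neighbors=2,
                                 strategy='last', rng=random.Random(0))
uniform = sample_temporal_neighbors(neighbors, seed_time=6, num_neighbors=2,
                                    strategy='uniform', rng=random.Random(0))
```

In both cases the neighbor with timestamp 9 is never returned, since it lies in the future relative to the seed time.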

Transformations

Added a FeaturePropagation transform (#5387)
Added IndexToMask and MaskToIndex transforms (#5375, #5455)
Added shuffle_node, mask_feature and add_random_edge augmentations (#5548)
Added dropout_node, dropout_edge and dropout_path augmentations (#5481, #5495, #5531)
Added an AddRandomMetaPaths transform that adds edges based on random walks along a metapath (#5397)
Added a utils.to_smiles function (#6038)
Added HeteroData support for transforms.Constant (#5700)
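As an example of what the index/mask conversions in this list do conceptually, the following sketch converts between a list of node indices and a boolean mask using plain Python lists. The real transforms operate on tensor attributes of a data object; the list-based helpers here are simplified, hypothetical stand-ins.

```python
def index_to_mask(index, size):
    """Turn a list of indices into a boolean mask of length `size`."""
    mask = [False] * size
    for i in index:
        mask[i] = True
    return mask


def mask_to_index(mask):
    """Turn a boolean mask back into the sorted list of set indices."""
    return [i for i, flag in enumerate(mask) if flag]


train_index = [0, 2, 5]
train_mask = index_to_mask(train_index, size=6)
recovered = mask_to_index(train_mask)
```

Masks are convenient for selecting nodes from full-size tensors, while index lists are more compact when only a few nodes are selected.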

Datasets

Added the LRGBDataset to include 5 datasets from the Long Range Graph Benchmark (#5935)
Added the HydroNet water cluster dataset (#5537, #5902, #5903)
Added the DGraphFin dynamic graph dataset (#5504)
Added the official splits to the MalNetTiny dataset (#5078)
Added a print_summary method to torch_geometric.data.Dataset (#5438)

General Improvements

Added training and inference benchmark scripts (#5774, #5830, #5878, #5293, #5341, #5242, #5258, #5881, #5254)
Added the utils.assortativity function to compute the degree assortativity coefficient (#5587)
Added support for filling labels with dummy values in HeteroData.to_homogeneous() (#5540)
Added torch.onnx.export support (see here for an example) (#5877, #5997)
Added option to make normalization coefficients trainable in PNAConv (#6039)
Added a semi_grad option in VarAggregation and StdAggregation (#6042)
Added a warning for invalid node and edge type names in HeteroData (#5990)
Added lr_scheduler_solver and customized lr_scheduler classes (#5942)
Added to_fixed_size graph transformer (#5939)
Added support for symbolic tracing in the SchNet model (#5938)
Added support for customizing the interaction graph in the SchNet model (#5919)
Added SparseTensor support to SuperGATConv (#5888)
Added TorchScript support for AttentiveFP (#5868)
Added a return_semantic_attention_weights argument to HANConv (#5787)
Added temperature value customization in dense_mincut_pool (#5908)
Added support for a tuple of in_channels in GENConv for bipartite message passing (#5627, #5641)
Added Aggregation.set_validate_args option to skip validation of dim_size (#5290)
Added BaseStorage.get() functionality (#5240)
Added support for batches of size one in BatchNorm (#5530, #5614)
The AttentionalAggregation module can now be applied to compute attention on a per-feature level (#5449)
Added TorchScript support to ASAPooling (#5395)
Updated the unsupervised GraphSAGE example to leverage LinkNeighborLoader (#5317)
Added better out-of-bounds error message in MessagePassing (#5339)
Added support to customize the activation function in PNAConv (#5262)
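For context on the degree assortativity coefficient mentioned in this list: it measures whether edges tend to connect nodes of similar degree, and can be expressed as the Pearson correlation between the degrees at the two ends of each edge. The sketch below computes it in plain Python from an edge list. It is an illustrative formulation, not the code of `utils.assortativity`, and `degree_assortativity` is a hypothetical name.

```python
import math
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of source out-degree and target in-degree.

    `edges` is a list of directed (source, target) pairs; an undirected
    graph is represented by listing both directions of every edge.
    Undefined (division by zero) if all degrees are identical.
    """
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    xs = [out_deg[src] for src, _ in edges]
    ys = [in_deg[dst] for _, dst in edges]
    n = len(edges)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Star graph: hub 0 connected to leaves 1, 2, 3 (both directions listed).
star = [(0, 1), (1, 0), (0, 2), (2, 0), (0, 3), (3, 0)]
r = degree_assortativity(star)
```

A star graph is maximally disassortative, since every edge joins the high-degree hub to a degree-one leaf, so the coefficient sits at the negative end of its range.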

Bugfixes

Fixed a bug in TUDataset, in which node features were wrongly constructed whenever node_attributes only hold a single feature (e.g., in PROTEINS) (#5441)
Fixed a bug in the VirtualNode transform, in which node features were mistakenly treated as edge features (#5819)
Fixed a bug when applying several scalers with PNAConv (#5514)
Fixed setter and getter handling in BaseStorage (#5815)
Fixed the auto_select_device routine in GraphGym for pytorch_lightning>=1.7 (#5677)
Fixed RandomLinkSplit in case there aren't enough negative edges to sample (#5642)
Fixed the in-place modification to mode_kwargs in MultiAggregation (#5601)
Fixed the utils.to_dense_adj routine in case edge_index is empty (#5476)
Fixed the PointTransformerConv to now correctly use sum aggregation (#5332)
Fixed the output of InMemoryDataset.num_classes in case a transform modifies data.y (#5274)
Fail gracefully on GLIBC errors within torch-spline-conv (#5276)

Full Changelog

Added

Extended GNNExplainer to support edge level explanations (#6056)
Added CPU affinitization for NodeLoader (#6005)
Added triplet sampling in LinkNeighborLoader (#6004)
Added FusedAggregation of simple scatter reductions (#6036)
Added a to_smiles function (#6038)
Added option to make normalization coefficients trainable in PNAConv (#6039)
Added semi_grad option in VarAggregation and StdAggregation (#6042)
Allow for fused aggregations in MultiAggregation (#6036, #6040)
Added HeteroData support for to_captum_model and added to_captum_input (#5934)
Added HeteroData support in RandomNodeLoader (#6007)
Added bipartite GraphSAGE example (#5834)
Added LRGBDataset to include 5 datasets from the Long Range Graph Benchmark (#5935)
Added a warning for invalid node and edge type names in HeteroData (#5990)
Added PyTorch 1.13 support (#5975)
Added int32 support in NeighborLoader (#5948)
Add dgNN support and FusedGATConv implementation (#5140)
Added lr_scheduler_solver and customized lr_scheduler classes (#5942)
Add to_fixed_size graph transformer (#5939)
Add support for symbolic tracing of SchNet model (#5938)
Add support for customizable interaction graph in SchNet model (#5919)
Started adding torch.sparse support to PyG (#5906, #5944, #6003)
Added HydroNet water cluster dataset (#5537, #5902, #5903)
Added explainability support for heterogeneous GNNs (#5886)
Added SparseTensor support to SuperGATConv (#5888)
Added TorchScript support for AttentiveFP (#5868)
Added num_steps argument to training and inference benchmarks (#5898)
Added torch.onnx.export support (#5877, #5997)
Enable VTune ITT in inference and training benchmarks (#5830, #5878)
Add training benchmark (#5774)
Added a "Link Prediction on MovieLens" Colab notebook (#5823)
Added custom sampler support in LightningDataModule (#5820)
Added a return_semantic_attention_weights argument to HANConv (#5787)
Added disjoint argument to NeighborLoader and LinkNeighborLoader (#5775)
Added support for input_time in NeighborLoader (#5763)
Added disjoint mode for temporal LinkNeighborLoader (#5717)
Added HeteroData support for transforms.Constant (#5700)
Added np.memmap support in NeighborLoader (#5696)
Added assortativity that computes degree assortativity coefficient (#5587)
Added SSGConv layer (#5599)
Added shuffle_node, mask_feature and add_random_edge augmentation methods (#5548)
Added dropout_path augmentation that drops edges from a graph based on random walks (#5531)
Add support for filling labels with dummy values in HeteroData.to_homogeneous() (#5540)
Added temporal_strategy option to neighbor_sample (#5576)
Added torch_geometric.sampler package to docs (#5563)
Added the DGraphFin dynamic graph dataset (#5504)
Added dropout_edge augmentation that randomly drops edges from a graph - the usage of dropout_adj is now deprecated (#5495)
Added dropout_node augmentation that randomly drops nodes from a graph (#5481)
Added AddRandomMetaPaths that adds edges based on random walks along a metapath (#5397)
Added WLConvContinuous for performing WL refinement with continuous attributes (#5316)
Added print_summary method for the torch_geometric.data.Dataset interface (#5438)
Added sampler support to LightningDataModule (#5456, #5457)
Added official splits to MalNetTiny dataset (#5078)
Added IndexToMask and MaskToIndex transforms (#5375, #5455)
Added FeaturePropagation transform (#5387)
Added PositionalEncoding (#5381)
Consolidated sampler routines behind torch_geometric.sampler, enabling ease of extensibility in the future (#5312, #5365, #5402, #5404, #5418)
Added pyg-lib neighbor sampling (#5384, #5388)
Added pyg_lib.segment_matmul integration within HeteroLinear (#5330, #5347)
Enabled bf16 support in benchmark scripts (#5293, #5341)
Added Aggregation.set_validate_args option to skip validation of dim_size (#5290)
Added SparseTensor support to inference and training benchmark suite (#5242, #5258, #5881)
Added experimental mode in inference benchmarks (#5254)
Added node classification example instrumented with Weights and Biases (W&B) logging and W&B Sweeps (#5192)
Added experimental mode for utils.scatter (#5232, #5241, #5386)
Added missing test labels in HGBDataset (#5233)
Added BaseStorage.get() functionality (#5240)
Added a test to confirm that to_hetero works with SparseTensor (#5222)
Added torch_geometric.explain module with base functionality for explainability methods (#5804, #6054, #6089)

Changed

Moved and adapted GNNExplainer from torch_geometric.nn to torch_geometric.explain.algorithm (#5967, #6065)
Optimized scatter implementations for CPU/GPU, both with and without backward computation (#6051, #6052)
Support temperature value in dense_mincut_pool (#5908)
Fixed a bug in which VirtualNode mistakenly treated node features as edge features (#5819)
Fixed setter and getter handling in BaseStorage (#5815)
Fixed path in hetero_conv_dblp.py example (#5686)
Fix auto_select_device routine in GraphGym for PyTorch Lightning>=1.7 (#5677)
Support in_channels with tuple in GENConv for bipartite message passing (#5627, #5641)
Handle cases of not having enough possible negative edges in RandomLinkSplit (#5642)
Fix RGCN+pyg-lib for LongTensor input (#5610)
Improved type hint support (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5673, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852)
Avoid modifying mode_kwargs in MultiAggregation (#5601)
Changed BatchNorm to allow for batches of size one during training (#5530, #5614)
Integrated better temporal sampling support by requiring that local neighborhoods are sorted according to time (#5516, #5602)
Fixed a bug when applying several scalers with PNAConv (#5514)
Allow . in ParameterDict key names (#5494)
Renamed drop_unconnected_nodes to drop_unconnected_node_types and drop_orig_edges to drop_orig_edge_types in AddMetapaths (#5490)
Improved utils.scatter performance by explicitly choosing better implementation for add and mean reduction (#5399)
Fix to_dense_adj with empty edge_index (#5476)
The AttentionalAggregation module can now be applied to compute attention on a per-feature level (#5449)
Ensure equal lengths of num_neighbors across edge types in NeighborLoader (#5444)
Fixed a bug in TUDataset in which node features were wrongly constructed whenever node_attributes only hold a single feature (e.g., in PROTEINS) (#5441)
Breaking change: removed num_neighbors as an attribute of loader (#5404)
ASAPooling is now jittable (#5395)
Updated unsupervised GraphSAGE example to leverage LinkNeighborLoader (#5317)
Replace in-place operations with out-of-place ones to align with torch.scatter_reduce API (#5353)
Breaking bugfix: PointTransformerConv now correctly uses sum aggregation (#5332)
Improve out-of-bounds error message in MessagePassing (#5339)
Allow file names of a Dataset to be specified as either a property or a method (#5338)
Fixed separating a list of SparseTensor within InMemoryDataset (#5299)
Improved name resolving of normalization layers (#5277)
Fail gracefully on GLIBC errors within torch-spline-conv (#5276)
Fixed Dataset.num_classes in case a transform modifies data.y (#5274)
Allow customization of the activation function within PNAConv (#5262)
Do not fill InMemoryDataset cache on dataset.num_features (#5264)
Changed tests relying on dblp datasets to instead use synthetic data (#5250)
Fixed a bug for the initialization of activation function examples in custom_graphgym (#5243)
Allow any integer tensors when checking edge_index input to message passing (#5281)

Removed

Removed scatter_reduce option from experimental mode (#5399)

Full commit list: 2.1.0...2.2.0
