PyG 2.2.0: Accelerations and Scalability
We are excited to announce the release of PyG 2.2.
PyG 2.2 is the culmination of work from 78 contributors who have worked on features and bug-fixes for a total of over 320 commits since torch-geometric==2.1.0.
Highlights
pyg-lib Integration
We are proud to release and integrate pyg-lib==0.1.0 into PyG, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG (#5330, #5347, #5384, #5388).
You can install pyg-lib as described in our README.md:
pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
import pyg_lib

Once `pyg-lib` is installed, PyG picks it up automatically, e.g., to accelerate neighborhood sampling routines or heterogeneous GNN execution:
- `pyg-lib` provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and heavily improves upon the neighborhood sampling techniques previously used in PyG.
- `pyg-lib` provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types.
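Before relying on the accelerated code paths, it can be useful to confirm that the optional dependency is visible to the current Python environment. The following is a minimal sketch that uses only the standard library; the helper name `has_module` is hypothetical and is not part of PyG or `pyg-lib`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # `pyg_lib` is the import name used in the snippet above.
    print("pyg-lib available:", has_module("pyg_lib"))
```

The check only locates the module and does not import it, so it stays cheap even when the package is large.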
GraphStore and FeatureStore Abstractions
PyG 2.2 includes numerous primitives for scalable graph machine learning, enabling users to train GNNs on graphs far larger than their machine's available memory. It does so by introducing simple, extensible `FeatureStore` and `GraphStore` abstractions that plug directly into familiar PyG interfaces (see the accompanying tutorial).
feature_store = CustomFeatureStore()
feature_store['paper', 'x', None] = ...  # Add paper features
feature_store['author', 'x', None] = ...  # Add author features

graph_store = CustomGraphStore()
graph_store['edge', 'coo'] = ...  # Add edges in "COO" format

# `CustomGraphSampler` knows how to sample on `CustomGraphStore`:
graph_sampler = CustomGraphSampler(
    graph_store=graph_store,
    num_neighbors=[10, 20],
    ...
)

from torch_geometric.loader import NodeLoader
loader = NodeLoader(
    data=(feature_store, graph_store),
    node_sampler=graph_sampler,
    batch_size=20,
    input_nodes='paper',
)

for batch in loader:
    pass

Data loading and sampling routines are refactored and decomposed into `torch_geometric.loader` and `torch_geometric.sampler` modules, respectively (#5563, #5820, #5456, #5457, #5312, #5365, #5402, #5404, #5418).
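For readers unfamiliar with the terms used above, the sketch below illustrates the "COO" edge layout (two parallel lists of source and destination nodes) and what a per-hop fan-out such as `num_neighbors=[10, 20]` means conceptually. It is plain Python written for illustration only and is not how PyG or `pyg-lib` implement sampling; the function name `sample_neighbors` and the toy graph are assumptions.

```python
import random
from collections import defaultdict


def sample_neighbors(row, col, seeds, num_neighbors, seed=0):
    """Sample a multi-hop neighborhood from a graph in COO format.

    `row[i] -> col[i]` is the i-th edge. For every hop, up to `k` source
    nodes are drawn (without replacement) for each node in the frontier.
    Returns the set of visited nodes and the list of sampled edges.
    """
    rng = random.Random(seed)

    # Group source nodes by destination so lookups are cheap.
    incoming = defaultdict(list)
    for src, dst in zip(row, col):
        incoming[dst].append(src)

    visited = set(seeds)
    frontier = list(seeds)
    sampled_edges = []
    for k in num_neighbors:
        next_frontier = []
        for dst in frontier:
            candidates = incoming[dst]
            picked = rng.sample(candidates, min(k, len(candidates)))
            for src in picked:
                sampled_edges.append((src, dst))
                if src not in visited:
                    visited.add(src)
                    next_frontier.append(src)
        frontier = next_frontier
    return visited, sampled_edges


# Toy graph in COO format: two parallel lists of equal length.
row = [1, 2, 3, 4, 5, 5]
col = [0, 0, 0, 1, 1, 2]
nodes, edges = sample_neighbors(row, col, seeds=[0], num_neighbors=[2, 1])
```

Here the first hop keeps at most two of the three edges pointing at node 0, and the second hop keeps at most one incoming edge for each node reached in the first hop.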
Optimized and Fused Aggregations
PyG 2.2 further accelerates scatter aggregations on both CPU and GPU, with and without backward computation (requires torch>=1.12.0 and torch-scatter>=2.1.0) (#5232, #5241, #5353, #5386, #5399, #6051, #6052).
We also optimized the usage of nn.aggr.MultiAggregation by fusing the computation of multiple aggregations together (see here for more details) (#6036, #6040).
Here are some benchmarking results on PyTorch 1.12 (summed over 1000 runs):
| Aggregators | Vanilla | Fusion |
|---|---|---|
| [sum, mean] | 0.3325s | 0.1996s |
| [sum, mean, min, max] | 0.7139s | 0.5037s |
| [sum, mean, var] | 0.6849s | 0.3871s |
| [sum, mean, var, std] | 1.0955s | 0.3973s |
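One way fusion can save work is by sharing intermediate results between reductions: `mean`, `var`, and `std` can all be derived from per-group counts, sums, and sums of squares gathered in a single pass. The pure-Python sketch below illustrates that idea only; it is not the `MultiAggregation` implementation, and the function name `fused_aggregate` is hypothetical.

```python
import math
from collections import defaultdict


def fused_aggregate(values, index):
    """Compute sum, mean, var, and std per group in a single pass.

    `index[i]` is the group that `values[i]` belongs to. The variance is
    the population variance, E[x^2] - E[x]^2.
    """
    count = defaultdict(int)
    total = defaultdict(float)
    total_sq = defaultdict(float)
    for v, g in zip(values, index):
        count[g] += 1
        total[g] += v
        total_sq[g] += v * v

    out = {}
    for g in count:
        mean = total[g] / count[g]
        # Clamp at zero to guard against tiny negative rounding errors.
        var = max(total_sq[g] / count[g] - mean * mean, 0.0)
        out[g] = {"sum": total[g], "mean": mean, "var": var, "std": math.sqrt(var)}
    return out


result = fused_aggregate([1.0, 3.0, 2.0, 2.0, 2.0], [0, 0, 1, 1, 1])
```

Computing the four statistics separately would traverse the inputs once per aggregator, whereas the shared pass touches each value once.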
Lastly, we have incorporated "fused" GNN operators via the dgNN package, starting with a FusedGATConv implementation (#5140).
Community Sprint: Type Hints and TorchScript Support
We are running regular community sprints to get our community more involved in building PyG. Whether you are just beginning to use graph learning or have been leveraging GNNs in research or production, the community sprints welcome members of all levels with different types of projects.
We had our first community sprint on 10/12 to fully-incorporate type hints and TorchScript support over the entire code base. The goal was to improve usability and cleanliness of our codebase. We had 20 contributors participating, contributing to 120 type hints within 2 weeks, adding around 2400 lines of code (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5673, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852).
Explainability
Our second community sprint began on 11/15 with the goal to improve the explainability capabilities of PyG. With this, we introduce the torch_geometric.explain module to provide a unified set of tools to explain the predictions of a PyG model or to explain the underlying phenomenon of a dataset.
Some of the features developed in the sprint are incorporated into this release:
- Added the `torch_geometric.explain` module (#5804, #6054, #6089)
- Moved and adapted the `GNNExplainer` module to `torch_geometric.explain` (#5967, #6065). See here and here for the accompanying examples.
- Extended `GNNExplainer` to support edge level explanations (#6056)
- Added explainability support for heterogeneous GNNs via `to_captum_model` and `to_captum_input` (#5886, #5934)
data = HeteroData(...)
model = HeteroGNN(...)

# Explain predictions on heterogeneous graphs for output node 10:
captum_model = to_captum_model(model, mask_type, output_idx, metadata)
inputs, additional_forward_args = to_captum_input(data.x_dict, data.edge_index_dict, mask_type)

ig = IntegratedGradients(captum_model)
ig_attr = ig.attribute(
    inputs=inputs,
    target=int(y[output_idx]),
    additional_forward_args=additional_forward_args,
    internal_batch_size=1,
)

Breaking Changes
- Renamed `drop_unconnected_nodes` to `drop_unconnected_node_types` and `drop_orig_edges` to `drop_orig_edge_types` in `AddMetapaths` (#5490)
Deprecations
- The usage of `nn.models.GNNExplainer` is now deprecated in favor of `explain.GNNExplainer`
- The usage of `utils.dropout_adj` is now deprecated in favor of `utils.dropout_edge`
- The usage of `loader.RandomNodeSampler` is now deprecated in favor of `loader.RandomNodeLoader`
- The usage of `to_captum` is now deprecated in favor of `to_captum_model`
Features
Layers, Models and Examples
- Added a "Link Prediction on MovieLens" Colab notebook (#5823)
- Added a bipartite link-prediction example (#5834)
- Added the `SSGConv` layer (#5599)
- Added the `WLConvContinuous` layer for performing WL-refinement with continuous attributes (#5316)
- Added the `PositionalEncoding` module (#5381)
- Added a node classification example instrumented with Weights and Biases (#5192)
Data Loaders
- Added support for triplet sampling in `LinkNeighborLoader` (#6004)
- Added `temporal_strategy = uniform/last` option to `NeighborLoader` and `LinkNeighborLoader` (#5576)
- Added a `disjoint` option to `NeighborLoader` and `LinkNeighborLoader` (#5717, #5775)
- Added `HeteroData` support in `RandomNodeLoader` (#6007)
- Added `int32`-based `edge_index` support in `NeighborLoader` (#5948)
- Added support for `input_time` in `NeighborLoader` (#5763)
- Added `np.memmap` support in `NeighborLoader` (#5696)
- Added CPU affinitization support to `NeighborLoader` (#6005)
Transformations
- Added a `FeaturePropagation` transform (#5387)
- Added `IndexToMask` and `MaskToIndex` transforms (#5375, #5455)
- Added `shuffle_node`, `mask_feature` and `add_random_edge` augmentations (#5548)
- Added `dropout_node`, `dropout_edge` and `dropout_path` augmentations (#5481, #5495, #5531)
- Added an `AddRandomMetaPaths` transform that adds edges based on random walks along a metapath (#5397)
- Added a `utils.to_smiles` function (#6038)
- Added `HeteroData` support for `transforms.Constant` (#5700)
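As background for the edge dropout augmentation listed above: conceptually, it removes each edge independently with some probability `p`. The sketch below illustrates that idea in plain Python on a COO edge list; it is not the `utils.dropout_edge` implementation, and the helper name `drop_edges` and its return values are assumptions made for illustration.

```python
import random


def drop_edges(row, col, p=0.5, seed=None):
    """Drop each edge independently with probability `p`.

    Returns the surviving edges as two lists plus a boolean mask that marks
    which of the original edges were kept.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"Dropout probability must be in [0, 1], got {p}")
    rng = random.Random(seed)
    # `rng.random()` lies in [0, 1), so p=0 keeps all edges and p=1 drops all.
    keep = [rng.random() >= p for _ in row]
    kept_row = [r for r, k in zip(row, keep) if k]
    kept_col = [c for c, k in zip(col, keep) if k]
    return kept_row, kept_col, keep
```

Returning the mask alongside the surviving edges lets a caller filter per-edge attributes consistently with the dropped edges.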
Datasets
- Added the `LRGBDataset` to include 5 datasets from the Long Range Graph Benchmark (#5935)
- Added the `HydroNet` water cluster dataset (#5537, #5902, #5903)
- Added the `DGraphFin` dynamic graph dataset (#5504)
- Added the official splits to the `MalNetTiny` dataset (#5078)
- Added a `print_summary` method to `torch_geometric.data.Dataset` (#5438)
General Improvements
- Added training and inference benchmark scripts (#5774, #5830, #5878, #5293, #5341, #5242, #5258, #5881, #5254)
- Added the `utils.assortativity` function to compute the degree assortativity coefficient (#5587)
- Added support for filling labels with dummy values in `HeteroData.to_homogeneous()` (#5540)
- Added `torch.onnx.export` support (see here for an example) (#5877, #5997)
- Added option to make normalization coefficients trainable in `PNAConv` (#6039)
- Added a `semi_grad` option in `VarAggregation` and `StdAggregation` (#6042)
- Added a warning for invalid node and edge type names in `HeteroData` (#5990)
- Added `lr_scheduler_solver` and customized `lr_scheduler` classes (#5942)
- Added `to_fixed_size` graph transformer (#5939)
- Added support for symbolic tracing in the `SchNet` model (#5938)
- Added support for customizing the interaction graph in the `SchNet` model (#5919)
- Added `SparseTensor` support to `SuperGATConv` (#5888)
- Added TorchScript support for `AttentiveFP` (#5868)
- Added a `return_semantic_attention_weights` argument to `HANConv` (#5787)
- Added temperature value customization in `dense_mincut_pool` (#5908)
- Added support for a tuple of `in_channels` in `GENConv` for bipartite message passing (#5627, #5641)
- Added `Aggregation.set_validate_args` option to skip validation of `dim_size` (#5290)
- Added `BaseStorage.get()` functionality (#5240)
- Added support for batches of size one in `BatchNorm` (#5530, #5614)
- The `AttentionalAggregation` module can now be applied to compute attention on a per-feature level (#5449)
- Added TorchScript support to `ASAPooling` (#5395)
- Updated the unsupervised `GraphSAGE` example to leverage `LinkNeighborLoader` (#5317)
- Added better out-of-bounds error message in `MessagePassing` (#5339)
- Added support to customize the activation function in `PNAConv` (#5262)
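As background for the `utils.assortativity` entry above: the degree assortativity coefficient is the Pearson correlation between the degrees of the two endpoints of each edge. The following plain-Python sketch computes it for an undirected graph stored as a bidirectional COO edge list; it is an illustration under that assumption rather than PyG's implementation, and the function name `degree_assortativity` is hypothetical.

```python
import math
from collections import Counter


def degree_assortativity(row, col):
    """Pearson correlation of endpoint degrees over all edges.

    Assumes an undirected graph in which every edge appears in both
    directions, so the degree of a node is its number of outgoing edges.
    """
    degree = Counter(row)
    x = [degree[r] for r in row]  # degree of each edge's source
    y = [degree[c] for c in col]  # degree of each edge's target
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when all degrees are equal
    return cov / math.sqrt(var_x * var_y)


# Star graph: the hub (degree 3) only connects to leaves (degree 1).
row = [0, 1, 0, 2, 0, 3]
col = [1, 0, 2, 0, 3, 0]
r = degree_assortativity(row, col)
```

A star graph is maximally disassortative, because every edge joins the high-degree hub to a low-degree leaf, so the coefficient sits at the negative end of its range.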
Bugfixes
- Fixed a bug in `TUDataset`, in which node features were wrongly constructed whenever `node_attributes` only hold a single feature (e.g., in `PROTEINS`) (#5441)
- Fixed a bug in the `VirtualNode` transform, in which node features were mistakenly treated as edge features (#5819)
- Fixed a bug when applying several scalers with `PNAConv` (#5514)
- Fixed `setter` and `getter` handling in `BaseStorage` (#5815)
- Fixed the `auto_select_device` routine in GraphGym for `pytorch_lightning>=1.7` (#5677)
- Fixed `RandomLinkSplit` in case there aren't enough negative edges to sample (#5642)
- Fixed the in-place modification to `mode_kwargs` in `MultiAggregation` (#5601)
- Fixed the `utils.to_dense_adj` routine in case `edge_index` is empty (#5476)
- Fixed the `PointTransformerConv` to now correctly use `sum` aggregation (#5332)
- Fixed the output of `InMemoryDataset.num_classes` in case a `transform` modifies `data.y` (#5274)
- Fail gracefully on `GLIBC` errors within `torch-spline-conv` (#5276)
Full Changelog
Added
- Extended `GNNExplainer` to support edge level explanations (#6056)
- Added CPU affinitization for `NodeLoader` (#6005)
- Added triplet sampling in `LinkNeighborLoader` (#6004)
- Added `FusedAggregation` of simple scatter reductions (#6036)
- Added a `to_smiles` function (#6038)
- Added option to make normalization coefficients trainable in `PNAConv` (#6039)
- Added `semi_grad` option in `VarAggregation` and `StdAggregation` (#6042)
- Allow for fused aggregations in `MultiAggregation` (#6036, #6040)
- Added `HeteroData` support for `to_captum_model` and added `to_captum_input` (#5934)
- Added `HeteroData` support in `RandomNodeLoader` (#6007)
- Added bipartite `GraphSAGE` example (#5834)
- Added `LRGBDataset` to include 5 datasets from the Long Range Graph Benchmark (#5935)
- Added a warning for invalid node and edge type names in `HeteroData` (#5990)
- Added PyTorch 1.13 support (#5975)
- Added `int32` support in `NeighborLoader` (#5948)
- Add `dgNN` support and `FusedGATConv` implementation (#5140)
- Added `lr_scheduler_solver` and customized `lr_scheduler` classes (#5942)
- Add `to_fixed_size` graph transformer (#5939)
- Add support for symbolic tracing of `SchNet` model (#5938)
- Add support for customizable interaction graph in `SchNet` model (#5919)
- Started adding `torch.sparse` support to PyG (#5906, #5944, #6003)
- Added `HydroNet` water cluster dataset (#5537, #5902, #5903)
- Added explainability support for heterogeneous GNNs (#5886)
- Added `SparseTensor` support to `SuperGATConv` (#5888)
- Added TorchScript support for `AttentiveFP` (#5868)
- Added `num_steps` argument to training and inference benchmarks (#5898)
- Added `torch.onnx.export` support (#5877, #5997)
- Enable VTune ITT in inference and training benchmarks (#5830, #5878)
- Add training benchmark (#5774)
- Added a "Link Prediction on MovieLens" Colab notebook (#5823)
- Added custom `sampler` support in `LightningDataModule` (#5820)
- Added a `return_semantic_attention_weights` argument to `HANConv` (#5787)
- Added `disjoint` argument to `NeighborLoader` and `LinkNeighborLoader` (#5775)
- Added support for `input_time` in `NeighborLoader` (#5763)
- Added `disjoint` mode for temporal `LinkNeighborLoader` (#5717)
- Added `HeteroData` support for `transforms.Constant` (#5700)
- Added `np.memmap` support in `NeighborLoader` (#5696)
- Added `assortativity` that computes degree assortativity coefficient (#5587)
- Added `SSGConv` layer (#5599)
- Added `shuffle_node`, `mask_feature` and `add_random_edge` augmentation methods (#5548)
- Added `dropout_path` augmentation that drops edges from a graph based on random walks (#5531)
- Add support for filling labels with dummy values in `HeteroData.to_homogeneous()` (#5540)
- Added `temporal_strategy` option to `neighbor_sample` (#5576)
- Added `torch_geometric.sampler` package to docs (#5563)
- Added the `DGraphFin` dynamic graph dataset (#5504)
- Added `dropout_edge` augmentation that randomly drops edges from a graph - the usage of `dropout_adj` is now deprecated (#5495)
- Added `dropout_node` augmentation that randomly drops nodes from a graph (#5481)
- Added `AddRandomMetaPaths` that adds edges based on random walks along a metapath (#5397)
- Added `WLConvContinuous` for performing WL refinement with continuous attributes (#5316)
- Added `print_summary` method for the `torch_geometric.data.Dataset` interface (#5438)
- Added `sampler` support to `LightningDataModule` (#5456, #5457)
- Added official splits to `MalNetTiny` dataset (#5078)
- Added `IndexToMask` and `MaskToIndex` transforms (#5375, #5455)
- Added `FeaturePropagation` transform (#5387)
- Added `PositionalEncoding` (#5381)
- Consolidated sampler routines behind `torch_geometric.sampler`, enabling ease of extensibility in the future (#5312, #5365, #5402, #5404, #5418)
- Added `pyg-lib` neighbor sampling (#5384, #5388)
- Added `pyg_lib.segment_matmul` integration within `HeteroLinear` (#5330, #5347)
- Enabled `bf16` support in benchmark scripts (#5293, #5341)
- Added `Aggregation.set_validate_args` option to skip validation of `dim_size` (#5290)
- Added `SparseTensor` support to inference and training benchmark suite (#5242, #5258, #5881)
- Added experimental mode in inference benchmarks (#5254)
- Added node classification example instrumented with Weights and Biases (W&B) logging and W&B Sweeps (#5192)
- Added experimental mode for `utils.scatter` (#5232, #5241, #5386)
- Added missing test labels in `HGBDataset` (#5233)
- Added `BaseStorage.get()` functionality (#5240)
- Added a test to confirm that `to_hetero` works with `SparseTensor` (#5222)
- Added `torch_geometric.explain` module with base functionality for explainability methods (#5804, #6054, #6089)
Changed
- Moved and adapted `GNNExplainer` from `torch_geometric.nn` to `torch_geometric.explain.algorithm` (#5967, #6065)
- Optimized scatter implementations for CPU/GPU, both with and without backward computation (#6051, #6052)
- Support temperature value in `dense_mincut_pool` (#5908)
- Fixed a bug in which `VirtualNode` mistakenly treated node features as edge features (#5819)
- Fixed `setter` and `getter` handling in `BaseStorage` (#5815)
- Fixed `path` in `hetero_conv_dblp.py` example (#5686)
- Fix `auto_select_device` routine in GraphGym for PyTorch Lightning>=1.7 (#5677)
- Support `in_channels` with `tuple` in `GENConv` for bipartite message passing (#5627, #5641)
- Handle cases of not having enough possible negative edges in `RandomLinkSplit` (#5642)
- Fix `RGCN+pyg-lib` for `LongTensor` input (#5610)
- Improved type hint support (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5673, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852)
- Avoid modifying `mode_kwargs` in `MultiAggregation` (#5601)
- Changed `BatchNorm` to allow for batches of size one during training (#5530, #5614)
- Integrated better temporal sampling support by requiring that local neighborhoods are sorted according to time (#5516, #5602)
- Fixed a bug when applying several scalers with `PNAConv` (#5514)
- Allow `.` in `ParameterDict` key names (#5494)
- Renamed `drop_unconnected_nodes` to `drop_unconnected_node_types` and `drop_orig_edges` to `drop_orig_edge_types` in `AddMetapaths` (#5490)
- Improved `utils.scatter` performance by explicitly choosing better implementation for `add` and `mean` reduction (#5399)
- Fix `to_dense_adj` with empty `edge_index` (#5476)
- The `AttentionalAggregation` module can now be applied to compute attention on a per-feature level (#5449)
- Ensure equal lengths of `num_neighbors` across edge types in `NeighborLoader` (#5444)
- Fixed a bug in `TUDataset` in which node features were wrongly constructed whenever `node_attributes` only hold a single feature (e.g., in `PROTEINS`) (#5441)
- Breaking change: removed `num_neighbors` as an attribute of loader (#5404)
- `ASAPooling` is now jittable (#5395)
- Updated unsupervised `GraphSAGE` example to leverage `LinkNeighborLoader` (#5317)
- Replace in-place operations with out-of-place ones to align with `torch.scatter_reduce` API (#5353)
- Breaking bugfix: `PointTransformerConv` now correctly uses `sum` aggregation (#5332)
- Improve out-of-bounds error message in `MessagePassing` (#5339)
- Allow file names of a `Dataset` to be specified as either a property or a method (#5338)
- Fixed separating a list of `SparseTensor` within `InMemoryDataset` (#5299)
- Improved name resolving of normalization layers (#5277)
- Fail gracefully on `GLIBC` errors within `torch-spline-conv` (#5276)
- Fixed `Dataset.num_classes` in case a `transform` modifies `data.y` (#5274)
- Allow customization of the activation function within `PNAConv` (#5262)
- Do not fill `InMemoryDataset` cache on `dataset.num_features` (#5264)
- Changed tests relying on `dblp` datasets to instead use synthetic data (#5250)
- Fixed a bug for the initialization of activation function examples in `custom_graphgym` (#5243)
- Allow any integer tensors when checking `edge_index` input to message passing (#5281)
Removed
- Removed `scatter_reduce` option from experimental mode (#5399)
Full commit list: 2.1.0...2.2.0

