
Commit fe337c5

[SE3Transformer/DGLPyT] 22.08 container update

Parent: 077c388
25 files changed: +497 / −294 lines

DGLPyTorch/DrugDiscovery/SE3Transformer/Dockerfile
Lines changed: 14 additions & 5 deletions

```diff
@@ -24,7 +24,7 @@
 # run docker daemon with --default-runtime=nvidia for GPU detection during build
 # multistage build for DGL with CUDA and FP16
 
-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.07-py3
+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:22.08-py3
 
 FROM ${FROM_IMAGE_NAME} AS dgl_builder
 
@@ -33,11 +33,19 @@ RUN apt-get update \
     && apt-get install -y git build-essential python3-dev make cmake \
     && rm -rf /var/lib/apt/lists/*
 WORKDIR /dgl
-RUN git clone --branch v0.7.0 --recurse-submodules --depth 1 https://github.com/dmlc/dgl.git .
-RUN sed -i 's/"35 50 60 70"/"60 70 80"/g' cmake/modules/CUDA.cmake
+RUN git clone --branch 0.9.0 --recurse-submodules --depth 1 https://github.com/dmlc/dgl.git .
 WORKDIR build
-RUN cmake -DUSE_CUDA=ON -DUSE_FP16=ON ..
-RUN make -j8
+RUN export NCCL_ROOT=/usr \
+    && cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release \
+             -DUSE_CUDA=ON -DCUDA_ARCH_BIN="60 70 80" -DCUDA_ARCH_PTX="80" \
+             -DCUDA_ARCH_NAME="Manual" \
+             -DUSE_FP16=ON \
+             -DBUILD_TORCH=ON \
+             -DUSE_NCCL=ON \
+             -DUSE_SYSTEM_NCCL=ON \
+             -DBUILD_WITH_SHARED_NCCL=ON \
+             -DUSE_AVX=ON \
+    && cmake --build .
 
 
 FROM ${FROM_IMAGE_NAME}
@@ -49,6 +57,7 @@ COPY --from=dgl_builder /dgl ./dgl
 RUN cd dgl/python && python setup.py install && cd ../.. && rm -rf dgl
 
 ADD requirements.txt .
+RUN pip install --no-cache-dir --upgrade --pre pip
 RUN pip install --no-cache-dir -r requirements.txt
 ADD . .
```
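For context, the rebuilt `dgl_builder` stage now compiles DGL 0.9.0 with CUDA kernels for SM 6.0/7.0/8.0 (plus PTX for 8.0), FP16 support, Torch bindings, and the system NCCL shipped in the NGC base image. A minimal sketch for smoke-testing the resulting image is shown below; the `se3-transformer` image tag and the script itself are illustrative assumptions, not part of this commit:

```python
# check_dgl.py - hypothetical smoke test, run inside the built container, e.g.:
#   docker run --rm --gpus all se3-transformer python check_dgl.py
import dgl    # built from the dmlc/dgl 0.9.0 branch in the dgl_builder stage
import torch

print("DGL:", dgl.__version__)                 # expect a 0.9.x version string
print("CUDA available:", torch.cuda.is_available())

# Move a random graph to the GPU to confirm the CUDA build works end to end.
g = dgl.rand_graph(num_nodes=10, num_edges=20).to("cuda")
print(g.device)                                # expect cuda:0
```

If the image was built with GPU detection enabled (see the `--default-runtime=nvidia` note at the top of the Dockerfile), this should print a 0.9.x version and place the toy graph on `cuda:0`.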

DGLPyTorch/DrugDiscovery/SE3Transformer/LICENSE
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-Copyright 2021 NVIDIA CORPORATION & AFFILIATES
+Copyright 2021-2022 NVIDIA CORPORATION & AFFILIATES
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 
```
DGLPyTorch/DrugDiscovery/SE3Transformer/NOTICE
Lines changed: 1 addition & 0 deletions

```diff
@@ -1,3 +1,4 @@
+
 SE(3)-Transformer PyTorch
 
 This repository includes software from https://github.com/FabianFuchsML/se3-transformer-public
```

DGLPyTorch/DrugDiscovery/SE3Transformer/README.md
Lines changed: 51 additions & 47 deletions

```diff
@@ -161,11 +161,11 @@ Competitive training results and analysis are provided for the following hyperpa
 
 This model supports the following features::
 
-| Feature | SE(3)-Transformer
-|-----------------------|--------------------------
-|Automatic mixed precision (AMP) | Yes
-|Distributed data parallel (DDP) | Yes
-
+| Feature                         | SE(3)-Transformer |
+|---------------------------------|-------------------|
+| Automatic mixed precision (AMP) | Yes               |
+| Distributed data parallel (DDP) | Yes               |
+
 #### Features
 
 
@@ -476,20 +476,20 @@ The following sections provide details on how we achieved our performance and ac
 
 Our results were obtained by running the `scripts/train.sh` training script in the PyTorch 21.07 NGC container on NVIDIA DGX A100 (8x A100 80GB) GPUs.
 
-| GPUs | Batch size / GPU | Absolute error - TF32 | Absolute error - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (mixed precision to TF32) |
-|:------------------:|:----------------------:|:--------------------:|:------------------------------------:|:---------------------------------:|:----------------------:|:----------------------------------------------:|
-| 1 | 240 | 0.03456 | 0.03460 | 1h23min | 1h03min | 1.32x |
-| 8 | 240 | 0.03417 | 0.03424 | 15min | 12min | 1.25x |
+| GPUs | Batch size / GPU | Absolute error - TF32 | Absolute error - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (mixed precision to TF32) |
+|:----:|:----------------:|:---------------------:|:--------------------------------:|:--------------------:|:-------------------------------:|:-----------------------------------------------:|
+| 1    | 240              | 0.03038               | 0.02987                          | 1h02min              | 50min                           | 1.24x                                           |
+| 8    | 240              | 0.03466               | 0.03436                          | 13min                | 10min                           | 1.27x                                           |
 
 
 ##### Training accuracy: NVIDIA DGX-1 (8x V100 16GB)
 
 Our results were obtained by running the `scripts/train.sh` training script in the PyTorch 21.07 NGC container on NVIDIA DGX-1 with (8x V100 16GB) GPUs.
 
-| GPUs | Batch size / GPU | Absolute error - FP32 | Absolute error - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (mixed precision to FP32) |
-|:------------------:|:----------------------:|:--------------------:|:------------------------------------:|:---------------------------------:|:----------------------:|:----------------------------------------------:|
-| 1 | 240 | 0.03432 | 0.03439 | 2h25min | 1h33min | 1.56x |
-| 8 | 240 | 0.03380 | 0.03495 | 29min | 20min | 1.45x |
+| GPUs | Batch size / GPU | Absolute error - FP32 | Absolute error - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (mixed precision to FP32) |
+|:----:|:----------------:|:---------------------:|:--------------------------------:|:--------------------:|:-------------------------------:|:-----------------------------------------------:|
+| 1    | 240              | 0.03044               | 0.03076                          | 2h07min              | 1h22min                         | 1.55x                                           |
+| 8    | 240              | 0.03435               | 0.03495                          | 27min                | 19min                           | 1.42x                                           |
 
 
@@ -499,12 +499,12 @@ Our results were obtained by running the `scripts/train.sh` training script in t
 
 Our results were obtained by running the `scripts/benchmark_train.sh` and `scripts/benchmark_train_multi_gpu.sh` benchmarking scripts in the PyTorch 21.07 NGC container on NVIDIA DGX A100 with 8x A100 80GB GPUs. Performance numbers (in molecules per millisecond) were averaged over five entire training epochs after a warmup epoch.
 
-| GPUs | Batch size / GPU | Throughput - TF32 [mol/ms] | Throughput - mixed precision [mol/ms] | Throughput speedup (mixed precision - TF32) | Weak scaling - TF32 | Weak scaling - mixed precision |
-|:------------------:|:----------------------:|:--------------------:|:------------------------------------:|:---------------------------------:|:----------------------:|:----------------------------------------------:|
-| 1 | 240 | 2.21 | 2.92 | 1.32x | | |
-| 1 | 120 | 1.81 | 2.04 | 1.13x | | |
-| 8 | 240 | 15.88 | 21.02 | 1.32x | 7.18 | 7.20 |
-| 8 | 120 | 12.68 | 13.99 | 1.10x | 7.00 | 6.86 |
+| GPUs | Batch size / GPU | Throughput - TF32 [mol/ms] | Throughput - mixed precision [mol/ms] | Throughput speedup (mixed precision - TF32) | Weak scaling - TF32 | Weak scaling - mixed precision |
+|:----:|:----------------:|:--------------------------:|:-------------------------------------:|:-------------------------------------------:|:-------------------:|:------------------------------:|
+| 1    | 240              | 2.61                       | 3.35                                  | 1.28x                                       |                     |                                |
+| 1    | 120              | 1.94                       | 2.07                                  | 1.07x                                       |                     |                                |
+| 8    | 240              | 18.80                      | 23.90                                 | 1.27x                                       | 7.20                | 7.13                           |
+| 8    | 120              | 14.10                      | 14.52                                 | 1.03x                                       | 7.27                | 7.01                           |
 
 
 To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
@@ -514,12 +514,12 @@ To achieve these same results, follow the steps in the [Quick Start Guide](#quic
 
 Our results were obtained by running the `scripts/benchmark_train.sh` and `scripts/benchmark_train_multi_gpu.sh` benchmarking scripts in the PyTorch 21.07 NGC container on NVIDIA DGX-1 with 8x V100 16GB GPUs. Performance numbers (in molecules per millisecond) were averaged over five entire training epochs after a warmup epoch.
 
-| GPUs | Batch size / GPU | Throughput - FP32 [mol/ms] | Throughput - mixed precision [mol/ms] | Throughput speedup (FP32 - mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision |
-|:------------------:|:----------------------:|:--------------------:|:------------------------------------:|:---------------------------------:|:----------------------:|:----------------------------------------------:|
-| 1 | 240 | 1.25 | 1.88 | 1.50x | | |
-| 1 | 120 | 1.03 | 1.41 | 1.37x | | |
-| 8 | 240 | 8.68 | 12.75 | 1.47x | 6.94 | 6.78 |
-| 8 | 120 | 6.64 | 8.58 | 1.29x | 6.44 | 6.08 |
+| GPUs | Batch size / GPU | Throughput - FP32 [mol/ms] | Throughput - mixed precision [mol/ms] | Throughput speedup (FP32 - mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision |
+|:----:|:----------------:|:--------------------------:|:-------------------------------------:|:-------------------------------------------:|:-------------------:|:------------------------------:|
+| 1    | 240              | 1.33                       | 2.12                                  | 1.59x                                       |                     |                                |
+| 1    | 120              | 1.11                       | 1.45                                  | 1.31x                                       |                     |                                |
+| 8    | 240              | 9.32                       | 13.40                                 | 1.44x                                       | 7.01                | 6.32                           |
+| 8    | 120              | 6.90                       | 8.39                                  | 1.22x                                       | 6.21                | 5.79                           |
 
 
 To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
@@ -532,21 +532,21 @@ To achieve these same results, follow the steps in the [Quick Start Guide](#quic
 
 Our results were obtained by running the `scripts/benchmark_inference.sh` inferencing benchmarking script in the PyTorch 21.07 NGC container on NVIDIA DGX A100 with 1x A100 80GB GPU.
 
-FP16
+AMP
 
-| Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] |Latency 95% [ms] |Latency 99% [ms] |
-|:------------:|:------:|:-----:|:-----:|:-----:|:-----:|
-| 1600 | 11.60 | 140.94 | 138.29 | 140.12 | 386.40 |
-| 800 | 10.74 | 75.69 | 75.74 | 76.50 | 79.77 |
-| 400 | 8.86 | 45.57 | 46.11 | 46.60 | 49.97 |
+| Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
+|:----------:|:-----------------------:|:----------------:|:----------------:|:----------------:|:----------------:|
+| 1600       | 13.54                   | 121.44           | 118.07           | 119.00           | 366.64           |
+| 800        | 12.63                   | 64.11            | 63.78            | 64.37            | 68.19            |
+| 400        | 10.65                   | 37.97            | 39.02            | 39.67            | 42.87            |
 
 TF32
 
-| Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] |Latency 95% [ms] |Latency 99% [ms] |
-|:------------:|:------:|:-----:|:-----:|:-----:|:-----:|
-| 1600 | 8.58 | 189.20 | 186.39 | 187.71 | 420.28 |
-| 800 | 8.28 | 97.56 | 97.20 | 97.73 | 101.13 |
-| 400 | 7.55 | 53.38 | 53.72 | 54.48 | 56.62 |
+| Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
+|:----------:|:-----------------------:|:----------------:|:----------------:|:----------------:|:----------------:|
+| 1600       | 8.97                    | 180.85           | 178.31           | 178.92           | 375.33           |
+| 800        | 8.86                    | 90.76            | 90.77            | 91.11            | 92.96            |
+| 400        | 8.49                    | 47.42            | 47.65            | 48.15            | 50.74            |
 
 To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
@@ -556,21 +556,21 @@ To achieve these same results, follow the steps in the [Quick Start Guide](#quic
 
 Our results were obtained by running the `scripts/benchmark_inference.sh` inferencing benchmarking script in the PyTorch 21.07 NGC container on NVIDIA DGX-1 with 1x V100 16GB GPU.
 
-FP16
+AMP
 
-| Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] |Latency 95% [ms] |Latency 99% [ms] |
-|:------------:|:------:|:-----:|:-----:|:-----:|:-----:|
-| 1600 | 6.42 | 254.54 | 247.97 | 249.29 | 721.15 |
-| 800 | 6.13 | 132.07 | 131.90 | 132.70 | 140.15 |
-| 400 | 5.37 | 75.12 | 76.01 | 76.66 | 79.90 |
+| Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
+|:----------:|:-----------------------:|:----------------:|:----------------:|:----------------:|:----------------:|
+| 1600       | 6.59                    | 248.02           | 242.11           | 242.62           | 674.60           |
+| 800        | 6.38                    | 126.49           | 125.96           | 126.31           | 127.72           |
+| 400        | 5.90                    | 68.24            | 68.53            | 69.02            | 70.87            |
 
 FP32
 
-| Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] |Latency 95% [ms] |Latency 99% [ms] |
-|:------------:|:------:|:-----:|:-----:|:-----:|:-----:|
-| 1600 | 3.39 | 475.86 | 473.82 | 475.64 | 891.18 |
-| 800 | 3.36 | 239.17 | 240.64 | 241.65 | 243.70 |
-| 400 | 3.17 | 126.67 | 128.19 | 128.82 | 130.54 |
+| Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
+|:----------:|:-----------------------:|:----------------:|:----------------:|:----------------:|:----------------:|
+| 1600       | 3.33                    | 482.20           | 483.50           | 485.28           | 754.84           |
+| 800        | 3.35                    | 239.09           | 242.21           | 243.13           | 244.91           |
+| 400        | 3.27                    | 122.68           | 123.60           | 124.18           | 125.85           |
 
 
 To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
@@ -580,6 +580,10 @@ To achieve these same results, follow the steps in the [Quick Start Guide](#quic
 
 ### Changelog
 
+August 2022:
+- Slight performance improvements
+- Upgraded base container
+
 November 2021:
 - Improved low memory mode to give further 6x memory savings
 - Disabled W&B logging by default
```

DGLPyTorch/DrugDiscovery/SE3Transformer/se3_transformer/data_loading/data_module.py
Lines changed: 2 additions & 2 deletions

```diff
@@ -1,4 +1,4 @@
-# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Permission is hereby granted, free of charge, to any person obtaining a
 # copy of this software and associated documentation files (the "Software"),
@@ -18,7 +18,7 @@
 # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 # DEALINGS IN THE SOFTWARE.
 #
-# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES
+# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES
 # SPDX-License-Identifier: MIT
 
 import torch.distributed as dist
```

DGLPyTorch/DrugDiscovery/SE3Transformer/se3_transformer/data_loading/qm9.py
Lines changed: 2 additions & 2 deletions

```diff
@@ -1,4 +1,4 @@
-# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Permission is hereby granted, free of charge, to any person obtaining a
 # copy of this software and associated documentation files (the "Software"),
@@ -18,7 +18,7 @@
 # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 # DEALINGS IN THE SOFTWARE.
 #
-# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES
+# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES
 # SPDX-License-Identifier: MIT
 from typing import Tuple
 
```
DGLPyTorch/DrugDiscovery/SE3Transformer/se3_transformer/model/basis.py
Lines changed: 5 additions & 2 deletions

```diff
@@ -1,4 +1,4 @@
-# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Permission is hereby granted, free of charge, to any person obtaining a
 # copy of this software and associated documentation files (the "Software"),
@@ -18,7 +18,7 @@
 # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 # DEALINGS IN THE SOFTWARE.
 #
-# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES
+# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES
 # SPDX-License-Identifier: MIT
 
 
@@ -33,6 +33,9 @@
 
 from se3_transformer.runtime.utils import degree_to_dim
 
+torch._C._jit_set_profiling_executor(False)
+torch._C._jit_set_profiling_mode(False)
+
 
 @lru_cache(maxsize=None)
 def get_clebsch_gordon(J: int, d_in: int, d_out: int, device) -> Tensor:
```
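The two `torch._C` lines added above the Clebsch-Gordon helper switch TorchScript from the profiling executor to the legacy executor before any `@torch.jit.script` function in the module is compiled, which avoids the profile-then-recompile warmup on the first calls. A minimal sketch of the pattern, assuming a PyTorch release where these private switches still exist (they are internal APIs, not a stable contract):

```python
# Hedged sketch of the pattern basis.py relies on: flip the internal
# TorchScript switches *before* any scripted function runs, so PyTorch
# uses the legacy (non-profiling) executor and compiles each function once.
import torch

torch._C._jit_set_profiling_executor(False)
torch._C._jit_set_profiling_mode(False)

@torch.jit.script
def fused_scale_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Small elementwise chain: a candidate for fusion by the legacy fuser.
    return x * 2.0 + y

out = fused_scale_add(torch.ones(4), torch.zeros(4))
print(out)  # tensor([2., 2., 2., 2.])
```

Because the switches are process-global, basis.py flips them at import time, so every scripted helper below them is affected.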
