* [PyTorch 19.05-py3 NGC container](https://ngc.nvidia.com/registry/nvidia-pytorch) or newer
* [NVIDIA Volta based GPU](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/)
## Training accuracy results
Our results were obtained by running the `./platform/train_{tacotron2,waveglow}_{FP16,FP32}_DGX1_16GB_8GPU.sh`
training script in the PyTorch-19.05-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.
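
The brace pattern in the script name expands to one script per model and precision combination. As a purely illustrative Python sketch (not part of this repository), the expansions can be enumerated and, assuming a checkout of the repository inside the NGC container, launched:

```python
# Illustrative sketch: expand ./platform/train_{tacotron2,waveglow}_{FP16,FP32}_DGX1_16GB_8GPU.sh.
# Script names are taken from the text above; running them (commented out)
# assumes you execute from the repository root inside the NGC container.
import itertools
import subprocess  # used only by the commented-out launch below

for model, precision in itertools.product(["tacotron2", "waveglow"], ["FP16", "FP32"]):
    script = f"./platform/train_{model}_{precision}_DGX1_16GB_8GPU.sh"
    print(script)  # e.g. ./platform/train_tacotron2_FP16_DGX1_16GB_8GPU.sh
    # subprocess.run([script], check=True)
```
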
All of the results were produced using the `train.py` script as described in the
[Training process](#training-process) section of this document.

WaveGlow FP32 loss - batch size 4 (mean and std over 16 runs)

## Training performance results
Our results were obtained by running the `./platform/train_{tacotron2,waveglow}_{FP16,FP32}_DGX1_16GB_8GPU.sh`
training script in the PyTorch-19.05-py3 NGC container on NVIDIA DGX-1 with
8x V100 16G GPUs. Performance numbers (in input tokens per second for
Tacotron 2 and output samples per second for WaveGlow) were averaged over
an entire training epoch.
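
As a rough sketch of how such a throughput metric can be computed (an illustration with hypothetical stand-ins, not the repository's actual instrumentation), it is the number of items processed divided by the wall-clock time of the epoch:

```python
# Illustrative sketch: average training throughput over one epoch.
# `loader`, `train_step`, and `items_in` are hypothetical stand-ins for a
# data loader, an optimization step, and a per-batch item count
# (input tokens for Tacotron 2, output audio samples for WaveGlow).
import time

def epoch_throughput(loader, train_step, items_in):
    """Return items processed per second, averaged over a full epoch."""
    total_items = 0
    start = time.perf_counter()
    for batch in loader:
        train_step(batch)
        total_items += items_in(batch)
    return total_items / (time.perf_counter() - start)
```
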
The following table shows the results for Tacotron 2 for mixed precision and FP32 training.

|Number of GPUs|Mixed precision tokens/sec|FP32 tokens/sec|Speed-up with mixed precision|Multi-GPU weak scaling with mixed precision|Multi-GPU weak scaling with FP32|
|---:|---:|---:|---:|---:|---:|
|**1**|2,554|1,740|1.47|1.00|1.00|
|**4**|7,768|5,683|1.37|3.04|3.27|
|**8**|12,524|10,484|1.19|4.90|6.03|
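
The derived columns follow from the raw throughputs: speed-up divides the mixed-precision rate by the FP32 rate, and weak scaling divides each rate by its single-GPU value. A quick check with the numbers above:

```python
# Recompute the derived columns of the Tacotron 2 table from its raw throughputs.
mixed = {1: 2554, 4: 7768, 8: 12524}  # tokens/sec, mixed precision
fp32 = {1: 1740, 4: 5683, 8: 10484}   # tokens/sec, FP32

for n in (1, 4, 8):
    print(f"{n} GPU(s): speed-up {mixed[n] / fp32[n]:.2f}, "
          f"weak scaling {mixed[n] / mixed[1]:.2f} (mixed) / {fp32[n] / fp32[1]:.2f} (FP32)")
# 1 GPU(s): speed-up 1.47, weak scaling 1.00 (mixed) / 1.00 (FP32)
# 4 GPU(s): speed-up 1.37, weak scaling 3.04 (mixed) / 3.27 (FP32)
# 8 GPU(s): speed-up 1.19, weak scaling 4.90 (mixed) / 6.03 (FP32)
```
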
The following table shows the results for WaveGlow, with batch size equal to 4 and 8 for mixed precision and FP32 training, respectively.

|Number of GPUs|Mixed precision samples/sec|FP32 samples/sec|Speed-up with mixed precision|Multi-GPU weak scaling with mixed precision|Multi-GPU weak scaling with FP32|
|---:|---:|---:|---:|---:|---:|
|**1**|76,686|36,602|2.10|1.00|1.00|
|**4**|260,826|124,514|2.09|3.40|3.40|
|**8**|566,471|264,138|2.14|7.39|7.22|

To achieve these same results, follow the [Quick Start Guide](#quick-start-guide) outlined above.

This table shows the expected training time for convergence for Tacotron 2 (1500 epochs).

|Number of GPUs|Expected training time with mixed precision|Expected training time with FP32|Speed-up with mixed precision|
|---:|---:|---:|---:|
|**1**|197.39|302.32|1.53|
|**4**|63.29|88.07|1.39|
|**8**|33.72|45.51|1.35|
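
The speed-up column is the ratio of FP32 training time to mixed-precision training time, as can be verified from the values above:

```python
# Speed-up for expected training time is FP32 time divided by mixed-precision time.
times = {1: (197.39, 302.32), 4: (63.29, 88.07), 8: (33.72, 45.51)}  # (mixed, FP32)
for n, (t_mixed, t_fp32) in times.items():
    print(f"{n} GPU(s): {t_fp32 / t_mixed:.2f}")
# 1 GPU(s): 1.53, 4 GPU(s): 1.39, 8 GPU(s): 1.35
```
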
This table shows the expected training time for convergence for WaveGlow (1000 epochs).

|Number of GPUs|Expected training time with mixed precision|Expected training time with FP32|Speed-up with mixed precision|
|---:|---:|---:|---:|
|**1**|400.99|782.67|1.95|
|**4**|89.40|213.09|2.38|
|**8**|48.43|107.27|2.21|
## Inference performance results
Our results were obtained by running the `./inference.py` inference script in the
PyTorch-19.05-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.
Performance numbers (in input tokens per second for Tacotron 2 and output
samples per second for WaveGlow) were averaged over 16 runs.

This table shows the inference performance results for Tacotron 2.
Results are measured in the number of input tokens per second.

|Number of GPUs|Mixed precision tokens/sec|FP32 tokens/sec|Speed-up with mixed precision|
|---:|---:|---:|---:|
|**1**|130|150|0.87|

This table shows the inference performance results for WaveGlow.
Results are measured in the number of output audio samples per second.<sup>1</sup>

|Number of GPUs|Mixed precision samples/sec|FP32 samples/sec|Speed-up with mixed precision|
|---:|---:|---:|---:|
|**1**|435,110|400,097|1.09|

<sup>1</sup>With a sampling rate of 22050 Hz, one second of audio is generated from 22050 samples.
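
Dividing throughput by the sampling rate therefore gives a real-time factor for audio generation, illustrated here with the WaveGlow numbers from the table above:

```python
# Convert WaveGlow inference throughput (samples/sec) into a real-time factor
# at the 22050 Hz sampling rate described in the footnote.
SAMPLE_RATE = 22050
for label, samples_per_sec in [("mixed precision", 435_110), ("FP32", 400_097)]:
    print(f"{label}: {samples_per_sec / SAMPLE_RATE:.1f}x real time")
# mixed precision: 19.7x real time
# FP32: 18.1x real time
```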