Changes from all commits
Commits (35)
c4a479b
Added GitHub DCO workflow
nv-kkudrynski Apr 18, 2023
29aaae3
[Jasper/PyT] Update torch.stft for PyTorch 2.0
alancucki Apr 19, 2023
442791a
Merge: [Jasper/PyT] Update torch.stft for PyTorch 2.0
nv-kkudrynski Apr 19, 2023
1e10352
[EfficientNet/TF2] remove tf async level flag
Victor49152 Apr 20, 2023
5bc69ca
[DLRM/PyT] Stop using apex AMP and DDP
tgrel Apr 24, 2023
05ee986
[NCF/PyT] Stop using deprecated apex AMP and apex DDP
tgrel Apr 24, 2023
f81fca9
[ResNet50/Paddle] Do inference with synthetic input as default
leo0519 May 8, 2023
d56fe70
[UNet3+/TF2] Initial contribution (#1267)
hamidriasat May 8, 2023
7b89aed
[Transformer/PyT] minor bugfix
jbaczek May 15, 2023
9becdf8
[Jasper/PyT, QuartzNet/PyT] Update Pandas and Dali versions
alancucki May 22, 2023
370a221
Merge: [DLRM/PyT] Stop using apex AMP and DDP
nv-kkudrynski May 29, 2023
2a7c251
Merge: [NCF/PyT] Stop using deprecated apex AMP and apex DDP
nv-kkudrynski May 29, 2023
810bcf3
[resnet/mxnet] Apply horovod patch for hvd init
mmarcinkiewicz May 29, 2023
54e2fb4
Merge: [resnet/mxnet] Apply horovod patch for hvd init
nv-kkudrynski May 29, 2023
8ed53a4
[Jasper/PyT, QuartzNet/PyT] Fix Ada L40 on 23.06 base container
alancucki Jun 30, 2023
d53419f
[DLRM/TF2] DLRM and DCNv2 23.02 release
tgrel Jun 30, 2023
2693c63
Merge: [DLRM/TF2] DLRM 23.02 release
nv-kkudrynski Jun 30, 2023
fc9c09b
[ResNet/Paddle] Add CUDNNv8 ResUnit fusion
Tom-Zheng Jul 26, 2023
820b6dd
[Efficientnet/TF2] fix keras imports
Victor49152 Jul 27, 2023
96bdb5b
Merge: [Efficientnet/TF2] fix keras imports
nv-kkudrynski Jul 27, 2023
296bb99
[SynGen] 23.08 Release
ArturKasymov Aug 4, 2023
a5388a4
[BERT/Paddle] Update base image and integrate cuDNN fused MHA
Wong4j Aug 23, 2023
41f582b
[JAX] Add JAX models with reference to Rosetta Github
sharathts Sep 5, 2023
da7e1a7
[DLRM/TF2] CPU offloading
tgrel Oct 3, 2023
6f3a71a
[JAX/Imagen] Imagen model with reference to Rosetta Github
sharathts Oct 6, 2023
e36f9d9
[DLRM/TF2] Fix numpy bool API change
tgrel Nov 13, 2023
e52bcb0
[DLRM/PyT] Fix np.bool API deprecation
tgrel Nov 13, 2023
b849275
Merge: [DLRM/PyT] Fix np.bool API deprecation
nv-kkudrynski Nov 30, 2023
34770bb
[Jasper/PyT,QuartzNet/PyT] use the default DALI installation from the…
mwawrzos Dec 4, 2023
0131db6
Merge branch 'mwawrzos/nvbug/4393747' into 'internal/main'
nv-kkudrynski Dec 4, 2023
4fca54f
[RN50/Paddle] Fix 2308 compatibility issue
Wong4j Dec 7, 2023
9dd9fcb
[wav2vec2.0/PyT] Fix pip dependencies (librosa - numpy)
alancucki Dec 8, 2023
38934f9
[RN50/Paddle] Remove export script and add INT8 feature (QAT + infere…
leo0519 Feb 20, 2024
2788e44
[UNET2D/TF2] Fix numpy API deprecation
mmarcinkiewicz Mar 11, 2024
729963d
[TSPP] 24.03 Release
nv-dmajchrowski Apr 4, 2024
29 changes: 29 additions & 0 deletions .github/workflows/cla.yml
@@ -0,0 +1,29 @@
name: "DCO Assistant"
on:
  issue_comment:
    types: [created]
  pull_request_target:
    types: [opened,closed,synchronize]

permissions:
  actions: write
  contents: write
  pull-requests: write
  statuses: write

jobs:
  DCOAssistant:
    runs-on: ubuntu-latest
    steps:
      - name: "DCO Assistant"
        if: (github.event.comment.body == 'recheck' || github.event.comment.body == 'I have read the DCO Document and I hereby sign the DCO') || github.event_name == 'pull_request_target'
        uses: contributor-assistant/github-action@v2.3.0
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          path-to-signatures: '.github/dco/signatures.json'
          path-to-document: 'https://developercertificate.org/'
          branch: 'dco-do-not-remove'
          allowlist: user1,bot*
          use-dco-flag: true
          custom-notsigned-prcomment: '<br/>Thank you for your submission. Before we can accept your contribution, please sign our [Developer Certificate of Origin](https://developercertificate.org) by posting a comment with the content exactly as below.<br/>'
46 changes: 46 additions & 0 deletions JAX/Classification/README.md
@@ -0,0 +1,46 @@
# Image Classification

Image classification is the task of categorizing an image into one of several predefined classes, often along with a probability that the input belongs to each class. This task is crucial to understanding and analyzing images, and it comes quite effortlessly to human beings thanks to our complex visual systems. Most powerful image classification models today are built using some form of Convolutional Neural Network (CNN), which is also the backbone of many other tasks in Computer Vision.

![What is Image Classification?](../../PyTorch/Classification/img/1_image-classification-figure-1.PNG)

[Source](https://github.com/NVlabs/stylegan)

In this overview, we will cover:
- Types of image classification
- How is the performance evaluated?
- Use cases and applications
- Where to get started

---
## Types of image classification
Image classification can be broadly divided into binary or multi-class problems, depending on the number of categories. Binary image classification entails predicting one of two classes; an example would be predicting whether an image is of a dog or not. A subtly different problem is single-class (one-vs-all) classification, where the goal is to recognize data from one class and reject all others. This is beneficial when there is an overabundance of data from one of the classes, also called a class imbalance.

![Input and Outputs for Image Classification](../../PyTorch/Classification/img/1_image-classification-figure-2.PNG)

In Multi-class classification problems, models categorize instances into one of three or more categories. Multi-class models often also return confidence scores (or probabilities) of an image belonging to each of the possible classes. This should not be confused with multi-label classification, where a model assigns multiple labels to an instance.
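The distinction can be sketched numerically: a multi-class model squashes its raw scores through a softmax so the class probabilities sum to one, while a multi-label model scores each label independently with a sigmoid. The scores below are invented for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to class probabilities that sum to 1 (multi-class)."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Independent per-label probability (multi-label)."""
    return 1.0 / (1.0 + math.exp(-x))

logits = [2.0, 0.5, -1.0]                 # hypothetical scores for three classes
print(softmax(logits))                    # one distribution over mutually exclusive classes
print([sigmoid(x) for x in logits])       # independent probabilities, need not sum to 1
```

Note that the softmax outputs compete with each other, whereas the sigmoid outputs do not, which is exactly the multi-class versus multi-label difference described above.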

---
## How is the performance evaluated?
Image classification performance is often reported as a Top-1 or Top-5 score. For the top-1 score, a prediction is counted as correct if the top predicted class (the one with the highest predicted probability) matches the true class for a given instance. For the top-5 score, we check whether the true class appears among the model's five highest-probability predictions. The score is the number of correct predictions divided by the total number of instances evaluated.
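The metric can be sketched in a few lines of Python, using made-up predictions and labels:

```python
def top_k_accuracy(probs, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    correct = 0
    for scores, label in zip(probs, labels):
        # indices of the k classes with the highest predicted probability
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)

# Hypothetical predicted probabilities over 3 classes for 3 examples.
probs = [
    [0.1, 0.6, 0.3],   # top-1 prediction: class 1
    [0.5, 0.2, 0.3],   # top-1 prediction: class 0
    [0.2, 0.3, 0.5],   # top-1 prediction: class 2
]
labels = [1, 2, 2]
print(top_k_accuracy(probs, labels, 1))  # 2 of 3 correct at top-1
print(top_k_accuracy(probs, labels, 2))  # all 3 correct at top-2
```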

---
## Use cases and applications
### Categorizing Images in Large Visual Databases
Businesses with visual databases may accumulate large numbers of images with missing tags or metadata. Unless there is an effective way to organize such images, they may not be of much use at all; worse, they may hog precious storage space. Automated image classification algorithms can sort such untagged images into predefined categories, letting businesses avoid expensive manual labor.

A related task is that of Image Organization in smart devices like mobile phones. With Image Classification techniques, images and videos can be organized for improved accessibility.

### Visual Search
Visual search, or image-based search, has risen in popularity in recent years. Many prominent search engines already provide this feature, letting users search for visual content similar to a provided image. This has many applications in the e-commerce and retail industries, where users can snap and upload a picture of a product they are interested in purchasing. This makes the shopping experience much more efficient for customers and can increase sales for businesses.


### Healthcare
Medical imaging is the creation of visual representations of internal body parts for clinical purposes, including health monitoring, medical diagnosis, treatment, and record keeping. Image classification algorithms can play a crucial role in medical imaging by assisting medical professionals in detecting the presence of illness and improving consistency in clinical diagnosis.

---
## Getting started
NVIDIA provides examples for JAX models on [Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects). These examples provide easy-to-consume, highly optimized scripts for both training and inference. The quick start guide in our GitHub repository will help you set up the environment using NGC Docker images, download pre-trained models from NGC, and adapt the model training and inference for your application or use case.

These models are tested and maintained by NVIDIA, leveraging mixed precision on the Tensor Cores of our latest GPUs for faster training while maintaining accuracy.
2 changes: 2 additions & 0 deletions JAX/Classification/ViT/README.md
@@ -0,0 +1,2 @@
# ViT on GPUs
Please refer to [Rosetta ViT](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/vit), NVIDIA's project that enables seamless training of LLMs, CV models and multimodal models in JAX, for information about running Vision Transformer models and experiments on GPUs.
4 changes: 4 additions & 0 deletions JAX/LanguageModeling/PAXML/README.md
@@ -0,0 +1,4 @@
Paxml (aka Pax) is a framework for training LLMs. It allows for advanced and configurable experimentation and parallelization. It is based on [JAX](https://github.com/google/jax) and [Praxis](https://github.com/google/praxis).

# PAXML on GPUs
Please refer to [Rosetta PAXML](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/pax), NVIDIA's project that enables seamless training of LLMs, CV models and multimodal models in JAX, for information about running models and experiments on GPUs in PAXML.
90 changes: 90 additions & 0 deletions JAX/LanguageModeling/README.md
@@ -0,0 +1,90 @@
# Language Modeling


Language modeling (LM) is a natural language processing (NLP) task that determines the probability of a given sequence of words occurring in a sentence.

In an era where computers, smartphones and other electronic devices increasingly need to interact with humans, language modeling has become an indispensable technique for teaching devices how to communicate in natural languages in human-like ways.

But how does language modeling work? And what can you build with it? What are the different approaches, what are its potential benefits and limitations, and how might you use it in your business?

In this guide, you’ll find answers to all of those questions and more. Whether you’re an experienced machine learning engineer considering implementation, a developer wanting to learn more, or a product manager looking to explore what’s possible with natural language processing and language modeling, this guide is for you.

Here’s a look at what we’ll cover:

- Language modeling – the basics
- How does language modeling work?
- Use cases and applications
- Getting started


## Language modeling – the basics

### What is language modeling?

"*Language modeling is the task of assigning a probability to sentences in a language. […]
Besides assigning a probability to each sequence of words, the language models also assign a
probability for the likelihood of a given word (or a sequence of words) to follow a sequence
of words.*" Source: Page 105, [Neural Network Methods in Natural Language Processing](http://amzn.to/2wt1nzv), 2017.
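The quoted definition can be illustrated with a toy bigram model: the probability of a sentence is approximated as a product of conditional probabilities of each word given the previous one, estimated from counts. The three-sentence corpus below is invented for illustration.

```python
from collections import defaultdict

# Invented corpus; <s> and </s> mark sentence boundaries.
corpus = [
    "<s> the cat sat </s>",
    "<s> the cat ran </s>",
    "<s> the dog sat </s>",
]

bigram_counts = defaultdict(int)
context_counts = defaultdict(int)
for sentence in corpus:
    tokens = sentence.split()
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1

def sentence_probability(sentence):
    """P(w1..wn) ~ product over i of P(w_i | w_{i-1}), estimated from counts."""
    tokens = sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: probability collapses without smoothing
        prob *= bigram_counts[(prev, word)] / context_counts[prev]
    return prob

print(sentence_probability("<s> the cat sat </s>"))  # familiar word sequence: nonzero
print(sentence_probability("<s> the dog ran </s>"))  # unseen bigram (dog, ran): zero
```

The zero probability for an unseen bigram is precisely the sparsity problem of statistical models discussed later in this guide; real n-gram systems apply smoothing, and neural models sidestep it with continuous representations.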


### Types of language models

There are primarily two types of Language Models:

- Statistical Language Models: These models use traditional statistical techniques like N-grams, Hidden Markov Models (HMM), and certain linguistic rules to learn the probability distribution of words.
- Neural Language Models: These use various kinds of neural networks to model language, and have surpassed statistical language models in effectiveness.

"*We provide ample empirical evidence to suggest that connectionist language models are
superior to standard n-gram techniques, except their high computational (training)
complexity.*" Source: [Recurrent neural network based language model](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf), 2010.

Given the superior performance of neural language models, we include in the container two popular state-of-the-art neural language models: BERT and Transformer-XL.

### Why is language modeling important?

Language modeling is fundamental in modern NLP applications. It enables machines to understand qualitative information, and enables people to communicate with machines in the natural languages that humans use to communicate with each other.

Language modeling is used directly in a variety of industries, including tech, finance, healthcare, transportation, legal, military, and government. In fact, you have probably interacted with a language model today, whether through Google search, a voice assistant, or a text autocomplete feature.


## How does language modeling work?

The roots of modern language modeling can be traced back to 1948, when Claude Shannon published a paper titled "A Mathematical Theory of Communication", laying the foundation for information theory and language modeling. In the paper, Shannon detailed the use of a stochastic model called the Markov chain to create a statistical model of the sequences of letters in English text. Markov models, along with n-grams, are still among the most popular statistical language models today.
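A Shannon-style character-level Markov chain can be sketched in a few lines: record which character follows each short context in some text, then repeatedly sample from those observations. The training string and the order are arbitrary choices for illustration.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each length-`order` character context to the characters observed after it."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        chain[text[i:i + order]].append(text[i + order])
    return chain

def generate(chain, seed, length=40, rng=None):
    """Extend `seed` by sampling, at each step, a character seen after the current context."""
    rng = rng or random.Random(0)
    out = seed
    order = len(seed)
    for _ in range(length):
        followers = chain.get(out[-order:])
        if not followers:
            break  # dead end: context never observed
        out += rng.choice(followers)
    return out

text = "the theory of communication and the chain"  # toy training string
chain = build_chain(text, order=2)
print(generate(chain, "th"))  # locally plausible, globally meaningless text
```

Every three-character window of the output was observed somewhere in the training text, which is exactly the local-statistics property (and the limitation) of low-order Markov models.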

However, simple statistical language models have serious drawbacks in scalability and fluency because of their sparse representation of language. Neural language models overcome this problem by representing language units (e.g., words or characters) as non-linear, distributed combinations of weights in continuous space, which lets them generalize to rare or unknown values instead of being misled by them.

Therefore, as mentioned above, we introduce two popular state-of-the-art neural language models, BERT and Transformer-XL, in TensorFlow and PyTorch. More details can be found in the [NVIDIA Deep Learning Examples GitHub repository](https://github.com/NVIDIA/DeepLearningExamples).


## Use cases and applications

### Speech Recognition

Imagine speaking a phrase to the phone, expecting it to convert the speech to text. How does
it know if you said "recognize speech" or "wreck a nice beach"? Language models help figure it out
based on the context, enabling machines to process and make sense of speech audio.


### Spelling Correction

Spellcheckers built on language models can point out spelling errors and suggest alternatives.


### Machine translation

Imagine you are translating the Chinese sentence "我在开车" into English. Your translation system gives you several choices:

- I at open car
- me at open car
- I at drive
- me at drive
- I am driving
- me am driving

A language model tells you which translation sounds the most natural.
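A sketch of how a language model can rank such candidates: score each with a toy add-one-smoothed bigram model trained on a tiny invented English corpus, and keep the highest-scoring one. A real system would use a far larger model and corpus.

```python
import math
from collections import defaultdict

# Toy English corpus standing in for real LM training data (invented for illustration).
corpus = "i am driving to work . i am driving home . you are driving ."
tokens = corpus.split()

bigrams = defaultdict(int)
unigrams = defaultdict(int)
for prev, word in zip(tokens, tokens[1:]):
    bigrams[(prev, word)] += 1
    unigrams[prev] += 1
vocab_size = len(set(tokens))

def lm_score(sentence):
    """Add-one-smoothed bigram log-probability; higher means more fluent."""
    words = sentence.lower().split()
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(words, words[1:])
    )

candidates = ["I at open car", "me at drive", "I am driving"]
best = max(candidates, key=lm_score)
print(best)  # the LM prefers the fluent candidate: "I am driving"
```

Smoothing matters here: without it, every candidate containing an unseen bigram would score zero and the model could not rank them at all.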

## Getting started
NVIDIA provides examples for JAX models on [Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects). These examples provide easy-to-consume, highly optimized scripts for both training and inference. The quick start guide in our GitHub repository will help you set up the environment using NGC Docker images, download pre-trained models from NGC, and adapt the model training and inference for your application or use case.

These models are tested and maintained by NVIDIA, leveraging mixed precision on the Tensor Cores of our latest GPUs for faster training while maintaining accuracy.
5 changes: 5 additions & 0 deletions JAX/LanguageModeling/T5X/README.md
@@ -0,0 +1,5 @@
T5X is a framework for training, evaluation, and inference of sequence models (starting with language). It is based on [JAX](https://github.com/google/jax) and [Flax](https://github.com/google/flax). To learn more, see the [T5X Paper](https://arxiv.org/abs/2203.17189).

# T5X on GPUs

Please refer to [Rosetta T5X](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x), NVIDIA's project that enables seamless training of LLMs, CV models and multimodal models in JAX, for information about running models and experiments on GPUs in T5X.
2 changes: 2 additions & 0 deletions JAX/MultiModal/Imagen/README.md
@@ -0,0 +1,2 @@
# Imagen on GPUs
Please refer to [Rosetta Imagen](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/imagen), NVIDIA's project that enables seamless training of LLMs, CV models and multimodal models in JAX, for information about running Imagen models and experiments on GPUs.
2 changes: 1 addition & 1 deletion MxNet/Classification/RN50v1.5/dali.py
@@ -31,7 +31,7 @@ def add_dali_args(parser):
     group.add_argument('--dali-validation-threads', type=int, default=10, help="number of threads " +\
                        "per GPU for DALI for validation")
     group.add_argument('--dali-prefetch-queue', type=int, default=5, help="DALI prefetch queue depth")
-    group.add_argument('--dali-nvjpeg-memory-padding', type=int, default=256, help="Memory padding value for nvJPEG (in MB)")
+    group.add_argument('--dali-nvjpeg-memory-padding', type=int, default=64, help="Memory padding value for nvJPEG (in MB)")
     group.add_argument('--dali-fuse-decoder', type=int, default=1, help="0 or 1 whether to fuse decoder or not")

     group.add_argument('--dali-nvjpeg-width-hint', type=int, default=5980, help="Width hint value for nvJPEG (in pixels)")
10 changes: 5 additions & 5 deletions MxNet/Classification/RN50v1.5/fit.py
@@ -483,11 +483,6 @@ def fit(args, model, data_loader):
     # select gpu for horovod process
     if 'horovod' in args.kv_store:
         args.gpus = [args.gpus[hvd.local_rank()]]
-        ctx = mx.gpu(hvd.local_rank())
-
-        tensor1 = mx.nd.zeros(shape=(1,), dtype='float32', ctx=ctx)
-        tensor2 = mx.nd.zeros(shape=(1,), dtype='float32', ctx=ctx)
-        tensor1, tensor2 = hvd.grouped_allreduce([tensor1,tensor2])

     if args.amp:
         amp.init()
@@ -579,6 +574,11 @@ def fit(args, model, data_loader):
     params = model.collect_params()
     if params is not None:
         hvd.broadcast_parameters(params, root_rank=0)
+    ctx = mx.gpu(hvd.local_rank())
+    tensor1 = mx.nd.zeros(shape=(1,), dtype='float32', ctx=ctx)
+    tensor2 = mx.nd.zeros(shape=(1,), dtype='float32', ctx=ctx)
+    tensor1, tensor2 = hvd.grouped_allreduce([tensor1,tensor2])
+
     global_metrics = CompositeMeter()
     if args.mode in ['train_val', 'train']:
         global_metrics.register_metric('train.loss', MinMeter())
2 changes: 1 addition & 1 deletion PaddlePaddle/Classification/RN50v1.5/Dockerfile
@@ -1,4 +1,4 @@
-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/paddlepaddle:22.05-py3
+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/paddlepaddle:23.12-py3
 FROM ${FROM_IMAGE_NAME}

 ADD requirements.txt /workspace/