ONNX and ONNX Runtime
By: Vishwas Narayan
Agenda
1. Introduction to the ONNX ecosystem
2. Production Usage
3. ONNX Technical Design
ML models: Research to production
Data Collection → Training → Conversion → Inference and Deployment
Data Storage
● Collect data from different sources
● Transform, normalize, and other steps
● Build a model from the data
● Deploy the model
In this research-to-production workflow, ML engineers and data scientists have some overlapping roles.
Many products use machine learning. We train with a variety of frameworks, and we deploy to a variety of targets.
ONNX
ONNX is an open-source format for representing machine learning models.
ONNX establishes a standard set of operators - the building blocks of machine
learning and deep learning models - as well as a standard file format, allowing AI
developers to utilise models with a range of frameworks, tools, runtimes, and
compilers.
ONNX now covers
● Training: CPU, GPU
● Deployment: edge/mobile
How do I get an ONNX model?
1. The ONNX Model Zoo
2. A model creation service such as Azure Custom Vision or AutoML
3. Converting an existing model from another framework
4. An end-to-end machine learning service such as Azure Machine Learning
You can also get them from:
● Native export from PyTorch and CNTK
● Converters such as ONNX Go Live (OLive): https://github.com/microsoft/OLive
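As a concrete illustration, here is a minimal sketch of a native PyTorch export; the model choice, file name, and input shape are placeholder assumptions, not from the slides.

```python
# Minimal sketch: exporting a PyTorch model to ONNX with the built-in exporter.
# The model (resnet18), output file name, and input shape are illustrative.
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.eval()  # export in inference mode

dummy_input = torch.randn(1, 3, 224, 224)  # tracing input with the expected shape
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",        # output file (hypothetical name)
    input_names=["input"],
    output_names=["output"],
    opset_version=12,       # target ONNX opset version
)
```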
ML models: Research to production (revisited)
Data Collection → Training → Conversion → Inference and Deployment
Inferencing a deep learning model requires the model (its graph) and its weights.
ONNX Runtime
● Available on all major platforms
● Hardware vendors provide support
https://github.com/microsoft/DirectML
https://github.com/onnx/tutorials
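A minimal sketch of scoring a model with the ONNX Runtime Python API; the model path and input shape are assumptions carried over from the export sketch above.

```python
# Minimal sketch: running inference with ONNX Runtime in Python.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("resnet18.onnx")   # hypothetical model path
input_name = session.get_inputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy input batch
outputs = session.run(None, {input_name: x})      # None -> fetch all outputs
print(outputs[0].shape)
```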
We train, we deploy: ONNX Runtime is available on many platforms, and many products leverage ONNX models.
The performance statistics speak for themselves:
https://www.onnxruntime.ai/docs/how-to/tune-performance.html
https://github.com/onnx/tutorials
Performance Metrics
● Accuracy
● Loss
● Recall
● Latency
Common results today include:
● Word: grammar check
● Bing: Q&A
● Cognitive Services
And many more.
Technical Design Principles
● Support new architectures
● Support traditional ML as well
● Backward compatibility
● Compact
● Cross-platform representation for serialization
ONNX Specification
ONNX is an open specification that consists of the following components:
● A definition of an extensible computation graph model
● Definitions of standard data types
● Definitions of built-in operators as versioned operator sets (schemas)
ONNX Model File Format
Model
a. Version info
b. Metadata
c. Acyclic computation dataflow graph
ONNX Model File Format
Graph
a. Inputs and outputs
b. List of computation nodes
c. Graph name
ONNX Model File Format
Computation Node
a. Zero or more inputs of defined types
b. One or more outputs of defined types
c. Operator
d. Operator parameters
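These three slides map directly onto the protobuf structure you can walk with the onnx Python package; a minimal sketch, assuming a placeholder "model.onnx":

```python
# Minimal sketch: walking the ONNX file format (model -> graph -> nodes).
import onnx

model = onnx.load("model.onnx")                  # placeholder path
print(model.producer_name, model.model_version)  # version info
print(model.metadata_props)                      # free-form metadata

graph = model.graph                              # acyclic computation dataflow graph
print(graph.name)                                # graph name
print([i.name for i in graph.input])             # graph inputs
print([o.name for o in graph.output])            # graph outputs
for node in graph.node:                          # computation nodes
    print(node.op_type, list(node.input), list(node.output))
```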
ONNX Supported Data Types
Tensor data types
● int8, int16, int32, int64
● uint8, uint16, uint32, uint64
● float16, float32 (float), float64 (double)
● bool
● string
● complex64, complex128
ONNX Supported Data Types
Non-tensor data types
● Sequence
● Map
Operators in ONNX
https://github.com/onnx/onnx/blob/master/docs/Operators.md
● An operator is identified by <name, domain, version>
● Core ops (ONNX and ONNX-ML)
○ Should be supported by ONNX-compatible products
○ Generally cannot be meaningfully decomposed further
○ Currently 124 ops in the ai.onnx domain and 18 in ai.onnx.ml
○ Support many scenarios/problem areas, including image classification, recommendation, natural language processing, etc.
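The onnx package exposes these operator schemas programmatically; a small sketch (the "Conv" operator here is just an example):

```python
# Minimal sketch: looking up an operator schema by <name, domain, version>.
import onnx.defs

schema = onnx.defs.get_schema("Conv")      # latest "Conv" in the default domain
print(schema.name, schema.domain, schema.since_version)
print(onnx.defs.onnx_opset_version())      # current default-domain opset version
```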
ONNX - Custom Ops
● Ops specific to a framework or runtime
● Indicated by a custom domain name
● Primarily meant to be a safety valve
ONNX Versioning
Done at three levels:
● IR version: the file format version
● Opset version: ONNX models declare which operator sets they require as a list of two-part identifiers (domain, opset_version)
● Operator version: a given operator is identified by a three-tuple (domain, op_type, op_version)
These versions change over time.
https://github.com/onnx/onnx/blob/master/docs/Versioning.md
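A sketch of reading these version levels off a loaded model ("model.onnx" is a placeholder path):

```python
# Minimal sketch: inspecting IR version and declared operator sets.
import onnx

model = onnx.load("model.onnx")          # placeholder path
print(model.ir_version)                  # IR (file format) version
for opset in model.opset_import:         # declared (domain, opset_version) pairs
    print(opset.domain or "ai.onnx", opset.version)
```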
ONNX Runtime - Design Principles
● Provide a complete implementation of the ONNX standard: implement all versions of the operators (since opset 7)
● Backward compatibility
● High performance
● Cross platform
● Leverage custom accelerators and runtimes to enable maximum performance (execution providers)
● Support hybrid execution of models
● Extensible through pluggable modules
https://www.onnxruntime.ai/docs/resources/high-level-design.html
Graph Partitioning
Given a mutable graph, the graph partitioner assigns graph nodes to execution providers according to their capabilities; the goal is to achieve the best performance in a heterogeneous environment.
ONNX Runtime uses a "greedy" node assignment mechanism:
● Users specify a preferred execution provider list, in order
● ONNX Runtime goes through the list in order, checking each provider's capability and assigning nodes to it if it can run them (see the sketch below)
Future:
● Manually tuned partitioning
● ML-based partitioning
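A sketch of the preferred-provider list in the Python API; the CUDA/CPU ordering and model path are illustrative assumptions:

```python
# Minimal sketch: greedy provider assignment follows this ordered list.
import onnxruntime as ort

print(ort.get_available_providers())     # providers compiled into this build

session = ort.InferenceSession(
    "model.onnx",                        # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())           # providers actually in use
```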
Rewrite Rule
An interface for finding patterns (with specific nodes) and applying rewrite rules against a sub-graph.
Graph Transformer
An interface for applying graph transformations with full graph-editing capability.
Transformer Levels
● Level 0: transformers that are always applied after graph partitioning (e.g. cast insertion, memory-copy insertion)
● Level 1: general transformers not specific to any execution provider (e.g. dropout elimination)
● Level 2: execution-provider-specific transformers (e.g. transpose insertion for FPGA)
Graph Optimizations
Level 0
● Cast
● MemCopy
https://gitee.com/arnoldfychen/onnxruntime/blob/master/docs/ONNX_Runtime_Perf_Tuning.md
Graph Optimizations
Level 1
● EliminateIdentity
● EliminateSlice
● UnsqueezeElimination
● EliminateDropout
● FuseReluClip
● ShapeToInitializer
● ConvAddFusion
● ConvMulFusion
● ConvBNFusion
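In the Python API these optimizations are gated by a session-level setting; a sketch, noting that the runtime's GraphOptimizationLevel names (basic/extended/all) map only loosely onto the level 0/1 grouping above:

```python
# Minimal sketch: selecting a graph optimization level for a session.
import onnxruntime as ort

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
session = ort.InferenceSession("model.onnx", sess_options=opts)  # placeholder path
```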
Execution Provider
A hardware accelerator interface used to query capabilities and obtain corresponding executables.
● Kernel-based execution providers
These execution providers provide implementations of the operators defined in ONNX (e.g. CPUExecutionProvider, CUDAExecutionProvider, MKLDNNExecutionProvider, etc.)
● Runtime-based execution providers
These execution providers may not implement individual ONNX ops, but they can run whole or partial ONNX graphs, e.g. executing several ONNX ops (a sub-graph) as a single function (e.g. TensorRTExecutionProvider, nGraphExecutionProvider, etc.)
https://www.onnxruntime.ai/docs/reference/execution-providers/
Extending ONNX Runtime
● Execution providers
○ Implement the IExecutionProvider interface
■ Examples: TensorRT, OpenVINO, nGraph, Android NNAPI, etc.
● Custom operators
○ Support operators outside the ONNX standard
○ Custom ops can be written in both C/C++ and Python
● Graph optimizers
○ Implement the GraphTransformer interface
ONNX Go Live (OLive)
● Automates the process of shipping ONNX models
● Integrates
○ model conversion
○ correctness testing
○ performance tuning
into a single pipeline, and outputs a production-ready ONNX model with ONNX Runtime configurations (execution provider + optimization options)
https://github.com/microsoft/OLive
To learn more:
https://azure.microsoft.com/en-in/blog/onnx-runtime-is-now-open-source/
https://www.onnxruntime.ai/python/tutorial.html
Thank you
