ONNX and ONNX Runtime
By: Vishwas Narayan
Agenda
1. Introduction to the ONNX ecosystem
2. Production Usage
3. ONNX Technical Design
ML models: Research to production
Data Collection → Training → Conversion → Inference and Deployment
Data Storage
● Collect data from different sources
● Transform, normalize, and other steps
● Build a model from the data
● Deploy the model
In this research-to-production workflow, ML engineers and data scientists have some overlapping roles.
Many products use machine learning. We train with a variety of frameworks, and we deploy to a variety of targets.
ONNX
ONNX is an open-source format for representing machine learning models.
ONNX establishes a standard set of operators - the building blocks of machine
learning and deep learning models - as well as a standard file format, allowing AI
developers to utilise models with a range of frameworks, tools, runtimes, and
compilers.
ONNX now covers
● Training: CPU, GPU
● Deployment: edge/mobile
How do I get an ONNX model?
1. The ONNX Model Zoo
2. A model creation service such as Azure Custom Vision or AutoML
3. Converting an existing model from another framework
4. An end-to-end machine learning service such as Azure Machine Learning
You can also get them from:
● Native export from PyTorch and CNTK
● Converters such as ONNX Go Live (OLive): https://github.com/microsoft/OLive
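As a concrete illustration, here is a minimal sketch of a native PyTorch export; the model choice, file name, and input shape are placeholder assumptions, not from the slides.

```python
# Minimal sketch: exporting a PyTorch model to ONNX with the built-in exporter.
# The model (resnet18), output file name, and input shape are illustrative.
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.eval()  # export in inference mode

dummy_input = torch.randn(1, 3, 224, 224)  # tracing input with the expected shape
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",        # output file (hypothetical name)
    input_names=["input"],
    output_names=["output"],
    opset_version=12,       # target ONNX opset version
)
```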
ML models: Research to production (revisited)
Data Collection → Training → Conversion → Inference and Deployment
Inferencing a deep learning model requires the model (its graph) and its weights.
ONNX Runtime
● Available on all major platforms
● Hardware vendors provide support
https://github.com/microsoft/DirectML
https://github.com/onnx/tutorials
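A minimal sketch of scoring a model with the ONNX Runtime Python API; the model path and input shape are assumptions carried over from the export sketch above.

```python
# Minimal sketch: running inference with ONNX Runtime in Python.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("resnet18.onnx")   # hypothetical model path
input_name = session.get_inputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy input batch
outputs = session.run(None, {input_name: x})      # None -> fetch all outputs
print(outputs[0].shape)
```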
We train, we deploy: ONNX Runtime is available on many platforms, and many products leverage ONNX models.
The performance statistics speak for themselves:
https://www.onnxruntime.ai/docs/how-to/tune-performance.html
https://github.com/onnx/tutorials
Performance Metrics
● Accuracy
● Loss
● Recall
● Latency
Common results today include:
● Word: grammar check
● Bing: Q&A
● Cognitive Services
And many more.
Technical Design Principles
● Support new architectures
● Support traditional ML as well
● Backward compatibility
● Compact
● Cross-platform representation for serialization
ONNX Specification
ONNX is an open specification that consists of the following components:
● A definition of an extensible computation graph model
● Definitions of standard data types
● Definitions of built-in operators as versioned operator sets (schemas)
ONNX Model File Format
Model
a. Version info
b. Metadata
c. Acyclic computation dataflow graph
ONNX Model File Format
Graph
a. Inputs and outputs
b. List of computation nodes
c. Graph name
ONNX Model File Format
Computation Node
a. Zero or more inputs of defined types
b. One or more outputs of defined types
c. Operator
d. Operator parameters
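These three slides map directly onto the protobuf structure you can walk with the onnx Python package; a minimal sketch, assuming a placeholder "model.onnx":

```python
# Minimal sketch: walking the ONNX file format (model -> graph -> nodes).
import onnx

model = onnx.load("model.onnx")                  # placeholder path
print(model.producer_name, model.model_version)  # version info
print(model.metadata_props)                      # free-form metadata

graph = model.graph                              # acyclic computation dataflow graph
print(graph.name)                                # graph name
print([i.name for i in graph.input])             # graph inputs
print([o.name for o in graph.output])            # graph outputs
for node in graph.node:                          # computation nodes
    print(node.op_type, list(node.input), list(node.output))
```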
ONNX Supported Data Types
Tensor data types
● int8, int16, int32, int64
● uint8, uint16, uint32, uint64
● float16, float32 (float), float64 (double)
● bool
● string
● complex64, complex128
ONNX Supported Data Types
Non-tensor data types
● Sequence
● Map
Operators in ONNX
https://github.com/onnx/onnx/blob/master/docs/Operators.md
● An operator is identified by <name, domain, version>
● Core ops (ONNX and ONNX-ML)
○ Should be supported by ONNX-compatible products
○ Generally cannot be meaningfully decomposed further
○ Currently 124 ops in the ai.onnx domain and 18 in ai.onnx.ml
○ Support many scenarios/problem areas, including image classification, recommendation, natural language processing, etc.
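The onnx package exposes these operator schemas programmatically; a small sketch (the "Conv" operator here is just an example):

```python
# Minimal sketch: looking up an operator schema by <name, domain, version>.
import onnx.defs

schema = onnx.defs.get_schema("Conv")      # latest "Conv" in the default domain
print(schema.name, schema.domain, schema.since_version)
print(onnx.defs.onnx_opset_version())      # current default-domain opset version
```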
ONNX - Custom Ops
● Ops specific to a framework or runtime
● Indicated by a custom domain name
● Primarily meant to be a safety valve
ONNX Versioning
Done at three levels:
● IR version: the file format version
● Opset version: ONNX models declare which operator sets they require as a list of two-part identifiers (domain, opset_version)
● Operator version: a given operator is identified by a three-tuple (domain, op_type, op_version)
These versions change over time.
https://github.com/onnx/onnx/blob/master/docs/Versioning.md
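A sketch of reading these version levels off a loaded model ("model.onnx" is a placeholder path):

```python
# Minimal sketch: inspecting IR version and declared operator sets.
import onnx

model = onnx.load("model.onnx")          # placeholder path
print(model.ir_version)                  # IR (file format) version
for opset in model.opset_import:         # declared (domain, opset_version) pairs
    print(opset.domain or "ai.onnx", opset.version)
```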
ONNX Runtime - Design Principles
● Provide a complete implementation of the ONNX standard: implement all versions of the operators (since opset 7)
● Backward compatibility
● High performance
● Cross platform
● Leverage custom accelerators and runtimes to enable maximum performance (execution providers)
● Support hybrid execution of models
● Extensible through pluggable modules
https://www.onnxruntime.ai/docs/resources/high-level-design.html
Graph Partitioning
Given a mutable graph, the graph partitioner assigns graph nodes to execution providers according to their capabilities; the goal is to achieve the best performance in a heterogeneous environment.
ONNX Runtime uses a "greedy" node assignment mechanism:
● Users specify a preferred execution provider list, in order
● ONNX Runtime goes through the list in order, checking each provider's capability and assigning nodes to it if it can run them (see the sketch below)
Future:
● Manually tuned partitioning
● ML-based partitioning
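A sketch of the preferred-provider list in the Python API; the CUDA/CPU ordering and model path are illustrative assumptions:

```python
# Minimal sketch: greedy provider assignment follows this ordered list.
import onnxruntime as ort

print(ort.get_available_providers())     # providers compiled into this build

session = ort.InferenceSession(
    "model.onnx",                        # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())           # providers actually in use
```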
Rewrite Rule
An interface for finding patterns (with specific nodes) and applying rewrite rules against a sub-graph.
Graph Transformer
An interface for applying graph transformations with full graph-editing capability.
Transformer Levels
● Level 0: transformers that are always applied after graph partitioning (e.g. cast insertion, memory-copy insertion)
● Level 1: general transformers not specific to any execution provider (e.g. dropout elimination)
● Level 2: execution-provider-specific transformers (e.g. transpose insertion for FPGA)
Graph Optimizations
Level 0
● Cast
● MemCopy
https://gitee.com/arnoldfychen/onnxruntime/blob/master/docs/ONNX_Runtime_Perf_Tuning.md
Graph Optimizations
Level 1
● EliminateIdentity
● EliminateSlice
● UnsqueezeElimination
● EliminateDropout
● FuseReluClip
● ShapeToInitializer
● ConvAddFusion
● ConvMulFusion
● ConvBNFusion
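In the Python API these optimizations are gated by a session-level setting; a sketch, noting that the runtime's GraphOptimizationLevel names (basic/extended/all) map only loosely onto the level 0/1 grouping above:

```python
# Minimal sketch: selecting a graph optimization level for a session.
import onnxruntime as ort

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
session = ort.InferenceSession("model.onnx", sess_options=opts)  # placeholder path
```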
Execution Provider
A hardware accelerator interface used to query capabilities and obtain corresponding executables.
● Kernel-based execution providers
These execution providers provide implementations of the operators defined in ONNX (e.g. CPUExecutionProvider, CUDAExecutionProvider, MKLDNNExecutionProvider, etc.)
● Runtime-based execution providers
These execution providers may not implement individual ONNX ops, but they can run whole or partial ONNX graphs, e.g. executing several ONNX ops (a sub-graph) as a single function (e.g. TensorRTExecutionProvider, nGraphExecutionProvider, etc.)
https://www.onnxruntime.ai/docs/reference/execution-providers/
Extending ONNX Runtime
● Execution providers
○ Implement the IExecutionProvider interface
■ Examples: TensorRT, OpenVINO, nGraph, Android NNAPI, etc.
● Custom operators
○ Support operators outside the ONNX standard
○ Custom ops can be written in both C/C++ and Python
● Graph optimizers
○ Implement the GraphTransformer interface
ONNX Go Live (OLive)
● Automates the process of shipping ONNX models
● Integrates
○ model conversion
○ correctness testing
○ performance tuning
into a single pipeline, and outputs a production-ready ONNX model with ONNX Runtime configurations (execution provider + optimization options)
https://github.com/microsoft/OLive
To learn more:
https://azure.microsoft.com/en-in/blog/onnx-runtime-is-now-open-source/
https://www.onnxruntime.ai/python/tutorial.html
Thank you
