@OVHcloud_fr #OVHcloudTechTalks @ChrisRannou
OVHcloud ML Serving
Maël Le Gal github:mael-le-gal
Christophe Rannou @ChrisRannou
Public Cloud ML Serving
Instead of handling the production deployment yourself, simply select ML models (your own
or pre-trained), size them and deploy. We provide API endpoints and more!
Our added value:
✔ We simplify your architecture: we deploy your ML models for you in a few clicks
✔ We simplify your code: everything can be automated (via API / CLI)
✔ We reduce your costs: time-to-production drops from weeks to seconds, and you pay as you go
✔ We address your main challenges: we provide scaling, monitoring and versioning
Demo
Serving Hub
Model CRD
Token CRD
Web APIs
Auth API
• Generate tokens
• Check tokens
Hub API (model management)
• Create
• Delete
• Update
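To make the two Auth API operations concrete, here is a minimal in-memory sketch of generating and checking tokens. The `TokenStore` class and its method names are hypothetical and are not the Serving Hub API; a real service would persist tokens somewhere durable instead of keeping them in process memory.

```python
import hashlib
import hmac
import secrets


class TokenStore:
    """Hypothetical in-memory token store (illustration only)."""

    def __init__(self):
        # Only SHA-256 digests are kept, never the raw tokens.
        self._digests = set()

    @staticmethod
    def _digest(token):
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def generate(self):
        """Create a random URL-safe token and remember its digest."""
        token = secrets.token_urlsafe(32)
        self._digests.add(self._digest(token))
        return token

    def check(self, token):
        """Return True if the token was issued by this store."""
        candidate = self._digest(token)
        # compare_digest avoids leaking information through timing.
        return any(hmac.compare_digest(candidate, d) for d in self._digests)
```

Storing digests rather than raw tokens means a leaked store does not directly reveal usable tokens.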
Model Controller
• Image building:
– Build image with model files from storage
– Push image to the registry
• Model lifecycle:
– ApiStatus: describes the state of the runtime API
– VersionStatus: describes the state of the model image
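The two lifecycle fields can be pictured as a pair of independent status values attached to each model. The sketch below is only an illustration: the talk names `ApiStatus` and `VersionStatus` but does not list their values, so every state name here is an assumption, as is the `ModelStatus` class.

```python
from dataclasses import dataclass
from enum import Enum


class VersionStatus(Enum):
    # Hypothetical states of the model image (not listed in the talk).
    PENDING = "pending"
    BUILDING = "building"
    BUILT = "built"
    BUILD_ERROR = "build_error"


class ApiStatus(Enum):
    # Hypothetical states of the runtime API (not listed in the talk).
    PENDING = "pending"
    STARTING = "starting"
    RUNNING = "running"
    DOWN = "down"


@dataclass
class ModelStatus:
    version_status: VersionStatus = VersionStatus.PENDING
    api_status: ApiStatus = ApiStatus.PENDING

    def is_serving(self):
        """True only when the image is built and the runtime API is running."""
        return (self.version_status is VersionStatus.BUILT
                and self.api_status is ApiStatus.RUNNING)
```

Keeping the two statuses separate lets a controller report, for example, a successfully built image whose API has not started yet.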
Example
Monitoring
Serving Hub
Metrics
• Ingress controller:
– Count of HTTP requests by method and status code
– Sum of HTTP latencies by method and status code
• k8s:
– Number of pods
– Number of Model CRDs
• Custom model metrics:
– Liveness
– Version
– Version status
– API status
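Exporting a request count and a latency sum with the same labels is enough to derive a mean latency per method and status code. The sketch below shows that arithmetic with plain dictionaries; the `HttpMetrics` class is hypothetical and does not reflect how the ingress controller actually exposes its metrics.

```python
from collections import defaultdict


class HttpMetrics:
    """Request count and latency sum, both keyed by (method, status code)."""

    def __init__(self):
        self.count = defaultdict(int)
        self.latency_sum = defaultdict(float)

    def observe(self, method, status, latency_seconds):
        key = (method, status)
        self.count[key] += 1
        self.latency_sum[key] += latency_seconds

    def mean_latency(self, method, status):
        """Mean latency = latency sum / request count, or None without traffic."""
        key = (method, status)
        if self.count.get(key, 0) == 0:
            return None
        return self.latency_sum[key] / self.count[key]
```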
Auto Scaling
• HPA: horizontal pod autoscaler
• RAM usage +60%
• CPU usage +60%
• Params:
– Max/Min threshold
– Scale decision: % and which resource
• To come: GPU usage
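The Kubernetes horizontal pod autoscaler documents its core rule as desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies that rule under two assumptions that the slide does not spell out: that 60% is the target utilization, and that the max/min parameters bound the replica count. It ignores the autoscaler's tolerance band and stabilization window.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization=60.0, min_replicas=1, max_replicas=10):
    """Replica count after one scaling decision, clamped to [min, max]."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    # Documented HPA ratio rule, rounded up to a whole pod.
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))
```

With two pods at 90% CPU and a 60% target, the function asks for three pods; with four pods at 30%, it scales down to two.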
Serving Runtime
Prerequisites:
• Support the ONNX, TensorFlow (TF) and PMML serialization formats
• Able to chain several models of different kinds
• Available through a web service API
Example:
HTTP query → Preprocessing → Model → Postprocessing → HTTP response
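Chaining preprocessing, model and postprocessing amounts to composing steps that share one input and output type. The sketch below is a minimal illustration of that idea; `chain` and the three toy steps are hypothetical stand-ins and not the runtime's actual code.

```python
from typing import Callable, Dict, List

Tensors = Dict[str, list]            # named tensors as nested lists
Step = Callable[[Tensors], Tensors]


def chain(steps: List[Step]) -> Step:
    """Compose steps left to right into a single callable."""
    def run(tensors: Tensors) -> Tensors:
        for step in steps:
            tensors = step(tensors)
        return tensors
    return run


# Toy stand-ins for the three stages (hypothetical, for illustration only).
def preprocess(t: Tensors) -> Tensors:
    return {"x": [v / 255 for v in t["pixels"]]}


def model(t: Tensors) -> Tensors:
    return {"score": [sum(t["x"]) / len(t["x"])]}


def postprocess(t: Tensors) -> Tensors:
    return {"label": ["bright" if t["score"][0] > 0.5 else "dark"]}


pipeline = chain([preprocess, model, postprocess])
```

Because every step maps named tensors to named tensors, models of different kinds can be placed in the same chain.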
Think generic:
Let's create one Evaluator per supported serialization format.
What are the common inputs & outputs?
Inputs: ? → Evaluator (ONNX, TensorFlow or PMML) → Outputs: ?
ONNX & TensorFlow
Inputs & outputs:
• List of n-dimensional arrays (i.e. tensors) identified by names and shapes
Example:
tensor_A (3, 3)
0 1 0
1 0 0
0 0 1
tensor_B (2, 4)
3.2 0.1 8.7 6.0
0.0 1.2 2.0 7.7
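The pairing of a name with a shape can be checked mechanically. The sketch below represents the two example tensors as nested Python lists and derives their shapes; `infer_shape` is a hypothetical helper written for this illustration.

```python
def infer_shape(tensor):
    """Return the shape of a rectangular nested list, e.g. (3, 3)."""
    if not isinstance(tensor, list):
        return ()                      # a scalar has an empty shape
    if not tensor:
        return (0,)
    inner = {infer_shape(item) for item in tensor}
    if len(inner) != 1:
        raise ValueError("ragged tensor: rows have different shapes")
    return (len(tensor),) + inner.pop()


named_tensors = {
    "tensor_A": [[0, 1, 0], [1, 0, 0], [0, 0, 1]],
    "tensor_B": [[3.2, 0.1, 8.7, 6.0], [0.0, 1.2, 2.0, 7.7]],
}
shapes = {name: infer_shape(values) for name, values in named_tensors.items()}
```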
PMML
Inputs & outputs:
• Tabular data (i.e. a dataset), which can be interpreted as a list of named tensors
Example:
prop_int  prop_bool  prop_string
1         true       "John"
6         true       "Kim"
8         false      "Hugo"
As named tensors:
prop_int (3, 1)
1 6 8
prop_bool (3, 1)
true true false
prop_string (3, 1)
"John" "Kim" "Hugo"
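Reading a table column by column gives one named tensor of shape (n_rows, 1) per column. The sketch below performs that conversion on the example table; the function name `table_to_named_tensors` is hypothetical.

```python
def table_to_named_tensors(columns, rows):
    """Turn a row-oriented table into {column name: tensor of shape (n_rows, 1)}."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("every row needs one value per column")
    return {
        name: [[row[i]] for row in rows]
        for i, name in enumerate(columns)
    }


tensors = table_to_named_tensors(
    ["prop_int", "prop_bool", "prop_string"],
    [(1, True, "John"), (6, True, "Kim"), (8, False, "Hugo")],
)
```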
Think generic:
What are the common inputs & outputs?
Inputs: list of named tensors → Evaluator (ONNX, TensorFlow or PMML) → Outputs: list of named tensors
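With named tensors on both sides, every format can sit behind one interface. The sketch below expresses that contract as an abstract class; the `Evaluator` name comes from the slides, but the `evaluate` method signature and the toy `DoublingEvaluator` are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict

NamedTensors = Dict[str, list]


class Evaluator(ABC):
    """Common contract: named tensors in, named tensors out."""

    @abstractmethod
    def evaluate(self, inputs: NamedTensors) -> NamedTensors:
        ...


class DoublingEvaluator(Evaluator):
    """Toy evaluator standing in for an ONNX, TensorFlow or PMML backend."""

    def evaluate(self, inputs: NamedTensors) -> NamedTensors:
        return {name + "_doubled": [2 * v for v in values]
                for name, values in inputs.items()}
```

A caller only depends on `Evaluator`, so swapping the serialization format does not change the calling code.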
Web API
How to convert an HTTP query into named tensors?
How to convert named tensors into an HTTP response?
Use the Content-Type header to decode/encode the message body
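Selecting a decoder from the Content-Type header can be sketched as a small registry keyed by media type. Everything below (`DECODERS`, `decoder`, `decode_body`) is hypothetical naming for illustration; only the JSON decoder is registered to keep the example short.

```python
import json

DECODERS = {}


def decoder(content_type):
    """Register a function that decodes a body of the given Content-Type."""
    def register(func):
        DECODERS[content_type] = func
        return func
    return register


@decoder("application/json")
def decode_json(body):
    return json.loads(body.decode("utf-8"))


def decode_body(content_type_header, body):
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type_header.split(";")[0].strip().lower()
    if media_type not in DECODERS:
        raise ValueError(f"unsupported Content-Type: {media_type}")
    return DECODERS[media_type](body)
```

Supporting a new format then means registering one more function, without touching `decode_body`. The response side can mirror this with a registry of encoders.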
Web API
• Content-Type: application/json
tensor_A (3, 3)
0 1 0
1 0 0
0 0 1
tensor_B (2, 4)
3.2 0.1 8.7 6.0
0.0 1.2 2.0 7.7
As a JSON body:
{
    "tensor_A": [
        [0, 1, 0],
        [1, 0, 0],
        [0, 0, 1]
    ],
    "tensor_B": [
        [3.2, 0.1, 8.7, 6.0],
        [0.0, 1.2, 2.0, 7.7]
    ]
}
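For the JSON case, decoding and encoding are thin wrappers around a JSON parser plus a check that each value really is an array. The sketch below uses only the standard library; the function names are hypothetical.

```python
import json


def decode_json_tensors(body):
    """Parse a JSON object whose keys are tensor names and values nested arrays."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object of named tensors")
    for name, values in payload.items():
        if not isinstance(values, list):
            raise ValueError(f"tensor {name!r} must be a JSON array")
    return payload


def encode_json_tensors(tensors):
    """Serialise named tensors back into a JSON response body."""
    return json.dumps(tensors).encode("utf-8")
```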
Web API
• Content-Type: image/png
image (1, width, height, 3)
(253, 152, 6) (253, 13, 10) ...
(200, 100, 255) (10, 10, 6) ...
... ... ...
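Once an image is decoded to raw pixels, building the tensor is a reshaping step. The sketch below assumes the PNG has already been decoded by some image library (not shown) into row-major RGB bytes, and follows the axis order given on the slide, (1, width, height, 3). The function name is hypothetical.

```python
def rgb_bytes_to_tensor(raw, width, height):
    """Reshape row-major RGB bytes into a nested list of shape (1, width, height, 3)."""
    if len(raw) != width * height * 3:
        raise ValueError("raw buffer does not match width * height * 3")

    def pixel(x, y):
        # Row-major layout: rows of `width` pixels, 3 bytes per pixel.
        start = (y * width + x) * 3
        return list(raw[start:start + 3])

    # The leading axis of size 1 is the batch dimension.
    return [[[pixel(x, y) for y in range(height)] for x in range(width)]]
```

Many frameworks expect (batch, height, width, channels) instead, so the axis order is worth checking against the model's declared input shape.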
Web API
Other supported Content-Types:
• image/jpeg
• multipart/form-data
• text/html
Later:
• application/protobuf
Web API
Example with an available ONNX model:
https://github.com/onnx/models/tree/master/vision/style_transfer/fast_neural_style
ML Serving
Thanks!
Maël Le Gal github:mael-le-gal
Christophe Rannou @ChrisRannou

OVHcloud TechTalks - ML serving