OVHcloud TechTalks - ML serving

@OVHcloud_fr #OVHcloudTechTalks @ChrisRannou

OVHcloud ML Serving
Maël Le Gal github:mael-le-gal
Christophe Rannou @ChrisRannou

Public Cloud ML Serving

Public Cloud ML Serving
Instead of handling the production deployment yourself, simply select ML models (your own or pre-trained), size them and deploy. We provide API endpoints and more!
Our added value:
✔ We simplify your architecture: we deploy your ML models for you in a few clicks
✔ We simplify your code: everything can be automated (via API / CLI)
✔ We reduce your costs: time-to-production drops from weeks to seconds, and you pay as you go
✔ We address your main challenges: we provide scaling, monitoring and versioning

Demo

Serving Hub

Serving Hub
Model CRD

Serving Hub
Token CRD

Serving Hub
Web APIs
Auth API
• Generate tokens
• Check tokens
Hub API, model management
• Create
• Delete
• Update

Serving Hub

Serving Hub
Model Controller
• Image building:
– Build the image with model files from storage
– Push the image to the registry
• Model lifecycle:
– ApiStatus: describes the state of the runtime API
– VersionStatus: describes the state of the model image

Serving Hub
Example

Monitoring

Serving Hub
Metrics
• Ingress controller:
– Count of HTTP requests by method and status code
– Sum of HTTP latencies by method and status code
• k8s:
– Number of pods
– Number of Model CRDs
• Custom model metrics:
– Liveness
– Version
– Version status
– API status
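
Exposing a request count and a latency sum per (method, status code) is enough to derive a mean latency per group. The following is a minimal sketch of that idea; the `IngressMetrics` class is a hypothetical in-memory stand-in and not the ingress controller's actual implementation.

```python
from collections import defaultdict


class IngressMetrics:
    """Hypothetical in-memory stand-in for the ingress controller counters."""

    def __init__(self):
        # Both metrics are keyed by (HTTP method, status code).
        self.request_count = defaultdict(int)
        self.latency_sum = defaultdict(float)

    def observe(self, method, status, latency_seconds):
        key = (method, status)
        self.request_count[key] += 1
        self.latency_sum[key] += latency_seconds

    def mean_latency(self, method, status):
        key = (method, status)
        count = self.request_count.get(key, 0)
        if count == 0:
            return None  # no request observed for this group yet
        return self.latency_sum[key] / count


metrics = IngressMetrics()
metrics.observe("POST", 200, 0.25)
metrics.observe("POST", 200, 0.75)
metrics.observe("POST", 500, 1.0)
```

Keeping the sum and the count separate, rather than storing an average, lets a monitoring backend compute the mean over any time window.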

Serving Hub
Auto Scaling
• HPA: Horizontal Pod Autoscaler
• RAM usage +60%
• CPU usage +60%
• Params:
– Max/Min threshold
– Scale decision: % and which resource
• To come: GPU usage
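
As an illustration of the scale decision, here is a sketch of the replica rule documented for the Kubernetes Horizontal Pod Autoscaler: ceil(current replicas × current usage / target usage), clamped to the configured bounds. It assumes the 60% figures above are target utilizations, and it omits the tolerance band and stabilization behavior of the real controller. The function name and parameters are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Replica count per the documented HPA rule, clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# 2 pods at 90% CPU with a 60% target: ceil(2 * 90 / 60) = 3 pods.
print(desired_replicas(2, 90, 60, min_replicas=1, max_replicas=10))
```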

Serving Runtime

Serving Runtime
Prerequisites:
• Support the ONNX, TensorFlow (TF) and PMML serialization formats
• Able to chain several models of different kinds
• Available through a web service API
Example:
HTTP Query → Preprocessing → Model → Postprocessing → HTTP Response
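
The chaining requirement can be sketched as plain function composition, where every stage takes named values and returns named values. All stage names and the toy logic below are hypothetical; they only illustrate the preprocessing, model and postprocessing flow, not the runtime's actual code.

```python
from functools import reduce


def chain(*stages):
    """Compose stages left to right; each stage maps a dict of named values to a new dict."""
    def run(named_inputs):
        return reduce(lambda data, stage: stage(data), stages, named_inputs)
    return run


# Hypothetical stages, for illustration only.
def preprocess(data):
    # Scale raw pixel values from [0, 255] to [0, 1].
    return {"x": [v / 255 for v in data["pixels"]]}


def model(data):
    # Stand-in for a real ONNX / TensorFlow / PMML evaluator.
    return {"score": [sum(data["x"]) / len(data["x"])]}


def postprocess(data):
    # Turn the numeric score into a label for the response.
    return {"label": ["bright" if data["score"][0] > 0.5 else "dark"]}


pipeline = chain(preprocess, model, postprocess)
result = pipeline({"pixels": [255, 255, 0, 255]})
```

Because every stage shares the same input and output convention, stages backed by different serialization formats can be combined freely.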

Serving Runtime
Think generic:
Let's create one Evaluator per supported serialization format.
What are the common inputs & outputs?
Inputs
• ?
Evaluator
• ONNX
• TensorFlow
• PMML
Outputs
• ?

Serving Runtime
ONNX & TensorFlow:
Inputs & outputs:
• List of n-dimensional arrays (i.e. tensors) identified by names and shapes
Example:
tensor_A (3, 3)
0 1 0
1 0 0
0 0 1
tensor_B (2, 4)
3.2 0.1 8.7 6.0
0.0 1.2 2.0 7.7
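
Since tensors are identified by name and shape, a request can be checked against a model's expected inputs before evaluation. The sketch below uses NumPy arrays as the tensor type, which is an assumption, and `check_inputs` is a hypothetical helper rather than part of the runtime.

```python
import numpy as np


def check_inputs(named_tensors, expected_shapes):
    """Return a list of problems; an empty list means the inputs match."""
    problems = []
    for name, shape in expected_shapes.items():
        if name not in named_tensors:
            problems.append(f"missing tensor {name}")
        elif named_tensors[name].shape != shape:
            problems.append(
                f"{name}: expected shape {shape}, got {named_tensors[name].shape}"
            )
    return problems


inputs = {
    "tensor_A": np.array([[0, 1, 0],
                          [1, 0, 0],
                          [0, 0, 1]]),
    "tensor_B": np.array([[3.2, 0.1, 8.7, 6.0],
                          [0.0, 1.2, 2.0, 7.7]]),
}
expected = {"tensor_A": (3, 3), "tensor_B": (2, 4)}
problems = check_inputs(inputs, expected)
```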

Serving Runtime
PMML
Inputs & outputs:
• Tabular data (i.e. a dataset), which can be interpreted as a list of named tensors
Example:
prop_int  prop_bool  prop_string
1         true       "John"
6         true       "Kim"
8         false      "Hugo"
prop_int (3, 1)
1 6 8
prop_bool (3, 1)
true true false
prop_string (3, 1)
"John" "Kim" "Hugo"
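
The table-to-tensors interpretation can be sketched as a column split: each column becomes one named tensor of shape (n_rows, 1). The helper name `table_to_named_tensors` and the use of NumPy are assumptions made for this illustration.

```python
import numpy as np

rows = [
    {"prop_int": 1, "prop_bool": True, "prop_string": "John"},
    {"prop_int": 6, "prop_bool": True, "prop_string": "Kim"},
    {"prop_int": 8, "prop_bool": False, "prop_string": "Hugo"},
]


def table_to_named_tensors(rows):
    """Turn a list of records into one (n_rows, 1) tensor per column."""
    columns = rows[0].keys()
    return {
        name: np.array([row[name] for row in rows]).reshape(len(rows), 1)
        for name in columns
    }


tensors = table_to_named_tensors(rows)
```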

Serving Runtime
Think generic:
What are the common inputs & outputs?
Inputs
• List of named tensors
Evaluator
• ONNX
• TensorFlow
• PMML
Outputs
• List of named tensors
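
With named tensors on both sides, the Evaluator abstraction can be sketched as a single-method interface. The class and method names below are hypothetical, and `DoublingEvaluator` is a toy stand-in for a format-specific implementation.

```python
from abc import ABC, abstractmethod


class Evaluator(ABC):
    """Common contract: named tensors in, named tensors out."""

    @abstractmethod
    def evaluate(self, inputs):
        """Map a dict of input tensors to a dict of output tensors."""


class DoublingEvaluator(Evaluator):
    """Toy stand-in for a format-specific evaluator (ONNX, TensorFlow, PMML)."""

    def evaluate(self, inputs):
        return {f"{name}_out": [2 * v for v in values]
                for name, values in inputs.items()}


def serve(evaluator, inputs):
    # The caller depends only on the Evaluator contract, not on the model format.
    return evaluator.evaluate(inputs)


outputs = serve(DoublingEvaluator(), {"x": [1, 2, 3]})
```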

Serving Runtime
Web API
How to convert an HTTP query into named tensors?
How to convert named tensors into an HTTP response?
Use the Content-Type header to decode/encode the message body.
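
One way to pick a decoder from the Content-Type header is a registry keyed by media type. This is a minimal sketch under that assumption; `DECODERS`, `decoder` and `decode_request` are hypothetical names, and only a JSON decoder is registered here.

```python
import json

# Maps a media type to the function that decodes a request body.
DECODERS = {}


def decoder(content_type):
    """Register a function that turns a request body into named tensors."""
    def register(func):
        DECODERS[content_type] = func
        return func
    return register


@decoder("application/json")
def decode_json(body):
    return json.loads(body.decode("utf-8"))


def decode_request(content_type, body):
    # Ignore parameters such as "; charset=utf-8" when picking a decoder.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in DECODERS:
        raise ValueError(f"unsupported Content-Type: {media_type}")
    return DECODERS[media_type](body)


tensors = decode_request("application/json; charset=utf-8", b'{"x": [1, 2]}')
```

Supporting a new media type then only requires registering one more function, and the same pattern can be mirrored for encoding responses.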

Serving Runtime
Web API
• Content-Type: application/json
tensor_A (3, 3)
0 1 0
1 0 0
0 0 1
tensor_B (2, 4)
3.2 0.1 8.7 6.0
0.0 1.2 2.0 7.7
{
  "tensor_A": [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1]
  ],
  "tensor_B": [
    [3.2, 0.1, 8.7, 6.0],
    [0.0, 1.2, 2.0, 7.7]
  ]
}
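
The JSON mapping can be sketched as a round trip: nested lists become arrays on the way in and go back to nested lists on the way out. NumPy as the tensor type and both helper names are assumptions made for this example.

```python
import json

import numpy as np

body = """
{
  "tensor_A": [[0, 1, 0], [1, 0, 0], [0, 0, 1]],
  "tensor_B": [[3.2, 0.1, 8.7, 6.0], [0.0, 1.2, 2.0, 7.7]]
}
"""


def json_to_named_tensors(text):
    """Decode a JSON object into one array per top-level key."""
    return {name: np.asarray(values) for name, values in json.loads(text).items()}


def named_tensors_to_json(tensors):
    # ndarray is not JSON-serializable, so convert back to nested lists first.
    return json.dumps({name: tensor.tolist() for name, tensor in tensors.items()})


tensors = json_to_named_tensors(body)
shapes = {name: tensor.shape for name, tensor in tensors.items()}
```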

Serving Runtime
Web API
• Content-Type: image/png
image (1, width, height, 3)
... ... ...
(253, 152, 6) (253, 13, 10) ...
(200, 100, 255) (10, 10, 6) ...
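
The image case amounts to a grid of RGB pixels plus a leading batch axis. The sketch below starts from already-decoded pixel values, since the PNG decoder used by the runtime is not shown here. Note that array libraries usually store images row by row as (height, width, channels); with this square 2×2 example the axis order makes no difference.

```python
import numpy as np

# Two rows of two RGB pixels, standing in for a decoded PNG.
pixels = np.array(
    [
        [(253, 152, 6), (253, 13, 10)],
        [(200, 100, 255), (10, 10, 6)],
    ],
    dtype=np.uint8,
)

# Add a leading batch axis so a single image becomes a batch of one.
image = pixels[np.newaxis, ...]
```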

Serving Runtime
Web API
Other supported Content-Types:
• image/jpeg
• multipart/form-data
• text/html
Later :
• application/protobuf

Serving Runtime
Web API
Example with available ONNX model :
https://github.com/onnx/models/tree/master/vision/style_transfer/fast_neural_style

ML Serving

Thanks!
Maël Le Gal github:mael-le-gal
Christophe Rannou @ChrisRannou
