Operationalizing Machine Learning Pipeline

Summary

This project is part of the Udacity Azure ML Nanodegree. It aims to predict whether a client of a Portuguese banking institution will subscribe to a term deposit. The Bank Marketing dataset was used to train the models. The models were trained with AutoML, and the best model was deployed as a web service on Azure Container Instances. Finally, the process was automated by creating and publishing a pipeline.

Architectural Diagram

The models can be trained in two ways: a non-automated way and an automated way. In the non-automated way (method 1), the model is trained and then deployed, and the workflow itself is not automated. In the second method, the model is trained through a published pipeline. A pipeline is an independently executable workflow of a complete machine learning task, and publishing it lets external services interact with it so that work can be done more efficiently. In this project, the model was trained using both methods. The best-performing model was deployed as a web service whose endpoint accepts requests and returns responses. A pipeline was then created and published, which lets external or internal services interact with it over an HTTP API and train the model.

Key Steps

Step 1: Automated Machine Learning Experiment

To train models with AutoML, I followed these steps:

First, I registered the dataset in Azure ML Studio from its URI. The screenshot below shows the registered dataset.
Second, I created a compute cluster. Its configuration can be seen in the screenshot below.
Third, I trained a number of models with AutoML. The screenshot below shows the successful completion of the AutoML experiment.

The screenshot below shows the top 11 models by accuracy. The VotingEnsemble model had the highest accuracy of all.

Step 2: Deploying The Best Model

The best model was the VotingEnsemble model, which had an accuracy of 92%. Some details of the model can be viewed in the screenshot below.
The VotingEnsemble model was then deployed to Azure Container Instances with authentication enabled.

Step 3: Enable Logging

Azure Application Insights makes it possible to monitor the logging data of the deployed model, which comes in handy for detecting failures or anomalies.

First, I edited the logs.py file to set Application Insights to true and then executed the Python script.

logs.py being executed.
After running logs.py, Application Insights was enabled in Azure ML Studio.
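
The logging step above can be sketched roughly as follows. This is a minimal sketch and not the project's actual logs.py: it assumes the `service` argument behaves like an `azureml.core.webservice.Webservice` object from the Azure ML SDK v1, with `update(enable_app_insights=True)` and `get_logs()` methods. The helper name `enable_logging` is hypothetical.

```python
def enable_logging(service):
    """Turn on Application Insights for a deployed service and return its logs.

    `service` is assumed to be an azureml.core.webservice.Webservice-like
    object (an assumption, not confirmed by this README).
    """
    # Ask the service to start sending telemetry to Application Insights.
    service.update(enable_app_insights=True)

    # Fetch the container logs and split them into lines for printing.
    logs = service.get_logs()
    return logs.split("\n")


# Hypothetical usage, assuming azureml-core is installed and config.json is present:
#   from azureml.core import Workspace
#   from azureml.core.webservice import Webservice
#   ws = Workspace.from_config()
#   service = Webservice(name="<deployed-service-name>", workspace=ws)
#   for line in enable_logging(service):
#       print(line)
```

Passing the service object in, rather than constructing it inside the function, keeps the sketch independent of workspace credentials.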

Step 4: Swagger Documentation

The Swagger documentation is served on localhost using the swagger.json file from the deployed model.

Step 5: Consuming Model Endpoints

First, I edited the endpoint.py script with the scoring_uri and key from the deployed model. The script sends an HTTP request to the deployed model. The screenshot below shows the endpoint.py script.
After the endpoint.py script is executed, the model sends the following response.
Then I benchmarked the endpoint with Apache Benchmark.
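
As an illustration of what a script like endpoint.py does, here is a minimal sketch using only the standard library. It is not the project's actual script. The payload shape `{"data": [...]}`, the `Bearer` key header, and the helper name `build_scoring_request` are assumptions, and the feature names in the usage comment are purely illustrative.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, records):
    """Build (but do not send) a POST request for a key-authenticated endpoint."""
    # Assumed payload shape: a JSON object with a "data" list of input rows.
    body = json.dumps({"data": records}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # The key copied from the deployed model is sent as a bearer token.
        "Authorization": f"Bearer {key}",
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Hypothetical usage with placeholder values:
#   req = build_scoring_request("<scoring_uri>", "<key>", [{"age": 35, "job": "admin."}])
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the headers and body before calling the live endpoint.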

Step 6: Create, Publish and Consume a Pipeline

To automate the workflow, a pipeline needs to be created and published. Published pipelines allow external services to interact with them so that work can be done more efficiently.

The following image shows the successful completion of pipeline creation.
Then, the created pipeline was published.
Finally, the published pipeline's REST endpoint was used to start a new pipeline run.
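
Calling a published pipeline over HTTP can be sketched as below. This is a hedged example and not the code from the project notebook: it assumes the endpoint accepts a POST with a JSON body containing `ExperimentName` and an `Authorization` header obtained from Azure authentication. The helper name `build_pipeline_run_request` and the experiment name in the usage comment are hypothetical.

```python
import json
import urllib.request


def build_pipeline_run_request(rest_endpoint, auth_header, experiment_name):
    """Build (but do not send) a POST request that asks for a new pipeline run."""
    # Assumed request body: the experiment under which the run is recorded.
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")

    # Copy the caller's auth header so the input dict is not mutated.
    headers = dict(auth_header)
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(rest_endpoint, data=body, headers=headers, method="POST")


# Hypothetical usage with placeholder values:
#   auth_header = {"Authorization": "Bearer <token>"}
#   req = build_pipeline_run_request("<pipeline rest endpoint>", auth_header, "pipeline-rest-endpoint")
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```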

Screen Recording

View screencast for this project

Standout Suggestions

To improve performance further, the class imbalance in the dataset can be addressed. AutoML with deep learning enabled may also bring a noteworthy improvement in performance.
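
One common way to address class imbalance is to weight each class inversely to its frequency. The sketch below shows that calculation in plain Python. It is not something this project implemented; the helper name `balanced_class_weights` and the "yes"/"no" labels in the usage comment are illustrative assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes get weights above 1, frequent classes get weights below 1.
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical usage: 8 negative and 2 positive examples.
#   balanced_class_weights(["no"] * 8 + ["yes"] * 2)
```

Weights like these can be passed to learners that accept per-class or per-sample weights, so that errors on the minority class count for more during training.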

