sayed6201/operationalizing_machine_learning

Operationalizing Machine Learning Pipeline

Summary

This project is part of the Udacity Azure ML Nanodegree. It aims to predict whether a client of a Portuguese banking institution will subscribe to a term deposit. The Bank Marketing dataset was used to train the models. The models were trained with AutoML, and the best model was deployed as a web service on Azure Container Instances. Finally, the process was automated by creating and publishing a pipeline.

Architectural Diagram

diagram.PNG

The models can be trained in two ways: a non-automated way and an automated way. In the non-automated way (method 1), the model is trained and then deployed, and the workflow itself is not automated. In the second method, the model is trained via a published pipeline. A pipeline is an independently executable workflow of a complete machine learning task; once published, it lets external services trigger it over HTTP. In this project the model was trained using both methods. The best-performing model was deployed as a web service, which exposes an endpoint that clients can send requests to and receive responses from. A pipeline was then created and published, which allows external or internal services to interact with it via an HTTP API and train the model.

Key Steps

Step 1: Automated Machine Learning Experiment

To train models with AutoML, I followed these steps:

  • First, I registered the dataset in Azure ML Studio from its URI. The screenshot below shows the registered dataset. [Screenshot: Dataset registered]

  • Second, I created a compute cluster. Its configuration can be seen in the screenshot below. [Screenshot: Compute Cluster Configuration]

  • Third, I trained a number of models with AutoML. The screenshot below shows the successful completion of the AutoML experiment. [Screenshot: AutoML Completed]

    The screenshot below shows the top 11 models by accuracy. The VotingEnsemble model had the highest accuracy of all the AutoML models.
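The three steps above could also be scripted with the Azure ML Python SDK (v1). The following is a minimal sketch, not the code used in the project: the dataset name, cluster name, VM size, node counts, experiment name, timeout, primary metric, and the label column `y` are all assumptions, and the dataset URL has to be supplied by the caller.

```python
# Illustrative AutoML settings. Every value here is an assumption,
# not a setting confirmed by the project.
AUTOML_SETTINGS = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 4,
    "primary_metric": "accuracy",
    "enable_early_stopping": True,
}


def run_automl(dataset_url, dataset_name="bankmarketing", cluster_name="automl-cluster"):
    """Register the dataset, create a compute cluster, and submit an AutoML run."""
    # Imported lazily so this module loads even without the Azure ML SDK installed.
    from azureml.core import Dataset, Experiment, Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.train.automl import AutoMLConfig

    ws = Workspace.from_config()  # expects a config.json describing the workspace

    # Step 1: register the dataset from its URI.
    dataset = Dataset.Tabular.from_delimited_files(path=dataset_url)
    dataset = dataset.register(workspace=ws, name=dataset_name)

    # Step 2: create the compute cluster (VM size and node counts are assumptions).
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS12_V2", min_nodes=1, max_nodes=4
    )
    cluster = ComputeTarget.create(ws, cluster_name, compute_config)
    cluster.wait_for_completion(show_output=True)

    # Step 3: train classification models with AutoML.
    automl_config = AutoMLConfig(
        task="classification",
        training_data=dataset,
        label_column_name="y",  # assumed name of the target column
        compute_target=cluster,
        **AUTOML_SETTINGS,
    )
    run = Experiment(ws, "automl-bankmarketing").submit(automl_config)
    run.wait_for_completion(show_output=True)
    return run
```

Calling `run_automl("<dataset URL>")` needs an authenticated workspace; the settings dictionary can be inspected and adjusted without one.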

Step 2: Deploying The Best Model

  • The best model was the VotingEnsemble model, which had an accuracy of 92%. Some details of the model can be viewed in the screenshot below. [Screenshot: VotingEnsemble model]

  • The VotingEnsemble model was then deployed to Azure Container Instances with authentication enabled. [Screenshot: deploying best model]
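A deployment like the one above can be done from the Studio UI or from code. As a hedged sketch of the code route with the Azure ML SDK (v1): the service name, CPU, and memory values are assumptions, and the caller is expected to pass in an existing workspace, a registered model, and an `InferenceConfig`.

```python
# Assumed container sizing; only auth_enabled=True comes from the text above.
ACI_SETTINGS = {"cpu_cores": 1, "memory_gb": 1, "auth_enabled": True}


def deploy_best_model(ws, model, inference_config, service_name="bankmarketing-aci"):
    """Deploy a registered model to Azure Container Instances with key authentication."""
    # Imported lazily so this module loads even without the Azure ML SDK installed.
    from azureml.core.model import Model
    from azureml.core.webservice import AciWebservice

    deployment_config = AciWebservice.deploy_configuration(**ACI_SETTINGS)
    service = Model.deploy(
        workspace=ws,
        name=service_name,
        models=[model],
        inference_config=inference_config,
        deployment_config=deployment_config,
    )
    service.wait_for_deployment(show_output=True)
    return service
```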

Step 3: Enable Logging

Azure Application Insights makes it possible to monitor the logs of the deployed model, which helps in detecting failures or anomalies.

  • First, I edited the logs.py file to set Application Insights to true, and then ran the script.

    [Screenshot: logs.py executing]

  • After logs.py ran, Application Insights was enabled for the deployment in Azure ML Studio.

    [Screenshot: Application Insights enabled]
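The project's logs.py is not reproduced here. A minimal sketch of what such a script might look like with the Azure ML SDK (v1) follows; the function name is hypothetical and the service name must match the deployed web service.

```python
def enable_app_insights(service_name, show_logs=True):
    """Turn on Application Insights for a deployed web service and print its logs."""
    # Imported lazily so this module loads even without the Azure ML SDK installed.
    from azureml.core import Workspace
    from azureml.core.webservice import Webservice

    ws = Workspace.from_config()  # expects a config.json describing the workspace
    service = Webservice(name=service_name, workspace=ws)

    # The step described above: set Application Insights to true.
    service.update(enable_app_insights=True)

    if show_logs:
        for line in service.get_logs().split("\n"):
            print(line)
    return service
```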

Step 4: Swagger Documentation

The Swagger documentation is served on localhost from the swagger.json file of the deployed model. [Screenshot: Swagger]
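Besides viewing it in Swagger UI, the same swagger.json can be inspected programmatically. The sketch below lists the HTTP operations a Swagger 2.0 document describes; the `SAMPLE` document is a hypothetical stand-in, not the file produced by the deployed model.

```python
import json

HTTP_METHODS = {"get", "post", "put", "delete", "patch", "head", "options"}


def load_swagger(path="swagger.json"):
    """Read a downloaded swagger.json from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def list_operations(swagger):
    """Return sorted (METHOD, path) pairs described in a Swagger 2.0 document."""
    operations = []
    for path, item in swagger.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


# Hypothetical sample, shaped like a small Swagger 2.0 document.
SAMPLE = {
    "swagger": "2.0",
    "paths": {
        "/": {"get": {}},
        "/score": {"post": {}, "parameters": []},
    },
}

for method, path in list_operations(SAMPLE):
    print(method, path)
```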

Step 5: Consuming Model Endpoints

  • First, I edited the endpoint.py script with the scoring_uri and key of the deployed model. The script sends an HTTP request to the deployed model. The screenshot below shows the endpoint.py script. [Screenshot: endpoint.py]

  • After endpoint.py is executed, the model sends the following response. [Screenshot: Response from the deployed model]

  • Then I benchmarked the endpoint with Apache Benchmark. [Screenshot: benchmarking the endpoint]
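The request and the benchmark above can be sketched with the standard library alone. This is not the project's endpoint.py: the `{"data": [...]}` payload shape, the `Bearer` key header, the function names, and the `ab` options chosen are assumptions about a typical key-authenticated Azure ML endpoint.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, records):
    """Build (but do not send) a POST request for the scoring endpoint."""
    body = json.dumps({"data": records}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",  # key-based authentication
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def score(scoring_uri, key, records):
    """Send the records to the endpoint and return the decoded JSON response."""
    request = build_scoring_request(scoring_uri, key, records)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def ab_command(scoring_uri, key, payload_file="data.json", request_count=10):
    """Build an Apache Benchmark command line that POSTs payload_file to the endpoint."""
    return [
        "ab",
        "-n", str(request_count),   # total number of requests
        "-v", "4",                  # verbose output, including response headers
        "-p", payload_file,         # file containing the POST body
        "-T", "application/json",   # Content-Type of the POST body
        "-H", f"Authorization: Bearer {key}",
        scoring_uri,
    ]
```

`score(scoring_uri, key, records)` performs the network call; `ab_command(...)` only builds the argument list, which can be passed to `subprocess.run` once the payload file has been written.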

Step 6: Create, Publish and Consume a Pipeline

To automate the workflow, we need to create and publish a pipeline. A published pipeline exposes a REST endpoint, so external services can trigger it without going through the Studio.

  • The following images show the successful creation of the pipeline. [Screenshots: Pipeline created]

  • Then the created pipeline was published. [Screenshot: Pipeline published]

  • Finally, the REST endpoint of the published pipeline was used to start a new pipeline run. [Screenshots: Pipeline created]
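Triggering a published pipeline through its REST endpoint might look like the sketch below. The endpoint URL comes from the published pipeline; the experiment name, the function names, and the use of interactive login for the authentication header are assumptions.

```python
import json
import urllib.request


def build_pipeline_request(rest_endpoint, auth_header, experiment_name):
    """Build (but do not send) the POST request that starts a pipeline run."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    headers = dict(auth_header)  # e.g. {"Authorization": "Bearer <token>"}
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(rest_endpoint, data=body, headers=headers, method="POST")


def trigger_pipeline(rest_endpoint, experiment_name="pipeline-rest-endpoint"):
    """Authenticate interactively, start a run, and return the run id if present."""
    # Imported lazily so this module loads even without the Azure ML SDK installed.
    from azureml.core.authentication import InteractiveLoginAuthentication

    auth_header = InteractiveLoginAuthentication().get_authentication_header()
    request = build_pipeline_request(rest_endpoint, auth_header, experiment_name)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8")).get("Id")
```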

Screen Recording

View screencast for this project

Standout Suggestions

To improve performance further, the class imbalance in the dataset could be addressed. AutoML with deep learning enabled may also bring a noteworthy improvement.
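One common way to address class imbalance is to weight each class inversely to its frequency. The sketch below computes such "balanced" weights in plain Python; the label values and counts are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical labels: 8 negative and 2 positive examples.
example_labels = ["no"] * 8 + ["yes"] * 2
print(balanced_class_weights(example_labels))  # {'no': 0.625, 'yes': 2.5}
```

The minority class receives the larger weight, so errors on it count more during training when the weights are passed to a learner that supports them.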
