ML.NET Model Builder Guide

Introduction

Model Builder is a simple UI tool that runs locally and lets developers build, train and ship custom machine learning models in their applications. Try the Model Builder preview now!

Developers with no ML expertise can use this simple visual interface to connect to their data stored in files, SQL Server and more for training the model.

Model Builder leverages best in class automated machine learning (AutoML) to evaluate different models. It produces the best model for your scenario without any tuning required from the developer.

At the end, developers can generate code for training and consuming this model in their applications.

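This readme covers the following topics for Model Builder: installation, scenarios, data, training, evaluation, improving your model, limitations and reporting issues.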

Installation

Model Builder can be installed on Visual Studio 2017 or Visual Studio 2019. You can install it by clicking "Download" on the Visual Studio Marketplace.

Pre-requisites

Visual Studio 2017 15.9.12 or later or Visual Studio 2019
All SKUs of Visual Studio
.NET Core 2.1 SDK

Scenario

Which Machine Learning scenario is right for me?

Model Builder allows you to solve many real life scenarios by supporting a wide variety of machine learning tasks.

If you are trying to predict a value, e.g. the price of a car or how many units of a product will be sold, you are dealing with a regression ML task.

If you are trying to classify data into two or more categories, you are dealing with a classification ML task, e.g.

Deciding whether a sentiment is positive or negative is an example of binary classification
Deciding whether a sentiment is positive, negative or neutral (more than two categories) is an example of a multi-classification task

Model Builder currently comes with two scenario templates, one for regression (Price Prediction) and one for binary-classification (Sentiment Analysis).

The custom scenario template lets you pick the ML task during the train phase, so you can build your own ML scenario.

Model Builder currently supports Regression, Binary-Classification and Multi-Classification tasks.
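As a rough illustration of how the three supported tasks differ, the Python sketch below guesses a task from the values in the column to predict. This is only a heuristic written for this guide: the function name `suggest_ml_task` is hypothetical, it is not how Model Builder picks a task, and it will misjudge cases such as numeric category ids.

```python
def suggest_ml_task(label_values):
    """Guess an ML task from raw label strings (illustrative heuristic only)."""
    distinct = set(label_values)
    if not distinct:
        raise ValueError("the label column is empty")
    # Exactly two distinct values (e.g. 0/1 or positive/negative): binary.
    if len(distinct) == 2:
        return "binary-classification"
    # If every value parses as a number, assume a value is being predicted.
    try:
        for value in distinct:
            float(value)
        return "regression"
    except ValueError:
        # Non-numeric labels that are not exactly two categories.
        return "multi-classification"


if __name__ == "__main__":
    print(suggest_ml_task(["12.5", "7.0", "30.25"]))
    print(suggest_ml_task(["positive", "negative", "negative"]))
    print(suggest_ml_task(["positive", "negative", "neutral"]))
```

In Model Builder itself you choose the task through the scenario template, so treat the sketch as a way to reason about your label column rather than as a replacement for that choice.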

Pick the scenario template which makes the most sense for you!

Data

How do I get sample datasets and learn more?

The table below provides a list of datasets across ML scenarios and tasks.

You can use these datasets to get started with Model Builder.

Scenario	Scenario Description	ML Task	Dataset	Column to Predict (Label)
Price Prediction	Predict the price for a particular item	Regression	Link	fare_amount
Sales Forecast	Forecast the sales for items this month	Regression	Link	ProductSales
Sentiment Analysis	Determine whether the sentiment of customer reviews is positive or negative	Binary-Classification	Link	Label
Spam Detection	Determine whether a particular email is spam or not	Binary-Classification	Link	Label
Fraud Detection	Determine whether a particular transaction is fraudulent or not	Binary-Classification	Link	Label
Issue Classification	Tag filed issues or tickets with particular area tags	Multi-Classification	Link	Area

By default, Model Builder will take all but the column to predict (Label) as input for training the model.

The Label is what you are predicting. The rest of the columns are Features, which are attributes that help predict the Label. Limiting the Features to the most relevant attributes will give you the best performance for your models.
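To make the Label and Features split concrete, here is a minimal Python sketch that reads a small tab-separated dataset with a header row and treats every column except the label as a feature. The function name and the `passenger_count` and `trip_distance` columns are made up for illustration; only `fare_amount` comes from the table above, and Model Builder's own loading code is not shown here.

```python
import csv
import io


def split_label_and_features(text, label_column, delimiter="\t"):
    """Return (feature_names, feature_rows, labels) from delimited text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    if label_column not in header:
        raise KeyError(f"column {label_column!r} not found in header {header}")
    # Every column except the label is used as a feature.
    feature_names = [name for name in header if name != label_column]
    feature_rows, labels = [], []
    for row in reader:
        labels.append(row[label_column])
        feature_rows.append({name: row[name] for name in feature_names})
    return feature_names, feature_rows, labels


if __name__ == "__main__":
    # Hypothetical sample data; the header row is required.
    sample = (
        "passenger_count\ttrip_distance\tfare_amount\n"
        "1\t2.5\t9.0\n"
        "2\t10.1\t31.5\n"
    )
    names, rows, labels = split_label_and_features(sample, "fare_amount")
    print(names)
    print(labels)
```

Dropping a column from `feature_names` before training is the equivalent of limiting the Features to the most relevant attributes.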

We currently support .tsv files, .csv files and SQL Server as data sources for Model Builder. The files should have a header row. For this preview, Model Builder does not support *.txt files or ';' separated or ',' separated files.

Train

How long should I train for?

Model Builder uses AutoML to explore multiple models to find you the best performing model.

In general, longer training periods allow AutoML to explore more models with multiple trainers and settings.

The table below summarizes the average time taken to get good performance for the datasets we tested with.

Dataset Size	Dataset Type	Avg. Time to Train
0 - 10 MB	Numeric and Text	10 sec
10 - 100 MB	Numeric and Text	10 min
100 - 500 MB	Numeric and Text	30 min
500 MB - 1 GB	Numeric and Text	60 min
1 GB+	Numeric and Text	3+ hours

The exact time to train is a function of a few parameters like:

The number of features or columns being used to predict
The type of columns i.e. text vs. numeric
The type of machine learning task (e.g. regression vs. classification)

We have tested Model Builder with datasets as large as 1 TB, but building a high-quality model for a dataset of that size can take up to four days.
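If you want to pick a starting point programmatically, the table above can be turned into a simple lookup. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, 1 GB is taken as 1024 MB, and sizes that fall exactly on a boundary are assigned to the smaller bucket because the table does not say which side they belong to.

```python
# Upper bound of each bucket in MB, paired with the average time from the table.
# Assumption: 1 GB = 1024 MB; boundary values go to the smaller bucket.
TRAIN_TIME_BUCKETS = [
    (10, "10 sec"),
    (100, "10 min"),
    (500, "30 min"),
    (1024, "60 min"),
]


def suggested_train_time(size_mb):
    """Return the average training time listed for a dataset of size_mb."""
    if size_mb < 0:
        raise ValueError("dataset size cannot be negative")
    for upper_bound_mb, train_time in TRAIN_TIME_BUCKETS:
        if size_mb <= upper_bound_mb:
            return train_time
    return "3+ hours"


if __name__ == "__main__":
    for size in (5, 50, 300, 800, 2048):
        print(size, "MB ->", suggested_train_time(size))
```

These are averages for the datasets that were tested, so the result is a starting value rather than a guarantee; the number and type of columns and the ML task still affect the actual time.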

Evaluate

How do I understand my model performance?

By default, Model Builder splits the data you provide into train data and test data. The train data (80% split) is used to train your model and the test data (20% split) is used to evaluate your model.
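The idea of an 80/20 split can be sketched in a few lines of Python. This is not Model Builder's implementation: the function name is hypothetical, and shuffling with a fixed seed before splitting is an assumption made here so that the example is repeatable.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and return (train_rows, test_rows)."""
    shuffled = list(rows)
    # Assumption: rows are shuffled first so the test set is a random sample.
    random.Random(seed).shuffle(shuffled)
    test_count = round(len(shuffled) * test_fraction)
    return shuffled[test_count:], shuffled[:test_count]


if __name__ == "__main__":
    train_rows, test_rows = train_test_split(range(100))
    print(len(train_rows), "train rows,", len(test_rows), "test rows")
```

The key property is that the test rows are never seen during training, so metrics computed on them indicate how the model behaves on new data.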

When using Model Builder, each scenario maps to a machine learning task, and each ML task has its own set of evaluation metrics. The sections below describe the metrics for each ML task.

Regression (e.g. Price Prediction)

The default metric for regression problems is RSquared. RSquared typically ranges between 0 and 1 (it can fall below 0 when a model performs worse than always predicting the mean), and 1 is the best possible value. In other words, the closer RSquared is to 1, the better your model is performing.

Other reported metrics, such as absolute-loss, squared-loss and RMS loss, can be used to understand how your model is performing and to compare it against other regression models.
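For readers who want to see how these regression metrics relate to each other, the following Python sketch computes them with their standard textbook definitions (mean absolute error, mean squared error, its square root, and the coefficient of determination). It assumes these match the reported metrics; the function name and sample numbers are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression metrics from two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    absolute_loss = sum(abs(e) for e in errors) / n
    squared_loss = sum(e * e for e in errors) / n
    rms_loss = math.sqrt(squared_loss)
    mean_actual = sum(actual) / n
    total_variation = sum((a - mean_actual) ** 2 for a in actual)
    if total_variation == 0:
        raise ValueError("RSquared is undefined when all actual values are equal")
    # 1 minus the ratio of unexplained variation to total variation.
    r_squared = 1 - (squared_loss * n) / total_variation
    return {
        "absolute_loss": absolute_loss,
        "squared_loss": squared_loss,
        "rms_loss": rms_loss,
        "r_squared": r_squared,
    }


if __name__ == "__main__":
    metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

The loss metrics are in the units of the label (or its square), so they are most useful for comparing models trained on the same dataset, while RSquared is unitless.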

Binary Classification (e.g. Sentiment Analysis)

The default metric for binary classification problems is accuracy. Accuracy is the proportion of correct predictions your model makes over the test dataset. The closer to 100% or 1.0, the better.

Other reported metrics include AUC (area under the curve), which measures the trade-off between the true positive rate and the false positive rate. AUC should be greater than 0.50 for a model to be acceptable.

Additional metrics like F1 score, which combines precision and recall into a single number, can be used to judge the balance between the two.
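The binary classification metrics above can be illustrated with a short Python sketch using their standard definitions. The function names and sample data are hypothetical, labels are assumed to be encoded as 1 (positive) and 0 (negative), and AUC is computed as the fraction of positive/negative pairs in which the positive example receives the higher score, which is a simple but slow way to obtain it.

```python
def binary_metrics(actual, predicted):
    """Accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(actual, scores):
    """Probability that a random positive outscores a random negative."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    actual = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
    predicted = [1 if s >= 0.5 else 0 for s in scores]
    print(binary_metrics(actual, predicted))
    print("auc:", auc(actual, scores))
```

An AUC of 0.50 corresponds to scoring positives and negatives no better than chance, which is why values above 0.50 are the minimum bar.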

Multi-Class Classification (e.g. Issue Classification)

The default metric for multi-class classification is Micro-accuracy. The closer the Micro-accuracy is to 100% or 1.0, the better.

Another important metric for multi-class classification is Macro-accuracy. As with Micro-accuracy, the closer it is to 1.0, the better. A good way to think about these two is:

Micro-accuracy -- how often does an incoming ticket get classified to the right team?
Macro-accuracy -- for an average team, how often is an incoming ticket correct for their team?
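The difference between the two metrics shows up when categories are imbalanced. The Python sketch below assumes the usual definitions: Micro-accuracy is the fraction of all examples classified correctly, and Macro-accuracy is the per-category accuracy averaged over categories. The function name and the team names in the sample are hypothetical.

```python
from collections import defaultdict


def micro_macro_accuracy(actual, predicted):
    """Return (micro_accuracy, macro_accuracy) for multi-class labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    totals = defaultdict(int)   # examples per actual category
    correct = defaultdict(int)  # correctly classified examples per category
    for a, p in zip(actual, predicted):
        totals[a] += 1
        if a == p:
            correct[a] += 1
    # Micro: every example counts equally.
    micro = sum(correct.values()) / len(actual)
    # Macro: every category counts equally, however many examples it has.
    macro = sum(correct[c] / totals[c] for c in totals) / len(totals)
    return micro, macro


if __name__ == "__main__":
    # Hypothetical tickets: 6 for "ui", 3 for "db", 1 for "docs".
    actual = ["ui"] * 6 + ["db"] * 3 + ["docs"]
    predicted = ["ui"] * 5 + ["db"] + ["db"] * 2 + ["ui"] + ["ui"]
    micro, macro = micro_macro_accuracy(actual, predicted)
    print(f"micro-accuracy: {micro:.2f}, macro-accuracy: {macro:.2f}")
```

In this sample the large "ui" team is classified well, which lifts Micro-accuracy, while the small "docs" team is always misclassified, which pulls Macro-accuracy down.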

For more details on understanding model evaluation metrics please refer to this guide which provides details on each of these metrics.

Improve

Train for a longer time, add more data or learn more on the web

There are a few things you can try to improve your model performance.

Train for a longer period of time during the train phase. This allows AutoML to try out more trainers and find you a better model for your ML scenario.
Sometimes the amount or quality of data is not sufficient to train a high-quality ML model. Try adding more data for Model Builder to operate on. For classification tasks, make sure that there is a good amount of data available per classification category.
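One quick way to check whether each classification category has enough data is to count the examples per category before training. The Python sketch below is a hypothetical helper: the function name and the `min_examples` threshold of 50 are arbitrary choices for illustration, not a Model Builder recommendation.

```python
from collections import Counter


def underrepresented_categories(labels, min_examples=50):
    """Return {category: count} for categories with fewer than min_examples."""
    counts = Counter(labels)
    return {category: count
            for category, count in counts.items()
            if count < min_examples}


if __name__ == "__main__":
    # Hypothetical label column with one rare category.
    labels = ["positive"] * 120 + ["negative"] * 95 + ["neutral"] * 7
    print(underrepresented_categories(labels))
```

Categories that show up in the result are good candidates for collecting more examples before training again.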

Limitations

This is the first preview of Model Builder, and it has the following limitations. We are actively working on improving the experience in future releases.

We currently support .tsv files, .csv files and SQL Server as data sources for Model Builder. The files should have a header row. For this preview, Model Builder does not support *.txt files or ';' separated or ',' separated files.
Visual Studio 2017 15.9.12 or greater
Supported Visual Studio SKUs - Enterprise, Professional, and Community
SQL Server has a limit of 100K rows for training
Microsoft SQL Server Data Tools for Visual Studio 2017 is not supported.

Issues

Please file any issues you observe with the Model Builder extension at https://aka.ms/modelbuilderissues
