Training a model with R on ML Compute and Databricks #77
Merged
63 commits
cd04bc4
Play with R
4abce3c
playing with r
45a30d3
play with r
348621f
Play with R
938097d
playing with r
43c397b
playing with r
c2f6118
playing with R
20a6a46
playing with r
cbfc882
playing with R
ba21e29
playing with R
f1f7739
playing with r
4395651
playing with docker
762d9fd
Playing with R
2c406fc
Playing with R
ed84811
run R on Databricks
05ab714
Playing with R
3ef020d
Play with R on Databricks
727c5cf
Playing with R
1abbef8
playing with R
57d6f99
playing with R
fd6ebe7
playing with r
fbabebf
play with R
a01de6c
play with R
ddf0399
play with R
61d455e
play with R
4350785
play with R
df93cd0
play with R
43b8f6c
play with R
2f3f90e
playing with R
5c13088
play with R
009d08e
playing with R
6daf7ff
playing with R
ce83510
playing with R
7c9f9ae
play with R
7614b0b
playing with R
59864b1
play with R
182ac1c
playing with R
77795f7
Playing with R
3683948
Playing with R
8a7d0f9
Linting
8c719e9
Playing with R
7e34be2
Playimng with R
f7fe561
linting
4abbad0
linting
686d167
Playing with R
5ef7a01
Playing with R
0283fb2
Playing with R
cd6f9e5
Playing with R
b05b0cd
Playing with R
7c488c6
Playing with R
e8875b1
Playing with R
710f5cf
Playing with R
9e9fcc8
Playing with R
7593dc5
public image
4828147
remove garbage
d0262db
azureml is needed for data test
4e9cf48
doc update
5d7d277
Doc update
e12f957
doc update
5c2a758
doc update
586e468
Doc update
e56499c
Review fixes
a55b744
Fix typo
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| print(R.version.string) | ||
|
|
||
| # COMMAND ---------- | ||
|
|
||
| path="weight_data.csv" | ||
| print(paste("Reading file from",path)) | ||
|
|
||
| routes<-read.csv(path, header=TRUE) | ||
|
|
||
| # The predictor vector (height). | ||
| x <- routes$height | ||
| # The response vector (weight). | ||
| y <- routes$weight | ||
| # Apply the lm() function. | ||
| model <- lm(y~x) | ||
|
|
||
| # COMMAND ---------- | ||
|
|
||
| routes | ||
|
|
||
| # COMMAND ---------- | ||
|
|
||
| # Make Predictions | ||
| df_test_heights <- data.frame(x = as.numeric(c(115,20))) | ||
| result <- predict(model,df_test_heights) | ||
| print(result) | ||
|
|
||
| # COMMAND ---------- | ||
|
|
||
| # Save the model to model.rds in the working directory | ||
| model_path="model.rds" | ||
| saveRDS(model, model_path) | ||
|
|
||
| # COMMAND ---------- | ||
|
|
||
| # View model details | ||
| print(model) | ||
|
|
||
| # COMMAND ---------- | ||
|
|
||
| print('Completed') |
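For readers more at home in Python, the fit that `lm(y ~ x)` performs in the R script above is an ordinary least-squares line. The sketch below reproduces that calculation with the standard library only. `fit_line` and `predict` are illustrative names and are not part of this pull request; the sample values are the first five rows of the height/weight CSV added in this change.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, xs):
    """Evaluate the fitted line at each x, like predict(model, newdata) in R."""
    return [intercept + slope * x for x in xs]


if __name__ == "__main__":
    # Illustrative subset: first five rows of the CSV in this pull request.
    heights = [79, 63, 75, 75, 70]
    weights = [174, 250, 223, 130, 120]
    a, b = fit_line(heights, weights)
    # Same test heights as the R script uses.
    print(predict(a, b, [115, 20]))
```

Comparing the intercept and slope from this sketch against `print(model)` in R is one cheap way to check that the training data was read as intended.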
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| import subprocess | ||
|
|
||
| subprocess.check_call("Rscript r_train.r && ls -ltr model.rds", shell=True) |
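Shelling out to `Rscript` from a Python step is only safe if a failing R script also fails the step. A plain `os.system` call returns the exit status instead of raising, so the status has to be checked somewhere. The following is a minimal sketch of one way to do that; `run_checked` is a hypothetical helper, not code from this pull request, and the example command uses the Python interpreter as a stand-in so that it runs without R installed.

```python
import subprocess
import sys


def run_checked(command, cwd=None):
    """Run a command (list of arguments) and return its stdout.

    Raises RuntimeError with the captured stderr if the exit status is non-zero.
    """
    completed = subprocess.run(
        command,
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    if completed.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {completed.returncode}: "
            f"{completed.stderr.strip()}"
        )
    return completed.stdout


if __name__ == "__main__":
    # Stand-in for ["Rscript", "r_train.r"].
    print(run_checked([sys.executable, "-c", "print('trained')"]))
```

Passing the command as a list avoids shell parsing entirely, at the cost of running the `ls -ltr model.rds` check as a separate call.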
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| import argparse | ||
| import subprocess | ||
|
|
||
| parser = argparse.ArgumentParser("train") | ||
| parser.add_argument( | ||
|     "--AZUREML_SCRIPT_DIRECTORY_NAME", | ||
|     type=str, | ||
|     help="folder", | ||
| ) | ||
|
|
||
| args, unknown = parser.parse_known_args() | ||
| folder = args.AZUREML_SCRIPT_DIRECTORY_NAME | ||
|
|
||
| subprocess.check_call("Rscript r_train.r && ls -ltr model.rds", | ||
|                       shell=True, cwd="/dbfs/" + folder) |
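The Databricks wrapper above relies on two things: `parse_known_args` tolerating extra arguments it does not declare, and the script folder being reachable under `/dbfs/`. The sketch below isolates that logic so it can be exercised without a cluster. `resolve_dbfs_dir` is a hypothetical name, and the validation of the folder value is an added assumption rather than behaviour of the original script.

```python
import argparse
import posixpath


def resolve_dbfs_dir(argv):
    """Return the /dbfs path for the folder named on the command line."""
    parser = argparse.ArgumentParser("train")
    parser.add_argument(
        "--AZUREML_SCRIPT_DIRECTORY_NAME",
        type=str,
        required=True,
        help="folder",
    )
    # parse_known_args ignores arguments this parser does not declare.
    args, _unknown = parser.parse_known_args(argv)
    folder = args.AZUREML_SCRIPT_DIRECTORY_NAME.strip("/")
    # Assumption: reject empty names and parent-directory segments.
    if not folder or ".." in folder.split("/"):
        raise ValueError(f"unexpected folder name: {folder!r}")
    return posixpath.join("/dbfs", folder)


if __name__ == "__main__":
    print(resolve_dbfs_dir(
        ["--AZUREML_SCRIPT_DIRECTORY_NAME", "example-folder", "--extra", "1"]
    ))
```

The returned path can then be passed as the working directory of the `Rscript` call instead of being concatenated into a shell string.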
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| height,weight | ||
| 79,174 | ||
| 63,250 | ||
| 75,223 | ||
| 75,130 | ||
| 70,120 | ||
| 76,239 | ||
| 63,129 | ||
| 64,185 | ||
| 59,246 | ||
| 80,241 | ||
| 79,217 | ||
| 65,212 | ||
| 74,242 | ||
| 71,223 | ||
| 61,167 | ||
| 78,148 | ||
| 75,229 | ||
| 75,116 | ||
| 75,182 | ||
| 72,237 | ||
| 72,160 | ||
| 79,169 | ||
| 67,219 | ||
| 61,202 | ||
| 65,168 | ||
| 79,181 | ||
| 81,214 | ||
| 78,216 | ||
| 59,245 |
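Because the training step reads this CSV with `read.csv(path, header=TRUE)`, a malformed header or a non-numeric cell would only surface at training time. As a rough illustration, the sketch below checks the file shape up front using the standard library. `load_height_weight` is a hypothetical helper, and the expected column names are taken from the header row above.

```python
import csv
import io


def load_height_weight(text):
    """Parse CSV text with a height,weight header into a list of float pairs."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != ["height", "weight"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            rows.append((float(row["height"]), float(row["weight"])))
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: not numeric: {row}") from exc
    return rows


if __name__ == "__main__":
    sample = "height,weight\n79,174\n63,250\n75,223\n"
    print(load_height_weight(sample))
```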
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,6 @@ | ||
| pytest==4.3.0 | ||
| requests>=2.22 | ||
| azureml>=0.2 | ||
| azureml-sdk>=1.0 | ||
| python-dotenv>=0.10.3 | ||
| flake8 | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| from azureml.pipeline.steps import PythonScriptStep | ||
| from azureml.pipeline.core import Pipeline # , PipelineData | ||
| from azureml.core.runconfig import RunConfiguration, CondaDependencies | ||
| # from azureml.core import Datastore | ||
| import os | ||
| import sys | ||
| from dotenv import load_dotenv | ||
| sys.path.append(os.path.abspath("./ml_service/util")) # NOQA: E402 | ||
| from workspace import get_workspace | ||
| from attach_compute import get_compute | ||
|
|
||
|
|
||
| def main(): | ||
|     load_dotenv() | ||
|     workspace_name = os.environ.get("BASE_NAME")+"-AML-WS" | ||
|     resource_group = os.environ.get("BASE_NAME")+"-AML-RG" | ||
|     subscription_id = os.environ.get("SUBSCRIPTION_ID") | ||
|     tenant_id = os.environ.get("TENANT_ID") | ||
|     app_id = os.environ.get("SP_APP_ID") | ||
|     app_secret = os.environ.get("SP_APP_SECRET") | ||
|     vm_size = os.environ.get("AML_COMPUTE_CLUSTER_CPU_SKU") | ||
|     compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME") | ||
|     build_id = os.environ.get("BUILD_BUILDID") | ||
|     pipeline_name = os.environ.get("TRAINING_PIPELINE_NAME") | ||
|
|
||
|     # Get Azure machine learning workspace | ||
|     aml_workspace = get_workspace( | ||
|         workspace_name, | ||
|         resource_group, | ||
|         subscription_id, | ||
|         tenant_id, | ||
|         app_id, | ||
|         app_secret) | ||
|     print(aml_workspace) | ||
|
|
||
|     # Get Azure machine learning cluster | ||
|     aml_compute = get_compute( | ||
|         aml_workspace, | ||
|         compute_name, | ||
|         vm_size) | ||
|     if aml_compute is not None: | ||
|         print(aml_compute) | ||
|
|
||
|     run_config = RunConfiguration(conda_dependencies=CondaDependencies.create( | ||
|         conda_packages=['numpy', 'pandas', | ||
|                         'scikit-learn', 'tensorflow', 'keras'], | ||
|         pip_packages=['azure', 'azureml-core', | ||
|                       'azure-storage', | ||
|                       'azure-storage-blob']) | ||
|     ) | ||
|     run_config.environment.docker.enabled = True | ||
|     run_config.environment.docker.base_image = "mcr.microsoft.com/mlops/python" | ||
|
|
||
|     train_step = PythonScriptStep( | ||
|         name="Train Model", | ||
|         script_name="train_with_r.py", | ||
|         compute_target=aml_compute, | ||
|         source_directory="code/training/R", | ||
|         runconfig=run_config, | ||
|         allow_reuse=False, | ||
|     ) | ||
|     print("Step Train created") | ||
|
|
||
|     steps = [train_step] | ||
|
|
||
|     train_pipeline = Pipeline(workspace=aml_workspace, steps=steps) | ||
|     train_pipeline.validate() | ||
|     published_pipeline = train_pipeline.publish( | ||
|         name=pipeline_name + "_with_R", | ||
|         description="Model training/retraining pipeline", | ||
|         version=build_id | ||
|     ) | ||
|     print(f'Published pipeline: {published_pipeline.name}') | ||
|     print(f'for build {published_pipeline.version}') | ||
|
|
||
|
|
||
| if __name__ == '__main__': | ||
|     main() |
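The pipeline script above builds names such as `os.environ.get("BASE_NAME")+"-AML-WS"`, which raises a `TypeError` when the variable is unset because `get` returns `None`. A small guard can turn that into a clearer message. The sketch below is one possible shape for it; `require_env` is a hypothetical helper and is not part of this pull request.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for the given variables, or raise listing the missing ones."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty variables the same way.
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


if __name__ == "__main__":
    # A plain dict stands in for os.environ so the sketch runs anywhere.
    fake_env = {"BASE_NAME": "demo", "SUBSCRIPTION_ID": "0000"}
    settings = require_env(["BASE_NAME", "SUBSCRIPTION_ID"], fake_env)
    print(settings["BASE_NAME"] + "-AML-WS")
```

Calling such a helper right after `load_dotenv()` would report every missing variable at once instead of failing on the first concatenation.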
70 changes: 70 additions & 0 deletions
ml_service/pipelines/build_train_pipeline_with_r_on_dbricks.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| from azureml.pipeline.core import Pipeline | ||
| import os | ||
| import sys | ||
| from dotenv import load_dotenv | ||
| sys.path.append(os.path.abspath("./ml_service/util")) # NOQA: E402 | ||
| from workspace import get_workspace | ||
| from attach_compute import get_compute | ||
| from azureml.pipeline.steps import DatabricksStep | ||
|
|
||
|
|
||
| def main(): | ||
|     load_dotenv() | ||
|     workspace_name = os.environ.get("BASE_NAME")+"-AML-WS" | ||
|     resource_group = os.environ.get("BASE_NAME")+"-AML-RG" | ||
|     subscription_id = os.environ.get("SUBSCRIPTION_ID") | ||
|     tenant_id = os.environ.get("TENANT_ID") | ||
|     app_id = os.environ.get("SP_APP_ID") | ||
|     app_secret = os.environ.get("SP_APP_SECRET") | ||
|     vm_size = os.environ.get("AML_COMPUTE_CLUSTER_CPU_SKU") | ||
|     compute_name = os.environ.get("DATABRICKS_COMPUTE_NAME") | ||
|     db_cluster_id = os.environ.get("DB_CLUSTER_ID") | ||
|     build_id = os.environ.get("BUILD_BUILDID") | ||
|     pipeline_name = os.environ.get("TRAINING_PIPELINE_NAME") | ||
|
|
||
|     # Get Azure machine learning workspace | ||
|     aml_workspace = get_workspace( | ||
|         workspace_name, | ||
|         resource_group, | ||
|         subscription_id, | ||
|         tenant_id, | ||
|         app_id, | ||
|         app_secret) | ||
|     print(aml_workspace) | ||
|
|
||
|     # Get Azure machine learning cluster | ||
|     aml_compute = get_compute( | ||
|         aml_workspace, | ||
|         compute_name, | ||
|         vm_size) | ||
|     if aml_compute is not None: | ||
|         print(aml_compute) | ||
|
|
||
|     train_step = DatabricksStep( | ||
|         name="DBPythonInLocalMachine", | ||
|         num_workers=1, | ||
|         python_script_name="train_with_r_on_databricks.py", | ||
|         source_directory="code/training/R", | ||
|         run_name='DB_Python_R_demo', | ||
|         existing_cluster_id=db_cluster_id, | ||
|         compute_target=aml_compute, | ||
|         allow_reuse=False | ||
|     ) | ||
|
|
||
|     print("Step Train created") | ||
|
|
||
|     steps = [train_step] | ||
|
|
||
|     train_pipeline = Pipeline(workspace=aml_workspace, steps=steps) | ||
|     train_pipeline.validate() | ||
|     published_pipeline = train_pipeline.publish( | ||
|         name=pipeline_name + "_with_R_on_DB", | ||
|         description="Model training/retraining pipeline", | ||
|         version=build_id | ||
|     ) | ||
|     print(f'Published pipeline: {published_pipeline.name}') | ||
|     print(f'for build {published_pipeline.version}') | ||
|
|
||
|
|
||
| if __name__ == '__main__': | ||
|     main() |