@eedorenko commented Mar 2, 2020

  1. The pipeline now accepts the following parameters:
  • caller_run_id - identifies who called the pipeline, for tracking purposes (e.g. the ADF pipeline run id)
  • dataset_version - makes it possible to rerun the pipeline with a specific dataset version; defaults to "latest"
  • data_file_path - if provided, the pipeline registers a new version of the dataset pointing to data_file_path
  2. It is now possible to track which dataset version (pointing to a specific data file) was used to train a model
  3. It is now possible to track which models were trained with a given dataset version
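
For reviewers, the intended behaviour of the three parameters can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this PR: `DatasetRegistry`, `run_pipeline`, and `models_for_version` are hypothetical names, and the real pipeline would use the workspace's dataset registration and model tagging instead of Python lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRegistry:
    """Hypothetical in-memory stand-in for a versioned dataset store."""
    # versions[i] holds the data file path of dataset version i + 1
    versions: List[str] = field(default_factory=list)

    def register(self, data_file_path: str) -> int:
        """Register a new dataset version pointing to data_file_path."""
        self.versions.append(data_file_path)
        return len(self.versions)

    def resolve(self, dataset_version: str = "latest") -> int:
        """Turn "latest" or an explicit version string into a version number."""
        if not self.versions:
            raise LookupError("no dataset versions registered")
        if dataset_version == "latest":
            return len(self.versions)
        version = int(dataset_version)
        if not 1 <= version <= len(self.versions):
            raise LookupError(f"unknown dataset version {version}")
        return version


def run_pipeline(
    registry: DatasetRegistry,
    models: List[Dict[str, object]],
    caller_run_id: str,
    dataset_version: str = "latest",
    data_file_path: Optional[str] = None,
) -> Dict[str, object]:
    """Pick the dataset version and record it alongside the trained model."""
    if data_file_path:
        # A data file was supplied: register it as a new dataset version.
        version = registry.register(data_file_path)
    else:
        # Otherwise rerun against the requested (or latest) version.
        version = registry.resolve(dataset_version)
    model = {
        "caller_run_id": caller_run_id,  # who triggered the run, e.g. an ADF run id
        "dataset_version": version,
        "data_file": registry.versions[version - 1],
    }
    models.append(model)  # training itself is omitted in this sketch
    return model


def models_for_version(
    models: List[Dict[str, object]], version: int
) -> List[Dict[str, object]]:
    """Reverse lookup: all models trained with a given dataset version."""
    return [m for m in models if m["dataset_version"] == version]
```

Recording the resolved version number on the model, and not the literal string "latest", is what keeps both lookups stable after newer dataset versions are registered.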

@eedorenko requested review from dtzar and sudivate March 2, 2020 17:46
@eedorenko merged commit 466800e into master Mar 2, 2020
@eedorenko deleted the eedorenko/adf-dataset-version branch March 2, 2020 22:07