operatordocumentation.md

sort	3
title	Operator Documentation

Dashboard

The Dashboard displays the current state of the system by showing a number of status parameters related to wagons, trains and grid jobs. Additionally, the user can see the summary of the last week: average completion time, number of finished trains and wagon tests.
Below that, the state of the grid jobs during the previous week is shown for every site.

By clicking the number of wagons waiting to be included in a train, the user can directly open the Train Submission. Similarly, a link to the Train Runs is available by clicking the number of trains to be submitted to the grid, the number of running tests, or the number of finished trains.

Train Submission

For a user, the Train Submission page displays a read-only view of the datasets which have enabled wagons.
For a train operator, the Train Submission page displays only datasets which have enabled wagons, and allows composing, submitting, modifying and killing a train.

Train Composition

Trains are composed per dataset. Only wagons with a test status of success 🌟 or warning ❗️ can be composed in a train.
By default, only wagons that were enabled at most one week ago are shown. To display all enabled wagons, click on off in the Enabled column.
To compose a train, click ☑️ in the Compose column. The package tag is chosen automatically. Other wagons that can be included in the train run are marked with 🟢, and incompatible ones with 🔴. All compatible wagons can be selected at once by clicking on ✅ Select all compatible wagons, or one by one.
☑️ automatic composition: The train composition schedule is defined in the dataset settings. If the dataset has a defined schedule, trains are composed automatically at the specified time, provided the tests have finished without a warning.

If a wagon has Derived data tables activated, it is marked with the icon 🗂️ in the Test status column.
☑️ slow train: If enabled, the express train features are disabled. This means that up to 2% more jobs may finish, but the train run may take several days longer.
☑️ derived data: If enabled, this train produces derived data to be used for further analysis. The results will not be merged and can be used as input for future train runs.
☑️ automatic submission: If enabled, the train is submitted automatically after the test is done and succeeds 🌟.
Target: Sets the facility/cores where the train will be run.
Finally, after defining the configuration, click on Compose 🚂 to compose a train. After composing a train run, the wagons selected cannot be selected for a different train run unless the current train run is decomposed. After the train run is submitted, the wagons will be disabled.
If a user changes a configuration between train composition and submission, the new configuration is not taken into account. The train runs with the wagons and dataset configuration corresponding to the time at which it was created.
The train will be automatically tested, and its progress can be followed in the Train Runs table, or in the Train Runs page by clicking on the TRAIN_ID link.
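
The eligibility rule stated at the start of this section (test status success or warning) and the default one-week display window can be illustrated with a minimal Python sketch. The `Wagon` class, the status strings and the function names are hypothetical and chosen for illustration only; they are not taken from the Hyperloop code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical status labels; the real system may use different identifiers.
COMPOSABLE_STATUSES = {"success", "warning"}
# Default display window described in the text: enabled at most one week ago.
DEFAULT_ENABLED_WINDOW = timedelta(weeks=1)


@dataclass
class Wagon:
    name: str
    test_status: str
    enabled_at: datetime


def composable(wagons):
    """Keep only wagons whose test finished with success or warning."""
    return [w for w in wagons if w.test_status in COMPOSABLE_STATUSES]


def shown_by_default(wagons, now):
    """Keep wagons enabled at most one week before `now` (the default view)."""
    return [w for w in wagons if now - w.enabled_at <= DEFAULT_ENABLED_WINDOW]
```

The two filters are kept separate because the one-week window only affects what is displayed, while the test status decides whether a wagon can be composed at all.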

Staged Submission

Linked datasets are subsets of a big dataset.
First, a train run needs to be Done on a smaller linked dataset before being run on a bigger dataset.
Any user who is part of the analysis can request a long train by going to the Request long train tab of the train result.
Choose the dataset, target and wagons you want to include and click the Compose and request approval button.
Approval from the PWG conveners of the participating analyses is required in order to submit a long train.

Train Runs

For a user, the Train Runs page displays a read-only view of all train runs available in the system.
For a train operator, the Train Runs page displays all trains available in the system, and allows submitting, modifying and killing a train.
To compare two trains, select them in the Compare column and click Compare. This will open a new tab displaying the differences between the two trains.

The train run detail can be accessed by clicking on the TRAIN_ID, or with the url https://alimonitor.cern.ch/hyperloop/train-run/TRAIN_ID.
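
As a small illustration of the URL pattern above, the following Python sketch builds the detail link for a given train run. The helper name `train_run_url` is hypothetical, and the sketch assumes that TRAIN_ID is simply appended to the path as shown in the text.

```python
# Base path taken from the URL pattern in the text.
BASE_URL = "https://alimonitor.cern.ch/hyperloop/train-run/"


def train_run_url(train_id):
    """Return the detail-page URL for a train run (hypothetical helper)."""
    return f"{BASE_URL}{train_id}"
```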

The actions allowed in a train run:
- Adding a comment.
- Decomposing a train run by clicking on Decompose ☠️. When decomposing a train run, the wagons in it will be allowed to be composed in a new train run.
- Submitting a train run. Only train runs whose test status is success 🌟 or warning ❗️ can be submitted. By default, the train run is submitted to the Grid - single core, but a different target can be chosen. To submit a train run, click on Submit 🚂.
- When a dataset has final merge activated, a Submit final merge button appears after all the masterjobs have finished successfully.
- Killing a train by clicking on Kill ☠️. This will stop all running jobs.
- Closing a train by clicking Close ❌. This will stop further activity, but will keep the jobs running.
- Cloning a train in the Clone tab. When cloning a train, the wagon and dataset configuration used will be the same as those of the original train. Other settings can be changed: package tag, target facility, slow train, derived data, automatic submission.

Train Run Result

The General tab displays the summary of the train's progress, direct links to dataset and participating wagon configuration, as well as direct links to the test output and the speedscope profiling of the task.

The Test results tab shows the performance metrics per device (reader, workflows, writer), along with the expected resources. You can use the interactive graphs (per device) to zoom into the area of interest (click and drag) or zoom out (double-click).

In the Test Graphs tab, you can plot the available metrics for the specific Train run. By hovering over the graph, the corresponding values are displayed in a dynamic window, stating the value for each participating wagon.
- The metric can be selected from the upper-left dropdown, and the graph will change accordingly.
- To plot the metric data per device, select the Per Device checkbox near the dropdown.
- To plot only the ten graphs with the highest average, click the Show top 10 largest checkbox.
- You can zoom into the graph by clicking and dragging the mouse along the area of interest. For zooming out, double-click on the graph.
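
The Show top 10 largest option described in the list above selects the graphs with the highest average. A minimal Python sketch of such a selection follows; the function name, the dictionary layout (label mapped to a list of samples) and the handling of empty series are assumptions made for illustration, not the page's actual implementation.

```python
from statistics import mean


def top_n_by_average(series, n=10):
    """Return the labels of the n series with the highest average value.

    `series` maps a graph label to a list of metric samples.
    Series without samples are skipped (an assumption of this sketch).
    """
    non_empty = {label: values for label, values in series.items() if values}
    ranked = sorted(non_empty, key=lambda label: mean(non_empty[label]), reverse=True)
    return ranked[:n]
```

For example, `top_n_by_average({"w1": [1, 2, 3], "w2": [10, 10], "w3": [5]}, n=2)` returns `["w2", "w3"]`.
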
In Submitted jobs, you can see the summary of the master jobs, along with links to the IO Statistics and Stack trace.

Click the IO Statistics button to be redirected to the site activity information.

Click the Stack trace button to be redirected to the stack trace information in MonALISA. Here you can see a summary of failures of your jobs.

This information is collected from all ERROR_V jobs once the masterjobs have finished. Some information is already available while the train is running, but make sure to check again when the train is in a final state. Common errors are grouped and counted. This allows you to investigate failures and debug them using the provided stack trace.
The Grid statistics tab presents a summary of the job performance and plots the Files/Job, CPU time/Job and Wall time/Job statistics.

If the train is run as a derived data production and there are activated tables, the Derived data tab will be shown. It displays the tables which are produced by the task and saved to the output.

Merged output displays the status of the jobs after the train has been submitted. The mergelists are defined in the dataset settings.

When the final merge is started manually by the operator, some of the runs may not be merged. You can copy the list of merged runs or the total list of runs by clicking on the (red) number.
Here you can also track the submission process, and debug issues that may have taken place.

You can use the Clone train tab to clone the train. The cloned train will have the same wagon timestamp as the original train, with the current dataset configuration. This means that if the users have changed the wagon configuration in the meantime, this is not taken into account (this is different from the LEGO trains).
Other settings can be modified: package tag, target facility, slow train option, derived data, automatic submission.

The Request long train tab allows users to request a long train after the train ran on a linked dataset. Linked datasets are subsets of a big dataset (set up in the Dataset settings). First, a train run needs to be Done on a smaller linked dataset before being run on a bigger dataset.
Any user who is part of the analysis can request a long train. Approval from the PWG conveners of the participating analyses is required in order to submit a long train. Train operators or admins can also approve a long train, but it is usually done by the PWG.

Once the long train is approved:
- If Automatic Submission is enabled and the train test finished without a memory warning and within the CPU time limit, the train is submitted automatically.
- Otherwise, the Submit button is enabled and the operator can submit the train.
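
The two cases above amount to a single condition. The following Python sketch restates it; the function name, parameter names and returned strings are hypothetical and only serve to illustrate the rule.

```python
def after_approval(automatic_submission, memory_warning, within_cpu_limit):
    """Describe what happens to an approved long train (illustrative only).

    The train is submitted automatically only when all three conditions
    from the text hold; otherwise the operator has to submit it.
    """
    if automatic_submission and not memory_warning and within_cpu_limit:
        return "submitted automatically"
    return "waiting for operator submission"
```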

Trains with issues

This page displays the list of train runs that are in state submitted but could not be merged due to specific problems:
- More than 30% of the jobs have errors
- The Analysis job has all jobs in a final state, but there is no merging job
- There is a merging job which is in a final state, but there is no final merge job
- There is a final merge job in final state, but the merging is not declared as done in the database
The operator must analyse these cases and decide whether to resubmit some of the jobs, launch the final merging submission where the errors are not significant, or kill the train when there are too many errors.
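
The four conditions listed above can be expressed as independent checks on the state of a train run. The Python sketch below is an illustration under stated assumptions: the `TrainRunState` fields and the `issues` function are hypothetical names, and the actual page may derive these conditions differently.

```python
from dataclasses import dataclass

# Threshold from the text: more than 30% of the jobs have errors.
ERROR_FRACTION_LIMIT = 0.30


@dataclass
class TrainRunState:
    total_jobs: int
    error_jobs: int
    analysis_final: bool        # all analysis jobs are in a final state
    has_merging_job: bool
    merging_final: bool         # the merging job is in a final state
    has_final_merge_job: bool
    final_merge_final: bool     # the final merge job is in a final state
    merge_declared_done: bool   # merging is marked as done in the database


def issues(run):
    """Return the list of problems that apply to a submitted train run."""
    found = []
    if run.total_jobs > 0 and run.error_jobs / run.total_jobs > ERROR_FRACTION_LIMIT:
        found.append("more than 30% of the jobs have errors")
    if run.analysis_final and not run.has_merging_job:
        found.append("analysis finished but there is no merging job")
    if run.has_merging_job and run.merging_final and not run.has_final_merge_job:
        found.append("merging finished but there is no final merge job")
    if run.has_final_merge_job and run.final_merge_final and not run.merge_declared_done:
        found.append("final merge finished but merging is not declared done")
    return found
```

A run for which `issues` returns an empty list would not appear on this page; any non-empty result corresponds to one or more of the cases the operator has to examine.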

Datasets

The Datasets page displays all the datasets available. The operator can add, remove, activate or deactivate a dataset.
The user can browse and click on the Dataset they want to add to their analysis.

On the Dataset view page, click on the button ✚ Add dataset to analysis. It will display a list of all the analyses you belong to. Select the Analysis you want to add the dataset to, and click on 💾 Save.
By clicking the 📝 button, the operator is able to modify the dataset in the Edit Dataset page.
The runlists will be received programmatically from the DPG.

Edit Dataset

Allows the operator to update the dataset properties. Firstly, the operator can update the name and description of the dataset, and activate or deactivate it by clicking the ❌ / ✅ button. In order to save the changes you made, click the Save all changes button.

In the Options box, you can add linked datasets to the current dataset, which will be used for the staged submission. Enabling Run final merging over all runs in this dataset will merge all the runs of all the productions during the final merging.
In the Analysis Facility Staging, the operator is able to stage or unstage the data to the available targets displayed in the dropdown.
- Choose a target from the dropdown list, and you will be notified of the amount of data required for the staging process. Confirm by clicking OK. The staging process will start once you click the Save all changes button at the top right of the page.
- You can hover over the staging percentage to get a detailed status of the staging progress. By clicking on the percentage, you will be led to a new tab where you can see a detailed overview of the transfer requests on MonALISA.
- To unstage the data from a specific target, click the Unstage button. The unstaging process will start once you click Save all changes.
In the Automatic Composition box, the operator is able to enable the automatic train composition. Choose the composition type, the maximum CPU time that can be consumed and the maximum number of trains that can be composed per week for an analysis.
- Scheduled: If within schedule, trains will be composed with the compatible wagons in the dataset, regardless of the occupation in target memory.
- Train full: The train will be composed only if the compatible wagons add up to at least 75% of the target memory.
- Scheduled and train full: The train will be composed if it is within schedule. If not in schedule, the train will only be composed if the compatible wagons occupy at least 75% of the target memory.
For all these cases, the trains will only be composed if the tests finished without a warning and if they do not store derived data.
Choose the days and times at which the trains should be composed.
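
The three composition types and the common preconditions described above can be summarised in a short Python sketch. The enum, function and parameter names are hypothetical. The sketch covers only the schedule, the 75% memory rule, the test warnings and the derived data condition; the maximum CPU time and the weekly limit on the number of trains are left out.

```python
from enum import Enum


class CompositionType(Enum):
    SCHEDULED = "scheduled"
    TRAIN_FULL = "train full"
    SCHEDULED_AND_TRAIN_FULL = "scheduled and train full"


# Threshold from the text: compatible wagons fill at least 75% of the target memory.
FULL_FRACTION = 0.75


def should_compose(kind, in_schedule, wagon_memory, target_memory,
                   tests_without_warning, stores_derived_data):
    """Decide whether a train would be composed automatically (illustrative)."""
    # Common preconditions for all composition types.
    if not tests_without_warning or stores_derived_data:
        return False
    train_full = target_memory > 0 and wagon_memory / target_memory >= FULL_FRACTION
    if kind is CompositionType.SCHEDULED:
        return in_schedule
    if kind is CompositionType.TRAIN_FULL:
        return train_full
    # Scheduled and train full: compose within schedule, or outside it when full.
    return in_schedule or train_full
```

Under these assumptions, a dataset set to Scheduled and train full composes a train outside its schedule only when the compatible wagons reach the 75% threshold.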

Deciding on data to be processed depends on the dataset type

For RUN 2 data, the operator can add or remove a RUN 2 conversion train run.
For RUN 3 data and MC, the operator can add or remove a production. In order to create a new production, click on the +Production button. After choosing the collision type, anchor and MC Tag, select the runlist defined by the DPG and click +Add. If no runlist is available, contact the DPG specialists to create one.
For derived data, you can add or remove a production. Create a production by selecting Data, choose the desired Period and select the required Derived train from the dropdown list.

Within the dataset production you can update the list of runs to be excluded.
The mergelist defines which runs are merged into one file at the end of the train run. The operator can add, update, activate or deactivate a mergelist in the dataset.

DPG Runlists

The DPG Runlists page is dedicated to the DPG experts and displays all the DPG runlists created for the datasets. The DPG expert can add, edit or remove a runlist.

Clicking on the 📝 button will lead to the edit view, where the DPG expert can change the list of runs.

DPG experts can create a new runlist by clicking the +Add runlist button. In order to create the list of runs, the correct data type, anchor, tag and production must be selected.
