The Monitoring module lets you inject user-defined metrics and monitor the process itself. It supports multiple backends, protocols and data formats.
- Installation
- Getting started
- Features and additional information
- Code snippets
- System monitoring and server-side backends installation and configuration
Click here if you don't have aliBuild installed
- Compile Monitoring and its dependencies via aliBuild
aliBuild init Monitoring@master
aliBuild build Monitoring --defaults o2-daq
- Load the environment for Monitoring (in the alice directory)
alienv load Monitoring/latest
In case of an issue with aliBuild refer to the official instructions.
Manual installation of the O2 Monitoring module.
- C++ compiler with C++17 support, e.g.:
  - gcc-c++ package from devtoolset-7 on CentOS 7
  - clang++ on Mac OS
- Boost >= 1.56
- libcurl
- ApMon (optional)
git clone https://github.com/AliceO2Group/Monitoring.git
cd Monitoring; mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<installdir>
make -j
make install
The recommended way of getting a monitoring instance is to get it from MonitoringFactory by passing the backend URI(s) as a parameter (comma separated if more than one).
The factory is accessible from o2::monitoring namespace.
#include <MonitoringFactory.h>
using namespace o2::monitoring;
std::unique_ptr<Monitoring> monitoring = MonitoringFactory::Get("backend[-protocol]://host:port[/verbosity][?query]");
See the table below to find out how to create the URI for each backend:
| Backend name | Transport | URI backend[-protocol] | URI query | Default verbosity |
|---|---|---|---|---|
| InfluxDB | HTTP | influxdb-http | ?db=<db> | info |
| InfluxDB | UDP | influxdb-udp | - | info |
| InfluxDB | Unix datagram | influxdb-unix | - | info |
| ApMon | UDP | apmon | - | info |
| StdOut | - | stdout, infologger | - | debug |
| Flume | UDP | flume | - | info |
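To make the URI layout concrete, here is a minimal sketch that splits a string of the form backend[-protocol]://host:port[/verbosity][?query] into its parts. The `BackendUri` struct and `parseBackendUri` function are hypothetical names introduced for illustration; they are not part of the Monitoring library, and the sketch does not handle Unix socket paths or comma-separated URI lists.

```cpp
#include <optional>
#include <string>

// Hypothetical holder for the parts of a backend URI (not a library type).
struct BackendUri {
  std::string backend;    // e.g. "influxdb"
  std::string protocol;   // e.g. "udp"; empty when omitted
  std::string host;
  std::string port;       // kept as text; empty when omitted
  std::string verbosity;  // e.g. "debug"; empty when omitted
  std::string query;      // text after '?'; empty when omitted
};

// Splits "backend[-protocol]://host:port[/verbosity][?query]".
// Returns std::nullopt when the "://" separator is missing.
std::optional<BackendUri> parseBackendUri(const std::string& uri) {
  const auto schemeEnd = uri.find("://");
  if (schemeEnd == std::string::npos) return std::nullopt;

  BackendUri out;
  const std::string scheme = uri.substr(0, schemeEnd);
  const auto dash = scheme.find('-');
  out.backend = scheme.substr(0, dash);
  if (dash != std::string::npos) out.protocol = scheme.substr(dash + 1);

  // Peel optional parts off the end: query first, then verbosity, then port.
  std::string rest = uri.substr(schemeEnd + 3);
  if (const auto q = rest.find('?'); q != std::string::npos) {
    out.query = rest.substr(q + 1);
    rest.erase(q);
  }
  if (const auto slash = rest.find('/'); slash != std::string::npos) {
    out.verbosity = rest.substr(slash + 1);
    rest.erase(slash);
  }
  if (const auto colon = rest.find(':'); colon != std::string::npos) {
    out.port = rest.substr(colon + 1);
    rest.erase(colon);
  }
  out.host = rest;
  return out;
}
```

For example, parsing "influxdb-http://localhost:8086/debug?db=test" with this sketch yields backend "influxdb", protocol "http", host "localhost", port "8086", verbosity "debug" and query "db=test".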
[METRIC] <name>,<type> <value> <timestamp> <tags>
A metric consists of 5 parameters: name, value, timestamp, verbosity and tags.
| Parameter name | Type | Required | Default |
|---|---|---|---|
| name | string | yes | - |
| value | int / double / string / uint64_t | yes | - |
| timestamp | chrono::time_point<std::chrono::system_clock> | no | current timestamp |
| verbosity | Debug / Info / Prod | no | Info |
| tags | vector | no | - |
A metric can be constructed by providing required parameters (value and name):
Metric{10, "name"}
There are 3 verbosity levels (the same as for backends): Debug, Info, Prod. The default verbosity is set using: Metric::setDefaultVerbosity(verbosity).
To override the verbosity on a per-metric basis, use the third, optional parameter of the metric constructor:
Metric{10, "name", Verbosity::Prod}
Metrics need to match the backend's verbosity in order to be sent, e.g. a backend with /info verbosity accepts Info and Prod metrics only.
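The matching rule can be sketched as a simple ordering check. The enum and the `backendAccepts` function below are hypothetical stand-ins written for this illustration, not the library's own types. Extending the /info rule to the Debug and Prod backend levels is an assumption.

```cpp
// Hypothetical mirror of the three verbosity levels, ordered from most to
// least verbose (assumption: Debug < Info < Prod).
enum class Verbosity { Debug = 0, Info = 1, Prod = 2 };

// A backend accepts a metric when the metric's level is at or above the
// backend's own level.
constexpr bool backendAccepts(Verbosity backend, Verbosity metric) {
  return static_cast<int>(metric) >= static_cast<int>(backend);
}
```

Under this sketch a backend at Info level accepts Info and Prod metrics and rejects Debug ones, which matches the /info case described above.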
Each metric can be tagged with any number of predefined tags.
To do so, use the addTag(tags::Key, tags::Value) or addTag(tags::Key, unsigned short) method. The latter allows assigning a numeric value to a tag.
See the example: examples/2-TaggedMetrics.cxx.
send(Metric&& metric, [DerivedMetricMode mode])
See how it works in the example: examples/1-Basic.cxx.
The DerivedMetricMode parameter is optional and is described in the Calculating derived metrics section.
In order to send more than one metric in a packet, group them into a vector:
monitoring->send(std::vector<Metric>&& metrics);
It's also possible to send multiple grouped values (only the Flume and InfluxDB backends are supported). For example, a cpu metric can be composed of cpuUser and cpuSystem values.
void sendGroupped(std::string name, std::vector<Metric>&& metrics)
See how it works in the example: examples/8-Multiple.cxx.
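To illustrate what grouping means, the sketch below joins several named values under one measurement name, in the style of InfluxDB line protocol ("measurement field1=v1,field2=v2"). The `formatGroup` function is a hypothetical helper for illustration only. The exact wire format produced by the library's backends is not described here and may differ.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: serializes a group of named values as
// "name field1=v1,field2=v2" (tags and timestamp are left out for brevity).
std::string formatGroup(const std::string& name,
                        const std::vector<std::pair<std::string, double>>& values) {
  std::ostringstream line;
  line << name << ' ';
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i != 0) line << ',';  // fields are comma separated
    line << values[i].first << '=' << values[i].second;
  }
  return line.str();
}
```

With this sketch, a cpu group made of cpuUser and cpuSystem values ends up on a single line instead of two separate metrics.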
In order to avoid sending each metric separately, metrics can be temporarily stored in a buffer and flushed at the most convenient moment. This feature is operated with the following two methods:
monitoring->enableBuffering(const std::size_t maxSize)
...
monitoring->flushBuffer();
enableBuffering takes the maximum buffer size as its parameter. When the buffer gets full, all values are flushed automatically.
See how it works in the example: examples/10-Buffering.cxx.
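The buffering behaviour described above can be sketched as follows. `MetricBuffer` is a hypothetical class written for this illustration and is not the library's implementation. Metrics are represented as plain strings and the sink callback stands in for a backend.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical buffer: collects items and hands them to a sink either on
// demand (flush) or automatically once maxSize items are stored.
class MetricBuffer {
 public:
  using Sink = std::function<void(const std::vector<std::string>&)>;

  MetricBuffer(std::size_t maxSize, Sink sink)
      : mMaxSize(maxSize), mSink(std::move(sink)) {}

  void push(std::string metric) {
    mItems.push_back(std::move(metric));
    if (mItems.size() >= mMaxSize) flush();  // automatic flush when full
  }

  void flush() {
    if (mItems.empty()) return;  // nothing to send
    mSink(mItems);
    mItems.clear();
  }

 private:
  std::size_t mMaxSize;
  Sink mSink;
  std::vector<std::string> mItems;
};
```

With a maximum size of 2, the second `push` triggers a flush of both items, while a third item stays in the buffer until `flush` is called explicitly.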
The module can calculate derived metrics. To do so, use the optional DerivedMetricMode mode parameter of the send method:
- DerivedMetricMode::NONE - no action,
- DerivedMetricMode::RATE - rate between two consecutive metrics,
- DerivedMetricMode::AVERAGE - average value of all metrics stored in cache.
Derived metrics are generated each time a new value is passed to the module. Their names are suffixed with the derived mode name.
See how it works in the example: examples/4-RateDerivedMetric.cxx.
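As a rough illustration of the two calculations, the sketch below computes a rate from consecutive samples and a running average. `DerivedCalculator` is a hypothetical class, not the library's implementation. Expressing timestamps in seconds and averaging over every value seen so far are assumptions of this sketch.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper showing the arithmetic behind RATE and AVERAGE.
class DerivedCalculator {
 public:
  // Rate between this sample and the previous one, in units per second.
  // Returns std::nullopt for the first sample or when no time has elapsed.
  std::optional<double> rate(double value, double timestampSeconds) {
    std::optional<double> result;
    if (mHasPrevious && timestampSeconds > mPreviousTime) {
      result = (value - mPreviousValue) / (timestampSeconds - mPreviousTime);
    }
    mPreviousValue = value;
    mPreviousTime = timestampSeconds;
    mHasPrevious = true;
    return result;
  }

  // Running average of every value passed in so far.
  double average(double value) {
    mSum += value;
    ++mCount;
    return mSum / static_cast<double>(mCount);
  }

 private:
  bool mHasPrevious = false;
  double mPreviousValue = 0.0;
  double mPreviousTime = 0.0;
  double mSum = 0.0;
  std::size_t mCount = 0;
};
```

For instance, a value of 10 at time 0 followed by 30 at time 2 gives a rate of (30 - 10) / (2 - 0) = 10 per second.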
Global tags are tags that are added to each metric. The following tags are set as global by the library itself:
- hostname
- name - process name
You can add your own global tag by calling addGlobalTag(std::string_view key, std::string_view value) or addGlobalTag(tags::Key, tags::Value).
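Conceptually, global tags are merged into the tag set of every metric before it is sent. The sketch below shows one way this could work. `TagSet` and `withGlobalTags` are hypothetical names, tags are simplified to string pairs, and letting a metric's own tag win over a global tag with the same key is an assumption of this sketch rather than documented library behaviour.

```cpp
#include <map>
#include <string>

// Simplified tag set: key/value string pairs (hypothetical, for illustration).
using TagSet = std::map<std::string, std::string>;

// Returns the metric's tags plus every global tag. std::map::insert keeps
// existing keys, so a tag set on the metric itself is not overwritten.
TagSet withGlobalTags(TagSet metricTags, const TagSet& globalTags) {
  metricTags.insert(globalTags.begin(), globalTags.end());
  return metricTags;
}
```

A metric tagged only with a hypothetical "detector" key would, after merging, also carry the hostname and name tags.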
enableProcessMonitoring([interval in seconds]);
The following metrics are generated every interval:
- cpuUsedPercentage - percentage of a core usage over time interval
- involuntaryContextSwitches - involuntary context switches over time interval
- memoryUsagePercentage - ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage (Linux only)
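The definitions above translate into simple arithmetic. The sketch below is not the library's code. The function names mirror the metric names for readability, the inputs are assumed to be measured elsewhere, and reading the context switch counter via POSIX `getrusage` is one possible source chosen for this illustration.

```cpp
#include <sys/resource.h>

// Percentage of one core used: CPU time consumed during the interval
// divided by the wall-clock length of the interval.
double cpuUsedPercentage(double cpuSecondsBefore, double cpuSecondsNow,
                         double intervalSeconds) {
  return 100.0 * (cpuSecondsNow - cpuSecondsBefore) / intervalSeconds;
}

// Resident set size as a percentage of the machine's physical memory.
double memoryUsagePercentage(double residentBytes, double physicalBytes) {
  return 100.0 * residentBytes / physicalBytes;
}

// Involuntary context switches of the current process since it started.
// Subtract two readings to get the count over a time interval.
long involuntaryContextSwitches() {
  struct rusage usage {};
  getrusage(RUSAGE_SELF, &usage);
  return usage.ru_nivcsw;
}
```

For example, a process that consumed 0.5 s of CPU time during a 1 s interval used 50% of a core.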
Sometimes it's necessary to provide a value at exact time intervals (even when the value does not change). This can be done using AutoPushMetric.
ComplexMetric& metric = monitoring->getAutoPushMetric("exampleMetric");
metric = 10;
See how it works in the example: examples/11-AutoUpdate.cxx.
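The idea of re-sending an unchanged value can be sketched with a simulated clock. `AutoPushValue` is a hypothetical class for illustration and does not reflect how the library schedules pushes. The caller passes the current time to `tick`, and the interval is assumed to be positive.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for an auto-pushed metric: it remembers the last
// assigned value and re-emits it once per interval, changed or not.
class AutoPushValue {
 public:
  // intervalSeconds must be greater than zero.
  AutoPushValue(double intervalSeconds, std::function<void(int)> push)
      : mInterval(intervalSeconds), mPush(std::move(push)) {}

  // Assigning only stores the value; nothing is sent yet.
  AutoPushValue& operator=(int value) {
    mValue = value;
    return *this;
  }

  // Emits the stored value once for each full interval elapsed since the
  // last push.
  void tick(double nowSeconds) {
    while (nowSeconds - mLastPush >= mInterval) {
      mLastPush += mInterval;
      mPush(mValue);
    }
  }

 private:
  double mInterval;
  std::function<void(int)> mPush;
  int mValue = 0;
  double mLastPush = 0.0;
};
```

With a 1 s interval, assigning 10 once and ticking at 1 s and 3 s results in the value 10 being pushed three times in total, even though it was set only once.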
This guide explains manual installation. For Ansible deployment, see the AliceO2Group/system-configuration GitLab repo.