Monitoring

The Monitoring module lets you inject user-defined metrics and monitor the process itself. It supports multiple backends, protocols and data formats.

Installation

aliBuild

Click here if you don't have aliBuild installed

Compile Monitoring and its dependencies via aliBuild

aliBuild init Monitoring@master
aliBuild build Monitoring --defaults o2-daq

Load the environment for Monitoring (in the alice directory)

alienv load Monitoring/latest

In case of an issue with aliBuild refer to the official instructions.

Manual

Manual installation of the O² Monitoring module.

Requirements

C++ compiler with C++17 support, e.g.:
- gcc-c++ package from devtoolset-7 on CentOS 7
- clang++ on Mac OS
Boost >= 1.56
libcurl
ApMon (optional)

Monitoring module compilation

git clone https://github.com/AliceO2Group/Monitoring.git
cd Monitoring; mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<installdir>
make -j
make install

Getting started

Monitoring instance

The recommended way to get a monitoring instance is to request it from MonitoringFactory, passing the backend URI(s) as a parameter (comma-separated if more than one). The factory is accessible from the o2::monitoring namespace.

#include <MonitoringFactory.h>
using namespace o2::monitoring;
std::unique_ptr<Monitoring> monitoring = MonitoringFactory::Get("backend[-protocol]://host:port[/verbosity][?query]");

See table below to find out how to create URI for each backend:

Backend name	Transport	URI backend[-protocol]	URI query	Default verbosity
InfluxDB	HTTP	`influxdb-http`	`?db=<db>`	`info`
InfluxDB	UDP	`influxdb-udp`	-	`info`
InfluxDB	Unix datagram	`influxdb-unix`	-	`info`
ApMon	UDP	`apmon`	-	`info`
StdOut	-	`stdout`, `infologger`	-	`debug`
Flume	UDP	`flume`	-	`info`
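To make the URI layout concrete, the sketch below splits a string of the form backend[-protocol]://host:port[/verbosity][?query] into its parts. It is only an illustration and not the library's parser: BackendUri and parseBackendUri are hypothetical names, and the sketch assumes a host:port authority, so it does not handle Unix socket paths that contain slashes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical holder for the pieces of a backend URI; not a library type.
struct BackendUri {
  std::string backend;   // e.g. "influxdb"
  std::string protocol;  // e.g. "udp"; empty when omitted
  std::string host;
  std::string port;      // kept as text; empty when omitted
  std::string verbosity; // e.g. "debug"; empty means "use the backend default"
  std::string query;     // e.g. "db=test"; empty when omitted
};

// Splits "backend[-protocol]://host:port[/verbosity][?query]" into `out`.
// Returns false when the "://" separator or the backend name is missing.
bool parseBackendUri(const std::string& uri, BackendUri& out)
{
  const std::size_t schemeEnd = uri.find("://");
  if (schemeEnd == std::string::npos || schemeEnd == 0) {
    return false;
  }
  out = BackendUri{};

  // Scheme part: "backend" or "backend-protocol".
  const std::string scheme = uri.substr(0, schemeEnd);
  const std::size_t dash = scheme.find('-');
  out.backend = scheme.substr(0, dash);
  if (dash != std::string::npos) {
    out.protocol = scheme.substr(dash + 1);
  }

  // Remainder: strip the query first, then the verbosity, then split host:port.
  std::string rest = uri.substr(schemeEnd + 3);
  const std::size_t queryPos = rest.find('?');
  if (queryPos != std::string::npos) {
    out.query = rest.substr(queryPos + 1);
    rest.erase(queryPos);
  }
  const std::size_t slash = rest.find('/');
  if (slash != std::string::npos) {
    out.verbosity = rest.substr(slash + 1);
    rest.erase(slash);
  }
  const std::size_t colon = rest.rfind(':');
  out.host = rest.substr(0, colon);
  if (colon != std::string::npos) {
    out.port = rest.substr(colon + 1);
  }
  return true;
}
```

With this sketch, "influxdb-udp://localhost:8088/debug" yields backend influxdb, protocol udp, host localhost, port 8088 and verbosity debug, while a string without "://" is rejected.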

StdCout output format

[METRIC] <name>,<type> <value> <timestamp> <tags>

Metrics

A metric consists of 5 parameters: name, value, timestamp, verbosity and tags.

Parameter name	Type	Required	Default
name	string	yes	-
value	int / double / string / uint64_t	yes	-
timestamp	chrono::time_point<std::chrono::system_clock>	no	current timestamp
verbosity	Debug / Info / Prod	no	Info
tags	vector	no	-

A metric can be constructed by providing required parameters (value and name):

Metric{10, "name"}

Verbosity

There are 3 verbosity levels (the same as for backends): Debug, Info, Prod. The default verbosity is set using Metric::setDefaultVerbosity(verbosity). To override the verbosity on a per-metric basis, use the third, optional parameter of the metric constructor:

Metric{10, "name", Verbosity::Prod}

Metrics need to match the backend's verbosity in order to be sent; e.g. a backend with /info verbosity accepts only Info and Prod metrics.
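The matching rule can be pictured as a threshold comparison. The sketch below is an assumption-laden illustration rather than the library's code: it defines its own Verbosity enum ordered Debug < Info < Prod and a hypothetical isAccepted helper, and it extrapolates from the /info example that a debug backend accepts everything and a prod backend accepts Prod metrics only.

```cpp
// Illustrative enum ordered from most to least verbose; the library's own
// Verbosity type may be laid out differently.
enum class Verbosity { Debug = 0, Info = 1, Prod = 2 };

// Hypothetical helper: a backend accepts a metric whose verbosity is at or
// above the backend's own threshold.
constexpr bool isAccepted(Verbosity metric, Verbosity backend)
{
  return static_cast<int>(metric) >= static_cast<int>(backend);
}
```

Under these assumptions a Debug metric is dropped by a backend created with /info, while Info and Prod metrics pass through.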

Sending metric

send(Metric&& metric, [DerivedMetricMode mode])

See how it works in the example: examples/1-Basic.cxx.

The DerivedMetricMode parameter is optional and is described in the Calculating derived metrics section.

Advanced features

Sending more than one metric

To send more than one metric in a single packet, group them into a vector:

monitoring->send(std::vector<Metric>&& metrics);

It's also possible to send multiple, grouped values (only the Flume and InfluxDB backends are supported). For example, a cpu metric can be composed of cpuUser and cpuSystem values.

void sendGroupped(std::string name, std::vector<Metric>&& metrics)

See how it works in the example: examples/8-Multiple.cxx

Buffering metrics

To avoid sending each metric separately, metrics can be temporarily stored in a buffer and flushed at the most convenient moment. This feature is controlled with the following two methods:

monitoring->enableBuffering(const std::size_t maxSize)
...
monitoring->flushBuffer();

enableBuffering takes the maximum buffer size as its parameter. When the buffer gets full, all values are flushed automatically.

See how it works in the example: examples/10-Buffering.cxx.
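The buffering behaviour described in this section (store values, flush on demand, flush automatically when full) can be sketched independently of the library. BufferedSender and its Sink callback are hypothetical names introduced for illustration; the real module buffers Metric objects and hands them to its backends.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered sender; not the library's class.
class BufferedSender
{
 public:
  // Receives one whole batch of buffered values per call.
  using Sink = std::function<void(const std::vector<std::string>&)>;

  BufferedSender(std::size_t maxSize, Sink sink)
    : mMaxSize(maxSize), mSink(std::move(sink))
  {
  }

  // Stores the value and flushes automatically once maxSize values are held.
  void send(std::string value)
  {
    mBuffer.push_back(std::move(value));
    if (mBuffer.size() >= mMaxSize) {
      flush();
    }
  }

  // Hands all buffered values to the sink in one call, then empties the buffer.
  // Does nothing when the buffer is already empty.
  void flush()
  {
    if (mBuffer.empty()) {
      return;
    }
    mSink(mBuffer);
    mBuffer.clear();
  }

 private:
  std::size_t mMaxSize;
  Sink mSink;
  std::vector<std::string> mBuffer;
};
```

With a maximum size of 3, the third call to send triggers a flush of all three values; an explicit flush afterwards only sends whatever has accumulated since.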

Calculating derived metrics

The module can calculate derived metrics. To do so, use the optional DerivedMetricMode mode parameter of the send method:

DerivedMetricMode::NONE - no action,
DerivedMetricMode::RATE - rate between two following metrics,
DerivedMetricMode::AVERAGE - average value of all metrics stored in cache.

Derived metrics are generated each time a new value is passed to the module. Their names are suffixed with the derived mode name.

See how it works in the example: examples/4-RateDerivedMetric.cxx.
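As a rough picture of the two calculations, the sketch below computes a rate from two consecutive samples and an average over cached values. It is not taken from the module's source: Sample, rate, average and derivedName are hypothetical names, the timestamps are assumed to be in seconds, and the exact suffix the library appends is passed in as a parameter because its spelling is not stated here.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Hypothetical sample type: a value and the time it was recorded, in seconds.
struct Sample {
  double value;
  double timestampSeconds;
};

// RATE: change in value per second between two consecutive samples.
// Assumption: returns 0 when the samples are not strictly ordered in time.
double rate(const Sample& previous, const Sample& current)
{
  const double dt = current.timestampSeconds - previous.timestampSeconds;
  return dt > 0.0 ? (current.value - previous.value) / dt : 0.0;
}

// AVERAGE: arithmetic mean of all cached values (0 for an empty cache).
double average(const std::vector<double>& cached)
{
  if (cached.empty()) {
    return 0.0;
  }
  return std::accumulate(cached.begin(), cached.end(), 0.0) / static_cast<double>(cached.size());
}

// Builds the derived metric's name by appending the mode suffix.
std::string derivedName(const std::string& name, const std::string& modeSuffix)
{
  return name + modeSuffix;
}
```

For example, samples of 10 at t = 1 s and 30 at t = 3 s give a rate of 10 per second, and cached values 1, 2 and 3 average to 2.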

Global tags

Global tags are tags that are added to each metric. The following tags are set as global by the library itself:

hostname
name - process name

You can add your own global tag by calling addGlobalTag(std::string_view key, std::string_view value) or addGlobalTag(tags::Key, tags::Value).
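Conceptually, global tags are merged into every metric's own tag set before the metric is sent. The sketch below shows that merge with plain string maps; Tags and withGlobalTags are hypothetical names, and the rule that a tag set on the metric wins over a global tag with the same key is an assumption, not documented behaviour.

```cpp
#include <map>
#include <string>

// Hypothetical tag container: key/value pairs of strings.
using Tags = std::map<std::string, std::string>;

// Returns the metric's tags with every global tag added.
// Assumption: on a key clash the metric's own tag is kept, because
// std::map::insert does not overwrite existing keys.
Tags withGlobalTags(Tags metricTags, const Tags& globalTags)
{
  metricTags.insert(globalTags.begin(), globalTags.end());
  return metricTags;
}
```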

Process monitoring

enableProcessMonitoring([interval in seconds]);

The following metrics are generated every interval:

cpuUsedPercentage - percentage of a single core used over the time interval
involuntaryContextSwitches - number of involuntary context switches over the time interval
memoryUsagePercentage - ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage (Linux only)
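The two percentage metrics above boil down to simple ratios. The sketch below states them as pure functions under stated assumptions: the CPU figure is taken as CPU time consumed during the interval divided by the wall-clock length of the interval, and the inputs (CPU seconds, resident bytes, physical bytes) are supplied by the caller. How the module actually samples these values is not shown here, and the function names are reused only for readability.

```cpp
#include <cstdint>

// Share of one core used during the interval, in percent.
// Assumption: CPU time is sampled at the start and end of the interval.
double cpuUsedPercentage(double cpuSecondsStart, double cpuSecondsEnd, double intervalSeconds)
{
  if (intervalSeconds <= 0.0) {
    return 0.0;
  }
  return 100.0 * (cpuSecondsEnd - cpuSecondsStart) / intervalSeconds;
}

// Resident set size as a percentage of the machine's physical memory.
double memoryUsagePercentage(std::uint64_t residentBytes, std::uint64_t physicalBytes)
{
  if (physicalBytes == 0) {
    return 0.0;
  }
  return 100.0 * static_cast<double>(residentBytes) / static_cast<double>(physicalBytes);
}
```

A process that accumulates 0.5 s of CPU time over a 1 s interval reports 50, and a 1 GiB resident set on a 4 GiB machine reports 25.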

Automatic metric updates

Sometimes it's necessary to provide a value at an exact, regular interval (even if the value does not change). This can be done using AutoPushMetric.

ComplexMetric& metric = monitoring->getAutoPushMetric("exampleMetric");
metric = 10;

See how it works in the example: examples/11-AutoUpdate.cxx.

System monitoring, server-side backends installation and configuration

This guide explains manual installation. For Ansible deployment, see the AliceO2Group/system-configuration GitLab repo.

Collectd
Flume
InfluxDB
Grafana
MonALISA (external link)
