Prometheus alerts repository

In this repository you will find the Prometheus-based alerts deployed to production, split by team.

By default, the alerts are deployed to all site-local Prometheus instances (e.g. ops, k8s, etc.).

For more information refer to Alertmanager's wikitech page: https://wikitech.wikimedia.org/wiki/Alertmanager

Testing

CI runs tox on this repository at code review time. You can also run the tests locally by running tox (Python 3). The following tools must also be in your $PATH:

promtool: available in Linux distributions, or from https://github.com/prometheus/prometheus/releases (>= 2.10 required)
pint: available as a Debian package from https://wikitech.wikimedia.org/wiki/APT_repository, or as a single binary from https://github.com/cloudflare/pint/releases
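Before running tox it can help to confirm that the required tools are actually on your $PATH. The following is a minimal sketch, not part of this repository: the function name missing_tools is hypothetical, and the sketch only checks that the tools are present, not that promtool meets the >= 2.10 requirement.

```shell
#!/bin/sh
# Print the name of every given tool that is not found in $PATH, one per line.
# "missing_tools" is a hypothetical helper name used only in this sketch.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# tox, promtool and pint are the tools named above.
missing=$(missing_tools tox promtool pint)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing from PATH:" $missing
else
    echo "All required tools found; tox can be run."
fi
```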

On Debian systems the promtool binary is part of the prometheus package, which also starts the Prometheus server on installation. To stop the server and prevent it from starting at boot, run:

systemctl stop prometheus
systemctl mask prometheus

To also disable the timers for various node exporters, run:

systemctl list-timers 'prometheus*' | perl -ne 'print "$1\n" if /(prometheus-.+\.timer)/' | \
    xargs sudo systemctl disable

Finally, to stop pint and prevent it from starting at boot, run:

systemctl stop pint
systemctl mask pint

Testing with Docker

Tests can also be run locally in a container, using the Blubber file for the CI image provided with this repository.

Build an image from .pipeline/blubber.yaml with:

DOCKER_BUILDKIT=1 docker build --target test -t alerts-tests -f .pipeline/blubber.yaml .

Run the test container with:

docker run -u "$(id -u)" --entrypoint tox -v "$(pwd)":/srv/app alerts-tests
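The two Docker steps above can be combined into one helper. This is a sketch under stated assumptions rather than a script shipped with the repository: the function names run and docker_tests, and the DRY_RUN variable, are hypothetical conventions of this example. With DRY_RUN=1 the commands are printed instead of executed, which makes it easy to check what would run.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
# "run" and DRY_RUN are conventions of this sketch, not of the repository.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the test image from the Blubber file, then run tox inside it.
# Must be called from the root of the repository checkout.
docker_tests() {
    run env DOCKER_BUILDKIT=1 docker build --target test -t alerts-tests \
        -f .pipeline/blubber.yaml . &&
    run docker run -u "$(id -u)" --entrypoint tox \
        -v "$(pwd)":/srv/app alerts-tests
}
```

After defining the functions, call docker_tests to build and test, or set DRY_RUN=1 first to preview the commands. The build step is repeated on every call; Docker's layer cache normally keeps that cheap.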

Deploying

The repository is self-service for members of the wmf LDAP group: a +2 triggers the CI tests and, if they pass, the merge. After the merge, the alerts are deployed at the next Puppet run (i.e. within 30 minutes).

