PuppetDB is a "fast, scalable and reliable" data warehouse for Puppet.
It saves data generated by Puppet: nodes' facts, catalogs and reports.
It can also work as a performant backend for exported resources, and is the recommended alternative to the older ActiveRecord storeconfigs interface.
It also provides a faster backend for Puppet's Inventory Service.
PuppetDB is open source, written in Clojure (it requires JDK >= 1.6 and Puppet Masters running Puppet >= 2.7.12), and is delivered as standalone software.
The best place to get it from is the official Puppet Labs repositories.
For a complete overview of the installation process, check the official documentation and choose between installation via the puppetdb module or from packages.
PuppetDB can persist data either on an embedded HSQLDB database or on PostgreSQL. The latter is definitely recommended in production environments with a few dozen servers or more.
The configuration files are:
/etc/sysconfig/puppetdb (/etc/default/puppetdb on Debian) is the init script configuration file; here we can set Java settings such as JAVA_ARGS or JAVA_BIN.
/etc/puppetdb/conf.d/ is the configuration directory; it can contain several .ini files, where we configure [global] settings, [database] backends, [command-processing] options, [jetty] parameters for HTTP connections and [repl] settings for remote runtime configuration (used for development and debugging).
/etc/puppetdb/log4j.properties is the logging configuration file, based on Log4j.
/etc/puppet/puppetdb.conf is the Puppet-side configuration file, with the settings used by the PuppetDB terminus.
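As an illustration of what a [database] section in /etc/puppetdb/conf.d/ may look like when PostgreSQL is used, here is a minimal Python sketch that renders one. The host, port, database name and credentials are placeholder assumptions, and the key names (classname, subprotocol, subname, username, password) should be checked against the documentation of the PuppetDB version in use.

```python
import configparser
import io


def render_database_ini(host, port, database, username, password):
    """Return the text of a [database] section pointing PuppetDB at PostgreSQL."""
    # interpolation=None lets passwords contain '%' without being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config["database"] = {
        "classname": "org.postgresql.Driver",
        "subprotocol": "postgresql",
        "subname": f"//{host}:{port}/{database}",
        "username": username,
        "password": password,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder values, for illustration only.
ini_text = render_database_ini("localhost", 5432, "puppetdb", "puppetdb", "secret")
print(ini_text)
```

The printed text could be saved, for example, as a database.ini file in the conf.d directory; PuppetDB has to be restarted to pick up the change.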
PuppetDB provides a performance dashboard out of the box, which we can use to check how the software is working: http://<puppetdb.server>:8080/dashboard/. It is an integral part of PuppetDB.
A nice frontend for querying PuppetDB from the web is PuppetBoard. This software is contributed by the community.
An incredibly useful module that provides faces, functions and Hiera backends that work with PuppetDB is the puppetdbquery module. We can install it from the Forge:
puppet module install dalen-puppetdbquery
This software is contributed by the community (and is becoming a de facto standard for querying PuppetDB inside Puppet modules).
PuppetDB exposes an HTTP API that uses a Command/Query Responsibility Separation (CQRS) pattern:
A standard REST API is used to query data.
The endpoints currently available (API v3) are: metrics, fact-names, facts, nodes, resources, reports, events, event-counts, aggregate-event-counts, server-time
Explicit commands (sent via HTTP to the /commands/ URL) are used to populate and modify data.
The current commands are: replace catalog, replace facts, deactivate node, store report.
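To make the command side more concrete, here is a minimal Python sketch that only builds the JSON envelope for a command, without sending it. It assumes the envelope carries the keys command, version and payload; the version number, the node name and the payload shape used here are illustrative, since the exact payload encoding depends on the command and on its version.

```python
import json


def build_command(command, version, payload):
    """Serialize a PuppetDB-style command envelope to a JSON string."""
    envelope = {
        "command": command,   # one of the command names listed above
        "version": version,   # command version (assumed value, check the docs)
        "payload": payload,   # command-specific data (assumed shape)
    }
    return json.dumps(envelope)


# Hypothetical node name, for illustration only.
body = build_command("deactivate node", 1, "web01.example.com")
print(body)
```

In normal operation these commands are submitted by the PuppetDB terminus on the Puppet Master, so we rarely need to build them by hand.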
There are different versions of the API, as it evolves with PuppetDB releases.
As of October 2013, version 1 of the API is deprecated, while versions 2 and 3 (the latter adds new endpoints and is recommended) are both supported.
We can access a specific version of the API using the relevant prefix:
http[s]://puppetdb.server/v#/<endpoint>/[NAME]/[VALUE][?query=<QUERY STRING>]
Queries to the REST endpoints can define the search scope and limits for the given endpoint. Queries are sent as a "URL-encoded JSON array in prefix notation". Check the online tutorial for details.
The puppetdbquery module is developed by a community member, Erik Dalen, and is the most used and most useful module available for working with PuppetDB: it provides command-line tools (as Puppet faces), functions to query PuppetDB and a PuppetDB-based Hiera backend.
All the queries we can make with this module have the format:
Type[Name]{attribute1=foo and attribute2=bar}
By default, queries are run against normal resources; use the @@ prefix to query exported resources.
The comparison operators are: =, !=, >, < and ~
Expressions can be combined with the boolean operators and, or and not.
The module introduces, as a Puppet face, the query command:
puppet help query
puppet query facts '(osfamily=RedHat and operatingsystemversion=6)'
The functions provided by the module can be used inside manifests to populate the catalog with data retrieved from PuppetDB.
query_nodes takes two arguments: the query to use and, optionally, the fact to return (by default it returns the certname). It returns an array:
$webservers = query_nodes('osfamily=Debian and Class[Apache]')
$webserver_ip = query_nodes('osfamily=Debian and Class[Apache]', 'ipaddress')
query_facts requires two arguments: the query used to select the nodes and the list of facts to return for them. It returns a nested hash in JSON format.
query_facts('Class[Apache]{port=443}', ['osfamily', 'ipaddress'])
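To give an idea of what a nested hash of this kind may look like, here is a small Python sketch. It assumes the result is keyed by node name, with each value being a hash of fact names and values; the node names and fact values are invented sample data, not output from a real PuppetDB.

```python
import json

# Invented sample data, shaped as node name -> {fact name -> fact value}.
sample_json = """
{
  "web01.example.com": {"osfamily": "Debian", "ipaddress": "10.0.0.11"},
  "web02.example.com": {"osfamily": "Debian", "ipaddress": "10.0.0.12"}
}
"""


def fact_by_node(result, fact):
    """Return a flat {node: value} mapping for one fact of the nested result."""
    return {node: facts[fact] for node, facts in result.items() if fact in facts}


result = json.loads(sample_json)
addresses = fact_by_node(result, "ipaddress")
for node in sorted(addresses):
    print(node, addresses[node])
```

Inside a manifest the same structure is available as a Puppet hash, so it can be passed to templates or iterated over with the usual hash functions.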