Puppet Infrastructure Design Guidelines

Puppet is a powerful configuration management tool. This page defines some good-practice guidelines for deploying a medium-to-large scale Puppet installation.

Designing a Puppet infrastructure is a matter of knowledge, method, contingency and, to some extent, imagination.
First of all you must know Puppet's logic and its main language features. Then you should define a general method to manage what your hosts' configurations have in common and where they differ; this depends mostly on your own infrastructure and needs. Finally, you can add a bit of creativity to handle special situations and one-off cases.
As usual in the Unix world, there are different ways to achieve the desired result, and no single solution or recommendation fits every case. Still, we try to define here different scenarios and the relevant "good practices", well aware that there might be totally different, and equally good, practices to handle the same cases.

Some preliminary notes:
- By "role" we mean the function of a host. Defining a role makes sense when at least 2 nodes share it.
It can be an arbitrary string, such as "webserver", and should be shared by all the hosts that run exactly the same services, where configurations generally tend to be similar and differ only in details such as the local hostname, IP address and the like.
For example, a battery of frontend web servers can share the same role (e.g. role: "webserver"); they can be balanced by a pair of load balancers in HA (role: "loadbalancer"), use a backend database cluster (role: "database"), be monitored by one or more monitoring hosts ("monitor"), send syslog messages to one or more syslog servers ("syslog") and so on.
Note that if you adopt the concept of role it's better to use roles everywhere, even for roles that are used by a single host.
- A "zone" can generally be seen as a separate network. Per zone you can define variables for the parameters that change from zone to zone: for example the network IP/subnet and the default gateway, but also the DNS/NTP/syslog/whatever server shared by all the nodes in the same zone. A zone can also identify development / testing / staging / production environments, possibly divided into sub-zones if any of them spans different networks.
- The general logic is that every node (host) inherits from a more general node (more precisely a (sub)zone, which could in turn inherit from a "wider" zone) and includes a single role (more precisely, a class defining the role).
- The examples here follow a module-based logic, as defined in Module Organisation.

The practices described here have been applied successfully in different companies, ranging from a few nodes to, in the biggest case, about 200 nodes sharing different roles (more than 20) and different zones (about 10). They should apply equally well to larger installations, with several hundred nodes sharing dozens of roles and zones.
We won't cover here the planning of a distributed and redundant puppetmaster infrastructure, the delegation of editing permissions to different groups, or how to manage separate testing/production Puppet configurations (but we will cover the case of an infrastructure with development/testing/production nodes).
We'll start from simple cases and then move on to more complex scenarios.

Very simple infrastructure: Few nodes, no roles, no zones

If you have only a few nodes to manage, all on the same network and with no need for roles, the logic is simple and comes down to defining nodes like this:

    node basenode {
        $puppet_server = "10.42.0.10"
        $local_network = "10.42.0.0/24"
        $syslog_server = "10.42.0.11"
        $ntp_server = "10.42.0.12"
    }

    node 'www.example42.com' inherits basenode {
        include general
        include httpd::php
        include mysql::server
    }

Note that in basenode you can define variables used in the templates of your classes; these variables can be overridden at host node level to manage exceptions. For example:

    node 'ntp.example42.com' inherits basenode {
        $ntp_server = "0.pool.ntp.org"
        include general
    }

Note that it is important to declare variables BEFORE including the classes that use them.
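
The ordering rule can be illustrated with a minimal sketch. The nodes ntp2 and ntp3 are hypothetical, and the behaviour described in the comments assumes the Puppet versions this page targets, where a class is evaluated at the point where it is included:

```puppet
# Hypothetical nodes, for illustration only.

# Works: the override is already set when 'general' (and the ntp class
# it includes) is evaluated, so the ntp templates see the new value.
node 'ntp2.example42.com' inherits basenode {
    $ntp_server = "1.pool.ntp.org"
    include general
}

# Avoid: 'general' is evaluated first, so the ntp templates would still
# see the value inherited from basenode ("10.42.0.12").
node 'ntp3.example42.com' inherits basenode {
    include general
    $ntp_server = "1.pool.ntp.org"
}
```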

It's a good practice to define a class that provides general configurations applied to every node. This class should just include all the common classes. Something like:

    class general {
        include yum
        include hosts
        include puppet
        include iptables
        include sysctl
        include nrpe
        include ntp
        include syslog
    }

In a simple environment you may prefer sourcing static files over templates, since their content is unlikely to vary within your infrastructure.
A syslog class, for example, can be:

    class syslog {
        package { "syslogd":
            ensure => present,
            name   => $operatingsystem ? {
                default => "sysklogd",
            },
        }

        file { "syslog.conf":
            owner   => "root",
            group   => "root",
            mode    => "640",
            require => Package["syslogd"],
            path    => $operatingsystem ? {
                default => "/etc/syslog.conf",
            },
            ## If you want to use a template:
            content => template("syslog/syslog.conf.erb"),
            ## If you want to source a static file:
            ## source => "puppet://$server/syslog/syslog.conf",
        }

        service { "syslog":
            enable    => "true",
            ensure    => "running",
            hasstatus => "true",
            require   => File["syslog.conf"],
            subscribe => File["syslog.conf"],
            name      => $operatingsystem ? {
                default => "syslog",
            },
        }
    }

In this case you can define the content of your syslog.conf either in the template MODULEDIR/syslog/templates/syslog.conf.erb or in the static file MODULEDIR/syslog/files/syslog.conf; the two options are, of course, mutually exclusive.

Simple infrastructure with roles

If you have several nodes with a similar function it's worth considering the use of roles shared by different nodes (note that the concept of role is not intrinsic to Puppet: it is just an arbitrary way to summarize functions). Something like:

    node 'www1.example42.com' inherits basenode {
        include role_webserver # (the role_ prefix is arbitrary and not strictly necessary)
    }

    node 'www2.example42.com' inherits basenode {
        include role_webserver
    }

    node 'www3.example42.com' inherits basenode {
        include role_webserver
    }

    node 'lb1.example42.com' inherits basenode {
        include role_loadbalancer
    }

    node 'lb2.example42.com' inherits basenode {
        include role_loadbalancer
    }

You then define roles in normal classes, with something like:

    class role_webserver {
        $role = "webserver"
        include general
        include httpd::php
    }

    class role_loadbalancer {
        $role = "loadbalancer"
        include general
        include lvs
    }

Note the definition of the $role variable at the beginning of the class.
It's recommended to define such a variable because it can be useful in different situations, where you must define totally different configurations according to the role of the host.
For example iptables rules can be crafted to be the same for all the nodes of the same role:

    class iptables {
        service { "iptables":
            name       => $operatingsystem ? {
                default => "iptables",
            },
            ensure     => running,
            enable     => true,
            hasrestart => false,
            restart    => $operatingsystem ? {
                default => "iptables-restore < /etc/sysconfig/iptables",
            },
            hasstatus  => true,
            subscribe  => File["iptables"],
        }

        file { "iptables":
            mode   => 600, owner => root, group => root,
            ensure => present,
            path   => $operatingsystem ? {
                default => "/etc/sysconfig/iptables",
            },
            # Role-specific file first, generic file as fallback
            source => [ "puppet://$server/iptables/iptables-$role", "puppet://$server/iptables/iptables" ],
        }
    }

Here you can define the rules for webservers in MODULEDIR/iptables/files/iptables-webserver, the rules for loadbalancers in MODULEDIR/iptables/files/iptables-loadbalancer and a default ruleset, applied if no role-specific file has been defined, in MODULEDIR/iptables/files/iptables.
You can easily manage host-based exceptions by changing the source definition to something like:

source => [ "puppet://$server/iptables/iptables-$hostname" , "puppet://$server/iptables/iptables-$role" , "puppet://$server/iptables/iptables" ],

and then, where necessary, creating a file like MODULEDIR/iptables/files/iptables-lb1 to apply specific settings for the host lb1.

Another way to use a variable like $role is directly in templates. You can replace the source line above with:

content => template("iptables/iptables.erb"),

and create a MODULEDIR/iptables/templates/iptables.erb with something like:

    *filter
    :INPUT DROP [0:0]
    :FORWARD DROP [0:0]
    :OUTPUT DROP [0:0]
    -A INPUT -i lo -j ACCEPT
    -A INPUT -p icmp -j ACCEPT
    -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
    # SSH allowed only from management console
    -A INPUT -s 10.42.0.200 -p tcp --dport 22 -j ACCEPT
    # Role specific settings
    <% if role=="webserver" %>
    -A INPUT -p tcp --dport 80 -j ACCEPT
    -A INPUT -p tcp --dport 443 -j ACCEPT
    <% end %>
    <% if role=="dbserver" %>
    -A INPUT -s 10.42.0.0/24 -p tcp --dport 3306 -j ACCEPT
    <% end %>
    -A INPUT -m pkttype --pkt-type UNICAST -j LOG --log-prefix "[INPUT DROP] : "
    -A FORWARD -j LOG --log-prefix "[FORWARD DROP] : "
    -A OUTPUT -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
    -A OUTPUT -m pkttype --pkt-type UNICAST -j LOG --log-prefix "[OUTPUT DROP] : "
    COMMIT
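
When roles differ by more than a few template lines, the $role variable can also drive a case statement inside a class. The following is only a sketch: the class nrpe::role_checks and the nrpe::checks_* classes are hypothetical placeholders, not part of the modules described on this page.

```puppet
# Hypothetical class: selects role-specific monitoring checks.
# The included nrpe::checks_* classes are assumed to exist elsewhere.
class nrpe::role_checks {
    case $role {
        webserver:    { include nrpe::checks_http }
        loadbalancer: { include nrpe::checks_lvs }
        # Nodes with any other role get no extra checks
        default:      { }
    }
}
```

As with the templates, this relies on $role being set in the role class before the classes that read it are included.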

Infrastructure with different roles and zones

More complex scenarios can involve several nodes (scaling up to hundreds) with different roles, placed in different networks with different functions (e.g. development/testing/production...).
In these cases it's recommended to use node inheritance, managing the relevant variables at different levels according to your needs. For example:

    node basenode {
        $puppet_server = "10.42.0.10"
        $syslog_server = "10.42.0.11"
        $ntp_server = "10.42.0.12"
    }

    node devel inherits basenode {
        $local_network = "192.168.0.0/24"
        $syslog_server = "192.168.0.11"
        $zone = "devel"
    }

    node test inherits basenode {
        $local_network = "10.42.1.0/24"
        $syslog_server = "10.42.1.11"
        $zone = "test"
    }

    node prod inherits basenode {
        $local_network = "10.42.0.0/24"
        $zone = "prod"
    }

    node 'www1.example42.com' inherits prod {
        include role_webserver
    }

    node 'www1.example42.devel' inherits devel {
        include role_webserver
    }

Such an approach leaves you free to define per-zone settings while still being able to override them at more specific levels.
The inheritance tree can have more intermediate nodes, according to your own infrastructure, but to avoid headaches and needless complexity it's important that each host has a single, linear inheritance tree (i.e. node inherits subzone inherits zone inherits basenode).
Note also that zones (like roles, they are not a Puppet-internal concept) can be related to IP networks but also to functional levels (prod/test/devel...) or geographical locations (headquarters, branch office...). The use of a $zone variable has the same advantages as the $role variable: it can be used in many different places to manage differences between zones. Another example:

    class general {
        include yum
        include hosts
        include puppet
        include iptables
        include sysctl
        include nrpe
        include ntp
        include syslog

        case $zone {
            prod: { include hardening }
            test: { include hardening }
            default: { }
        }
    }
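
The linear chain mentioned above (node inherits subzone inherits zone inherits basenode) could look like the following sketch. The prod_dmz sub-zone, its network, the $subzone variable and the www4 host are all hypothetical, introduced only for illustration:

```puppet
# Hypothetical sub-zone of prod: it overrides the network and adds
# a $subzone variable, while $zone is still inherited from prod.
node prod_dmz inherits prod {
    $local_network = "10.42.2.0/24"
    $subzone = "dmz"
}

# Single, linear chain: www4 -> prod_dmz -> prod -> basenode
node 'www4.example42.com' inherits prod_dmz {
    include role_webserver
}
```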

So, for each node, you have 2 main characterizations:
- The zone (network or environment) where it sits (inherited from a higher-level node)
- The role (function) it has (included as a class)
These should be enough to cover many different scenarios of varying complexity, meeting both the need for high-level standardization and the need for host-level characterization.
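
As a sketch of how the two characterizations can be combined, the source list of the iptables file resource shown earlier could be extended with zone-aware entries. The zone-specific file names are assumptions made for this example, and the resource is meant to replace the one inside the iptables class:

```puppet
# Sketch only: the file names containing ${zone} are hypothetical.
file { "iptables":
    path   => "/etc/sysconfig/iptables",
    mode   => "600",
    owner  => "root",
    group  => "root",
    ensure => present,
    # The first source found on the puppetmaster is used,
    # from the most specific to the most generic.
    source => [
        "puppet://$server/iptables/iptables-${hostname}",
        "puppet://$server/iptables/iptables-${role}-${zone}",
        "puppet://$server/iptables/iptables-${role}",
        "puppet://$server/iptables/iptables-${zone}",
        "puppet://$server/iptables/iptables"
    ],
}
```

The braces around the variable names keep the hyphens from being read as part of a variable name. Only the generic MODULEDIR/iptables/files/iptables file has to exist; the more specific ones are created when an exception is needed.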

The guidelines defined here are being applied to the Example42 Puppet Infrastructure (a sample infrastructure that can be used as a starting point for customization) by Lab42. Regards and credits to Francesco Crippa of Byte-Code for the initial architectural approach.
