I like to use an analogy to explain how sites that have been poorly managed are working today.
My grandmother made the best chicken soup stock. She made large batches of it a few times a year to freeze for later use. Some of these she gave to my mother. The secret of her stock died with her, so we'd take some of her Original Stock to mix with our imitation stock. And that way we could pad it for a while longer. Eventually we would have had to dilute what was left and repackage it to continue the adventure. Thawing the frozen stock to add our imitation stock, then re-freezing the result was deemed a `bad idea'.
Building new instances by copying existing instances and mixing in replacement finger-files is just like that. You are making use of grandma's stock to make your soup, and eventually you'll find it is too diluted (or tainted) to consume. Every site needs to be able to build new instances of each type from raw materials.
I promise you are not wasting your time here. Even if you never install any of my code, you might learn how to organize your configuration management structures better just by reading this document.
"People who know what they're talking about don't need PowerPoint." -- Steve JobsNow for some PowerPoint I shouldn't need:
A revision control structure (git, RCS, CVS, or the like). That structure may (or may not) manage folders or the migration of files to other names. But that structure must keep track of "milestones". That is, automation needs to extract each file at a stable revision, even while new revisions are in process.
A build structure (make, ant, or the like) to form a set of files that represent the product. This may require compilation, linking, or any other process that can be done strictly with automation. No fingers allowed after the process starts.
A package structure (rpm, pkg-add, or the like), but it could be a simple archive that is installed by some local tool.
All files were input by people, in one way or another. So having been input by fingers is not the issue: having to be input again by fingers is the issue.
A file that is committed to a revision control structure gains more substance than a finger-file by virtue of the ability to recall an exact copy via automation. A product built entirely from revision controlled files can be rebuilt from those files. A package of such products, constructed from a revision controlled recipe is as repeatable as the parts that made it. Any instance built from those files, products, and packages is just as repeatable as any other artifact.
To make it clear: no finger-files are used to configure any layer. Finger-files are the raw material to build more revision controlled structures, in the longer term. This larger commit-loop is called `progress'.
The gross simplification most technicians enjoy is that these two different sources of data can be grouped into a single bundle. It is true that site policy is just more files in the source repository. But it is not true that we make `everyday changes' to those files. Changing the signature of a production database host, web-server, or application instance without a way to manage and track those changes is a sure way to make management of your site impossible.
That's the whole goal of configuration management: build what you promised from what you have, without errors, and in a timely manner.
A corollary to that: never build any managed element twice. If we can build it well once, we can use that same process to build it as needed.
The most important part of operational configuration management is on-going updates, not the first build or first boot. Having data that was right long ago (but might not be now) is worse than knowing you don't know.
Given that we are not going to reset the clock to get the same timestamps on the contents of a build, we must either ignore the timestamps, or never rebuild an artifact with the same identifier, but different contents. This is a matter for you to decide in your local site policy. (I almost never construct a new product or release with the same name as a previous one.)
Since I extract my revisions with automation (via rcsvg(1), cvs, or git), I do not worry about random changes made by peers. We only move to stable symbolic labels at known intervals.
Moreover some configured files are different for every instance, for example the name of the instance itself, its serial number, and at most sites the IP address. Even though these elements change for every instance, they are tracked in local site policy files.
And some configuration files are different on each instance due to differences in the applications and services provisioned. For example, sudo and op configuration files should include only the escalation rules needed to manage each host. Sending unrequired escalation rules to a host is a security incident waiting to happen.
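A hedged sketch of that idea: build a per-host sudoers file from only the rule fragments the host's classes call for. The hosts.cf file, the classes directory, and the host name are hypothetical names for the sake of the example.

    # Assemble only the escalation rules this host needs, then syntax-check it.
    HOST=www3
    for class in $(awk -v h="$HOST" '$1 == h {print $2}' hosts.cf); do
        cat "classes/$class.sudoers"
    done > "sudoers.$HOST"
    visudo -c -f "sudoers.$HOST"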
I record all of the site policy for layers 1-4 in a few common ways. This lets a team of fewer than 10 people run more than 3,200 instances without breaking themselves or production. Much higher scale-factors are possible with more support from development groups.
We need a way to record recipes to avoid finger mistakes when driving automation. I use two ways, because there are 5 layers and no single way works for all of them. The file tactic is to record recipes in the in-line comments in each file; for the other (multiple-file) layers I use a separate recipe, script, or feedback loop to automate each process. Every file can be marked-up with comments, and every process can be automated with a recipe. Every locally built processor should accept comments, for that reason alone.
At layers 1 and 5 we manage a single file by revision.
Separate recipe files would double the number of files we manage, and imply
a link between revisions and files that you do not want to manage.
If you think you've found a file that can't be marked-up, you've never used uudecode or m4, or you are limiting yourself in some other unreasonable way.
At layers 2 and 3 I use make recipe files.
At layer 4 I use the master source structure that is contained in this directory (the msrc tools described below).
For all the forms of pull logic I use msrcmux, mpull, muxcat, and rsync.
For the forms of push logic I use msrc, ssh, scp, and rdist.
The key issue is knowing that the state of the resources you are about to use is stable. That is, that you are not going to incorporate a file that is partially committed (in effect a finger-file) into your deployment. This is the first requirement of any configuration process: "start with what you have". If you start with something you didn't expect to have, you will get results you didn't expect to get.
Use a process to advance (or regress) labels that makes sense in your
environment.
Close-the-loop by always viewing all the uncommitted changes before
any update to production. Any uncommitted change should stop the process.
Files that have not been committed are (by definition) finger-files and
must not be part of
a production update. They could build a test environment,
but that's site policy -- any local policy allowing uncommitted changes to
move to production is a bad one.
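A minimal close-the-loop guard, assuming a git working tree; rcsdiff or a "cvs -nq update" can play the same role for older structures:

    # Stop the update if any finger-files (uncommitted changes) are present.
    if [ -n "$(git status --porcelain)" ]; then
        echo "uncommitted changes in $(pwd); refusing to build" >&2
        exit 1
    fi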
I use rcsdiff(1) to check for layer 1 issues.
It is also poor form to leave uncommitted files in the revision control
structure. These are trip-hazards for other engineers.
So I run a recurring tickle(8) task to remind the engineers holding stale locks.
I always gather files by symbolic name with rcsvg(1).
In addition to that I check the control recipe used by msrc with msync(8).
Once built or packaged the release of the package
is known by the name of the package and a unique identifier. The
identifier could be a number, a date, or any other unique key that
distinguishes that build from any other. For example "msrc_base-2.31"
would be a good specification for an older release of these tools.
The same close-the-loop checks that msync uses are used at this level.
Not much different from a package recipe at this level.
I have built instances from source code (FreeBSD), from
ISO images of mostly RPM files (Linux), from
network boot images (Solaris, HP-UX), and from
boot tapes (AIX).
Given all those tactics, I can tell you that the details do not matter as
much as the structure underneath. Having a manifest of parts and knowing
that that manifest is complete and stable makes the process work.
The mechanics of getting an instance booted are widely available:
kickstart, jumpstart, DHCP boots, as well as using a remote protocol to
mount an ISO image from the boot ROM (iLO, ALOM, virtual provider, or
the like). Once you get the image booted, it is quite possible to complete the whole of the configuration with automation.
There is no file in a computer that is not made of
bits, and bits are easy to write.
You just have to have a policy for the contents of each file, and
the order to build and install each part.
If you need to gather site policy, it is going to be for an internal presentation layer (a web site, or the like). That just means converting all the documents to HTML or some other format. So encode your policy in files that are easy to convert to HTML, and easy to process mechanically.
I use the hxmd format for the automated policy, and HTML for the people-policy.
Each configuration for your hosts, routers, switches, disk arrays, and other IT instances decays just as the rice does (more on the rice below). Trying to do a lot of up-front configuration when you install the instance just means it gets stale. Stale, because it is continuous updates that keep instances fresh.
Computational, application, and capacity demands change, software evolves;
these factors continuously move the goals your site structure has to meet.
Moving goals means changing configuration; it is really that simple. If it doesn't get stale it gets lost; lost when you lose the configuration of your instances.
I have confidence, because I know how to make the correct file
all the time. And if it doesn't work I know
where to fix it, and how to test it. That lets my whole team work faster and
with much more agility than anyone using their fingers alone.
That confidence doesn't mean we are careless. We build back-out copies of the files we change; in fact the install tool puts them in a directory named OLD for us.
There are a few mistakes or failures that cause an instance to become inaccessible from the network (the list of files is below).
That is about 20 files that could lock you out of a running instance.
There are a few more that can keep an instance from booting, depending on
the operating system.
In all these cases, having out-of-band console access will save you; the details are below.
Think about the number of files that you might update without
locking yourself out of the instance. Pretty much all the other
20,000 files installed on my workstation, which is more than 99%. Don't
let the 20 files stop you from automating the 20,000.
I would argue that updating the 20 files with fingers is actually worse than
updating all the rest, since the time-to-recover is higher for mistakes in
the 20.
Since you've automated the risky ones, you would certainly automate the
less-risky ones. I don't see any valid argument to not automate as much
of the configuration of an instance as possible.
All the files used to update an instance always come from a revision control structure, with the recipes from the same source.
What else might impact the results of a build process?
I want to consider 3 contextual factors that create differences when building
and updating the configuration of an instance.
So let me describe those 3 factors, then tackle how to install updates.
The first is which update-target is selected.
We may need to update a configuration file under /etc or a binary file under /usr/local/bin, but we rarely need to rebuild every possible update in a single change.
The next factor is the meta-data about the desired state of
the target instance used to configure the directory. Since instances
run different mixes of applications, data services, command and control services, and other IT facilities, we need an authoritative data source that tells the configuration structure what to configure on every managed instance.
That meta-data is a layer 5 policy which is machine readable.
Changes to key meta-data elements may require many configuration directory updates; it is hard to predict which directory uses which meta-data, and the purpose of that mapping. This is why changes to site-policy take more care and skill.
The last factor is the build environment: the version of
any compiler, library, or other tool that impacts the exact rendering of
the source files into a product. Include support for any cross-compiler and
machine architecture flags in this group, but the need for such
options is meta-data. Any update to the build environment might
imply both a major update to the client instances, and a rebuild of
all the binary files currently installed.
The usual case is that most modern instances are provisioned with
a compiler, most build tools, and any run-time libraries likely to
be needed when they are installed. But it is possible to run all
the preparation work on a specially provisioned `build farm'.
In that case, all the client instances simply install
packages of files (via RPM, PKG, or the like).
The difference between push and pull is tactical at that scale.
The larger issue is package versus not. When you package an update
you must assure that any automated installation accounts for
all possible transitions from the existing state to
the desired state.
A failed installation, via a package installation script, has little choice other than a non-zero exit for any failure. Recovery is much harder to automate if you lack invariants with regard to the availability of required prerequisites.
By packaging only those directories without instance-specific elements (only platform-specific configurations), you may get a higher success rate for package installations.
Stand-alone package installations simply fail when any prerequisites are
not up to their needs, which is all they can do.
An active push or pull of a product may be able to discover missing
prerequisites to trigger the automated update of out-of-date ones.
Similarly, a structure around packages (like apt, yum, or pca) manages the prerequisites and failures to provide better service.
There are 4 combinations of policy and environment that we should
be able to create. First we'll look at the push
model.
When mmsrc is finished it removes any temporary files, so there is no cleanup required. Save a copy of the build directory, when required, with cp or tar.
msrc leaves the remote copy on the host. This helps debug any failed
builds. A cleanup task on each client could remove the (usually small)
shadow directory after some delay. I usually just remove the whole
shadow hierarchy as a clean-up task after a major update has been in
production for a month.
mpull leaves the configured copy of the source in the directory specified in the make macro INTO.
It is possible to project a copy of the master source for any product (via rsync, rdist, or NFS) to the client, then use an msrcmux service on that client to request the configured directory back on a build host.
Since many master source directories may have file caches or other files created at build-time, read-only mounts over NFS might not work. To fix this, use a modern union mount to allow a transparent overlay of a local filesystem over the read-only NFS mount. This allows the client to build on top of the read-only directory. That mitigates some of the pain of this role reversal.
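For example, on a Linux client one might overlay a writable tmpfs on the read-only NFS copy; the paths are illustrative, and BSDs would use mount_unionfs instead:

    # Writable overlay on a read-only NFS copy of the master source (Linux).
    mkdir -p /usr/msrc /usr/msrc-rw /var/msrc
    mount -t nfs -o ro master:/usr/msrc /usr/msrc
    mount -t tmpfs tmpfs /var/msrc
    mkdir -p /var/msrc/upper /var/msrc/work
    mount -t overlay overlay \
        -o lowerdir=/usr/msrc,upperdir=/var/msrc/upper,workdir=/var/msrc/work \
        /usr/msrc-rw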
If I did need this, then I would install msrcmux on each client (the details are below).
The Genesis tactic uses a complete copy of the master source to build
every tool in the correct order. This is often used to bring a fail-over
copy of the master source host up at a new datacenter. Or to create
an archive disk that is known to be "pure". See the
Genesis HTML document.
The RPM build process turns a single product or package into an
RPM file. These files are used in a pull structure to update many
clients, or speed the local build process. My local policy requires that
I name the RPM recipe file ITO.spec.
Personal builds, in a mortal login's home directory, contain a
(mostly complete) instance of the local tools. This copy references
a configuration subspace which is wholly contained under the home
directory. I use this tactic to test new versions, new products, and to
show other admins how nifty the structure is. See the build
plan in ksb's HTML document.
Basically we pull each level 2 product via msrcmux, driven in turn from a make recipe, which forces the correct order and configuration parameters to install all local tools.
While RPMs may be removed from the target system, and a mortal
install could be removed, there is no inverse operation for
the Genesis build. Genesis is intended to permanently convert an instance into
a master source repository. But the list of products to install could
be changed by site policy to create other layer 4 signatures.
For each single file (layers 1 and 5) I add any required recipe to the comments within the file.
I even use comments to markup parts of a file I need to extract later.
We'll talk about mk and explode later in this document.
At layers 2 and 3, make recipe files are the obvious choice because they work on every platform I manage, and I don't have to use any of the `advanced' features of any specific version. I stick with (almost) plain old V7 recipes.
At layer 4, the master source structure is built from mmsrc(8), msrc(8), hxmd(8), wrapw(1), xapply(1), xclate(1), ptbw(1), explode(1), and mkcmd(1), plus the tools from install_base. Then some close-the-loop processes check the signature of each instance against either a known-good signature or the last known signature to look for regressions, failures, or human mistakes.
The pull logic is driven by msrcmux(7), mpull(8), muxcat(1), and rsync(1).
The push logic is driven by msrc(8), ssh(1), scp(1), and rdist(1).
Always know what stable is, and what is stable
Traditionally most people think of either pushing data to
a target host, or pulling data from
a central service to the target host. But that is not
the key issue when updating the configurations under your control.
At layer 1 revision control by label (versus numbers)
Any good file revision structure allows for
revisions to be known by a symbolic name (all
the way back before RCS). As you make commits to
the file you may elect to assign a symbolic name to
the revision to mark for other processes to recover. If your revision control
doesn't support some clear way to move a symbolic name, then get a better one.
Local site policy controls how names are selected, reused, and retired.
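As a sketch of such a policy, re-pointing a "Stable" label once a release is vetted; the label, the revision numbers, and the file are examples only:

    git tag -f Stable 1a2b3c4        # git: force the tag to the vetted commit
    rcs -NStable:1.12 ./entry,v      # RCS: -N reassigns an existing symbolic name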
A run of rcsdiff(1) checks for layer 1 issues. Files with no symbolic label are usually excluded from the build. If they are not in this layer's context, then a check that displays them and stops the process is part of the update.
A recurring tickle(8) task e-mails engineers that have idle locks older than a few weeks.
If that doesn't prompt them I take more direct action, by shaming
them before their peers.
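A hedged sketch of such a nag, using stock RCS and a mail alias that is only an example (the real tickle(8) does more):

    # List every RCS file that still holds a lock, and mail the report.
    rlog -L -R /usr/msrc/*/RCS/*,v 2>/dev/null > /tmp/idle-locks
    test -s /tmp/idle-locks &&
        mail -s "idle RCS locks" src-owners < /tmp/idle-locks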
At layer 2 gather files by labeled revisions
From the stable source files committed under a known
revision we build a version of a product. The
version identifier allows us to select the right build of the
product later. A product is always specified by the name and
the version identifier.
Files are gathered by their symbolic label with rcsvg(1). Any production build stages a copy of the source under a temporary directory. That directory is where the build process runs, not any other working copy.
Under git we'd use a known 160-bit SHA1 hash (up to 40 hex digits), but in either case we'd extract the stable source as directly as possible.
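A minimal sketch of that staging step, assuming a git repository as the current directory and a tag named for the version:

    # Stage the labeled source in a scratch directory and build only there.
    STAGE=$(mktemp -d)
    git archive --format=tar msrc_base-2.31 | (cd "$STAGE" && tar xf -)
    (cd "$STAGE" && make all)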
In addition, the control recipe used by msrc is checked with msync(8) (see the msync(8) manual page).
At layer 3 gather products by labeled versions
The build process for a package collects all the product directories
by their symbolic label into a hierarchy that mocks their position in
the original source tree. That extraction allows related build
directories to reference each-other's stable source.
The whole stage directory has a recipe file that builds every product in
the correct order. It also can be archived as a source package to
be distributed to other sites.
(This msrc_base package was built that way.) The same close-the-loop checks that msync uses are used at this level. A hook in the recipe file allows recursion into the product directories to check them as well.
At layer 4 gather packages, products and files by labels
The signature of an instance is a list of elements installed
to provision it: the packages and their release numbers,
the products (not contained in a package) and their version numbers,
and specific files and their revision numbers. Some may be pulled
by a symbolic name (like "Current" or "Stable"), others may be pulled
by a known good number (which is a little gross).
That is all local site policy: but a policy you must have.
Just pulling packages by some random factor (aka by the current date) is
not going to build a repeatable instance.
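One way to close the loop on that signature, sketched for an RPM-based instance; the manifest path is a hypothetical layer 4 policy file:

    # Compare what is installed with what the layer 4 policy says should be.
    rpm -qa | sort > /tmp/signature.now
    diff -u "/usr/msrc/signature/$(hostname)" /tmp/signature.now ||
        echo "signature drift on $(hostname)" >&2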
At layer 5 we wrap around to layer 1
Since site policy is just files, we use the same management for site
policy as we did at layer 1. If we require a whole directory to
represent a policy, it is kept as layer 2. All site policy could
be gathered into a package, but I've never needed to do that.
The hxmd format covers almost all of my automated policy, and HTML covers the people-policy.
How this model helps make management possible
If you buy 2 years' worth of rice and hide it from yourself, you will
still be hungry. Rice keeps about 2 years, so when you find it a
decade later it is useless. More than that, you must dispose of it.
All the ways to lose a configuration really mean that you never put the file on backup media, never put it in any CM structure, or didn't use the ones you had.
If you have existing structures to configure instances, why didn't you use them?
Mostly I hear 2 reasons: it is `harder than my fingers', and it doesn't
`scale out' to our needs.
I can find evidence of this at most sites; things in /etc like group.2006-12-12 or resolv.conf.old. Files like these mean that admins (with a superuser shell) have no confidence that they can fall back to a previous revision of that configuration file. So they leave trash in the filesystem, rather than chance a much harder recovery.
Our install tool puts back-out copies in a directory named OLD for us. But we do not keep those files forever. A recurring purge task removes all the backup clutter from the filesystem. This keeps junk from accumulating, but allows a quick recovery from a fat-finger error, or even a bad commit.
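The purge itself can be as small as a cron-driven find; the age and the paths searched are local policy, not part of any tool:

    # Remove back-out copies older than 90 days from any OLD directory.
    find /etc /usr/local -path '*/OLD/*' -type f -mtime +90 -delete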
Files that can cut network access:
/etc/resolv.conf
/etc/services
Files that can cut ssh access to the instance:
/etc/passwd missing the privilege separation login for sshd
/var/empty
/etc/ssh/sshd_config
/etc/ssh/ssh_host* keys
/etc/pam.d/* or /etc/pam.conf
SHELL (e.g. nonexistent)
ssh keys may be corrupt
In all these cases, console access via conserver with serial port access, out-of-band iLO access, or other remote console access will save you from anything short of a hardware failure.
How to automate the update of an instance
A given change may touch a configuration file under /etc or a binary file under /usr/local/bin, but we rarely need to rebuild every possible update in a single change.
We select a target to update under master source by directory and possibly by
the update command we apply to that directory.
Thus we must create a unique directory for every target application.
Most applications install the program, the manual page, and any default
configuration (if none exists), as asked. Once we have a way to install
every configuration directory, we can automate installing them all in
the proper order.
Getting updates to instances: push, pull, or package
For each combination of target, meta-data, and environment it is possible to
either pull updates to the target instance, push them to it, or build
a package and use those (via a push, a pull, or external media).
A structure around packages (like apt, yum, or pca) manages the prerequisites and failures to provide better service. But such a service alone doesn't update every file on a host, because it doesn't have a source of meta-information.
Four data-flow models for a build+update process
I'm going to use layer 2 for my examples, because it is easy to
get a handle on that one.
Let's say that we want to build the source for a local product into
a binary file, then install that application and a manual page.
This is a good example because it is clear that the CPU type of the
instance, the installed libraries, and the version of the compiler
may all contribute to the outcome. Just as clear, to me, is that
any text file built could be impacted by the build-host's state in
much the same way.
That binary file might be installed someplace under /usr/local or /opt, depending on the type of operating system. The manual page could be under /usr/local/man or some place else.
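By hand the whole example is only three commands; the product name webstat is made up, and in practice the recipe runs these for us:

    cc -O2 -o webstat webstat.c
    install -m 755 webstat /usr/local/bin/webstat
    install -m 644 webstat.1 /usr/local/man/man1/webstat.1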
Master's policy, master's build environment
The tool mmsrc builds a shadow directory under /tmp (aka $TMPDIR), then uses the local build environment to run the build recipe. This is exactly what we need to build a product on the master host with the local environment.
If a copy of the build directory is required, save it with cp or tar.
Master's policy, client instance's build environment
The tool msrc builds the shadow directory on the client host (in the directory specified in the make macro INTO), then ssh's to the client instance to run the build process.
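The data flow msrc automates looks roughly like this; this is a sketch of the flow only, not the real command line:

    # Push the configured source to the client, then build and install there.
    rsync -a ./ client1:/tmp/dist/webstat/
    ssh client1 'cd /tmp/dist/webstat && make && make install'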
Client instance's policy, client instance build environment
The tool mpull builds a local copy of the master directory using rsync to fetch it from the master server. Then it uses mmsrc to build the directory using the site policy visible on the client.
The result lands in the directory named by the make macro INTO, just as msrc would leave it.
Client instance's policy, master's build environment
This is the hardest one. A recipe to do this must project a copy of the master source to the client, configure the directory with mmsrc, copy the configured directory back to the master server, build with the local tools, then mock the install process to see what needs to be updated (or package the directory for later update). I have never once needed this.
The master source would be projected (via rsync, rdist, or NFS) to the client, then an msrcmux service on that client would let a build host request the configured directory. The build host would then trigger the build portion of the recipe and copy the directory back to the client for the installation or packaging.
To do that, install msrcmux on each client and mount a read-only NFS copy of a local cache of the master source on each client, with a union mount of a tmpfs over it. Then allow a (local) build server to request a configured copy. That host would then remote-install the resultant files back to the client, from a temporary directory. This is a lot of effort for a case I've never needed. But I'm sure it would work, since I just tried it.
A client-initiated pull version of each type
The list above represents a push from a repository to a target
instance. If we need an instance initiated update we would use
the processes below.
Master's policy, master's build environment
Download a package built on the master, install with the platform's package manager. If you can't do that, rsync the INTO directory from the master server into the same directory on the client instance and trigger the installation recipe.
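Again as a sketch of the data flow only (not the real tool): the client fetches the staged directory and runs the recipe locally.

    rsync -a master:/tmp/dist/webstat/ /tmp/dist/webstat/
    (cd /tmp/dist/webstat && make install)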
Master's policy, client instance's build environment
The tool msrcmux allows a client to pull the configured directory to the client instance via a tcpmux service. The muxcat application is the usual client. See the msrcmux(7) and muxcat(1) manual pages.
Client instance's policy, client instance build environment
Use mpull to fetch the master directory and build it with local meta-information. See the mpull(8) manual page.
Client instance's policy, master's build environment
I have never needed this combination; see the discussion above.
Package build processes
There are three that we need to talk about: Genesis push builds, binary package builds (like RPMs or DEBs), and personal builds (aka builds in a mortal's home directory).
Naming the RPM recipe file ITO.spec assures that automation can find the correct specification file. Within that file some mk marked lines carry additional meta-data about the directory which contains the file. See level2s(8).
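With a fixed name, automation can locate and build the specification without any fingers; rpmbuild is the stock tool here:

    test -f ITO.spec || { echo "no ITO.spec in $(pwd)" >&2; exit 1; }
    rpmbuild -bb ITO.spec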
Each level 2 product is pulled via msrcmux, driven in turn from a make recipe, which forces the correct order and configuration parameters to install all local tools.
Cleanup
Genesis leaves a complete shadow source tree, an RPM build leaves the RPM packages and the installed RPM data, and the personal build leaves a complete shadow source tree, most of which will be removed from the target instance.
Why the msrc_base package mostly builds itself
I like to think that I'm clever. I put just enough code in the first package to mostly build itself. The source for the msrc_base package contains these tools. mmsrc comes as plain C source, which is built with make and a C compiler. It may be reconfigured with autoreconf.
We then use that to build all the tools required to reconstruct it: explode and mkcmd. With those we can build the wrappers: ptbw, xclate, xapply, and wrapw. With those we can build and run the push version of the master source structure: hxmd and msrc. And with those installed we can build a new version of mmsrc to complete the loop.
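A sketch of that bootstrap order as a loop; the directory names are assumed, and the real recipe carries the configuration parameters as well:

    for d in mmsrc explode mkcmd ptbw xclate xapply wrapw hxmd msrc; do
        (cd "$d" && make && make install) || exit 1
    done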
The mkcmd tool allows us to reuse the definition of options between programs that share the same specification, while the explode tool allows us to select just parts of a larger layer 1 file to incorporate into another product. Together these two allow my tools to share and mix options, common source code, and recipes so that they are never out of step with each-other. See the primary corollary.
When you look at the -V output of mmsrc and hxmd you'll see they share the same hostdb.m module. This is the power of mkcmd: that file contains the options and the C code required to support both programs. At the same time mmsrc shares make.m with msrc. This is more than a "trick"; it allows a level of code reuse others only dream they had.
With just the msrc_base tools you could install the pull version of the source structure: msrcmux, mpull, and muxcat. Or you could continue with install_base to add better management of target environments.
But the core of the structure is stable with just the msrc_base package. It is up to you to make something of it, or not. I always add install_base, oue, level2s, rcsvg, and msync to any machine I build.
Summary
In the worst case installing msrc_base wastes a few hours of your time and 5MiB of disk space. In the best case you find a path to avoid service failures, lots of typing, and wasting your time on processes you should have automated long ago.
In the worst case not installing msrc_base, then de-provisioning a critical service, could leave you `without income'. In either case, good luck!
I hope you learned something, even if you never install the software.
-- KS Braunsdorf, September 2012
$Id: msrc.html,v 1.17 2012/09/27 17:23:36 ksb Exp $