rsync, and some scripting skills. It helps a lot if you know what a
configuration management signature for a host looks like.
In ksb's model of configuration management the most important idea is that every level passes relevant feedback to the levels adjacent to it. This is how bugs get fixed, how site policy gets applied, and how better code gets deployed. In truth, the whole configuration management structure doesn't function without mechanisms to both send and receive feedback between structural elements.
Hostlint creates a report that reveals the difference between an ideal host and the host it is frisking.
What hostlint does is rsync a directory loaded with shell and perl scripts from a locally trusted service run by Operations to a temporary directory on the target host. Then it cd's into the temporary directory to run a check script called site.
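The fetch-and-run step described above might be sketched roughly as follows. This is not the real hostlint: the function name hostlint_sketch, the RSYNC override variable, and the example source in the comment are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the fetch-and-run step; not the real hostlint.
# RSYNC is a hypothetical override hook; it defaults to rsync.
hostlint_sketch() {
	src=$1		# hypothetical source, e.g. "policyhost::hostlint-policy"
	tmp=$(mktemp -d "${TMPDIR:-/tmp}/hostlint.XXXXXX") || return 1
	# Copy the check scripts into the temporary directory.
	if ${RSYNC:-rsync} -a "$src/" "$tmp/"; then
		# Run site from inside that directory, in a subshell so the
		# caller's working directory is left alone.
		(cd "$tmp" && ./site)
		status=$?
	else
		status=1
	fi
	# Always clean up the temporary directory.
	rm -rf "$tmp"
	return $status
}
```

Running the checks from a throw-away directory means every run starts from the current policy and leaves nothing behind on the target host.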
Site reports issues with the instance's configuration signature in an easy-to-parse format.
The site application's working parts are just as simple. It looks in the current directory for scripts that end in .hlc (for "host lint check"), and runs each one, capturing the output from each. If the output from every check script is empty, then site outputs an all-clear message, else it outputs the list of differences reported. Thus the output is never empty for any host.
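A minimal sketch of that loop, assuming each check is an executable script that prints nothing when the host matches policy. The function name site_sketch and the wording of the all-clear line are assumptions; the real site script may differ.

```shell
#!/bin/sh
# Sketch of the site loop: run every *.hlc in a directory and report
# either an all-clear line or the collected differences.
site_sketch() {
	dir=${1:-.}
	report=
	for check in "$dir"/*.hlc; do
		[ -f "$check" ] || continue	# the glob matched nothing
		# Capture whatever this check prints; empty means no difference.
		out=$("$check" 2>&1)
		if [ -n "$out" ]; then
			report="${report}${out}
"
		fi
	done
	if [ -z "$report" ]; then
		echo "all clear"	# hypothetical wording of the all-clear message
	else
		printf '%s' "$report"
	fi
}
```

Because something is always printed, a missing report can be told apart from a clean host.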
The site script comes from a repository I call hostlint-policy. It is actually globally visible inside the production network, as any site policy should be. Knowing what is expected is how you meet expectations, right?
crontab runs hostlint at least once a week on every host. E-mailed output from that task is processed on a central reporting host to collate and prioritize the messages. The Admins review the feedback report every Monday to prevent minor errors from becoming bigger issues. (The jobs are staggered across a 4-hour window, so the reports do not all come in at the same time, and we even out the rsync service's load.)
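One way to stagger jobs across a 4-hour window is to derive a stable offset from each host's name. The sketch below is an assumption about how that could be done, not a description of the scheduling actually used; stagger_minutes is a hypothetical helper.

```shell
#!/bin/sh
# Map a host name to a stable minute offset inside a 240-minute window.
stagger_minutes() {
	# cksum prints "<crc> <byte count>"; keep only the CRC.
	crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
	echo $(( $crc % 240 ))
}
```

A weekly cron job could sleep for that many minutes before starting hostlint, so the same host always reports at the same point in the window.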
When a new instance is created, after the process finishes the final reboot, it runs hostlint to report any out-of-date items the installation process has installed. This is actually quite often the case in some minor way (the version of a manual page or script). This offers the admin a chance to update the build process, as well as fix the instance just created.
Part of the triage list for a production issue is to run hostlint. Some application could have been mistakenly removed, back-revisioned or upgraded. This is a quick check that can be compared to the last e-mail report to see what may have changed.
-V option to local tools
-V switch. This makes checking the versions of most local tools as easy as running them with that switch and parsing the first line of output. In fact that is exactly what versions.hlc does.
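The idea of running a tool with -V and parsing the first line can be sketched as below. This is not the real versions.hlc; the function name check_version, the expected-line argument, and the message format are assumptions.

```shell
#!/bin/sh
# Sketch of one version check: run a tool with -V and compare the first
# line of its output with the line site policy expects.
check_version() {
	tool=$1
	want=$2		# hypothetical expected first line of "tool -V"
	if ! command -v "$tool" >/dev/null 2>&1; then
		echo "$tool: missing"
		return
	fi
	# Only the first line of the -V output is compared.
	have=$("$tool" -V 2>&1 | sed -n 1p)
	[ "$have" = "$want" ] || echo "$tool: have \"$have\", want \"$want\""
}
```

A matching tool prints nothing, which fits the convention that an empty check output means the host conforms.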
ident on local configurations and manual pages
ident to pull the RCS identification string out of the comments in each page. If your site uses some other revision control, you'd have to use the appropriate tool to extract the correct token's value.
rpm
-i
uniq's manual page. Changes at that level really don't need human attention; you'll pick up the new one when you build a replacement instance.
/etc/resolv.conf. If this file is misconfigured your life gets harder really quickly. Checking the options and search lines for sanity has saved me many hours of debugging.
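A check of the search and options lines might look something like this sketch. The function name check_resolv_line, the expected values passed in, and the message format are assumptions; taking only the last matching line is a simplification.

```shell
#!/bin/sh
# Sketch of a resolv.conf sanity check; prints nothing when the line matches.
check_resolv_line() {
	file=$1		# normally /etc/resolv.conf
	key=$2		# "search" or "options"
	want=$3		# hypothetical value required by site policy
	# Strip the keyword and keep the last matching line (a simplification).
	have=$(sed -n "s/^$key[[:space:]][[:space:]]*//p" "$file" | tail -n 1)
	[ "$have" = "$want" ] || echo "$file: $key is \"$have\", expected \"$want\""
}
```

It would be called once per keyword, for example with "search" and the domain list your policy requires.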
/var/log/security). In those cases you might use op to escalate a check command with an in-line script. See op's HTML document for details.
In effect hostlint checks level 4 (running hosts) against level 5 (site policy) to assure that every host conforms. If you have site policy statements that do not impact the contents of any file, version of a product, release of a package, or existence of a login, group, netgroup or network route -- then you can't check them with hostlint.
hostlint helps the Admin
hostlint has saved me more effort than any other script I've installed.
Since petef and I wrote hostlint to check the versions of every local tool, it has found regressions and missing tools for me, which has saved me a lot of debugging.
Regressions happen when you restore a filesystem from backup media (with tapes, sbp, or even a copy from a host you thought was identical), or when you replace a motherboard, network card, or other component with firmware or a tracked serial number.
Build processes get out-of-date when upstream packages change, or when peer groups update their site policies without telling you.
A list of local tools, packages, and expected configuration files makes an excellent outline for teaching new staff what they need to know. Adding new checks to the hostlint repository is a great warm-up to getting superuser access.
I've put injunctions in the accounting checks to forbid accounts for people who have left the organization. This makes auditors very happy.
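Such an injunction could be sketched as a check that reports any forbidden login still present in a passwd-format file. The function name check_forbidden_logins and the message format are assumptions, and a real check would likely also cover other account sources.

```shell
#!/bin/sh
# Sketch of an accounting check: report any forbidden login that still
# appears in a passwd-format file.
check_forbidden_logins() {
	passwd=$1	# normally /etc/passwd
	shift
	for login in "$@"; do
		# The login name is the first colon-separated field; match it
		# exactly (-x) and literally (-F).
		if cut -d : -f 1 "$passwd" | grep -qxF -- "$login"; then
			echo "passwd: forbidden login $login is present"
		fi
	done
}
```

The list of forbidden logins would come from site policy, so removing a departed user's account becomes something every host is checked against.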
hostlint is the tool for you. Most of the checks do not require superuser access; those that do might be given an op rule.
If you've not read it, then you should read about netlint in the HTML document.
$Id: hostlint.html,v 1.8 2012/07/11 17:20:44 ksb Exp $