To understand this document

I hope you've read the msrc_base HTML document before you read this one. This document describes how I use the tools in this package (install(1l), install.cf(5l), purge(8l), instck(8l), installus(1l), vinst(1l), mk(1l), mk(5l), and op(1l)) to carry out the day-to-day operations of a host I've built with the tactics described in the msrc_base document (above).

Having used the standard install program to update files, having used su and/or sudo a bit, and having made a few mistakes with a superuser shell would all help put this document in context for you. If you've used a package manager (yum, apt, pca, or any other) to update many instances, then you get bonus points.

Having built some custom products from source would be four of a kind, while management of locally developed code would be a royal flush.

Moving your IT structure forward

Zipcode™ moves the mail, but that doesn't move better bits to your managed instances (hosts, switches, routers, storage devices, etc.). What keeps your IT services fresh for your Customers is timely updates and the certainty that tomorrow's service will be at least as good as today's, and probably better.

If you are not improving your service, then you are falling behind. That's just the way Moore's law drives Customer expectations in our world. If it is not more speed they expect, then it is higher quality (viz. fewer defects in code, more features in interfaces, fewer manual steps in processes, or more ubiquitous access). If it is not that, then it is capacity demands driving you to manage more instances, more services per instance, and more space for each application.

You need a strategy that leverages Moore's law, so that as the size of your problems grows, the power of your solution grows proportionally. Rather than falling behind (like your keyboarding skills have), automation gets faster as machines get faster. That is the point of this set of tools: use automation to manage automation.

That doesn't mean we can remove the people. It means we can build an information loop into each process, so the people in the process have the right information to plan for failures, to diagnose failures, to take corrective actions, and to know when they are winning. Any process that doesn't give you appropriate feed-back is not going to make you happy in the long run.

Feed-back derived from the configuration management layers

I'm going to explain how this package augments the ideas from msrc_base. We don't really replace any of the CM ideas; we just add details and fill in the gaps in the base package with the solutions I've found make the most sense, at the least cost. Most of these tools address layer 4, because nothing in msrc_base really sets local site policy. "Set it and forget it" is a tag-line for a desktop cooker, but the last time you forgot your dinner, you went hungry.

Install files into production

The most basic operation we need is the ability to update a file in a safe way. We want an atomic update (or as close as we can get), we want to be able to back out of the update, and we may need to change the contents and the permissions (mode, group, owner, flags) at the same time. That's exactly what install(1l) does.

Even better, this version of install moves the originally installed file into an OLD subdirectory of the destination directory. This makes backing out of a bad installation a lot easier. It also means that any changes made to the target file with someone's fingers can be recovered and put into the revision control system. (Assuming there is a failure that points you to the mistake.)
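
For example, a typical update might look like the command below. (The options shown are the common BSD-style spellings; see install(1l) for the exact list this version accepts.)

$ install -c -o root -g wheel -m 444 resolv.conf /etc/resolv.conf

The previous version of /etc/resolv.conf lands in /etc/OLD/resolv.conf, ready to be restored if the new one turns out to be a mistake.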

Since install would eventually fill the filesystem with backup copies (under the OLD directories), there is a loop-closing application called purge(8l). It removes install's backup copies after a time limit, or on request by target file name. It is quite common to run purge after operational backups to remove all the backups that are now saved off-line.
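
For example, to drop the backup copy of a single file by naming its installed path (a sketch; see purge(8l) for the time-limit options):

$ purge /etc/resolv.conf

That removes the saved copy under /etc/OLD once you are sure you will never want to back out of the update.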

But even purge doesn't close all the loops. Thanks to hard links in the filesystem, a file may be accessible from a name under OLD and from at least one link still installed in the target directory. If purge were to remove the link in OLD, the record of the original structure would be lost. So purge won't remove the backup link in this case -- instead it refuses and asks the operator to run instck.

Instck(8l) examines a directory (a whole filesystem really) to find partially updated files, repair incorrect or unsafe modes, or to generate a rule-base for install. These are all close-the-loop operations to assure that install doesn't leave a mess on any production host.
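
For example, to sweep a whole tree for leftovers from a messy day (a sketch; instck(8l) describes the repair and rule-base options):

$ instck /usr/local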

As mentioned above, install has a configuration file (see install.cf). That file contains a list of the precious files that must always be installed with the correct permissions, like the files that can lock you out of an instance. There is no check in install for content. If you need a sanity check for content before a file update, put it in the make recipe which drives the installation process.
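
For example, a recipe might verify that a new sshd configuration parses before install replaces the production copy (a sketch; the target names and paths are illustrative):

install: sshd_config
	sshd -t -f sshd_config
	install -c -o root -g wheel -m 644 sshd_config /etc/ssh/sshd_config

The first command fails (and stops make) if the proposed file has a syntax error, so the broken copy never reaches production.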

User-supported applications

Some file updates might be done by mortal accounts (not system admins). I've managed several sites where unpaid volunteer workers kept FOSS products up-to-date in a common "user-supported directory". I've never had an issue with Trojan programs or malicious applications. I've never had an unauthorized update put into production.

That's because I grant the privilege to update each application to a specific owner. That mortal account uses installus(1l) (for "install user-supported") to update the application on their own schedule. Trouble tickets for the application are diverted to the owner, and the product (or package) is removed from all instances when the owner ends their support. (Usually someone else picks up the torch before the owner leaves the organization.)

The same close-the-loop operations work for installus, as it actually calls install to do each update. The only management required by the administrator is the upkeep of the owners(5l) file, which controls who may install what for each user-supported directory. Since this file is easy to generate with standard msrc tactics, that's not a lot of work.
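
Since installus hands the real work to install, the command line looks much the same (a sketch assuming the usual install options pass through; the paths are illustrative):

$ installus -c -m 755 wget /usr/local/bin/wget

Run from the owner's mortal account, that updates the product without any superuser shell.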

The free man-power to keep up with the rapidly changing versions of some FOSS products is well worth the cost of supporting the structure.

Making finger-file updates a little safer

Very rarely an administrator must make a temporary change to a revision-controlled file. This is almost always in response to a failure of some kind. In such cases it would be best to leave an audit trail of the changes made, and a backup of the original file to be restored after the event is over.

The obvious solution is to copy the file to /tmp, edit the file there, then install the temporary file back to the production location. Actually, almost nobody takes the time to type that in a crisis: they just vi the file in-place, suspend the editor and test the change. They might leave the suspended vi process around, thinking they can undo the change and restore the file from the editor.

That hack works sometimes, but loses horribly in other cases.

The vinst(1l) application does exactly what you should have done. That also means it does all the checks install does (because it copies the file, updates the copy with your $EDITOR, then installs the file back into the original location). Of course, that only works if you have update permission on the target directory.
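
For example, to tweak a revision-controlled file under the full install safety-net (the path is illustrative):

$ vinst /usr/local/etc/sendmail.cf

That pops a copy of the file into your $EDITOR, then installs the edited copy over the original, leaving the untouched version under OLD for the restore.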

If you don't have update permissions (because you use installus to update the target files), then you may use vinstus (aka "vinst -U") to replace install with installus. This allows user-supported applications the same finger-file liberty that admins enjoy. In the best of worlds vinst almost never gets run; in the real world I use it about twice a year, usually to add my ssh key to an application account to debug a file transfer, restoring the original file when I'm done.

Data-driven configuration management

The basic msrc tactic is to use a make recipe file to manage configuration processes. That works best at layers 2-4. At layers 1 and 5 we don't have a separate recipe file (of any sort), since we just have one file.

I'm going to pick a simple example, because it is easier to grasp quickly. A manual page marked-up in nroff (aka groff) macros is a pretty common file to find in a source directory. It is also harder to read than the formatted version. So most people have to read the manual page for groff to remember the spell to format the page:

$ groff -Tascii -man oue.man | less

By embedding the formatting command within the file's comments we can turn it into a recipe. If we markup the file this way:

.\" $Display: groff -Tascii -man %f | ${PAGER:-less}
then the mk(1l) application can locate the command in the comments and run it for us with:
$ mk oue.man
which is way less typing, and thinking, for any manual page I've written (even if the formatting is not done with the standard macros). It also works even if you rename the file to oue.nro, because the markup %f is replaced by the path to the file you provided on the mk command-line.

Mk is a do-what-I-mean application, as it selects a command from those listed in the comments of any file you give it. We are not limited to formatting commands: we may embed compilation instructions, installation instructions, regression tests, and pretty much any other recipe you'd like to keep with the file; we bind different actions to different markers (like Display above). For a lot more about mk (including why I chose an embedded $Display command over the others) see the HTML document.
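
As a sketch, the same trick embeds a build recipe in a C source file (the $Compile marker name and %f expansion here follow the Display example above; the HTML document lists the markers mk actually searches for):

/* $Compile: cc -Wall -o hello %f */

With that comment in hello.c, mk can locate and run the compile command for you, just as it ran the Display command for the manual page.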

The other way around

Mk may also be used to embed layer 5 information inside any other layer's file. Create a command that outputs the correct policy (as simple as an echo command), then let mk search the file for it. This turns any comment-capable file into a source of local site policy. The msync application uses this tactic to find the correct group-owner for each source directory.
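
A sketch of the idea (the $Group marker name here is my own illustration, not a fixed convention):

# $Group: echo source

Any tool that runs mk against the file now learns the intended group-owner is "source", with no separate policy file to keep in sync.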

Additional access policy

Beyond the access granted by installus we may allow some logins access to certain privileged operations. For example, a production support team may have access to stop and restart a service. They might update the service with installus, then need to "bounce" it to pick up the changes.

The op(1l) privilege escalation application grants a limited set of logins very fine-grained access to a set of commands. These commands run with the appropriate credentials for the task.
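
For example, a support login might bounce the web service with a single mnemonic (the rule name is illustrative; the rule-base behind it decides who may run it and with which credentials):

$ op bounce-www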

This is a higher precision tool than sudo or older versions of op. The access granted can be controlled in almost any respect, and if you want more control you can code a jacket to manage the process, or a helmet to authorize the access.

This implements the last part of the base tools: it provides the platform to define the "who can do what" policy at layer 5. Once we have the tool installed, each instance needs the correct configuration files to define the policy for the node(s). That loops back to msrc, where we build a configuration process to deploy the correct policy files to each instance.

Not the end

The tools described in this package are not the end. To continue the adventure you'll need to read about these add-ons:
op's lib - example op configuration structure
jacket - example op jackets and helmets
snoopy - audit op escalations more deeply

This is a huge turbo for getting op rule-bases built for a large population of instances. You might build some rules yourself before you read these, just so you grok how nifty life can be.

tcpmux - tcpmux service, when inetd is lacking
recvmux - tcpmux receiver for muxsend
muxsend - tcpmux push client
muxcat - tcpmux pull client

I use the RFC 1078 mux a lot more than anyone. These make using the mux a lot easier.

efmd - extract from meta data
msrcmux - the pull service for msrc
mpull - the pull client for msrc

Ways to export msrc data and control directories, if you need more than the default push method. Efmd is just a lot faster than hxmd when all you need is an attribute macro value.

msync - close the loop between layers 1-2-3
level2s - manipulate layer 2 msrc directories
rcsvg - extract files from layer 1 to layer 2-3
tickle - close the loop on infinite revision locks at layer 1

Developer tools for msrc. I almost always install these on each master source cache.

kicker - on-ramp from cron to batch
flock - lock a file to run a process
haveip - limit execution to VIP holders

These are all job scheduling or exclusion tools. Run the right thing at the right time on the right host without overloading the instance. With Moore's law driving CPU speeds it seems that one wouldn't need a batch queue anymore, but that's not really the case.

hostlint - close the loop between layers 4-5
netlint - close the loop between layers 1-5

When you get serious about layer 5, you'll want to use these to check your progress.

curly and uncurly - a filter to implement csh-style path compression
glob - a filter to process very large file matches
oue - replacement for uniq based on xapply's dicer
since - log file tail with resume features
tmbuf - snip off time-limited sections of a stream
sapply - screen aware add-on for xapply

For very large lists of instances and files the standard shell tools fall a little flat. These add some power you never knew you needed.

sbp - layer 4 on-host backups

Disk space is cheap, disks fail, and upgrades go pear-shaped: why not keep a "known good" image of your instance's data handy? I've used sbp for 25 years, and never been sad I did.

entomb_base - avoid most lost file restoration tickets
untmp - clean /tmp on logout
kruft - purge for /tmp and anonymous FTP spools
quot - a replacement for missing implementations

Manage disk space better: entomb Customer files, cleanup on logout.

$Id: install.html,v 1.7 2012/07/03 20:35:57 ksb Exp $