To understand this document

This document assumes you know the standard UNIX™ shell, sh(1), have an understanding of the UNIX process model and exit codes, have coded several scripts, and have used gzip(1), at(1), or built a cron table.

It also assumes that you can read the manual page for any other example command.

What is kicker?

kicker is a clever way to allow multiple applications to share a single instance without stepping on each other's resources. This is accomplished by sequencing heavy unrelated tasks by virtue of the batch scheduler and system nice values.

One of the most common points of contention I've seen, which is not likely to surface until an application base grows over time, is botched updates to application logins' cron tables. A new task added to an existing table accidentally deletes some (or all) of the previous entries. This is a rake you do not want to step on.

Kicker periodically injects new tasks into the various batch queues. A cron table usually specifies 25 possible enqueue times (00 hours through 23 hours, plus eod, which is usually 23:55). The system administrator may add additional triggers, which is clearly local site policy (like night and day shift changes).

The dependencies on cron are limited to these injection intervals. Jobs may also be injected into the batch queues by normal means, or by an escalated process. This allows one task to daisy-chain subsequent tasks as it completes their prerequisites.
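For example, the tail of a nightly data load might release the report that depends on it; a minimal sketch (the script names and the R queue are hypothetical, and the batch -q spelling matches the op rule below):

#!/usr/bin/env ksh
# nightly-load: daisy-chain the report once its prerequisite is ready
/usr/local/libexec/load-data || exit 1
echo '/usr/local/libexec/build-report' | batch -q R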

Each task may assume the run-time credentials of the owner of the file (see -S). This allows mortal application accounts to clean up their own log files, archive data, and the like. (Mayhap application restarts and upgrades as well.) The manual page has a copy of the recommended op rule, which is repeated here:

# Allow the kicker login to run tasks as any mortal login
batch	/usr/bin/batch -q $1 ;
	%f.path=^/var/kicker/.*$
	!f.uid=^0$		# comment out to allow root tasks
	users=^kicker$
	$1=^[a-zA-Z]$
	$#=1
	stdin=<%f
	uid=%f gid=%f initgroups=%f
	$KICKER_Q_TIME
	$HOME $USER $LOGNAME $SHELL $PATH
This rule should be configured for local site policy. For example, the users specification may be replaced with a groups or netgroups limit. No password should be required, as kicker is often run via automation detached from any terminal (viz. via cron).
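As a sketch of such a site policy, the users line in the rule above might become a group match (assuming a hypothetical kickers group):

	groups=^kickers$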

Basic examples

Usually the superuser's cron table contains 25 boiler-plate lines, which the administrator adds to root's (kicker's) crontab once. This avoids breaking many application crontabs over the life of the host; nothing is really free:
# kicker support if run as root
0  00 * * *  root /usr/local/sbin/kicker top 00
0  01 * * *  root /usr/local/sbin/kicker top 01
...
0  22 * * *  root /usr/local/sbin/kicker top 22
0  23 * * *  root /usr/local/sbin/kicker top 23
55 23 * * *  root /usr/local/sbin/kicker eod
The top token allows any job that triggers every hour to be injected before the specific hourly tasks. The order of the tokens could be swapped to force those every-hour jobs to run after the specific hourly tasks. (In fact a new token (e.g. follow) could also be created.)
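For example, the 14:00 entry could carry a trailing (new) follow token, so every-hour jobs may be injected both before and after that hour's specific tasks; a sketch:

0  14 * * *  root /usr/local/sbin/kicker top 14 follow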

Additional tokens for the months of the year, or weeks of the year, could be created, but I've never needed any of those. A task might run every day, but simply exit when it is not the last Monday of the month. So there is little reason to add specific day or month triggers.

Configure batch queues

The batch structure has a simple design, like most good UNIX™ structures. And like most features it is not enabled by default. Start by building one queue, and grow as you need to.

The Solaris implementation uses /etc/cron.d/queuedefs to define each queue:

# $Id:...
# q.[njobj][nicen][nwaitw]
# compress 2 at a time, nice +8, with no delay between them.
Z.2j8n
# run reports 1 at a time, nice +2, with no delay between them.
R.1j2n
Others (like FreeBSD) have fewer configuration options, or practically no options. In that case having more queues than CPUs is not really a good idea. FreeBSD's implementation (aka Vixie cron) depends on atrun to launch batch jobs, so be sure to enable that as well. Note that the name of a Vixie queue specifies the nice priority (a=0, b=1, c=2, ...). There is no need to build queue directories, as the queue is part of the job name.
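Picking a nice value under Vixie at is then just picking a queue letter; a sketch with at's -q option (uppercase letters are treated as batch queues, which is why the transcript below shows queue E):

$ echo '/usr/local/libexec/build-report' | at -q E now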

Vixie's atrun takes -l to specify the load average above which new batch jobs will not be started. Be sure to make that higher than the number of CPUs the instance provides (more or less).
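On a 4 CPU instance the stock /etc/crontab entry might become (a sketch; 4.5 is an illustrative threshold):

*/5	*	*	*	*	root	/usr/libexec/atrun -l 4.5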

Vixie's atrun also starts multiple batched tasks in a given run. This pretty much defeats the purpose of the batch queue, so it limits the usefulness of kicker quite a bit.

Drop a test script into the batch queue to put date output in a file under /tmp. See that it works before you get all complex here:

$ echo 'date >/tmp/batch.works' | batch
Job 985 will be executed using /bin/sh
$ atq
Date                            Owner           Queue   Job#
Wed Aug 22 08:30:00 CDT 2012    ksb             E       985
$ sleep 377
$ cat /tmp/batch.works
Wed Aug 22 08:35:00 CDT 2012
$ rm /tmp/batch.works
If you can't make that work for yourself, you are not going to make kicker work for anyone. Note that the default interval for atrun execution under Vixie cron is 5 minutes. I would change this to every minute if I depended on a lot of batch jobs.
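Under FreeBSD that is one field in the same /etc/crontab entry (a sketch):

*	*	*	*	*	root	/usr/libexec/atrun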

How many queues do you need

I usually build at least 2 queues (as above). One for CPU intensive compression and archive tasks, and one for reports that need to be built and forwarded. For example the netlint summary is a report, and compressing apache logs is CPU intensive.

A report job might trigger a compression task. In that case the last command in the script might be a call to kicker, kicker -S batch, or op kicker queue with the name of the task to release (or the whole queue to process).
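For example, the tail of the report script might release everything waiting in the Z queue built above (a sketch; see the manual page for the exact op kicker spelling):

# the report is built and sent; now process the compression queue
exec op kicker queue Z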

There is almost no benefit to having too many queue options. That just muddies the waters when a Customer needs to pick which queue to use: a clean local site policy to guide recurring tasks into the correct queue helps both the Customer and the Admin.

The boot queue

If you like you can put a command in the boot sequence to run a kicker task at system boot time. Vixie's version of cron has a time specification for exactly this purpose: @reboot. (Which I suggested.) This is a great way to start mortal applications:
# allow mortal services to start via daemon and kicker
@reboot       root /usr/local/sbin/kicker boot
Which doesn't work so well if there is a 5 minute gap due to atrun granularity. If so, then you could specifically force an atrun call as part of the command (&& /usr/libexec/atrun).
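That makes the boot entry read (a sketch):

@reboot       root /usr/local/sbin/kicker boot && /usr/libexec/atrun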

Day math

Some tasks only need to run once a week. For those you could use cron directly to inject the job into the batch queue (with a call to kicker or batch). Or you could leave it in the kicker spool and check inside the script for the day of the week:
#!/usr/bin/env ksh
# $Id: ...
# Only needed on Sat morning, which is day 6, see strftime's %w.
[ `date +%w` -ne 6 ] && exit 0

# the rest of the script...
Similarly one might check the Julian day %j, the month, or the week of the year (%U). Blue moons and Easter are slightly more complex.
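For instance, a task that should only run in even-numbered weeks might start with (a sketch; the leading zero strip mirrors the TODAY computation below):

#!/usr/bin/env ksh
# only run during even-numbered weeks of the year, see strftime's %U
WEEK=$(date +%U |sed -e 's/^0//')
[ $((WEEK % 2)) -eq 0 ] || exit 0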

This example is the most complicated request I've ever had. We want to run this task on the last Monday of each month, but we need to avoid the last day of the month. In other words, if the last day of the month is a Monday, don't count it. This even works in leap years, since cal is cool with those.

#!/usr/bin/env ksh
# $Id: ...
# Find this month's work-days (not Su or Sa, should we remove Mo holidays?)
# Assumes that weeks start on Su.  The -h option below
# is not standard on all versions of cal, remove it if you don't need it.
WORKD=$(cal -h |sed -e 's/^../XX/' -e 's/^...\(..............\).../\1/'\
	-e 's/XX//' -e '/[^ 0-9]/d' -e '/^ *$/d' |tr -s ' \n' '  ' |\
	sed -e 's/ $//')
# Find the last Mo that is not the last day of the month
LASTM=$(cal -h |sed -n -e 's/^.. \(..\).*/\1/p'|sed -e '/^ *$/d' | \
	grep -v ${WORKD##* }|tail -1)
TODAY=$(date +%d |sed -e s/^0//)

[ ${LASTM:-28} -eq $TODAY ] || exit 0
# the rest of the script...

Looking at the process table is less useful

Some jobs have used pgrep to look for services that need some cleanup action taken. That has proved problematic because of race conditions in pgrep and aborted processes leaving junk around that still needed to be cleaned up. It is far better to touch a flag file (or empty log file) to indicate the need for the cleanup, then remove or zero the file after the job has started (or finished).
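A minimal sketch of the flag file convention (the path and the cleanup body are hypothetical):

#!/usr/bin/env ksh
# exit quietly unless the service flagged a need for cleanup
NEED=/var/run/myapp/need-cleanup
[ -f $NEED ] || exit 0
rm -f $NEED		# clear the flag now that the job has started
# the rest of the cleanup...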

See flock(1)'s manual page (or HTML document) for more about locking files to prevent duplicate compression runs or cleanup tasks.
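A compression job might serialize itself like this (a sketch, assuming the util-linux style flock(1) options; the log path is hypothetical):

# only one compression run at a time; exit quietly if another holds the lock
flock -n /var/run/compress.lock -c 'gzip /var/log/myapp/*.0' || exit 0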

See also

The haveip IP address owner check program. When a task needs to follow a VIP, run the task on every member of the cluster, and exit at the top of the script unless the local host presently has the VIP configured up.
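The top of such a script might read (a sketch; the VIP address is hypothetical, and haveip's exact options are in its manual page):

#!/usr/bin/env ksh
# only the current owner of the service VIP does the work
haveip 192.0.2.25 || exit 0
# the rest of the script...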
$Id: kicker.html,v 1.2 2012/10/31 15:29:55 ksb Exp $