Xapply expands your shell powers by adding parallel processing to iterative loops. Items from files, the command line, and ptbw instances form shell commands that are managed in a wrapper stack. It also allows you to catch the else case, where no items were provided.
This document assumes you have used sh(1), have an understanding of the UNIX process model and exit codes, have coded several scripts, and have used gzip(1) and find(1). It also assumes that you can read the manual page for any other example command. Having some exposure to printf(3) or some other percent-markup function would help a little.
I use this "expander markup" in many of my programs.
What is xapply?

Xapply is a generic loop. It iterates over items you provide, running a customized shell command for each pass through the loop. One might code this loop as something like:

for Item in $ARGV
do
	body-part $Item
done

and feel pretty good about it, so why would you need xapply?
The number 1 reason to use xapply
is that it runs some
of the body-part
's in parallel. It starts as many
as you ask it to (using the -P
option),
then, as processes finish, it launches the next iteration of
body-part
, until they are all started.
It waits for the running ones to finish before it exits.
The benefit is that we might take advantage of more CPU resources (either as threads on CPU cores, or multiple CPU packages in a host).
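To see just the scheduling, here is a minimal sketch (sleep stands in for a real body-part, and the item doubles as the nap length):

xapply -P3 'echo start %1; sleep %1; echo end %1' 3 1 2 2 1

With -P3 the first 3 naps start at once, and as each finishes the next item is launched.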
Even better, it can manage the output from those parallel tasks so that each is not mixed with the others. Without the -m switch, xapply assumes you can figure out which iteration of body-part produced each line of output. Under -m, xapply groups the output from each iteration together, such that one finishes completely before the next one starts.
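A quick sketch to see the difference, using a task that writes output over time:

xapply -m -P2 'echo begin %1; sleep 1; echo end %1' A B C D

Without -m the begin/end lines from concurrent tasks may interleave; under -m each iteration's lines are grouped together.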
Like most loops, xapply can skip through the list more than 1 item at a time. The -count option allows you to visit the items in the argument list in pairs (or groups of count). This is handy for programs like diff(1) that need 2 targets.
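For example, to compare pairs of before/after files (hypothetical names):

xapply -2 'diff %1 %2' chap1.old chap1.new chap2.old chap2.new

Each pass consumes 2 arguments, so diff always sees the matching old and new versions.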
Unlike common loops, xapply
keeps track of
critical resources for each iteration.
A body can be bound to a token which it uses
for the life of its task. That resource token (for example a modem) won't
be issued to another iteration until the owner-task is complete, then
it will be allocated to a waiting task.
This allows xapply
to make very efficient use of
limited resources
(and it honors -P
as an upper limit as well).
Xapply
has other friends.
In fact, it is the core node that connects
xclate
(1), ptbw
(1) and
hxmd
(8) to each other.
We'll come back to the usefulness of that fact in a bit.
In summary, xapply
lets you take advantage of
all the CPU resources on a host while keeping the tasks and resources
straight.
To raise the overall torque even more it reaches out to share resources,
to collate output, and reuse configuration data.
These features are all coordinated across multiple instances of
xapply
and the related tools.
The gzip utility can be pretty expensive in terms of CPU. If we want to compress many output files (say *.txt), we could run something like:

gzip -9 *.txt
Most modern hosts have more CPUs than the single one that command is going to use.
We might break the list up with some shell magic (like split
(1)),
then start a copy of gzip
for each file.
That won't balance the CPUs, as 1 list will inevitably
have most of the small files.
This short list finishes long before the others, leaving an idle
CPU while the larger task still has files left to compress.
The shell code to split the list up is also pretty complex. Given a temporary file, it might look like this:
/bin/ls -1 *.txt >$TMP_FILE
LINES=`wc -l <$TMP_FILE`
split -l $((LINES/4+1)) $TMP_FILE $TMP_FILE,
for Start in $TMP_FILE,*
do
	xargs gzip -9 <$Start &
done
wait
rm $TMP_FILE $TMP_FILE,*
With xapply, we can keep 4 processes running in parallel with:

xapply -P4 "gzip -9" *.txt

The quoted gzip command is the template used
to build a shell command for each file that matches the glob. If no
files match the glob then the literal string "*.txt" is passed to
the shell, which passes it to gzip
, which
complains that it cannot stat
(2) that file.
With a few dozen files matching the glob, we would keep our machine busy for
a while! If there are fewer than 4
files we just start as many as we can. More than that will queue until
(the smallest or first) one finishes, then start another. This actually
sustains a load average on my test machine right at 4.0.
The xapply
process itself is blocked in the
wait
system call and therefore uses no CPU until
it is ready to start another task.
When the list of files might be too long for an argument list,
provide them on stdin
(or from a file)
with xapply's -f:

find . -name \*.txt -print | xapply -f -P4 "gzip -9" -

This is also good because it won't try to compress a file named "*.txt" in the case where the glob doesn't match anything. The other great thing about that is that the first gzip task starts as soon as find sends the first filename through the pipe!
When find
has queued enough files to block on
the pipe, it gives up the CPU to the gzip
s,
which is exactly what you want. Just before that there are actually
5 tasks on the CPU, which is OK as find
is largely
blocked on I/O while gzip
is busy on the CPU.
In other cases, we'll need to specify the name of the matched file
someplace else, or more than once, in the template command.
We use the markup %1
to specify where in
the template command the current parameter should be expanded.
find . -name \*.txt -print | xapply -f -P4 "gzip -9 %1; ls -ls %1" -
But it is never a good idea to create more processes than
you really need: compare these two spellings of the same idea.
We want to find all the RCS semaphore files that are under
/usr/msrc
:
$ find /usr/msrc -name RCS -type d -print |
	xapply -f 'glob -s "%1/,*" "%1/*_"' -
# timed at 11.56s real 2.85s user 10.82s system

Compare that to reducing the number of glob processes fork'd:
$ find /usr/msrc -name RCS -type d -print |
	xapply -fn '"%1/,*" "%1/*_"' - |
	fmt 8192 10240 |
	xapply -f 'glob -s' -
# timed at 1.08s real 0.27s user 0.35s system

By using a filter (fmt) to group arguments into
bigger bundles we saved 90% of the time. Over larger tasks the
savings may be much larger. Remember fork
(2) is
a really expensive system call, no matter how fast your machine might be.
In rare cases, we may want to discard the parameter; then we
use %0
to expand the empty string, and
suppress the default expansion on the end of the template.
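For example, to output a tick for each input line without expanding the line itself (a sketch):

find . -type f | xapply -f 'echo tick %0' -

Since %0 expands to the empty string, each task runs just the echo.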
You can think of xapply as a filter, reading from stdin
and writing
to stdout
, like awk
would. We'll see in the custom
command section that this is closer to the truth than it looks.
For now, just play along.
Because of the parallel tasks, xapply
has some unique
issues with I/O.
On the input side, we have issues with processes competing for input
from stdin
. We take several measures to keep the books balanced.
The -count switch and stdin

This xapply command folds input lines 1 and 2 into a single line, then 3 and 4, then 5 and 6, and so on to the end of the file:

xapply -f -2 'echo' - -

The 2 occurrences of
stdin
, spelled dash "-
" like most
UNIX filters, share a common reference. That is, the code knows to
read 1 thing from stdin
for each dash, for
each iteration, rather than reading all of stdin
for
the first dash leaving nothing for the second.
In other words, it does what you'd expect.
Using -3
and 3 dashes reformats the output to
present 3 lines as a single output line.
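For example, 6 input lines fold into 2 output lines of 3 fields each:

$ printf '%s\n' 1 2 3 4 5 6 | xapply -f -3 'echo' - - -
1 2 3
4 5 6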
A filename may contain a newline, which breaks any line-at-a-time reader; find has the -print0 option for just this reason.
Xapply
has the -z
option to read
-print0
output. Some other programs, like
hxmd
,
also use the nul
terminated format.
So the compress example might become:
find . -name \*.txt -print0 | xapply -fz -P4 "gzip -9" -
The -i input option replaces stdin for all the inferior tasks.
Under -f
, the default value is /dev/null
.
This lets the parent xapply
use stdin
for
input without random child processes consuming bits from it.
To provide a unique word from $HOME/pass.words
to
each of 5 tasks:
xapply -i $HOME/pass.words 'read U && echo %1 $U' 1 2 3 4 5

This has some limits; when the file is too short for the number of tasks, the read will fail and
the echo won't be executed. (Put 3 words in the
file and try it.) We might want to recycle the words after they've been
used; see below where we explain how
-t
does that.
Since the read
is part of a program, it could be part of
a loop, so a variable number of words from
the input
file could
be read for each task. Under -P
this could be problematic.
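For example, each task might consume 2 words (a sketch; under a parallel factor the order in which tasks win the next word is unpredictable):

xapply -i $HOME/pass.words 'read A; read B; echo %1 $A $B' 1 2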
Without the -m option, xapply tasks each send output to stdout all jumbled together. This is not evident until you try a large -P jobs case with a task that outputs over time (like a long running make). If you want an example of this you might compare:

xapply -P2 -J4 'ptbw -' '' ''

to the collated version:

xapply -m -P2 -J4 'ptbw -' '' ''
The xclate
processor is xapply
's output
friend. It is not usually your friend, as it is hard to follow all
the rules. In fact some programs, like gzip
, don't
follow the rules very well.
You'll have to compensate for that in
your xapply
spells.
In our example above, we'd like to add the -v
switch to
gzip
to see how much compression we are getting:

find . -name \*.txt -print0 | xapply -fz -P4 "gzip -9 -v" -

Which looks OK until you run it. The start of all the compression lines comes out all at once (the first 4 of them), then the statistics get mixed up with the new headers as they are output. It is a mess.
By adding the -m
switch to
the xapply
, we should be able to collate the output.
However, it doesn't work because
the statistics are sent to stderr
,
so we must compensate with the addition of a shell descriptor duplication:
find . -name \*.txt -print0 | xapply -fzm -P4 "gzip -9 -v 2>&1" -
The logic in xapply
to manage xclate
is
usually enough for even nested calls. When it is not, you'll have
to learn more about xclate
; I'd save that for a major
blizzard, rain storm, or long plane trip.
The xapply
's command line option -s
passes
the squeeze option (also spelled -s
) down
to xclate
. This option allows any task which
doesn't output any text to stdout
to exit without
waiting for exclusive access to the collated output stream.
This speeds the start of the next task substantially in cases
where output is rare (and either long, or evenly distributed).
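For example, when searching many files most tasks produce no output, so the squeeze option keeps the queue moving (a sketch):

xapply -m -s -P8 'fgrep -l TODO %1' *.c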
The venerable apply(1) uses a printf-like percent expander to help customize commands. As a direct descendant of apply
,
xapply
has a similar expander.
As one of my tools, it has a lot more power in that expander.
In addition to the apply
feature of binding %1
to the
first parameter, %2
to the second, and so forth,
xapply
has access to a facility called the
dicer.
The dicer is a shorthand notation used to pull substrings out of
a larger string with a known format. For example, a line in the
/etc/passwd
file has a well-known format which uses
colons (":") to separate the fields. In every password file
I've ever seen, the first field is the login name of the account.
The xapply command

xapply -f 'echo %[1:1]' /etc/passwd

filters the
/etc/passwd
file into a list of login names.
The dicer expression %[1:1]
says "take the first parameter,
split it on colon (:), then extract the first subfield".
Here are several possible dicer expressions and their expansions:
Expression        Expansion
%1                /usr/share/man/man1/ls.1.gz
%[1/2]            usr
%[1.1]            /usr/share/man/man1/ls
%[1.1].%[1.2]     /usr/share/man/man1/ls.1
%[1/$.1]          ls

I stuck a nifty one in there: the dollar sign always stands for the last field. The other important point is that
%[1/1]
would expand to the empty string, since the first field is empty.
The dicer also lets us remove a field with a negative number:
Expression        Expansion
%1                /usr/share/man/man1/ls.1.gz
%[1/-1]           usr/share/man/man1/ls.1.gz
%[1/-2]           /share/man/man1/ls.1.gz
%[1.-$]           /usr/share/man/man1/ls.1
Because splitting on white-space is so common, the blank character is special in that it matches any number of white-space characters. Escape any of blank, a digit, close-bracket, or backslash with a backslash to force it to be taken literally.
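Two small sketches of those rules:

xapply 'echo %[1 2]' 'alpha   beta gamma'
xapply 'echo %[1\12]' front1back

The first splits on runs of white-space and prints "beta"; the second escapes the digit 1 to use it as the separator, printing "back".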
Later versions of xapply
also allow access to the
mixer, which allows the selection of characters from a
dicer expression. That is slightly beyond the scope of
this document. As an example, %(3,$-1)
is the
expression to reverse the characters in %3
.
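By analogy with that example, a tiny taste of the mixer (a sketch):

xapply 'echo %(1,$-1)' xapply

should echo "ylppax".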
All my tools use the same mixer and dicer expression syntax:
xapply
, mk
,
oue
, and sbp
.
Because some programs call xapply
, they also
provide a dicer interface (for example, hxmd
).
The dicer documentation is in the explode
library
as dicer.html
(found usually in
/usr/local/lib/explode
).
There are some other customizations xapply provides: viz. shells, escape characters, and padding.
The -S shell option lets you select a shell for
option lets you select a shell for
the command built to start each task. I would use ksh
or
sh
, if it were me. You could set $SHELL
to anything you like, but that might confuse other programs that use
xapply
, so stick to -S
.
As a special case, when you set -S perl
,
it changes the behavior of xapply
.
To introduce the command string,
it uses perl -e
rather than the Bourne shell
compatible $SHELL -c
.
It might also set up -A differently (see below).
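A minimal sketch of the perl case (the expander rewrites the markup before perl -e ever sees the text):

xapply -S perl 'print "%u: %1\n";' alpha beta

Each task runs perl -e 'print "0: alpha\n";' and so on.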
Given a -count of 2 and 2 file parameters under -f,
,
xapply
matches the corresponding lines from each file as
parameter pairs. When only 1 of the files runs out of lines, the
empty string is provided as the element from the other. You can change this
pad
string to anything you like, for example -p /dev/null
.
In one of our first examples, we joined pairs of lines. What happens if
there is only 1 line?
The echo
command gets an extra space on the end,
which it trims. To see that, we can replace the default expansion with
a quoted one, and run it through cat -v
:
This outputs "A $" (without the quotes). Tryecho A |xapply -f -2 'echo "%*"' - - | cat -ve
%~*
for a nice Easter Egg, use the -p
option below to understand the output.
There are alternatives. Under -p we can detect a sentinel value in place of the missing line. Say, for example, that a comma on a line by itself could never be an element of the input; then -p , would let us detect the missing even line with:

xapply -p , ... 'if [ _"%2" = _"," ] ; then ...'
It is usually considered good form to exit from a task as soon as possible. With this in mind, the above trap might be better coded as:

xapply -p , ... '[ _"%2" = _"," ] && exit; ...'
You can change the expander's escape character (the percent) with the -a option. Take care that the symbol you pick is quoted
from the shell.
Viz. "xapply -a ~
..." is not what you'd want under
csh
or ksh
, since the tilde gets expanded to a
path to someone's home directory.
Because xapply
is driven from
mkcmd
, it takes the full list of
character expressions (-a "^B"
is
ASCII stx
, -a M-A
is code 230); that
doesn't mean you should use them. Try to stick with percent when you can.
In ksh, keeping percent as the escape character means that some let, $((...)), and ${NAME%glob} parameter substitutions require %% to get a literal percent sign.
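For example, picking a comma as the escape character keeps the percent signs in shell arithmetic literal (a sketch):

xapply -a , -S ksh 'echo $(( ,1 % 3 ))' 7 11

With the default escape we would have had to spell the modulo operator as %%.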
Since xapply is emulating a generic loop, it stands
to reason that there would be a "loop counter".
The loop counter is named %u
, which stands
for "unique". It also would be nice to be able to
break
out of the loop. For that
we send a signal to xapply
with
%p
, which stands for "pid".
When you use -F
to
load xapply
as an interpreter, then
the markup %c
expands to
the cmd
read from script
,
while %fc
expands to the path to
the script
specified.
Also %ct
expands the the load line, or a
synthetic one representing the shell used to run each task.
%u

A trivial use of the loop counter:

xapply 'echo %u %1' A B C D E

This outputs "0 A" through "4 E", one pair per line.
A better use of this might be to process data from one iteration to
the next (making generations of a file with the extension .%u
).
Use of ksh
's built-in math operations to build a
function based on %u
is common. To queue many
at
jobs about 5 minutes apart:
xapply -x 'at + $((%u*5)) minutes < %1' *.job

The -x
option lets you see the commands executed on stderr
.
This emulates set -x
in Bourne shell.
If the unique loop counter is provided by gtfw
then the source of the counter is available as %fu
.
%p

To break the xapply
loop we signal the process with a USR1
signal. Usually this is done conditionally in cmd
with the kill
command.
The markup %p
expands to the pid of
the enclosing xapply
.
For example, to search a list of integers for a prime number I might code:

xapply -f 'is-prime %1 || exit; echo %1; kill -USR1 %p' numbers.cl

In that example the
exit
command acts as
a C continue
statement, and the
kill -USR1 %p
command acts as a
break
statement (or maybe more like a
longjmp
call).
Note that the kill
command does
not terminate any already running tasks. So under
-P
some tasks might already be processing in
parallel with the task that short-circuited the loop.
Since the USR1
sent to
xapply
didn't terminate the current task either,
you may need to exit
explicitly as well.
This feature is allowed under hxmd
as well.
Short-circuited commands are assigned the synthetic status 3000,
as a sentinel value to distinguish them from other failed commands.
As a way to fetch the run-time name xapply was called by, the expansion of %fp is the program name.
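A trivial check (the %0 suppresses the default parameter expansion):

xapply 'echo called as %fp %0' once

should print "called as xapply".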
Now consider what happens when a parameter contains shell-active characters, say a name like "Paul d`Abrose" in names.cl:

xapply -f -2 'Mail -s "Hi %1" "%2" <greeting.mail' names.cl address.cl

That will expand an unbalanced grave quote in the subject argument. Even worse, it might try to run "Abrose" as a shell command.
A program should be safe from such corner cases, like a filename with
a quote or control character in the name.
On input, xapply
can use the -print0
-style, on
output we depend on the shell. To make a parameter safer, there is a
q
modifier that
tells xapply
that you are going to wrap the expansion in shell double-quotes, and that
you'd like the resulting dequoted text to be the original value.
By spelling the expansion as:
xapply -f -2 'Mail -s "Hi %q1" "%q2" <greeting.mail' names.cl address.cl

we're asking
xapply
to backslash any of double-quote, grave,
dollar, or backslash in the target text, so the command is presented to
the shell as:
Mail -s "Hi Paul d\`Abrose" "[email protected]"...
In versions of xapply
above 3.60, two more quote
forms are available: %Q
quotes all the
shell meta characters with a backslash (\
),
and %W
quotes all the shell meta characters
and all the default IFS
characters (space, tabs
and newlines). These are mostly useful to pass commands to a
remote machine via ssh
. This example sends
commands (from cmds
) round-robin to
a list of hosts (in /tmp/ksb/hosts
):
xapply -ft /tmp/ksb/hosts 'ssh %t1 %Q*' cmds

Note that the number of hosts doesn't have to match the number of commands.
This is not always enough; sometimes the data should be passed through
a scrubber, or sent to /dev/null
, when
you don't trust it.
%+

The markup %+ shifts the
parameters over one to the left, then expands
the new cmd
(replacing the
%+
), then continues with the rest of
the original cmd
.
An example makes this a little clearer:
xapply -n -2 "( %+ )" "echo %1 %1" ksb rm /tmp/bob

outputs:

( echo ksb ksb )
( rm /tmp/bob )
This is really a lot more useful when the input is a pipe
(viz. under -fz
).
A program can match commands to parameters and send the
paired stream to xapply
for parallel execution.
This is exactly how hxmd
works.
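A sketch of that pattern with a shell pipe, alternating command templates and their arguments:

$ printf '%s\n' 'echo hi %1' ksb 'ls -d %1' /tmp | xapply -fn -2 '%+' -
echo hi ksb
ls -d /tmp

The -n shows the built commands rather than running them.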
It is also possible to use the first token (from -t
)
as a command template via %t+
. In this
case the token is consumed, so subsequent references
to %t+
will expand the second token (if any).
As an example:
$ cat /tmp/ksb/cmd
host %1
id %2
date
uptime
$ xapply -n -2 -R4 -t/tmp/ksb/cmd '%t+;%t+;%t+;%t+' sulaco ksb
host sulaco;id ksb;date;uptime

This strange meme is actually really useful, if you use
ptbw
to hold a list of commands that need to be
applied remotely. But you'll just have to trust me on that until you
see gtfw
and sshw
in
action.
When xapply doesn't get any arguments to use as parameters, it shouldn't run anything (unlike busted xargs).
In a few cases, it might be nice to have an "else" part (like a Python
while
loop). The -N else
option allows
a command to run when we didn't get any tasks started.
Let's rework our compression filter; we'll misspell the extension we are looking for (so we don't match anything) and put in a message when we do not find anything to compress.
find . -name \*.text -print0 | xapply -fzm -P4 -N "%0echo Nothing to compress. 1>&2" "gzip -9 -v 2>&1" -
This is mostly used in scripts to give the Customer a warm feeling that we looked, but didn't find anything to do.
In the else
command %u
expands to 00
(double zero). This allows
other processors (like hxmd
) to tell the
difference between the first task (0
) and
the else
clause. It doesn't help to
send a USR1
to xapply
from
the else
command, because no commands will be
run, in any event.
As an example of how to recover the notification stream, I'll make a list
of just the else
clause:
$ xclate -m -rN'>res' xapply -xm -f -N 'kill -USR1 %p; exit 8' /dev/null
kill -USR1 6765; exit 8
xapply: caught USR1, stop processing
$ tr '\000' '\n' <res
8,00
The -x
showed us the shell command built from
else
and told us that xapply
did get the signal. The -m
option forced
xapply
to send notification to
the xclate
we started, which diverted those
notifications to the local file res
. We converted
the NUL terminated records into text via tr
.
Xapply
is very predictable.
When we run the examples on the same input, we are apt to get the same
output. All that changes when we allow xapply
to
start a ptbw
to manage a resource.
Each line of a ptbw
resource file represents a
unique resource that is allocated to a single task at any one time.
A resource could be anything, a CPU, filesystem,
VX disk group or network address. I picked a modem in these
examples because the exclusive use to dial a phone number is clear.
If we have 3 modems connected to a host
on /dev/cuaa0
, /dev/cuaa1
, and
/dev/ttyCA
, we can put those strings in a file
called ~/lib/modems
. Then we can ask xapply
to reserve 1 modem for each command:
xapply -f -t ~/lib/modems -R 1 'myDialer -d %t1 %1' phone.list

No matter how many phone numbers are in
phone.list
, we
will never try to dial different numbers on the same modem.
This is because xapply
and ptbw
know how
to work with each other to keep the books straight.
We can force a new ptbw
instance into our
process tree by using the -t
option, the -J
,
or a -R
option with any value greater than 0.
If we don't use any of those options, xapply
uses
the internal function iota
just as ptbw
does, but doesn't insert an instance in the process tree, so any
enclosing ptbw
will be directly visible to each task.
The new expander form, %t1
, expands to the modem selected.
The -R
option specifies how many resources to allocate
to each task.
All of the dicer forms we saw above might be applied to a resource:
given that %t1
expands to /dev/cuaa1
:
Expression      Expansion
%t[1/$]         cuaa1
%t[1/-$]        /dev
%t[1.-$]        /dev/cuaa1
If we use the resource to allocate CPUs we might want to get
more than 1 to a task. In that case we can tell ptbw
to just bind unique integers as the resources. On a 16 CPU machine
we could divide the host into 5 partitions of 3 CPUs:

xapply -J5 -R3 -f -P5 'myWorker %t*' task.cl

The
-J5 -R3
is passed along to ptbw
to
build a tableau that is 5 by 3, then xapply
consults that to allocate resources. The %t*
passes
the names of the CPUs provided down to myWorker
.
The markup %ft
expands to the source of
the tokens. If the tokens are the internal default the name expanded
is iota
.
xapply in xclate

The -e var=dicer option allows any
environment variable to be set to a dicer expression.
To specify the modem in $MODEM
(rather than in an option):
xapply -f -t ~/lib/modems -R 1 -e "MODEM=%t1" 'myDialer %1' phone.list
This is also really useful to send options down to xclate
in
XCLATE_1
to set headers and footers on collated output.
XCLATE_1='-T "loop %{L}"' xapply -m -e L=%u 'echo' A B C

For more on the use of
XCLATE_
n
see the
xclate HTML document.
Here is why xapply
has to set the variable: the xclate
output filter is launched as a peer process to the echo
command,
so changing $L
in the command won't give it a new value
in the (already running) process. We can't set it in the parent shell
as it won't change for each task, so xapply
needs to be able
to set it.
I have a list of SHA512 signatures for a set of files I just downloaded, and I want to check those against the files themselves. The list is directly from the OpenBSD.org website, with lines like:

SHA512 (base.tgz) = 2b450f4bfe3b0b7c73a8b81c912b0818357cdf41ad6fc34949367e466de6790ec4a0582716e0f80246ba9121c41d531524768df150e3c8ce79f1c566cf4a3567
SHA512 (comp.tgz) = 15a8229d651feb9714cf524cf65c26baf8416a362589d7e6b9e343fa7c0a834b7a9c0ecc23f37f5108a1bae7e0c27ff3240784700ec946bcb198eca43bc49a8b
...

So I need to snip out the filename with
%[1(2)1]
(the first line, snip at open-paren, choose the second field, snip at the close,
choose the first element). Then I need to use openssl
to compute the SHA512 and compare that output to the line itself.
Since all versions of openssl
don't put the spaces
in the same way we'll have to delete the blanks to make the lines match.
xapply -e F='%[1(2)1]' -f '# set -x
	[ "`openssl sha512 $F|tr -d \"$IFS\"`" = "`echo %W1| tr -d \"$IFS\"`" ] ||
		echo Corrupt $F' /tmp/SHA512
The -u option forces xapply
to pass the value
of %u
to any output xclate
as the xid
.
Using that, the above example becomes
XCLATE_1='-T "loop %x"' xapply -m -u 'echo' A B C

but that's not the reason this option exists.
When another processor (say hxmd
) wants to know which of
several tasks has completed, it can call xapply
with
-u
and xclate
with -N
.
Then the notify from xclate reports the completion of each task with
the number of the task as the xid
on the resource
given to -N
.
This makes xapply
an excellent "back-end" program to manage
parallel tasks, although it works best from a C or perl program.
Here is an example where we use notify
to
show the order of completed tasks:

xclate -m -N '|tr -u \\000 \\n|while read N; do echo fini $N; done' -- \
	xapply -m -u -P5 'sleep' 3 2 5 2 3

This would be more useful if we could get the
exit
code from each task, and we can under -r
.
Try the same with a -r
switch passed to xclate
(-Nr
).
The 2 numbers are the exit status
, and
the xid
.
Also, try both of those without
the -u
option to xapply
, in
1 case, you get the number of the task, in the other the number of
seconds slept (which is the value of %1
).
The observant student might think
this looks like it was designed to be given as input to an instance
of xapply -fz
.
Another possible use is hxmd
's retry logic.
One last corner case: the -r
output for -N
's
command is encoded as task "00". Thus, it is distinguishable, as a
string, from the first task (given as "0"). This is the same hack
the new rmt
program uses to tell the client it has
a new more advanced command set.
The -A option of the ptbw program allows a shorthand to access the recovered resources as shell positional parameters. For historical reasons, this option is also provided by xapply. In the xapply
case the shell parameters ($1
, $2
, ...) become
run-time versions of the expander names (%t1
, %t2
, ...).
That makes our command line modem example look like:
xapply -f -t ~/lib/modems -R 1 -A 'myDialer -d $1 %1' phone.list

We don't have to specify a
-e MODEM
, we can just force
the name into $1
and use it from there. This even works
when the -S
option selects
perl
as the shell, or
even worse tcsh
.
See the ptbw HTML document for more ideas about how to setup resource pools and using them from the command-line and from scripts.
Using xapply as a co-process

See the ksh manual page under Co-Processes, if you've never heard of these before.
Because of the way xapply
is designed,
it makes a really great co-process. It manages a list of tasks
given to it on stdin
, and outputs a list
of results on stdout
-- which is exactly
what a co-process service should do.
For a real turbo, let's start our gzip
loop
as a co-process in a fair mockup of a workstation dump structure.
Say we want to dump many workstations in parallel to a large file server.
We are going to ssh
to each client to run
dump
(8) over a list of filesystems.
But we need to limit the impact to each workstation owner's
desktop, so let's run the compression for the files
locally on the file server. For a start, I'm going to assume
that the file server can run at least 4 processes at a time.
I'm going to simplify the code a little to show the inner loop
for a single host here.
We'll start a co-process that keeps 3 gzip
tasks
running. To do that, it reads the names
of the files to compress from stdin
, so
the main script outputs each completed dump archive to the co-process
with print -p, if it is marked in the list as gzip. After all the hosts are finished, we close the co-process's input, then wait
for
it to finish.
#!/bin/ksh
# comments and some argument processing
: ${SSH_AUTH_SOCK?'must have an ssh agent to run automated backups'}
unset TAPE RMP RSH
...
nice xapply -P3 -f 'gzip -7v %1 1>&3 2>&3' - 3>gzip.log |&
...
for TARGET in ... ; do
	...
	while read FS WHERE COMPRESS junk ; do
		ssh root@$TARGET -x -n su -m operator -c \
			"'/bin/sync; exec /sbin/dump -0uL -C16 -f - $FS'" >$WHERE.dump
		[ _${COMPRESS:-no} = _gzip ] && print -p $WHERE.dump
	done <<-\!
		/	slash	gzip
		/var	var	gzip
		/usr	usr	gzip
		/home	home	gzip
		/var/ftp	var_ftp	no
		...
	!
done
exec 3>&p; exec 3>&-
wait
# cat gzip.log
exit 0
In the real code, we run several hosts in parallel. Also, the list of
target filesystems is not from a here document, but that would be
much harder to explain here. I put in a comment where one might
display (or process) the log from all the gzip
processes. This might be used to feed-back and tune the compression
levels or exclude dumps that grow when compressed (viz. compressed tar files
tend to do that from /var/ftp
).
The reason this is a good structure is that the number of compression
tasks is controlled with a single -P3
specification; when we move the process to a newer host, we can tune it
up to use most of the CPU, saving just enough to run ssh
to fetch backups from our client hosts. In the production script,
the parallel factor is a command-line option, and an outer loop also
processes multiple client hosts in
parallel with xapply
.
Conversely, when we need more resources for the incoming dump streams we
can reduce -P
, or
tune the nice
options to
focus more effort on the ssh
encryption tasks.
And to simplify the code, we could use a pipeline to compress the dumps
as they stream in from the client, but that slows down the over-all
throughput of the process to the speed of the backup host, which may
have more disks than brains.
While running xapply as a co-process, you might look at
a pstree
(aka. ptree
) of
the processes doing the work. What you should see is the peer
instance of xapply
with some workers below it,
and sometimes a defunct
process or 2
waiting to be reaped. These don't hurt anything; it is just the way
xapply
blocks reading input before it checks
for finished tasks. Here is a simple example, using your own
ksh
as the master process:
$ nice xapply -f -P3 'sleep %1; date 1>&3' - 3>log.$$ |&
$ jobs
[1] + Running  nice xapply -P3 -f "sleep %1; date 1>&3" - 3>
$ print -p 10
$ ptree -n $$
1380  ksh -i -o vi -o viraw
  31057 xapply -P3 -f sleep %1; date 1>&3 -
    31058 /bin/ksh -c sleep 10; date 1>&3 _
      31063 sleep 10
  31059 ptree -n 1380
$ print -p 20 ; print -p 22 ; print -p 21
$ ptree -n $$
1380  ksh -i -o vi -o viraw
  31148 xapply -P3 -f sleep %1; date 1>&3 -
    31149 /bin/ksh -c sleep 20; date 1>&3 _
      31161 sleep 20
    31150 /bin/ksh -c sleep 22; date 1>&3 _
      31163 sleep 22
    31152 /bin/ksh -c sleep 21; date 1>&3 _
      31162 sleep 21
  31164 ptree -n 1380
$ sleep 30
$ ptree -n $$
1380  ksh -i -o vi -o viraw
  31148 xapply -P3 -f sleep %1; date 1>&3 -
    31150 ()
    31152 ()
  31168 ptree -n 1380
$ exec 4>&p ; exec 4>&-
[1] + Done  nice xapply -P3 -f "sleep %1; date 1>&3" - 3>
$ wc -l log.$$
4 log.1380
# cat the file to see the output
$ rm log.$$
The reason we see 2 exited children under
the co-process xapply
is that xapply
was blocked waiting for a child to
exit
until one did (to free up a slot), then it
noticed that there were no more tasks to launch (when we moved and closed the
p
descriptor). So it waited for the
other children, then exit
'd itself.
Always remember that the co-process can be an entire pipeline, which is
better than just a single xapply
.
I use the nice
to start my co-process commands
and the |&
to end it as
structural documentation in the script.
The nice
also puts the main script at an advantage,
but you could do the opposite and use op
(or
sudo
) to get better scheduling priority,
a different effective uid, or some other escalation for
the co-process. If you need the exit codes from the processes see
a note above about using
a wrapped xclate
to do that.
Using xapply as an interpreter

Sometimes you want to keep the cmd in a file, rather than specify it literally on the command-line. In that case the -F forces the specification of the
command as a filename. The file is read into memory, then the first
line is removed if it looks like an
execve(2)
interpreter line.
The resulting text is used as normal cmd
.
That is to say that all the common markup is replaced for every
set of parameters.
$ cat /tmp/flag
#!/usr/bin/env -S xapply -P -F
echo "%q*"
$ chmod +x /tmp/flag
$ /tmp/flag 1 2 3
3 1 2
$ /tmp/flag -fP1 /etc/motd
the contents of the message of the day file, in order
$ exit

Note that we set the parallel factor to 1 to override the default in the loader line. We could also set
PARALLEL
to force a known value.
An example, which draws on those above, would be to gzip
each file presented. This is called "gzall":
$ cat /tmp/gzall
#!/usr/bin/env -S xapply -P -F
exec nice gzip -9 %W1
Also note the use of
env
(1) to
do split-string processing for the arguments to xapply
.
This is not supported on some versions of env
,
which makes this facility less useful.
Why use xapply as an interpreter?

When the cmd you need is very complex, or hard to remember, you should put it in a script. If the script just calls
xapply
, you might as well make it the interpreter.
Why not? Setting -N
and -S
on the loader line is often helpful, and it really reduces the shell
quoting needed to make the script.
Without -F, the cmd is specified literally to
the xapply
command. So I don't use
-F
to hide the template in a file most of
the time.
I believe this makes my code more clear to the reader. Really
my code is almost never clear to the reader, but I do try.
The exception to this is when the command I need is (itself) built
by a make
recipe or another process. In that
case I'll use -F
as the last option in
the command-line specification, which places
the script
where cmd
would normally separate the options from the args
(or files
).
Note that -F
will not take
a dash as stdin
. That would be hard to
justify, since there is really no use-case for it.
(Just read
the command into a shell variable,
then specify that variable as cmd
.)
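That spelling might look like this (a sketch; recipe.x is a hypothetical file holding the template):

CMD=`cat recipe.x`
xapply -f "$CMD" list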
Every good tool provides -V to
output a useful version banner, and -h
to
output a brief on-line help message. So xapply
does.
Tools that take the expander markup (%1) have quick reference output under
-H
, so xapply
does.
There is more about xapply in
the hxmd
HTML document
and the msrc
HTML document.
And a few notes on converting from GNU parallel
to xapply
, hxmd
,
and/or msrc
.
$Id: xapply.html,v 3.47 2013/07/19 20:42:24 ksb Exp $ by ksb.