You should be able to read sh(1), have an understanding of the UNIX™ process model and exit codes, and have coded several scripts. Some understanding of advisory file locking (see flock(2)) would be helpful, but not required.
Since flock may use file descriptors by number, it would be helpful to have some understanding of their mapping (viz. stdin is 0, stdout is 1, and stderr is 2), and of the exec shell internal, which we use to manipulate descriptors.
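For example, a minimal sketch of using exec to move descriptors around (the file name data.txt is just a placeholder):

    exec 3<data.txt     # open data.txt for reading on descriptor 3
    read Line <&3       # read a line from descriptor 3
    print $Line
    exec 3<&-           # close descriptor 3 again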
Why flock?
When more than one process needs to update a common resource, we lock either the resource itself, or a file built just to represent the resource, with flock. Then, if everyone plays by the rules, we know it is safe to make our update. Every other process that wants to update the resource is blocked by our lock; when we release our lock, one of the others gets a turn.
    flock [-cfn] [-EX|SH|UN|NB] file|fd [cmd]
    flock -h
    flock -V
The first usage is the most common: request a lock or unlock operation on an already open descriptor (via fd) or on a file or directory (via file). See the manual page flock(1) for details.
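For instance, both forms might look like this; the paths and commands are made up, and the option letters are the ones from the synopsis above:

    # by name: hold an exclusive lock on a data file while one command rewrites it in place
    flock -EX /var/tmp/widget.db sort -o /var/tmp/widget.db /var/tmp/widget.db

    # by descriptor: the same lock requested on fd 8, which we opened ourselves with exec
    exec 8</var/tmp/widget.db
    flock -EX 8
    # ... read and rewrite widget.db safely ...
    flock -UN 8
    exec 8<&-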
Here is a ksh function which needs a lock:
    function autoseq {
        [ -f $SEQ ] || echo 0 >$SEQ
        read Cur <$SEQ
        echo $((Cur+1)) >$SEQ
        print $Cur
    }
    SEQ=$(mktemp ${LOGNAME}XXXXXX)
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    rm -f $SEQ
    exit
Sometimes that code outputs 8 numbers, sometimes they are all different, but mostly they are 1 or 2. It races for access to the sequence file, and near the end the last few processes race with the rm command, which means the sequence file might still exist after the script exits.
To fix that code we need 3 enhancements:
    function autoseq {
        exec 9<&0 <$SEQ
        flock -EX 0
        read Cur
        echo $((Cur+1)) >$SEQ
        flock -UN 0
        exec 0<&9 9<&-
        print $Cur
    }
    SEQ=$(mktemp ${LOGNAME}XXXXXX)
    touch $SEQ
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    autoseq &
    wait
    rm -f $SEQ
    exit
1. We wait for the workers before we clean up the sequence file.
2. We flock the sequence file while we update it.
3. We save stdin on descriptor 9 and redirect the sequence file onto stdin, then ask for an exclusive lock on the file.

Each process blocks on that call until it is the only one that holds the lock (they take turns like you learned to do in kindergarten). We read and update the sequence file, then release the lock for the next peer.
We really could just close the locked stdin to release the lock, but I thought it was more clear this way.
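A sketch of that variant, under the same setup as the fixed script above:

    function autoseq {
        exec 9<&0 <$SEQ     # save stdin on 9, open the sequence file on 0
        flock -EX 0
        read Cur
        echo $((Cur+1)) >$SEQ
        exec 0<&9 9<&-      # restoring stdin closes the locked descriptor,
                            # which releases the lock without an explicit flock -UN
        print $Cur
    }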
Alternatively we could bundle the update into a shell command-string and hold the lock only for the life of the sub-shell:

    function autoseq {
        flock -EX 0 <$SEQ ksh -c \
            "read Cur; echo \$((Cur+1))>$SEQ; print \$Cur"
    }
This takes advantage of flock's cmd parameter to run a process while holding a lock.
We could even lock the stdout of the script, rather than the sequence file itself. Any resource that the program uses as a local convention is fine:
    function autoseq {
        flock -EX 1 ksh -c \
            "read Cur <$SEQ; echo \$((Cur+1))>$SEQ; print \$Cur"
    }

The most common reason this might fail is when some other process connected to the same output uses the same locking protocol. That may be avoided trivially by wrapping the autoseq processes in a sub-shell with a pipe to cat. (This allows our process to lock the pipe, rather than the common output descriptor.)
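A sketch of that wrapping, assuming the same autoseq function as above:

    (
        autoseq & autoseq & autoseq & autoseq &
        autoseq & autoseq & autoseq & autoseq &
        wait
    ) | cat     # the workers lock the pipe on descriptor 1, rather than the common output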
So a process might create a unique file (or file descriptor) to represent the resource (or resources) that it needs to manage. There is little reason for the code to lock the actual file, since it controls the locking protocol. Personally I like to lock the whole data directory when my process uses multiple files, which only works with advisory locking.
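As a sketch, where $DATADIR stands for whatever data directory the process owns:

    exec 6<$DATADIR     # open the directory itself, read-only
    flock -EX 6         # one advisory lock covers every file under it
    # ... update several files under $DATADIR ...
    flock -UN 6         # release the lock
    exec 6<&-           # and close the directory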
I find fcntl locking harder to use, less flexible, and much slower.
I keep two aliases that allow me to assure that I don't have more than a single shell open in the directory, so I don't botch critical updates:

    alias lockdir='exec 7<. && flock -NB -EX 0 0<&7 2>/dev/null || { echo "ksh: $PWD: already locked"; false; }'
    alias unlockdir='exec 7<&-'
Very few operations require such care, but it is nice to know you can do it. This also stops other engineers or admins from stepping on each other's critical updates. Use this spell sparingly, as it quickly gets to be more of a joke when every operation you need to do is blocked by an active shell.
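An interactive session with those aliases might look like this; the directory and file names are made up:

    $ cd /usr/msrc/accounting
    $ lockdir
    $ vi rates.cf       # no other shell can lockdir here until we finish
    $ unlockdir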
The xclate output filter allows many processes to output in parallel: it buffers the output from each until it can get a lock on the common output. But it doesn't use advisory file locking; it actually uses access rights. See the HTML document for details.
The local program lock_n_load grabs an exclusive lock for accounting updates. This allows a single administrator access to the revision control system for accounting changes, with a shell $TIMEOUT set to prevent idle shells from denying access forever. The implementation of this program might be best done with op and maybe a jacket. (Today it is a short C program.)
$Id: flock.html,v 1.7 2012/07/02 17:00:34 ksb Exp $