134 lines
6.8 KiB
Markdown
134 lines
6.8 KiB
Markdown
---
|
||
created_at: '2016-05-30T21:52:01.000Z'
|
||
title: Things Unix can do atomically (2010)
|
||
url: https://rcrowley.org/2010/01/06/things-unix-can-do-atomically.html
|
||
author: turrini
|
||
points: 207
|
||
story_text:
|
||
comment_text:
|
||
num_comments: 62
|
||
story_id:
|
||
story_title:
|
||
story_url:
|
||
parent_id:
|
||
created_at_i: 1464645121
|
||
_tags:
|
||
- story
|
||
- author_turrini
|
||
- story_11803431
|
||
objectID: '11803431'
|
||
year: 2010
|
||
|
||
---
|
||
This is a catalog of things UNIX-like/POSIX-compliant operating systems
|
||
can do atomically, making them useful as building blocks for thread-safe
|
||
and multi-process-safe programs without mutexes or read/write locks.
|
||
The list is by no means exhaustive and I expect it to be updated
|
||
frequently for the foreseeable future.
|
||
|
||
The philosophy here is to let the kernel do as much work as possible.
|
||
At my most pessimistic, I trust the kernel developers more than a trust
|
||
myself. More practically, it’s stupid to spend CPU time locking around
|
||
an operation that’s already atomic. Added 2010-01-07.
|
||
|
||
## Operating on a pathname
|
||
|
||
The operations below are best left to local filesystems. More than a
|
||
few people have written in crying foul if any of these techniques are
|
||
used on an NFS mount. True. When there are multiple kernels involved,
|
||
the kernel can’t very well take care of all the locking for us. Added
|
||
2010-01-06.
|
||
|
||
- `mv -T <oldsymlink> <newsymlink>` atomically changes the target of
|
||
`<newsymlink>` to the directory pointed to by `<oldsymlink>` and is
|
||
indispensable when deploying new code. Updated 2010-01-06: both
|
||
operands are symlinks. (So this isn’t a system call, it’s still
|
||
useful.) A reader pointed out that `ln -Tfs <directory> <symlink>`
|
||
accomplishes the same thing without the second symlink. Added
|
||
2010-01-06. Deleted 2010-01-06: `strace(1)` shows that `ln -Tfs
|
||
<directory> <symlink>` actually calls `symlink(2)`, `unlink(2)`, and
|
||
`symlink(2)` once more, disqualifying it from this page. `mv -T
|
||
<oldsymlink> <newsymlink>` ends up calling `rename(2)` which can
|
||
atomically replace `<newsymlink>`. Caveat 2013-01-07: this does not
|
||
apply to Mac OS X, whose
|
||
[`mv(1)`](https://developer.apple.com/library/mac/#documentation/Darwin/Reference/Manpages/man1/mv.1.html)
|
||
doesn’t call `rename(2)`. [`mv(1)`](http://linux.die.net/man/1/mv).
|
||
- `link(oldpath, newpath)` creates a new hard link called `newpath`
|
||
pointing to the same inode as `oldpath` and increases the link count
|
||
by one. This will fail with the error code `EEXIST` if `newpath`
|
||
already exists, making this a useful mechanism for locking a file
|
||
amongst threads or processes that can all agree upon the name
|
||
`newpath`. I prefer this technique for whole-file locking because
|
||
the lock is visible to `ls(1)`.
|
||
[`link(2)`](http://linux.die.net/man/2/link).
|
||
- `symlink(oldpath, newpath)` operates very much like `link(2)` but
|
||
creates a symbolic link at a new inode rather than a hard link to
|
||
the same inode. Symbolic links can point to directories, which hard
|
||
links cannot, making them a perfect analogy to `link(2)` when
|
||
locking entire directories. This will fail with the error code
|
||
`EEXIST` if `newpath` already exists, making this a perfect analogy
|
||
to `link(2)` that works for directories, too. Be careful of
|
||
symbolic links whose target inode has been removed ("dangling"
|
||
symbolic links) — `open(2)` will fail with the error code `ENOENT`.
|
||
It should be mentioned that inodes are a finite resource (this
|
||
particular machine has 1,245,184 inodes).
|
||
[`symlink(2)`](http://linux.die.net/man/2/symlink). Added
|
||
2010-01-07
|
||
- `rename(oldpath, newpath)` can change a pathname atomically,
|
||
provided `oldpath` and `newpath` are on the same filesystem. This
|
||
will fail with the error code `ENOENT` if `oldpath` does not exist,
|
||
enabling interprocess locking much like `link(oldpath, newpath)`
|
||
above. I find this technique more natural when the files in
|
||
question will be `unlink`ed later.
|
||
[`rename(2)`](http://linux.die.net/man/2/rename).
|
||
- `open(pathname, O_CREAT | O_EXCL, 0644)` creates and opens a new
|
||
file. (Don’t forget to set the mode in the third argument\!)
|
||
`O_EXCL` instructs this to fail with the error code `EEXIST` if
|
||
`pathname` exists. This is a useful way to decide which process
|
||
should handle a task: whoever successfully creates the file.
|
||
[`open(2)`](http://linux.die.net/man/2/open).
|
||
- `mkdir(dirname, 0755)` creates a new directory but fails with the
|
||
error code `EEXIST` if `dirname` exists. This provides for
|
||
directories the same mechanism `link(2)` `open(2)` with `O_EXCL`
|
||
provides for files.
|
||
[`mkdir(2)`](http://linux.die.net/man/2/mkdir). Added 2010-01-06;
|
||
edited 2013-01-07.
|
||
|
||
## Operating on a file descriptor
|
||
|
||
- `fcntl(fd, F_GETLK, &lock)`, `fcntl(fd, F_SETLK, &lock)`, and
|
||
`fcntl(fd, F_SETLKW, &lock)` allow cooperating processes to lock
|
||
regions of a file to serialize their access. `lock` is of type
|
||
`struct flock` and describes the type of lock and the region being
|
||
locked. `F_SETLKW` is particularly useful as it blocks the calling
|
||
process until the lock is acquired. There is a “mandatory locking”
|
||
mode but Linux’s implementation is unreliable as it’s subject to a
|
||
race condition. [`fcntl(2)`](http://linux.die.net/man/2/fcntl).
|
||
- `fcntl(fd, F_GETLEASE)` and `fcntl(fd, F_SETLEASE, lease)` ask the
|
||
kernel to notify the calling process with `SIGIO` when another
|
||
process `open`s or `truncate`s the file referred to by `fd`. When
|
||
that signals arrives, the lease needs to be removed by `fcntl(fd,
|
||
F_SETLEASE, F_UNLCK)`. `fcntl(fd, F_NOTIFY, arg)` is similar but
|
||
doesn’t block other processes, so it isn’t useful for
|
||
synchronization. [`fcntl(2)`](http://linux.die.net/man/2/fcntl).
|
||
- `mmap(0, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)` returns
|
||
a pointer from which a file’s contents can be read and written by
|
||
normal memory operations. By making frequent use of `msync(addr,
|
||
length, MS_INVALIDATE)`, data written in this manner can be shared
|
||
between processes that both map the same file.
|
||
[`mmap(2)`](http://linux.die.net/man/2/mmap),
|
||
[`msync(2)`](http://linux.die.net/man/2/msync).
|
||
|
||
## Operating on virtual memory
|
||
|
||
- `__sync_fetch_and_add`, `__sync_add_and_fetch`,
|
||
`__sync_val_compare_and_swap`, and friends provide a full barrier so
|
||
“no memory operand will be moved across the operation, either
|
||
forward or backward.” These operations are the basis for most (all?)
|
||
lock-free algorithms. [GCC Atomic
|
||
Builtins](http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html).
|
||
|
||
Something I should add to my repertoire? Race condition? Let me know
|
||
at <r@rcrowley.org> or [@rcrowley](http://twitter.com/rcrowley) and I’ll
|
||
fix it.
|