hn-classics/_stories/2010/11803431.md

134 lines
6.8 KiB
Markdown
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
created_at: '2016-05-30T21:52:01.000Z'
title: Things Unix can do atomically (2010)
url: https://rcrowley.org/2010/01/06/things-unix-can-do-atomically.html
author: turrini
points: 207
story_text:
comment_text:
num_comments: 62
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1464645121
_tags:
- story
- author_turrini
- story_11803431
objectID: '11803431'
year: 2010
---
This is a catalog of things UNIX-like/POSIX-compliant operating systems
can do atomically, making them useful as building blocks for thread-safe
and multi-process-safe programs without mutexes or read/write locks. 
The list is by no means exhaustive and I expect it to be updated
frequently for the foreseeable future.
The philosophy here is to let the kernel do as much work as possible. 
At my most pessimistic, I trust the kernel developers more than a trust
myself.  More practically, its stupid to spend CPU time locking around
an operation thats already atomic.  Added 2010-01-07.
## Operating on a pathname
The operations below are best left to local filesystems.  More than a
few people have written in crying foul if any of these techniques are
used on an NFS mount.  True.  When there are multiple kernels involved,
the kernel cant very well take care of all the locking for us.  Added
2010-01-06.
- `mv -T <oldsymlink> <newsymlink>` atomically changes the target of
`<newsymlink>` to the directory pointed to by `<oldsymlink>` and is
indispensable when deploying new code.  Updated 2010-01-06: both
operands are symlinks.  (So this isnt a system call, its still
useful.)  A reader pointed out that `ln -Tfs <directory> <symlink>`
accomplishes the same thing without the second symlink.  Added
2010-01-06.  Deleted 2010-01-06: `strace(1)` shows that `ln -Tfs
<directory> <symlink>` actually calls `symlink(2)`, `unlink(2)`, and
`symlink(2)` once more, disqualifying it from this page.  `mv -T
<oldsymlink> <newsymlink>` ends up calling `rename(2)` which can
atomically replace `<newsymlink>`.  Caveat 2013-01-07: this does not
apply to Mac OS X, whose
[`mv(1)`](https://developer.apple.com/library/mac/#documentation/Darwin/Reference/Manpages/man1/mv.1.html)
doesnt call `rename(2)`.  [`mv(1)`](http://linux.die.net/man/1/mv).
- `link(oldpath, newpath)` creates a new hard link called `newpath`
pointing to the same inode as `oldpath` and increases the link count
by one.  This will fail with the error code `EEXIST` if `newpath`
already exists, making this a useful mechanism for locking a file
amongst threads or processes that can all agree upon the name
`newpath`.  I prefer this technique for whole-file locking because
the lock is visible to `ls(1)`. 
[`link(2)`](http://linux.die.net/man/2/link).
- `symlink(oldpath, newpath)` operates very much like `link(2)` but
creates a symbolic link at a new inode rather than a hard link to
the same inode.  Symbolic links can point to directories, which hard
links cannot, making them a perfect analogy to `link(2)` when
locking entire directories.  This will fail with the error code
`EEXIST` if `newpath` already exists, making this a perfect analogy
to `link(2)` that works for directories, too.  Be careful of
symbolic links whose target inode has been removed ("dangling"
symbolic links) — `open(2)` will fail with the error code `ENOENT`. 
It should be mentioned that inodes are a finite resource (this
particular machine has 1,245,184 inodes). 
[`symlink(2)`](http://linux.die.net/man/2/symlink).  Added
2010-01-07
- `rename(oldpath, newpath)` can change a pathname atomically,
provided `oldpath` and `newpath` are on the same filesystem.  This
will fail with the error code `ENOENT` if `oldpath` does not exist,
enabling interprocess locking much like `link(oldpath, newpath)`
above.  I find this technique more natural when the files in
question will be `unlink`ed later. 
[`rename(2)`](http://linux.die.net/man/2/rename).
- `open(pathname, O_CREAT | O_EXCL, 0644)` creates and opens a new
file.  (Dont forget to set the mode in the third argument\!) 
`O_EXCL` instructs this to fail with the error code `EEXIST` if
`pathname` exists.  This is a useful way to decide which process
should handle a task: whoever successfully creates the file. 
[`open(2)`](http://linux.die.net/man/2/open).
- `mkdir(dirname, 0755)` creates a new directory but fails with the
error code `EEXIST` if `dirname` exists.  This provides for
directories the same mechanism `link(2)` `open(2)` with `O_EXCL`
provides for files. 
[`mkdir(2)`](http://linux.die.net/man/2/mkdir).  Added 2010-01-06;
edited 2013-01-07.
## Operating on a file descriptor
- `fcntl(fd, F_GETLK, &lock)`, `fcntl(fd, F_SETLK, &lock)`, and
`fcntl(fd, F_SETLKW, &lock)` allow cooperating processes to lock
regions of a file to serialize their access.  `lock` is of type
`struct flock` and describes the type of lock and the region being
locked.  `F_SETLKW` is particularly useful as it blocks the calling
process until the lock is acquired.  There is a “mandatory locking”
mode but Linuxs implementation is unreliable as its subject to a
race condition.  [`fcntl(2)`](http://linux.die.net/man/2/fcntl).
- `fcntl(fd, F_GETLEASE)` and `fcntl(fd, F_SETLEASE, lease)` ask the
kernel to notify the calling process with `SIGIO` when another
process `open`s or `truncate`s the file referred to by `fd`.  When
that signals arrives, the lease needs to be removed by `fcntl(fd,
F_SETLEASE, F_UNLCK)`.  `fcntl(fd, F_NOTIFY, arg)` is similar but
doesnt block other processes, so it isnt useful for
synchronization.  [`fcntl(2)`](http://linux.die.net/man/2/fcntl).
- `mmap(0, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)` returns
a pointer from which a files contents can be read and written by
normal memory operations.  By making frequent use of `msync(addr,
length, MS_INVALIDATE)`, data written in this manner can be shared
between processes that both map the same file. 
[`mmap(2)`](http://linux.die.net/man/2/mmap),
[`msync(2)`](http://linux.die.net/man/2/msync).
## Operating on virtual memory
- `__sync_fetch_and_add`, `__sync_add_and_fetch`,
`__sync_val_compare_and_swap`, and friends provide a full barrier so
“no memory operand will be moved across the operation, either
forward or backward.” These operations are the basis for most (all?)
lock-free algorithms.  [GCC Atomic
Builtins](http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html).
Something I should add to my repertoire?  Race condition?  Let me know
at <r@rcrowley.org> or [@rcrowley](http://twitter.com/rcrowley) and Ill
fix it.