OCFS2: Racy rename(2)
On UNIX systems, the rename() system call is atomic. That is, once written to a filesystem, a file can have its name changed within that filesystem (including from one directory to another) as a single uninterruptible operation that will either succeed or fail but not result in a half-complete operation.
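To make the atomicity guarantee concrete, here is a minimal sketch of the usual write-then-rename pattern that relies on it. The helper name replace_file_atomically and its tmp_path parameter are made up for illustration, and the sketch assumes tmp_path lives on the same filesystem as path. It does not address durability (no fsync), only the all-or-nothing name switch.

```c
#include <stdio.h>   /* fopen(), fputs(), fclose(), rename(), remove() */

/* Hypothetical helper, not from any library: replace `path` with `contents`
 * so that anyone opening `path` sees either the old file or the complete
 * new one, never a half-written file. Returns 0 on success, -1 on failure. */
int replace_file_atomically(const char *path, const char *tmp_path,
                            const char *contents)
{
    /* Write the new contents under a temporary name first. */
    FILE *f = fopen(tmp_path, "w");
    if (f == NULL)
        return -1;

    if (fputs(contents, f) == EOF) {
        fclose(f);
        remove(tmp_path);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp_path);
        return -1;
    }

    /* The single atomic step: the name `path` switches to the new file. */
    if (rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

A call such as replace_file_atomically("config", "config.tmp", "new settings\n") either leaves the old "config" in place or swaps in the new one.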
OCFS2 is a cluster filesystem that is now part of the mainline Linux kernel. It’s used in certain environments where multiple computers are plugged into the same storage device, typically a SAN. It was originally developed by Oracle, but is small in terms of kernel modifications and general enough in scope to be used by other people, so the kernel folk (including Linus) merged it into the tree a while ago.
And guess what? Its rename() operation has a race condition. It’s still atomic, mind you, but it suffers from random failures.
The Race Condition
If two processes on different nodes of the cluster attempt to rename different files to the same filename, one of them may fail with EACCES.
This manifests itself as spontaneous permissions errors.
The Cause
/usr/src/linux-2.6.21.1/fs/ocfs2/namei.c +1213
	/* In case we need to overwrite an existing file, we blow it
	 * away first */
	if (new_de) {
		/* VFS didn't think there existed an inode here, but
		 * someone else in the cluster must have raced our
		 * rename to create one. Today we error cleanly, in
		 * the future we should consider calling iget to build
		 * a new struct inode for this entry. */
		if (!new_inode) {
			status = -EACCES;
			mlog(0, "We found an inode for name %.*s but VFS "
			     "didn't give us one.\n", new_dentry->d_name.len,
			     new_dentry->d_name.name);
			goto bail;
		}
I’m not sure how this can happen, but I’m pretty sure this is the cause. OCFS2 has a master renaming lock, and it seems there’s an issue where the rename operation can change shape between figuring out what to do and acquiring the lock.
The Fix
I have no idea.
Ask the OCFS2 guys.
The Workaround
Retry the rename() call, up to some limit, whenever it fails with EACCES. There’s no real way to distinguish a racing rename() from a genuine permissions error, other than that a racing rename() will eventually succeed if retried.
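A minimal sketch of that retry loop, for illustration only. The helper name rename_retry_eacces, the max_attempts parameter and the 10 ms pause between attempts are all my own assumptions, not anything OCFS2 or libc provides; tune the limit and delay to your workload.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict -std modes */

#include <assert.h>
#include <errno.h>
#include <stdio.h>   /* rename() */
#include <time.h>    /* nanosleep(), struct timespec */

/* Hypothetical helper: call rename() and retry while it fails with EACCES,
 * making at most max_attempts calls in total. Returns 0 on success. Returns
 * -1 with errno set by the last rename() call if the limit is reached or
 * if rename() fails with any other error. */
int rename_retry_eacces(const char *oldpath, const char *newpath,
                        int max_attempts)
{
    assert(max_attempts >= 1);

    for (int attempt = 1; ; attempt++) {
        if (rename(oldpath, newpath) == 0)
            return 0;

        /* Only EACCES is treated as possibly transient. A genuine
         * permissions error looks identical, so it simply uses up
         * the remaining attempts and is then reported. */
        if (errno != EACCES || attempt >= max_attempts)
            return -1;

        /* Short pause (10 ms, an arbitrary choice) before the next try. */
        struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
        nanosleep(&delay, NULL);
    }
}
```

Callers would use rename_retry_eacces(old, new, 5) where they previously called rename(old, new). The cost is that a real permissions error is reported a few tens of milliseconds late.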
And yes, this was fun to figure out.