Linux: Atomic Operations on Files

Atomicity matters for system calls that operate on shared state. When working with files, some sequences of steps must execute atomically to avoid race conditions. A race condition is a situation in which the result of operating on shared state depends on the order in which multiple threads or processes happen to run. In Linux, certain system calls are executed atomically. That is, the kernel guarantees that all the steps of the call complete as a single operation, so no other thread/process can observe or modify the shared state partway through.

Now, we will look at how to work with files atomically to avoid race conditions and the unintended behaviour they cause.

Appending to a File

Imagine maintaining a log for your application. For every write, you want to append to the file. One way to do it is to seek to the end of the file with lseek() and then call write().
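A minimal sketch of such an append, seeking to the end with lseek() before calling write(), might look like this. The helper name append_racy and the 0644 file mode are assumptions made for illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: appends msg to the file at path WITHOUT O_APPEND.
 * Returns the number of bytes written, or -1 on error. */
ssize_t append_racy(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    /* Step 1: move the file offset to the current end of the file. */
    if (lseek(fd, 0, SEEK_END) == (off_t)-1) {
        close(fd);
        return -1;
    }

    /* Step 2: write at that offset. If another writer extended the
     * file after step 1, this offset is no longer the end of the file. */
    ssize_t written = write(fd, msg, strlen(msg));
    close(fd);
    return written;
}
```

With a single writer this behaves as expected; the two separate steps only become a problem once several writers share the file.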

Although this looks fine, there is a race condition in the above method. In the presence of multiple threads or processes, writers might end up overwriting each other’s data. Suppose the first thread/process is interrupted after the call to lseek(), and a second thread/process runs the same code and writes to the end of the file. When the first thread/process is scheduled again, its offset is stale, and its call to write() overwrites the second thread/process’s data.

The solution is to pass the O_APPEND status flag when opening the file and to write through the returned file descriptor. With O_APPEND, the kernel moves the offset to the end of the file and performs the write as a single atomic step, so the data is always appended to the end of the file.
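A minimal sketch of an append that relies on O_APPEND is shown here. The helper name append_atomic and the 0644 file mode are assumptions made for illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: appends msg to the file at path using O_APPEND.
 * Returns the number of bytes written, or -1 on error. */
ssize_t append_atomic(const char *path, const char *msg)
{
    /* O_APPEND makes every write() on this descriptor go to the end
     * of the file, so no separate lseek() is needed. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    ssize_t written = write(fd, msg, strlen(msg));
    close(fd);
    return written;
}
```

Because the seek and the write are no longer separate steps, there is no window in which another writer can make the offset stale.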

Creating Exclusive Files

What if we want a thread/process to be the sole creator of a file? Maybe you need to tightly control who creates a file versus who uses it. A first attempt would be to call open() without O_CREAT to check whether the file exists and, if it does not, to call open() again with O_CREAT.
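A minimal sketch of this check-then-create pattern, built from two open() calls, might look like the following. The helper name create_racy and the 0644 file mode are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns an open fd if this call believes it
 * created the file, or -1 (errno == EEXIST) if the file already existed. */
int create_racy(const char *path)
{
    /* First open(): probe whether the file already exists. */
    int fd = open(path, O_WRONLY);
    if (fd != -1) {
        close(fd);
        errno = EEXIST;
        return -1;
    }
    if (errno != ENOENT)
        return -1;

    /* Window: another thread/process may create the file right here. */

    /* Second open(): succeeds whether or not we are the real creator. */
    return open(path, O_WRONLY | O_CREAT, 0644);
}
```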

This approach runs into the same kind of race condition we discussed before. Imagine that the first thread/process is interrupted between the two open() calls and another thread/process comes along and creates the file. When the first thread/process is scheduled again, it carries on thinking that it is the creator of the file.

The solution is to use the O_EXCL flag together with the O_CREAT flag. With both flags set, open() checks for the file and creates it in a single atomic step, and fails with the error EEXIST if the file already exists. A successful call therefore guarantees that we are indeed the creator of the file.
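A minimal sketch of exclusive creation with O_CREAT | O_EXCL is shown here. The helper name create_exclusive and the 0644 file mode are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns an open fd only if this call created
 * the file. Returns -1 with errno == EEXIST if it already existed. */
int create_exclusive(const char *path)
{
    /* The existence check and the creation happen in one system call. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1 && errno == EEXIST) {
        /* Somebody else created the file first; we are only a user. */
        return -1;
    }
    return fd;
}
```

A caller can treat any non-negative return value as proof that it is the creator, and handle EEXIST as the "someone else got there first" case.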

Duplicating File Descriptors

Sometimes you might want to duplicate a file descriptor. For example, a shell has to make stderr refer to the same file as stdout for a command like some_script 2>&1. You might use the dup() function to do something like,

// Always check for error
close(2);
int dup_fd = dup(1);

In the above, you are taking advantage of the guarantee that Linux returns the smallest non-negative file descriptor number available. This is fine in a single-threaded process but again exhibits a race condition when several threads share the descriptor table. If a thread is interrupted after close() and another thread calls dup() or open(), descriptor 2 gets taken and your assumption becomes invalid. The right way is to use dup2(). The dup2() call takes the old fd and the new fd as its parameters and atomically duplicates the old fd onto the new fd. If the new fd is already open, it is closed for you as part of the same step. The new code will now be,

int dup_fd = dup2(1, 2);
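Since the comment in the earlier snippet says to always check for errors, here is a slightly fuller sketch of the dup2() call with that check added. The helper name redirect_fd is an assumption made for illustration.

```c
#include <unistd.h>

/* Hypothetical helper: makes new_fd refer to the same open file as
 * old_fd. Returns 0 on success, -1 on error (errno is set by dup2). */
int redirect_fd(int old_fd, int new_fd)
{
    /* dup2() closes new_fd if it is open and reuses its number in a
     * single step, so no other thread can claim the number in between. */
    if (dup2(old_fd, new_fd) == -1)
        return -1;
    return 0;
}
```

Calling redirect_fd(STDOUT_FILENO, STDERR_FILENO) would correspond to the 2>&1 case.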

Reading/Writing Contiguous Blocks

Imagine wanting to read contiguous blocks of data from a file into buffers of different types. A straightforward way is to issue one read() call per buffer, one after the other.
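A minimal sketch of such a read, with one read() call per buffer, might look like this. The record layout (struct record_header, a name field and a double) and the helper name read_record_sequential are assumptions made for illustration.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record layout: a fixed header, a name, then a value. */
struct record_header {
    uint32_t id;
    uint32_t length;
};

/* Hypothetical helper: fills three buffers with three separate read()
 * calls. Returns 0 on success, -1 on error or short read. */
int read_record_sequential(int fd, struct record_header *hdr,
                           char *name, size_t name_len, double *value)
{
    if (read(fd, hdr, sizeof *hdr) != (ssize_t)sizeof *hdr)
        return -1;
    /* The shared file offset can be moved by someone else here... */
    if (read(fd, name, name_len) != (ssize_t)name_len)
        return -1;
    /* ...and here, so the three pieces may not be contiguous. */
    if (read(fd, value, sizeof *value) != (ssize_t)sizeof *value)
        return -1;
    return 0;
}
```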

Here, again, we run into problems: the thread/process can be interrupted between two of the read() calls, and another thread/process sharing the same open file can come along and change its read/write offset. When the first thread/process is scheduled again, the data it reads will not be the data it expected. The right way to do this is to use the readv() call, which fills all the buffers from one contiguous region of the file in a single atomic operation.
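A minimal sketch of the same read done with readv() is shown here. As before, the record layout and the helper name read_record_vectored are assumptions made for illustration, and a short read is simply treated as an error.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical record layout: a fixed header, a name, then a value. */
struct record_header {
    uint32_t id;
    uint32_t length;
};

/* Hypothetical helper: fills three buffers with ONE readv() call.
 * Returns 0 on success, -1 on error or short read. */
int read_record_vectored(int fd, struct record_header *hdr,
                         char *name, size_t name_len, double *value)
{
    struct iovec iov[3];

    /* Buffers are filled in array order from one contiguous region. */
    iov[0].iov_base = hdr;
    iov[0].iov_len  = sizeof *hdr;
    iov[1].iov_base = name;
    iov[1].iov_len  = name_len;
    iov[2].iov_base = value;
    iov[2].iov_len  = sizeof *value;

    ssize_t want = (ssize_t)(sizeof *hdr + name_len + sizeof *value);
    ssize_t got = readv(fd, iov, 3);
    return got == want ? 0 : -1;
}
```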

The same problem occurs when writing to a file. The counterpart for contiguous writes is writev(), which writes all the buffers as one contiguous block.
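A minimal sketch of a contiguous write using writev() might look like this. The record layout and the helper name write_record_vectored are assumptions made for illustration.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical record layout: a fixed header, a name, then a value. */
struct record_header {
    uint32_t id;
    uint32_t length;
};

/* Hypothetical helper: writes three buffers with ONE writev() call.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_record_vectored(int fd, const struct record_header *hdr,
                              const char *name, size_t name_len,
                              const double *value)
{
    struct iovec iov[3];

    /* iov_base is a non-const pointer, hence the casts; writev()
     * only reads from these buffers. */
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = sizeof *hdr;
    iov[1].iov_base = (void *)name;
    iov[1].iov_len  = name_len;
    iov[2].iov_base = (void *)value;
    iov[2].iov_len  = sizeof *value;

    /* The buffers are written in array order as one block. */
    return writev(fd, iov, 3);
}
```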

Close-On-Exec

Sometimes we do not want to pass file descriptors to unsafe programs. Since open file descriptors are inherited by children across fork() and stay open across exec(), we need a way to have them closed automatically during exec(). The non-atomic way is to open the file and then use the fcntl() F_GETFD and F_SETFD commands to set the FD_CLOEXEC flag on the descriptor. With multiple threads in play, a race condition can occur: another thread may call fork() and exec() between the open() and fcntl() calls and leak the descriptor to the new program. The atomic way is to pass the O_CLOEXEC flag to open(), as in,

int fd = open(filename, O_RDWR | O_CLOEXEC);
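For contrast, the two variants can be sketched side by side. The helper names open_cloexec_racy and open_cloexec_atomic are assumptions made for illustration; both end up with the FD_CLOEXEC flag set, and the difference is only whether there is a window before it takes effect.

```c
/* Request POSIX.1-2008 declarations (O_CLOEXEC) under strict modes. */
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: sets close-on-exec in a second step. */
int open_cloexec_racy(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    /* Window: a fork() + exec() in another thread here inherits fd. */

    int flags = fcntl(fd, F_GETFD);
    if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: the flag is set by open() itself. */
int open_cloexec_atomic(const char *path)
{
    return open(path, O_RDWR | O_CLOEXEC);
}
```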

That’s it. For more details on Linux’s workings, refer to this wonderful book from Michael Kerrisk.
