In this lecture, we mainly talk about files, as they are the easiest example of I/O to manipulate. However, note that the content of the first 3 sections applies to I/O other than files (e.g. sockets).
A file is a series of contiguous bytes stored in a medium (for example, a disk) under a name (the “name of the file”).
We distinguish several types of files. In a text file, the end of a line is not encoded in the same way on all systems: on Unix, it is the character of ASCII code 10, while on Windows it is the character of ASCII code 13 followed by the character of ASCII code 10.
On Unix, the commands hexdump -C filename, bless filename or xxd filename show the exact content of a file. Use them to compare the contents of helloWorldUnix.c and helloWorldWindows.c, and to see that the file default_names_fichierIssuDuTP10DuModuleCSC4103.txt is not quite a text file (see also how the accented characters are stored in a file).
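As a small illustration of the two line-ending conventions, the sketch below counts them in a byte buffer. The function name count_line_endings is hypothetical (it is not part of the course files), and the buffer is assumed to already hold the content of the file.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the line endings found in buf[0..len): a byte 13 followed by a
 * byte 10 counts as one Windows ending, any other byte 10 counts as one
 * Unix ending. (Hypothetical helper, for illustration only.) */
void count_line_endings(const unsigned char *buf, size_t len,
                        size_t *unix_eol, size_t *windows_eol)
{
    assert(unix_eol != NULL && windows_eol != NULL);
    *unix_eol = 0;
    *windows_eol = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 10)
            continue;                 /* not an end-of-line byte */
        if (i > 0 && buf[i - 1] == 13)
            (*windows_eol)++;         /* 13 then 10 */
        else
            (*unix_eol)++;            /* 10 alone */
    }
}
```

A file produced on Windows would typically give a non-zero Windows count, which is what hexdump -C shows as the byte pair 0d 0a.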
When you “open” a file, the operating system provides a notion of current position (sometimes called offset in the rest of this course) for reading or writing.
This current position determines which byte in the file will be read/written during the next I/O operation.
This offset advances each time a read or write operation is performed.
The operating system provides the user with primitives to explicitly change this position (without reading or writing bytes).
The “end of a file” corresponds to the location just after the last byte of the file. When a program reaches the end of the file, it cannot read any more bytes. It can, however, still write bytes (depending on the mode in which the file was opened).
There are 3 ways to access a file: sequential access, direct access, and indexed sequential access.
The Linux system and the C library provide sequential and direct access modes. For an indexed sequential access mode, other libraries are required (Unix NDBM, GDBM, Oracle Berkeley DB, …).
Buffered I/O
\(\rightarrow\) a buffered I/O \(\neq\) an operation on the disk
Functions: fopen, fread, fscanf, fwrite, fprintf, etc.
A file is identified by a FILE*
Unbuffered I/O \(\dag\)
Functions: open, read, write, etc.
A file is identified by an int (a file descriptor)
\(\dag\) To be exact, an “unbuffered” I/O generates a system call. The OS can then decide whether to cache the data or not.
int open(const char *path, int flags, mode_t mode)
returns f_id
flags can take one of the following values:
O_RDONLY: read only
O_WRONLY: write only
O_RDWR: read and write
Additional flags:
O_APPEND: append data (write at the end of the file)
O_TRUNC: truncate (empty) the file when opening it
O_CREAT: creation if the file does not exist. The permissions are \((mode\;\&\;\sim umask)\)
O_SYNC: open file in synchronous write mode
O_NONBLOCK (or O_NDELAY): open and subsequent operations performed on the descriptor will be non-blocking.
int close(int desc)
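The effect of the umask on the permissions of a created file can be checked with a few lines of C. This is a minimal sketch: the function name created_permissions and the path used in the usage example are assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates path with the requested mode while the given umask is active,
 * then returns the permission bits actually set on the file (-1 on error).
 * The file is removed before returning. */
int created_permissions(const char *path, mode_t mode, mode_t mask)
{
    struct stat st;
    mode_t old_mask;
    int fd, ok;

    assert(path != NULL);
    unlink(path);                        /* mode only applies to a new file */
    old_mask = umask(mask);              /* temporarily install the umask */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
    umask(old_mask);                     /* restore the previous umask */
    if (fd < 0)
        return -1;

    ok = fstat(fd, &st);
    close(fd);
    unlink(path);
    return ok == 0 ? (int)(st.st_mode & 0777) : -1;
}
```

For example, created_permissions("/tmp/perm_demo_file", 0666, 022) is expected to return 0644, that is 0666 & ~022.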
About the O_SYNC option in open:
To improve performance, by default, during a write operation, the operating system does not physically write the bytes to disk (they are stored in a kernel cache, waiting to be written to disk).
Therefore, in the event of a sudden stop of the machine (example: power outage), the bytes that are still in the kernel cache are lost.
Solutions to synchronize file data in memory with the disk:
O_SYNC option when opening the file;
int fsync(int fd) primitive.
Note that we can also create a file using the creat primitive:
int creat(const char *path, mode_t mode): return value = f_id
This is equivalent to the following call to open: open(path, O_WRONLY|O_CREAT|O_TRUNC, mode).
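As an illustration of the fsync solution, the sketch below appends a record to a file and only reports success once the kernel has been asked to push it to the disk. The function name durable_append and the log path in the usage example are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Appends count bytes to path and forces them to the disk with fsync.
 * Returns 0 on success, -1 on error. */
int durable_append(const char *path, const void *buf, size_t count)
{
    int fd, result = -1;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    /* After write, the bytes may still be only in the kernel cache... */
    if (write(fd, buf, count) == (ssize_t)count) {
        /* ...fsync returns once they have been transferred to the device. */
        result = fsync(fd);
    }
    if (close(fd) < 0)
        result = -1;
    return result;
}
```

Opening the file with O_SYNC would give a similar guarantee for every write, at the cost of making each write wait for the disk.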
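ssize_t read(int fd, void *buf, size_t count)
returns the number of bytes successfully read
When read returns, the buf zone contains the read data.
In the case of a file, the number of bytes read may not be equal to count: for example, fewer than count bytes remained before the end of the file (a return value of 0 means that the end of the file was already reached).
In the case where the read function is used on a descriptor other than a file (e.g. a pipe, or a socket), the fact that the number of bytes read may not equal count may have other meanings: for example, fewer than count bytes were available when read was called.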
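Because read may return fewer bytes than requested, code that needs exactly count bytes usually calls read in a loop. The helper below is a sketch of that pattern; the name read_full is an assumption, not a standard function.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Calls read until count bytes have been obtained, the end of the data is
 * reached, or an error occurs. Returns the number of bytes actually read
 * (possibly less than count at end of file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t total = 0;

    assert(buf != NULL);
    while (total < count) {
        ssize_t n = read(fd, (char *)buf + total, count - total);
        if (n == 0)
            break;                    /* end of file / other end closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)n;           /* short read: keep going */
    }
    return (ssize_t)total;
}
```

The same loop works for a regular file, a pipe or a socket; only the reason for a short read differs.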
ssize_t write(int fd, const void *buf, size_t count)
returns the number of bytes written
In the case of a file, the return value (without error) of the write operation means that:
the bytes were handed over to the operating system (kernel cache), unless O_SYNC was specified at file open;
the bytes were written to disk, if O_SYNC was specified.
In the case of a file, a number of bytes written that is different from count means an error (e.g. No space left on device).
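On descriptors other than files (pipes, sockets), a write may also transfer only part of the buffer without this being an error. A common defensive pattern, sketched here under the hypothetical name write_all, is to loop until everything has been written.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Writes exactly count bytes from buf to fd, retrying after partial
 * writes and signal interruptions. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t count)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < count) {
        ssize_t n = write(fd, (const char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            return -1;                /* real error, errno is set */
        }
        done += (size_t)n;            /* partial write: send the rest */
    }
    return 0;
}
```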
Writing to disk is atomic: if two processes \(P_1\) and \(P_2\) simultaneously write to the same file in the same location, when the two processes have finished their writing, we will find either the data written by \(P_1\) or the data written by \(P_2\), but never a mix of both.
Note that when the file is opened with the option O_APPEND, if \(P_1\) and \(P_2\) write simultaneously (at the end of the file, because of O_APPEND), when the two processes have finished their writing, we will find at the end of the file either the data written by \(P_1\) followed by the data written by \(P_2\), or the data written by \(P_2\) followed by the data written by \(P_1\).
No write is therefore lost! Note, however, that this concurrent write at the end of the file is not equivalent to two processes simultaneously performing the following operations:
lseek(fd,0,SEEK_END); /* move the cursor to the end of file */
write(fd,data,taille);
In fact, in the latter case, the data written by one process may be overwritten by the other.
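The O_APPEND behaviour can be checked experimentally with two processes appending to the same file. The sketch below is an illustration under assumptions: the function name concurrent_append, the record content and the number of records are made up for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORDS 100
#define RECORD  "0123456789abcde\n"   /* 16 bytes per record */

/* A parent and a child each append RECORDS records to path through their
 * own O_APPEND descriptor. Returns the final file size, or -1 on error. */
off_t concurrent_append(const char *path)
{
    struct stat st;
    int fd, status, failed = 0;
    pid_t pid;

    assert(path != NULL);
    unlink(path);
    pid = fork();
    if (pid < 0)
        return -1;

    /* Both processes open the file independently. */
    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0) {
        failed = 1;
    } else {
        for (int i = 0; i < RECORDS && !failed; i++)
            if (write(fd, RECORD, strlen(RECORD)) != (ssize_t)strlen(RECORD))
                failed = 1;
        close(fd);
    }
    if (pid == 0)
        _exit(failed);                /* child: report through exit status */

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0 || failed || stat(path, &st) < 0)
        return -1;
    unlink(path);
    return st.st_size;
}
```

Since no write is lost, the final size is expected to be 2 x 100 x 16 = 3200 bytes. Replacing O_APPEND with an lseek to SEEK_END before each write would not give this guarantee.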
The copy.c file below illustrates the use of open, read, write and close.
/************/
/* copy.c   */
/************/
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>

#define USAGE "USAGE: copy src dest\n"
#define WRITE_ERROR "write error (no space left on device ?)\n"

int source, dest;
int buf;
int nb_read, nb_written;

int main(int argc, char *argv[]) {
  if (argc != 3) {
    write(STDERR_FILENO, USAGE, strlen(USAGE));
    return EXIT_FAILURE;
  }
  source = open(argv[1], O_RDONLY);
  if (source < 0) {
    perror(argv[1]);
    return EXIT_FAILURE;
  }
  dest = open(argv[2],
              O_WRONLY|O_CREAT|O_TRUNC,
              S_IRWXU|S_IRWXG|S_IRWXO);
  if (dest < 0) {
    perror(argv[2]);
    return EXIT_FAILURE;
  }
  while ((nb_read = read(source, (void*)&buf, sizeof(buf))) > 0) {
    nb_written = write(dest, (void*)&buf, nb_read);
    if (nb_written <= 0) {
      if (nb_written == 0) {
        write(STDERR_FILENO, WRITE_ERROR, strlen(WRITE_ERROR));
      }
      else {
        perror("write");
      }
      return EXIT_FAILURE;
    }
  }
  if (nb_read < 0) {
    perror("read");
    return EXIT_FAILURE;
  }
  if (close(source) < 0) {
    perror(argv[1]);
    return EXIT_FAILURE;
  }
  if (close(dest) < 0) {
    perror(argv[2]);
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
This operation of copying the contents of one file to another descriptor is frequently performed in web servers. Indeed, these servers must in particular send the content of files to the clients that have requested them. This is why the Linux system offers the sendfile primitive (ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)). It reads count bytes from in_fd and writes them to out_fd (which had to be a socket before Linux 2.6.33 and can be any file since then). sendfile is more efficient than the combination read / write, since the data is copied inside the kernel without going through a user-space buffer.
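A minimal sketch of sendfile is given below. It is Linux specific. The names send_whole_file and sendfile_demo are hypothetical, and a local socket pair stands in for the connection to a client.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sends the whole content of in_fd to out_fd. Returns 0 or -1. */
int send_whole_file(int out_fd, int in_fd)
{
    struct stat st;
    off_t offset = 0;                 /* updated by sendfile after each call */

    if (fstat(in_fd, &st) < 0)
        return -1;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0)
            return -1;
    }
    return 0;
}

/* Sends a 5-byte scratch file through a Unix socket pair and returns the
 * number of bytes received at the other end (-1 on error). */
ssize_t sendfile_demo(char *received, size_t cap)
{
    char path[] = "/tmp/sendfile_demoXXXXXX"; /* hypothetical scratch file */
    int sv[2];
    ssize_t n = -1;
    int fd = mkstemp(path);

    assert(received != NULL);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "hello", 5) == 5 &&
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0) {
        if (send_whole_file(sv[0], fd) == 0)
            n = read(sv[1], received, cap);
        close(sv[0]);
        close(sv[1]);
    }
    close(fd);
    return n;
}
```

Note that no buffer is declared for the copy itself: the bytes never enter the address space of the process.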
The fallocate function is the Linux-specific version of the portable function posix_fallocate.
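As an illustration, the sketch below reserves space for a new file with posix_fallocate and checks the resulting size. The function name preallocated_size and the scratch file are assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reserves len bytes for a new scratch file and returns the resulting
 * file size (-1 on error). */
off_t preallocated_size(off_t len)
{
    char path[] = "/tmp/fallocate_demoXXXXXX"; /* hypothetical scratch file */
    struct stat st;
    off_t size = -1;
    int err;
    int fd = mkstemp(path);

    assert(len > 0);
    if (fd < 0)
        return -1;
    unlink(path);

    /* Unlike most primitives, posix_fallocate returns the error number
     * directly (0 on success) instead of setting errno. */
    err = posix_fallocate(fd, 0, len);
    if (err == 0 && fstat(fd, &st) == 0)
        size = st.st_size;
    close(fd);
    return size;
}
```

Once the space is reserved, later writes inside the reserved range are not expected to fail for lack of disk space.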
int dup(int old_fd)
returns new_fd, a new descriptor that is a synonym of old_fd
int dup2(int old_fd, int new_fd)
forces new_fd to become a synonym of the old_fd descriptor. If the descriptor new_fd is not available, the system first closes it (close(new_fd)).
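A classical use of dup and dup2 is to redirect the standard output to a file. The sketch below is one possible way to do it; the function name write_via_stdout and the path in the usage example are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Temporarily makes descriptor 1 (standard output) a synonym of a file,
 * writes text to descriptor 1, then restores the original standard output.
 * Returns 0 on success, -1 on error. */
int write_via_stdout(const char *path, const char *text)
{
    int fd, saved, result = -1;

    assert(path != NULL && text != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    saved = dup(STDOUT_FILENO);            /* keep a synonym of the old stdout */
    if (saved >= 0) {
        if (dup2(fd, STDOUT_FILENO) >= 0) { /* closes 1, then 1 == fd */
            size_t len = strlen(text);
            if (write(STDOUT_FILENO, text, len) == (ssize_t)len)
                result = 0;
            dup2(saved, STDOUT_FILENO);    /* put the old stdout back */
        }
        close(saved);
    }
    close(fd);
    return result;
}
```

After write_via_stdout("/tmp/dup2_demo.txt", "hello"), the file contains the 5 bytes written to descriptor 1. This is essentially what a shell does for a redirection such as cmd > file.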
struct flock {
short l_type;
short l_whence;
off_t l_start;
off_t l_len;
};
int fcntl(int fd, F_SETLK, struct flock*lock);
Locks are attached to an inode. So locking a file affects all file descriptors (and therefore all open files) corresponding to this inode.
A lock is the property of a process: this process is the only one authorized to modify or remove it.
Locks have a scope of \([integer1: integer2]\) or \([integer: \infty]\).
Locks have a type:
F_RDLCK: allows concurrent read access
F_WRLCK: exclusive access
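The fact that a lock belongs to a process can be observed with the non-blocking F_SETLK command: a second process that asks for a conflicting lock is refused immediately. The sketch below uses hypothetical names (try_lock_byte, lock_conflict_demo) and an arbitrary file path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Tries, without blocking, to set a lock of the given type on the single
 * byte at the given offset. Returns 0 on success, -1 on error. */
int try_lock_byte(int fd, short type, off_t offset)
{
    struct flock lock = {0};

    lock.l_type = type;
    lock.l_whence = SEEK_SET;
    lock.l_start = offset;
    lock.l_len = 1;
    return fcntl(fd, F_SETLK, &lock);
}

/* The parent takes an exclusive lock, then a child process asks for the
 * same lock. Returns 1 if the child was refused, 0 if it got the lock,
 * and another value on error. */
int lock_conflict_demo(const char *path)
{
    int fd, status;
    pid_t pid;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0 || try_lock_byte(fd, F_WRLCK, 15) < 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Locks are not inherited through fork: the child is a different
         * owner, so its request conflicts with the lock of the parent. */
        int r = try_lock_byte(fd, F_WRLCK, 15);
        if (r == 0)
            _exit(0);
        _exit((errno == EAGAIN || errno == EACCES) ? 1 : 2);
    }
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        close(fd);
        return -1;
    }
    close(fd);                        /* closing also releases the lock */
    return WEXITSTATUS(status);
}
```

With F_SETLKW instead of F_SETLK, the child would block until the parent releases its lock.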
The exclusive-lock.c file illustrates exclusive file locking:
/********************/
/* exclusive_lock.c */
/********************/
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main(){
  int fd;
  struct flock lock;

  fd = open("/tmp/ficTest",O_RDWR|O_CREAT, S_IRWXU|S_IRWXG|S_IRWXO);
  if (fd < 0) {
    perror("open");
    exit(EXIT_FAILURE);
  }

  /* Exclusive lock on the byte at offset 15 */
  lock.l_type = F_WRLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 15;
  lock.l_len = 1;

  /* Because of the F_SETLKW parameter, we get stuck on the fcntl if */
  /* the lock cannot be acquired */
  printf("attempt to acquire an exclusive lock by process %d...\n",
         getpid());
  if (fcntl(fd, F_SETLKW, &lock) < 0){
    perror("Acquiring lock");
    exit(EXIT_FAILURE);
  }
  printf("... Exclusive lock acquired by process %d\n", getpid());

  /* Here we could do the processing that needed to be protected */
  /* by the lock */
  sleep(10);

  /* Release the lock */
  printf("Releasing the lock by process %d...\n", getpid());
  lock.l_type = F_UNLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 15;
  lock.l_len = 1;
  if (fcntl(fd, F_SETLK, &lock) < 0){
    perror("Releasing lock");
    exit(EXIT_FAILURE);
  }
  printf("...OK\n");

  return EXIT_SUCCESS;
}
The shared-lock.c file illustrates shared locking:
/*****************/
/* shared_lock.c */
/*****************/
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main(){
  int fd;
  struct flock lock;

  fd = open("/tmp/ficTest",O_RDWR|O_CREAT, S_IRWXU|S_IRWXG|S_IRWXO);
  if (fd < 0) {
    perror("open");
    exit(EXIT_FAILURE);
  }

  /* Shared lock on the byte at offset 15 */
  lock.l_type = F_RDLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 15;
  lock.l_len = 1;

  /* Because of the F_SETLKW parameter, we get stuck on the fcntl if */
  /* the lock cannot be acquired */
  printf("attempt to acquire a shared lock by process %d...\n",
         getpid());
  if (fcntl(fd, F_SETLKW, &lock) < 0){
    perror("Acquiring lock");
    exit(EXIT_FAILURE);
  }
  printf("... shared lock acquired by process %d\n", getpid());

  /* Here we could do the processing that needed to be protected */
  /* by the lock */
  sleep(10);

  /* Release the lock */
  printf("Releasing the lock by process %d...\n", getpid());
  lock.l_type = F_UNLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 15;
  lock.l_len = 1;
  if (fcntl(fd, F_SETLK, &lock) < 0){
    perror("Releasing lock");
    exit(EXIT_FAILURE);
  }
  printf("...OK\n");

  return EXIT_SUCCESS;
}
If we run exclusive-lock first, a second exclusive-lock or a shared-lock has to wait before it can set its lock.
If we run shared-lock first, another shared-lock can set the (shared) lock. On the other hand, an exclusive-lock must wait to be able to lock.
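Note that exclusive_lock may suffer starvation: it keeps waiting as long as at least one shared lock is held, so it may never get the lock if shared-lock processes keep arriving before the previous ones release theirs.
To prevent this starvation, we must add a mutual exclusion.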
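off_t lseek(int fd, off_t unOffset, int origine)
returns the new offset
allows the offset of the file to be changed explicitly
Warning! There is a race condition if several threads manipulate the file, because the offset is shared: another thread may move it between the lseek and the read or write that follows.
Solutions include using pread or pwrite instead of lseek + read or lseek + write.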
int posix_fadvise(int fd, off_t offset, off_t len, int advice)
Examples of advice: POSIX_FADV_SEQUENTIAL, POSIX_FADV_RANDOM, POSIX_FADV_WILLNEED
Since January 2011, we know that this function is used in Firefox to reduce startup time by 40 % to 50 % by loading the GUI libraries xul.dll and mozjs.dll more efficiently (more information here <https://bugzilla.mozilla.org/show_bug.cgi?id=627591>).
int aio_read(struct aiocb *aiocbp);
int aio_write(struct aiocb *aiocbp);
int aio_suspend(const struct aiocb * const aiocb_list[],
int nitems,
const struct timespec *timeout);
int aio_error(const struct aiocb *aiocbp);
For more information on asynchronous I/O, refer to the documentation (man 7 aio).
The current implementation of POSIX AIO is provided in user-land by libc and can cause scalability issues. Another solution is to use the asynchronous I/O interface provided by the Linux kernel (see the system calls io_submit, io_setup, etc.), or the libaio library, which provides a wrapper around these Linux system calls.
void *mmap(void *addr,
size_t length,
int prot,
int flags,
int fd,
off_t offset);
int munmap(void *addr, size_t length);
To ensure that the memory accesses have been passed on to the disk, you can use the msync function.
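The sketch below combines mmap, msync and munmap to modify the beginning of an existing file through memory accesses. The function name patch_file_via_mmap is hypothetical, and the file is assumed to contain at least as many bytes as the text to write.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrites the first strlen(text) bytes of an existing file through a
 * shared memory mapping. Assumes the file is at least that long and that
 * text is not empty. Returns 0 on success, -1 on error. */
int patch_file_via_mmap(const char *path, const char *text)
{
    size_t len;
    char *map;
    int fd, result;

    assert(path != NULL && text != NULL);
    len = strlen(text);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* MAP_SHARED: modifications of the memory are carried to the file. */
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, text, len);           /* a plain memory access, no write call */
    result = msync(map, len, MS_SYNC); /* wait until the data is passed on */
    munmap(map, len);
    return result;
}
```

For example, applying patch_file_via_mmap(path, "HELLO") to a file that contains "hello world" is expected to leave "HELLO world" in the file.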