Input/output

François Trahay


In this lecture, we mainly talk about files, as they are the easiest example of I/O to manipulate. However, note that the content of the first 3 sections applies to I/O other than files (e.g. sockets).

Reminder on files:

On Unix, the commands hexdump -C filename, bless filename or xxd filename show the exact content of a file. Use them to

  1. compare the contents of helloWorldUnix.c and helloWorldWindows.c

  2. see that the file default_names_fichierIssuDuTP10DuModuleCSC4103.txt is not quite a text file (and also see how accented characters are stored in a file)

The Linux system and the C library provide sequential and direct access modes. For an indexed sequential access mode, other libraries are required (Unix NDBM, GDBM, Oracle Berkeley DB, …).
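
As a rough illustration of direct access, the sketch below fetches one byte at an arbitrary position by moving the file offset with lseek, instead of reading everything that precedes it. The helper name read_byte_at is hypothetical and only serves the example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: direct access to the byte stored at offset `pos`.
 * Returns the byte (0..255), or -1 on error or if `pos` is at or past
 * the end of the file. */
int read_byte_at(const char *path, off_t pos) {
  unsigned char c;
  int result = -1;
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  /* Direct access: jump to `pos` without reading what precedes it */
  if (lseek(fd, pos, SEEK_SET) != (off_t)-1 && read(fd, &c, 1) == 1)
    result = c;
  close(fd);
  return result;
}
```

Sequential access is what happens when read is called repeatedly without lseek: each call continues where the previous one stopped.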


Buffered / non-buffered IO

\(\dag\) To be exact, an “unbuffered” I/O generates a system call. The OS can then decide whether or not to cache the data.
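
The difference can be observed with a small experiment, sketched here under the assumption that the stream uses the default full buffering that the C library applies to regular files. Data written with fwrite stays in a user-space buffer until fflush triggers the write system call. The helper names file_size and buffered_write_demo are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: size of `path` as seen by the kernel, or -1 */
long file_size(const char *path) {
  struct stat st;
  return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Writes 5 bytes through stdio and reports the file size before and
 * after fflush. Returns 0 on success, -1 on error. */
int buffered_write_demo(const char *path, long *before_flush,
                        long *after_flush) {
  FILE *f = fopen(path, "w");
  if (f == NULL)
    return -1;
  /* With default full buffering, the data only reaches the buffer of `f` */
  if (fwrite("hello", 1, 5, f) != 5) {
    fclose(f);
    return -1;
  }
  *before_flush = file_size(path);
  /* fflush hands the buffered data to the kernel (write system call) */
  if (fflush(f) != 0) {
    fclose(f);
    return -1;
  }
  *after_flush = file_size(path);
  return fclose(f) == 0 ? 0 : -1;
}
```

An unbuffered write(fd, "hello", 5) on the same file would instead be visible to stat as soon as the call returns.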


I/O primitives


File open / close

About the O_SYNC option in open:

Note that we can also create a file using the creat primitive:
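
A minimal sketch of such a call follows. It relies on the fact that creat(path, mode) behaves like open(path, O_WRONLY | O_CREAT | O_TRUNC, mode); the helper name create_empty_file and the chosen permissions are assumptions made for the example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create `path` (or truncate it if it exists).
 * Returns 0 on success, -1 on error. */
int create_empty_file(const char *path) {
  /* Same effect as open(path, O_WRONLY | O_CREAT | O_TRUNC, mode) */
  int fd = creat(path, S_IRUSR | S_IWUSR);
  if (fd < 0)
    return -1;
  return close(fd);
}
```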


Reading on a file descriptor

In the case where the read function is used on a descriptor other than a file (e.g. a pipe, or a socket), the fact that the number of bytes read may not equal count may have other meanings:
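
One practical consequence, stated here as general POSIX behaviour rather than as a quotation from the course material: on a pipe or a socket, read returns as soon as some data is available, so a program that needs exactly count bytes usually has to loop. The helper read_full below is a hypothetical sketch of such a loop.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep reading until `count` bytes have been
 * received, end of file is reached, or an error occurs.
 * Returns the number of bytes read (less than `count` only at end of
 * file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count) {
  size_t total = 0;
  while (total < count) {
    ssize_t n = read(fd, (char *)buf + total, count - total);
    if (n < 0) {
      if (errno == EINTR)
        continue; /* interrupted by a signal: retry */
      return -1;
    }
    if (n == 0)
      break; /* end of file, or the writing end was closed */
    total += (size_t)n;
  }
  return (ssize_t)total;
}
```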


Writing on a file descriptor

Writing to disk is atomic: if two processes \(P_1\) and \(P_2\) simultaneously write to the same file in the same location, when the two processes have finished their writing, we will find:

Note that when the file is opened with the O_APPEND option, if \(P_1\) and \(P_2\) write simultaneously (at the end of the file, because of O_APPEND), once both processes have finished writing, we will find at the end of the file:

No write is therefore lost! Be careful: this concurrent write at the end of the file is not equivalent to two processes simultaneously performing the following operations:

lseek(fd,0,SEEK_END); /* move the cursor to the end of file */
write(fd,data,taille);

In fact, in the latter case, the data written by one process may be overwritten by the other.
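
The difference can be reproduced deterministically inside a single process. The sketch below uses two independent descriptors in place of the two processes and forces the bad interleaving by hand: both descriptors seek to the end before either of them writes. The function name end_of_file_race is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: two independent descriptors on the same file each
 * write 4 bytes "at the end". Returns the final file size, or -1. */
long end_of_file_race(const char *path, int use_append) {
  int flags = O_WRONLY | (use_append ? O_APPEND : 0);
  struct stat st;
  long size = -1;
  int fd1, fd2;

  /* Start from an empty file */
  fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
  if (fd1 < 0)
    return -1;
  close(fd1);

  fd1 = open(path, flags);
  fd2 = open(path, flags);
  if (fd1 >= 0 && fd2 >= 0) {
    /* Both writers locate the end of the file: offset 0 for each */
    lseek(fd1, 0, SEEK_END);
    lseek(fd2, 0, SEEK_END);
    /* Without O_APPEND both writes land at offset 0; with O_APPEND the
     * kernel moves each write to the current end of the file */
    if (write(fd1, "AAAA", 4) == 4 && write(fd2, "BBBB", 4) == 4 &&
        stat(path, &st) == 0)
      size = (long)st.st_size;
  }
  if (fd1 >= 0)
    close(fd1);
  if (fd2 >= 0)
    close(fd2);
  return size;
}
```

Called with use_append set to 0, the file ends up with 4 bytes because the second write overwrites the first; with use_append set to 1, it holds all 8 bytes.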

The copy.c file below illustrates the use of open, read, write and close.

/************/
/* copy.c */
/************/
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>

#define USAGE "USAGE: copy src dest\n"
#define WRITE_ERROR "write error (no space left on device ?)\n"

int source, dest;
int buf;
int nb_read, nb_written;

int main(int argc, char *argv[]) {
  if (argc != 3) {
    write(STDERR_FILENO, USAGE, strlen(USAGE));
    return EXIT_FAILURE;
  }
  source = open(argv[1], O_RDONLY);
  if (source < 0) {
    perror(argv[1]);
    return EXIT_FAILURE;
  }
  dest = open(argv[2],
              O_WRONLY|O_CREAT|O_TRUNC,
              S_IRWXU|S_IRWXG|S_IRWXO);
  if (dest < 0) {
    perror(argv[2]);
    return EXIT_FAILURE;
  }
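  while ((nb_read = read(source, (void*)&buf, sizeof(buf))) > 0) {
    /* write may accept fewer bytes than requested: loop until all are out */
    int done = 0;
    while (done < nb_read) {
      nb_written = write(dest, (char*)&buf + done, nb_read - done);
      if (nb_written <= 0) {
        if (nb_written == 0) {
          write(STDERR_FILENO, WRITE_ERROR, strlen(WRITE_ERROR));
        }
        else {
          perror("write");
        }
        return EXIT_FAILURE;
      }
      done += nb_written;
    }
  }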
  if (nb_read < 0) {
    perror("read");
    return EXIT_FAILURE;
  }
  if (close(source) < 0) {
    perror(argv[1]);
    return EXIT_FAILURE;
  }
  if (close(dest) < 0) {
    perror(argv[2]);
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}

Copying the contents of a file to another descriptor is an operation that web servers perform frequently, since they must send the content of files to the clients that requested them. This is why Linux offers the sendfile primitive (ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)). It reads count bytes from in_fd and writes them to out_fd (which had to be a socket before Linux 2.6.33, and can be any file since then). sendfile is more efficient than the read / write combination because the copy is done inside the kernel, without moving the data through user space.
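
A possible use of sendfile is sketched below. It is Linux-specific, and the helper name send_whole_file is hypothetical. Since sendfile may transfer fewer bytes than requested, the sketch loops until the whole file has been sent.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole content of `in_fd` to `out_fd`.
 * Returns 0 on success, -1 on error. Linux only. */
int send_whole_file(int out_fd, int in_fd) {
  struct stat st;
  off_t offset = 0;

  if (fstat(in_fd, &st) < 0)
    return -1;
  /* sendfile advances `offset` by the number of bytes transferred */
  while (offset < st.st_size) {
    ssize_t n = sendfile(out_fd, in_fd, &offset,
                         (size_t)(st.st_size - offset));
    if (n < 0)
      return -1;
    if (n == 0)
      break; /* source shorter than expected: stop instead of spinning */
  }
  return offset == st.st_size ? 0 : -1;
}
```

Because an explicit offset is passed, the file offset of in_fd itself is left untouched.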

The fallocate function is the Linux-specific version of the portable function posix_fallocate.
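
As an illustration of the portable variant, the sketch below creates a file and reserves space for it in a single step. The helper name create_preallocated is hypothetical. Note that posix_fallocate reports failures through its return value (an error number), not through errno.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and reserve `len` bytes for it.
 * Returns 0 on success, -1 on error. */
int create_preallocated(const char *path, off_t len) {
  int err;
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
  if (fd < 0)
    return -1;
  /* Returns 0 on success and an error number on failure */
  err = posix_fallocate(fd, 0, len);
  close(fd);
  return err == 0 ? 0 : -1;
}
```

After a successful call, the file size is at least len, and later writes inside that range should not fail for lack of disk space.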


File descriptor duplication


I/O and concurrency


Locking a file

struct flock {
  short l_type;
  short l_whence;
  off_t l_start;
  off_t l_len;
};

int fcntl(int fd, F_SETLK, struct flock*lock);

The exclusive_lock.c file illustrates exclusive file locking:

/********************/
/* exclusive_lock.c */
/********************/
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main(){
  int fd;
  struct flock lock;

  fd = open("/tmp/ficTest",O_RDWR|O_CREAT, S_IRWXU|S_IRWXG|S_IRWXO);
  if (fd < 0) {
    perror("open");
    exit(EXIT_FAILURE);
  }

  /* Exclusive lock on the single byte at offset 15 */
  lock.l_type = F_WRLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 15;
  lock.l_len = 1;

  /* Because of the F_SETLKW parameter, we get stuck on the fcntl if */
  /* the lock cannot be acquired                                   */
  printf("attempt to acquire an exclusive lock by process %d...\n",
	 getpid());
  if (fcntl(fd, F_SETLKW, &lock) < 0){
    perror("Acquiring lock");
    exit(EXIT_FAILURE);
  }
  printf("... Exclusive lock acquired by process %d\n", getpid());

  /* Here we could do the processing that needed to be protected */
  /* by the lock                                                 */
  sleep(10);

  /* Release the lock */
  printf("Releasing the lock by process %d...\n", getpid());
  lock.l_type = F_UNLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 15;
  lock.l_len = 1;
  if (fcntl(fd, F_SETLK, &lock) < 0){
    perror("Releasing lock");
    exit(EXIT_FAILURE);
  }
  printf("...OK\n");

  return EXIT_SUCCESS;
}

The shared_lock.c file illustrates shared locking:

/*****************/
/* shared_lock.c */
/*****************/
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main(){
  int fd;
  struct flock lock;

  fd = open("/tmp/ficTest",O_RDWR|O_CREAT, S_IRWXU|S_IRWXG|S_IRWXO);
  if (fd < 0) {
    perror("open");
    exit(EXIT_FAILURE);
  }

  /* Shared lock on the single byte at offset 15 */
  lock.l_type = F_RDLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 15;
  lock.l_len = 1;

  /* Because of the F_SETLKW parameter, we get stuck on the fcntl if */
  /* the lock cannot be acquired                                   */
  printf("attempt to acquire a shared lock by process %d...\n",
	 getpid());
  if (fcntl(fd, F_SETLKW, &lock) < 0){
    perror("Acquiring lock");
    exit(EXIT_FAILURE);
  }
  printf("... shared lock acquired by process %d\n", getpid());

  /* Here we could do the processing that needed to be protected */
  /* by the lock                                                 */
  sleep(10);

  /* Release the lock */
  printf("Releasing the lock by process %d...\n", getpid());
  lock.l_type = F_UNLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 15;
  lock.l_len = 1;
  if (fcntl(fd, F_SETLK, &lock) < 0){
    perror("Releasing lock");
    exit(EXIT_FAILURE);
  }
  printf("...OK\n");

  return EXIT_SUCCESS;
}

Offset manipulation


Improving I/O performance


Giving advice to the kernel

Since January 2011, we know that this function is used in Firefox to reduce startup time by 40% to 50% by loading the GUI libraries xul.dll and mozjs.dll more efficiently (more information here: <https://bugzilla.mozilla.org/show_bug.cgi?id=627591>).


Asynchronous I/O

int aio_read(struct aiocb *aiocbp);
int aio_write(struct aiocb *aiocbp);
int aio_suspend(const struct aiocb * const aiocb_list[],
                int nitems,
                const struct timespec *timeout);
int aio_error(const struct aiocb *aiocbp);

For more information on asynchronous I/O, refer to the documentation (man 7 aio).

The current implementation of POSIX AIO is provided in user space by the libc and can cause scalability issues. Another solution is to use the asynchronous I/O interface provided by the Linux kernel (see the io_setup, io_submit, etc. system calls), or the libaio library, which provides a wrapper around these Linux system calls.


mmap

void *mmap(void *addr, 
           size_t length,
           int prot,
           int flags,
           int fd,
           off_t offset);
int munmap(void *addr, size_t length);

To ensure that the memory accesses have been passed on to the disk, you can use the msync function.
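
The three calls can be combined as in the sketch below, which modifies a file through a shared mapping and then forces the change to be written back. The helper name set_first_byte is hypothetical.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first byte of `path` through a
 * shared memory mapping. Returns 0 on success, -1 on error. */
int set_first_byte(const char *path, char value) {
  struct stat st;
  char *map;
  int ret = -1;
  int fd = open(path, O_RDWR);
  if (fd < 0)
    return -1;
  if (fstat(fd, &st) == 0 && st.st_size > 0) {
    /* MAP_SHARED: stores to the mapping are carried through to the file */
    map = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
    if (map != MAP_FAILED) {
      map[0] = value; /* a plain memory store, no write() call */
      /* MS_SYNC: return only once the change has been written back */
      ret = msync(map, (size_t)st.st_size, MS_SYNC);
      munmap(map, (size_t)st.st_size);
    }
  }
  close(fd);
  return ret;
}
```

With MAP_PRIVATE instead of MAP_SHARED, the store would only affect the process's own copy of the page and the file would be left unchanged.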