Operating systems

File system

The goal of this lab is to implement the equivalent of the function mmap in xv6.

The system call (beginner level)

Start by getting the code from xv6:

  • git clone https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-tsp.git if you start with a new copy
  • or git checkout master if you already have a local copy of the repository

Then create a local branch to work:

git checkout -b my-mmap

In xv6 add the following system calls:
  • void* mmap(void* addr, int size, int prot, int flag, int fd, int offset). This function implements exactly mmap, but during the first part of the lab, you will ignore the prot and flag parameters.
  • and void munmap(void* addr, int size)

These functions must be added in a new file named mmap.c.

For the moment, the mmap function must display its arguments on the terminal before returning the pointer -1, which means that an error has occurred, and the munmap function must display its arguments on the terminal. To simplify error handling, create a mmap.h file in which you will add the following macro:

#define MAP_FAILED ((void *) -1)

Add a program to test the mmap function. Your program should:
  • open the file README in read/write mode (remember to include fcntl.h, which defines the open flags such as O_RDONLY and O_RDWR),
  • calculate the file size with the fstat function (remember to include stat.h to have the definition of the stat structure),
  • map the whole file to the address 0x10000000,
  • display the content of the memory mapped on the terminal,
  • unmap the file.

For each system call, check the return value and stop the program in the case of an error. For now, the program should terminate with an error since the mmap function returns an error.

To access the solution, you must first add the repository which contains the solution with the following command sequence:

git remote add mmap-soluce https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-soluces/xv6-soluce-mmap.git
git fetch mmap-soluce

Then, you can access the solution with the following command:

git checkout -b mmap-soluce-exo1 -t mmap-soluce/exo1

You can show the difference with the original code with git diff master.

Initial implementation (advanced level)

In this exercise, you do a simple first implementation of the mmap function.

To start with, align addr on a page boundary by using PGROUNDDOWN.
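As an illustration of the alignment step, here is a minimal user-space sketch. The macros mirror the ones in xv6's mmu.h (with an unsigned PGSIZE); the helper pages_spanned is a hypothetical name introduced only for this example and is not part of xv6.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors xv6's mmu.h (x86, 4 KiB pages). */
#define PGSIZE 4096u
#define PGROUNDUP(sz)  (((sz) + PGSIZE - 1) & ~(PGSIZE - 1))
#define PGROUNDDOWN(a) ((a) & ~(PGSIZE - 1))

/* Hypothetical helper: number of pages needed to cover [addr, addr + size). */
static uint32_t pages_spanned(uint32_t addr, uint32_t size)
{
  uint32_t start = PGROUNDDOWN(addr);
  uint32_t end = PGROUNDUP(addr + size);
  assert(end >= start); /* this sketch does not handle address wrap-around */
  return (end - start) / PGSIZE;
}
```

For example, a mapping that starts 16 bytes before a page boundary and is 32 bytes long touches two pages, even though it is much smaller than a page.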

In mmap, check that fd matches a file opened by the current process. The files opened by a process are stored in the ofile array of struct proc. To check that the file is open, you can take inspiration from argfd in sysfile.c.

In mmap, display the file size on the terminal. Check that it is consistent.

Check that the offset and size arguments of mmap are consistent. If there is an error, display an error message and return MAP_FAILED (that is, -1).
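What "consistent" means is left to you. One possible set of rules is sketched below as plain C: the size must be positive, the offset must not be negative, and the mapped range must fit inside the file. These rules and the name mmap_range_ok are assumptions made for illustration, not the rules of the provided solution.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: is [offset, offset + size) inside a file of file_size bytes? */
static bool mmap_range_ok(int file_size, int offset, int size)
{
  assert(file_size >= 0);
  if (size <= 0 || offset < 0)
    return false;
  if (offset > file_size)
    return false;
  /* Written as a subtraction so that offset + size cannot overflow. */
  if (size > file_size - offset)
    return false;
  return true;
}
```

In the kernel, file_size would come from the inode of the file being mapped.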

Building on the shared memory implementation you wrote for xv6 in TP9, allocate the physical pages needed to map the file in memory, and associate them with the address addr (parameter of mmap). At this stage, you are not asked to handle errors returned by kalloc and mappages.
To associate a physical page with a virtual address, you must use the mappages function defined in vm.c. Because mappages is declared static, it cannot be called from mmap.c. To resolve this issue, remove the static keyword from the definition of mappages and add its prototype to defs.h.

You are now asked to handle errors from kalloc or mappages. In the event of an error, you must:
  • delete the entries in the page table that have already been associated,
  • free the physical pages already allocated.
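To help you implement this error handling, add the following function to vm.c, and add its declaration to defs.h:
char* unmappage(pde_t *pgdir, const void* vaddr) {
  // Look up the page table entry of vaddr without allocating a new page table.
  pte_t* pte = walkpgdir(pgdir, (char*)vaddr, 0);
  if(!pte || !(*pte & PTE_P))
    panic("unmapped page");
  // Kernel virtual address of the physical page, as returned by kalloc.
  char* res = P2V(PTE_ADDR(*pte));
  // Clear the entry so that vaddr is no longer mapped.
  *pte = 0;
  return res;
}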

This function removes the mapping of the virtual address vaddr from the pgdir page table. It also returns the kernel virtual address of the physical page that has just been unmapped from vaddr (this is the address that kalloc returned).
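The rollback described above follows a common pattern: when the allocation of page number v fails, undo pages v-1 down to 0. The sketch below simulates it in user space. fake_kalloc, fake_kfree, fake_unmappage and the page_table array are stand-ins invented for this example; in the kernel you would use kalloc, kfree, unmappage and the real page table, and also roll back when mappages fails.

```c
#include <assert.h>

#define NPAGES 8

static int pool_used[NPAGES];      /* 1 if the fake physical page is allocated */
static int alloc_budget = NPAGES;  /* allocations allowed before fake_kalloc fails */
static int page_table[NPAGES];     /* page_table[v] = fake page + 1, or 0 if unmapped */

static int fake_kalloc(void)
{
  if (alloc_budget <= 0)
    return -1;
  for (int p = 0; p < NPAGES; p++) {
    if (!pool_used[p]) {
      pool_used[p] = 1;
      alloc_budget--;
      return p;
    }
  }
  return -1;
}

static void fake_kfree(int p)
{
  assert(p >= 0 && p < NPAGES && pool_used[p]);
  pool_used[p] = 0;
}

/* Counterpart of unmappage: clears the entry and returns the page it held. */
static int fake_unmappage(int v)
{
  assert(page_table[v] != 0);
  int p = page_table[v] - 1;
  page_table[v] = 0;
  return p;
}

/* Maps virtual pages 0..npages-1; on failure, undoes everything and returns -1. */
static int map_range(int npages)
{
  assert(npages >= 0 && npages <= NPAGES);
  for (int v = 0; v < npages; v++) {
    int p = fake_kalloc();
    if (p < 0) {
      while (v-- > 0)
        fake_kfree(fake_unmappage(v));
      return -1;
    }
    page_table[v] = p + 1;
  }
  return 0;
}

static int pages_in_use(void)
{
  int n = 0;
  for (int p = 0; p < NPAGES; p++)
    n += pool_used[p];
  return n;
}
```

Setting alloc_budget to 3 and calling map_range(5) makes the fourth allocation fail; afterwards no fake page remains allocated and no entry remains mapped.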

Use the readi function (defined in fs.c) to copy the file content into the memory area you have just allocated. Also modify the return of mmap to return addr.

Verify that your test program correctly displays the contents of the file. In order to speed up the tests you are going to perform, feel free to display only the first 86 characters of the file.

To access the solution, if it is not already done, add the repository which contains the solution with the following command sequence:

git remote add mmap-soluce https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-soluces/xv6-soluce-mmap.git
git fetch mmap-soluce

Then, you can access the solution with the following command:

git checkout -b mmap-soluce-exo2 -t mmap-soluce/exo2

You can show the difference with the original code with git diff master (or git diff mmap-soluce-exo1 to see the difference with the previous exercise).

Software segment (adventurer level)

We now want to lazily map the file. Technically, in mmap, instead of mapping the file, you only need to save the association between the virtual address range (addr, size) and the file (fd, offset, size) in a structure. Then, since no physical page is associated with this range, any access to the area will generate a page fault (trap 14). You must therefore modify how xv6 handles these faults so that it maps the requested page when needed.

In the rest of the lab, we define a segment (struct segment) as a virtual memory area associated with a file. Technically, a segment associates a virtual address addr with (i) an inode ip, (ii) an offset offset, (iii) a size size, (iv) a protection and (v) a flag. Define a structure describing the segments of a process in proc.h, and add an array of segments to a process, limiting the size of the table to 16 (consider using a macro).
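A possible shape for this structure is sketched below as stand-alone C. The field names follow the description above; the macro name NSEGMENT, the convention that size == 0 marks an unused slot, and the helper segment_find are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NSEGMENT 16          /* hypothetical name for the per-process limit */

struct inode;                /* opaque here; xv6 defines it in file.h */

struct segment {
  unsigned int addr;         /* start of the virtual memory area */
  struct inode *ip;          /* file backing the area */
  unsigned int offset;       /* offset of the area in the file */
  unsigned int size;         /* size in bytes; 0 marks an unused slot (assumption) */
  int prot;                  /* protection */
  int flag;                  /* flag */
};

/* Hypothetical lookup: the segment that contains va, or NULL if there is none. */
static struct segment *segment_find(struct segment *table, unsigned int va)
{
  assert(table != NULL);
  for (int i = 0; i < NSEGMENT; i++) {
    struct segment *s = &table[i];
    if (s->size != 0 && va >= s->addr && va - s->addr < s->size)
      return s;
  }
  return NULL;
}
```

A lookup of this kind is what the page fault handler needs later on, when it has to decide whether a faulting address belongs to a mapped file.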

In mmap, without modifying the functional behavior you currently have, record the segment in the process segment table.

Modify the way a file is mapped so that virtual pages are accessible neither in read mode (PTE_U) nor in write mode (PTE_W). When a page fault (T_PGFLT) occurs, simply add these permissions for the moment. To do this, you need to modify the code of trap.
Note that the faulting address accessed by a process is recorded in the cr2 register at the time of the fault. You can retrieve this address with the rcr2() function.
You will probably need the walkpgdir function to find the page table entry associated with the faulting address. However, this function is declared static in vm.c. Remove the static keyword and add its prototype to defs.h.

Modify your code so that pages are lazily loaded when accessed.
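When a fault occurs inside a segment, the handler has to work out which page to map, where that page starts in the file, and how many bytes to read. The sketch below shows this arithmetic in user space. The struct lazy_read and the function lazy_read_for are hypothetical names, and the code assumes that the segment address is page-aligned; bytes of the page beyond nbytes would typically be zero-filled.

```c
#include <assert.h>

#define PGSIZE 4096u
#define PGROUNDDOWN(a) ((a) & ~(PGSIZE - 1))

/* What the fault handler needs to know to load one page (hypothetical type). */
struct lazy_read {
  unsigned int page_va;   /* page-aligned virtual address to map */
  unsigned int file_off;  /* where to start reading in the file */
  unsigned int nbytes;    /* how many bytes to read, at most PGSIZE */
};

static struct lazy_read lazy_read_for(unsigned int seg_addr, unsigned int seg_offset,
                                      unsigned int seg_size, unsigned int fault_va)
{
  /* The fault must fall inside the segment. */
  assert(fault_va >= seg_addr && fault_va - seg_addr < seg_size);

  struct lazy_read r;
  r.page_va = PGROUNDDOWN(fault_va);
  unsigned int delta = r.page_va - seg_addr;  /* assumes seg_addr is page-aligned */
  r.file_off = seg_offset + delta;
  unsigned int remaining = seg_size - delta;
  r.nbytes = remaining < PGSIZE ? remaining : PGSIZE;
  return r;
}
```

In the kernel, file_off and nbytes would be the offset and length passed to readi, and page_va the address passed to mappages.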

What do you think can happen if the process closes the file associated with the segment? Why, in mmap, is it necessary to store the result of the call to idup(ip) in the segment (where ip is the pointer to the inode) instead of directly storing the inode?
The process might close the file before accessing the memory segment. At that point, lazy reads would be performed on a closed inode, which would not work. Calling idup increments the reference counter of the inode, which guarantees that the inode remains in memory even if the process closes the file.

Add definitions for the protections PROT_READ and PROT_WRITE in mmap.h, and handle these protections properly. At the time of the fault, the processor sets an error code (stored in tf->err) that tells whether the access was a read or a write. If bit 1 is set (numbering from 0, i.e. if tf->err & 2 is not equal to 0), the fault is caused by a write access; otherwise, the fault is caused by a read access. This way, before loading the page, you can know whether the process has sufficient permissions.
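The permission check can be sketched as follows. The values chosen for PROT_READ and PROT_WRITE are assumptions (they are the ones Linux uses), and FAULT_WRITE and fault_permitted are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define PROT_READ  0x1   /* assumed value */
#define PROT_WRITE 0x2   /* assumed value */

#define FAULT_WRITE 0x2  /* bit 1 of tf->err: the faulting access was a write */

/* Does a segment with protection prot allow the access described by err? */
static bool fault_permitted(unsigned int err, int prot)
{
  int needed = (err & FAULT_WRITE) ? PROT_WRITE : PROT_READ;
  return (prot & needed) != 0;
}
```

In trap, err would be tf->err and prot the protection stored in the segment that contains the faulting address. If the check fails, the page should not be loaded.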

You are now asked to manage concurrent accesses, that is to say the case where two cores are trying to lazily load a page concurrently.

To access the solution, if it is not already done, add the repository which contains the solution with the following command sequence:

git remote add mmap-soluce https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-soluces/xv6-soluce-mmap.git
git fetch mmap-soluce

Then, you can access the solution with the following command:

git checkout -b mmap-soluce-exo3 -t mmap-soluce/exo3

You can show the difference with the original code with git diff master (or git diff mmap-soluce-exo2 to see the difference with the previous exercise).

Duplication of processes (guru level)

During a fork(), we now want to be able to inherit segments from the parent process. This exercise is optional.

In order to manage the sharing of segments between a parent and its child, you have to restructure the code. Instead of storing the segments directly in the processes, we define a global array of segments, named segments, with a maximum size of 1024. Then, in a process, we store pointers to entries of this global array.

Modify the code accordingly. Consider that, in order to allocate a segment, it is necessary to find a free entry. For this, you can consider that if the segment address is 0, the entry is free.
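A minimal user-space sketch of this allocation scheme is given below. The rule that addr == 0 marks a free entry comes from the text; the macro name NSEGMENT_GLOBAL, the refcount field and the two helper functions are assumptions made for illustration. In the kernel, the table would also have to be protected by a lock.

```c
#include <assert.h>
#include <stddef.h>

#define NSEGMENT_GLOBAL 1024   /* hypothetical macro name */

struct segment {
  unsigned int addr;   /* 0 marks a free entry */
  unsigned int size;
  int refcount;        /* assumption: number of processes pointing at this entry */
};

static struct segment segments[NSEGMENT_GLOBAL];

/* Returns the first free entry, initialized, or NULL if the table is full. */
static struct segment *segment_alloc(unsigned int addr, unsigned int size)
{
  assert(addr != 0);   /* address 0 is reserved to mean "free" */
  for (int i = 0; i < NSEGMENT_GLOBAL; i++) {
    if (segments[i].addr == 0) {
      segments[i].addr = addr;
      segments[i].size = size;
      segments[i].refcount = 1;
      return &segments[i];
    }
  }
  return NULL;
}

/* Drops one reference; the entry becomes free when nobody uses it anymore. */
static void segment_release(struct segment *s)
{
  assert(s->refcount > 0);
  if (--s->refcount == 0)
    s->addr = 0;
}
```

The reference count is not required at this step, but it becomes useful as soon as a parent and its child point at the same entry.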

Modify the fork() function so that a parent and child share their segments. If you have not yet added locks, keep in mind that a parent and child can now simultaneously try to lazily map a page of a file.

Add a MAP_SHARED flag to mmap.h. When a segment is marked as MAP_SHARED, it is shared between a parent and its child, as is currently the case. Otherwise, the child must duplicate the parent's segment. Modify your program accordingly.
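The decision taken at fork time can be sketched as follows, on segment descriptors only. The value of MAP_SHARED is an assumption (it is the one Linux uses), and the names NSEGMENT, NSEGMENT_GLOBAL, refcount, segment_dup and fork_segments are hypothetical. A real implementation would also take a lock, call idup on the inode of a duplicated segment, copy the pages already loaded, and roll back on failure.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SHARED 0x1          /* assumed value */
#define NSEGMENT_GLOBAL 1024    /* hypothetical macro names */
#define NSEGMENT 16

struct segment {
  unsigned int addr;   /* 0 marks a free entry */
  unsigned int size;
  int flag;
  int refcount;        /* assumption: number of processes using the entry */
};

static struct segment segments[NSEGMENT_GLOBAL];

/* Copies src into the first free global entry; NULL if the table is full. */
static struct segment *segment_dup(const struct segment *src)
{
  for (int i = 0; i < NSEGMENT_GLOBAL; i++) {
    if (segments[i].addr == 0) {
      segments[i] = *src;
      segments[i].refcount = 1;
      return &segments[i];
    }
  }
  return NULL;
}

/* Fills child[] from parent[]; returns 0 on success, -1 if the table is full. */
static int fork_segments(struct segment *parent[NSEGMENT],
                         struct segment *child[NSEGMENT])
{
  for (int i = 0; i < NSEGMENT; i++) {
    struct segment *s = parent[i];
    if (s == NULL) {
      child[i] = NULL;
    } else if (s->flag & MAP_SHARED) {
      s->refcount++;            /* same entry, one more user */
      child[i] = s;
    } else {
      child[i] = segment_dup(s);
      if (child[i] == NULL)
        return -1;
    }
  }
  return 0;
}
```

After the call, a shared segment is referenced by both processes through the same pointer, while a private one has its own entry in the global array.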

Deleting a mapping (guru level)

The goal of this exercise is to be able to delete mappings. This exercise is optional.

Implement munmap. Keep in mind that you can actually delete a segment only once no process uses it anymore. Also remember that you should write the pages that have changed back to disk. For this, you must add the definition of the constant PTE_D = 0x40 in mmu.h. The processor sets this bit in the page table entry as soon as it modifies the content of the page.
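The test on the dirty bit can be sketched as follows. PTE_P and PTE_W have the values defined in xv6's mmu.h, PTE_D is the constant you are asked to add, and page_needs_writeback is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

#define PTE_P 0x001   /* present */
#define PTE_W 0x002   /* writeable */
#define PTE_D 0x040   /* dirty: set by the processor on the first write to the page */

/* A page has to be written back only if it was loaded and then modified. */
static bool page_needs_writeback(unsigned int pte)
{
  if (!(pte & PTE_P))
    return false;     /* never loaded, so nothing to write */
  return (pte & PTE_D) != 0;
}
```

In munmap, a test of this kind would be applied to each page table entry of the segment before the page is unmapped and freed.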

Modify exit() so as to automatically delete the mappings of a process before it terminates.

Segmentation (guru level)

The purpose of this exercise is to complete our implementation. This exercise is optional.

Add a MAP_ANONYMOUS flag to mmap.h. When a segment is marked as anonymous, the arguments fd and offset are ignored, and memory should be initially filled with zeros.

Modify the code of xv6 so that your segments are also used to build the initial image of a process. A process must have 5 initial segments (see the end of the proc.h file):
  • A text segment containing the program code,
  • A data segment containing the initialized data of the program,
  • A bss segment containing the uninitialized data of the program, i.e. data initialized to 0,
  • A stack segment of fixed size,
  • A heap segment that can grow.

All these segments must be defined as not being of the type MAP_SHARED so that they are duplicated automatically during a fork().

To build these segments, you need to analyze and modify the exec function (file exec.c), especially starting from line 42.

Also modify the fork() function so that it no longer calls copyuvm, which takes care of copying the memory of a process (the memory of the segments you have just defined). Instead, you should always call setupkvm(), which creates a new page table in which the kernel is mapped. If your code is correct, the parent's segments will either be copied to the child or shared with the child, depending on the MAP_SHARED flag.

Modify mmap so that, if addr is equal to 0, mmap automatically finds a free area in the process addressing space.
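One simple way to find a free area is a first-fit search over the existing segments, sketched below in user space. KERNBASE has the value defined in xv6's memlayout.h; MMAP_BASE, the reduced struct segment and find_free_area are assumptions made for this example. The sketch assumes that segment addresses are page-aligned, that every area in use (including text, data, stack and heap) is described by a segment, and it returns 0 when no area is found.

```c
#include <assert.h>

#define PGSIZE 4096u
#define PGROUNDUP(sz) (((sz) + PGSIZE - 1) & ~(PGSIZE - 1))
#define KERNBASE 0x80000000u    /* first kernel virtual address in xv6 */
#define MMAP_BASE 0x10000000u   /* hypothetical lowest address handed out */

struct segment {
  unsigned int addr;   /* page-aligned start address */
  unsigned int size;   /* size in bytes; 0 marks an unused slot (assumption) */
};

/* First fit: returns a page-aligned address where size bytes fit, or 0. */
static unsigned int find_free_area(const struct segment *segs, int n,
                                   unsigned int size)
{
  unsigned int len = PGROUNDUP(size);
  unsigned int cand = MMAP_BASE;

  assert(n >= 0);
  if (len == 0)
    return 0;                       /* empty request, or size too large */
  for (;;) {
    if (cand >= KERNBASE || len > KERNBASE - cand)
      return 0;                     /* no room left below the kernel */
    int moved = 0;
    for (int i = 0; i < n; i++) {
      if (segs[i].size == 0)
        continue;
      unsigned int start = segs[i].addr;
      unsigned int end = start + PGROUNDUP(segs[i].size);
      if (cand < end && start < cand + len) {
        cand = end;                 /* overlap: retry just after this segment */
        moved = 1;
        break;
      }
    }
    if (!moved)
      return cand;
  }
}
```

Each retry moves the candidate address strictly upwards, so the search terminates either with a free area or when it reaches KERNBASE.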

Copy-on-write fork (master guru level)

We now want to implement a fast fork(). The principle of this fast fork() is not to copy the parent's memory at fork() time, but only when either the parent or the child modifies it (see Copy-on-write). Do not hesitate to ask your teachers questions to find out how to design your solution. This exercise is optional.

We only provide a solution to the first three exercises. To access the solution, if it is not already done, add the repository which contains the solution with the following command sequence:

git remote add mmap-soluce https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-soluces/xv6-soluce-mmap.git
git fetch mmap-soluce

Then, you can access the solution with the following command:

git checkout -b mmap-soluce-exo3 -t mmap-soluce/exo3