A process needs to be present in main memory to run
Central memory divided into two parts
Memory management concerns the process space
Memory capacities are increasing, but so are the requirements \(\rightarrow\) Need for multiple memory levels
Principle of inclusion to limit updates between different levels
Regarding the principle of inclusion, in an Intel architecture, the L1 cache (Level 1) is included in L2 cache (Level 2), which is itself included in RAM, which is included in the swap (disk).
Here are the typical access times to data located in the different types of memory on a “classic” machine (Intel Core i5 Skylake processor) in 2017 :
The following table shows the cost difference between the types of memory (and the evolution over the years):
Year | 2008 | 2009 | 2010 | 2014 | 2019 | 2023 |
---|---|---|---|---|---|---|
Hard disk drive, 7200 tr/mn (in €/GiB) | 0,50 | 0,32 | 0,10 | 0,04 | 0.027 | 0.023 |
SSD disk (in €/GiB) | – | – | – | 0,50 | 0.17 | 0.09 |
USB key (in €/GiB) | – | – | 1,64 | 0.62 | 0.27 | 0.06 |
NVMe (in €/GiB) | – | – | – | – | 0.21 | 0.10 |
RAM (in €/GiB) | – | 37,00 | 21,85 | 8.75 | 7.23 | 2.75 |
The memory pages of a process can be
\(\rightarrow\) each process has a contiguous memory space to store its data
The paging mecanism
In Linux, page frames are 4KB in size (defined size by the constants
PAGE_SIZE
and PAGE_SHIFT
in the file
page.h
).
k
bits:
p
bitsd = (k - p)
bits
On 64-bit Intel (x86_64
) or ARM 64 bits architectures
(ARMv8), the addresses are stored on 64 bits
(i.e. size (void *)
is 8 bytes), but only 48 bits are
usable for virtual addresses.
On RISC-V
architectures, a system can choose from four
virtual address sizes, including 48 bits as other architectures, and
also 57 bits.
Some applications use large amounts of data (sometimes several GiB)
that must be placed in a large number of 4 KiB memory pages. In order to
limit the number of pages to handle, some architectures (especially
x86_64
and RISC-V
) allow the use of larger
memory pages (typically 2~MiB and 1~GiB, but also bigger) which are
called huge pages in the Linux kernel (sometimes
megapages and gigapages).
The correspondence between logical address and address physical is done with a page table that contains
x86_64
or RISC-V
, a page table =
4-levels tree
satp
register (cr3
on x86
architectures)n[0..3]
) + 1
offset, then translated using:uint64_t cur = %satp3; // cur = root table physical address
for(int i=0; i<3; i++)
= ((uint64_t*)cur)[n[i]]; // physical memory access, next entry
cur return cur + offset; // add the offset
Intel architectures have Translation Look-aside Buffers (TLB) with 32, 64, or even 256 entries. TLB are sometimes called address translation cache.
.text
,
.data
, etc.)open
mmap
) with the appropriate permissions/proc/<pid>/maps
$ cat /proc/self/maps
5572f3023000-5572f3025000 r--p 00000000 08:01 21495815 /bin/cat
5572f3025000-5572f302a000 r-xp 00002000 08:01 21495815 /bin/cat
5572f302e000-5572f302f000 rw-p 0000a000 08:01 21495815 /bin/cat
5572f4266000-5572f4287000 rw-p 00000000 00:00 0 [heap]
7f33305b4000-7f3330899000 r--p 00000000 08:01 22283564 /usr/lib/locale/locale-archive
7f3330899000-7f33308bb000 r--p 00000000 08:01 29885233 /lib/x86_64-linux-gnu/libc-2.28.so
7f33308bb000-7f3330a03000 r-xp 00022000 08:01 29885233 /lib/x86_64-linux-gnu/libc-2.28.so
[...]
7f3330ab9000-7f3330aba000 rw-p 00000000 00:00 0
7ffe4190f000-7ffe41930000 rw-p 00000000 00:00 0 [stack]
7ffe419ca000-7ffe419cd000 r--p 00000000 00:00 0 [vvar]
7ffe419cd000-7ffe419cf000 r-xp 00000000 00:00 0 [vdso]
void* malloc(size_t size)
size bytes
void* realloc(void* ptr, size_t size)
malloc
void* calloc(size_t nmemb, size_t size)
malloc
, but memory is initialized to 0void *aligned_alloc( size_t alignment, size_t size )
malloc
. The returned address is a multiple of
alignment
void free(void* ptr)
All these functions are implemented in the standard C library (which in some cases make system calls).
The malloc(3)
algorithm is very efficient. It is not
therefore generally not necessary to try to optimize it.
However:
calloc (3)
(it is more efficient than a
malloc(3)
followed by memset(3)
).mallopt
allows to fine tune the behavior
of malloc(3)
__malloc_hook
,
__realloc_hook
and __free_hook
. Be careful,
these mechanisms can lead to reentrancy problems.free
, it is strongly advised
to set the pointer to NULL
. This allows the program to
crash immediatly if, by mistake, we access this (now inexistant) buffer
again using this pointer.The following program illustrates how setting a pointer to
NULL
allows to crash immediatly and how using a debugger
allows to quickly find the origin of the error.
/**********************/
/* resetToNULL.c */
/**********************/
/* This program illustrates the value of assigning a variable to NULL
which contains a pointer to an unallocated buffer.
Indeed, it makes a segmentation fault, which makes it possible to
identify that we have executed an illegal operation. Using a
debugger allows to understand the problem.
*/
/* */
/* 1) Compile the program with the option -g */
/* cc -g -o resetToNULL resetToNULL.c */
/* 2) ./resetToNULL */
/* ==> Segmentation fault */
/* 3) ulimit -c unlimited */
/* 4) ./resetToNULL */
/* ==> Segmentation fault (core dumped) */
/* 5) ddd ./resetToNULL core */
#include <stdlib.h>
#include <assert.h>
void h(char *p){
*p = 'a';
}
void g(char *p){
(p);
h}
void f(char *p){
(p);
g}
int main(){
char *p = NULL;
(p);
f
= malloc(1);
p (p != NULL);
assert
(p);
f
(p);
free= NULL;
p
(p);
f
return EXIT_SUCCESS;
}
Memory alignment depends on the type of data
char
(1-byte), short
(2-bytes),
int
(4-bytes), …A data structure may be larger than its content
A data structure can be packed with
__attribute__((packed))
Memory alignment applies to variables as well as to members of data structures.
The following program illustrates how alignement affects the size of a data structure:
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
struct plop {
int a;
char b;
int c;
};
struct plop_packed {
int a;
char b;
int c;
} __attribute__((packed));
int main(void) {
struct plop p1;
struct plop_packed p2;
("struct plop -- size: %lu bytes, address: %p\n",
printfsizeof(struct plop), &p1);
("\t.a -- size: %lu bytes, address: %p, offset: %lu\n",
printfsizeof(p1.a), &p1.a, offsetof(struct plop, a));
("\t.b -- size: %lu bytes, address: %p, offset: %lu\n",
printfsizeof(p1.b), &p1.b, offsetof(struct plop, b));
("\t.c -- size: %lu bytes, address: %p, offset: %lu\n",
printfsizeof(p1.c), &p1.c, offsetof(struct plop, c));
("\n");
printf
("struct plop_packed -- size: %lu bytes, address: %p\n",
printfsizeof(struct plop_packed), &p2);
("\t.a -- size: %lu bytes, address: %p, offset: %lu\n",
printfsizeof(p2.a), &p2.a, offsetof(struct plop_packed, a));
("\t.b -- size: %lu bytes, address: %p, offset: %lu\n",
printfsizeof(p2.b), &p2.b, offsetof(struct plop_packed, b));
("\t.c -- size: %lu bytes, address: %p, offset: %lu\n",
printfsizeof(p2.c), &p2.c, offsetof(struct plop_packed, c));
("\n");
printfreturn 0;
}
$ ./memory_alignment
struct plop -- size: 12 bytes, address: 0x7ffc05ad8184
.a -- size: 4 bytes, address: 0x7ffc05ad8184, offset: 0
.b -- size: 1 bytes, address: 0x7ffc05ad8188, offset: 4
.c -- size: 4 bytes, address: 0x7ffc05ad818c, offset: 8
struct plop_packed -- size: 9 bytes, address: 0x7ffc05ad817b
.a -- size: 4 bytes, address: 0x7ffc05ad817b, offset: 0
.b -- size: 1 bytes, address: 0x7ffc05ad817f, offset: 4
.c -- size: 4 bytes, address: 0x7ffc05ad8180, offset: 5
void *sbrk(intptr_t increment)
increment
bytesvoid *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
flags
contains MAP_ANON
, does not map
any file, but allocates an area filled with 0s\(\rightarrow\) Non-Uniform Memory Access \(\rightarrow\) On which memory bank to allocate data?
double *array = malloc(sizeof(double)*N);
for(int i=0; i<N; i++) {
[i] = something(i);
array}
#pragma omp parallel for
for(int i=0; i<N; i++) {
double value = array[i];
/* ... */
}
void *numa_alloc_interleaved(size_t size)
double *array =
(sizeof(double)*N);
numa_alloc_interleaved
for(int i=0; i<N; i++) {
[i] = something(i);
array}
#pragma omp parallel for
for(int i=0; i<N; i++) {
double value = array[i];
/* ... */
}
It is also possible to use set_mempolicy
in order to
choose an allocation strategy for future memory allocations.
mbind
long mbind(void *addr, unsigned long len, int mode, const unsigned long *nodemask, unsigned long maxnode, unsigned flags)
double *array = malloc(sizeof(double)*N);
(&array[0], N/4*sizeof(double),
mbind, &nodemask, maxnode,
MPOL_BIND);
MPOL_MF_MOVE
#pragma omp parallel for
for(int i=0; i<N; i++) {
double value = array[i];
/* ... */
}