Containers and Orchestration: a Security Perspective

Mathieu Bacou
mathieu.bacou@telecom-sudparis.eu
Télécom SudParis, IMT, IP Paris, Inria

2024–2025 NET5039 — Systèmes, virtualisation et sécurité

Cloud applications

Traditional applications are monolithic
- Everything is tightly coupled
- On full servers, managed from OS to deployment
  - This is a constraint on the user
In the cloud, you don’t manage real servers
- You share servers, thanks to virtualization
- You get new resources (“servers”) on-the-fly
We can drive this model further!

Cloud native applications

No OS management by the user
Component-level application scalability

Introducing: containers

Cloud users do not want to run OSes
- They want to run their applications
How to share cloud resources closer to the applications?
- Virtualization layer just between the OS and the application
Virtualize the OS for multiple applications at the same time!
- In other words, containers are OS-level virtualization
An OS executes a container runtime that uses a container engine to run containers
- Docker, LXC, OpenVZ…

Actors of OS-level virtualization

Container runtime
Container engine
Container

Container runtimes

High-level management of containers, artifacts and runtime configuration
- Business-oriented container lifecycle
- Build, download container images
- Configure networking, volumes, security, etc.
A container image packages an application and its runtime
- Business core, dependencies, pre-configuration
Ecosystem of reusable images stored in registries (DockerHub, GitLab registries, local registry…)
- Images are built immutable for portability, reusability and composability
Examples: Docker (containerd), Podman, Apptainer…

Container engines

Low-level management of containers
- Create, start, stop, destroy…
- Prepare images for usage
- Last-mile setup of networking, mounts, security…
Different engines for different usages or orientations: generic, security-oriented, scientific…
Examples: runc, Kata Containers, gVisor, wasmtime…

Containers

Container: isolated and limited virtual copy of the host OS
- Deploys the image to “fill in” the virtual copy
Isolation: users, devices, processes…
- Virtual filesystem: built from container image
Limits: CPU, memory, I/O…
- Also monitoring

Comparison with hardware virtualization: stack

Comparison with hardware virtualization: features

Comparison of features between hardware and OS-level virtualization techniques.
Criterion	Operating system-level virtualization	Hardware virtualization
Security	-	+
Usability	++	-
Performance	0	0
Startup time	+	--
Image size	+	--
Memory overhead	+	--

Containers are better overall for cloud-native applications
- Applications are designed to be deployed on the cloud
Security concerns
- Kernel shared between containers
VMs still have use cases: persistent, interactive environments, robustness, first-level resource provisioning…

Demo: Docker

Creation and usage of a Docker container
- Run an interactive image
- Pull and run a daemon service
- List images, monitor containers
Docker is rather “low-level” for applications: compose multiple components (containers) in a single application with Docker Compose

Building containers: two ways

Interactively
1. Start from a base distribution image
  - Linux distributions: Ubuntu, Alpine…
  - Runtime distributions (based on Linux distributions): Python…
2. Use the package manager to add software
3. docker commit tags the current state of the container as an image
- Efficient for testing and experimenting
Writing a Dockerfile
- DSL to describe how to install and configure the bundled applications
- Proper method: clean, reusable, reproducible, auditable…

Building containers with a Dockerfile

# Starting from a base image.
FROM alpine

# Execute commands to build and configure the image.
RUN apk add --no-cache perl

# Add local files.
COPY cowsay /usr/local/bin/cowsay
COPY docker.cow /usr/local/share/cows/default.cow

# Set the default executable.
ENTRYPOINT ["/usr/local/bin/cowsay"]

Build the image with docker build --tag namespace/name:tag . (the final dot is the build context: the directory holding the Dockerfile and the copied files)
Can start from an empty image: FROM scratch
- Rarely used, only by base images of distributions, where the image is built from an archive
May also include users, volumes, network ports…

Security of containers

Isolation
Threat models and vectors
Good practices

Isolation

Fundamental issue for cloud providers: execute untrusted code
When using containers: tenants and provider share the kernel
- No mitigation when the kernel is compromised
- Incompatibility of kernel-level security policies
  - Because most security mechanisms are not namespaced (AppArmor, etc.)
- Vast attack surface and trusted code base
  - Virtual Machines (VMs) are better in this regard (hypervisor interface vs. whole host kernel)
Isolation of untrusted code
- To protect containers from each other
- To protect the system from containers

Threat models: goals and targets

Attack goals:
- Disrupt services: bad neighbor, denial of service
- Subvert services: steal identity, steal resources
- Steal data
Targets are cloud-oriented: applications are regular services
- Mostly web servers or applications accessed by HTTP requests
- But sometimes containers include system services
  - SSH, cron jobs, logs…
  - More privileged requirements => more care!

Threat models: overview

(A) Direct attack from outside, on the containerized application
(B) Indirect attack from a container, on another containerized application
(C) Attack from a container, on the host system

Threat model (A): direct attack from outside

Containers are not security magic for Internet-facing applications!
- A vulnerable web server remains vulnerable
But containers help against vulnerabilities:
- Breach containment
  - Importance of the micro-service model
- Safe configuration is easier to achieve
  - More secure defaults
  - Fewer configuration items to tweak thanks to the virtualized environment
- Simpler audit
  - Limited set of dependencies and software pieces
- Fast, easy distribution of security updates
  - Container distribution model: generic images pulled from a centralized place
Regarding the distribution model: it can also be a threat vector (attacks on the distribution channel)
- Do not pull unaudited images or updates: may be freshly vulnerable or compromised
  - Malicious updates or owner, typosquatting…
- Use a private repository of audited images
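The advice above about the distribution channel can be turned into an automatic check on image references. The following is a minimal sketch, not a complete policy engine: the registry name `registry.example.internal` is a hypothetical placeholder, and the rule used to spot a registry host (first path component containing a dot or a colon, or equal to `localhost`) is a simplification of how Docker parses references.

```python
import re

# Hypothetical allow-list: replace with the private registry you actually audit.
TRUSTED_REGISTRIES = {"registry.example.internal"}

# A reference pinned by digest ends with "@sha256:" followed by 64 hex digits.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def check_image_reference(ref: str) -> list[str]:
    """Return policy warnings for an image reference (empty list = accepted)."""
    warnings = []
    # Simplified rule: the first component is a registry host only if it
    # looks like one; otherwise the default public registry is implied.
    first, _, rest = ref.partition("/")
    looks_like_host = "." in first or ":" in first or first == "localhost"
    registry = first if (rest and looks_like_host) else "docker.io"
    if registry not in TRUSTED_REGISTRIES:
        warnings.append(f"untrusted registry: {registry}")
    if not DIGEST_RE.search(ref):
        warnings.append("not pinned by digest (a tag can be re-pointed)")
    return warnings


if __name__ == "__main__":
    for ref in ("alpine", "registry.example.internal/team/app:1.2"):
        print(ref, "->", check_image_reference(ref))
```

Pinning by digest addresses malicious updates, while the allow-list addresses typosquatting and unaudited owners; neither replaces the audit itself.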

Threat model (B): indirect attack between containers

Containers run arbitrary code by definition
B1: Escape to another container
- Bug in namespaces implementation
- Leaks in the filesystem
B2: Abuse of the container network
- Packet forging
- Layer 2 attacks
B3: Escalation to root
- Vulnerable SUID binaries
- Vulnerable container engine implementation
B4: Execute arbitrary kernel code
- Exploitable system calls
Not always container-specific

Threat model (C): attack on the host system

Containers run arbitrary code by definition
Escape containment
- Namespace bug
- Filesystem leak
Escalate to root
- Vulnerable SUID binaries
- Vulnerable container engine implementation
Execute arbitrary kernel code
- Exploitable system calls
Not always container-specific

Good practices: as a user

Audit public images
- Fix versions, but monitor for security updates
Use a micro-service architecture
- For intrusion detection and containment
- Every micro-service can be augmented with its own monitor
Mount volumes read-only when possible
- Container images are already immutable thanks to the overlay FS
Drop capabilities
- Many are dropped by default, but more can usually be dropped

Good practices: as an image developer

Use and build immutable container images
- I.e., images that can be deployed identically everywhere, only configured to fit the environment
- Example: do not build images that download binaries when starting
Do not run as root in the container (non-root images)
- User namespaces allow that, but this is not an excuse
- Do not rely on SUID binaries in general
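One way to check the last point is to scan the root filesystem of an image for SUID and SGID files before publishing it. This is a minimal sketch that assumes the image has already been unpacked into a directory (how it is unpacked is left out); the function names are illustrative.

```python
import os
import stat


def is_setuid_or_setgid(mode: int) -> bool:
    """True for a regular file whose mode carries the SUID or SGID bit."""
    return stat.S_ISREG(mode) and bool(mode & (stat.S_ISUID | stat.S_ISGID))


def find_suid_files(root: str) -> list[str]:
    """List SUID/SGID regular files under `root`, sorted by path."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: do not follow symlinks out of the scanned tree.
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # unreadable entry: skip it
            if is_setuid_or_setgid(mode):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    import sys
    for path in find_suid_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

An empty result does not prove that the image is safe, but every path reported is a candidate for removal or for dropping the special bit.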

Good practices: as a system administrator

Harden the kernel
- Enable MAC: Linux Security Modules (AppArmor, SELinux, etc.), seccomp…
  - Not really containerization-aware but still very well usable
- Use a hardened kernel (GRSEC…)
- Update the kernel to vetted versions
  - It is a critical part, because it is shared with all containers (huge trusted codebase between tenants)
Configure container networking tightly
- Do not use host mode
- Think about shared network namespaces, open ports, common virtual networks…
  - In practice: managed by docker-compose, Kubernetes, etc.
Why not go one step beyond: use virtual machines!
- An application in a Docker container in a virtual machine (in a container?)
- Sandboxing engines: Kata Containers (lightweight VMs), gVisor (user-space kernel)…
- Or just split physical servers by tenants using VMs

Internals of a container engine

Isolation
Limit
Operation control
Virtual filesystem

Isolation: namespaces

Provide an isolated view of the OS
8 dimensions:

mnt: mount points
- Hierarchy of sub-filesystems
pid: hierarchy of processes
- The first process in the container gets PID 1
net: networking facilities
- Interfaces, ports, protocol stack…
ipc: interprocess communication
- SysV IPC mechanisms: semaphores, message queues, shared memory segments
time: date and time

user: users, groups and privileges
- The engine establishes a mapping between host UIDs (GIDs) and in-container UIDs (GIDs)
  - root is UID 0, inside the container as on the host: unless container UIDs are remapped to unprivileged host UIDs, escape the container as root, and you are root on the host!
uts: hostname and domain name
- For UNIX Time-Sharing, from an era of remote computers and client terminals
cgroup: control groups (see next)
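The UID mapping of the user namespace can be made concrete with the format of `/proc/<pid>/uid_map`, where each line reads "first ID inside the namespace, first ID outside, length of the range". The sketch below parses such a map and translates an in-container UID to a host UID; the sample maps in the usage part are illustrative values, not read from a live system.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_id(container_id, id_map):
    """Translate an in-container ID to a host ID, or None if it is unmapped."""
    for inside, outside, length in id_map:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None


if __name__ == "__main__":
    identity = parse_id_map("0 0 4294967295\n")   # no remapping
    remapped = parse_id_map("0 100000 65536\n")   # illustrative remapping
    print(to_host_id(0, identity))  # container root is host root
    print(to_host_id(0, remapped))  # container root is an unprivileged host UID
```

With the identity map, UID 0 in the container is UID 0 on the host; with the remapped range, the same in-container root corresponds to an unprivileged host user.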

Limit: control groups (cgroups)

Constrain resource usage
- Also prioritization, accounting, control
8 “dimensions” (controllers):
- cpu: CPU time
- cpuset: task placement on memory and CPU nodes
- memory: memory usage
- io: block I/O
- pids: number of PIDs (i.e., of processes)
- device: access to device files
  - special: only through BPF
- perf_event: performance monitoring
- net: network packets priority and classes for QoS
Other specialized controllers: rdma, hugetlb, misc
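Limits set through control groups are exposed as small text files in the cgroup filesystem. As a minimal sketch, assuming cgroup v2 file formats, the functions below interpret the content of `cpu.max` (a quota and a period in microseconds, or `max` for no limit) and of `memory.max` (a number of bytes, or `max`). They work on strings so that they can be tried without a cgroup hierarchy at hand.

```python
def parse_cpu_max(text):
    """Return the CPU limit as a number of CPUs, or None if unlimited.

    cgroup v2 `cpu.max` holds "<quota> <period>" in microseconds,
    with the quota replaced by "max" when there is no limit.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_memory_max(text):
    """Return the memory limit in bytes, or None if unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)


if __name__ == "__main__":
    print(parse_cpu_max("50000 100000\n"))   # half a CPU
    print(parse_cpu_max("max 100000\n"))     # no CPU limit
    print(parse_memory_max("268435456\n"))   # 256 MiB
```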

Operation control: capabilities and MAC

Capabilities: selectively drop root privileges
Mandatory Access Control (MAC): system-level operational policies with Linux Security Modules
- SELinux, AppArmor… (seccomp is often used alongside: it filters system calls, but it is not an LSM)
More than 40 capabilities (CAP_XXX):

SYS_NICE: change process niceness
SYS_ADMIN: system admin (mount…)
SYS_CHROOT: change root path of process
SYS_MODULE: (un)load kernel modules

SETGID/SETUID: change process GIDs/UIDs
KILL: send signals
NET_ADMIN: network admin
NET_RAW: use RAW sockets
CHOWN: change owner
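The capabilities held by a process are visible as hexadecimal bitmasks in `/proc/<pid>/status` (for instance the `CapEff` field), one bit per capability. The sketch below decodes such a mask for the capabilities listed above only; the bit numbers follow the Linux capability header, and the mask in the usage part is an illustrative value rather than one read from a running container.

```python
# Bit positions of the capabilities listed above (not the full set).
CAP_BITS = {
    0: "CHOWN",
    5: "KILL",
    6: "SETGID",
    7: "SETUID",
    12: "NET_ADMIN",
    13: "NET_RAW",
    16: "SYS_MODULE",
    18: "SYS_CHROOT",
    21: "SYS_ADMIN",
    23: "SYS_NICE",
}


def decode_caps(hex_mask):
    """Return the names (among CAP_BITS) of capabilities set in a hex mask."""
    mask = int(hex_mask, 16)
    return [name for bit, name in sorted(CAP_BITS.items()) if mask & (1 << bit)]


if __name__ == "__main__":
    # Illustrative mask: a reduced set without SYS_ADMIN or NET_ADMIN.
    print(decode_caps("00000000a80425fb"))
```

Checking that powerful capabilities such as SYS_ADMIN or SYS_MODULE are absent from the mask is a quick sanity test of a container configuration.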

Virtual filesystem

Isolated filesystem:
- mnt namespace to isolate hierarchy
- chroot to isolate the process to a subtree
Two parts to the filesystem visible to the container:
1. Container image: basis for virtual filesystem
  - Bundle of files and filesystem operations in layers
2. Volumes: external data storage
  - Mounted into the virtual filesystem of the running container

Virtual filesystem: layers and volumes

An image is made of layers
- Like git commits, to represent modifications on the filesystem
- Reusable by other images, with caching
- Visible with docker image history $IMAGE_NAME
Layers of an image, built from a Dockerfile, are read-only
- The engine adds a writeable layer on top during container execution
- Use copy-on-write to modify files from lower layers
Managed by a union filesystem: driver of a layered virtual filesystem (OverlayFS)
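The behavior of layers and copy-on-write can be simulated in a few lines. This is a toy model, not how OverlayFS is implemented: layers are plain dictionaries mapping paths to contents, and the class and method names are invented for the illustration. It shows lookups going from the top layer down, a file being copied up into the writable layer on first modification, and deletion being recorded as a marker that hides the file from lower layers.

```python
WHITEOUT = object()  # marker hiding a file that exists in a lower layer


class UnionFS:
    """Toy union filesystem: read-only lower layers plus one writable layer."""

    def __init__(self, lower_layers):
        self.lower = list(lower_layers)  # image layers, bottom first
        self.upper = {}                  # writable layer of the container

    def read(self, path):
        # Search from the top: writable layer first, then image layers.
        for layer in [self.upper] + self.lower[::-1]:
            if path in layer:
                if layer[path] is WHITEOUT:
                    break
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # Copy-on-write: copy the file up before its first modification.
        if path not in self.upper or self.upper[path] is WHITEOUT:
            self.upper[path] = self.read(path)
        self.upper[path] += data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if not visible
        self.upper[path] = WHITEOUT


if __name__ == "__main__":
    base = {"/etc/motd": "hello\n"}
    app = {"/app/run.sh": "#!/bin/sh\n"}
    container = UnionFS([base, app])
    container.append("/etc/motd", "world\n")
    print(repr(container.read("/etc/motd")))  # modified view in the container
    print(repr(base["/etc/motd"]))            # image layer left untouched
```

Because lower layers are never modified, several containers can share the same image layers while each one sees its own changes.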

Virtual filesystem: OverlayFS and copy-on-write

Illustration of a union filesystem and copy-on-write.

Low-level view of a container engine

Low-level components and interface of a container engine.

Most features that make a container come from the Linux kernel!

Demo: namespaces and cgroups

Spawn a new process in namespaces
Put a process in control groups
- Set limit and monitor resource usage
Using the virtual filesystem interface
- There are also syscalls

Containers for the cloud

Application architecture in the cloud
Micro-services
Orchestration

Cloud application architecture

Historic pattern: monolithic application
- All components are ad-hoc, tightly coupled
Unfit for the cloud
- Must manage all components at once for scalability, deployment, service quality
- Hard to reconfigure
New paradigm enabled by containers: micro-services

Micro-services

Components as processes
- Manual interfacing
  - Need in-container PID 1 (service manager) to run multiple processes
- Cons of monolithic apps (see previous slide)

Components as containers
- Maximum reuse of images
- High flexibility, clean configuration and interfacing
- Fine-grained scalability

Networking for micro-services

Configuration of networking by the container runtime
- Dedicated links between component containers
- Controlled link to the outside world
Network drivers:
- Host: expose host network devices to the container (no isolation)
- Bridge: local virtual network
  - May be exposed to the outside world via virtual routing
- Overlay: inter-host inter-container network
- None: no networking at all

Networking for micro-services: illustrations

Orchestration

Composition: build applications as micro-services
- Roughly: manage multiple containers as one application
- Example: Docker Compose
Orchestration: manage micro-services
- Deployment
- Distribution
- Replication
- Load-balancing
- Availability
- Rolling updates
- …
Orchestration exposes higher-level interfaces to the features of composition
- In the end, the orchestrator is the user front-end
Examples: Kubernetes, Docker Swarm
Abstraction of management unit: the pod

Orchestration: scheduling

Manual criteria: filters
- Handle host heterogeneity
  - Settings of container runtime, host OS…
- Container affinity: force placement for resource access
  - Image availability, volume placement, other container…
Strategies for deployment on physical hosts
- Spread: balance load over hosts
- Binpack: colocate as much as possible
Handle colocation of tightly-coupled containers: pods
- Containers in a pod share the same network namespace and same volumes
- Pod = service container + helper (sidecar) containers (logging, interfacing…)
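The two deployment strategies above can be compared with a minimal sketch. It assumes a single resource dimension (CPU), describes hosts by hypothetical names with their used and total capacity, and filters out the hosts that cannot fit the request before applying the strategy; real orchestrators combine many more criteria.

```python
def schedule(hosts, demand, strategy):
    """Pick a host for a container needing `demand` CPUs.

    hosts:    mapping of host name -> (used, capacity)
    strategy: "spread" (most free capacity) or "binpack" (least free capacity)
    Returns the chosen host name, or None if no host can fit the request.
    """
    # Filter step: keep only hosts with enough free capacity.
    free = {
        name: capacity - used
        for name, (used, capacity) in hosts.items()
        if used + demand <= capacity
    }
    if not free:
        return None
    names = sorted(free)  # deterministic tie-breaking by host name
    if strategy == "spread":
        return max(names, key=free.get)
    if strategy == "binpack":
        return min(names, key=free.get)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    hosts = {"host-a": (1.0, 4.0), "host-b": (3.0, 4.0), "host-c": (3.8, 4.0)}
    print(schedule(hosts, 0.5, "spread"))   # emptiest host that fits
    print(schedule(hosts, 0.5, "binpack"))  # fullest host that still fits
```

Spread favors availability and performance isolation, while binpack keeps hosts free for large requests or for being powered down.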

Orchestration of pods

Demo: Kubernetes

Create and use a pod
Create and use a deployment
- Scalability
- Roll-out

Kubernetes application: example of `deployment.yaml`

kind: Deployment
# [...]
spec:
  # Scalability: set number of replicas.
  replicas: 3
  selector:
    matchLabels:
      app: simpleserver
  template:
    metadata:
      labels:
        app: simpleserver
    spec:
      # Pod: composition of containers.
      containers:
      - name: pythonserver
        image: python:simpleserver
        resources:
          requests:
            cpu: 0.5
        ports:
        - containerPort: 8080

Operating system-level virtualization

Virtualize the OS instead of the hardware
- Containers: simpler, lighter, faster
  - Not safer!
Based on the Linux kernel (LinuX Containers, LXC): namespaces, cgroups, etc.
- Container engines wrap those features and deliver unified specifications
- Container runtimes bring usability, networking, development processes…
Enabling new cloud-native application architecture: micro-services
- Compositions of containers managed by orchestrators
