
Container Basics


Last updated 3 years ago

A container is a task, or set of tasks, with special properties that isolate the task(s) and restrict access to system resources.

Process descriptor and the task structure

The kernel stores the list of processes in a circular doubly linked list called the task list.

Each element in the task list is a process descriptor of the type struct task_struct. The process descriptor contains all the information about a specific process.

/proc

/proc is a special filesystem mount (procfs) for accessing system and process information directly from the kernel by reading "file" entries.
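As a rough illustration of reading those "file" entries, the sketch below parses /proc/<pid>/status into a dictionary. The helper names parse_status and read_status are hypothetical, and the demo assumes a Linux host with procfs mounted at /proc.

```python
from pathlib import Path


def parse_status(text: str) -> dict[str, str]:
    """Turn the 'Key: value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip anything that is not a 'Key: value' line
            fields[key.strip()] = value.strip()
    return fields


def read_status(pid: str = "self") -> dict[str, str]:
    """Read and parse /proc/<pid>/status (Linux only)."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    # procfs only exists on Linux, so the demo is skipped elsewhere.
    if Path("/proc/self/status").exists():
        info = read_status()
        print(info["Name"], info["Pid"], info["PPid"])
```

The special name "self" resolves to the calling process, so no PID lookup is needed for the demo.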

Properties

Credentials

All of a task's credentials are held in a refcounted structure of type struct cred. Each task points to its credentials by a pointer called cred in its task_struct.

Traditional UNIX implementations of permissions distinguish two categories:

  • privileged processes, whose effective user ID is 0 (superuser or root)

  • unprivileged processes, whose effective user ID is nonzero
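The traditional two-category check can be sketched against the Uid line of /proc/<pid>/status, which lists the real, effective, saved, and filesystem user IDs in that order. The names Ids, parse_ids, and is_privileged are hypothetical helpers for illustration only.

```python
import os
from typing import NamedTuple


class Ids(NamedTuple):
    """The four IDs on a 'Uid:' or 'Gid:' line of /proc/<pid>/status."""
    real: int
    effective: int
    saved: int
    filesystem: int


def parse_ids(line: str) -> Ids:
    """Parse a line such as 'Uid: 1000 1000 1000 1000'."""
    _, _, values = line.partition(":")
    return Ids(*(int(v) for v in values.split()))


def is_privileged(ids: Ids) -> bool:
    """Traditional UNIX check: an effective user ID of 0 means privileged."""
    return ids.effective == 0


if __name__ == "__main__":
    # os.geteuid() is only available on Unix-like systems.
    if hasattr(os, "geteuid"):
        print("effective UID:", os.geteuid(), "privileged:", os.geteuid() == 0)
```

This check ignores capabilities entirely, which is exactly the limitation the next section addresses.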

Capabilities

Linux capabilities were introduced as a way to break the role of root down into discrete subsections, which could be granted to non-root processes to allow them to perform privileged actions.

A process has a "bounding set" of capabilities, which acts as a limiting superset for the capabilities it can have. Importantly, and by default, this bounding set is carried over to any child process, so the "init" process of the container establishes a limiting set of capabilities for all processes inside the container (as all processes descend from PID 1).

Containers are tasks that should run with a restricted set of capabilities.

To view a list of capabilities, you can use capsh:

$ capsh --print
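The kernel also exposes capability sets as hexadecimal bitmasks in /proc/<pid>/status (fields such as CapEff and CapBnd). The sketch below decodes such a mask into names; decode_caps is a hypothetical helper, the name table follows the bit numbering in linux/capability.h, and the sample mask is only illustrative input of the kind often reported for a container running with a reduced default set.

```python
# Capability names indexed by bit number, following linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list[str]:
    """Translate a hex capability mask (e.g. a CapEff value) into names."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask >> bit:  # stop once no higher bits remain set
        if (mask >> bit) & 1:
            # Fall back to a generic label for bits newer than the table.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
        bit += 1
    return names


if __name__ == "__main__":
    # Sample mask for illustration; read CapEff from /proc/self/status for real data.
    for name in decode_caps("00000000a80425fb"):
        print(name)
```

A mask that decodes to a list containing CAP_SYS_ADMIN is a strong hint that the container is not running with a restricted set.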

Filesystem

The container's root mount is often placed on a container-specialized filesystem, such as AUFS or OverlayFS. With OverlayFS under Docker, the container's / actually lives under /var/lib/docker/overlay2 on the host.
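One way to see which filesystem backs a process's root is /proc/<pid>/mountinfo, where each line holds the mount point in its fifth field and the filesystem type right after a " - " separator. The sketch below is a minimal parser under that assumption; root_fs_type is a hypothetical helper, and the sample lines in the demo are made up for illustration.

```python
from typing import Optional


def root_fs_type(mountinfo: str) -> Optional[str]:
    """Return the filesystem type of the mount whose mount point is '/'."""
    fs_type = None
    for line in mountinfo.splitlines():
        # Fields before ' - ' describe the mount; fields after it describe the filesystem.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        fields = left.split()
        after = right.split()
        if len(fields) >= 5 and fields[4] == "/" and after:
            fs_type = after[0]  # later entries win, as they are mounted on top
    return fs_type


if __name__ == "__main__":
    # Made-up sample; on Linux, pass the contents of /proc/self/mountinfo instead.
    sample = (
        "22 1 0:21 / /proc rw,nosuid - proc proc rw\n"
        "30 1 0:58 / / rw,relatime - overlay overlay rw\n"
    )
    print(root_fs_type(sample))
```

Inside a Docker container using the overlay2 driver this would typically report overlay, while on the host it reports the host's own root filesystem type.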

Namespaces

There are 8 types of namespaces available on Linux.

Each namespace type and what it isolates:

  • Cgroup: Cgroup root directory.

  • IPC: Provides namespaced versions of SystemV IPC and POSIX message queues.

  • Network: Provides a namespaced and isolated network stack. The majority of container use-cases involve networked services, so this will prove to be a core feature of containers.

  • Mount: Mount points.

  • PID: Provides a namespaced tree of process IDs (PIDs). This allows each container to have a full isolated process tree, in which it has an ‘init’ process that it runs as PID 1 inside this namespace. Processes running in a container will have a different PID on the host than they do inside the container’s PID namespace.

  • Time: Boot and monotonic clocks.

  • User: Provides a namespaced version of User IDs (UIDs) and Group IDs (GIDs). This is one of the most important features of modern container systems, as it is used to provide "unprivileged containers". These are containers in which root (UID 0) inside the container is not root outside the container, greatly increasing the container's security.

  • UTS: Provides a namespaced version of system identifiers (the hostname and NIS domain name).

Each process has a /proc/[pid]/ns/ subdirectory containing one entry for each namespace.
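Each entry in that directory is a symbolic link whose target looks like pid:[4026531836], where the number identifies the namespace instance. The sketch below lists them for a process; parse_ns_link and list_namespaces are hypothetical helper names, and the demo assumes a Linux host.

```python
import os
import re

# Link targets look like 'pid:[4026531836]': namespace type, then an inode number.
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a /proc/<pid>/ns/* link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(pid: str = "self") -> dict[str, int]:
    """Map each entry in /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


if __name__ == "__main__":
    if os.path.isdir("/proc/self/ns"):  # Linux only
        for name, inode in list_namespaces().items():
            print(f"{name:20} {inode}")
```

Two processes are in the same namespace of a given type when the inode numbers match, so comparing the output for a container process and a host process shows which namespaces the container actually isolates.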

References: Quarkslab's blog, "Digging into Linux namespaces", parts 1 and 2.

Cgroups

Control groups, usually referred to as cgroups, are a Linux kernel feature that allows processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored. The kernel's cgroup interface is provided through a pseudo-filesystem called cgroupfs, where hierarchy is expressed through the directory tree in each mount.

Grouping is implemented in the core cgroup kernel code, while resource tracking and limits are implemented in a set of per-resource-type subsystems (memory, CPU, and so on).

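A cgroup filesystem initially contains a single root cgroup, /, to which all processes belong. A new cgroup is created by creating a directory in the cgroup filesystem. The commands here use the cgroup v1 layout, where each controller (cpu in this case) has its own mount under /sys/fs/cgroup, and they typically require root privileges.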

$ mkdir /sys/fs/cgroup/cpu/cg1

This creates a new empty cgroup. A process may be moved to this cgroup by writing its PID into the cgroup's cgroup.procs file:

$ echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
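To confirm which cgroups a process ended up in, /proc/<pid>/cgroup can be read; each line has the form hierarchy-ID:controller-list:cgroup-path. The sketch below parses that format; parse_cgroup_line is a hypothetical helper, and the sample lines in the demo are illustrative rather than taken from a real system.

```python
def parse_cgroup_line(line: str) -> tuple[int, list[str], str]:
    """Parse one 'hierarchy-ID:controller-list:cgroup-path' line."""
    hierarchy_id, controllers, path = line.strip().split(":", 2)
    # cgroup v2 entries have an empty controller list ('0::/path').
    controller_list = controllers.split(",") if controllers else []
    return int(hierarchy_id), controller_list, path


if __name__ == "__main__":
    # Illustrative input; on Linux, iterate over /proc/self/cgroup instead.
    for sample in ("4:cpu,cpuacct:/cg1", "0::/user.slice"):
        print(parse_cgroup_line(sample))
```

Splitting at most twice keeps any colons inside the cgroup path intact.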

Linux Security Modules

Docker and LXC enable a default LSM profile in enforcement mode, which mostly serves to restrict a container's access to sensitive /proc and /sys entries.

The profile also denies the mount syscall.
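On a host where AppArmor is the active LSM, the profile confining a process can usually be read from /proc/<pid>/attr/current, either as "unconfined" or as a profile name followed by its mode in parentheses. The sketch below parses that label under this assumption; parse_apparmor_label is a hypothetical helper, and SELinux systems report a security context in a different format that this sketch does not handle.

```python
from typing import Optional


def parse_apparmor_label(label: str) -> tuple[str, Optional[str]]:
    """Split an AppArmor label like 'docker-default (enforce)' into (profile, mode)."""
    label = label.strip("\x00 \t\n")
    if label.endswith(")") and " (" in label:
        name, _, mode = label[:-1].rpartition(" (")
        return name, mode
    # Labels such as 'unconfined' carry no mode.
    return label, None


if __name__ == "__main__":
    # Assumption: AppArmor is enabled; the file may be absent or hold other data otherwise.
    try:
        with open("/proc/self/attr/current") as handle:
            print(parse_apparmor_label(handle.read()))
    except OSError:
        print("no LSM label available for this process")
```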

seccomp

Seccomp policies come in two modes:

  • Strict mode: a small, fixed set of allowed system calls (read, write, _exit, and sigreturn) that cannot be customized.

  • Filter mode: system call filters are written as Berkeley Packet Filter (BPF) programs, which allows finer-grained policies on system call usage. One caveat: seccomp-bpf filters can inspect syscall arguments but cannot dereference pointers.
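Which of these modes applies to a process is reported by the Seccomp field of /proc/<pid>/status: 0 means disabled, 1 strict, and 2 filter. The sketch below reads that field; seccomp_mode is a hypothetical helper, and the demo assumes a Linux kernel recent enough to expose the field.

```python
from typing import Optional

# Values of the 'Seccomp:' field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> Optional[str]:
    """Return the seccomp mode named in a /proc/<pid>/status dump, if present."""
    for line in status_text.splitlines():
        # 'Seccomp_filters:' on newer kernels does not match this prefix.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1])
            return SECCOMP_MODES.get(value, "unknown")
    return None


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            print("seccomp mode:", seccomp_mode(handle.read()))
    except OSError:
        print("/proc/self/status is not available on this system")
```

A container started with a default runtime profile would be expected to report filter, while an unconfined host shell normally reports disabled.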

Container security model

[Figure: container security model]

References

  • proc

  • Credentials: credentials are process identifiers. They describe the user identity of a task, which determines its permissions for shared resources such as files, semaphores, and shared memory.

  • capabilities: since kernel 2.2, Linux divides the privileges associated with superuser into distinct units known as capabilities.

  • pivot_root: the filesystem root for a container is (usually) isolated from other containers and the host's root filesystem via the pivot_root syscall. The mount namespace provides a namespaced view of mount points; combined with the pivot_root() syscall, it is used to isolate the container's filesystem from the host's filesystem.

  • namespace: a namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes.

  • clone(2), unshare(2): creation of new namespaces using clone(2) and unshare(2) in most cases requires the CAP_SYS_ADMIN capability. User namespaces are the exception: since Linux 3.8, no privilege is required to create a user namespace.

  • Quarkslab's blog: Digging into Linux namespaces - part 1

  • Quarkslab's blog: Digging into Linux namespaces - part 2

  • Cgroups: Linux control groups.

  • AppArmor, SELinux: Linux security modules providing Mandatory Access Control (MAC), where access rules for a program are described by a profile.

  • seccomp: Linux has a mechanism for filtering access to system calls through the seccomp subsystem. Filter mode has been available since kernel 3.5, and the dedicated seccomp() system call since kernel 3.17.

  • A Compendium of Container Escapes

  • Abusing Privileged and Unprivileged Linux Containers

  • Understanding and Hardening Linux Containers