@@ -17,7 +17,7 @@ container managers.
 
 Before you read on, please make sure you read the low-level [kernel
 documentation about
-cgroupsv2](https://www.kernel.org/doc/Documentation/cgroup-v2.txt). This
+cgroup v2](https://www.kernel.org/doc/Documentation/cgroup-v2.txt). This
 documentation then adds in the higher-level view from systemd.
 
 This document augments the existing documentation we already have:
@@ -34,8 +34,8 @@ wiki documentation into this very document, too.)
 ## Two Key Design Rules
 
 Much of the philosophy behind these concepts is based on a couple of basic
-design ideas of cgroupsv2 (which we however try to adapt as far as we can to
-cgroupsv1 too). Specifically two cgroupsv2 rules are the most relevant:
+design ideas of cgroup v2 (which we however try to adapt as far as we can to
+cgroup v1 too). Specifically two cgroup v2 rules are the most relevant:
 
 1. The **no-processes-in-inner-nodes** rule: this means that it's not permitted
 to have processes directly attached to a cgroup that also has child cgroups and
@@ -58,45 +58,45 @@ your container manager creates and manages cgroups in the system's root cgroup
 you violate rule #2, as the root cgroup is managed by systemd and hence off
 limits to everybody else.
 
-Note that rule #1 is generally enforced by the kernel if cgroupsv2 is used: as
+Note that rule #1 is generally enforced by the kernel if cgroup v2 is used: as
 soon as you add a process to a cgroup it is ensured the rule is not
-violated. On cgroupsv1 this rule didn't exist, and hence isn't enforced, even
+violated. On cgroup v1 this rule didn't exist, and hence isn't enforced, even
 though it's a good thing to follow it then too. Rule #2 is not enforced on
-either cgroupsv1 nor cgroupsv2 (this is UNIX after all, in the general case
+either cgroup v1 nor cgroup v2 (this is UNIX after all, in the general case
 root can do anything, modulo SELinux and friends), but if you ignore it you'll
 be in constant pain as various pieces of software will fight over cgroup
 ownership.
 
-Note that cgroupsv1 is currently the most deployed implementation, even though
+Note that cgroup v1 is currently the most deployed implementation, even though
 it's semantically broken in many ways, and in many cases doesn't actually do
-what people think it does. cgroupsv2 is where things are going, and most new
-kernel features in this area are only added to cgroupsv2, and not cgroupsv1
-anymore. For example cgroupsv2 provides proper cgroup-empty notifications, has
+what people think it does. cgroup v2 is where things are going, and most new
+kernel features in this area are only added to cgroup v2, and not cgroup v1
+anymore. For example cgroup v2 provides proper cgroup-empty notifications, has
 support for all kinds of per-cgroup BPF magic, supports secure delegation of
 cgroup trees to less privileged processes and so on, which all are not
-available on cgroupsv1.
+available on cgroup v1.
 
 ## Three Different Tree Setups 🌳
 
 systemd supports three different modes how cgroups are set up. Specifically:
 
-1. **Unified** — this is the simplest mode, and exposes a pure cgroupsv2
+1. **Unified** — this is the simplest mode, and exposes a pure cgroup v2
 logic. In this mode `/sys/fs/cgroup` is the only mounted cgroup API file system
 and all available controllers are exclusively exposed through it.
 
-2. **Legacy** — this is the traditional cgroupsv1 mode. In this mode the
+2. **Legacy** — this is the traditional cgroup v1 mode. In this mode the
 various controllers each get their own cgroup file system mounted to
 `/sys/fs/cgroup/<controller>/`. On top of that systemd manages its own cgroup
 hierarchy for managing purposes as `/sys/fs/cgroup/systemd/`.
 
 3. **Hybrid** — this is a hybrid between the unified and legacy mode. It's set
 up mostly like legacy, except that there's also an additional hierarchy
-`/sys/fs/cgroup/unified/` that contains the cgroupsv2 hierarchy. (Note that in
+`/sys/fs/cgroup/unified/` that contains the cgroup v2 hierarchy. (Note that in
 this mode the unified hierarchy won't have controllers attached, the
 controllers are all mounted as separate hierarchies as in legacy mode,
-i.e. `/sys/fs/cgroup/unified/` is purely and exclusively about core cgroupsv2
+i.e. `/sys/fs/cgroup/unified/` is purely and exclusively about core cgroup v2
 functionality and not about resource management.) In this mode compatibility
-with cgroupsv1 is retained while some cgroupsv2 features are available
+with cgroup v1 is retained while some cgroup v2 features are available
 too. This mode is a stopgap. Don't bother with this too much unless you have
 too much free time.
 
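As a rough illustration of how the three setups differ on disk, the sketch below classifies a mount table using only the mount points named in the list above. It assumes its input is text in the `/proc/self/mounts` format and relies on the kernel's file system type names `cgroup2` and `cgroup`. The function name `classify_cgroup_mode` is made up for this example, and this is not necessarily how systemd itself tells the modes apart.

```python
def classify_cgroup_mode(mounts_text):
    """Guess the cgroup setup from text in /proc/self/mounts format."""
    fstype_by_mountpoint = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Fields are: device, mount point, file system type, options, ...
        # A later mount on the same mount point shadows an earlier one.
        fstype_by_mountpoint[fields[1]] = fields[2]

    # Unified: /sys/fs/cgroup itself is the single cgroup v2 file system.
    if fstype_by_mountpoint.get("/sys/fs/cgroup") == "cgroup2":
        return "unified"
    # Hybrid: a cgroup v2 hierarchy sits next to the v1 controller mounts.
    if fstype_by_mountpoint.get("/sys/fs/cgroup/unified") == "cgroup2":
        return "hybrid"
    # Legacy: only v1 hierarchies, including systemd's own private one.
    if fstype_by_mountpoint.get("/sys/fs/cgroup/systemd") == "cgroup":
        return "legacy"
    return "unknown"
```

On a live system this could be called as `classify_cgroup_mode(open("/proc/self/mounts").read())`. Returning `"unknown"` rather than guessing keeps the sketch honest on hosts that mount cgroups in some other layout.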
@@ -116,7 +116,7 @@ to talk of one specific cgroup and actually mean the same cgroup in all
 available controller hierarchies. E.g. if we talk about the cgroup `/foo/bar/`
 then we actually mean `/sys/fs/cgroup/cpu/foo/bar/` as well as
 `/sys/fs/cgroup/memory/foo/bar/`, `/sys/fs/cgroup/pids/foo/bar/`, and so on.
-Note that in cgroupsv2 the controller hierarchies aren't orthogonal, hence
+Note that in cgroup v2 the controller hierarchies aren't orthogonal, hence
 thinking about them as orthogonal won't help you in the long run anyway.
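To make the "same cgroup in all controller hierarchies" convention concrete, here is a minimal sketch that expands one logical cgroup path into its per-controller directories. It assumes the per-controller layout `/sys/fs/cgroup/<controller>/`. The helper name `legacy_controller_paths` and the default controller tuple (just the three controllers named above) are illustrative, since the real set depends on the kernel.

```python
def legacy_controller_paths(cgroup, controllers=("cpu", "memory", "pids")):
    """Expand a logical cgroup path into one directory per v1 controller."""
    # Drop empty components so "/foo/bar/", "foo/bar" and "/" all work.
    parts = [p for p in cgroup.split("/") if p]
    return [
        "/".join(["/sys/fs/cgroup", controller, *parts]) + "/"
        for controller in controllers
    ]
```

For example, `legacy_controller_paths("/foo/bar/")` yields the three directories listed in the paragraph above.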
 
 If you wonder how to detect which of these three modes is currently used, use
@@ -168,7 +168,7 @@ cgroup `/foo.slice/foo-bar.slice/foo-bar-baz.slice/quux.service/`.
 By default systemd sets up four slice units:
 
 1. `-.slice` is the root slice. i.e. the parent of everything else. On the host
-   system it maps directly to the top-level directory of cgroupsv2.
+   system it maps directly to the top-level directory of cgroup v2.
 
 2. `system.slice` is where system services are by default placed, unless
    configured otherwise.
@@ -187,8 +187,8 @@ above are just the defaults.
 
 Container managers and suchlike often want to control cgroups directly using
 the raw kernel APIs. That's entirely fine and supported, as long as proper
-*delegation* is followed. Delegation is a concept we inherited from cgroupsv2,
-but we expose it on cgroupsv1 too. Delegation means that some parts of the
+*delegation* is followed. Delegation is a concept we inherited from cgroup v2,
+but we expose it on cgroup v1 too. Delegation means that some parts of the
 cgroup tree may be managed by different managers than others. As long as it is
 clear which manager manages which part of the tree each one can do within its
 sub-graph of the tree whatever it wants.
@@ -217,7 +217,7 @@ guarantees:
    hierarchy (in unified and hybrid mode) as well as on systemd's own private
    hierarchy (in legacy and hybrid mode). It won't pass ownership of the legacy
    controller hierarchies. Delegation to less privileges processes is not safe
-   in cgroupsv1 (as a limitation of the kernel), hence systemd won't facilitate
+   in cgroup v1 (as a limitation of the kernel), hence systemd won't facilitate
    access to it.
 
 3. Any BPF IP filter programs systemd installs will be installed with
@@ -322,19 +322,19 @@ to work on that, and widen your horizon a bit. You are welcome.
 systemd supports a number of controllers (but not all). Specifically, supported
 are:
 
-* on cgroupsv1: `cpu`, `cpuacct`, `blkio`, `memory`, `devices`, `pids`
-* on cgroupsv2: `cpu`, `io`, `memory`, `pids`
+* on cgroup v1: `cpu`, `cpuacct`, `blkio`, `memory`, `devices`, `pids`
+* on cgroup v2: `cpu`, `io`, `memory`, `pids`
 
-It is our intention to natively support all cgroupsv2 controllers as they are
-added to the kernel. However, regarding cgroupsv1: at this point we will not
+It is our intention to natively support all cgroup v2 controllers as they are
+added to the kernel. However, regarding cgroup v1: at this point we will not
 add support for any other controllers anymore. This means systemd currently
-does not and will never manage the following controllers on cgroupsv1:
+does not and will never manage the following controllers on cgroup v1:
 `freezer`, `cpuset`, `net_cls`, `perf_event`, `net_prio`, `hugetlb`. Why not?
 Depending on the case, either their API semantics or implementations aren't
-really usable, or it's very clear they have no future on cgroupsv2, and we
+really usable, or it's very clear they have no future on cgroup v2, and we
 won't add new code for stuff that clearly has no future.
 
-Effectively this means that all those mentioned cgroupsv1 controllers are up
+Effectively this means that all those mentioned cgroup v1 controllers are up
 for grabs: systemd won't manage them, and hence won't delegate them to your
 code (however, systemd will still mount their hierarchies, simply because it
 mounts all controller hierarchies it finds available in the kernel). If you
@@ -355,9 +355,9 @@ cgroups in them — from previous runs, and be extra careful with them as they
 might still carry settings that might not be valid anymore.
 
 Note a particular asymmetry here: if your systemd version doesn't support a
-specific controller on cgroupsv1 you can still make use of it for delegation,
+specific controller on cgroup v1 you can still make use of it for delegation,
 by directly fiddling with its hierarchy and replicating the cgroup tree there
-as necessary (as suggested above). However, on cgroupsv2 this is different:
+as necessary (as suggested above). However, on cgroup v2 this is different:
 separately mounted hierarchies are not available, and delegation has always to
 happen through systemd itself. This means: when you update your kernel and it
 adds a new, so far unseen controller, and you want to use it for delegation,
@@ -417,7 +417,7 @@ unified you (of course, I guess) need to provide only `/sys/fs/cgroup/` itself.
    arbitrary naming, you might need to escape some of the names (for example,
    you really don't want to create a cgroup named `tasks`, just because the
    user created a container by that name, because `tasks` after all is a magic
-   attribute in cgroupsv1, and your `mkdir()` will hence fail with `EEXIST`. In
+   attribute in cgroup v1, and your `mkdir()` will hence fail with `EEXIST`. In
    systemd we do escaping by prefixing names that might collide with a kernel
    attribute name with an underscore. You might want to do the same, but this
    is really up to you how you do it. Just do it, and be careful.
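The underscore-prefix idea can be sketched as follows. This is only an illustration of the technique, not systemd's actual rule set. Of the reserved names below only `tasks` comes from the text above; `notify_on_release`, `release_agent` and the `cgroup.` prefix are other cgroup attribute names added here as assumptions. The function names are hypothetical. A real implementation would also have to cover controller-prefixed attribute files such as `memory.*`.

```python
# Assumed, incomplete list of names that collide with kernel attribute files.
RESERVED_NAMES = {"tasks", "notify_on_release", "release_agent"}
RESERVED_PREFIXES = ("cgroup.",)


def escape_cgroup_name(name):
    """Prefix names that could collide with a kernel attribute with '_'."""
    needs_escape = (
        name in RESERVED_NAMES
        or name.startswith(RESERVED_PREFIXES)
        # Names already starting with '_' are escaped too, so that
        # unescaping stays unambiguous.
        or name.startswith("_")
    )
    return "_" + name if needs_escape else name


def unescape_cgroup_name(name):
    """Reverse escape_cgroup_name()."""
    return name[1:] if name.startswith("_") else name
```

Escaping names that already begin with an underscore is a design choice made here so that the mapping round-trips: every escaped name starts with `_`, and no unescaped name does.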
@@ -462,9 +462,9 @@ unified you (of course, I guess) need to provide only `/sys/fs/cgroup/` itself.
    to get the cgroup for a unit. The method `GetUnitByControlGroup()` may be
    used to get the unit for a cgroup.)
 
-6. ⚡ Think twice before delegating cgroupsv1 controllers to less privileged
+6. ⚡ Think twice before delegating cgroup v1 controllers to less privileged
    containers. It's not safe, you basically allow your containers to freeze the
-   system with that and worse. Delegation is a strongpoint of cgroupsv2 though,
+   system with that and worse. Delegation is a strongpoint of cgroup v2 though,
    and there it's safe to treat delegation boundaries as privilege boundaries.
 
 And that's it for now. If you have further questions, refer to the systemd