Skip to content

Commit

Permalink
Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.o…
Browse files Browse the repository at this point in the history
…rg/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"

* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kcov: don't lose track of remote references during softirqs
  mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
  mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
  mm: fix possible OOB in numa_rebuild_large_mapping()
  mm/migrate: fix kernel BUG at mm/compaction.c:2761!
  selftests: mm: make map_fixed_noreplace test names stable
  mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
  mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
  gcov: add support for GCC 14
  zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
  mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
  lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
  MAINTAINERS: remove Lorenzo as vmalloc reviewer
  Revert "mm: init_mlocked_on_free_v3"
  mm/page_table_check: fix crash on ZONE_DEVICE
  gcc: disable '-Warray-bounds' for gcc-9
  ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
  ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
  • Loading branch information
torvalds committed Jun 17, 2024
2 parents 5cf81d7 + 01c8f98 commit e6b324f
Show file tree
Hide file tree
Showing 29 changed files with 345 additions and 222 deletions.
6 changes: 0 additions & 6 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2192,12 +2192,6 @@
Format: 0 | 1
Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.

init_mlocked_on_free= [MM] Fill freed userspace memory with zeroes if
it was mlock'ed and not explicitly munlock'ed
afterwards.
Format: 0 | 1
Default set by CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON

init_pkru= [X86] Specify the default memory protection keys rights
register contents for all processes. 0x55555554 by
default (disallow access to all but pkey 0). Can
Expand Down
1 change: 1 addition & 0 deletions Documentation/userspace-api/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ Security-related interfaces
seccomp_filter
landlock
lsm
mfd_noexec
spec_ctrl
tee

Expand Down
86 changes: 86 additions & 0 deletions Documentation/userspace-api/mfd_noexec.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
.. SPDX-License-Identifier: GPL-2.0
==================================
Introduction of non-executable mfd
==================================
:Author:
Daniel Verkamp <dverkamp@chromium.org>
Jeff Xu <jeffxu@chromium.org>

:Contributor:
Aleksa Sarai <cyphar@cyphar.com>

Since Linux introduced the memfd feature, memfds have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.

However, in a secure-by-default system, such as ChromeOS, (where all
executables should come from the rootfs, which is protected by verified
boot), this executable nature of memfd opens a door for NoExec bypass
and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
process created a memfd to share the content with an external process,
however the memfd is overwritten and used for executing arbitrary code
and root escalation. [2] lists more VRP of this kind.

On the other hand, executable memfd has its legit use: runc uses memfd’s
seal and executable feature to copy the contents of the binary then
execute them. For such a system, we need a solution to differentiate runc's
use of executable memfds and an attacker's [3].

To address those above:
- Let memfd_create() set X bit at creation time.
- Let memfd be sealed for modifying X bit when NX is set.
- Add a new pid namespace sysctl: vm.memfd_noexec to help applications in
migrating and enforcing non-executable MFD.

User API
========
``int memfd_create(const char *name, unsigned int flags)``

``MFD_NOEXEC_SEAL``
When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created
with NX. F_SEAL_EXEC is set and the memfd can't be modified to
add X later. MFD_ALLOW_SEALING is also implied.
This is the most common case for the application to use memfd.

``MFD_EXEC``
When MFD_EXEC bit is set in the ``flags``, memfd is created with X.

Note:
``MFD_NOEXEC_SEAL`` implies ``MFD_ALLOW_SEALING``. In case that
an app doesn't want sealing, it can add F_SEAL_SEAL after creation.


Sysctl:
========
``pid namespaced sysctl vm.memfd_noexec``

The new pid namespaced sysctl vm.memfd_noexec has 3 values:

- 0: MEMFD_NOEXEC_SCOPE_EXEC
memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
MFD_EXEC was set.

- 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
MFD_NOEXEC_SEAL was set.

- 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
memfd_create() without MFD_NOEXEC_SEAL will be rejected.

The sysctl allows finer control of memfd_create for old software that
doesn't set the executable bit; for example, a container with
vm.memfd_noexec=1 means the old software will create non-executable memfd
by default while new software can create executable memfd by setting
MFD_EXEC.

The value of vm.memfd_noexec is passed to child namespace at creation
time. In addition, the setting is hierarchical, i.e. during memfd_create,
we will search from current ns to root ns and use the most restrictive
setting.

[1] https://crbug.com/1305267

[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1

[3] https://lwn.net/Articles/781013/
1 change: 0 additions & 1 deletion MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -23974,7 +23974,6 @@ VMALLOC
M: Andrew Morton <akpm@linux-foundation.org>
R: Uladzislau Rezki <urezki@gmail.com>
R: Christoph Hellwig <hch@infradead.org>
R: Lorenzo Stoakes <lstoakes@gmail.com>
L: linux-mm@kvack.org
S: Maintained
W: http://www.linux-mm.org
Expand Down
12 changes: 12 additions & 0 deletions arch/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -1046,10 +1046,21 @@ config ARCH_MMAP_RND_BITS_MAX
config ARCH_MMAP_RND_BITS_DEFAULT
int

config FORCE_MAX_MMAP_RND_BITS
bool "Force maximum number of bits to use for ASLR of mmap base address"
default y if !64BIT
help
ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS represent the number
of bits to use for ASLR and if no custom value is assigned (EXPERT)
then the architecture's lower bound (minimum) value is assumed.
This toggle changes that default assumption to assume the arch upper
bound (maximum) value instead.

config ARCH_MMAP_RND_BITS
int "Number of bits to use for ASLR of mmap base address" if EXPERT
range ARCH_MMAP_RND_BITS_MIN ARCH_MMAP_RND_BITS_MAX
default ARCH_MMAP_RND_BITS_DEFAULT if ARCH_MMAP_RND_BITS_DEFAULT
default ARCH_MMAP_RND_BITS_MAX if FORCE_MAX_MMAP_RND_BITS
default ARCH_MMAP_RND_BITS_MIN
depends on HAVE_ARCH_MMAP_RND_BITS
help
Expand Down Expand Up @@ -1084,6 +1095,7 @@ config ARCH_MMAP_RND_COMPAT_BITS
int "Number of bits to use for ASLR of mmap base address for compatible applications" if EXPERT
range ARCH_MMAP_RND_COMPAT_BITS_MIN ARCH_MMAP_RND_COMPAT_BITS_MAX
default ARCH_MMAP_RND_COMPAT_BITS_DEFAULT if ARCH_MMAP_RND_COMPAT_BITS_DEFAULT
default ARCH_MMAP_RND_COMPAT_BITS_MAX if FORCE_MAX_MMAP_RND_BITS
default ARCH_MMAP_RND_COMPAT_BITS_MIN
depends on HAVE_ARCH_MMAP_RND_COMPAT_BITS
help
Expand Down
Loading

0 comments on commit e6b324f

Please sign in to comment.