This repository was archived by the owner on Nov 21, 2022. It is now read-only.

Commit 23ff7ff

davidhildenbrand authored and sfrothwell committed
mm/memory_hotplug: introduce add_memory_driver_managed()
Patch series "mm/memory_hotplug: Interface to add driver-managed system ram", v4.

kexec (via kexec_load()) can currently not properly handle memory added via dax/kmem, and will have similar issues with virtio-mem. kexec-tools will currently add all memory to the fixed-up initial firmware memmap. In case of dax/kmem, this means that - in contrast to a proper reboot - how that persistent memory will be used can no longer be configured by the kexec'd kernel. In case of virtio-mem it will be harmful, because that memory might contain inaccessible pieces that require coordination with the hypervisor first.

In both cases, we want to let the driver in the kexec'd kernel handle detecting and adding the memory, like during an ordinary reboot. Introduce add_memory_driver_managed(). More on the semantics is in patch #1.

In the future, we might want to make this behavior configurable for dax/kmem, either by configuring it in the kernel (which would then also allow configuring kexec_file_load()) or in kexec-tools by also adding "System RAM (kmem)" memory from /proc/iomem to the fixed-up initial firmware memmap.

More on the motivation can be found in [1] and [2].

[1] https://lkml.kernel.org/r/20200429160803.109056-1-david@redhat.com
[2] https://lkml.kernel.org/r/20200430102908.10107-1-david@redhat.com

This patch (of 3):

Some device drivers rely on memory they manage not getting added to the initial (firmware) memmap as system RAM - so it's not used as initial system RAM by the kernel and the driver is under control. While this is the case during cold boot and after a reboot, kexec is not aware of that and might add such memory to the initial (firmware) memmap of the kexec kernel. We need ways to teach kernel and userspace that this system ram is different.

For example, dax/kmem allows deciding at runtime whether persistent memory is to be used as system ram. Another future user is virtio-mem, which has to coordinate with its hypervisor to deal with inaccessible parts within memory resources.

We want to let users in the kernel (esp. kexec) but also user space (esp. kexec-tools) know that this memory has different semantics and needs to be handled differently:
1. Don't create entries in /sys/firmware/memmap/
2. Name the memory resource "System RAM ($DRIVER)" (exposed via /proc/iomem) ($DRIVER might be "kmem", "virtio_mem").
3. Flag the memory resource IORESOURCE_MEM_DRIVER_MANAGED

/sys/firmware/memmap/ [1] represents the "raw firmware-provided memory map" because "on most architectures that firmware-provided memory map is modified afterwards by the kernel itself". The primary user is kexec on x86-64. Since commit d96ae53 ("memory-hotplug: create /sys/firmware/memmap entry for new memory"), we add all hotplugged memory to that firmware memmap - which makes perfect sense for traditional memory hotplug on x86-64, where real HW will also add hotplugged DIMMs to the firmware memmap. We replicate what the "raw firmware-provided memory map" looks like after hot(un)plug.

To keep things simple, let the user provide the full resource name instead of only the driver name - this way, we don't have to manually allocate/craft strings for memory resources. Also use the resource name to make decisions, to avoid passing additional flags. In case the name isn't "System RAM", it's special.

We don't have to worry about firmware_map_remove() on the removal path. If there is no entry, it will simply return with -EINVAL.

We'll adapt dax/kmem in a follow-up patch.

[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap

Link: http://lkml.kernel.org/r/20200508084217.9160-1-david@redhat.com
Link: http://lkml.kernel.org/r/20200508084217.9160-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
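
The naming rule above ("System RAM" versus "System RAM ($DRIVER)") is what user space can key on when reading /proc/iomem. The following is a minimal user-space sketch of such a classification, not code from kexec-tools or from this series; the enum `ram_kind` and the function `classify_iomem_name()` are hypothetical names introduced only for illustration.

```c
#include <string.h>

/* Hypothetical classification of a resource name as shown in /proc/iomem. */
enum ram_kind {
	RAM_NONE,		/* not system RAM */
	RAM_PLAIN,		/* "System RAM": ordinary system RAM */
	RAM_DRIVER_MANAGED,	/* "System RAM ($DRIVER)": left to the driver */
};

/*
 * Classify a resource name according to the naming convention described
 * in the commit message. This is an illustrative helper only.
 */
static enum ram_kind classify_iomem_name(const char *name)
{
	static const char prefix[] = "System RAM (";
	size_t len;

	if (!name)
		return RAM_NONE;
	if (strcmp(name, "System RAM") == 0)
		return RAM_PLAIN;

	len = strlen(name);
	/* A matching prefix guarantees len >= 12, so name[len - 1] is valid. */
	if (strncmp(name, prefix, sizeof(prefix) - 1) == 0 &&
	    name[len - 1] == ')')
		return RAM_DRIVER_MANAGED;

	return RAM_NONE;
}
```

Under this sketch, a tool building the initial memmap for a kexec kernel would include only `RAM_PLAIN` ranges and skip `RAM_DRIVER_MANAGED` ones such as "System RAM (kmem)", leaving them for the driver in the new kernel to detect and add.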
1 parent 8ce1066 commit 23ff7ff

3 files changed: 61 additions, 4 deletions

include/linux/ioport.h

Lines changed: 1 addition & 0 deletions
@@ -103,6 +103,7 @@ struct resource {
 #define IORESOURCE_MEM_32BIT		(3<<3)
 #define IORESOURCE_MEM_SHADOWABLE	(1<<5)	/* dup: IORESOURCE_SHADOWABLE */
 #define IORESOURCE_MEM_EXPANSIONROM	(1<<6)
+#define IORESOURCE_MEM_DRIVER_MANAGED	(1<<7)
 
 /* PnP I/O specific bits (IORESOURCE_BITS) */
 #define IORESOURCE_IO_16BIT_ADDR	(1<<0)

include/linux/memory_hotplug.h

Lines changed: 2 additions & 0 deletions
@@ -342,6 +342,8 @@ extern void __ref free_area_init_core_hotplug(int nid);
 extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource);
+extern int add_memory_driver_managed(int nid, u64 start, u64 size,
+				     const char *resource_name);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern void remove_pfn_range_from_zone(struct zone *zone,

mm/memory_hotplug.c

Lines changed: 58 additions & 4 deletions
@@ -98,11 +98,14 @@ void mem_hotplug_done(void)
 u64 max_mem_size = U64_MAX;
 
 /* add this memory to iomem resource */
-static struct resource *register_memory_resource(u64 start, u64 size)
+static struct resource *register_memory_resource(u64 start, u64 size,
+						 const char *resource_name)
 {
 	struct resource *res;
 	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-	char *resource_name = "System RAM";
+
+	if (strcmp(resource_name, "System RAM"))
+		flags |= IORESOURCE_MEM_DRIVER_MANAGED;
 
 	/*
 	 * Make sure value parsed from 'mem=' only restricts memory adding
@@ -1057,7 +1060,8 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	BUG_ON(ret);
 
 	/* create new memmap entry */
-	firmware_map_add_hotplug(start, start + size, "System RAM");
+	if (!strcmp(res->name, "System RAM"))
+		firmware_map_add_hotplug(start, start + size, "System RAM");
 
 	/* device_online() will take the lock when calling online_pages() */
 	mem_hotplug_done();
@@ -1083,7 +1087,7 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	struct resource *res;
 	int ret;
 
-	res = register_memory_resource(start, size);
+	res = register_memory_resource(start, size, "System RAM");
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
@@ -1105,6 +1109,56 @@ int add_memory(int nid, u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(add_memory);
 
+/*
+ * Add special, driver-managed memory to the system as system RAM. Such
+ * memory is not exposed via the raw firmware-provided memmap as system
+ * RAM, instead, it is detected and added by a driver - during cold boot,
+ * after a reboot, and after kexec.
+ *
+ * Reasons why this memory should not be used for the initial memmap of a
+ * kexec kernel or for placing kexec images:
+ * - The booting kernel is in charge of determining how this memory will be
+ *   used (e.g., use persistent memory as system RAM)
+ * - Coordination with a hypervisor is required before this memory
+ *   can be used (e.g., inaccessible parts).
+ *
+ * For this memory, no entries in /sys/firmware/memmap ("raw firmware-provided
+ * memory map") are created. Also, the created memory resource is flagged
+ * with IORESOURCE_MEM_DRIVER_MANAGED, so in-kernel users can special-case
+ * this memory as well (esp., not place kexec images onto it).
+ *
+ * The resource_name (visible via /proc/iomem) has to have the format
+ * "System RAM ($DRIVER)".
+ */
+int add_memory_driver_managed(int nid, u64 start, u64 size,
+			      const char *resource_name)
+{
+	struct resource *res;
+	int rc;
+
+	if (!resource_name ||
+	    strstr(resource_name, "System RAM (") != resource_name ||
+	    resource_name[strlen(resource_name) - 1] != ')')
+		return -EINVAL;
+
+	lock_device_hotplug();
+
+	res = register_memory_resource(start, size, resource_name);
+	if (IS_ERR(res)) {
+		rc = PTR_ERR(res);
+		goto out_unlock;
+	}
+
+	rc = add_memory_resource(nid, res);
+	if (rc < 0)
+		release_memory_resource(res);
+
+out_unlock:
+	unlock_device_hotplug();
+	return rc;
+}
+EXPORT_SYMBOL_GPL(add_memory_driver_managed);
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * Confirm all pages in a range [start, end) belong to the same zone (skipping
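
The three name-based decisions made in mm/memory_hotplug.c above (validating the name, setting the flag, and skipping the firmware memmap entry) can be tried out in user space. The sketch below copies the conditions from the diff into standalone helpers; the helper names `check_driver_managed_name()`, `driver_managed_flag()`, and `wants_firmware_map_entry()` are hypothetical and do not exist in the kernel, and the flag value is taken from the ioport.h hunk.

```c
#include <errno.h>
#include <string.h>

/* Value as defined in the include/linux/ioport.h hunk of this commit. */
#define IORESOURCE_MEM_DRIVER_MANAGED	(1 << 7)

/*
 * Same condition as the check at the top of add_memory_driver_managed():
 * the name must start with "System RAM (" and end with ')'.
 */
static int check_driver_managed_name(const char *resource_name)
{
	if (!resource_name ||
	    strstr(resource_name, "System RAM (") != resource_name ||
	    resource_name[strlen(resource_name) - 1] != ')')
		return -EINVAL;
	return 0;
}

/*
 * Same decision as in register_memory_resource(): any name other than
 * plain "System RAM" gets the driver-managed bit. Only that bit is
 * modeled here, not IORESOURCE_SYSTEM_RAM or IORESOURCE_BUSY.
 */
static unsigned long driver_managed_flag(const char *resource_name)
{
	return strcmp(resource_name, "System RAM") ?
	       IORESOURCE_MEM_DRIVER_MANAGED : 0;
}

/*
 * Same decision as in add_memory_resource(): only plain "System RAM"
 * gets an entry in /sys/firmware/memmap.
 */
static int wants_firmware_map_entry(const char *resource_name)
{
	return !strcmp(resource_name, "System RAM");
}
```

Note that the strstr() comparison in the check only accepts a match at the very start of the string, so it acts as a prefix test, and a matching prefix guarantees the string is long enough for the final `strlen() - 1` index to be valid.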
