Merge branch 'akpm' (more patches from Andrew)
Merge patches from Andrew Morton:
 "Most of the rest of MM, plus a few dribs and drabs.

  I still have quite a few irritating patches left around: ones with
  dubious testing results, lack of review, ones which should have gone
  via maintainer trees but the maintainers are slack, etc.

  I need to be more activist in getting these things wrapped up outside
  the merge window, but they're such a PITA."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (48 commits)
  mm/vmscan.c: avoid possible deadlock caused by too_many_isolated()
  vmscan: comment too_many_isolated()
  mm/kmemleak.c: remove obsolete simple_strtoul
  mm/memory_hotplug.c: improve comments
  mm/hugetlb: create hugetlb cgroup file in hugetlb_init
  mm/mprotect.c: coding-style cleanups
  Documentation: ABI: /sys/devices/system/node/
  slub: drop mutex before deleting sysfs entry
  memcg: add comments clarifying aspects of cache attribute propagation
  kmem: add slab-specific documentation about the kmem controller
  slub: slub-specific propagation changes
  slab: propagate tunable values
  memcg: aggregate memcg cache values in slabinfo
  memcg/sl[au]b: shrink dead caches
  memcg/sl[au]b: track all the memcg children of a kmem_cache
  memcg: destroy memcg caches
  sl[au]b: allocate objects from memcg cache
  sl[au]b: always get the cache from its page in kmem_cache_free()
  memcg: skip memcg kmem allocations in specified code regions
  memcg: infrastructure to match an allocation to the right cache
  ...
torvalds committed Dec 18, 2012
2 parents d7b96ca + 3cf2384 commit 673ab87
Showing 38 changed files with 2,548 additions and 345 deletions.
96 changes: 95 additions & 1 deletion Documentation/ABI/stable/sysfs-devices-node
@@ -1,7 +1,101 @@
What: /sys/devices/system/node/possible
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that could possibly become online at some point.

What: /sys/devices/system/node/online
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that are online.

What: /sys/devices/system/node/has_normal_memory
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that have regular memory.

What: /sys/devices/system/node/has_cpu
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that have one or more CPUs.

What: /sys/devices/system/node/has_high_memory
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that have regular or high memory.
Depends on CONFIG_HIGHMEM.

What: /sys/devices/system/node/nodeX
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
When CONFIG_NUMA is enabled, this is a directory containing
information on node X such as what CPUs are local to the
-node.
+node. Each file is detailed next.

What: /sys/devices/system/node/nodeX/cpumap
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
The node's cpumap.

What: /sys/devices/system/node/nodeX/cpulist
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
The CPUs associated with the node.
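The cpulist file, like the node-level possible and online files, uses the kernel's range-list format: comma-separated decimal entries in which "a-b" denotes an inclusive range (for example "0-3,8-11"). The sketch below parses such a string in user space into a bitmask. `parse_cpulist` is a hypothetical helper, not a kernel or libc interface, and the 64-entry limit is an assumption made to keep the example small.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: expand a list such as "0-3,8\n" into a 64-bit
 * mask. Returns 0 on success, -1 on malformed input or an index >= 64
 * (the 64-entry bound is an assumption of this sketch). */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo = strtoul(s, &end, 10);
		unsigned long hi = lo;

		if (end == s)
			return -1;		/* expected a number here */
		s = end;
		if (*s == '-') {		/* inclusive range "lo-hi" */
			const char *start = s + 1;

			hi = strtoul(start, &end, 10);
			if (end == start)
				return -1;
			s = end;
		}
		if (lo > hi || hi >= 64)
			return -1;
		for (unsigned long i = lo; i <= hi; i++)
			m |= UINT64_C(1) << i;
		if (*s == ',')
			s++;			/* next entry */
		else if (*s && *s != '\n')
			return -1;		/* unexpected character */
	}
	*mask = m;
	return 0;
}
```

The trailing newline that sysfs files end with is accepted as a terminator, so a buffer read straight from the file can be passed in unchanged.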

What: /sys/devices/system/node/nodeX/meminfo
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Provides information about the node's distribution and memory
utilization. Similar to /proc/meminfo; see Documentation/filesystems/proc.txt.

What: /sys/devices/system/node/nodeX/numastat
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
The node's hit/miss statistics, in units of pages.
See Documentation/numastat.txt

What: /sys/devices/system/node/nodeX/distance
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Distance between the node and all the other nodes
in the system.
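As far as we can tell, nodeX/distance holds a single line of space-separated integers, one per online node in node order, giving the distance from node X to each of them (on a two-node machine, something like "10 20"). A minimal user-space parser under that assumption; `parse_distances` is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a nodeX/distance line such as "10 20\n"
 * into out[]. Returns the number of values parsed, or -1 if a token is
 * not a number or more than max values are present. */
static int parse_distances(const char *s, int *out, int max)
{
	int n = 0;

	for (;;) {
		char *end;
		long v;

		while (*s == ' ' || *s == '\t')
			s++;			/* skip separators */
		if (*s == '\0' || *s == '\n')
			return n;		/* end of the line */
		v = strtol(s, &end, 10);
		if (end == s || n == max)
			return -1;		/* junk, or too many values */
		out[n++] = (int)v;
		s = end;
	}
}
```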

What: /sys/devices/system/node/nodeX/vmstat
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
The node's zoned virtual memory statistics.
This is a superset of numastat.

What: /sys/devices/system/node/nodeX/compact
Date: February 2010
Contact: Mel Gorman <mel@csn.ul.ie>
Description:
When this file is written to, all memory within that node
will be compacted. When it completes, memory will be freed
into blocks which have as many contiguous pages as possible.

What: /sys/devices/system/node/nodeX/scan_unevictable_pages
Date: October 2008
Contact: Lee Schermerhorn <lee.schermerhorn@hp.com>
Description:
When set, it triggers a scan of the node's unevictable lists
and moves any pages that have become evictable onto the respective
zone's inactive list. See mm/vmscan.c

What: /sys/devices/system/node/nodeX/hugepages/hugepages-<size>/
Date: December 2009
Contact: Lee Schermerhorn <lee.schermerhorn@hp.com>
Description:
The node's huge page size control/query attributes.
See Documentation/vm/hugetlbpage.txt
66 changes: 65 additions & 1 deletion Documentation/cgroups/memory.txt
@@ -71,6 +71,11 @@ Brief summary of control files.
memory.oom_control # set/show oom controls.
memory.numa_stat # show the number of memory usage per numa node

memory.kmem.limit_in_bytes # set/show hard limit for kernel memory
memory.kmem.usage_in_bytes # show current kernel memory allocation
memory.kmem.failcnt # show the number of kernel memory usage hits limits
memory.kmem.max_usage_in_bytes # show max kernel memory usage recorded

memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory
memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation
memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits
@@ -268,20 +273,73 @@ the amount of kernel memory used by the system. Kernel memory is fundamentally
different than user memory, since it can't be swapped out, which makes it
possible to DoS the system by consuming too much of this precious resource.

Kernel memory won't be accounted at all until a limit is set on a group. This
allows for existing setups to continue working without disruption. The limit
cannot be set if the cgroup has children, or if there are already tasks in the
cgroup. Attempting to set the limit under those conditions will return -EBUSY.
When use_hierarchy == 1 and a group is accounted, its children will
automatically be accounted regardless of their limit value.

After a group is first limited, it will continue to be accounted until it
is removed. The memory limitation itself can, of course, be removed by writing
-1 to memory.kmem.limit_in_bytes. In this case, kmem will be accounted, but not
limited.
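The activation rules above can be captured in a small user-space model. It assumes, as the second paragraph implies, that the -EBUSY check applies only when accounting is first switched on, and that -1 stands for "unlimited". `struct group` and `set_kmem_limit` are hypothetical names, not memcg internals, and use_hierarchy propagation is not modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a memcg's kmem state (not kernel structures). */
struct group {
	bool kmem_accounted;	/* sticky once the first limit is set */
	long kmem_limit;	/* -1 means "unlimited" */
	int nr_children;
	int nr_tasks;
};

static int set_kmem_limit(struct group *g, long limit)
{
	if (!g->kmem_accounted && limit != -1) {
		/* Assumption: accounting can only be switched on while the
		 * group has neither children nor tasks. */
		if (g->nr_children > 0 || g->nr_tasks > 0)
			return -EBUSY;
		g->kmem_accounted = true;
	}
	/* Writing -1 later lifts the limit but leaves accounting on. */
	g->kmem_limit = limit;
	return 0;
}
```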

Kernel memory limits are not imposed for the root cgroup. Usage for the root
-cgroup may or may not be accounted.
+cgroup may or may not be accounted. The memory used is accumulated into
+memory.kmem.usage_in_bytes, or in a separate counter when it makes sense
+(currently only for tcp).
The main "kmem" counter is fed into the main counter, so kmem charges will
also be visible from the user counter.
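Because kmem charges are also fed into the main counter, a kernel-memory charge has to fit under both the kernel limit and the user limit. A minimal sketch of that double charge with rollback, assuming simple non-hierarchical counters. The names are hypothetical, and the real controller can trigger reclaim when the user limit is hit, which this model omits by failing immediately.

```c
#include <stdbool.h>

/* Hypothetical counter; invariant: usage <= limit. */
struct counter {
	unsigned long usage;
	unsigned long limit;
};

static bool counter_try_charge(struct counter *c, unsigned long val)
{
	if (val > c->limit - c->usage)	/* overflow-safe given the invariant */
		return false;
	c->usage += val;
	return true;
}

/* Charge kernel memory against K (kmem) and U (user); if the second
 * charge fails, undo the first so both counters stay consistent. */
static bool kmem_charge(struct counter *kmem, struct counter *user,
			unsigned long val)
{
	if (!counter_try_charge(kmem, val))
		return false;
	if (!counter_try_charge(user, val)) {
		kmem->usage -= val;	/* roll back the kmem charge */
		return false;
	}
	return true;
}
```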

Currently no soft limit is implemented for kernel memory. It is future work
to trigger slab reclaim when those limits are reached.

2.7.1 Current Kernel Memory resources accounted

* stack pages: every process consumes some stack pages. By accounting into
kernel memory, we prevent new processes from being created when the kernel
memory usage is too high.

* slab pages: pages allocated by the SLAB or SLUB allocator are tracked. A copy
of each kmem_cache is created the first time the cache is touched from inside
the memcg. The creation is done lazily, so some objects can still be
skipped while the cache is being created. All objects in a slab page should
belong to the same memcg. This only fails to hold when a task is migrated to a
different memcg during the page allocation by the cache.

* sockets memory pressure: some socket protocols have memory pressure
thresholds. The Memory Controller allows them to be controlled individually
per cgroup, instead of globally.

* tcp memory pressure: sockets memory pressure for the tcp protocol.
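The lazy creation of per-memcg copies described under "slab pages" can be pictured as a per-cache table indexed by memcg and filled on first touch. This sketch creates the copy synchronously and falls back to the root cache when creation fails, loosely mirroring the note that some objects can be skipped while a copy is being created. `struct cache`, `cache_for_memcg` and the `MAX_MEMCGS` bound are all hypothetical.

```c
#include <stdlib.h>

#define MAX_MEMCGS 8	/* hypothetical bound for the sketch */

struct cache {
	const char *name;
	struct cache *root;			/* NULL for the global cache */
	struct cache *children[MAX_MEMCGS];	/* per-memcg copies */
};

/* Return the cache an allocation from memcg `id` should use, creating
 * the per-memcg copy on first touch. id < 0 means "not accounted". */
static struct cache *cache_for_memcg(struct cache *root, int id)
{
	if (id < 0 || id >= MAX_MEMCGS)
		return root;
	if (!root->children[id]) {
		struct cache *c = calloc(1, sizeof(*c));

		if (!c)
			return root;	/* object goes to the root cache */
		c->name = root->name;
		c->root = root;
		root->children[id] = c;
	}
	return root->children[id];
}
```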

2.7.2 Common use cases

Because the "kmem" counter is fed to the main user counter, kernel memory can
never be limited completely independently of user memory. Say "U" is the user
limit, and "K" the kernel limit. There are three possible ways limits can be
set:

U != 0, K = unlimited:
This is the standard memcg limitation mechanism already present before kmem
accounting. Kernel memory is completely ignored.

U != 0, K < U:
Kernel memory is a subset of the user memory. This setup is useful in
deployments where the total amount of memory per-cgroup is overcommitted.
Overcommitting kernel memory limits is definitely not recommended, since the
box can still run out of non-reclaimable memory.
In this case, the admin could set up K so that the sum of K across all groups
is never greater than the total memory, and freely set U at the cost of his
QoS.

U != 0, K >= U:
Since kmem charges will also be fed to the user counter, reclaim will be
triggered for the cgroup for both kinds of memory. This setup gives the
admin a unified view of memory, and it is also useful for people who just
want to track kernel memory usage.
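The three setups can be told apart mechanically from U and K, and the advice for the K < U case (keep the sum of K across groups at or below total memory) is a simple check. A sketch assuming `UNLIMITED` (here `ULONG_MAX`) encodes an unlimited K; the enum and function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define UNLIMITED ULONG_MAX	/* hypothetical encoding of "no limit" */

enum kmem_setup {
	KMEM_IGNORED,	/* U != 0, K = unlimited */
	KMEM_SUBSET,	/* U != 0, K < U */
	KMEM_UNIFIED,	/* U != 0, K >= U */
};

static enum kmem_setup classify(unsigned long u, unsigned long k)
{
	if (k == UNLIMITED)
		return KMEM_IGNORED;
	return k < u ? KMEM_SUBSET : KMEM_UNIFIED;
}

/* True if the per-group kernel limits k[0..n) add up to more than
 * total; the running sum never exceeds total, so it cannot overflow. */
static bool kmem_overcommitted(const unsigned long *k, size_t n,
			       unsigned long total)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (k[i] > total - sum)
			return true;
		sum += k[i];
	}
	return false;
}
```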

3. User Interface

0. Configuration
@@ -290,6 +348,7 @@ a. Enable CONFIG_CGROUPS
b. Enable CONFIG_RESOURCE_COUNTERS
c. Enable CONFIG_MEMCG
d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
e. Enable CONFIG_MEMCG_KMEM (to use kmem extension)

1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
# mount -t tmpfs none /sys/fs/cgroup
@@ -406,6 +465,11 @@ About use_hierarchy, see Section 6.
Because rmdir() moves all pages to parent, some out-of-use page caches can be
moved to the parent. If you want to avoid that, force_empty will be useful.

Also, note that when memory.kmem.limit_in_bytes is set, the charges due to
kernel pages will still be seen after force_empty. This is not considered a
failure and the write will still return success. In this case, it is expected
that memory.kmem.usage_in_bytes == memory.usage_in_bytes.

About use_hierarchy, see Section 6.

5.2 stat file
7 changes: 4 additions & 3 deletions Documentation/cgroups/resource_counter.txt
@@ -83,16 +83,17 @@ to work with it.
res_counter->lock internally (it must be called with res_counter->lock
held). The force parameter indicates whether we can bypass the limit.

-e. void res_counter_uncharge[_locked]
+e. u64 res_counter_uncharge[_locked]
(struct res_counter *rc, unsigned long val)

When a resource is released (freed) it should be de-accounted
from the resource counter it was accounted to. This is called
-"uncharging".
+"uncharging". The return value of this function indicates the amount
+of charges still present in the counter.

The _locked routines imply that the res_counter->lock is taken.

-f. void res_counter_uncharge_until
+f. u64 res_counter_uncharge_until
(struct res_counter *rc, struct res_counter *top,
unsigned long val)

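The new u64 return values can be illustrated with a user-space model of a hierarchy of counters. Based on our reading of the description above, uncharging "until" top walks from rc up to, but not including, top, and we assume both functions report what is left in rc itself. The `struct rc` type and `rc_*` functions are hypothetical stand-ins without locking or limits, not the kernel code.

```c
#include <stddef.h>

/* Hypothetical model of a hierarchical resource counter. */
struct rc {
	unsigned long long usage;
	struct rc *parent;	/* NULL at the top of the hierarchy */
};

/* Uncharge one counter and return the usage still charged to it. */
static unsigned long long rc_uncharge_locked(struct rc *c, unsigned long val)
{
	if (c->usage < val)	/* clamp instead of underflowing */
		val = c->usage;
	c->usage -= val;
	return c->usage;
}

/* Uncharge rc and its ancestors up to, but not including, top;
 * return what is left in rc itself. */
static unsigned long long rc_uncharge_until(struct rc *rc, struct rc *top,
					    unsigned long val)
{
	unsigned long long ret = 0;

	for (struct rc *c = rc; c != top; c = c->parent) {
		unsigned long long left = rc_uncharge_locked(c, val);

		if (c == rc)
			ret = left;
	}
	return ret;
}

/* Uncharge the whole chain up to the root. */
static unsigned long long rc_uncharge(struct rc *rc, unsigned long val)
{
	return rc_uncharge_until(rc, NULL, val);
}
```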
39 changes: 33 additions & 6 deletions arch/cris/include/asm/io.h
@@ -133,12 +133,39 @@ static inline void writel(unsigned int b, volatile void __iomem *addr)
#define insb(port,addr,count) (cris_iops ? cris_iops->read_io(port,addr,1,count) : 0)
#define insw(port,addr,count) (cris_iops ? cris_iops->read_io(port,addr,2,count) : 0)
#define insl(port,addr,count) (cris_iops ? cris_iops->read_io(port,addr,4,count) : 0)
-#define outb(data,port) if (cris_iops) cris_iops->write_io(port,(void*)(unsigned)data,1,1)
-#define outw(data,port) if (cris_iops) cris_iops->write_io(port,(void*)(unsigned)data,2,1)
-#define outl(data,port) if (cris_iops) cris_iops->write_io(port,(void*)(unsigned)data,4,1)
-#define outsb(port,addr,count) if(cris_iops) cris_iops->write_io(port,(void*)addr,1,count)
-#define outsw(port,addr,count) if(cris_iops) cris_iops->write_io(port,(void*)addr,2,count)
-#define outsl(port,addr,count) if(cris_iops) cris_iops->write_io(port,(void*)addr,3,count)
+static inline void outb(unsigned char data, unsigned int port)
+{
+	if (cris_iops)
+		cris_iops->write_io(port, (void *) &data, 1, 1);
+}
+static inline void outw(unsigned short data, unsigned int port)
+{
+	if (cris_iops)
+		cris_iops->write_io(port, (void *) &data, 2, 1);
+}
+static inline void outl(unsigned int data, unsigned int port)
+{
+	if (cris_iops)
+		cris_iops->write_io(port, (void *) &data, 4, 1);
+}
+static inline void outsb(unsigned int port, const void *addr,
+	unsigned long count)
+{
+	if (cris_iops)
+		cris_iops->write_io(port, (void *)addr, 1, count);
+}
+static inline void outsw(unsigned int port, const void *addr,
+	unsigned long count)
+{
+	if (cris_iops)
+		cris_iops->write_io(port, (void *)addr, 2, count);
+}
+static inline void outsl(unsigned int port, const void *addr,
+	unsigned long count)
+{
+	if (cris_iops)
+		cris_iops->write_io(port, (void *)addr, 4, count);
+}

/*
* Convert a physical pointer to a virtual kernel pointer for /dev/mem
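The io.h hunk replaces the out* macros with static inline functions. Two things change: the value now lives in a parameter whose address is passed to write_io (the old outb/outw/outl cast the value itself to a pointer), and outsl now passes a size of 4 instead of 3. The function form also avoids the dangling-else hazard of a bare `if` inside a macro. A user-space sketch of the by-address convention with a mock in place of cris_iops; `struct io_ops`, `mock_write_io` and `sketch_outw` are hypothetical, and the write_io parameter order is inferred from the call sites above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the cris_iops table. */
struct io_ops {
	void (*write_io)(unsigned int port, void *addr, int size, int count);
};

static unsigned char last_buf[16];	/* bytes seen by the last write */
static size_t last_len;

static void mock_write_io(unsigned int port, void *addr, int size, int count)
{
	size_t len = (size_t)size * (size_t)count;

	(void)port;
	if (len > sizeof(last_buf))
		len = sizeof(last_buf);
	memcpy(last_buf, addr, len);	/* addr must point at real data */
	last_len = len;
}

static struct io_ops mock_ops = { mock_write_io };
static struct io_ops *iops = &mock_ops;

/* Function form, as in the patch: the callee reads the bytes of `data`
 * through its address instead of treating the value as an address. */
static inline void sketch_outw(unsigned short data, unsigned int port)
{
	if (iops)
		iops->write_io(port, (void *)&data, 2, 1);
}
```

With the old macro shape, the mock would have been handed `(void *)0xBEEF` as the buffer address and would read from an arbitrary location.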
1 change: 1 addition & 0 deletions arch/h8300/Kconfig
@@ -3,6 +3,7 @@ config H8300
default y
select HAVE_IDE
select HAVE_GENERIC_HARDIRQS
select GENERIC_ATOMIC64
select HAVE_UID16
select ARCH_WANT_IPC_PARSE_VERSION
select GENERIC_IRQ_SHOW
67 changes: 57 additions & 10 deletions arch/x86/platform/iris/iris.c
@@ -23,6 +23,7 @@

#include <linux/moduleparam.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/delay.h>
@@ -62,29 +63,75 @@ static void iris_power_off(void)
* by reading its input port and seeing whether the read value is
* meaningful.
*/
-static int iris_init(void)
+static int iris_probe(struct platform_device *pdev)
{
-	unsigned char status;
-	if (force != 1) {
-		printk(KERN_ERR "The force parameter has not been set to 1 so the Iris poweroff handler will not be installed.\n");
-		return -ENODEV;
-	}
-	status = inb(IRIS_GIO_INPUT);
+	unsigned char status = inb(IRIS_GIO_INPUT);
	if (status == IRIS_GIO_NODEV) {
-		printk(KERN_ERR "This machine does not seem to be an Iris. Power_off handler not installed.\n");
+		printk(KERN_ERR "This machine does not seem to be an Iris. "
+			"Power off handler not installed.\n");
		return -ENODEV;
	}
	old_pm_power_off = pm_power_off;
	pm_power_off = &iris_power_off;
	printk(KERN_INFO "Iris power_off handler installed.\n");

	return 0;
}

-static void iris_exit(void)
+static int iris_remove(struct platform_device *pdev)
{
	pm_power_off = old_pm_power_off;
	printk(KERN_INFO "Iris power_off handler uninstalled.\n");
+	return 0;
}

static struct platform_driver iris_driver = {
	.driver = {
		.name = "iris",
		.owner = THIS_MODULE,
	},
	.probe = iris_probe,
	.remove = iris_remove,
};

static struct resource iris_resources[] = {
	{
		.start = IRIS_GIO_BASE,
		.end = IRIS_GIO_OUTPUT,
		.flags = IORESOURCE_IO,
		.name = "address"
	}
};

static struct platform_device *iris_device;

static int iris_init(void)
{
	int ret;
	if (force != 1) {
		printk(KERN_ERR "The force parameter has not been set to 1."
			" The Iris poweroff handler will not be installed.\n");
		return -ENODEV;
	}
	ret = platform_driver_register(&iris_driver);
	if (ret < 0) {
		printk(KERN_ERR "Failed to register iris platform driver: %d\n",
			ret);
		return ret;
	}
	iris_device = platform_device_register_simple("iris", (-1),
			iris_resources, ARRAY_SIZE(iris_resources));
	if (IS_ERR(iris_device)) {
		printk(KERN_ERR "Failed to register iris platform device\n");
		platform_driver_unregister(&iris_driver);
		return PTR_ERR(iris_device);
	}
	return 0;
}

static void iris_exit(void)
{
	platform_device_unregister(iris_device);
	platform_driver_unregister(&iris_driver);
}

module_init(iris_init);
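The new iris_init() follows the usual register-then-unwind pattern and checks platform_device_register_simple() with IS_ERR()/PTR_ERR(), because that call reports failure as an errno encoded in the returned pointer. The sketch below re-creates that error-pointer convention in user space (errors occupy the top MAX_ERRNO = 4095 addresses) and shows the unwind order. The `fake_*` functions and `init_like_iris` are hypothetical stand-ins, not the platform API.

```c
#include <errno.h>
#include <stdbool.h>

/* User-space re-creation of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-ins for driver and device registration. */
static bool driver_registered;
static int fail_device;		/* set to an errno to simulate failure */
static int dummy_device;

static int fake_driver_register(void) { driver_registered = true; return 0; }
static void fake_driver_unregister(void) { driver_registered = false; }
static void *fake_device_register(void)
{
	return fail_device ? ERR_PTR(-fail_device) : (void *)&dummy_device;
}

static int init_like_iris(void)
{
	int ret = fake_driver_register();
	void *dev;

	if (ret < 0)
		return ret;
	dev = fake_device_register();
	if (IS_ERR(dev)) {
		fake_driver_unregister();	/* undo step 1 before failing */
		return (int)PTR_ERR(dev);
	}
	return 0;
}
```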
1 change: 1 addition & 0 deletions drivers/message/fusion/mptscsih.c
@@ -792,6 +792,7 @@ mptscsih_io_done(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf, MPT_FRAME_HDR *mr)
* than an unsolicited DID_ABORT.
*/
sc->result = DID_RESET << 16;
break;

case MPI_IOCSTATUS_SCSI_EXT_TERMINATED: /* 0x004C */
if (ioc->bus_type == FC)
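The mptscsih.c change adds a missing `break`, so the case that sets `sc->result = DID_RESET << 16` no longer falls through into the MPI_IOCSTATUS_SCSI_EXT_TERMINATED handling. A self-contained sketch of that control-flow difference; the status and result values are hypothetical placeholders, not the MPT or SCSI constants.

```c
/* Hypothetical status codes and results; only the control flow
 * mirrors the hunk above. */
enum status { ST_RESET_CASE, ST_EXT_TERMINATED };
enum { RESULT_RESET = 1, RESULT_OTHER = 2 };

static int result_for(enum status st, int with_break)
{
	int result = 0;

	switch (st) {
	case ST_RESET_CASE:
		result = RESULT_RESET;
		if (with_break)
			break;		/* patched behaviour: stop here */
		/* Pre-patch behaviour: control continues into the next case,
		 * which overwrites the result. */
		/* fall through */
	case ST_EXT_TERMINATED:
		result = RESULT_OTHER;
		break;
	}
	return result;
}
```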