From: Paul Jackson <pj@sgi.com>

Stringent enforcement of cpuset memory placement could cause the kernel to
panic on a GFP_ATOMIC (!wait) memory allocation, even though memory was
available elsewhere in the system.

Relax the cpuset constraint for GFP_ATOMIC requests in the last zonelist
loop of mm/page_alloc.c:__alloc_pages().

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
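
Illustrative note, not part of the patch: the effect of the one-line
change can be modelled in user space roughly as below. This is only a
sketch under stated assumptions. struct toy_zone and toy_last_pass() are
hypothetical names invented for the illustration, the two flags stand in
for the results of zone_watermark_ok() and cpuset_zone_allowed(), and the
model assumes the page allocation itself succeeds once a zone is chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct zone; not the kernel's definition. */
struct toy_zone {
	int watermark_ok;	/* models zone_watermark_ok() succeeding */
	int cpuset_allowed;	/* models cpuset_zone_allowed() returning true */
};

/*
 * Model of the final zonelist pass: return the index of the first zone
 * that may satisfy the request, or -1 if there is none (the "goto nopage"
 * case).  'wait' is nonzero when the caller may sleep, i.e. the request
 * is not GFP_ATOMIC.
 */
static int toy_last_pass(const struct toy_zone *zones, size_t nr, int wait)
{
	size_t i;

	assert(zones != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (!zones[i].watermark_ok)
			continue;

		/* Only requests that can sleep are held to the cpuset. */
		if (wait && !zones[i].cpuset_allowed)
			continue;

		return (int)i;
	}

	return -1;
}
```

With one zone that passes its watermark but lies outside the cpuset, the
model returns that zone for wait == 0 and -1 for wait != 0, which is the
distinction the patch introduces.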

 25-akpm/Documentation/cpusets.txt |    8 ++++++++
 25-akpm/mm/page_alloc.c           |    5 ++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff -puN Documentation/cpusets.txt~cpusets-special-case-gfp_atomic-allocs Documentation/cpusets.txt
--- 25/Documentation/cpusets.txt~cpusets-special-case-gfp_atomic-allocs	2005-03-30 18:01:48.000000000 -0800
+++ 25-akpm/Documentation/cpusets.txt	2005-03-30 18:01:48.000000000 -0800
@@ -261,6 +261,14 @@ that has had all its allowed CPUs or Mem
 code should reconfigure cpusets to only refer to online CPUs and Memory
 Nodes when using hotplug to add or remove such resources.
 
+There is a second exception to the above.  GFP_ATOMIC requests are
+kernel internal allocations that must be satisfied immediately.
+The kernel may panic if such a requested page is not allocated.
+If such a request cannot be satisfied within the cpuset's allowed
+memory, then we relax the cpuset boundaries and allow any page in
+the system to satisfy a GFP_ATOMIC request.  It is better to violate
+the cpuset constraints than it is to panic the kernel.
+
 To start a new job that is to be contained within a cpuset, the steps are:
 
  1) mkdir /dev/cpuset
diff -puN mm/page_alloc.c~cpusets-special-case-gfp_atomic-allocs mm/page_alloc.c
--- 25/mm/page_alloc.c~cpusets-special-case-gfp_atomic-allocs	2005-03-30 18:01:48.000000000 -0800
+++ 25-akpm/mm/page_alloc.c	2005-03-30 18:01:48.000000000 -0800
@@ -789,6 +789,9 @@ __alloc_pages(unsigned int __nocast gfp_
 	/*
 	 * Go through the zonelist again. Let __GFP_HIGH and allocations
 	 * coming from realtime tasks to go deeper into reserves
+	 *
+	 * This is the last chance, in general, before the goto nopage.
+	 * Ignore cpuset if GFP_ATOMIC (!wait) - better that than panic.
 	 */
 	for (i = 0; (z = zones[i]) != NULL; i++) {
 		if (!zone_watermark_ok(z, order, z->pages_min,
@@ -796,7 +799,7 @@ __alloc_pages(unsigned int __nocast gfp_
 				       gfp_mask & __GFP_HIGH))
 			continue;
 
-		if (!cpuset_zone_allowed(z))
+		if (wait && !cpuset_zone_allowed(z))
 			continue;
 
 		page = buffered_rmqueue(z, order, gfp_mask);
_