Commit 201a154

FS-Cache: Handle pages pending storage that get evicted under OOM conditions
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache. Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005 D 0000000000000000 0 4253 2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs]
	 [<ffffffff810885d3>] try_to_release_page+0x32/0x3b
	 [<ffffffff81093203>] shrink_page_list+0x316/0x4ac
	 [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c
	 [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb
	 [<ffffffff81093aa2>] shrink_list+0x8d/0x8f
	 [<ffffffff81093d1c>] shrink_zone+0x278/0x33c
	 [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba
	 [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392
	 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212
	 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf
	 [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa
	 [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb
	 [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c
	 [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29
	 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385
	 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae
	 [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae
	 [<ffffffff810b2e82>] do_sync_write+0xe3/0x120
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8
	 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89
	 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache]
	 [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed. This means that some data won't make it into the
cache this time. To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan". There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages. If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells <dhowells@redhat.com>
1 parent e3d4d28 commit 201a154

10 files changed: +152 -35 lines changed

Documentation/filesystems/caching/fscache.txt

Lines changed: 4 additions & 0 deletions
@@ -272,6 +272,10 @@ proc files.
 		pgs=N	Number of pages given store req processing time
 		rxd=N	Number of store reqs deleted from tracking tree
 		olm=N	Number of store reqs over store limit
+	VmScan	nos=N	Number of release reqs against pages with no pending store
+		gon=N	Number of release reqs against pages stored by time lock granted
+		bsy=N	Number of release reqs ignored due to in-progress store
+		can=N	Number of page stores cancelled due to release req
 	Ops	pend=N	Number of times async ops added to pending queues
 		run=N	Number of times async ops given CPU time
 		enq=N	Number of times async ops queued for processing

Documentation/filesystems/caching/netfs-api.txt

Lines changed: 20 additions & 1 deletion
@@ -641,7 +641,7 @@ data file must be retired (see the relinquish cookie function below).

 Furthermore, note that this does not cancel the asynchronous read or write
 operation started by the read/alloc and write functions, so the page
-invalidation and release functions must use:
+invalidation functions must use:

 	bool fscache_check_page_write(struct fscache_cookie *cookie,
 				      struct page *page);
@@ -654,6 +654,25 @@ to see if a page is being written to the cache, and:
 to wait for it to finish if it is.


+When releasepage() is being implemented, a special FS-Cache function exists to
+manage the heuristics of coping with vmscan trying to eject pages, which may
+conflict with the cache trying to write pages to the cache (which may itself
+need to allocate memory):
+
+	bool fscache_maybe_release_page(struct fscache_cookie *cookie,
+					struct page *page,
+					gfp_t gfp);
+
+This takes the netfs cookie, and the page and gfp arguments as supplied to
+releasepage(). It will return false if the page cannot be released yet for
+some reason and if it returns true, the page has been uncached and can now be
+released.
+
+To make a page available for release, this function may wait for an outstanding
+storage request to complete, or it may attempt to cancel the storage request -
+in which case the page will not be stored in the cache this time.
+
+
 ==========================
 INDEX AND DATA FILE UPDATE
 ==========================

fs/9p/cache.c

Lines changed: 1 addition & 13 deletions
@@ -343,18 +343,7 @@ int __v9fs_fscache_release_page(struct page *page, gfp_t gfp)

 	BUG_ON(!vcookie->fscache);

-	if (PageFsCache(page)) {
-		if (fscache_check_page_write(vcookie->fscache, page)) {
-			if (!(gfp & __GFP_WAIT))
-				return 0;
-			fscache_wait_on_page_write(vcookie->fscache, page);
-		}
-
-		fscache_uncache_page(vcookie->fscache, page);
-		ClearPageFsCache(page);
-	}
-
-	return 1;
+	return fscache_maybe_release_page(vcookie->fscache, page, gfp);
 }

 void __v9fs_fscache_invalidate_page(struct page *page)
@@ -368,7 +357,6 @@ void __v9fs_fscache_invalidate_page(struct page *page)
 		fscache_wait_on_page_write(vcookie->fscache, page);
 		BUG_ON(!PageLocked(page));
 		fscache_uncache_page(vcookie->fscache, page);
-		ClearPageFsCache(page);
 	}
 }

fs/afs/file.c

Lines changed: 3 additions & 12 deletions
@@ -315,7 +315,6 @@ static void afs_invalidatepage(struct page *page, unsigned long offset)
 			struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
 			fscache_wait_on_page_write(vnode->cache, page);
 			fscache_uncache_page(vnode->cache, page);
-			ClearPageFsCache(page);
 		}
 #endif

@@ -349,17 +348,9 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 	/* deny if page is being written to the cache and the caller hasn't
 	 * elected to wait */
 #ifdef CONFIG_AFS_FSCACHE
-	if (PageFsCache(page)) {
-		if (fscache_check_page_write(vnode->cache, page)) {
-			if (!(gfp_flags & __GFP_WAIT)) {
-				_leave(" = F [cache busy]");
-				return 0;
-			}
-			fscache_wait_on_page_write(vnode->cache, page);
-		}
-
-		fscache_uncache_page(vnode->cache, page);
-		ClearPageFsCache(page);
+	if (!fscache_maybe_release_page(vnode->cache, page, gfp_flags)) {
+		_leave(" = F [cache busy]");
+		return 0;
 	}
 #endif

fs/fscache/internal.h

Lines changed: 5 additions & 0 deletions
@@ -180,6 +180,11 @@ extern atomic_t fscache_n_store_pages;
 extern atomic_t fscache_n_store_radix_deletes;
 extern atomic_t fscache_n_store_pages_over_limit;

+extern atomic_t fscache_n_store_vmscan_not_storing;
+extern atomic_t fscache_n_store_vmscan_gone;
+extern atomic_t fscache_n_store_vmscan_busy;
+extern atomic_t fscache_n_store_vmscan_cancelled;
+
 extern atomic_t fscache_n_marks;
 extern atomic_t fscache_n_uncaches;

fs/fscache/page.c

Lines changed: 77 additions & 2 deletions
@@ -42,6 +42,75 @@ void __fscache_wait_on_page_write(struct fscache_cookie *cookie, struct page *pa
 }
 EXPORT_SYMBOL(__fscache_wait_on_page_write);

+/*
+ * decide whether a page can be released, possibly by cancelling a store to it
+ * - we're allowed to sleep if __GFP_WAIT is flagged
+ */
+bool __fscache_maybe_release_page(struct fscache_cookie *cookie,
+				  struct page *page,
+				  gfp_t gfp)
+{
+	struct page *xpage;
+	void *val;
+
+	_enter("%p,%p,%x", cookie, page, gfp);
+
+	rcu_read_lock();
+	val = radix_tree_lookup(&cookie->stores, page->index);
+	if (!val) {
+		rcu_read_unlock();
+		fscache_stat(&fscache_n_store_vmscan_not_storing);
+		__fscache_uncache_page(cookie, page);
+		return true;
+	}
+
+	/* see if the page is actually undergoing storage - if so we can't get
+	 * rid of it till the cache has finished with it */
+	if (radix_tree_tag_get(&cookie->stores, page->index,
+			       FSCACHE_COOKIE_STORING_TAG)) {
+		rcu_read_unlock();
+		goto page_busy;
+	}
+
+	/* the page is pending storage, so we attempt to cancel the store and
+	 * discard the store request so that the page can be reclaimed */
+	spin_lock(&cookie->stores_lock);
+	rcu_read_unlock();
+
+	if (radix_tree_tag_get(&cookie->stores, page->index,
+			       FSCACHE_COOKIE_STORING_TAG)) {
+		/* the page started to undergo storage whilst we were looking,
+		 * so now we can only wait or return */
+		spin_unlock(&cookie->stores_lock);
+		goto page_busy;
+	}
+
+	xpage = radix_tree_delete(&cookie->stores, page->index);
+	spin_unlock(&cookie->stores_lock);
+
+	if (xpage) {
+		fscache_stat(&fscache_n_store_vmscan_cancelled);
+		fscache_stat(&fscache_n_store_radix_deletes);
+		ASSERTCMP(xpage, ==, page);
+	} else {
+		fscache_stat(&fscache_n_store_vmscan_gone);
+	}
+
+	wake_up_bit(&cookie->flags, 0);
+	if (xpage)
+		page_cache_release(xpage);
+	__fscache_uncache_page(cookie, page);
+	return true;
+
+page_busy:
+	/* we might want to wait here, but that could deadlock the allocator as
+	 * the slow-work threads writing to the cache may all end up sleeping
+	 * on memory allocation */
+	fscache_stat(&fscache_n_store_vmscan_busy);
+	return false;
+}
+EXPORT_SYMBOL(__fscache_maybe_release_page);
+
 /*
  * note that a page has finished being written to the cache
  */
@@ -57,6 +126,8 @@ static void fscache_end_page_write(struct fscache_object *object,
 		/* delete the page from the tree if it is now no longer
 		 * pending */
 		spin_lock(&cookie->stores_lock);
+		radix_tree_tag_clear(&cookie->stores, page->index,
+				     FSCACHE_COOKIE_STORING_TAG);
 		if (!radix_tree_tag_get(&cookie->stores, page->index,
 					FSCACHE_COOKIE_PENDING_TAG)) {
 			fscache_stat(&fscache_n_store_radix_deletes);
@@ -640,8 +711,12 @@ static void fscache_write_op(struct fscache_operation *_op)
 		goto superseded;
 	}

-	radix_tree_tag_clear(&cookie->stores, page->index,
-			     FSCACHE_COOKIE_PENDING_TAG);
+	if (page) {
+		radix_tree_tag_set(&cookie->stores, page->index,
+				   FSCACHE_COOKIE_STORING_TAG);
+		radix_tree_tag_clear(&cookie->stores, page->index,
+				     FSCACHE_COOKIE_PENDING_TAG);
+	}

 	spin_unlock(&cookie->stores_lock);
 	spin_unlock(&object->lock);
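
The decision path in __fscache_maybe_release_page() can be followed more easily in a small user-space model. The sketch below is not kernel code: the enum, the struct and model_maybe_release() are hypothetical names invented for illustration, and the two "looks" stand in for the tag check made under rcu_read_lock() and the re-check made after taking stores_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum store_state {
	STORE_NONE,	/* no store request in the tree for this page */
	STORE_PENDING,	/* request queued but not yet being written */
	STORE_STORING,	/* the cache is writing the page right now */
};

/* Mirrors the four counters on the "VmScan" stats line. */
struct vmscan_stats {
	unsigned int nos, gon, bsy, can;
};

/*
 * first_look:  state observed on the initial lockless lookup
 * locked_look: state observed again once the stores lock is held
 * Returns true if the page may be released, false if it must stay.
 */
bool model_maybe_release(enum store_state first_look,
			 enum store_state locked_look,
			 struct vmscan_stats *st)
{
	assert(st);

	if (first_look == STORE_NONE) {
		st->nos++;		/* nothing was waiting to be stored */
		return true;
	}
	if (first_look == STORE_STORING || locked_look == STORE_STORING) {
		st->bsy++;		/* write in flight: refuse, never sleep */
		return false;
	}
	if (locked_look == STORE_PENDING)
		st->can++;		/* request deleted, store cancelled */
	else
		st->gon++;		/* request vanished before we got the lock */
	return true;
}
```

Calling model_maybe_release(STORE_PENDING, STORE_STORING, &st) returns false and bumps only bsy, which corresponds to the "started to undergo storage whilst we were looking" branch in the hunk above.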

fs/fscache/stats.c

Lines changed: 11 additions & 0 deletions
@@ -63,6 +63,11 @@ atomic_t fscache_n_store_pages;
 atomic_t fscache_n_store_radix_deletes;
 atomic_t fscache_n_store_pages_over_limit;

+atomic_t fscache_n_store_vmscan_not_storing;
+atomic_t fscache_n_store_vmscan_gone;
+atomic_t fscache_n_store_vmscan_busy;
+atomic_t fscache_n_store_vmscan_cancelled;
+
 atomic_t fscache_n_marks;
 atomic_t fscache_n_uncaches;

@@ -211,6 +216,12 @@ static int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_store_radix_deletes),
 		   atomic_read(&fscache_n_store_pages_over_limit));

+	seq_printf(m, "VmScan : nos=%u gon=%u bsy=%u can=%u\n",
+		   atomic_read(&fscache_n_store_vmscan_not_storing),
+		   atomic_read(&fscache_n_store_vmscan_gone),
+		   atomic_read(&fscache_n_store_vmscan_busy),
+		   atomic_read(&fscache_n_store_vmscan_cancelled));
+
 	seq_printf(m, "Ops : pend=%u run=%u enq=%u can=%u rej=%u\n",
 		   atomic_read(&fscache_n_op_pend),
 		   atomic_read(&fscache_n_op_run),
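
For anyone wanting to watch these counters from user space, the following is a minimal sketch of parsing one line read from /proc/fs/fscache/stats. It assumes only the seq_printf() format string shown in the hunk above; struct vmscan_counters and parse_vmscan_line() are hypothetical names, not part of any kernel or libc interface.

```c
#include <stdio.h>

/* Hypothetical holder for the four "VmScan" counters. */
struct vmscan_counters {
	unsigned int nos;	/* release reqs with no pending store */
	unsigned int gon;	/* store gone by the time the lock was taken */
	unsigned int bsy;	/* release reqs refused, store in progress */
	unsigned int can;	/* stores cancelled by a release req */
};

/*
 * Parse a single stats line. Returns 0 and fills *c if the line is a
 * VmScan line with all four fields, or -1 for any other line.
 */
int parse_vmscan_line(const char *line, struct vmscan_counters *c)
{
	/* Whitespace in a scanf format matches any run of whitespace. */
	int n = sscanf(line, "VmScan : nos=%u gon=%u bsy=%u can=%u",
		       &c->nos, &c->gon, &c->bsy, &c->can);

	return n == 4 ? 0 : -1;
}
```

A caller would typically read the file line by line with fgets() and hand each line to parse_vmscan_line() until it returns 0. A steadily rising bsy count relative to can would suggest that reclaim is frequently hitting pages whose writes are already in flight.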

fs/nfs/fscache.c

Lines changed: 3 additions & 7 deletions
@@ -359,17 +359,13 @@ int nfs_fscache_release_page(struct page *page, gfp_t gfp)

 	BUG_ON(!cookie);

-	if (fscache_check_page_write(cookie, page)) {
-		if (!(gfp & __GFP_WAIT))
-			return 0;
-		fscache_wait_on_page_write(cookie, page);
-	}
-
 	if (PageFsCache(page)) {
 		dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
 			 cookie, page, nfsi);

-		fscache_uncache_page(cookie, page);
+		if (!fscache_maybe_release_page(cookie, page, gfp))
+			return 0;
+
 		nfs_add_fscache_stats(page->mapping->host,
 				      NFSIOS_FSCACHE_PAGES_UNCACHED, 1);
 	}

include/linux/fscache-cache.h

Lines changed: 1 addition & 0 deletions
@@ -317,6 +317,7 @@ struct fscache_cookie {
 	void				*netfs_data;	/* back pointer to netfs */
 	struct radix_tree_root		stores;		/* pages to be stored on this cookie */
 #define FSCACHE_COOKIE_PENDING_TAG	0	/* pages tag: pending write to cache */
+#define FSCACHE_COOKIE_STORING_TAG	1	/* pages tag: writing to cache */

 	unsigned long			flags;
 #define FSCACHE_COOKIE_LOOKING_UP	0	/* T if non-index cookie being looked up still */
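
The new STORING tag splits a store request's life into "queued" and "being written", and only the first of those may be cancelled. The sketch below models that lifecycle with a plain bitmask instead of a radix tree. Every name in it is hypothetical, and model_queue_store() is an assumption about how a request enters the tree, since that code is not part of this diff.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two radix-tree tags; not kernel code. */
#define MODEL_TAG_PENDING (1u << 0)	/* queued for writing to the cache */
#define MODEL_TAG_STORING (1u << 1)	/* the cache is writing it right now */

struct model_store {
	bool in_tree;		/* is there an entry for this page index? */
	unsigned int tags;
};

/* Assumed behaviour of queuing a store: add the entry, mark it pending. */
void model_queue_store(struct model_store *s)
{
	s->in_tree = true;
	s->tags |= MODEL_TAG_PENDING;
}

/* Follows the fscache_write_op() hunk: set STORING, then clear PENDING. */
void model_begin_write(struct model_store *s)
{
	s->tags |= MODEL_TAG_STORING;
	s->tags &= ~MODEL_TAG_PENDING;
}

/*
 * Follows the fscache_end_page_write() hunk: clear STORING, then drop the
 * entry unless the page was queued again while it was being written.
 */
void model_end_write(struct model_store *s)
{
	s->tags &= ~MODEL_TAG_STORING;
	if (!(s->tags & MODEL_TAG_PENDING)) {
		s->in_tree = false;
		s->tags = 0;
	}
}

/* A release request may cancel only a store that is not in flight. */
bool model_can_cancel(const struct model_store *s)
{
	return s->in_tree && !(s->tags & MODEL_TAG_STORING);
}
```

Keeping the two tags separate is what lets a page that is re-queued during a write stay in the tree afterwards, while still telling the release path that the in-flight copy must not be pulled out from under the cache.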

include/linux/fscache.h

Lines changed: 27 additions & 0 deletions
@@ -202,6 +202,8 @@ extern int __fscache_write_page(struct fscache_cookie *, struct page *, gfp_t);
 extern void __fscache_uncache_page(struct fscache_cookie *, struct page *);
 extern bool __fscache_check_page_write(struct fscache_cookie *, struct page *);
 extern void __fscache_wait_on_page_write(struct fscache_cookie *, struct page *);
+extern bool __fscache_maybe_release_page(struct fscache_cookie *, struct page *,
+					 gfp_t);

 /**
  * fscache_register_netfs - Register a filesystem as desiring caching services
@@ -615,4 +617,29 @@ void fscache_wait_on_page_write(struct fscache_cookie *cookie,
 		__fscache_wait_on_page_write(cookie, page);
 }

+/**
+ * fscache_maybe_release_page - Consider releasing a page, cancelling a store
+ * @cookie: The cookie representing the cache object
+ * @page: The netfs page that is being cached.
+ * @gfp: The gfp flags passed to releasepage()
+ *
+ * Consider releasing a page for the vmscan algorithm, on behalf of the netfs's
+ * releasepage() call. A storage request on the page may be cancelled if it is
+ * not currently being processed.
+ *
+ * The function returns true if the page no longer has a storage request on it,
+ * and false if a storage request is left in place. If true is returned, the
+ * page will have been passed to fscache_uncache_page(). If false is returned
+ * the page cannot be freed yet.
+ */
+static inline
+bool fscache_maybe_release_page(struct fscache_cookie *cookie,
+				struct page *page,
+				gfp_t gfp)
+{
+	if (fscache_cookie_valid(cookie) && PageFsCache(page))
+		return __fscache_maybe_release_page(cookie, page, gfp);
+	return false;
+}
+
 #endif /* _LINUX_FSCACHE_H */
