| Author |
Message |
Frans Pop External

Since: May 04, 2006 Posts: 460
|
Posted: Sun Oct 18, 2009 11:10 pm Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn Archived from groups: linux>kernel |
|
|
On Monday 19 October 2009, Pekka Enberg wrote:
> On Wednesday 14 October 2009, Frans Pop wrote:
> > On Thursday 15 October 2009, Mel Gorman wrote:
> > > Outside the range of commits suspected of causing problems was the
> > > following. It's extremely low probability
> > >
> > > Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write
> > > confusion This patch alters the call to congestion_wait() in the
> > > page allocator. Frankly, I don't get the change but it might be worth
> > > checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31 makes
> > > any difference
> >
> > This is the real culprit. Mel: thanks very much for looking beyond the
> > area I identified. Your overview of mm changes was exactly what I
> > needed and really helped a lot during my later tests.
> >
> > This commit definitely causes most of the problems; confirmed by
> > reverting it on top of 2.6.31 (also requires reverting 373c0a7e, which
> > is a later build fix).
>
> Mel/Jens, any ideas why commit 8aa7e84 makes us run out of high order
> pages? Should we be using BLK_RW_SYNC in mm/page_alloc.c instead of
> BLK_RW_ASYNC?
I'm starting to think that this commit may not be directly related to high
order allocation failures. The fact that I'm seeing SKB allocation
failures earlier because of this commit could be just a side effect.
It could be that instead the main impact of this commit is on encrypted
file system and/or encrypted swap (kcryptd).
Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
only reading from NFS that's unlikely).
Reason for thinking this is that reverting it makes no difference for Karol
[1]. It will be interesting to see if it does make a difference for Sven
Geggus [2].
/me wonders if we'll ever get to the bottom of this...
[1] http://lkml.org/lkml/2009/10/18/138
[2] http://lkml.org/lkml/2009/10/17/113
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/ |
Jens Axboe External

Since: Oct 04, 2006 Posts: 673
|
Posted: Mon Oct 19, 2009 12:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn Archived from groups: per prev. post |
|
|
On Mon, Oct 19 2009, Pekka Enberg wrote:
> (Adding Jens to CC.)
>
> On Wednesday 14 October 2009, Frans Pop wrote:
> > > > There still has not been a mm-change identified that makes
> > > > fragmentation significantly worse.
>
> On Mon, 2009-10-19 at 01:33 +0200, Frans Pop wrote:
> > > My bisection shows a very clear point, even if not an individual commit,
> > > in the 'akpm' merge where SKB errors suddenly become *much* more
> > > frequent and easy to trigger.
> > > I'm sorry to say this, but the fact that nothing has been identified yet
> > > is IMO the result of a lack of effort, not because there is no such
> > > change.
> >
> > I was wrong. It turns out that I was creating the variations in the test
> > results around the akpm merge myself by tiny changes in the way I ran the
> > tests. It took another round of about 30 compilations and tests purely in
> > this range to show that, but those same tests also made me aware of other
> > patterns I should look at.
> >
> > Until a few days ago I was concentrating on "do I see SKB allocation errors
> > or not". Since then I've also been looking more consciously at when they
> > happen, at disk access patterns and at desktop freeze patterns.
> >
> > I think I did mention before that this whole issue is rather subtle :-/
> > So, my apologies for fingering the wrong area for so long, but it looked
> > solid given the info available at the time.
> >
> > On Thursday 15 October 2009, Mel Gorman wrote:
> > > Outside the range of commits suspected of causing problems was the
> > > following. It's extremely low probability
> > >
> > > Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion
> > > This patch alters the call to congestion_wait() in the page
> > > allocator. Frankly, I don't get the change but it might be worth
> > > checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31
> > > makes any difference
> >
> > This is the real culprit. Mel: thanks very much for looking beyond the
> > area I identified. Your overview of mm changes was exactly what I needed
> > and really helped a lot during my later tests.
> >
> > This commit definitely causes most of the problems; confirmed by reverting
> > it on top of 2.6.31 (also requires reverting 373c0a7e, which is a later
> > build fix).
>
> Mel/Jens, any ideas why commit 8aa7e84 makes us run out of high order
> pages? Should we be using BLK_RW_SYNC in mm/page_alloc.c instead of
> BLK_RW_ASYNC?
No, I think that is definitely broken since the page freeing should be
using async writes. If the commit in question is making the difference
and the below does indeed fix it, I think that's primarily due to timing
issues and the general brokenness of the congestion bits. With the below
change, you are essentially guaranteed to be waiting 20ms every time and
it's quite likely that that is enough to change the picture.
So I'd look elsewhere for the real problem; it's not likely to be caused
by the sync vs async bits themselves.
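For reference, the 20ms figure is consistent with a timeout of HZ/50 jiffies. That the page allocator passes HZ/50 to congestion_wait() is an assumption recalled from the kernel source of that era, not something stated in this thread. A minimal sketch of the arithmetic, using integer division as the C expression would:

```python
# Convert a timeout of HZ/50 jiffies to milliseconds for common HZ values.
# Assumption: the allocator's congestion_wait() call uses HZ/50 as its
# timeout; the thread itself only mentions the resulting 20 ms.

def timeout_ms(hz: int, divisor: int = 50) -> float:
    jiffies = hz // divisor          # C-style integer division, as in HZ/50
    return jiffies * 1000.0 / hz     # one jiffy lasts 1000/HZ milliseconds

if __name__ == "__main__":
    for hz in (100, 250, 1000):
        print(f"HZ={hz}: {hz // 50} jiffies = {timeout_ms(hz):.1f} ms")
```

With these HZ values the division is exact, so the timeout comes out the same regardless of the configured tick rate.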
--
Jens Axboe
Tobi Oetiker External

Since: Oct 19, 2009 Posts: 1
|
Posted: Mon Oct 19, 2009 6:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) Archived from groups: per prev. post |
|
|
Today Frans Pop wrote:
>
> I'm starting to think that this commit may not be directly related to high
> order allocation failures. The fact that I'm seeing SKB allocation
> failures earlier because of this commit could be just a side effect.
> It could be that instead the main impact of this commit is on encrypted
> file system and/or encrypted swap (kcryptd).
>
> Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> only reading from NFS that's unlikely).
I have updated a fileserver to 2.6.31 today and I see page
allocation failures from several parts of the system ... mostly nfs though ... (it is an NFS server).
So I guess the problem must be quite generic:
Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
Oct 19 08:59:16 johan kernel: [30120.685647] __ratelimit: 13 callbacks suppressed [kern.warning]
Oct 19 08:59:16 johan kernel: [30120.685654] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 08:59:16 johan kernel: [30120.685660] Pid: 6071, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 08:59:16 johan kernel: [30120.685663] Call Trace: [kern.warning]
Oct 19 08:59:16 johan kernel: [30120.685666] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
Oct 19 09:36:31 johan kernel: [32355.708345] __ratelimit: 16 callbacks suppressed [kern.warning]
Oct 19 09:36:31 johan kernel: [32355.708352] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 09:36:31 johan kernel: [32355.708358] Pid: 6087, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 09:36:31 johan kernel: [32355.708361] Call Trace: [kern.warning]
Oct 19 09:36:31 johan kernel: [32355.708364] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
Oct 19 10:52:01 johan kernel: [36885.358312] __ratelimit: 31 callbacks suppressed [kern.warning]
Oct 19 10:52:01 johan kernel: [36885.358319] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 10:52:01 johan kernel: [36885.358325] Pid: 6057, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 10:52:01 johan kernel: [36885.358327] Call Trace: [kern.warning]
Oct 19 10:52:01 johan kernel: [36885.358331] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
Oct 19 11:12:01 johan kernel: [38085.163831] events/3: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 11:12:01 johan kernel: [38085.163840] Pid: 18, comm: events/3 Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 11:12:01 johan kernel: [38085.163843] Call Trace: [kern.warning]
Oct 19 11:12:01 johan kernel: [38085.163846] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
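To put the "order:5, mode:0x4020" in these messages into perspective, here is a rough sketch that decodes both fields. It assumes 4 KiB pages and the GFP bit values as I recall them from include/linux/gfp.h around 2.6.31; both are assumptions, and the bit values differ in other kernel versions. The helper name decode_failure is made up for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages (typical for x86-64)

# Assumption: GFP bit values as in include/linux/gfp.h around 2.6.31.
# Only a few bits are listed; anything else is reported as raw hex.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x4000: "__GFP_COMP",
}

FAILURE_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

def decode_failure(line):
    """Return task name, order, size in bytes and flag names, or None."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    order = int(m.group("order"))
    mode = int(m.group("mode"), 16)
    flags = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    unknown = mode & ~sum(GFP_BITS)   # bits not covered by the table above
    if unknown:
        flags.append(hex(unknown))
    return {
        "comm": m.group("comm"),
        "order": order,
        "bytes": PAGE_SIZE << order,  # 2**order contiguous pages
        "flags": flags,
    }

if __name__ == "__main__":
    line = ("Oct 19 07:10:02 johan kernel: [23565.684110] swapper: "
            "page allocation failure. order:5, mode:0x4020 [kern.warning]")
    print(decode_failure(line))
```

Under those assumptions an order-5 request is 32 contiguous pages (128 KiB), and 0x4020 has __GFP_HIGH and __GFP_COMP set but not __GFP_WAIT, i.e. an allocation that cannot sleep.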
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi.DeleteThis@oetiker.ch ++41 62 775 9902 / sb: -9900
Pekka Enberg External

Since: Jun 15, 2006 Posts: 70
|
Posted: Mon Oct 19, 2009 7:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) Archived from groups: per prev. post |
|
|
On Mon, 2009-10-19 at 11:49 +0200, Tobi Oetiker wrote:
> Today Frans Pop wrote:
>
> >
> > I'm starting to think that this commit may not be directly related to high
> > order allocation failures. The fact that I'm seeing SKB allocation
> > failures earlier because of this commit could be just a side effect.
> > It could be that instead the main impact of this commit is on encrypted
> > file system and/or encrypted swap (kcryptd).
> >
> > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > only reading from NFS that's unlikely).
>
> I have updated a fileserver to 2.6.31 today and I see page
> allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> So I guess the problem must be quite generic:
Yup, it almost certainly is. Does this patch help?
http://lkml.org/lkml/2009/10/16/89
Frans, did you ever get around to retesting with just the above patch
applied?
Pekka
Mel Gorman External

Since: May 19, 2006 Posts: 253
|
Posted: Mon Oct 19, 2009 10:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) Archived from groups: per prev. post |
|
|
On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote:
> Today Frans Pop wrote:
>
> >
> > I'm starting to think that this commit may not be directly related to high
> > order allocation failures. The fact that I'm seeing SKB allocation
> > failures earlier because of this commit could be just a side effect.
> > It could be that instead the main impact of this commit is on encrypted
> > file system and/or encrypted swap (kcryptd).
> >
> > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > only reading from NFS that's unlikely).
>
> I have updated a fileserver to 2.6.31 today and I see page
> allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> So I guess the problem must be quite generic:
>
>
> Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>
What's the rest of the stack trace? I'm wondering where a large number
of order-5 GFP_ATOMIC allocations are coming from. It seems different to
the e100 problem where there is one GFP_ATOMIC allocation while the
firmware is being loaded.
Thanks
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
Tobias Oetiker External

Since: Sep 15, 2009 Posts: 12
|
Posted: Mon Oct 19, 2009 10:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) Archived from groups: per prev. post |
|
|
Hi Mel,
Today Mel Gorman wrote:
> On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote:
> > Today Frans Pop wrote:
> >
> > >
> > > I'm starting to think that this commit may not be directly related to high
> > > order allocation failures. The fact that I'm seeing SKB allocation
> > > failures earlier because of this commit could be just a side effect.
> > > It could be that instead the main impact of this commit is on encrypted
> > > file system and/or encrypted swap (kcryptd).
> > >
> > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > > only reading from NFS that's unlikely).
> >
> > I have updated a fileserver to 2.6.31 today and I see page
> > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > So I guess the problem must be quite generic:
> >
> >
> > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> >
>
> What's the rest of the stack trace? I'm wondering where a large number
> of order-5 GFP_ATOMIC allocations are coming from. It seems different to
> the e100 problem where there is one GFP_ATOMIC allocation while the
> firmware is being loaded.
Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684157] [<ffffffff810da7e5>] __alloc_pages_nodemask+0x135/0x140 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684164] [<ffffffff815065b4>] ? _spin_unlock_bh+0x14/0x20 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684170] [<ffffffff8110b368>] kmalloc_large_node+0x68/0xc0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684175] [<ffffffff8110f15a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684181] [<ffffffff8140ffd2>] ? skb_copy+0x32/0xa0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684227] [<ffffffff81416a6d>] dev_queue_xmit_nit+0x10d/0x170 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684231] [<ffffffff81416f79>] dev_hard_start_xmit+0x189/0x1c0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684236] [<ffffffff8142f071>] __qdisc_run+0x1a1/0x230 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684240] [<ffffffff81418a88>] dev_queue_xmit+0x238/0x310 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684246] [<ffffffff8144864b>] ip_finish_output+0x11b/0x2f0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684250] [<ffffffff814488a9>] ip_output+0x89/0xd0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684254] [<ffffffff814478c0>] ip_local_out+0x20/0x30 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684258] [<ffffffff814481ab>] ip_queue_xmit+0x22b/0x3f0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684264] [<ffffffff8145d5e5>] tcp_transmit_skb+0x345/0x4e0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684269] [<ffffffff8145eaf6>] tcp_write_xmit+0xb6/0x2e0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684273] [<ffffffff8145ed8b>] __tcp_push_pending_frames+0x2b/0xa0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684277] [<ffffffff8145b249>] tcp_rcv_established+0x459/0x6d0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684282] [<ffffffff814630bd>] tcp_v4_do_rcv+0x12d/0x140 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684285] [<ffffffff8146365e>] tcp_v4_rcv+0x58e/0x7c0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684289] [<ffffffff8144276d>] ip_local_deliver_finish+0x11d/0x2b0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684293] [<ffffffff8144293b>] ip_local_deliver+0x3b/0x90 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684297] [<ffffffff81442ad6>] ip_rcv_finish+0x146/0x420 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684301] [<ffffffff8144304b>] ip_rcv+0x29b/0x370 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684304] [<ffffffff81418f9a>] netif_receive_skb+0x38a/0x4d0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684308] [<ffffffff81419268>] napi_skb_finish+0x48/0x60 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684311] [<ffffffff81419724>] napi_gro_receive+0x34/0x40 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684330] [<ffffffffa006b623>] tg3_rx+0x373/0x4b0 [tg3] [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684339] [<ffffffffa006cbf0>] tg3_poll_work+0x70/0xf0 [tg3] [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684347] [<ffffffffa006ccae>] tg3_poll+0x3e/0xe0 [tg3] [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684350] [<ffffffff814198d2>] net_rx_action+0x102/0x210 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684357] [<ffffffff81061d24>] __do_softirq+0xc4/0x1f0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684362] [<ffffffff8101314c>] call_softirq+0x1c/0x30 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684365] [<ffffffff81014945>] do_softirq+0x55/0x90 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684369] [<ffffffff8106116b>] irq_exit+0x7b/0x90 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684372] [<ffffffff81013e93>] do_IRQ+0x73/0xe0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684378] [<ffffffff81012993>] ret_from_intr+0x0/0x11 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684381] <EOI> [<ffffffff810318b6>] ? native_safe_halt+0x6/0x10 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684391] [<ffffffff81019cd8>] ? default_idle+0x48/0xe0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684396] [<ffffffff8150929d>] ? __atomic_notifier_call_chain+0xd/0x10 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684400] [<ffffffff815092b1>] ? atomic_notifier_call_chain+0x11/0x20 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684404] [<ffffffff810107c8>] ? cpu_idle+0x98/0xe0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684410] [<ffffffff81500d95>] ? start_secondary+0x95/0xc0 [kern.warning]
If you need more, I can send you a whole bunch of them ...
cheers
tobi
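A trace this long is easier to read once the frames that belong to loadable modules are picked out. The sketch below does that for syslog lines in the format shown above; the regular expression and the helper names (parse_frames, module_frames) are my own and only illustrative. Frames prefixed with "?" are ones the unwinder was unsure about, so they are skipped.

```python
import re

# Matches frames such as "[<ffffffffa006b623>] tg3_rx+0x373/0x4b0 [tg3]".
# An optional "? " marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_frames(lines):
    """Return (symbol, module or None, reliable) for every frame found."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            reliable = m.group("unsure") is None
            frames.append((m.group("sym"), m.group("module"), reliable))
    return frames

def module_frames(lines):
    """Keep only reliable frames that live in a loadable module."""
    return [(sym, mod) for sym, mod, ok in parse_frames(lines) if mod and ok]

if __name__ == "__main__":
    sample = [
        "[23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning]",
        "[23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]",
        "[23565.684227] [<ffffffff81416a6d>] dev_queue_xmit_nit+0x10d/0x170 [kern.warning]",
        "[23565.684330] [<ffffffffa006b623>] tg3_rx+0x373/0x4b0 [tg3] [kern.warning]",
    ]
    for sym, mod in module_frames(sample):
        print(f"{mod}: {sym}")
```

Since the innermost call is listed first, the frame directly below skb_copy() is its caller; in the trace above that is vboxNetFltLinuxPacketHandler from the vboxnetflt module.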
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi RemoveThis @oetiker.ch ++41 62 775 9902 / sb: -9900
Mel Gorman External

Since: May 19, 2006 Posts: 253
|
Posted: Mon Oct 19, 2009 11:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn Archived from groups: per prev. post |
|
|
On Mon, Oct 19, 2009 at 01:33:29AM +0200, Frans Pop wrote:
> Another long mail, sorry.
>
> On Wednesday 14 October 2009, Frans Pop wrote:
> > > There still has not been a mm-change identified that makes
> > > fragmentation significantly worse.
> >
> > My bisection shows a very clear point, even if not an individual commit,
> > in the 'akpm' merge where SKB errors suddenly become *much* more
> > frequent and easy to trigger.
> > I'm sorry to say this, but the fact that nothing has been identified yet
> > is IMO the result of a lack of effort, not because there is no such
> > change.
>
> I was wrong. It turns out that I was creating the variations in the test
> results around the akpm merge myself by tiny changes in the way I ran the
> tests. It took another round of about 30 compilations and tests purely in
> this range to show that, but those same tests also made me aware of other
> patterns I should look at.
>
Once again, thanks for persisting with this for so long. That many tests
and searching is a miserable undertaking.
> Until a few days ago I was concentrating on "do I see SKB allocation errors
> or not". Since then I've also been looking more consciously at when they
> happen, at disk access patterns and at desktop freeze patterns.
>
> I think I did mention before that this whole issue is rather subtle :-/
Indeed
> So, my apologies for fingering the wrong area for so long, but it looked
> solid given the info available at the time.
>
> On Thursday 15 October 2009, Mel Gorman wrote:
> > Outside the range of commits suspected of causing problems was the
> > following. It's extremely low probability
> >
> > Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion
> > This patch alters the call to congestion_wait() in the page
> > allocator. Frankly, I don't get the change but it might be worth
> > checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31
> > makes any difference
>
> This is the real culprit. Mel: thanks very much for looking beyond the
> area I identified. Your overview of mm changes was exactly what I needed
> and really helped a lot during my later tests.
>
I'm surprised this made such a big difference, which is why I described
it as "extremely low probability". It implies that the real problem isn't
fragmentation per se but the timing of when pages get consumed.
Maybe what has really changed is how long direct reclaimers wait before trying
to allocate again. After the commit, if direct reclaimers are waiting longer
between direct reclaim attempts, it might mean that the GFP_KERNEL reclaimers
of high-order pages are doing less work than before and hurting parallel
GFP_ATOMIC users. Jens, does this sound plausible?
> This commit definitely causes most of the problems; confirmed by reverting
> it on top of 2.6.31 (also requires reverting 373c0a7e, which is a later
> build fix).
>
> The rest of this mail gives details on my tests and how I reached the above
> conclusion.
>
> TEST BASELINE (2.6.30)
> ======================
> I mentioned in an earlier mail that I run three instances of gitk for my
> tests. Loading gitk seems to consist of 3 phases:
> 1) general initial scan of the repository (branches?)
> 2) reading commits: commit counter increases
> 3) reading references (including bisection good/bad points) and
> uncommitted changes
>
> Below times and comments per stage when the test is run with 2.6.30. As my
> test starts after a clean boot, buffers are mostly empty.
>
> 1st instance: 'gitk v2.6.29..master' (preparation)
> 1) ~20 seconds; user interface is mostly blank
> 2) ~5 seconds to read 35.000 commits; user interface is updated and counter
> increases steadily as they are read
> 3) ~10 seconds; "branch"/"follows"/"precedes" info and tags are filled
> in; fairly heavy disk activity
>
> 2nd instance: 'gitk master' (preparation)
> 1) 0 seconds (because data is already buffered)
> 2) ~25 seconds to read 167500 commits; counter increases steadily
> 3) 1-2 seconds (because data is already buffered)
>
> 3rd instance: 'gitk master' (the actual test)
> 1) 0 seconds because data is already buffered
> 2) ~55 seconds due to swapping overhead; minor music skip around commit
> 110.000; counter slower after 90.000, some short halts, but generally
> increases steadily; moderate disk activity
> 3) ~55-60 seconds; because buffers have been emptied data must be read
> again, with swapping; very heavy disk activity; fairly long music
> skip (15-20 seconds), but no SKB allocation errors
>
> So, the loading of the 3rd instance takes 1.5 minutes longer than the
> second because of the swapping. And phase 3) is most affected by it.
>
> AFTER WIRELESS CHANGE
> =====================
> After commit 4752c93c30 ("iwlcore: Allow skb allocation from tasklet") I
> start getting the SKB errors. They can be triggered reliably if the whole
> test is repeated 1 or 2 times, but generally not the first time the test
> is run.
It's up to the wireless driver maintainer what to do here, but it seems
like that patch needs to be reverted and thought about some more before
trying again.
>
> Or so I thought for a long time.
> It turns out that I will get SKB errors during the first run if I'm
> "sloppy" in the test execution. For example if I wait too long before
> switching from the last gitk instance to konsole where I have
> a 'tail -f /var/log/kern.log' running.
So the timing of when the high-order atomic allocations start kicking
in is critical.
> Another factor is the state of the repository: do I have master checked
> out, or an older branch, or am I in the middle of a bisection. This
> influences how data is read from the disk and thus the test results.
> A last factor may be the size of the kernel I'm using: my test/bisect
> kernel is significantly smaller than my regular kernel.
>
> If the test is run completely cleanly, I will not get SKB errors during the
> first run. Also, this change does not affect the timings of the test at
> all: the total load time of the 3rd instance is still ~1:55 and music
> skips happen in roughly the same places. The pattern of disk activity also
> remains unchanged.
>
> If I do *not* run the test cleanly, any SKB errors during the first test
> run will always be during phase 3), never during phase 2). This is what I
> saw during tests in the 'akpm' range, and explains the inconsistent
> results there.
>
> After discovering this I've made a copy of the git repo so that I always
> test using the exact same state and tightened my test procedure.
>
> AFTER congestion_wait CHANGE
> ============================
> If I test commit 9f2d8be, which is just before the congestion_wait()
> change, I still get the same pattern as described above. But when I test
> with 8aa7e84 ("Fix congestion_wait() sync/async vs read/write confusion"),
> things change dramatically when the 3rd gitk instance is started.
>
So, assuming this is a timing problem, this commit affects the timing of
when pages are consumed by processes doing direct reclaim.
> During the 2nd phase I see the first SKB allocation errors with a music
> skip between reading commits 95.000 and 110.000.
> About commit 115.000 there is a very long pause during which the counter
> does not increase, music stops and the desktop freezes completely. The
> first 30 seconds of that freeze there is only very low disk activity (which
> seems strange);
I'm just going to have to depend on Jens here. Jens, the congestion_wait() is
on BLK_RW_ASYNC after the commit. Reclaim usually writes pages asynchronously
but lumpy reclaim actually waits for pages to be written out synchronously, so
it's not always async.
Either way, reclaim is usually worried about writing pages but it would appear
after this change that a lot of read activity can also stall a process in
direct reclaim. What might be happening in Frans's particular case is that the
tasklet that allocates high-order pages for the RX buffers is getting stalled
by congestion caused by other processes doing reads from the filesystem.
While it makes sense from a congestion point of view to halt the IO, the
reclaim operations from direct reclaimers are getting delayed for long enough
to cause problems for GFP_ATOMIC.
Does this sound plausible to you? If so, what's the best way of
addressing this? Changing congestion_wait back to WRITE (assuming that
works for Frans)? Changing it to SYNC (again, assuming it actually
works) or a revert?
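One way to see why the choice of constant matters is to look at which congestion wait queue each value selects. The sketch below is user-space C, not kernel code. It assumes that congestion_wait() uses its first argument directly as an index into the two-entry congestion_wqh[] array, takes READ = 0 and WRITE = 1 as the long-standing kernel definitions, and takes BLK_RW_ASYNC = 0 and BLK_RW_SYNC = 1 from the enum added by commit 8aa7e84. The SKETCH_ names, slot_for() and old_meaning_of() are made up for illustration.

```c
#include <stddef.h>

/* Assumed values of the kernel's READ/WRITE constants. */
enum { SKETCH_READ = 0, SKETCH_WRITE = 1 };

/* Values from the enum introduced by commit 8aa7e84. */
enum { SKETCH_BLK_RW_ASYNC = 0, SKETCH_BLK_RW_SYNC = 1 };

/* What each of the two wait queue slots meant before the commit. */
static const char *const slot_meaning_before[2] = { "READ", "WRITE" };

/* Assumption: the first argument is used unchanged as the array index. */
size_t slot_for(int arg)
{
    return (size_t)arg;
}

/* Name of the pre-commit queue that a given argument now selects. */
const char *old_meaning_of(int arg)
{
    return slot_meaning_before[slot_for(arg)];
}
```

Under those assumptions, congestion_wait(WRITE, ...) selected slot 1 while congestion_wait(BLK_RW_ASYNC, ...) selects slot 0, the slot that read congestion used to signal. That would be consistent with heavy read activity being able to stall a direct reclaimer after the change.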
> the next 25 seconds there suddenly is very high disk
> activity during which things gradually unfreeze and more SKB errors are
> displayed. After that the commit counter runs up fairly steadily again.
>
> Phase 2) ends at ~1:45. Phase 3) (with more SKB errors) ends at ~2:05.
>
> So this change almost doubles the time needed for phase 2) and causes SKB
> allocation errors to occur during that phase. Also, before this commit the
> desktop freezes are much shorter and less severe. With this change the
> desktop is completely unusable for almost a minute during phase 2), with
> even the mouse pointer frozen solid.
> Note that phase 3) becomes shorter, but that the total time needed to load
> the 3rd instance increases by about 10-15 seconds.
>
> Note: -rc2 and -rc3 had broken NFS, so I had to cherry-pick 3 NFS commits
> from -rc4 on top of the commits I wanted to test.
>
> WITH congestion_wait CHANGE REVERTED
> ====================================
> I've done quite a few tests of 2.6.31 with 373c0a7e and 8aa7e847 reverted
> to confirm that's really the culprit. I've done this for .31-rc3, .31-rc4,
> .31-rc5, .31 and .31.1.
>
> In all cases the huge freeze in phase 2) is gone and the general behavior
> and timings are again as it was after the wireless change. During most
> tests I did not get any SKB allocation errors during phase 2) or phase 3).
>
> However with .31-rc5, .31 and .31.1 I have had some tests where I would see
> a few SKB allocation errors during phase 3) (which is somewhat likely),
> but also during phase 2). At this point I'm unsure whether this is just
> noise, or maybe a minor influence from some change merged after .31-rc4.
> Looking through the commits there are several mm/page allocation changes.
>
It could still be kswapd not being woken up often enough after direct
reclaimers. I took a look through the commits but none of the mm or
allocator changes struck me as likely candidates for making
fragmentation worse or altering the timing.
> For now I suggest ignoring this though as the impact (if any) is very minor
> and it is not reproducible reliably enough.
>
> Next I'll retest Mel's patches and also test Reinette's patches.
>
Of the two patches, only the kswapd one should have any significance. As
David pointed out, the second patch is essentially a no-op as it should
not have been possible to enter direct reclaim with ALLOC_NO_WATERMARKS
set.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/ |
 |
Mel Gorman External

Since: May 19, 2006 Posts: 253
|
Posted: Mon Oct 19, 2009 11:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) |
|
|
On Mon, Oct 19, 2009 at 04:01:45PM +0200, Karol Lewandowski wrote:
> On Mon, Oct 19, 2009 at 12:54:11PM +0300, Pekka Enberg wrote:
> > On Mon, 2009-10-19 at 11:49 +0200, Tobi Oetiker wrote:
> > > I have updated a fileserver to 2.6.31 today and I see page
> > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > > So I guess the problem must be quite generic:
> >
> > Yup, it almost certainly is. Does this patch help?
> >
> > http://lkml.org/lkml/2009/10/16/89
>
> This patch seems to help in some cases. Before applying this patch I
> was able to trigger alloc failures on a different machine by booting
> the kernel with "mem=256MB" and doing:
>
> $ gitk on-full-tree &
> # rmmod e100
> ... wait for few MBs in swap
> # modprobe e100; ifup --force ethX
>
> So here this patch helped -- with it, I was unable to trigger page
> allocation failures (testing was short, though). However, as I said
> here[1], I applied both of Mel's patches (including this one) and that
> didn't help my original issue (failures after suspend).
>
> [1] http://lkml.org/lkml/2009/10/17/109
>
Can you test with my kswapd patch applied and commits 373c0a7e and 8aa7e847
reverted, please?
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
 |
Karol Lewandowski External

Since: Oct 02, 2009 Posts: 8
|
Posted: Mon Oct 19, 2009 11:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) |
|
|
On Mon, Oct 19, 2009 at 12:54:11PM +0300, Pekka Enberg wrote:
> On Mon, 2009-10-19 at 11:49 +0200, Tobi Oetiker wrote:
> > I have updated a fileserver to 2.6.31 today and I see page
> > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > So I guess the problem must be quite generic:
>
> Yup, it almost certainly is. Does this patch help?
>
> http://lkml.org/lkml/2009/10/16/89
This patch seems to help in some cases. Before applying this patch I
was able to trigger alloc failures on a different machine by booting
the kernel with "mem=256MB" and doing:
$ gitk on-full-tree &
# rmmod e100
... wait for few MBs in swap
# modprobe e100; ifup --force ethX
So here this patch helped -- with it, I was unable to trigger page
allocation failures (testing was short, though). However, as I said
here[1], I applied both of Mel's patches (including this one) and that
didn't help my original issue (failures after suspend).
[1] http://lkml.org/lkml/2009/10/17/109
Thanks.
 |
Mel Gorman External

Since: May 19, 2006 Posts: 253
|
Posted: Mon Oct 19, 2009 11:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) |
|
|
On Mon, Oct 19, 2009 at 03:40:05PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote:
> > > Today Frans Pop wrote:
> > >
> > > >
> > > > I'm starting to think that this commit may not be directly related to high
> > > > order allocation failures. The fact that I'm seeing SKB allocation
> > > > failures earlier because of this commit could be just a side effect.
> > > > It could be that instead the main impact of this commit is on encrypted
> > > > file system and/or encrypted swap (kcryptd).
> > > >
> > > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > > > only reading from NFS that's unlikely).
> > >
> > > I have updated a fileserver to 2.6.31 today and I see page
> > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > > So I guess the problem must be quite generic:
> > >
> > >
> > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> > >
> >
> > What's the rest of the stack trace? I'm wondering where a large number
> > of order-5 GFP_ATOMIC allocations are coming from. It seems different to
> > the e100 problem where there is one GFP_ATOMIC allocation while the
> > firmware is being loaded.
>
> Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684157] [<ffffffff810da7e5>] __alloc_pages_nodemask+0x135/0x140 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684164] [<ffffffff815065b4>] ? _spin_unlock_bh+0x14/0x20 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684170] [<ffffffff8110b368>] kmalloc_large_node+0x68/0xc0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684175] [<ffffffff8110f15a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684181] [<ffffffff8140ffd2>] ? skb_copy+0x32/0xa0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
Is the MTU set very high between the host and virtualised machine?
Can you please test with the patch at http://lkml.org/lkml/2009/10/16/89
applied and with commits 373c0a7e and 8aa7e847 reverted?
> Oct 19 07:10:02 johan kernel: [23565.684231] [<ffffffff81416f79>] dev_hard_start_xmit+0x189/0x1c0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684236] [<ffffffff8142f071>] __qdisc_run+0x1a1/0x230 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684240] [<ffffffff81418a88>] dev_queue_xmit+0x238/0x310 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684246] [<ffffffff8144864b>] ip_finish_output+0x11b/0x2f0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684250] [<ffffffff814488a9>] ip_output+0x89/0xd0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684254] [<ffffffff814478c0>] ip_local_out+0x20/0x30 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684258] [<ffffffff814481ab>] ip_queue_xmit+0x22b/0x3f0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684264] [<ffffffff8145d5e5>] tcp_transmit_skb+0x345/0x4e0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684269] [<ffffffff8145eaf6>] tcp_write_xmit+0xb6/0x2e0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684273] [<ffffffff8145ed8b>] __tcp_push_pending_frames+0x2b/0xa0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684277] [<ffffffff8145b249>] tcp_rcv_established+0x459/0x6d0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684282] [<ffffffff814630bd>] tcp_v4_do_rcv+0x12d/0x140 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684285] [<ffffffff8146365e>] tcp_v4_rcv+0x58e/0x7c0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684289] [<ffffffff8144276d>] ip_local_deliver_finish+0x11d/0x2b0 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684293] [<ffffffff8144293b>] ip_local_deliver+0x3b/0x90 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684297] [<ffffffff81442ad6>] ip_rcv_finish+0x146/0x420 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684301] [<ffffffff8144304b>] ip_rcv+0x29b/0x370 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684304] [<ffffffff81418f9a>] netif_receive_skb+0x38a/0x4d0 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684308] [<ffffffff81419268>] napi_skb_finish+0x48/0x60 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684311] [<ffffffff81419724>] napi_gro_receive+0x34/0x40 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684330] [<ffffffffa006b623>] tg3_rx+0x373/0x4b0 [tg3] [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684339] [<ffffffffa006cbf0>] tg3_poll_work+0x70/0xf0 [tg3] [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684347] [<ffffffffa006ccae>] tg3_poll+0x3e/0xe0 [tg3] [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684350] [<ffffffff814198d2>] net_rx_action+0x102/0x210 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684357] [<ffffffff81061d24>] __do_softirq+0xc4/0x1f0 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684362] [<ffffffff8101314c>] call_softirq+0x1c/0x30 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684365] [<ffffffff81014945>] do_softirq+0x55/0x90 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684369] [<ffffffff8106116b>] irq_exit+0x7b/0x90 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684372] [<ffffffff81013e93>] do_IRQ+0x73/0xe0 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684378] [<ffffffff81012993>] ret_from_intr+0x0/0x11 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684381] <EOI> [<ffffffff810318b6>] ? native_safe_halt+0x6/0x10 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684391] [<ffffffff81019cd8>] ? default_idle+0x48/0xe0 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684396] [<ffffffff8150929d>] ? __atomic_notifier_call_chain+0xd/0x10 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684400] [<ffffffff815092b1>] ? atomic_notifier_call_chain+0x11/0x20 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684404] [<ffffffff810107c8>] ? cpu_idle+0x98/0xe0 [kern.warning]
> Oct 19 07:10:04 johan kernel: [23565.684410] [<ffffffff81500d95>] ? start_secondary+0x95/0xc0 [kern.warning]
>
> if you need more, I can send you a whole bunch of them ...
>
I'm assuming they are all more or less the same.
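For anyone reading reports like the one quoted above: an order-N request asks for 2^N physically contiguous pages, so order:5 means 32 pages, or 128 KiB with 4 KiB pages. The sketch below does that arithmetic and checks which bits of mode:0x4020 are accounted for. The two flag values are assumptions taken from 2.6.31's include/linux/gfp.h (__GFP_HIGH = 0x20, which is what GFP_ATOMIC expands to there, and __GFP_COMP = 0x4000) and should be checked against the tree actually being debugged. The SKETCH_ macros and helper names are hypothetical.

```c
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */
#define SKETCH_GFP_HIGH  0x20u    /* assumed value of __GFP_HIGH in 2.6.31 */
#define SKETCH_GFP_COMP  0x4000u  /* assumed value of __GFP_COMP in 2.6.31 */

/* Bytes of physically contiguous memory requested by an order-N allocation. */
unsigned long order_to_bytes(unsigned int order)
{
    return SKETCH_PAGE_SIZE << order;
}

/* Bits of the mode that the two flags above do not account for. */
unsigned int unexplained_bits(unsigned int mode)
{
    return mode & ~(SKETCH_GFP_HIGH | SKETCH_GFP_COMP);
}

/* Print a one-line summary of a "page allocation failure" report. */
void describe_failure(unsigned int order, unsigned int mode)
{
    printf("order:%u = %lu pages = %lu KiB; high:%s comp:%s other:0x%x\n",
           order, 1UL << order, order_to_bytes(order) / 1024,
           (mode & SKETCH_GFP_HIGH) ? "yes" : "no",
           (mode & SKETCH_GFP_COMP) ? "yes" : "no",
           unexplained_bits(mode));
}
```

Calling describe_failure(5, 0x4020) covers the failure in this trace. Under the assumed flag values no bit is left unexplained, which fits an atomic, compound-page request coming through kmalloc_large_node().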
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
 |
Tobias Oetiker External

Since: Sep 15, 2009 Posts: 12
|
Posted: Mon Oct 19, 2009 11:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) |
|
|
Hi Mel,
Today Mel Gorman wrote:
> On Mon, Oct 19, 2009 at 03:40:05PM +0200, Tobias Oetiker wrote:
> > Hi Mel,
> >
> > Today Mel Gorman wrote:
> >
> > > On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote:
> > > > Today Frans Pop wrote:
> > > >
> > > > >
> > > > > I'm starting to think that this commit may not be directly related to high
> > > > > order allocation failures. The fact that I'm seeing SKB allocation
> > > > > failures earlier because of this commit could be just a side effect.
> > > > > It could be that instead the main impact of this commit is on encrypted
> > > > > file system and/or encrypted swap (kcryptd).
> > > > >
> > > > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > > > > only reading from NFS that's unlikely).
> > > >
> > > > I have updated a fileserver to 2.6.31 today and I see page
> > > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > > > So I guess the problem must be quite generic:
> > > >
> > > >
> > > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> > > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> > > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> > > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> > > >
> > >
> > > What's the rest of the stack trace? I'm wondering where a large number
> > > of order-5 GFP_ATOMIC allocations are coming from. It seems different to
> > > the e100 problem where there is one GFP_ATOMIC allocation while the
> > > firmware is being loaded.
> >
> > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684157] [<ffffffff810da7e5>] __alloc_pages_nodemask+0x135/0x140 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684164] [<ffffffff815065b4>] ? _spin_unlock_bh+0x14/0x20 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684170] [<ffffffff8110b368>] kmalloc_large_node+0x68/0xc0 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684175] [<ffffffff8110f15a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684181] [<ffffffff8140ffd2>] ? skb_copy+0x32/0xa0 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
>
> Is the MTU set very high between the host and virtualised machine?
>
> Can you please test with the patch at http://lkml.org/lkml/2009/10/16/89
> applied and with commits 373c0a7e and 8aa7e847 reverted?
if you can send me a consolidated patch which does apply to
2.6.31.4 I will be glad to try ...
your patch in http://lkml.org/lkml/2009/10/16/89 does not seem to be
for 2.6.31 ... I assume it would be the following, but then again I don't
really understand the code, so this is just pattern matching ...
--- a/mm/page_alloc.c 2009-10-05 19:12:06.000000000 +0200
+++ b/mm/page_alloc.c 2009-10-19 14:52:15.000000000 +0200
@@ -1763,6 +1763,7 @@
if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
goto nopage;
+restart:
wake_all_kswapd(order, zonelist, high_zoneidx);
/*
@@ -1772,7 +1773,6 @@
*/
alloc_flags = gfp_to_alloc_flags(gfp_mask);
-restart:
/* This is the last chance, in general, before the goto nopage. */
page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
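The effect of moving the restart label can be illustrated with a toy model. This is a user-space sketch under strong simplifying assumptions: it ignores everything else in the allocator slow path and only counts how often the kswapd wakeup would run while an allocation keeps retrying. simulate_slowpath() and struct sim_result are hypothetical names, not kernel code.

```c
/* Result of one simulated allocation attempt sequence. */
struct sim_result {
    int attempts;        /* passes through the "restart:" label */
    int kswapd_wakeups;  /* times the wake_all_kswapd() step would run */
};

/*
 * failures_before_success: how many passes fail before one succeeds.
 * wake_inside_loop: 1 models the patched code (label above the wakeup),
 *                   0 models the original code (label below the wakeup).
 */
struct sim_result simulate_slowpath(int failures_before_success,
                                    int wake_inside_loop)
{
    struct sim_result r = { 0, 0 };

    if (!wake_inside_loop)
        r.kswapd_wakeups++;      /* original: woken once, before the label */

    for (;;) {
        if (wake_inside_loop)
            r.kswapd_wakeups++;  /* patched: woken on every restart */
        r.attempts++;
        if (r.attempts > failures_before_success)
            break;               /* this pass found a page */
        /* direct reclaim would run here, followed by "goto restart" */
    }
    return r;
}
```

With three failed passes, the original placement wakes kswapd once across four attempts, while the patched placement wakes it four times. That is the intent as far as it can be read from the diff: kswapd gets kicked again each time the allocation restarts after direct reclaim.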
cheers
tobi
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
 |
Mel Gorman External

Since: May 19, 2006 Posts: 253
|
Posted: Mon Oct 19, 2009 12:10 pm Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) |
|
|
On Mon, Oct 19, 2009 at 04:16:36PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > On Mon, Oct 19, 2009 at 03:40:05PM +0200, Tobias Oetiker wrote:
> > > Hi Mel,
> > >
> > > Today Mel Gorman wrote:
> > >
> > > > On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote:
> > > > > Today Frans Pop wrote:
> > > > >
> > > > > >
> > > > > > I'm starting to think that this commit may not be directly related to high
> > > > > > order allocation failures. The fact that I'm seeing SKB allocation
> > > > > > failures earlier because of this commit could be just a side effect.
> > > > > > It could be that instead the main impact of this commit is on encrypted
> > > > > > file system and/or encrypted swap (kcryptd).
> > > > > >
> > > > > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > > > > > only reading from NFS that's unlikely).
> > > > >
> > > > > I have updated a fileserver to 2.6.31 today and I see page
> > > > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > > > > So I guess the problem must be quite generic:
> > > > >
> > > > >
> > > > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> > > > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> > > > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> > > > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> > > > >
> > > >
> > > > What's the rest of the stack trace? I'm wondering where a large number
> > > > of order-5 GFP_ATOMIC allocations are coming from. It seems different to
> > > > the e100 problem where there is one GFP_ATOMIC allocation while the
> > > > firmware is being loaded.
> > >
> > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684157] [<ffffffff810da7e5>] __alloc_pages_nodemask+0x135/0x140 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684164] [<ffffffff815065b4>] ? _spin_unlock_bh+0x14/0x20 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684170] [<ffffffff8110b368>] kmalloc_large_node+0x68/0xc0 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684175] [<ffffffff8110f15a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684181] [<ffffffff8140ffd2>] ? skb_copy+0x32/0xa0 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning]
> > > Oct 19 07:10:02 johan kernel: [23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> >
> > Is the MTU set very high between the host and virtualised machine?
> >
> > Can you please test with the patch at http://lkml.org/lkml/2009/10/16/89
> > applied and with commits 373c0a7e and 8aa7e847 reverted?
>
> if you can send me a consolidated patch which does apply to
> 2.6.31.4 I will be glad to try ...
>
Sure
==== CUT HERE ====
From 6c0215af3b7c39ef7b8083ea38ca3ad93cd3f51f Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@csn.ul.ie>
Date: Mon, 19 Oct 2009 15:40:43 +0100
Subject: [PATCH] Kick off kswapd after direct reclaim and revert congestion changes
The following patch is http://lkml.org/lkml/2009/10/16/89 on top of
2.6.31.4 as well as patches 373c0a7e and 8aa7e847 reverted.
---
arch/x86/lib/usercopy_32.c | 2 +-
drivers/block/pktcdvd.c | 10 ++++------
drivers/md/dm-crypt.c | 2 +-
fs/fat/file.c | 2 +-
fs/fuse/dev.c | 8 ++++----
fs/nfs/write.c | 8 +++-----
fs/reiserfs/journal.c | 2 +-
fs/xfs/linux-2.6/kmem.c | 4 ++--
fs/xfs/linux-2.6/xfs_buf.c | 2 +-
include/linux/backing-dev.h | 11 +++--------
include/linux/blkdev.h | 13 +++++++++----
mm/backing-dev.c | 7 ++++---
mm/memcontrol.c | 2 +-
mm/page-writeback.c | 8 ++++----
mm/page_alloc.c | 15 ++++++++-------
mm/vmscan.c | 8 ++++----
16 files changed, 51 insertions(+), 53 deletions(-)
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index 1f118d4..7c8ca91 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -751,7 +751,7 @@ survive:
if (retval == -ENOMEM && is_global_init(current)) {
up_read(&current->mm->mmap_sem);
- congestion_wait(BLK_RW_ASYNC, HZ/50);
+ congestion_wait(WRITE, HZ/50);
goto survive;
}
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 99a506f..83650e0 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1372,10 +1372,8 @@ try_next_bio:
wakeup = (pd->write_congestion_on > 0
&& pd->bio_queue_size <= pd->write_congestion_off);
spin_unlock(&pd->lock);
- if (wakeup) {
- clear_bdi_congested(&pd->disk->queue->backing_dev_info,
- BLK_RW_ASYNC);
- }
+ if (wakeup)
+ clear_bdi_congested(&pd->disk->queue->backing_dev_info, WRITE);
pkt->sleep_time = max(PACKET_WAIT_TIME, 1);
pkt_set_state(pkt, PACKET_WAITING_STATE);
@@ -2594,10 +2592,10 @@ static int pkt_make_request(struct request_queue *q, struct bio *bio)
spin_lock(&pd->lock);
if (pd->write_congestion_on > 0
&& pd->bio_queue_size >= pd->write_congestion_on) {
- set_bdi_congested(&q->backing_dev_info, BLK_RW_ASYNC);
+ set_bdi_congested(&q->backing_dev_info, WRITE);
do {
spin_unlock(&pd->lock);
- congestion_wait(BLK_RW_ASYNC, HZ);
+ congestion_wait(WRITE, HZ);
spin_lock(&pd->lock);
} while(pd->bio_queue_size > pd->write_congestion_off);
}
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ed10381..c72a8dd 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -776,7 +776,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
* But don't wait if split was due to the io size restriction
*/
if (unlikely(out_of_pages))
- congestion_wait(BLK_RW_ASYNC, HZ/100);
+ congestion_wait(WRITE, HZ/100);
/*
* With async crypto it is unsafe to share the crypto context
diff --git a/fs/fat/file.c b/fs/fat/file.c
index f042b96..b28ea64 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -134,7 +134,7 @@ static int fat_file_release(struct inode *inode, struct file *filp)
if ((filp->f_mode & FMODE_WRITE) &&
MSDOS_SB(inode->i_sb)->options.flush) {
fat_flush_inodes(inode->i_sb, inode, NULL);
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ congestion_wait(WRITE, HZ/10);
}
return 0;
}
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6484eb7..f58ecbc 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -286,8 +286,8 @@ __releases(&fc->lock)
}
if (fc->num_background == FUSE_CONGESTION_THRESHOLD &&
fc->connected && fc->bdi_initialized) {
- clear_bdi_congested(&fc->bdi, BLK_RW_SYNC);
- clear_bdi_congested(&fc->bdi, BLK_RW_ASYNC);
+ clear_bdi_congested(&fc->bdi, READ);
+ clear_bdi_congested(&fc->bdi, WRITE);
}
fc->num_background--;
fc->active_background--;
@@ -414,8 +414,8 @@ static void fuse_request_send_nowait_locked(struct fuse_conn *fc,
fc->blocked = 1;
if (fc->num_background == FUSE_CONGESTION_THRESHOLD &&
fc->bdi_initialized) {
- set_bdi_congested(&fc->bdi, BLK_RW_SYNC);
- set_bdi_congested(&fc->bdi, BLK_RW_ASYNC);
+ set_bdi_congested(&fc->bdi, READ);
+ set_bdi_congested(&fc->bdi, WRITE);
}
list_add_tail(&req->list, &fc->bg_queue);
flush_bg_queue(fc);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a34fae2..5693fcd 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -200,10 +200,8 @@ static int nfs_set_page_writeback(struct page *page)
struct nfs_server *nfss = NFS_SERVER(inode);
if (atomic_long_inc_return(&nfss->writeback) >
- NFS_CONGESTION_ON_THRESH) {
- set_bdi_congested(&nfss->backing_dev_info,
- BLK_RW_ASYNC);
- }
+ NFS_CONGESTION_ON_THRESH)
+ set_bdi_congested(&nfss->backing_dev_info, WRITE);
}
return ret;
}
@@ -215,7 +213,7 @@ static void nfs_end_page_writeback(struct page *page)
end_page_writeback(page);
if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
- clear_bdi_congested(&nfss->backing_dev_info, BLK_RW_ASYNC);
+ clear_bdi_congested(&nfss->backing_dev_info, WRITE);
}
/*
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index 9062220..77f5bb7 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -997,7 +997,7 @@ static int reiserfs_async_progress_wait(struct super_block *s)
DEFINE_WAIT(wait);
struct reiserfs_journal *j = SB_JOURNAL(s);
if (atomic_read(&j->j_async_throttle))
- congestion_wait(BLK_RW_ASYNC, HZ / 10);
+ congestion_wait(WRITE, HZ / 10);
return 0;
}
diff --git a/fs/xfs/linux-2.6/kmem.c b/fs/xfs/linux-2.6/kmem.c
index 2d3f90a..1cd3b55 100644
--- a/fs/xfs/linux-2.6/kmem.c
+++ b/fs/xfs/linux-2.6/kmem.c
@@ -53,7 +53,7 @@ kmem_alloc(size_t size, unsigned int __nocast flags)
printk(KERN_ERR "XFS: possible memory allocation "
"deadlock in %s (mode:0x%x)\n",
__func__, lflags);
- congestion_wait(BLK_RW_ASYNC, HZ/50);
+ congestion_wait(WRITE, HZ/50);
} while (1);
}
@@ -130,7 +130,7 @@ kmem_zone_alloc(kmem_zone_t *zone, unsigned int __nocast flags)
printk(KERN_ERR "XFS: possible memory allocation "
"deadlock in %s (mode:0x%x)\n",
__func__, lflags);
- congestion_wait(BLK_RW_ASYNC, HZ/50);
+ congestion_wait(WRITE, HZ/50);
} while (1);
}
diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 965df12..178c20c 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -412,7 +412,7 @@ _xfs_buf_lookup_pages(
XFS_STATS_INC(xb_page_retries);
xfsbufd_wakeup(0, gfp_mask);
- congestion_wait(BLK_RW_ASYNC, HZ/50);
+ congestion_wait(WRITE, HZ/50);
goto retry;
}
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 1d52425..0ec2c59 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -229,14 +229,9 @@ static inline int bdi_rw_congested(struct backing_dev_info *bdi)
(1 << BDI_async_congested));
}
-enum {
- BLK_RW_ASYNC = 0,
- BLK_RW_SYNC = 1,
-};
-
-void clear_bdi_congested(struct backing_dev_info *bdi, int sync);
-void set_bdi_congested(struct backing_dev_info *bdi, int sync);
-long congestion_wait(int sync, long timeout);
+void clear_bdi_congested(struct backing_dev_info *bdi, int rw);
+void set_bdi_congested(struct backing_dev_info *bdi, int rw);
+long congestion_wait(int rw, long timeout);
static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 69103e0..998c8e0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -70,6 +70,11 @@ enum rq_cmd_type_bits {
REQ_TYPE_ATA_PC,
};
+enum {
+ BLK_RW_ASYNC = 0,
+ BLK_RW_SYNC = 1,
+};
+
/*
* For request of type REQ_TYPE_LINUX_BLOCK, rq->cmd[0] is the opcode being
* sent down (similar to how REQ_TYPE_BLOCK_PC means that ->cmd[] holds a
@@ -775,18 +780,18 @@ extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
* congested queues, and wake up anyone who was waiting for requests to be
* put back.
*/
-static inline void blk_clear_queue_congested(struct request_queue *q, int sync)
+static inline void blk_clear_queue_congested(struct request_queue *q, int rw)
{
- clear_bdi_congested(&q->backing_dev_info, sync);
+ clear_bdi_congested(&q->backing_dev_info, rw);
}
/*
* A queue has just entered congestion. Flag that in the queue's VM-visible
* state flags and increment the global gounter of congested queues.
*/
-static inline void blk_set_queue_congested(struct request_queue *q, int sync)
+static inline void blk_set_queue_congested(struct request_queue *q, int rw)
{
- set_bdi_congested(&q->backing_dev_info, sync);
+ set_bdi_congested(&q->backing_dev_info, rw);
}
extern void blk_start_queue(struct request_queue *q);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index c86edd2..493b468 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -283,6 +283,7 @@ static wait_queue_head_t congestion_wqh[2] = {
__WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[1])
};
+
void clear_bdi_congested(struct backing_dev_info *bdi, int sync)
{
enum bdi_state bit;
@@ -307,18 +308,18 @@ EXPORT_SYMBOL(set_bdi_congested);
/**
* congestion_wait - wait for a backing_dev to become uncongested
- * @sync: SYNC or ASYNC IO
+ * @rw: READ or WRITE
* @timeout: timeout in jiffies
*
* Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
* write congestion. If no backing_devs are congested then just wait for the
* next write to be completed.
*/
-long congestion_wait(int sync, long timeout)
+long congestion_wait(int rw, long timeout)
{
long ret;
DEFINE_WAIT(wait);
- wait_queue_head_t *wqh = &congestion_wqh[sync];
+ wait_queue_head_t *wqh = &congestion_wqh[rw];
prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
ret = io_schedule_timeout(timeout);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..834509f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1990,7 +1990,7 @@ try_to_free:
if (!progress) {
nr_retries--;
/* maybe some writeback is necessary */
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ congestion_wait(WRITE, HZ/10);
}
}
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 81627eb..7687879 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -575,7 +575,7 @@ static void balance_dirty_pages(struct address_space *mapping)
if (pages_written >= write_chunk)
break; /* We've done our duty */
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ congestion_wait(WRITE, HZ/10);
}
if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
@@ -669,7 +669,7 @@ void throttle_vm_writeout(gfp_t gfp_mask)
if (global_page_state(NR_UNSTABLE_NFS) +
global_page_state(NR_WRITEBACK) <= dirty_thresh)
break;
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ congestion_wait(WRITE, HZ/10);
/*
* The caller might hold locks which can prevent IO completion
@@ -715,7 +715,7 @@ static void background_writeout(unsigned long _min_pages)
if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
/* Wrote less than expected */
if (wbc.encountered_congestion || wbc.more_io)
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ congestion_wait(WRITE, HZ/10);
else
break;
}
@@ -787,7 +787,7 @@ static void wb_kupdate(unsigned long arg)
writeback_inodes(&wbc);
if (wbc.nr_to_write > 0) {
if (wbc.encountered_congestion || wbc.more_io)
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ congestion_wait(WRITE, HZ/10);
else
break; /* All the old data is written */
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b3c6cb..489a187 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1673,7 +1673,7 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
preferred_zone, migratetype);
if (!page && gfp_mask & __GFP_NOFAIL)
- congestion_wait(BLK_RW_ASYNC, HZ/50);
+ congestion_wait(WRITE, HZ/50);
} while (!page && (gfp_mask & __GFP_NOFAIL));
return page;
@@ -1763,16 +1763,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
goto nopage;
- wake_all_kswapd(order, zonelist, high_zoneidx);
-
/*
- * OK, we're below the kswapd watermark and have kicked background
- * reclaim. Now things get more complex, so set up alloc_flags according
- * to how we want to proceed.
+ * OK, we're below the kswapd watermark and now things get more
+ * complex, so set up alloc_flags according to how we want to
+ * proceed.
*/
alloc_flags = gfp_to_alloc_flags(gfp_mask);
restart:
+ /* Kick background reclaim */
+ wake_all_kswapd(order, zonelist, high_zoneidx);
+
/* This is the last chance, in general, before the goto nopage. */
page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
@@ -1844,7 +1845,7 @@ rebalance:
pages_reclaimed += did_some_progress;
if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
/* Wait for some write requests to complete then retry */
- congestion_wait(BLK_RW_ASYNC, HZ/50);
+ congestion_wait(WRITE, HZ/50);
goto rebalance;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 94e86dd..9219beb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1109,7 +1109,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
*/
if (nr_freed < nr_taken && !current_is_kswapd() &&
lumpy_reclaim) {
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ congestion_wait(WRITE, HZ/10);
/*
* The attempt at page out may have made some
@@ -1726,7 +1726,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
/* Take a nap, wait for some writeback to complete */
if (sc->nr_scanned && priority < DEF_PRIORITY - 2)
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ congestion_wait(WRITE, HZ/10);
}
/* top priority shrink_zones still had more to do? don't OOM, then */
if (!sc->all_unreclaimable && scanning_global_lru(sc))
@@ -1965,7 +1965,7 @@ loop_again:
* another pass across the zones.
*/
if (total_scanned && priority < DEF_PRIORITY - 2)
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ congestion_wait(WRITE, HZ/10);
/*
* We do this so kswapd doesn't build up large priorities for
@@ -2238,7 +2238,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
goto out;
if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
- congestion_wait(BLK_RW_ASYNC, HZ / 10);
+ congestion_wait(WRITE, HZ / 10);
}
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/ |
 |
Chris Mason External

Since: Sep 18, 2006 Posts: 87
|
Posted: Mon Oct 19, 2009 1:10 pm Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn |
|
|
On Mon, Oct 19, 2009 at 03:01:52PM +0100, Mel Gorman wrote:
>
> > During the 2nd phase I see the first SKB allocation errors with a music
> > skip between reading commits 95.000 and 110.000.
> > About commit 115.000 there is a very long pause during which the counter
> > does not increase, music stops and the desktop freezes completely. The
> > first 30 seconds of that freeze there is only very low disk activity (which
> > seems strange);
>
> I'm just going to have to depend on Jens here. Jens, the congestion_wait() is
> on BLK_RW_ASYNC after the commit. Reclaim usually writes pages asynchronously
> but lumpy reclaim actually waits for pages to write out synchronously so
> it's not always async.
Waiting doesn't make it synchronous from the elevator point of view.
If you're using WB_SYNC_NONE, it's an async write. WB_SYNC_ALL makes it
a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be
using the async congestion wait. (the exception is xfs which always
does async writes).
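A minimal userspace sketch of that rule, for illustration only. The enum names mirror the kernel's, but the helper congestion_queue_for() is hypothetical and is not part of any kernel API; it just encodes "WB_SYNC_NONE means async, WB_SYNC_ALL means sync" as the index one would pass to congestion_wait().

```c
/* Userspace illustration only: names mirror the kernel's, values are local. */
enum writeback_sync_modes {
    WB_SYNC_NONE,   /* writeback is queued, caller does not need it synchronous */
    WB_SYNC_ALL,    /* writeback is issued as synchronous IO */
};

enum {
    BLK_RW_ASYNC = 0,
    BLK_RW_SYNC  = 1,
};

/*
 * Hypothetical helper: pick the congestion queue that matches how the
 * writes were submitted, not whether the caller later waits on the page.
 */
static int congestion_queue_for(enum writeback_sync_modes mode)
{
    return mode == WB_SYNC_ALL ? BLK_RW_SYNC : BLK_RW_ASYNC;
}
```

Under this reading, a vmscan.c caller that only ever submits with WB_SYNC_NONE would always end up on the BLK_RW_ASYNC queue, even when it subsequently waits for the page to finish writeback.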
But I'm honestly not 100% sure. Looking back through the emails, the
test case is doing IO on top of a whole lot of things on top of
dm-crypt? I just tried to figure out if dm-crypt is turning the async
IO into sync IOs, but didn't quite make sense of it.
Could you also please include which filesystems were being abused during
the test and how? Reading through the emails, I think you've got:
gitk being run 3 times on some FS (NFS?)
streaming reads on NFS
swap on dm-crypt
If other filesystems are being used, please correct me. Also please
include if they are on crypto or straight block device.
>
> Either way, reclaim is usually worried about writing pages but it would appear
> after this change that a lot of read activity can also stall a process in
> direct reclaim. What might be happening in Frans's particular case is that the
> tasklet that allocates high-order pages for the RX buffers is getting stalled
> by congestion caused by other processes doing reads from the filesystem.
> While it makes sense from a congestion point of view to halt the IO, the
> reclaim operations from direct reclaimers is getting delayed for long enough
> to cause problems for GFP_ATOMIC.
The congestion_wait code either waits for congestion to clear or for
a given timeout. The part that isn't clear is if before the patch
we waited a very short time (congestion cleared quickly) or a very long
time (we hit the timeout or congestion cleared slowly).
The easiest way to tell is to just replace the congestion_wait() calls
in direct reclaim with schedule_timeout_interruptible(10), test, then
schedule_timeout_interruptible(HZ/20), then test again.
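As a rough aid for comparing those timeouts, here is a small userspace sketch that converts jiffies to milliseconds. The helper name jiffies_to_ms_at() and the HZ values used below (250 and 1000) are assumptions for illustration; the HZ of the kernels under test is not stated in this message.

```c
/*
 * Convert a timeout in jiffies to milliseconds for a given tick rate.
 * Hypothetical helper for illustration; hz is the kernel's HZ setting.
 */
static unsigned long jiffies_to_ms_at(unsigned long jiffies, unsigned long hz)
{
    return jiffies * 1000UL / hz;
}
```

With HZ=250, HZ/50 is 5 jiffies (20 ms), HZ/20 is 12 jiffies (48 ms) and a fixed 10 jiffies is 40 ms. With HZ=1000 the same fixed 10 jiffies is only 10 ms, so the first of the two suggested experiments sleeps for a different wall-clock time depending on the build, while the HZ/20 variant stays near 50 ms.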
>
> Does this sound plausible to you? If so, what's the best way of
> addressing this? Changing congestion_wait back to WRITE (assuming that
> works for Frans)? Changing it to SYNC (again, assuming it actually
> works) or a revert?
I don't think changing it to SYNC is a good plan unless we're actually
doing sync io. It would be better to just wait on one of the pages that
you've sent down (or its hashed waitqueue since the page can go away).
-chris
 |
Christoph Hellwig External

Since: May 16, 2006 Posts: 757
|
Posted: Mon Oct 19, 2009 2:10 pm Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn |
|
|
On Tue, Oct 20, 2009 at 01:18:15AM +0900, Chris Mason wrote:
> Waiting doesn't make it synchronous from the elevator point of view.
> If you're using WB_SYNC_NONE, it's an async write. WB_SYNC_ALL makes it
> a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be
> using the async congestion wait. (the exception is xfs which always
> does async writes).
That's only because those people who did the global sweep did not bother
to convert it or even tell the list about it. I have a patch in my
QA queue to change it.
 |
Karol Lewandowski External

Since: Oct 02, 2009 Posts: 8
|
Posted: Mon Oct 19, 2009 2:10 pm Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) |
|
|
On Mon, Oct 19, 2009 at 03:06:19PM +0100, Mel Gorman wrote:
> Can you test with my kswapd patch applied and commits 373c0a7e,8aa7e847
> reverted please?
It seems that your patch and Frans' reverts together *do* make a
difference.
With these patches I haven't been able to trigger failures so far
(in about 6 attempts). I'll continue testing and let you know if
anything changes.
If nothing changes, this looks like a fix for my problem.
Thanks. Thanks a lot!
 |
Tobias Oetiker External

Since: Sep 15, 2009 Posts: 12
|
Posted: Mon Oct 19, 2009 5:10 pm Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) |
|
|
Hi Mel,
Today Mel Gorman wrote:
> >
> > if you can send me a consolidated patch which does apply to
> > 2.6.31.4 I will be glad to try ...
> >
>
> Sure
>
> ==== CUT HERE ====
>
> From 6c0215af3b7c39ef7b8083ea38ca3ad93cd3f51f Mon Sep 17 00:00:00 2001
> From: Mel Gorman <mel@csn.ul.ie>
> Date: Mon, 19 Oct 2009 15:40:43 +0100
> Subject: [PATCH] Kick off kswapd after direct reclaim and revert congestion changes
>
> The following patch is http://lkml.org/lkml/2009/10/16/89 on top of
> 2.6.31.4 as well as patches 373c0a7e and 8aa7e847 reverted.
it seems to help ... the server has been running for 3 hours now
without incident, but then again it is not as active as during the
day, ... will report tomorrow.
cheers
tobi
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
 |
Tobias Oetiker External

Since: Sep 15, 2009 Posts: 12
|
Posted: Mon Oct 19, 2009 5:10 pm Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) |
|
|
Hi Mel,
Today Tobias Oetiker wrote:
> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > >
> > > if you can send me a consolidated patch which does apply to
> > > 2.6.31.4 I will be glad to try ...
> > >
> >
> > Sure
> >
> > ==== CUT HERE ====
> >
> > From 6c0215af3b7c39ef7b8083ea38ca3ad93cd3f51f Mon Sep 17 00:00:00 2001
> > From: Mel Gorman <mel@csn.ul.ie>
> > Date: Mon, 19 Oct 2009 15:40:43 +0100
> > Subject: [PATCH] Kick off kswapd after direct reclaim and revert congestion changes
> >
> > The following patch is http://lkml.org/lkml/2009/10/16/89 on top of
> > 2.6.31.4 as well as patches 373c0a7e and 8aa7e847 reverted.
>
> it seems to help ... the server has been running for 3 hours now
> without incident, but then again it is not as active as during the
> day, ... will report tomorrow.
While I was writing, the system found that the patch does not really
help:
Oct 19 22:09:52 johan kernel: [11157.121506] smtpd: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121514] Pid: 19324, comm: smtpd Tainted: G D 2.6.31.4-oep #1 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121518] Call Trace: [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121521] <IRQ> [<ffffffff810cb599>] __alloc_pages_nodemask+0x549/0x650 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121563] [<ffffffffa02bde3b>] ? __nf_ct_refresh_acct+0xab/0x110 [nf_conntrack] [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121572] [<ffffffffa02a8337>] ? ipt_do_table+0x2f7/0x610 [ip_tables] [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121580] [<ffffffff810fac18>] kmalloc_large_node+0x68/0xc0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121585] [<ffffffff810fe90a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121592] [<ffffffff813ebd42>] ? skb_copy+0x32/0xa0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121596] [<ffffffff813e9606>] __alloc_skb+0x76/0x180 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121632] [<ffffffff8140a2c1>] __qdisc_run+0x1a1/0x230 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121637] [<ffffffff813f41e0>] dev_queue_xmit+0x2b0/0x3a0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121642] [<ffffffff8142349b>] ip_finish_output+0x11b/0x2f0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121646] [<ffffffff814236f9>] ip_output+0x89/0xd0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121650] [<ffffffff81422710>] ip_local_out+0x20/0x30 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121654] [<ffffffff81422ffb>] ip_queue_xmit+0x22b/0x3f0 [kern.warning]
cheers
tobi
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
 |
Chris Mason External

Since: Sep 18, 2006 Posts: 87
|
Posted: Mon Oct 19, 2009 7:10 pm Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn |
|
|
On Mon, Oct 19, 2009 at 01:01:15PM -0400, Christoph Hellwig wrote:
> On Tue, Oct 20, 2009 at 01:18:15AM +0900, Chris Mason wrote:
> > Waiting doesn't make it synchronous from the elevator point of view.
> > If you're using WB_SYNC_NONE, it's an async write. WB_SYNC_ALL makes it
> > a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be
> > using the async congestion wait. (the exception is xfs which always
> > does async writes).
>
> That's only because those people who did the global sweep did not bother
> to convert it or even tell the list about it. I have a patch in my
> QA queue to change it.
Yes, we just didn't realize XFS was missed. Sorry. I wasn't trying to
blame xfs for being behind, just mentioning that we've got about 10
different variables here and I'm having a hard time figuring out which
ones to push on.
-chris
 |
Karol Lewandowski External

Since: Oct 02, 2009 Posts: 8
|
Posted: Mon Oct 19, 2009 10:10 pm Post subject: Re: [Bug #14141] order 2 page allocation failures (generic) |
|
|
On Mon, Oct 19, 2009 at 07:09:47PM +0200, Karol Lewandowski wrote:
> On Mon, Oct 19, 2009 at 03:06:19PM +0100, Mel Gorman wrote:
> > Can you test with my kswapd patch applied and commits 373c0a7e,8aa7e847
> > reverted please?
>
> It seems that your patch and Frans' reverts together *do* make a
> difference.
>
> With these patches I haven't been able to trigger failures so far
> (in about 6 attempts). I'll continue testing and let you know if
> anything changes.
Damn it.
I'm sorry to inform you that yes, I still get failures (less often,
but still).
Thanks.
e100: Intel(R) PRO/100 Network Driver, 3.5.24-k2-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
e100 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 9 (level, low) -> IRQ 9
e100 0000:00:03.0: PME# disabled
e100: eth0: e100_probe: addr 0xe8120000, irq 9, MAC addr 00:10:a4:89:e8:84
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 5151, comm: ifconfig Not tainted 2.6.31+frans2+mel-00002-g90702f9-dirty #2
Call Trace:
[<c015c4e1>] ? __alloc_pages_nodemask+0x423/0x468
[<c0104de7>] ? dma_generic_alloc_coherent+0x4a/0xab
[<c0104d9d>] ? dma_generic_alloc_coherent+0x0/0xab
[<d1614b6f>] ? e100_alloc_cbs+0xc7/0x174 [e100]
[<d1615bfe>] ? e100_up+0x1b/0xf5 [e100]
[<d1615cef>] ? e100_open+0x17/0x41 [e100]
[<c02f871f>] ? dev_open+0x8f/0xc5
[<c02f7ed9>] ? dev_change_flags+0xa2/0x155
[<c032daa6>] ? devinet_ioctl+0x22a/0x51c
[<c02ebabe>] ? sock_ioctl+0x0/0x1e4
[<c02ebc7e>] ? sock_ioctl+0x1c0/0x1e4
[<c02ebabe>] ? sock_ioctl+0x0/0x1e4
[<c017f23a>] ? vfs_ioctl+0x16/0x4a
[<c017fb01>] ? do_vfs_ioctl+0x48a/0x4c1
[<c0168137>] ? handle_mm_fault+0x1e0/0x42c
[<c0348c6b>] ? do_page_fault+0x2ce/0x2e4
[<c017fb64>] ? sys_ioctl+0x2c/0x42
[<c0102748>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 90, btch: 15 usd: 35
Active_anon:14778 active_file:10836 inactive_anon:22033
inactive_file:11854 unevictable:0 dirty:6 writeback:0 unstable:0
free:1031 slab:2083 mapped:6193 pagetables:417 bounce:0
DMA free:1096kB min:124kB low:152kB high:184kB active_anon:528kB inactive_anon:3440kB active_file:1076kB inactive_file:5580kB unevictable:0kB present:15868kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:3028kB min:1908kB low:2384kB high:2860kB active_anon:58584kB inactive_anon:84692kB active_file:42268kB inactive_file:41836kB unevictable:0kB present:243776kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 46*4kB 0*8kB 5*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1096kB
Normal: 135*4kB 213*8kB 21*16kB 4*32kB 5*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3028kB
25927 total pagecache pages
3010 pages in swap cache
Swap cache stats: add 205613, delete 202603, find 63665/79800
Free swap = 485236kB
Total swap = 514040kB
65520 pages RAM
1663 pages reserved
14633 pages shared
52919 pages non-shared
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 5151, comm: ifconfig Not tainted 2.6.31+frans2+mel-00002-g90702f9-dirty #2
Call Trace:
[<c015c4e1>] ? __alloc_pages_nodemask+0x423/0x468
[<c0104de7>] ? dma_generic_alloc_coherent+0x4a/0xab
[<c0104d9d>] ? dma_generic_alloc_coherent+0x0/0xab
[<d1614b6f>] ? e100_alloc_cbs+0xc7/0x174 [e100]
[<d1615bfe>] ? e100_up+0x1b/0xf5 [e100]
[<d1615cef>] ? e100_open+0x17/0x41 [e100]
[<c02f871f>] ? dev_open+0x8f/0xc5
[<c02f7ed9>] ? dev_change_flags+0xa2/0x155
[<c032daa6>] ? devinet_ioctl+0x22a/0x51c
[<c02ebabe>] ? sock_ioctl+0x0/0x1e4
[<c02ebc7e>] ? sock_ioctl+0x1c0/0x1e4
[<c02ebabe>] ? sock_ioctl+0x0/0x1e4
[<c017f23a>] ? vfs_ioctl+0x16/0x4a
[<c017fb01>] ? do_vfs_ioctl+0x48a/0x4c1
[<c0175fd1>] ? vfs_write+0xf4/0x105
[<c017fb64>] ? sys_ioctl+0x2c/0x42
[<c0102748>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 90, btch: 15 usd: 67
Active_anon:14760 active_file:10798 inactive_anon:22052
inactive_file:11862 unevictable:0 dirty:6 writeback:30 unstable:0
free:1031 slab:2083 mapped:6187 pagetables:417 bounce:0
DMA free:1096kB min:124kB low:152kB high:184kB active_anon:528kB inactive_anon:3440kB active_file:1076kB inactive_file:5580kB unevictable:0kB present:15868kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:3028kB min:1908kB low:2384kB high:2860kB active_anon:58512kB inactive_anon:84768kB active_file:42116kB inactive_file:41868kB unevictable:0kB present:243776kB pages_scanned:100 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 46*4kB 0*8kB 5*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1096kB
Normal: 135*4kB 213*8kB 21*16kB 4*32kB 5*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3028kB
25924 total pagecache pages
3037 pages in swap cache
Swap cache stats: add 205644, delete 202607, find 63666/79802
Free swap = 485116kB
Total swap = 514040kB
65520 pages RAM
1663 pages reserved
14638 pages shared
52896 pages non-shared
e100 0000:00:03.0: firmware: requesting e100/d101s_ucode.bin
ADDRCONF(NETDEV_UP): eth0: link is not ready
e100: eth0 NIC Link is Up 100 Mbps Full Duplex
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present
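The per-order free lists in the Mem-Info dumps above can be checked by hand. An order:5 request needs 32 contiguous pages (128kB with 4kB pages). The following userspace sketch, with hypothetical helper names and the counts copied from the "DMA:" and "Normal:" lines, recomputes the free totals and tests whether any block of order 5 or larger is free. It deliberately ignores watermarks and lowmem_reserve, which the real allocator also checks, so it shows only fragmentation, not the full allocation decision.

```c
#define NR_ORDERS 11   /* orders 0..10, i.e. 4kB .. 4096kB blocks */
#define PAGE_KB   4UL  /* assumes 4kB pages, as in the dump */

/* Free block counts copied from the log; index = order. */
static const unsigned long dma_zone[NR_ORDERS] =
    { 46, 0, 5, 0, 1, 0, 1, 1, 0, 0, 0 };
static const unsigned long normal_zone[NR_ORDERS] =
    { 135, 213, 21, 4, 5, 0, 0, 0, 0, 0, 0 };

/* Total free memory in kB described by a per-order free list. */
static unsigned long free_kb(const unsigned long nr_free[NR_ORDERS])
{
    unsigned long kb = 0;

    for (int order = 0; order < NR_ORDERS; order++)
        kb += nr_free[order] * (PAGE_KB << order);
    return kb;
}

/* Returns 1 if some free block of at least the requested order exists. */
static int has_block_of_order(const unsigned long nr_free[NR_ORDERS], int order)
{
    for (int o = order; o < NR_ORDERS; o++)
        if (nr_free[o] > 0)
            return 1;
    return 0;
}
```

The totals come out to 1096kB for DMA and 3028kB for Normal, matching the log. The Normal zone has about 3MB free but nothing at 128kB or above, so an order:5 request cannot be met from it without reclaim or merging, regardless of how much memory is free in total.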
 |
Mel Gorman External

Since: May 19, 2006 Posts: 253
|
Posted: Tue Oct 20, 2009 7:10 am Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn |
|
|
On Tue, Oct 20, 2009 at 01:18:15AM +0900, Chris Mason wrote:
> On Mon, Oct 19, 2009 at 03:01:52PM +0100, Mel Gorman wrote:
> >
> > > During the 2nd phase I see the first SKB allocation errors with a music
> > > skip between reading commits 95.000 and 110.000.
> > > About commit 115.000 there is a very long pause during which the counter
> > > does not increase, music stops and the desktop freezes completely. The
> > > first 30 seconds of that freeze there is only very low disk activity (which
> > > seems strange);
> >
> > I'm just going to have to depend on Jens here. Jens, the congestion_wait() is
> > on BLK_RW_ASYNC after the commit. Reclaim usually writes pages asynchronously
> > > but lumpy reclaim actually waits for pages to write out synchronously so
> > it's not always async.
>
> Waiting doesn't make it synchronous from the elevator point of view.
> If you're using WB_SYNC_NONE, it's an async write. WB_SYNC_ALL makes it
> a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be
> using the async congestion wait. (the exception is xfs which always
> does async writes).
>
Right, reclaim always queues the pages for async IO. Lumpy reclaim then
calls wait_on_page_writeback() but, as you say, from an elevator point of
view it's still async.
> But I'm honestly not 100% sure. Looking back through the emails, the
> test case is doing IO on top of a whole lot of things on top of
> dm-crypt? I just tried to figure out if dm-crypt is turning the async
> IO into sync IOs, but didn't quite make sense of it.
>
I'm not overly sure either.
> Could you also please include which filesystems were being abused during
> the test and how? Reading through the emails, I think you've got:
>
> gitk being run 3 times on some FS (NFS?)
> streaming reads on NFS
> swap on dm-crypt
>
> If other filesystems are being used, please correct me. Also please
> include if they are on crypto or straight block device.
>
I've attached a patch below that should allow us to cheat. When it's applied,
it outputs who called congestion_wait(), how long the timeout was and how
long it waited for. By comparing before and after sleep times, we should
be able to see which of the callers has significantly changed and if
it's something easily addressable.
> > Either way, reclaim is usually worried about writing pages but it would appear
> > after this change that a lot of read activity can also stall a process in
> > direct reclaim. What might be happening in Frans's particular case is that the
> > tasklet that allocates high-order pages for the RX buffers is getting stalled
> > by congestion caused by other processes doing reads from the filesystem.
> > While it makes sense from a congestion point of view to halt the IO, the
> > reclaim operations from direct reclaimers is getting delayed for long enough
> > to cause problems for GFP_ATOMIC.
>
> The congestion_wait code either waits for congestion to clear or for
> a given timeout. The part that isn't clear is if before the patch
> we waited a very short time (congestion cleared quickly) or a very long
> time (we hit the timeout or congestion cleared slowly).
>
Using the instrumentation patch, I found with a very basic test that we
are waiting for short periods of time more often with the congestion
commit applied:
1 congestion_wait rw=1 delay 6 timeout 25 :: before commit
7 kswapd congestion_wait rw=1 delay 0 timeout 25 :: before commit
32 kswapd congestion_wait sync=0 delay 0 timeout 25 :: after commit
61 kswapd congestion_wait rw=1 delay 1 timeout 25 :: before commit
133 kswapd congestion_wait sync=0 delay 1 timeout 25 :: after commit
16 kswapd congestion_wait rw=1 delay 2 timeout 25 :: before commit
70 kswapd congestion_wait sync=0 delay 2 timeout 25 :: after commit
1 try_to_free_pages congestion_wait sync=0 delay 2 timeout 25 :: after commit
17 kswapd congestion_wait rw=1 delay 3 timeout 25 :: before commit
28 kswapd congestion_wait sync=0 delay 3 timeout 25 :: after commit
1 try_to_free_pages congestion_wait sync=0 delay 3 timeout 25 :: after commit
23 kswapd congestion_wait rw=1 delay 4 timeout 25 :: before commit
16 kswapd congestion_wait sync=0 delay 4 timeout 25 :: after commit
5 try_to_free_pages congestion_wait sync=0 delay 4 timeout 25 :: after commit
20 kswapd congestion_wait rw=1 delay 5 timeout 25 :: before commit
18 kswapd congestion_wait sync=0 delay 5 timeout 25 :: after commit
3 try_to_free_pages congestion_wait sync=0 delay 5 timeout 25 :: after commit
21 kswapd congestion_wait rw=1 delay 6 timeout 25 :: before commit
8 kswapd congestion_wait sync=0 delay 6 timeout 25 :: after commit
2 try_to_free_pages congestion_wait sync=0 delay 6 timeout 25 :: after commit
13 kswapd congestion_wait rw=1 delay 7 timeout 25 :: before commit
12 kswapd congestion_wait sync=0 delay 7 timeout 25 :: after commit
2 try_to_free_pages congestion_wait sync=0 delay 7 timeout 25 :: after commit
8 kswapd congestion_wait rw=1 delay 8 timeout 25 :: before commit
7 kswapd congestion_wait sync=0 delay 8 timeout 25 :: after commit
9 kswapd congestion_wait rw=1 delay 9 timeout 25 :: before commit
5 kswapd congestion_wait sync=0 delay 9 timeout 25 :: after commit
2 try_to_free_pages congestion_wait sync=0 delay 9 timeout 25 :: after commit
4 kswapd congestion_wait rw=1 delay 10 timeout 25 :: before commit
5 kswapd congestion_wait sync=0 delay 10 timeout 25 :: after commit
1 try_to_free_pages congestion_wait sync=0 delay 10 timeout 25 :: after commit
[... remaining output snipped ...]
The "before commit" and "after commit" labels really mean 2.6.31 with the
patch reverted and vanilla 2.6.31 respectively (the reverted kernel prints
rw=, the vanilla one prints sync=). The first column is how many times we
delayed for that length of time. To generate the output, I took the console
log from both kernels with a basic test, put the congestion_wait lines into
two separate files and ran
cat congestion-*-sorted | sort -n -k5 | uniq -c
to give a count of how many times we delayed for a particular caller.
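For anyone who would rather post-process the log in a program than with sort and uniq, here is a minimal userspace sketch that parses one instrumented line and tallies it by delay. The struct and function names are hypothetical; the line layout is assumed to be "caller congestion_wait key=value delay N timeout M" as in the output above, with any syslog prefix already stripped.

```c
#include <stdio.h>

#define MAX_DELAY 64   /* arbitrary cap for the histogram, in jiffies */

struct cw_sample {
    char caller[64];       /* e.g. "kswapd" or "try_to_free_pages" */
    char key[8];           /* "sync" or "rw", depending on the kernel */
    int value;             /* argument passed to congestion_wait() */
    unsigned long delay;   /* jiffies actually slept */
    long timeout;          /* jiffies requested */
};

/* Parse one log line; returns 0 on success, -1 if it does not match. */
static int parse_cw_line(const char *line, struct cw_sample *out)
{
    int n = sscanf(line,
                   "%63s congestion_wait %7[a-z]=%d delay %lu timeout %ld",
                   out->caller, out->key, &out->value,
                   &out->delay, &out->timeout);

    return n == 5 ? 0 : -1;
}

/* Count one sample in a histogram indexed by observed delay. */
static void tally_delay(unsigned long hist[MAX_DELAY],
                        const struct cw_sample *s)
{
    if (s->delay < MAX_DELAY)
        hist[s->delay]++;
}
```

Lines where the caller symbol is missing, like the first line of the output above, do not match this pattern and are rejected rather than guessed at.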
> The easiest way to tell is to just replace the congestion_wait() calls
> in direct reclaim with schedule_timeout_interruptible(10), test, then
> schedule_timeout_interruptible(HZ/20), then test again.
>
Reclaim can also call congestion_wait() and maybe the problem isn't
within the page allocator at all but that it's indirectly affected by
timing.
> >
> > Does this sound plausible to you? If so, what's the best way of
> > addressing this? Changing congestion_wait back to WRITE (assuming that
> > works for Frans)? Changing it to SYNC (again, assuming it actually
> > works) or a revert?
>
> I don't think changing it to SYNC is a good plan unless we're actually
> doing sync io. It would be better to just wait on one of the pages that
> you've sent down (or its hashed waitqueue since the page can go away).
>
Frans, is there any chance you could apply the following patch and get
the console logs for a vanilla kernel and with the congestion patches
reverted? I'm hoping it'll be able to tell us which of the callers has
significantly changed in timing. If there is one caller that has
significantly changed, it might be enough to address just that caller.
=====
From 757999066dc41f2e053d59589c673052fc7c1a65 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel.DeleteThis@csn.ul.ie>
Date: Tue, 20 Oct 2009 11:01:57 +0100
Subject: [PATCH] Instrument congestion_wait

This patch instruments how long congestion_wait() really waited for a
given caller.

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 3d3accb..fc945e0 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/writeback.h>
 #include <linux/device.h>
+#include <linux/kallsyms.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -729,6 +730,11 @@ EXPORT_SYMBOL(set_bdi_congested);
  */
 long congestion_wait(int sync, long timeout)
 {
+	unsigned long jiffies_start = jiffies;
+	char *module;
+	char buf[128];
+	const char *symbol;
+	unsigned long offset, symbolsize;
 	long ret;
 	DEFINE_WAIT(wait);
 	wait_queue_head_t *wqh = &congestion_wqh[sync];
@@ -736,6 +742,13 @@ long congestion_wait(int sync, long timeout)
 	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
 	ret = io_schedule_timeout(timeout);
 	finish_wait(wqh, &wait);
+
+	symbol = kallsyms_lookup(_RET_IP_, &symbolsize, &offset, &module, buf);
+	printk(KERN_INFO "%-20s congestion_wait sync=%d delay %lu timeout %ld\n",
+		symbol,
+		sync,
+		jiffies - jiffies_start,
+		timeout);
 	return ret;
 }
 EXPORT_SYMBOL(congestion_wait);
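
Once both console logs are collected, a per-caller comparison might look something like the Python sketch below. It is only an illustration and not part of the patch: the function names and report layout are hypothetical, and the regular expression assumes lines in the format the printk above emits (with either "sync=" or "rw=" as the label, since the reverted kernel names the parameter differently).

```python
import re
from collections import defaultdict
from statistics import mean

# Matches e.g. "kswapd   congestion_wait sync=0 delay 6 timeout 25",
# optionally preceded by a printk timestamp or other prefix.
PATTERN = re.compile(
    r"(\S+)\s+congestion_wait\s+(?:rw|sync)=\d+\s+delay\s+(\d+)\s+timeout\s+(-?\d+)"
)


def delays_by_caller(lines):
    """Map each caller symbol to the list of delays (in jiffies) it saw."""
    result = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            result[match.group(1)].append(int(match.group(2)))
    return dict(result)


def compare(vanilla_lines, reverted_lines):
    """Summarise call counts and mean delay per caller for both kernels."""
    vanilla = delays_by_caller(vanilla_lines)
    reverted = delays_by_caller(reverted_lines)
    report = {}
    for caller in sorted(set(vanilla) | set(reverted)):
        v = vanilla.get(caller, [])
        r = reverted.get(caller, [])
        report[caller] = {
            "vanilla_calls": len(v),
            "reverted_calls": len(r),
            "vanilla_mean_delay": mean(v) if v else None,
            "reverted_mean_delay": mean(r) if r else None,
        }
    return report
```

A caller whose call count or mean delay differs sharply between the two kernels would be the first place to look.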