2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31

 
  

Mel Gorman
Posted: Tue Oct 20, 2009 8:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures (generic)
Archived from groups: linux>kernel

On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Tobias Oetiker wrote:
>
> > Hi Mel,
> >
> > Today Mel Gorman wrote:
> >
> > > >
> > > > if you can send me a consolidated patch which does apply to
> > > > 2.6.31.4 I will be glad to try ...
> > > >
> > >
> > > Sure
> > >
> > > ==== CUT HERE ====
> > >
> > > From 6c0215af3b7c39ef7b8083ea38ca3ad93cd3f51f Mon Sep 17 00:00:00 2001
> > > From: Mel Gorman <mel@csn.ul.ie>
> > > Date: Mon, 19 Oct 2009 15:40:43 +0100
> > > Subject: [PATCH] Kick off kswapd after direct reclaim and revert congestion changes
> > >
> > > The following patch is http://lkml.org/lkml/2009/10/16/89 on top of
> > > 2.6.31.4 as well as patches 373c0a7e and 8aa7e847 reverted.
> >
> > it seems to help ... the server has been running for 3 hours now
> > without incident, but then again it is not as active as during the
> > day, ... will report tomorrow.
>
> while I was writing, the system found that the patch does not really
> help:
>
> Oct 19 22:09:52 johan kernel: [11157.121506] smtpd: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121514] Pid: 19324, comm: smtpd Tainted: G D 2.6.31.4-oep #1 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121518] Call Trace: [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121521] <IRQ> [<ffffffff810cb599>] __alloc_pages_nodemask+0x549/0x650 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121563] [<ffffffffa02bde3b>] ? __nf_ct_refresh_acct+0xab/0x110 [nf_conntrack] [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121572] [<ffffffffa02a8337>] ? ipt_do_table+0x2f7/0x610 [ip_tables] [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121580] [<ffffffff810fac18>] kmalloc_large_node+0x68/0xc0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121585] [<ffffffff810fe90a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121592] [<ffffffff813ebd42>] ? skb_copy+0x32/0xa0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121596] [<ffffffff813e9606>] __alloc_skb+0x76/0x180 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]

Is the number of failures at least reduced, or are they occurring at the
same rate? Also, what was the last kernel that worked for you with this
configuration?

Thanks

> Oct 19 22:09:52 johan kernel: [11157.121632] [<ffffffff8140a2c1>] __qdisc_run+0x1a1/0x230 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121637] [<ffffffff813f41e0>] dev_queue_xmit+0x2b0/0x3a0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121642] [<ffffffff8142349b>] ip_finish_output+0x11b/0x2f0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121646] [<ffffffff814236f9>] ip_output+0x89/0xd0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121650] [<ffffffff81422710>] ip_local_out+0x20/0x30 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121654] [<ffffffff81422ffb>] ip_queue_xmit+0x22b/0x3f0 [kern.warning]
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Tobias Oetiker
Posted: Tue Oct 20, 2009 8:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures (generic)

Hi Mel,

Today Mel Gorman wrote:

> On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:

> > Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
>
> Is the number of failures at least reduced, or are they occurring at the
> same rate?

not that it would have any statistical significance, but I had 5
failures (clusters) yesterday morning and 5 this morning ...

the failures often show up in groups; I saved one at
http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt

> Also, what was the last kernel that worked for you with this
> configuration?

that would be 2.6.24 ... I have not upgraded in quite some time.
But since the I/O performance of 2.6.31 is about double in my tests
I thought it would be a good thing to do ...

cheers
tobi

> Thanks
>
> > Oct 19 22:09:52 johan kernel: [11157.121632] [<ffffffff8140a2c1>] __qdisc_run+0x1a1/0x230 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121637] [<ffffffff813f41e0>] dev_queue_xmit+0x2b0/0x3a0 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121642] [<ffffffff8142349b>] ip_finish_output+0x11b/0x2f0 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121646] [<ffffffff814236f9>] ip_output+0x89/0xd0 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121650] [<ffffffff81422710>] ip_local_out+0x20/0x30 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121654] [<ffffffff81422ffb>] ip_queue_xmit+0x22b/0x3f0 [kern.warning]
> >
>
>

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

Mel Gorman
Posted: Tue Oct 20, 2009 10:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures (generic)

On Tue, Oct 20, 2009 at 01:44:50PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:
>
> > > Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> > > Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> > > Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
> >
> > Is the number of failures at least reduced, or are they occurring at the
> > same rate?
>
> not that it would have any statistical significance, but I had 5
> failures (clusters) yesterday morning and 5 this morning ...
>

Before the patches were applied, how many failures were you seeing in
the morning?

> the failures often show up in groups; I saved one at
> http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt
>
> > Also, what was the last kernel that worked for you with this
> > configuration?
>
> that would be 2.6.24 ... I have not upgraded in quite some time.
> But since the I/O performance of 2.6.31 is about double in my tests
> I thought it would be a good thing to do ...
>

A performance difference that significant may explain differences in timing
as well, i.e. the allocator is being put under more pressure now than it
was previously as more processes make forward progress.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Tobias Oetiker
Posted: Tue Oct 20, 2009 10:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures (generic)

Hi Mel,

Today Mel Gorman wrote:

> On Tue, Oct 20, 2009 at 01:44:50PM +0200, Tobias Oetiker wrote:
> > Hi Mel,
> >
> > Today Mel Gorman wrote:
> >
> > > On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:
> >
> > > > Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> > > > Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> > > > Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
> > >
> > > Is the number of failures at least reduced, or are they occurring at the
> > > same rate?
> >
> > not that it would have any statistical significance, but I had 5
> > failures (clusters) yesterday morning and 5 this morning ...
> >
>
> Before the patches were applied, how many failures were you seeing in
> the morning?

5 as well ... before and after ...

> > the failures often show up in groups; I saved one at
> > http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt
> >
> > > Also, what was the last kernel that worked for you with this
> > > configuration?
> >
> > that would be 2.6.24 ... I have not upgraded in quite some time.
> > But since the I/O performance of 2.6.31 is about double in my tests
> > I thought it would be a good thing to do ...
> >
>
> A performance difference that significant may explain differences in timing
> as well, i.e. the allocator is being put under more pressure now than it
> was previously as more processes make forward progress.

you are saying that the problem might be even older?

we do have 8GB ram and 16 GB swap, so it should not fail to allocate all that
often

top - 14:58:34 up 19:54, 6 users, load average: 2.09, 1.94, 1.97
Tasks: 451 total, 1 running, 449 sleeping, 0 stopped, 1 zombie
Cpu(s): 3.5%us, 15.5%sy, 2.0%ni, 72.2%id, 6.5%wa, 0.1%hi, 0.3%si, 0.0%st
Mem: 8198504k total, 7599132k used, 599372k free, 1212636k buffers
Swap: 16777208k total, 83568k used, 16693640k free, 610136k cached


cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

Mel Gorman
Posted: Tue Oct 20, 2009 10:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures (generic)

On Tue, Oct 20, 2009 at 02:58:53PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > On Tue, Oct 20, 2009 at 01:44:50PM +0200, Tobias Oetiker wrote:
> > > Hi Mel,
> > >
> > > Today Mel Gorman wrote:
> > >
> > > > On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:
> > >
> > > > > Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> > > > > Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> > > > > Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
> > > >
> > > > Is the number of failures at least reduced, or are they occurring at the
> > > > same rate?
> > >
> > > not that it would have any statistical significance, but I had 5
> > > failures (clusters) yesterday morning and 5 this morning ...
> > >
> >
> > Before the patches were applied, how many failures were you seeing in
> > the morning?
>
> 5 as well ... before and after ...
>
> > > the failures often show up in groups; I saved one at
> > > http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt
> > >
> > > > Also, what was the last kernel that worked for you with this
> > > > configuration?
> > >
> > > that would be 2.6.24 ... I have not upgraded in quite some time.
> > > But since the I/O performance of 2.6.31 is about double in my tests
> > > I thought it would be a good thing to do ...
> > >
> >
> > A performance difference that significant may explain differences in timing
> > as well, i.e. the allocator is being put under more pressure now than it
> > was previously as more processes make forward progress.
>
> you are saying that the problem might be even older?
>
> we do have 8GB ram and 16 GB swap, so it should not fail to allocate all that
> often
>
> top - 14:58:34 up 19:54, 6 users, load average: 2.09, 1.94, 1.97
> Tasks: 451 total, 1 running, 449 sleeping, 0 stopped, 1 zombie
> Cpu(s): 3.5%us, 15.5%sy, 2.0%ni, 72.2%id, 6.5%wa, 0.1%hi, 0.3%si, 0.0%st
> Mem: 8198504k total, 7599132k used, 599372k free, 1212636k buffers
> Swap: 16777208k total, 83568k used, 16693640k free, 610136k cached
>

High-order atomic allocations of the type you are trying at that frequency
were always a very long shot. The most likely explanation is that something
has changed so that a burst of allocations now triggers an allocation failure,
whereas before processes would delay long enough for the system not to notice.
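
For scale, here is a minimal sketch of what the "order:5, mode:0x4020" request in the trace asks for. It assumes 4 KiB pages and the 2.6.31-era bit values __GFP_WAIT = 0x10 and __GFP_HIGH = 0x20; the SKETCH_* names and the helper functions are made up for illustration and are not kernel API.

```c
#include <stdbool.h>

/* Assumption: 4 KiB base pages, as is usual on x86-64. */
#define SKETCH_PAGE_SIZE 4096UL

/* Assumption: bit values as in include/linux/gfp.h around 2.6.31. */
#define SKETCH_GFP_WAIT 0x10u /* caller may sleep and enter direct reclaim */
#define SKETCH_GFP_HIGH 0x20u /* caller may dip into emergency reserves */

/* Number of physically contiguous pages in an order-N request. */
unsigned long order_to_pages(unsigned int order)
{
	return 1UL << order;
}

/* Size in bytes of an order-N request. */
unsigned long order_to_bytes(unsigned int order)
{
	return order_to_pages(order) * SKETCH_PAGE_SIZE;
}

/* A request without the WAIT bit cannot sleep, so it cannot reclaim. */
bool is_atomic(unsigned int mode)
{
	return (mode & SKETCH_GFP_WAIT) == 0;
}

/* True if the request is allowed to use part of the reserves. */
bool is_high_priority(unsigned int mode)
{
	return (mode & SKETCH_GFP_HIGH) != 0;
}
```

Under these assumptions order 5 means 32 contiguous pages (128 KiB), and 0x4020 has the WAIT bit clear, so the caller has to find such a block already free or fail.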

1. Have MTU settings changed?
2. As order-5 allocations are required to succeed, I'm surprised in a
sense that there are only 5 failures because it implies the machine is
actually recovering and continuing on as normal. Can you think of what
happens in the morning that causes a burst of allocations to occur?
3. Other than the failures, have you noticed any other problems with the
machine or does it continue along happily?
4. Does the following patch help by any chance?

Thanks

==== CUT HERE ====
vmscan: Force kswapd to take notice faster when high-order watermarks are being hit

When a high-order allocation fails, kswapd is kicked so that it reclaims
at a higher order to avoid direct reclaimers stalling and to help GFP_ATOMIC
allocations. Something has changed in recent kernels that affects the timing,
so that high-order GFP_ATOMIC allocations are now failing with more frequency,
particularly under pressure. This patch forces kswapd to notice sooner that
high-order allocations are occurring by checking when watermarks are hit early
and by having kswapd restart quickly when the reclaim order is increased.

Not-signed-off-by-because-this-is-a-hatchet-job: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |   14 ++++++++++++--
 mm/vmscan.c     |    9 +++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2fd7b20..fdbf8c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1907,6 +1906,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 			zonelist, high_zoneidx, nodemask,
 			preferred_zone, migratetype);

+	/*
+	 * If after a high-order allocation we are now below watermarks,
+	 * pre-emptively kick kswapd rather than having the next allocation
+	 * fail and have to wake up kswapd, potentially failing GFP_ATOMIC
+	 * allocations or entering direct reclaim
+	 */
+	if (unlikely(order) && page && !zone_watermark_ok(preferred_zone, order,
+			preferred_zone->watermark[ALLOC_WMARK_LOW],
+			zone_idx(preferred_zone), ALLOC_WMARK_LOW))
+		wake_all_kswapd(order, zonelist, high_zoneidx);
+
 	return page;
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9219beb..0e66a6b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1925,6 +1925,15 @@ loop_again:
 					priority != DEF_PRIORITY)
 				continue;

+			/*
+			 * Exit quickly to restart if it has been indicated
+			 * that higher orders are required
+			 */
+			if (pgdat->kswapd_max_order > order) {
+				all_zones_ok = 1;
+				goto out;
+			}
+
 			if (!zone_watermark_ok(zone, order,
 					high_wmark_pages(zone), end_zone, 0))
 				all_zones_ok = 0;
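
The watermark check that the patch leans on can be modelled in user space. The following is a rough sketch, assuming the high-order watermark logic of kernels of this era: free pages held in blocks smaller than the request are discounted order by order while the required minimum is halved. It leaves out the lowmem reserve and the ALLOC_HIGH/ALLOC_HARDER adjustments, and struct sketch_zone and the sketch_* functions are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>

#define SKETCH_MAX_ORDER 11 /* assumption: default number of buddy orders */

/* Hypothetical, heavily simplified stand-in for the kernel's struct zone. */
struct sketch_zone {
	unsigned long nr_free[SKETCH_MAX_ORDER]; /* free blocks per order */
};

/* Total free pages across all orders. */
long sketch_free_pages(const struct sketch_zone *z)
{
	long total = 0;

	for (int o = 0; o < SKETCH_MAX_ORDER; o++)
		total += (long)(z->nr_free[o] << o);
	return total;
}

/* Simplified model of a high-order watermark check against 'mark'. */
bool sketch_watermark_ok(const struct sketch_zone *z, int order, long mark)
{
	long min = mark;
	long free_pages = sketch_free_pages(z) - (1L << order) + 1;

	if (free_pages <= min)
		return false;

	for (int o = 0; o < order; o++) {
		/* Blocks of this order are too small for the request. */
		free_pages -= (long)(z->nr_free[o] << o);
		/* Require fewer free pages at each higher order. */
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}
```

In this model a zone holding 100000 free pages purely as order-0 blocks passes the check for order 0 against a mark of 1024 but fails it for order 5, which is how a machine reporting plenty of free memory can still fail a 128 KiB atomic allocation.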

Tobias Oetiker
Posted: Tue Oct 20, 2009 11:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures (generic)

Hi Mel,

Today Mel Gorman wrote:

> On Tue, Oct 20, 2009 at 02:58:53PM +0200, Tobias Oetiker wrote:
> > you are saying that the problem might be even older?
> >
> > we do have 8GB ram and 16 GB swap, so it should not fail to allocate all that
> > often
> >
> > top - 14:58:34 up 19:54, 6 users, load average: 2.09, 1.94, 1.97
> > Tasks: 451 total, 1 running, 449 sleeping, 0 stopped, 1 zombie
> > Cpu(s): 3.5%us, 15.5%sy, 2.0%ni, 72.2%id, 6.5%wa, 0.1%hi, 0.3%si, 0.0%st
> > Mem: 8198504k total, 7599132k used, 599372k free, 1212636k buffers
> > Swap: 16777208k total, 83568k used, 16693640k free, 610136k cached
> >
>
> High-order atomic allocations of the type you are trying at that frequency
> were always a very long shot. The most likely explanation is that something
> has changed so that a burst of allocations now triggers an allocation failure,
> whereas before processes would delay long enough for the system not to notice.
>
> 1. Have MTU settings changed?

no, not to my knowledge

> 2. As order-5 allocations are required to succeed, I'm surprised in a
> sense that there are only 5 failures because it implies the machine is
> actually recovering and continuing on as normal. Can you think of what
> happens in the morning that causes a burst of allocations to occur?

the bursts occur all day while the machine is in use ... it's just
that I was writing this at noon so only the morning had passed. So
I compared things to the day before ...

> 3. Other than the failures, have you noticed any other problems with the
> machine or does it continue along happily?

The machine seems to be fine.

> 4. Does the following patch help by any chance?

should I try this on vanilla 2.6.31.4 or on top of your previous
patch?

we are running VirtualBox 3.0.8 on this machine; VirtualBox is using
the physical network interface in bridge mode to access the network.
Could this have something to do with the problem?

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

Mel Gorman
Posted: Tue Oct 20, 2009 11:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures (generic)

On Tue, Oct 20, 2009 at 03:50:12PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > On Tue, Oct 20, 2009 at 02:58:53PM +0200, Tobias Oetiker wrote:
> > > you are saying that the problem might be even older?
> > >
> > > we do have 8GB ram and 16 GB swap, so it should not fail to allocate all that
> > > often
> > >
> > > top - 14:58:34 up 19:54, 6 users, load average: 2.09, 1.94, 1.97
> > > Tasks: 451 total, 1 running, 449 sleeping, 0 stopped, 1 zombie
> > > Cpu(s): 3.5%us, 15.5%sy, 2.0%ni, 72.2%id, 6.5%wa, 0.1%hi, 0.3%si, 0.0%st
> > > Mem: 8198504k total, 7599132k used, 599372k free, 1212636k buffers
> > > Swap: 16777208k total, 83568k used, 16693640k free, 610136k cached
> > >
> >
> > High-order atomic allocations of the type you are trying at that frequency
> > were always a very long shot. The most likely explanation is that something
> > has changed so that a burst of allocations now triggers an allocation failure,
> > whereas before processes would delay long enough for the system not to notice.
> >
> > 1. Have MTU settings changed?
>
> no, not to my knowledge
>
> > 2. As order-5 allocations are required to succeed, I'm surprised in a
> > sense that there are only 5 failures because it implies the machine is
> > actually recovering and continuing on as normal. Can you think of what
> > happens in the morning that causes a burst of allocations to occur?
>
> the bursts occur all day while the machine is in use ... it's just
> that I was writing this at noon so only the morning had passed. So
> I compared things to the day before ...
>

Over the course of a day, how many would you see? By and large, it seems
that the problems you and Frans are seeing are similar, except that his is
a lot more severe.

> > 3. Other than the failures, have you noticed any other problems with the
> > machine or does it continue along happily?
>
> The machine seems to be fine.
>
> > 4. Does the following patch help by any chance?
>
> should I try this on vanilla 2.6.31.4 or on top of your previous
> patch?
>

Try on top of vanilla 2.6.31.4 first please and, if failures still occur,
then on top of the previous patch.

> we are running VirtualBox 3.0.8 on this machine; VirtualBox is using
> the physical network interface in bridge mode to access the network.
> Could this have something to do with the problem?
>

I do not know for sure. I'm assuming the configuration is the same on
both kernels so it's unlikely to be the issue.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Tobias Oetiker
Posted: Tue Oct 20, 2009 11:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures (generic)

Hi Mel,

Today Mel Gorman wrote:

>
> Over the course of a day, how many would you see? By and large, it seems
> that the problems you and Frans are seeing are similar, except that his is
> a lot more severe.

yesterday it was 19 for 24 hours, today it is 9 for 16 hours (day
is not done yet).
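
Since the two counts cover windows of different length, they are easier to compare as rates. A minimal sketch; failures_per_hour is a made-up helper and the inputs are the figures reported above:

```c
/* Failures per hour for a count observed over a window of the given length. */
double failures_per_hour(unsigned int failures, double hours)
{
	return failures / hours;
}
```

Calling it with (19, 24.0) and (9, 16.0) gives roughly 0.79 and 0.56 failures per hour; with counts this small the difference may well be noise.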

> Try on top of vanilla 2.6.31.4 first please and, if failures still occur,
> then on top of the previous patch.

ok

> > we are running VirtualBox 3.0.8 on this machine; VirtualBox is using
> > the physical network interface in bridge mode to access the network.
> > Could this have something to do with the problem?
> >
>
> I do not know for sure. I'm assuming the configuration is the same on
> both kernels so it's unlikely to be the issue.

just to be on the safe side I created a ticket with the VirtualBox
people ... http://www.virtualbox.org/ticket/5260

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

Karol Lewandowski
Posted: Wed Oct 21, 2009 5:10 pm    Post subject: [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100)

On Thu, Oct 01, 2009 at 09:56:04PM +0200, Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14265
> Subject : ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100
> Submitter : Karol Lewandowski <karol.k.lewandowski@gmail.com>
> Date : 2009-09-15 12:05 (17 days old)
> References : http://marc.info/?l=linux-kernel&m=125301636509517&w=4

Guys, could anyone check if the patch below helps? I think I've finally
found the culprit of all allocation failures (but I might be wrong
too... ;)

Thanks.


commit d6849591e042bceb66f1b4513a1df6740d2ad762
Author: Karol Lewandowski <karol.k.lewandowski@gmail.com>
Date: Wed Oct 21 21:01:20 2009 +0200

SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab()

Commit ba52270d18fb17ce2cf176b35419dab1e43fe4a3 unconditionally
cleared __GFP_NOFAIL flag on all allocations.

Preserve this flag on second attempt to allocate page (with possibly
decreased order).

This should help with bugs #14265, #14141 and similar.

Signed-off-by: Karol Lewandowski <karol.k.lewandowski@gmail.com>

diff --git a/mm/slub.c b/mm/slub.c
index b627675..ac5db65 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1084,7 +1084,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
 	struct page *page;
 	struct kmem_cache_order_objects oo = s->oo;
-	gfp_t alloc_gfp;
+	gfp_t alloc_gfp, nofail;

 	flags |= s->allocflags;

@@ -1092,6 +1092,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	 * Let the initial higher-order allocation fail under memory pressure
 	 * so we fall-back to the minimum order allocation.
 	 */
+	nofail = flags & __GFP_NOFAIL;
 	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;

 	page = alloc_slab_page(alloc_gfp, node, oo);
@@ -1100,8 +1101,10 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 		/*
 		 * Allocation may have failed due to fragmentation.
 		 * Try a lower order alloc if possible
+		 *
+		 * Preserve __GFP_NOFAIL flag if previous allocation failed.
 		 */
-		page = alloc_slab_page(flags, node, oo);
+		page = alloc_slab_page(flags | nofail, node, oo);
 		if (!page)
 			return NULL;


David Rientjes
Posted: Wed Oct 21, 2009 6:10 pm    Post subject: Re: [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100)

On Wed, 21 Oct 2009, Karol Lewandowski wrote:

> commit d6849591e042bceb66f1b4513a1df6740d2ad762
> Author: Karol Lewandowski <karol.k.lewandowski@gmail.com>
> Date: Wed Oct 21 21:01:20 2009 +0200
>
> SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab()
>
> Commit ba52270d18fb17ce2cf176b35419dab1e43fe4a3 unconditionally
> cleared __GFP_NOFAIL flag on all allocations.
>

No, it clears __GFP_NOFAIL from the first allocation of oo_order(s->oo).
If that fails (and it's easy to fail, it has __GFP_NORETRY), another
allocation is attempted with oo_order(s->min), for which __GFP_NOFAIL
would be preserved if that's the slab cache's allocflags.

> Preserve this flag on second attempt to allocate page (with possibly
> decreased order).
>
> This should help with bugs #14265, #14141 and similar.
>
> Signed-off-by: Karol Lewandowski <karol.k.lewandowski@gmail.com>
>
> diff --git a/mm/slub.c b/mm/slub.c
> index b627675..ac5db65 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1084,7 +1084,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  {
>  	struct page *page;
>  	struct kmem_cache_order_objects oo = s->oo;
> -	gfp_t alloc_gfp;
> +	gfp_t alloc_gfp, nofail;
>
>  	flags |= s->allocflags;
>
> @@ -1092,6 +1092,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	 * Let the initial higher-order allocation fail under memory pressure
>  	 * so we fall-back to the minimum order allocation.
>  	 */
> +	nofail = flags & __GFP_NOFAIL;
>  	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
>
>  	page = alloc_slab_page(alloc_gfp, node, oo);
> @@ -1100,8 +1101,10 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  		/*
>  		 * Allocation may have failed due to fragmentation.
>  		 * Try a lower order alloc if possible
> +		 *
> +		 * Preserve __GFP_NOFAIL flag if previous allocation failed.
>  		 */
> -		page = alloc_slab_page(flags, node, oo);
> +		page = alloc_slab_page(flags | nofail, node, oo);
>  		if (!page)
>  			return NULL;
>
>

This does nothing. You may have missed that the lower order allocation is
passing 'flags' (which is a union of the gfp flags passed to
allocate_slab() based on the allocation context and the cache's
allocflags), and not alloc_gfp where __GFP_NOFAIL is masked.

Nack.

Note: slub isn't going to be a culprit in order 5 allocation failures
since they have kmalloc passthrough to the page allocator.
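
The point about 'flags' can be checked with a few lines of standalone C. This is only a sketch: the SKETCH_* bit values are assumed for illustration (only their distinctness matters here), and the three helpers are hypothetical names mirroring the expressions in the diff, not functions in mm/slub.c.

```c
/* Assumption: illustrative bit values; only their distinctness matters. */
#define SKETCH_GFP_NOWARN  0x200u
#define SKETCH_GFP_NOFAIL  0x800u
#define SKETCH_GFP_NORETRY 0x1000u

/* Mask used for the first, opportunistic higher-order attempt. */
unsigned int first_attempt_flags(unsigned int flags)
{
	return (flags | SKETCH_GFP_NOWARN | SKETCH_GFP_NORETRY)
		& ~SKETCH_GFP_NOFAIL;
}

/* Mask passed to the lower-order fallback before the proposed patch. */
unsigned int fallback_flags_before(unsigned int flags)
{
	return flags;
}

/* Mask passed to the lower-order fallback with the proposed patch. */
unsigned int fallback_flags_after(unsigned int flags)
{
	unsigned int nofail = flags & SKETCH_GFP_NOFAIL;

	/* nofail is a subset of flags, so OR-ing it back in changes nothing. */
	return flags | nofail;
}
```

Because nofail is derived from flags itself, flags | nofail equals flags for every input, so the fallback allocation sees the same mask with or without the patch; only the first attempt ever has the NOFAIL bit masked out.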

Karol Lewandowski
Posted: Wed Oct 21, 2009 6:10 pm    Post subject: Re: [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100)

On Wed, Oct 21, 2009 at 02:06:41PM -0700, David Rientjes wrote:
> On Wed, 21 Oct 2009, Karol Lewandowski wrote:
>
> > commit d6849591e042bceb66f1b4513a1df6740d2ad762
> > Author: Karol Lewandowski <karol.k.lewandowski@gmail.com>
> > Date: Wed Oct 21 21:01:20 2009 +0200
> >
> > SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab()
> >
> > Commit ba52270d18fb17ce2cf176b35419dab1e43fe4a3 unconditionally
> > cleared __GFP_NOFAIL flag on all allocations.
> >
>
> No, it clears __GFP_NOFAIL from the first allocation of oo_order(s->oo).
> If that fails (and it's easy to fail, it has __GFP_NORETRY), another
> allocation is attempted with oo_order(s->min), for which __GFP_NOFAIL
> would be preserved if that's the slab cache's allocflags.

Right, the patch is junk.

However, I haven't been able to trigger failures since I've switched
to the SLAB allocator. That patch seemed related (and wrong), but it
wasn't.

> >  		 */
> > -		page = alloc_slab_page(flags, node, oo);
> > +		page = alloc_slab_page(flags | nofail, node, oo);
> >  		if (!page)
> >  			return NULL;
> >
> >
>
> This does nothing. You may have missed that the lower order allocation is
> passing 'flags' (which is a union of the gfp flags passed to
> allocate_slab() based on the allocation context and the cache's
> allocflags), and not alloc_gfp where __GFP_NOFAIL is masked.

Right, I missed that.

> Nack.
>
> Note: slub isn't going to be a culprit in order 5 allocation failures
> since they have kmalloc passthrough to the page allocator.

However, it might change fragmentation somewhat, I guess. This might
make the problem more or less visible.
--
Tobias Oetiker
External


Since: Sep 15, 2009
Posts: 12



Posted: Thu Oct 22, 2009 7:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures (generic)

Hi Mel,

Tuesday Mel Gorman wrote:
> 4. Does the following patch help by any chance?
>
> Thanks
>
> ==== CUT HERE ====
> vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
>
> When a high-order allocation fails, kswapd is kicked so that it reclaims
> at a higher order, to avoid stalling direct reclaimers and to help GFP_ATOMIC
> allocations. Something has changed in recent kernels that affects the timing,
> and high-order GFP_ATOMIC allocations are now failing more frequently,
> particularly under pressure. This patch forces kswapd to notice sooner that
> high-order allocations are occurring by checking when watermarks are hit early
> and by having kswapd restart quickly when the reclaim order is increased.
>
> Not-signed-off-by-because-this-is-a-hatchet-job: Mel Gorman <mel RemoveThis @csn.ul.ie>
> ---

it does seem to help ... I have been running it from 6am to 12am on
our server now and have not yet seen any issues ...

will shout if I do ...

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi RemoveThis @oetiker.ch ++41 62 775 9902 / sb: -9900
--
Mel Gorman
External


Since: May 19, 2006
Posts: 253



Posted: Thu Oct 22, 2009 7:10 am    Post subject: Re: [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5,ode:0x8020 w/ e100)

On Wed, Oct 21, 2009 at 11:20:34PM +0200, Karol Lewandowski wrote:
> On Wed, Oct 21, 2009 at 02:06:41PM -0700, David Rientjes wrote:
> > On Wed, 21 Oct 2009, Karol Lewandowski wrote:
> >
> > > commit d6849591e042bceb66f1b4513a1df6740d2ad762
> > > Author: Karol Lewandowski <karol.k.lewandowski.RemoveThis@gmail.com>
> > > Date: Wed Oct 21 21:01:20 2009 +0200
> > >
> > > SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab()
> > >
> > > Commit ba52270d18fb17ce2cf176b35419dab1e43fe4a3 unconditionally
> > > cleared __GFP_NOFAIL flag on all allocations.
> > >
> >
> > No, it clears __GFP_NOFAIL from the first allocation of oo_order(s->oo).
> > If that fails (and it's easy to fail, it has __GFP_NORETRY), another
> > allocation is attempted with oo_order(s->min), for which __GFP_NOFAIL
> > would be preserved if that's the slab cache's allocflags.
>
> Right, patch is junk.
>
> However, I haven't been able to trigger failures since I've switched
> to SLAB allocator. That patch seemed related (and wrong), but it
> wasn't.
>

Interesting. Pekka, I looked for SLUB commits in the 2.6.30..2.6.31
range for patches that might affect what order of pages SLUB allocates
but didn't spot anything obvious. Can you think of any changes that
might have altered how SLUB uses memory?

> > > */
> > > - page = alloc_slab_page(flags, node, oo);
> > > + page = alloc_slab_page(flags | nofail, node, oo);
> > > if (!page)
> > > return NULL;
> > >
> > >
> >
> > This does nothing. You may have missed that the lower order allocation is
> > passing 'flags' (which is a union of the gfp flags passed to
> > allocate_slab() based on the allocation context and the cache's
> > allocflags), and not alloc_gfp where __GFP_NOFAIL is masked.
>
> Right, I missed that.
>
> > Nack.
> >
> > Note: slub isn't going to be a culprit in order 5 allocation failures
> > since they have kmalloc passthrough to the page allocator.
>
> However, it might change fragmentation somewhat I guess. This might
> make problem more/less visible.
>

Did you have CONFIG_KMEMCHECK set by any chance?

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
Karol Lewandowski
External


Since: Oct 02, 2009
Posts: 8



Posted: Thu Oct 22, 2009 6:10 pm    Post subject: Re: [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5,ode:0x8020 w/ e100)

On Thu, Oct 22, 2009 at 11:20:14AM +0100, Mel Gorman wrote:
> On Wed, Oct 21, 2009 at 11:20:34PM +0200, Karol Lewandowski wrote:
> > > Note: slub isn't going to be a culprit in order 5 allocation failures
> > > since they have kmalloc passthrough to the page allocator.
> >
> > However, it might change fragmentation somewhat I guess. This might
> > make problem more/less visible.
> >
>
> Did you have CONFIG_KMEMCHECK set by any chance?

No, kmemcheck (and kmemleak) were always disabled.

It's likely possible to trigger allocation failures with SLAB too;
I just haven't been successful at it. The lack of a good test case is the real
problem here: even if I can't trigger failures, I can never be sure
that they won't appear at some strange moment.

BTW I'll test your patches (from another thread) shortly.

Thanks.
--
Frans Pop
External


Since: May 04, 2006
Posts: 460



Posted: Sun Oct 25, 2009 4:10 pm    Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

Sorry for the delayed reply.

On Monday 19 October 2009, Chris Mason wrote:
> On Mon, Oct 19, 2009 at 03:01:52PM +0100, Mel Gorman wrote:
> > > During the 2nd phase I see the first SKB allocation errors with a
> > > music skip between reading commits 95.000 and 110.000.
> > > About commit 115.000 there is a very long pause during which the
> > > counter does not increase, music stops and the desktop freezes
> > > completely. The first 30 seconds of that freeze there is only very
> > > low disk activity (which seems strange);
> >
> > I'm just going to have to depend on Jens here. Jens, the
> > congestion_wait() is on BLK_RW_ASYNC after the commit. Reclaim usually
> > writes pages asynchronously but lumpy reclaim actually waits of pages
> > to write out synchronously so it's not always async.
>
> Waiting doesn't make it synchronous from the elevator point of view ;)
> If you're using WB_SYNC_NONE, it's an async write.  WB_SYNC_ALL makes it
> a sync write.  I only see WB_SYNC_NONE in vmscan.c, so we should be
> using the async congestion wait.  (the exception is xfs which always
> does async writes).
>
> But I'm honestly not 100% sure.  Looking back through the emails, the
> test case is doing IO on top of a whole lot of things on top of
> dm-crypt?  I just tried to figure out if dm-crypt is turning the async
> IO into sync IOs, but didn't quite make sense of it.
>
> Could you also please include which filesystems were being abused during
> the test and how?  Reading through the emails, I think you've got:
>
> gitk being run 3 times on some FS (NFS?)

gitk is run on an ext3 logical volume in a volume group that's on a LUKS
encrypted partition of the local hard disk.

So it's: SATA harddisk -> dm-crypt (dmsetup) -> LVM (lvm2) -> ext3

> streaming reads on NFS

Correct. My music share is a remote (nfs4) read-only mounted ext3
partition.

> swap on dm-crypt

Correct. Swap is another logical volume in the same volume group as
mentioned above.

So kcrypt gets to (de)encrypt both the gitk data *and* any swapping caused
by that [1].

> If other filesystems are being used, please correct me.  Also please
> include if they are on crypto or straight block device.

All my file systems are ext3. Nothing newfangled or exotic ;)
There are some bind mounts involved, but I expect that's transparent.

Cheers,
FJP

[1] I've plans to move some of my data outside the encrypted volume, but
currently everything except /boot is in the encrypted VG.
--
Frans Pop
External


Since: May 04, 2006
Posts: 460



Posted: Tue Oct 27, 2009 10:10 am    Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

Sorry for the delay in replying.

On Saturday 17 October 2009, reinette chatre wrote:
> Prompted by this thread we are in process of moving allocation to paged
> skb. This will definitely reduce the allocation size (from order 2 to
> order 1) and hopefully help with this problem also. Could you please try
> with the attached two patches? They are based on 2.6.32-rc4.

Looks very good! With these patches I no longer get any SKB allocation
errors, even during the heaviest freezes while gitk is loading. I do still
get (long) music skips during the freezes, but that's not unexpected.
AFAICT the wireless connection is stable.
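
For context on the sizes involved: an order-n allocation is a block of 2^n physically contiguous pages, so lowering the order halves the contiguous space the allocator has to find. A quick sketch, assuming 4 KiB pages (an assumption here, the thread does not state the page size):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of an order-`order` allocation (2**order contiguous pages)."""
    return page_size << order


# Orders mentioned in this thread: 1 and 2 (iwlagn SKBs), 5 (e100 report).
for order in (1, 2, 5):
    pages = 1 << order
    print(f"order {order}: {pages} pages, {order_to_bytes(order) // 1024} KiB")
```

Under that assumption, the move from order 2 to order 1 reduces each receive buffer allocation from 16 KiB to 8 KiB of contiguous memory, while an order-5 request needs 128 KiB.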

Tested on top of current mainline git: v2.6.32-rc5-81-g964fe08.

Please add, if you feel it's appropriate, my:
Reported-and-tested-by: Frans Pop <elendil.RemoveThis@planet.nl>

Cheers,
FJP
--
Mel Gorman
External


Since: May 19, 2006
Posts: 253



Posted: Tue Oct 27, 2009 1:10 pm    Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Tue, Oct 27, 2009 at 02:54:35PM +0000, Mel Gorman wrote:
> On Mon, Oct 26, 2009 at 10:06:09PM +0100, Frans Pop wrote:
> > On Tuesday 20 October 2009, Mel Gorman wrote:
> > > I've attached a patch below that should allow us to cheat. When it's
> > > applied, it outputs who called congestion_wait(), how long the timeout
> > > was and how long it waited for. By comparing before and after sleep
> > > times, we should be able to see which of the callers has significantly
> > > changed and if it's something easily addressable.
> >
> > The results from this look fairly interesting (although I may be a bad
> > judge as I don't really know what I'm looking at ;-).
> >
> > I've tested with two kernels:
> > 1) 2.6.31.1: 1 test run
> > 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs
> >
> > The 1st kernel had the expected "freeze" while reading commits in gitk;
> > reading commits with the 2nd kernel was more fluent.
> > I did 2 runs with the 2nd kernel as the first run had a fairly long music
> > skip and more SKB errors than expected. The second run was fairly normal
> > with no music skips at all even though it had a few SKB errors.
> >
> > Data for the tests:
> >                           1st kernel   2nd kernel 1   2nd kernel 2
> > end reading commits       1:15         1:00           0:55
> > "freeze"                  yes          no             no
> > branch data shown         1:55         1:15           1:10
> > system quiet              2:25         1:50           1:45
> > # SKB allocation errors   10           53             5
> >
> > Note that the test is substantially faster with the 2nd kernel and that the
> > SKB errors don't really affect the duration of the test.
> >
>
> Ok. I think that despite expectations, the writeback changes have
> changed the timing significantly enough to be worth examining closer.
>
> >
> > - without the revert 'background_writeout' is called a lot less frequently,
> > but when it's called it gets long delays
> > - without the revert you have 'wb_kupdate', which is relatively expensive
> > - with the revert 'shrink_list' is relatively expensive, although not
> > really in absolute terms
> >
>
> Let's look at the callers that waited in congestion_wait() for at least
> 25 jiffies.
>
> 2.6.31.1-async-sync-congestion-wait i.e. vanilla kernel
> generated with: cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
> 24 background_writeout congestion_wait sync=0 delay 25 timeout 25
> 203 kswapd congestion_wait sync=0 delay 25 timeout 25
> 5 shrink_list congestion_wait sync=0 delay 25 timeout 25
> 155 try_to_free_pages congestion_wait sync=0 delay 25 timeout 25
> 145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25
> 2 kswapd congestion_wait sync=0 delay 26 timeout 25
> 8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25
> 1 try_to_free_pages congestion_wait sync=0 delay 54 timeout 25
>
> 2.6.31.1-write-congestion-wait i.e. kernel with patch reverted
> generated with: cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
> 2 background_writeout congestion_wait rw=1 delay 25 timeout 25
> 188 kswapd congestion_wait rw=1 delay 25 timeout 25
> 14 shrink_list congestion_wait rw=1 delay 25 timeout 25
> 181 try_to_free_pages congestion_wait rw=1 delay 25 timeout 25
> 5 kswapd congestion_wait rw=1 delay 26 timeout 25
> 10 try_to_free_pages congestion_wait rw=1 delay 26 timeout 25
> 3 try_to_free_pages congestion_wait rw=1 delay 27 timeout 25
> 1 kswapd congestion_wait rw=1 delay 29 timeout 25
> 1 __alloc_pages_nodemask congestion_wait rw=1 delay 30 timeout 5
> 1 try_to_free_pages congestion_wait rw=1 delay 31 timeout 25
> 1 try_to_free_pages congestion_wait rw=1 delay 35 timeout 25
> 1 kswapd congestion_wait rw=1 delay 51 timeout 25
> 1 try_to_free_pages congestion_wait rw=1 delay 56 timeout 25
>
> So, wb_kupdate and background_writeout are the big movers in terms of waiting,
> not the direct reclaimers as we were expecting. Of those big
> movers, wb_kupdate is the most interesting. Compare the following:
>

Bah, this part is right, but I got the next section the wrong way
around. I should have renamed the damn things instead of trying to remember
which was 1 and which was 2.

1 == vanilla
2 == with-revert
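
To avoid mixing up the two logs, the awk/sort/uniq pipeline quoted in this mail can also be written as a small script that tallies the same fields per log file. This is a minimal sketch: the exact log line layout is an assumption inferred from the pipeline output (a `[timestamp]` prefix followed by `caller congestion_wait ... delay N timeout M`), and `tally_congestion_waits` is a hypothetical helper name.

```python
from collections import Counter


def tally_congestion_waits(lines, min_delay=0):
    """Count (caller, delay, timeout) triples, like `awk -F ] '{print $2}' | sort | uniq -c`."""
    counts = Counter()
    for line in lines:
        if "congestion_wait" not in line or "]" not in line:
            continue
        # Same field awk selects with -F ] and $2: the text after the first "]".
        fields = line.split("]")[1].split()
        try:
            caller = fields[0]
            delay = int(fields[fields.index("delay") + 1])
            timeout = int(fields[fields.index("timeout") + 1])
        except (ValueError, IndexError):
            continue  # not in the assumed format, skip it
        if delay >= min_delay:
            counts[(caller, delay, timeout)] += 1
    return counts


# Made-up sample lines in the assumed format, not data from the thread.
sample = [
    "[ 100.000001] wb_kupdate congestion_wait sync=0 delay 25 timeout 25",
    "[ 100.100001] wb_kupdate congestion_wait sync=0 delay 25 timeout 25",
    "[ 100.200001] kswapd congestion_wait sync=0 delay 26 timeout 25",
    "[ 100.300001] kswapd congestion_wait sync=0 delay 3 timeout 25",
]
counts = tally_congestion_waits(sample, min_delay=25)
for (caller, delay, timeout), n in sorted(counts.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(n, caller, "delay", delay, "timeout", timeout)
```

Running it once per log and labelling the results "vanilla" and "with-revert" sidesteps having to remember which numbered file is which.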

> $ cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
> [ no output ]
> $ cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
> 1 wb_kupdate congestion_wait sync=0 delay 15 timeout 25
> 1 wb_kupdate congestion_wait sync=0 delay 23 timeout 25
> 145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25
> 8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25
>
> The vanilla kernel is not waiting in wb_kupdate at all.
>

The vanilla kernel *is* waiting. The reverted kernel is not. If my patch
makes any difference, it's not for the right reasons.

> Jens, before the congestion_wait() changes, wb_kupdate was waiting on
> congestion and afterwards it's not. Furthermore, look at the number of pages
> that are queued for writeback in the two page allocation failure reports.
>
> without-revert: writeback:65653
> with-revert: writeback:21713
>

and got it back right again.

kernel 1 == vanilla kernel == without-revert writeback:65653
kernel 2 == revert kernel == with-revert writeback:21713
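
As a quick check on the "more than three times" figure, the two writeback counters can be compared directly. The page counts are the ones from the failure reports above; the conversion to MiB assumes 4 KiB pages, which the thread does not state.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

writeback_pages = {
    "without-revert (vanilla)": 65653,
    "with-revert": 21713,
}


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


ratio = writeback_pages["without-revert (vanilla)"] / writeback_pages["with-revert"]
for name, pages in writeback_pages.items():
    print(f"{name}: {pages} pages, {pages_to_mib(pages):.1f} MiB")
print(f"ratio: {ratio:.2f}")
```

The ratio comes out at just over 3, or roughly 256 MiB versus 85 MiB queued for writeback under the 4 KiB assumption.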

> So, after the move to async/sync, a lot more pages are getting queued
> for writeback - more than three times the number of pages are queued for
> writeback with the vanilla kernel. This amount of congestion might be why
> direct reclaimers and kswapd's timings have changed so much.
>

Or more accurately, the vanilla kernel has queued up a lot more pages for
IO than when the patch is reverted. I'm not seeing yet why this is.

> Chris Mason hinted at this but I didn't quite "get it" at the time but is it
> possible that writeback_inodes() is converting what is expected to be async
> IO into sync IO? One way of checking this is if Frans could test the patch
> below that makes wb_kupdate wait on sync instead of async.
>

This reasoning is rubbish. If the patch makes any difference, it's because
it changes timing. It's probably more important to figure out a) whether the
different number of pages queued for writeback is relevant and, if so, b) why
it has changed.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
Chris Mason
External


Since: Sep 18, 2006
Posts: 87



Posted: Tue Oct 27, 2009 1:10 pm    Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Tue, Oct 27, 2009 at 03:52:24PM +0000, Mel Gorman wrote:
>
> > So, after the move to async/sync, a lot more pages are getting queued
> > for writeback - more than three times the number of pages are queued for
> > writeback with the vanilla kernel. This amount of congestion might be why
> > direct reclaimers and kswapd's timings have changed so much.
> >
>
> Or more accurately, the vanilla kernel has queued up a lot more pages for
> IO than when the patch is reverted. I'm not seeing yet why this is.

[ sympathies over confusion about congestion...lots of variables here ]

If wb_kupdate has been able to queue more writes it is because the
congestion logic isn't stopping it. We have congestion_wait(), but
before calling that in the writeback paths it says: are you congested?
and then backs off if the answer is yes.

Ideally, direct reclaim will never do writeback. We want it to be able
to find clean pages that kupdate and friends have already processed.

Waiting for congestion is a funny thing, it only tells us the device has
managed to finish some IO or that a timeout has passed. Neither event has
any relation to figuring out if the IO for reclaimable pages has
finished.

One option is to have the VM remember the hashed waitqueue for one of
the pages it direct reclaims and then wait on it.
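
The point about congestion waits can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: times are abstract ticks, `completions` is a hypothetical list of (completion time, page) pairs, and both helper functions are invented for illustration. The waiter wakes on the first IO completion of any kind or on timeout, and neither says anything about the page it actually wants.

```python
def congestion_wait(completions, now, timeout):
    """Wake-up time: the first IO completion after `now`, capped at now + timeout."""
    deadline = now + timeout
    upcoming = [t for t, _page in completions if now < t <= deadline]
    return min(upcoming, default=deadline)


def page_io_done(completions, page, at_time):
    """True if the IO for `page` has completed by `at_time`."""
    return any(p == page and t <= at_time for t, p in completions)


# Hypothetical timeline: an unrelated page finishes early, ours finishes late.
completions = [(3, "unrelated page"), (40, "page we want to reclaim")]

woke_at = congestion_wait(completions, now=0, timeout=25)
print(woke_at, page_io_done(completions, "page we want to reclaim", woke_at))
```

Waiting on the page's own waitqueue, as suggested above, would correspond to sleeping until tick 40 in this model rather than waking at tick 3 with nothing reclaimable.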

-chris

--
reinette chatre
External


Since: Mar 27, 2009
Posts: 10



Posted: Tue Oct 27, 2009 1:10 pm    Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

Hi Frans,

On Tue, 2009-10-27 at 04:10 -0700, Frans Pop wrote:
> Sorry for the delay in replying.
>
> On Saturday 17 October 2009, reinette chatre wrote:
> > Prompted by this thread we are in process of moving allocation to paged
> > skb. This will definitely reduce the allocation size (from order 2 to
> > order 1) and hopefully help with this problem also. Could you please try
> > with the attached two patches? They are based on 2.6.32-rc4.
>
> Looks very good! With these patches I no longer get any SKB allocation
> errors, even during the heaviest freezes while gitk is loading. I do still
> get (long) music skips during the freezes, but that's not unexpected.
> AFAICT the wireless connection is stable.
>
> Tested on top of current mainline git: v2.6.32-rc5-81-g964fe08.
>
> Please add, if you feel it's appropriate, my:
> Reported-and-tested-by: Frans Pop <elendil RemoveThis @planet.nl>

Thank you very much for testing these patches so thoroughly. They are
both on their way upstream already so I am not able to add your
signature at this time. Since these are pretty big changes these patches
will be in 2.6.33.

Reinette


--
Frans Pop
External


Since: May 04, 2006
Posts: 460



Posted: Tue Oct 27, 2009 2:10 pm    Post subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Tuesday 27 October 2009, Chris Mason wrote:
> On Tue, Oct 27, 2009 at 03:52:24PM +0000, Mel Gorman wrote:
> > > So, after the move to async/sync, a lot more pages are getting
> > > queued for writeback - more than three times the number of pages are
> > > queued for writeback with the vanilla kernel. This amount of
> > > congestion might be why direct reclaimers and kswapd's timings have
> > > changed so much.
> >
> > Or more accurately, the vanilla kernel has queued up a lot more pages
> > for IO than when the patch is reverted. I'm not seeing yet why this
> > is.
>
> [ sympathies over confusion about congestion...lots of variables here ]
>
> If wb_kupdate has been able to queue more writes it is because the
> congestion logic isn't stopping it. We have congestion_wait(), but
> before calling that in the writeback paths it says: are you congested?
> and then backs off if the answer is yes.
>
> Ideally, direct reclaim will never do writeback. We want it to be able
> to find clean pages that kupdate and friends have already processed.
>
> Waiting for congestion is a funny thing, it only tells us the device has
> managed to finish some IO or that a timeout has passed. Neither event
> has any relation to figuring out if the IO for reclaimable pages has
> finished.
>
> One option is to have the VM remember the hashed waitqueue for one of
> the pages it direct reclaims and then wait on it.

What people should be aware of is the behavior of the system I see at this
point. I've already mentioned this in other mails, but it's probably good
to repeat it here.

While gitk is reading commits with vanilla .31 and .32 kernels there is at
some point a fairly long period (10-20 seconds) where I see:
- a completely frozen desktop, including frozen mouse cursor
- really very little disk activity (HD led flashes very briefly less than
once per second)
- reading commits stops completely during this period
- no music.
After that there is a period (another 5-15 seconds) with a huge amount of
disk activity during which the system gradually becomes responsive again
and in gitk the count of commits that have been read starts increasing
again (without a jump in the counter which confirms that no commits were
read during the freeze).

I cannot really tell what the system is doing during those freezes. Because
of the frozen desktop I cannot for example see CPU usage. I suspect that,
as there is hardly any disk activity, the system must be reorganizing RAM
or something. But it seems quite bad that that gets "bunched up" instead
of happening more gradually.

With the congestion_wait() change reverted I never see these freezes, only
much more normal minor latencies (< 2 seconds; mostly < 0.5 seconds),
which are probably unavoidable during heavy swapping.

Hth,
FJP
--
All times are: Eastern Time (US & Canada)