Martin Gregorie
Posted: Sun Jun 10, 2007 1:09 pm
Post subject: What's an oom-killer when it's at home?
Archived from groups: uk>comp>os>linux

Yesterday, after rebooting FC6 under the 2.6.20-1.2952.fc6 kernel I
started to see slowdowns to the point of total unresponsiveness with
page rates going sky-high - all apparently associated with a kernel
internal feature called oom-killer. It's apparently been part of the
kernel for a while, but I haven't noticed it bursting into life before,
so what's changed with this kernel version to make it so much more
noticeable?
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |

Gordon Henderson
Posted: Sun Jun 10, 2007 1:22 pm
Post subject: Re: What's an oom-killer when it's at home?

In article <m96tj4-90o.ln1 DeleteThis @zoogz.gregorie.org>,
Martin Gregorie <martin DeleteThis @see.sig.for.address> wrote:
>Yesterday, after rebooting FC6 under the 2.6.20-1.2952.fc6 kernel I
>started to see slowdowns to the point of total unresponsiveness with
>page rates going sky-high - all apparently associated with a kernel
>internal feature called oom-killer. It's apparently been part of the
>kernel for a while, but I haven't noticed it bursting into life before,
>so what's changed with this kernel version to make it so much more
>noticeable?

Out Of Memory killer.

The kernel is struggling, no more memory and swap is full, so it will pick
what it feels is the best process to kill off and kill it, to free up
some memory.

This is vaguely configurable - somewhere.

If you get it, you either have some really rogue processes or you really
do need to increase memory and/or swap ...

Run 'top' then push capital-M to sort by memory usage.

Gordon

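Gordon's suggestion of sorting by memory in 'top' can also be scripted. The following is only a rough sketch, assuming a Linux /proc filesystem whose /proc/<pid>/status files carry "Name:" and "VmRSS:" lines; the function names are made up for illustration and are not part of any tool mentioned in the thread.

```python
import os
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or 0 if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text, or '?' if absent."""
    match = re.search(r"^Name:[ \t]+(.*)$", status_text, re.MULTILINE)
    return match.group(1) if match else "?"


def top_by_rss(limit=10, proc_root="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest resident set first."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style system; nothing to report
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        rows.append((parse_vmrss_kb(text), int(entry), parse_name(text)))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for rss_kb, pid, name in top_by_rss():
        print(f"{rss_kb:>10} kB  {pid:>6}  {name}")
```

Kernel threads have no VmRSS line, so they show up with 0 kB and sink to the bottom of the list.
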
Andy Burns
Posted: Sun Jun 10, 2007 4:08 pm
Post subject: Re: What's an oom-killer when it's at home?

On 10/06/2007 14:22, Gordon Henderson wrote:
> Out Of Memory killer.
>
> The kernel is struggling, no more memory and swap is full, so it will pick
> what it feels is the best process to kill off

Where "best" usually means the one that is guaranteed to get your
attention pretty quickly, because it would be the *last* process you
would choose to have killed.

Anyway, the kernel is between a rock and a hard place when it decides to
deploy the OOMK; the other choice would be to abend.

Martin Gregorie
Posted: Sun Jun 10, 2007 9:20 pm
Post subject: Re: What's an oom-killer when it's at home?

Andy Burns wrote:
> On 10/06/2007 14:22, Gordon Henderson wrote:
>
>> Out Of Memory killer.
>>
>> The kernel is struggling, no more memory and swap is full, so it will pick
>> what it feels is the best process to kill off
>
> Where "best" usually means the one that is guaranteed to get your
> attention pretty quickly because it would be the *last* process you
> would choose to have killed
>
> Anyway the kernel is between a rock and a hard place when it decides to
> deploy the OOMK, the other choice would be to abend.
>

Thanks for the information, guys.

The mystery for me is why oom-killer has suddenly burst into life: my
workload hasn't changed appreciably. In this case I was running my usual
mix of services and BOINC was running Malaria Control niced in the
background. Doing an rsync backup seems to have triggered the problem.

Fedora docs say that oom-killer should pick on big niced programs first,
which would make Malaria Control its prime target. However, that
continued to run according to top and, infuriatingly, the syslog only
shows what process triggered oom-killer but not what was killed off.

Is it possible that oom-killer and BOINC were fighting over whether
Malaria Control was meant to be running?

If so that's bad news. I have a Java program under development that
co-existed happily with everything else last time I test ran it. The
usual workload includes named, which it interrogates for MX information.
The program is capable of oversubscribing memory all by itself: it has
run to completion after expanding its heap to occupy up to 400 MB of
virtual memory, even though my box has only 256 MB RAM and 1 GB of swap
space.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |

Nix
Posted: Mon Jun 11, 2007 8:54 pm
Post subject: Re: What's an oom-killer when it's at home?

On 10 Jun 2007, Martin Gregorie verbalised:
> infuriatingly the syslog only shows what process triggered oom-killer
> but not what was killed off.
If it actually kills anything it will tell you, e.g.
May 3 00:31:27 loki: kernel: postgres invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
May 3 00:31:27 loki: kernel: [<c012b051>] out_of_memory+0x69/0x15c
May 3 00:31:27 loki: kernel: [<c012c217>] __alloc_pages+0x1fc/0x286
May 3 00:31:27 loki: kernel: [<c012d508>] __do_page_cache_readahead+0x79/0x181
May 3 00:31:27 loki: kernel: [<c02eb0a0>] io_schedule+0xe/0x16
May 3 00:31:27 loki: kernel: [<c0128626>] sync_page+0x0/0x3b
May 3 00:31:27 loki: kernel: [<c02eb29b>] __wait_on_bit_lock+0x4a/0x51
May 3 00:31:27 loki: kernel: [<c028ab32>] dm_table_any_congested+0x32/0x48
May 3 00:31:27 loki: kernel: [<c02896c7>] dm_any_congested+0x2f/0x35
May 3 00:31:27 loki: kernel: [<c012a5f3>] filemap_nopage+0x122/0x2af
May 3 00:31:27 loki: kernel: [<c0131cf3>] __handle_mm_fault+0x123/0x699
May 3 00:31:27 loki: kernel: [<c0132ca7>] free_pgtables+0x90/0xa0
May 3 00:31:27 loki: kernel: [<c02ecfcc>] do_page_fault+0x213/0x525
May 3 00:31:27 loki: kernel: [<c01261d1>] handle_IRQ_event+0x1a/0x3f
May 3 00:31:27 loki: kernel: [<c02ecdb9>] do_page_fault+0x0/0x525
May 3 00:31:27 loki: kernel: [<c02ebe1c>] error_code+0x74/0x7c
May 3 00:31:27 loki: kernel: =======================
May 3 00:31:27 loki: kernel: Mem-info:
May 3 00:31:27 loki: kernel: DMA per-cpu:
May 3 00:31:27 loki: kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
May 3 00:31:27 loki: kernel: Normal per-cpu:
May 3 00:31:27 loki: kernel: CPU 0: Hot: hi: 90, btch: 15 usd: 3 Cold: hi: 30, btch: 7 usd: 28
May 3 00:31:27 loki: kernel: Active:35840 inactive:35895 dirty:0 writeback:0 unstable:0 free:904 slab:3438 mapped:5 pagetables:1221
May 3 00:31:27 loki: kernel: DMA free:1316kB min:112kB low:140kB high:168kB active:5644kB inactive:5624kB present:16256kB pages_scanned:19291 all_unreclaimable? yes
May 3 00:31:27 loki: kernel: lowmem_reserve[]: 0 301
May 3 00:31:27 loki: kernel: Normal free:2300kB min:2164kB low:2704kB high:3244kB active:137716kB inactive:137956kB present:308788kB pages_scanned:500098 all_unreclaimable? yes
May 3 00:31:27 loki: kernel: lowmem_reserve[]: 0 0
May 3 00:31:27 loki: kernel: DMA: 1*4kB 0*8kB 2*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1316kB
May 3 00:31:27 loki: kernel: Normal: 31*4kB 8*8kB 6*16kB 3*32kB 4*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2300kB
May 3 00:31:27 loki: kernel: Swap cache: add 2586920, delete 2586917, find 3679386/3873861, race 0+0
May 3 00:31:27 loki: kernel: Free swap = 0kB
May 3 00:31:27 loki: kernel: Total swap = 1851368kB
May 3 00:31:27 loki: kernel: Free swap: 0kB
May 3 00:31:27 loki: kernel: 81900 pages of RAM
May 3 00:31:27 loki: kernel: 0 pages of HIGHMEM
May 3 00:31:27 loki: kernel: 1607 reserved pages
May 3 00:31:27 loki: kernel: 2727 pages shared
May 3 00:31:27 loki: kernel: 3 pages swap cached
May 3 00:31:27 loki: kernel: 0 pages dirty
May 3 00:31:27 loki: kernel: 0 pages writeback
May 3 00:31:27 loki: kernel: 5 pages mapped
May 3 00:31:27 loki: kernel: 3438 pages slab
May 3 00:31:27 loki: kernel: 1221 pages pagetables
May 3 00:31:27 loki: kernel: Out of memory: kill process 27930 (postgres) score 3313 or a child
May 3 00:31:27 loki: kernel: Killed process 27939 (postgres)

(This was caused by a memory leak in libblkid from e2fsprogs <1.40, which
was tripped by rpc.mountd 1.0.12 and led to rpc.mountd bloating to
multiple gigabytes in a day or so. Of course, as a root-owned daemon it
got a big anti-OOM bonus, so lots of other things got slaughtered
first...)
--
`... in the sense that dragons logically follow evolution so they would
be able to wield metal.' --- Kenneth Eng's colourless green ideas sleep
furiously

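The log excerpt in Nix's post shows the two lines that matter when scanning a syslog: the "invoked oom-killer" line and the "Killed process" line. As a hedged sketch, the snippet below pulls both out of a list of log lines. The regular expressions are inferred only from the excerpt above; the message wording differs between kernel versions, so treat the patterns as assumptions to adjust, and the function name is purely illustrative.

```python
import re

# Patterns inferred from the 2.6.20-era excerpt in this thread (an assumption;
# other kernels word these messages differently).
INVOKED = re.compile(r"kernel: (\S+) invoked oom-killer")
KILLED = re.compile(r"kernel: Killed process (\d+) \((.+)\)")


def summarize_oom_events(lines):
    """Return (invokers, kills): process names that triggered the oom-killer,
    and (pid, name) pairs for processes that were actually killed."""
    invokers = []
    kills = []
    for line in lines:
        match = INVOKED.search(line)
        if match:
            invokers.append(match.group(1))
            continue
        match = KILLED.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return invokers, kills


if __name__ == "__main__":
    sample = [
        "May 3 00:31:27 loki: kernel: postgres invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0",
        "May 3 00:31:27 loki: kernel: Out of memory: kill process 27930 (postgres) score 3313 or a child",
        "May 3 00:31:27 loki: kernel: Killed process 27939 (postgres)",
    ]
    invokers, kills = summarize_oom_events(sample)
    print("invoked by:", invokers)
    print("killed:", kills)
```

An invocation with no matching kill is the case Martin describes, where the oom-killer ran but exited without killing anything.
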
Big and Blue
Posted: Mon Jun 11, 2007 9:23 pm
Post subject: Re: What's an oom-killer when it's at home?

Martin Gregorie wrote:
>
> If so that's bad news. I have a Java program under development that
> co-existed happily with everything else last time I test ran it.
If it's under development then presumably it's changing and/or parts of
it may be previously unused.
And since it's Java it may well eat memory for breakfast.
--
Just because I've written it doesn't mean that
either you or I have to believe it.

Martin Gregorie
Posted: Mon Jun 11, 2007 9:29 pm
Post subject: Re: What's an oom-killer when it's at home?

Nix wrote:
> If it actually kills anything it will tell you, e.g.
>
Thanks Nix. I've just rescanned the syslog and found the 'killed'
messages. I missed them the first time because, most times it ran, it
exited without killing anything.

Of the few times it did kill stuff it mostly got unimportant processes
(both BOINC projects and nscd), BUT it did get postmaster once and named
once and I'd rather it left those in peace. Postmaster currently runs as
postgres and named runs in a chroot jail as named.

Is there any documentation on it and/or config files I can use to tell
it to lay off specific non-root daemons?
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |

Nix
Posted: Mon Jun 11, 2007 11:47 pm
Post subject: Re: What's an oom-killer when it's at home?

On 11 Jun 2007, Martin Gregorie spake thusly:
> Nix wrote:
>> If it actually kills anything it will tell you, e.g.
>>
> Thanks Nix. I've just rescanned the syslog and found the 'killed'
> messages. I missed them first time because most times it ran it exited
> without killing anything.
>
> Of the few times it did kill stuff it mostly got unimportant processes
> (both BOINC projects and nscd),
Some BOINC projects are *real* memory hogs. (Some also allocate and free
memory in tight loops, which could easily confuse the OOM-killer.)
The OOM killer tries to get processes which are allocating or touching
lots of memory over a short period of time, concentrating on
`unimportant' ones so as to avoid killing e.g. the X server.
But it's a bit of a hopeless quest really because major processes can
allocate memory on behalf of others: e.g. apps can and do trivially
allocate lots of memory in the X server...
> BUT it did get postmaster once and
> named once and I'd rather it left those in peace. Postmaster currently
> runs as postgres and named runs in a chroot jail as named.
>
> Is there any documentation on it and/or config files I can use to tell
> it to lay off specific non-root daemons?

There's a file under /proc, /proc/{pid}/oom_adj. From
linux/Documentation/filesystems/proc.txt:

2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
------------------------------------------------------

This file can be used to adjust the score used to select which processes
should be killed in an out-of-memory situation. Giving it a high score will
increase the likelihood of this process being killed by the oom-killer. Valid
values are in the range -16 to +15, plus the special value -17, which disables
oom-killing altogether for this process.

2.13 /proc/<pid>/oom_score - Display current oom-killer score
-------------------------------------------------------------

This file can be used to check the current score used by the oom-killer is for
any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which
process should be killed in an out-of-memory situation.
--
`... in the sense that dragons logically follow evolution so they would
be able to wield metal.' --- Kenneth Eng's colourless green ideas sleep
furiously

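The two /proc files quoted from proc.txt can be read and written like any other text file. Below is a minimal sketch, assuming a kernel of the thread's era that exposes /proc/<pid>/oom_adj with the value range quoted above (later kernels introduced /proc/<pid>/oom_score_adj and deprecated oom_adj). The helper names and the proc_root parameter are hypothetical, added only so the code can be tried against a scratch directory; lowering the value for a real process usually requires root privileges.

```python
import os

OOM_DISABLE = -17                   # special value from the proc.txt excerpt
OOM_ADJ_MIN, OOM_ADJ_MAX = -16, 15  # ordinary range from the same excerpt


def validate_oom_adj(value):
    """Return value if it is legal for /proc/<pid>/oom_adj, else raise ValueError."""
    if value == OOM_DISABLE or OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        return value
    raise ValueError(f"oom_adj must be -17 or between -16 and 15, got {value}")


def read_oom_score(pid, proc_root="/proc"):
    """Return the current oom-killer score for pid as an integer."""
    with open(os.path.join(proc_root, str(pid), "oom_score")) as fh:
        return int(fh.read().strip())


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write a validated adjustment to <proc_root>/<pid>/oom_adj."""
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as fh:
        fh.write(f"{validate_oom_adj(value)}\n")


if __name__ == "__main__":
    # Only reads our own score; nothing is modified.
    own = os.getpid()
    if os.path.exists(f"/proc/{own}/oom_score"):
        print("own oom_score:", read_oom_score(own))
```

For Martin's case, something like set_oom_adj(pid_of_named, -17) run as root would exempt named, where pid_of_named is a placeholder for the daemon's real pid. The setting has to be reapplied whenever the daemon restarts, because it belongs to the process rather than to a config file.
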
Nix
Posted: Mon Jun 11, 2007 11:48 pm
Post subject: Re: What's an oom-killer when it's at home?

On 11 Jun 2007, Big and Blue spake thusly:
> And since it's Java it may well eat memory for breakfast.
The Java VM chews up a lot of *address space*, but that's not to say
that much of it is necessarily touched. (Although a lot of Java apps
*are* indeed memory hogs.)
--
`... in the sense that dragons logically follow evolution so they would
be able to wield metal.' --- Kenneth Eng's colourless green ideas sleep
furiously

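Nix's distinction between address space and touched memory can be seen per process in /proc/<pid>/status, where VmSize is the mapped address space and VmRSS is what is currently resident in RAM. The sketch below is an illustration under that assumption; the function names are invented here. Note that the difference between the two also includes pages that have been swapped out, so it is only an upper bound on memory that was never touched.

```python
import os
import re


def parse_vm_fields(status_text):
    """Return {'VmSize': kB, 'VmRSS': kB} parsed from /proc/<pid>/status text.
    Missing fields (e.g. for kernel threads) are reported as 0."""
    fields = {}
    for key in ("VmSize", "VmRSS"):
        match = re.search(rf"^{key}:\s+(\d+)\s+kB", status_text, re.MULTILINE)
        fields[key] = int(match.group(1)) if match else 0
    return fields


def non_resident_kb(status_text):
    """Return VmSize minus VmRSS in kB: mapped but not currently in RAM.
    This covers never-touched pages as well as pages swapped out."""
    fields = parse_vm_fields(status_text)
    return max(fields["VmSize"] - fields["VmRSS"], 0)


if __name__ == "__main__":
    path = "/proc/self/status"
    if os.path.exists(path):
        with open(path) as fh:
            text = fh.read()
        print(parse_vm_fields(text), "non-resident kB:", non_resident_kb(text))
```
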
Martin Gregorie
Posted: Tue Jun 12, 2007 12:27 pm
Post subject: Re: What's an oom-killer when it's at home?

Nix wrote:
> On 11 Jun 2007, Big and Blue spake thusly:
>> And since it's Java it may well eat memory for breakfast.
>
> The Java VM chews up a lot of *address space*, but that's not to say
> that much of it is necessarily touched. (Although a lot of Java apps
> *are* indeed memory hogs.)
>
This one is reading through a message archive with JavaMail. I suspect
message bodies are getting replicated at least once: the program is a
fairly normal size when started (a few MB) but has been known to expand
to 400 MB when chewing on (a group of?) large messages. It's a little
puzzling considering that I'm using the standard Post message size limit of
10 MB.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |

Martin Gregorie
Posted: Sat Jun 16, 2007 3:52 pm
Post subject: Re: What's an oom-killer: final report

I've now found out why oom-killer suddenly started bursting into life.
It turns out that at some stage (probably the last FC6 kernel upgrade to
2.6.20-1.2952.fc6) swap got disabled. If I'd actually read top's
bottom header line I would have seen that sooner.

At some point an upgrade changed the swap partition's /etc/fstab entry;
I certainly didn't edit it. It erroneously replaced the first
field (the partition device name) with "LABEL=alabel". As my swap
partition doesn't have a label, this effectively disabled swapping at
boot time.

I've manually edited /etc/fstab to put the partition device name back,
after which "swapon -a -e" (which is what's in /etc/rc.d/rc.sysinit) was
able to enable swapping.

If you're running a fully patched copy of FC6 it might pay you to see if
your system has also had swapping disabled.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |

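The check Martin recommends in his final report can be scripted. This is a hedged sketch, assuming the usual Linux layout: a "SwapTotal:" line in /proc/meminfo and whitespace-separated /etc/fstab entries whose third field is "swap". The function names are made up for illustration. It flags the situation from this thread, where fstab lists a swap entry but the kernel reports no active swap.

```python
import os
import re


def swap_total_kb(meminfo_text):
    """Return SwapTotal in kB from /proc/meminfo text, or 0 if the line is absent."""
    match = re.search(r"^SwapTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def fstab_swap_entries(fstab_text):
    """Return the first field (device, LABEL=... or UUID=...) of every
    fstab line whose filesystem type is swap."""
    devices = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "swap":
            devices.append(fields[0])
    return devices


def read_if_present(path):
    """Return the file's text, or an empty string if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    total = swap_total_kb(read_if_present("/proc/meminfo"))
    entries = fstab_swap_entries(read_if_present("/etc/fstab"))
    print(f"SwapTotal: {total} kB, fstab swap entries: {entries}")
    if entries and total == 0:
        print("fstab lists swap but none is active; check the first field of each entry")
```
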