missing frame in stack backtrace

 
  

Joe Pfeiffer
Posted: Wed Nov 04, 2009 6:15 pm    Subject: missing frame in stack backtrace
Archived from groups: comp.os.linux.development.apps

Some background: I'm developing a small daemon for some home automation
stuff; it's generally working well but will occasionally (typically
after several weeks of uptime) get a SEGV dereferencing a null pointer
or hang. I'm trying to track the problem down.

The obvious next step is to run it inside gdb and watch where it
crashes. Unfortunately, I've never seen it crash inside gdb. I have no
idea how the environment could differ such that it crashes outside gdb
but never inside it; maybe if I find the bug I'll get a clue there,
too...

So I've implemented a signal handler, which writes a stack trace to a
log file and then terminates. The relevant part of the code is:

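fprintf(stderr, "plmd received signal %d\n", signum);
/* si_addr is a void *; cast it so the argument matches the format */
fprintf(stderr, "fault address 0x%08lx\n", (unsigned long) info->si_addr);
fprintf(stderr, "code is %d\n", info->si_code);
/* gregs[] holds greg_t (a signed int on i386); cast it to match %x */
fprintf(stderr, "EIP is 0x%08x\n", (unsigned int) ctx->uc_mcontext.gregs[REG_EIP]);

fprintf(stderr, "backtrace:\n");
/* collect up to MAXDEPTH return addresses, then symbolize them to fd 2 */
levels = backtrace(buffer, MAXDEPTH);
backtrace_symbols_fd(buffer, levels, 2);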
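For context, here is a minimal sketch of how a handler like this might be installed so that info and ctx are available. It is not the actual plmd source: install_crash_handler, interrupted_pc and the MAXDEPTH value are assumptions made for illustration, and glibc on x86 Linux is assumed.

```c
#define _GNU_SOURCE             /* exposes REG_EIP / REG_RIP */
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

#define MAXDEPTH 64             /* assumed; the post does not give the value */

/* Hypothetical helper: program counter at the moment the signal arrived,
 * or 0 on an architecture this sketch does not handle. */
static unsigned long interrupted_pc(const ucontext_t *ctx)
{
#if defined(__i386__)
    return (unsigned long) ctx->uc_mcontext.gregs[REG_EIP];
#elif defined(__x86_64__)
    return (unsigned long) ctx->uc_mcontext.gregs[REG_RIP];
#else
    (void) ctx;
    return 0;
#endif
}

/* Three-argument handler form, selected by SA_SIGINFO below.
 * Note: fprintf() is not async-signal-safe; it is kept here only
 * because the handler terminates the process right afterwards. */
static void signal_handler(int signum, siginfo_t *info, void *uctx)
{
    const ucontext_t *ctx = uctx;
    void *buffer[MAXDEPTH];
    int levels;

    fprintf(stderr, "received signal %d\n", signum);
    fprintf(stderr, "fault address 0x%08lx\n", (unsigned long) info->si_addr);
    fprintf(stderr, "code is %d\n", info->si_code);
    fprintf(stderr, "PC is 0x%08lx\n", interrupted_pc(ctx));

    fprintf(stderr, "backtrace:\n");
    levels = backtrace(buffer, MAXDEPTH);
    backtrace_symbols_fd(buffer, levels, 2);
    _exit(1);
}

/* Hypothetical installer: returns 0 on success, -1 on failure. */
int install_crash_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = signal_handler;
    sa.sa_flags = SA_SIGINFO;   /* ask for siginfo_t and the ucontext */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_crash_handler() once near the start of main() would be enough; linking with -rdynamic lets backtrace_symbols_fd() print function names for the executable itself.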

Unfortunately, as near as I can tell, it consistently fails to print
the information for the activation record that is running when the
signal handler is called. For instance, a few minutes ago I had a hang,
so I sent the daemon a SIGSEGV with kill. It printed the following:

plmd received signal 11
fault address 0x000068a0
code is 0
EIP is 0xb8006424
backtrace:
/usr/local/bin/plmd(signal_handler+0xc7)[0x804a75b]
[0xb800640c]
/usr/local/bin/plmd(sendplm+0x92)[0x8049adc]
/usr/local/bin/plmd(main+0x977)[0x804a55f]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e247a5]
/usr/local/bin/plmd[0x80490b1]
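
One detail worth checking when reading output like this: on Linux an si_code of 0 is SI_USER, meaning the signal came from kill() rather than from a real fault. In that case si_addr is not meaningful (in glibc's siginfo_t it shares storage with si_pid), so the 0x000068a0 above is probably the sender's PID, 26784, not a fault address. A small sketch of telling the two cases apart follows; classify_signal and the enum are hypothetical names, not part of plmd.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Hypothetical classification of where a signal came from. */
enum sig_origin {
    ORIGIN_USER,    /* kill() or sigqueue(): si_pid is valid, si_addr is not */
    ORIGIN_FAULT,   /* kernel-reported memory fault: si_addr is valid */
    ORIGIN_OTHER    /* anything this sketch does not distinguish */
};

enum sig_origin classify_signal(int signum, const siginfo_t *info)
{
    /* SI_USER is 0 on Linux; SI_QUEUE marks sigqueue(). */
    if (info->si_code == SI_USER || info->si_code == SI_QUEUE)
        return ORIGIN_USER;

    /* Positive codes come from the kernel; for SIGSEGV and SIGBUS
     * they describe a memory fault and si_addr holds the address. */
    if (info->si_code > 0 && (signum == SIGSEGV || signum == SIGBUS))
        return ORIGIN_FAULT;

    return ORIGIN_OTHER;
}
```

A handler could print si_pid for ORIGIN_USER and si_addr only for ORIGIN_FAULT, which would make a log from a manual kill easier to tell apart from a genuine null-pointer crash.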

Inspecting the code with gdb, main() does indeed call sendplm()
somewhere near address 0x804a55f. But the only thing near 0x8049adc is
a call to another function of mine, called readloop(). It's not
unreasonable that it could have hung in readloop(), but that doesn't
appear in the trace! My signal handler does, and the log seems to be
claiming that the EIP at the time the signal was intercepted was
somewhere up in the system libs. It seems perfectly reasonable to me
that that might be where the real hang is, but

(1) why am I not seeing my call to that function?
and
(2) how can I figure out what function it is?
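
As a possible starting point for (2), the sketch below looks up which memory mapping an address such as the logged EIP falls in by scanning /proc/self/maps. This is Linux-specific, find_mapping is a hypothetical helper, and it only names the mapped file or region (a library path, [vdso], [stack], ...), not the function. It is also not async-signal-safe, so it is a debugging aid rather than something to rely on inside the handler.

```c
#include <stdio.h>

/* Hypothetical helper: find the mapping that contains addr.
 * Returns 1 and copies the mapping's name into name (empty for
 * anonymous memory), or 0 if no mapping contains addr. */
int find_mapping(unsigned long addr, char *name, size_t len)
{
    FILE *maps;
    char line[512];
    int found = 0;

    if (len == 0)
        return 0;
    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return 0;

    while (!found && fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char path[256] = "";

        /* Line format: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255[^\n]",
                   &start, &end, path) < 2)
            continue;

        if (addr >= start && addr < end) {
            snprintf(name, len, "%s", path);
            found = 1;
        }
    }
    fclose(maps);
    return found;
}
```

Called with the EIP value, this would at least show whether the address lies in libc, in another shared object, or in a kernel-provided region, which narrows down where to look next.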

Any help would be welcome...
--
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)