Help!

2.6.18-rt1

 
  

Goto page Previous  1, 2, 3, 4, 5
Post new topic   General Reply to Topic (not reply to a specific post)    Forums Home -> Kernel (archive) RSS
Next:  Please pull git390 'for-linus' branch  
Author Message
Eric W. Biederman
External


Since: May 19, 2006
Posts: 1134



PostPosted: Wed Sep 27, 2006 9:40 am    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: linux>kernel (more info?)

Bill Huey (hui) <billh.RemoveThis@gnuppy.monkey.org> writes:

> On Wed, Sep 27, 2006 at 12:02:21AM -0600, Eric W. Biederman wrote:
>> Bill Huey (hui) <billh.RemoveThis@gnuppy.monkey.org> writes:
>> Interesting. I think the easy solution would just be to assert that put_task_struct
>> can sleep and to fix any callers that expect differently. I haven't looked very
>> closely but I don't recall anything that needs put_task_struct to be atomic.
>> With a function that complex I certainly would not expect it to never sleep unless
>> it had a big fat comment.
>
> One of the main claims about the -rt patch is that the kernel is basically
> conversion of the current locking semantics in the kernel. What you mentioned
> above would deviate from that and and clutter non-preemptive kernel maintenance.
> If you're suggesting a more general change to that function, then it wouldn't
> violate that.

Yes I am. The motivator would be the RT work but I don't see a reason why
the it couldn't be put in the mainline kernel. If not at least we need
the big fat comment in the mainline kernel that says put_task_struct must
be safe to call with interrupts disabled.

The way the code is structured now it deviates from the mainline kernel in
more than just changing locking behavior. Which is what brought me into
this conversation in the first place. So removing that point of discord
would be good.

>> Well I did find an instance where we call put_task_struct with a
>> spinlock held. Inside of lib/rwsem.c:rwsem_down_failed_common().
>>
>> Still that may be the only user that cares. I suspect with a little
>> code rearrangement that case is fixable. It's not like that code is a
>> fast path or anything. It should just be a matter of passing the
>> task struct outside of the spinlock before calling put_task_struct.
>
> Yeah, that's what the small patch of mine does outside of doing an RCU
> callback.

If you have fixed rwsem_down_failed_common() like it sounds like you have.
That would be a nice patch to for the mainline kernel.


>> But this is a path where we are freeing data, so GFP_ATOMIC should not come
>> into it. I just read through the code and there are not allocations
>> there.
>
> Correct, I see it as a possibly bigger problem since memory allocators and
> destructors aren't available readily in preempt_disable critical sections.
> It's not the case in the regular kernel which could be a significant concern.

Yes. I would say that memory allocators have never been readily available in
sections where we have preemption disabled, but they at least are available
and sometimes they will even give you the memory you asked for.

> There are a couple of issues here.
>
> (1) The RCU callback isn't used in this case to back other RCU read critical
> sections. It's just used as a generic callback mechanism in this case. Some
> consider it a hack.
>
> (2) RCU isn't necessarily bad for -rt since Paul McKenney and folks are
> working on making it preempt friendly. Any talk of removing the use RCU in -rt
> is premature and probably unlikely because of this work. There are many options
> here. RCU very useful for scalability so seeing it go away in -rt would be
> generally a bad thing IMO.

Agreed. However until the issues are resolved with call_rcu it appears quite
silly to increase the usage of it in the RT tree.

About the rcu removal discussion I heard it was more the possibility was
suggested because the downside was significant, and normal locks were
more deterministic. The emphasis was that call_rcu could be a problem and
that something needs to happen to fix that.

> The use of lock-free techinques is something that could eventually be more
> widely used in -rt since it'll make kernel call paths potentially more
> deterministic without worrying about hitting a contending mutex and that can
> effect determinism. It's not expect to be the case where calls into the kernel
> are going to be deterministic but to extend the basic idea of determinism into
> certain critical kernel paths like a read() to a special device driver, etc...
>
> Route table look up also comes to mind here that benefits from this case if
> there's a need for real time facilities in networking.
>
> Just some ideas here.

Agreed. The normal rcu path is quite nice. It is the call_rcu part used
to implement delayed freeing where things get ugly.

>> My gut feel now that I understand the pieces is that this approach has
>> all of the hallmarks of a hack, both the kmalloc/kfree thing and even
>> more calling put_task_struct in an atomic context. If the callers
>> were fixed put_task_struct could safely sleep so kmalloc/kfree
>> sleeping would not be a problem.
>
> I can't say. The current logic just queues the request into a list and a
> thread wakes to reap that task_struct. It's sufficient and the need to
> convert put_task_struct so that it can block.

Yes the current logic is simple and requires no changes elsewhere.
It is always nice when you can do that. But in this case it adds unnecessary
overhead, and indeterminism. From what little I understand of RCU both
of those are bad things. Thus my gut feeling that the approach is a hack
and should get fixed properly.

Anyway short of submitting a patch I think I have said enough.
Thank you for the explanation.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.RemoveThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Wed Sep 27, 2006 10:50 am    Post subject: Re: 2.6.18-rt1 [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Mike Kravetz <kravetz.TakeThisOut@us.ibm.com> wrote:

> On Wed, Sep 20, 2006 at 04:19:07PM +0200, Ingo Molnar wrote:
> > In particular, a nasty softirq performance bug has been fixed, which
> > caused the "5x slowdown under TCP" bug reported to lkml - those TCP
> > performance figures are now on par with vanilla performance.
>
> Just curious about the cause and fix for this issue. I tried
> searching the mail lists for discussion but came up empty. The patch
> it too big for me to determine what specific changes addressed this
> issue. If anyone can point me in the right direction, that would be
> appreciated.

i /think/ it's this bit of code commented out in kernel/softirq.c:

//if (!in_interrupt() && local_softirq_pending())
// invoke_softirq();

try to uncomment that. (this should i think do the trick on PREEMPT_RT -
but no guarantees. It might cause problems on !PREEMPT_RT configs, i had
to juggle around some code here. Maybe the easiest would be if you tried
to take the new softirq.c, sans the lockdep changes. Not sure how
feasible that is though.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Wed Sep 27, 2006 10:50 am    Post subject: Re: 2.6.18-rt1 [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Daniel Walker <dwalker.DeleteThis@mvista.com> wrote:

> On closer inspection I still think this is wrong. (Although it looks
> really nice..) find below speaking only in term of !PREEMPT_RT ,

> > - } else if (oops_in_progress) {
> > - locked = spin_trylock(&up->port.lock);
> > - } else
> > - spin_lock(&up->port.lock);
> > + if (up->port.sysrq || oops_in_progress)
> > + locked = spin_trylock_irqsave(&up->port.lock, flags);
>
> Now in the new version interrupts are only off if you _get the lock_.
> Presumably the lock is taken in the calling function, but interrupts
> aren't disabled.
>
> I'm assuming the code is disabling interrupts for a good reason, I
> don't know enough about the code to say it isn't.

yeah, agreed - behavior now changed due to my patch. This is really
twisted code...

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Wed Sep 27, 2006 11:10 am    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Bill Huey <billh.TakeThisOut@gnuppy.monkey.org> wrote:

> Because the conversion of memory allocation routines like kmalloc and
> kfree aren't safely callable within a preempt_disable critical section
> since they were incompletely converted in the -rt. [...]

they were not 'incompletely converted' - they are /intentionally/ fully
preemptible.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Wed Sep 27, 2006 11:20 am    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Eric W. Biederman <ebiederm RemoveThis @xmission.com> wrote:

> About the rcu removal discussion I heard it was more the possibility
> was suggested because the downside was significant, and normal locks
> were more deterministic. The emphasis was that call_rcu could be a
> problem and that something needs to happen to fix that.

RCU is really mostly used as a garbage-collection scheme, and hence its
latency, while it can be practically problematic in some cases, never is
directly visible in terms of application or kernel behavior.

the same is in this case: the call_rcu() use is for gathering totally
unused task structs. There should be no side-effects.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Wed Sep 27, 2006 11:20 am    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Bill Huey <billh RemoveThis @gnuppy.monkey.org> wrote:

> > more than just changing locking behavior. Which is what brought me
> > into this conversation in the first place. So removing that point
> > of discord would be good.
>
> Yes, there are few places. It was primarily to handle the reaping
> during the finishing parts of a fork. All in all, it's not at all a
> critical path.

no. It was primarily to move the put_task_struct() off the
context-switch fastpath. (In comparison the fork use is much rarer)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Bill Huey
External


Since: Jul 24, 2006
Posts: 84



PostPosted: Wed Sep 27, 2006 11:20 am    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Wed, Sep 27, 2006 at 01:29:44AM -0600, Eric W. Biederman wrote:
> Bill Huey (hui) <billh.DeleteThis@gnuppy.monkey.org> writes:
> > One of the main claims about the -rt patch is that the kernel is basically
> > conversion of the current locking semantics in the kernel. What you mentioned
> > above would deviate from that and and clutter non-preemptive kernel maintenance.
> > If you're suggesting a more general change to that function, then it wouldn't
> > violate that.
>
> Yes I am. The motivator would be the RT work but I don't see a reason why
> the it couldn't be put in the mainline kernel. If not at least we need
> the big fat comment in the mainline kernel that says put_task_struct must
> be safe to call with interrupts disabled.
>
> The way the code is structured now it deviates from the mainline kernel in
> more than just changing locking behavior. Which is what brought me into
> this conversation in the first place. So removing that point of discord
> would be good.

Yes, there are few places. It was primarily to handle the reaping during the
finishing parts of a fork. All in all, it's not at all a critical path.

> If you have fixed rwsem_down_failed_common() like it sounds like you have.
> That would be a nice patch to for the mainline kernel.

The problem only exists under -rt in that scheduling path. I can't comment if
it's desired for mainline or not.

> > There are a couple of issues here.
> >
> > (1) The RCU callback isn't used in this case to back other RCU read critical
> > sections. It's just used as a generic callback mechanism in this case. Some
> > consider it a hack.
> >
> > (2) RCU isn't necessarily bad for -rt since Paul McKenney and folks are
> > working on making it preempt friendly. Any talk of removing the use RCU in -rt
> > is premature and probably unlikely because of this work. There are many options
> > here. RCU very useful for scalability so seeing it go away in -rt would be
> > generally a bad thing IMO.
>
> Agreed. However until the issues are resolved with call_rcu it appears quite
> silly to increase the usage of it in the RT tree.
>
> About the rcu removal discussion I heard it was more the possibility was
> suggested because the downside was significant, and normal locks were
> more deterministic. The emphasis was that call_rcu could be a problem and
> that something needs to happen to fix that.

In a general purpose system as complicated as Linux, "locks being more
deterministic" is kind of ridiculous. Tons of stuff can happen that you
can't control. The RCU thing that you heard was probably just ideas being
thrown around and there are many options that haven't been explored yet.
The actual scenario is more complicated than what you might have initially
heard.

> Agreed. The normal rcu path is quite nice. It is the call_rcu part used
> to implement delayed freeing where things get ugly.

The RCU callback stuff runs as a thread. The key part of a preemptible system
that can make guarantees is the ability to response to a higher priority thread
in a deterministic amount of time. That's the only real guarantee that can be
made in that system. Because of it running in a thread, callbacks in RCU don't
effect the preemptibility of the system at all.

Any kernel thread taking the risk of acquiring a lock cannot be deterministic
in any reasonable sense. Any claims of a guarantee is pretty much a lie since
the system is too complicated to analysize for determinism in any meaningful
way using the analysis techniques out there. The kernel is just too complicated.

In contrast, specialize RT apps with few threads is another scenario that's much
more controllable.

> Yes the current logic is simple and requires no changes elsewhere.
> It is always nice when you can do that. But in this case it adds unnecessary
> overhead, and indeterminism. From what little I understand of RCU both
> of those are bad things. Thus my gut feeling that the approach is a hack
> and should get fixed properly.

I agree. It's a hack and it should go away. You're the fourth person to mention
something about this, but it's up to Ingo.

> Anyway short of submitting a patch I think I have said enough.
> Thank you for the explanation.

No problem, any time Smile

bill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Wed Sep 27, 2006 11:20 am    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Eric W. Biederman <ebiederm.TakeThisOut@xmission.com> wrote:

> Yes I am. The motivator would be the RT work but I don't see a reason
> why the it couldn't be put in the mainline kernel. If not at least we
> need the big fat comment in the mainline kernel that says
> put_task_struct must be safe to call with interrupts disabled.
>
> The way the code is structured now it deviates from the mainline
> kernel in more than just changing locking behavior. Which is what
> brought me into this conversation in the first place. So removing
> that point of discord would be good.

well, this is one of those few cases (out of ~50,000 lock uses in the
kernel) where such a change was unavoidable: put_task_struct() is used
in the scheduler context-switch path. (see sched.c:finish_task_switch())

So that's why i first turned it into a separate, extra delayed-free via
the "desched thread", and later on picked up the RCUification from Paul
McKenney. The RCUification was the simpler (and hence easier to
maintain) change. There is no problem with putting this into the RCU
path on PREEMPT_RT, as this is a resource-freeing act. I.e. whatever
'delay' there might be in RCU processing, it does not impact program
logic. I agree with you that on !PREEMPT_RT there's no reason to
complicate things with an extra layer of indirection.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Bill Huey
External


Since: Jul 24, 2006
Posts: 84



PostPosted: Wed Sep 27, 2006 11:20 am    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Wed, Sep 27, 2006 at 10:57:12AM +0200, Ingo Molnar wrote:
> * Bill Huey <billh.DeleteThis@gnuppy.monkey.org> wrote:
>
> > Because the conversion of memory allocation routines like kmalloc and
> > kfree aren't safely callable within a preempt_disable critical section
> > since they were incompletely converted in the -rt. [...]
>
> they were not 'incompletely converted' - they are /intentionally/ fully
> preemptible.

What I meant by "incompletely converted" is that the allocators could be
made more safe in non-preemptible scenarios under -rt. It's potentially
a valuable thing to have since GFP_ATOMIC semantics already exist in the
current allocators and a newer category could be added as a new feature of
that allocator for those scenarios. I'm happy dequeuing things off of my
own free list, but that's just me.

-rt semanatics created a couple of new locking scenarios that the previous
kernel didn't really have to address. That's all that I meant by that. Smile

bill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Wed Sep 27, 2006 11:30 am    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Bill Huey <billh.TakeThisOut@gnuppy.monkey.org> wrote:

> On Wed, Sep 27, 2006 at 10:57:12AM +0200, Ingo Molnar wrote:
> > * Bill Huey <billh.TakeThisOut@gnuppy.monkey.org> wrote:
> >
> > > Because the conversion of memory allocation routines like kmalloc and
> > > kfree aren't safely callable within a preempt_disable critical section
> > > since they were incompletely converted in the -rt. [...]
> >
> > they were not 'incompletely converted' - they are /intentionally/ fully
> > preemptible.
>
> What I meant by "incompletely converted" is that the allocators could
> be made more safe in non-preemptible scenarios under -rt. [...]

no, the -rt kernel intentionally does not do that and wont do that.
There's lots of complex stuff going on within allocators, even in the
GFP_ATOMIC path. We might be able to plug in more deterministic
allocators (like SLOB), but even they must be fully preemptible. In the
-rt kernel there's basically no compromise on the "do as little as
possible in non-preemptible regions" stance.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Wed Sep 27, 2006 3:20 pm    Post subject: Re: 2.6.18-rt4 [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* john cooper <john.cooper RemoveThis @third-harmonic.com> wrote:

> >ok, i've added this no-4M-limit patch to -rt. John, does that solve
> >your problem?
>
> A few of us hit a snag in Lennert's original patch. He has a fix
> which addresses this, attached.

ok, applied.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Eric W. Biederman
External


Since: May 19, 2006
Posts: 1134



PostPosted: Wed Sep 27, 2006 4:10 pm    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Ingo Molnar <mingo RemoveThis @elte.hu> writes:

> * Eric W. Biederman <ebiederm RemoveThis @xmission.com> wrote:
>
>> Yes I am. The motivator would be the RT work but I don't see a reason
>> why the it couldn't be put in the mainline kernel. If not at least we
>> need the big fat comment in the mainline kernel that says
>> put_task_struct must be safe to call with interrupts disabled.
>>
>> The way the code is structured now it deviates from the mainline
>> kernel in more than just changing locking behavior. Which is what
>> brought me into this conversation in the first place. So removing
>> that point of discord would be good.
>
> well, this is one of those few cases (out of ~50,000 lock uses in the
> kernel) where such a change was unavoidable: put_task_struct() is used
> in the scheduler context-switch path. (see sched.c:finish_task_switch())

I had missed that was in a preempt disable path when I skimmed through
the users.

> So that's why i first turned it into a separate, extra delayed-free via
> the "desched thread", and later on picked up the RCUification from Paul
> McKenney. The RCUification was the simpler (and hence easier to
> maintain) change. There is no problem with putting this into the RCU
> path on PREEMPT_RT, as this is a resource-freeing act. I.e. whatever
> 'delay' there might be in RCU processing, it does not impact program
> logic. I agree with you that on !PREEMPT_RT there's no reason to
> complicate things with an extra layer of indirection.

I'm still wondering if we can move put_task_struct a little lower in
the logic in the places where it is called, so it isn't called under a
lock, or with preemption disabled. The only downside I see is that it
might convolute the logic into unreadability.

In general I get nervous about calling big functions while holding locks.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Wed Sep 27, 2006 4:20 pm    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Eric W. Biederman <ebiederm DeleteThis @xmission.com> wrote:

> I'm still wondering if we can move put_task_struct a little lower in
> the logic in the places where it is called, so it isn't called under a
> lock, or with preemption disabled. The only downside I see is that it
> might convolute the logic into unreadability.

well it's all a function of the task reaping logic: right now we in
essence complete the reaping from the scheduler, via prev_state ==
TASK_DEAD. We cannot do it sooner because the task is still in use. I
had one other implementation upstream some time ago, which was a
single-slot cache for reaped tasks - but that uglified other codepaths
because _something_ has to notice that the task has been unscheduled.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Paul E. McKenney
External


Since: Jun 08, 2006
Posts: 109



PostPosted: Wed Sep 27, 2006 6:20 pm    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Wed, Sep 27, 2006 at 04:06:59PM +0200, Ingo Molnar wrote:
>
> * Eric W. Biederman <ebiederm.TakeThisOut@xmission.com> wrote:
>
> > I'm still wondering if we can move put_task_struct a little lower in
> > the logic in the places where it is called, so it isn't called under a
> > lock, or with preemption disabled. The only downside I see is that it
> > might convolute the logic into unreadability.
>
> well it's all a function of the task reaping logic: right now we in
> essence complete the reaping from the scheduler, via prev_state ==
> TASK_DEAD. We cannot do it sooner because the task is still in use. I
> had one other implementation upstream some time ago, which was a
> single-slot cache for reaped tasks - but that uglified other codepaths
> because _something_ has to notice that the task has been unscheduled.

I believe that we are way too far into the task-teardown process for
something like synchronize_rcu() to be feasible (not enough of the
task left to be able to sleep!), but thought I should bring up the
possibility on the off-chance that it caused someone to come up with a
better approach.

Another possible approach would be workqueues. The disadvantages here are
(1) higher overhead (2) workqueues can be delayed for a -long- time in a
realtime environment, which increases vulnerability to memory exhaustion.

Again, hoping this provokes some better ideas...

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Esben Nielsen
External


Since: Aug 14, 2006
Posts: 5



PostPosted: Wed Sep 27, 2006 10:30 pm    Post subject: Re: [PATCH] move put_task_struct() reaping into a thread [Re: 2.6.18-rt1] [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Wed, 27 Sep 2006, Eric W. Biederman wrote:

> Bill Huey (hui) <billh.RemoveThis@gnuppy.monkey.org> writes:
>
>> On Tue, Sep 26, 2006 at 08:55:41PM -0600, Eric W. Biederman wrote:
>>> Bill Huey (hui) <billh.RemoveThis@gnuppy.monkey.org> writes:
>>>> This patch moves put_task_struct() reaping into a thread instead of an
>>>> RCU callback function as discussed with Esben publically and Ingo privately:
>>>
>>> Stupid question.
>>
>> It's a great question actually.
>>
>>> Why does the rt tree make all calls to put_task_struct an rcu action?
>>> We only need the rcu callback from kernel/exit.c
>>
>> Because the conversion of memory allocation routines like kmalloc and kfree aren't
>> safely callable within a preempt_disable critical section since they were incompletely
>> converted in the -rt. It can run into the sleeping in atomic scenario which can result
>> in a deadlock since those routines use blocking locks internally in the implementation
>> now as a result of the spinlock_t conversion to blocking locks.
>
> Interesting. I think the easy solution would just be to assert that put_task_struct
> can sleep and to fix any callers that expect differently. I haven't looked very
> closely but I don't recall anything that needs put_task_struct to be atomic.
> With a function that complex I certainly would not expect it to never sleep unless
> it had a big fat comment.
>
> Well I did find an instance where we call put_task_struct with a
> spinlock held. Inside of lib/rwsem.c:rwsem_down_failed_common().
>
> Still that may be the only user that cares. I suspect with a little
> code rearrangement that case is fixable. It's not like that code is a
> fast path or anything. It should just be a matter of passing the
> task struct outside of the spinlock before calling put_task_struct.
>
>>> Nothing else needs those semantics.
>>
>> Right, blame it on the incomplete conversion of the kmalloc and friends. GFP_ATOMIC is
>> is kind of meaningless in the -rt tree and it might be a good thing to add something
>> like GFP_RT_ATOMIC for cases like this to be handled properly and restore that
>> particular semantic in a more meaningful way.
>
> But this is a path where we are freeing data, so GFP_ATOMIC should not come
> into it. I just read through the code and there are not allocations
> there.
>
>>> I agree that put_task_struct is the most common point so this is unlikely
>>> to remove your issues with rcu callbacks but it just seems completely backwards
>>> to increase the number of rcu callbacks in the rt tree.
>>
>> I'm not sure what mean here, but if you mean that you don't like the RCU API abuse then
>> I agree with you on that. However, Ingo disagrees and I'm not going to argue it with him.
>> Although, I'm not going stop you if you do. Smile
>
> What I was thinking is that rcu isn't terribly friendly to realtime
> activities because it postpones work and can wind up with a lot of
> work to do at some random time later which can be bad for latencies.
>

Only activities with no deadlines are postponed. And therefore RCU is good
for the latencies of the application. No high-priority, low-latency task
should bother spend time traversing and freeing a complicated datastructure.
Defer that to some lower priority task.

Esben

> So I was very surprised to see an rt patch making more things under
> rcu protection. Especially as I have heard other developers worried
> about rt issues discussing removing the rcu functionality.
>
> My gut feel now that I understand the pieces is that this approach has
> all of the hallmarks of a hack, both the kmalloc/kfree thing and even
> more calling put_task_struct in an atomic context. If the callers
> were fixed put_task_struct could safely sleep so kmalloc/kfree
> sleeping would not be a problem.
>

That put_task_struct uses RCU is a hack to defer the job to a lower
priority task. I think the right solution is to defer the job to a lower
priority task using a cheaper mechanism. put_task_struct() is used from
high priority tasks in the priority inheritance code and should only do
the minimal job of defering the real work to another task.

Esben

> Eric
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.RemoveThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
john stultz
External


Since: May 16, 2006
Posts: 260



PostPosted: Thu Sep 28, 2006 2:50 am    Post subject: Re: 2.6.18-rt1 [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Fri, 2006-09-22 at 13:58 +0200, Ingo Molnar wrote:
> * john stultz <johnstul DeleteThis @us.ibm.com> wrote:
>
> > I'm seeing a similar issue. Although the log is a bit futzed. Maybe
> > its the sd_mod?
> >
> > at virtual address 75010000le kernel paging requestproc filesystem
>
> would be nice to figure out why it crashes - unfortunately i cannot
> trigger it. Could it be some build tool incompatibility perhaps? Some
> sizing issue (some module struct gets too large)?

Been looking a bit deeper into this again:

BUG: unable to handle kernel paging request at virtual address 75010000
printing eip:
c01354ea
*pde = cccccccc
stopped custom tracer.
Oops: 0000 [#1]
SMP
Modules linked in:
CPU: 1
EIP: 0060:[<c01354ea>] Not tainted VLI
EFLAGS: 00010282 (2.6.18-rtjohn #9)
EIP is at lookup_symbol+0x37/0x5b
eax: 00000018 ebx: c0387a10 ecx: f7df7e58 edx: c0349c7e
esi: 75010000 edi: f881a229 ebp: c038a524 esp: f7df7e5c
ds: 007b es: 007b ss: 0068 preempt: 00000000
Process insmod (pid: 453, ti=f7df6000 task=c2b54030 task.ti=f7df6000)
Stack: f881f940 f8820c80 f881a229 f7df7ea4 c013555c f881a229 c03855a0
c038a524
f881f940 f8820c80 f881a229 00000000 c01362dd f881a229 f7df7ea0
f7df7ea4
00000001 00000000 f881a229 f881f940 000004f0 0000003c c0136855
f881922c
Call Trace:
[<c013555c>] __find_symbol+0x26/0x2e0
[<c01362dd>] resolve_symbol+0x23/0x5f
[<c0136855>] simplify_symbols+0x7e/0xf0
[<c01375b0>] load_module+0x7c4/0xc14
[<c0137a60>] sys_init_module+0x3d/0x171
[<c0102855>] sysenter_past_esp+0x56/0x79
Code: 55 53 ff 74 24 1c 68 5f 9c 34 c0 e8 df 80 fe ff 83 c4 10 39 eb 73
31 53 68 7e 9c 34 c0 e8 cd 80 fe ff 5e 5f 8b 73 04 8b 7c 24 14 <ac> ae
75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 85 c0 75 04 89
EIP: [<c01354ea>] lookup_symbol+0x37/0x5b SS:ESP 0068:f7df7e5c


Put some debugging into __find_symbol() and came up w/ this:

lookup_symbol: scsi_print_sense_hdr 0xc0385590 0xc038a514

Where it goes through that range in kernel_symbol increments.

The last one it checks before crashing is: 0xc0387a10

>From Symbol.map:
c0385590 R __start___ksymtab
c038a514 R __stop___ksymtab

That looks right. Now looking up 0xc0387a10, there's no symbol there.

c03879e8 r __ksymtab_find_next_bit
c03879f0 r __ksymtab_find_next_zero_bit
c03879f8 R __write_lock_failed
c0387a18 R __read_lock_failed
c0387a2c r __ksymtab___delay
c0387a34 r __ksymtab___const_udelay
c0387a3c r __ksymtab___udelay
c0387a44 r __ksymtab___ndelay

That __read/__write_lock_failed bit looks wrong.

That's as far as I've gotten so far, but will email with more as I find
it.

thanks
-john

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
john stultz
External


Since: May 16, 2006
Posts: 260



PostPosted: Fri Sep 29, 2006 1:00 am    Post subject: Re: 2.6.18-rt1 [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Wed, 2006-09-27 at 17:42 -0700, john stultz wrote:
> On Fri, 2006-09-22 at 13:58 +0200, Ingo Molnar wrote:
> > * john stultz <johnstul.RemoveThis@us.ibm.com> wrote:
> >
> > > I'm seeing a similar issue. Although the log is a bit futzed. Maybe
> > > its the sd_mod?
> > >
> > > at virtual address 75010000le kernel paging requestproc filesystem
> >
> > would be nice to figure out why it crashes - unfortunately i cannot
> > trigger it. Could it be some build tool incompatibility perhaps? Some
> > sizing issue (some module struct gets too large)?
>
> Been looking a bit deeper into this again:
[snip]
> c03879e8 r __ksymtab_find_next_bit
> c03879f0 r __ksymtab_find_next_zero_bit
> c03879f8 R __write_lock_failed
> c0387a18 R __read_lock_failed
> c0387a2c r __ksymtab___delay
> c0387a34 r __ksymtab___const_udelay
> c0387a3c r __ksymtab___udelay
> c0387a44 r __ksymtab___ndelay
>
> That __read/__write_lock_failed bit looks wrong.


So it seems gcc 3.4.4 misplaces the __write_lock_failed function into
the ksymtab. It doesn't happen w/ 4.0.3.

Anyway, this patch explicitly defines the section and fixes the issue
for me. Would the other reporters of this issue give it a whirl as well?

thanks
-john


Index: linux-rtj14/arch/i386/lib/bitops.c
===================================================================
--- linux-rtj14.orig/arch/i386/lib/bitops.c 2006-09-28 15:24:08.000000000 -0700
+++ linux-rtj14/arch/i386/lib/bitops.c 2006-09-28 15:35:29.000000000 -0700
@@ -75,6 +75,7 @@
*/
#ifdef CONFIG_SMP
asm(
+".section .sched.text\n"
".align 4\n"
".globl __write_lock_failed\n"
"__write_lock_failed:\n\t"
@@ -88,6 +89,7 @@
);

asm(
+".section .sched.text\n"
".align 4\n"
".globl __read_lock_failed\n"
"__read_lock_failed:\n\t"


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.RemoveThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
K.R. Foley
External


Since: May 26, 2006
Posts: 17



PostPosted: Fri Sep 29, 2006 4:20 am    Post subject: Re: 2.6.18-rt1 [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

john stultz wrote:
> On Wed, 2006-09-27 at 17:42 -0700, john stultz wrote:
>> On Fri, 2006-09-22 at 13:58 +0200, Ingo Molnar wrote:
>>> * john stultz <johnstul.TakeThisOut@us.ibm.com> wrote:
>>>
>>>> I'm seeing a similar issue. Although the log is a bit futzed. Maybe
>>>> its the sd_mod?
>>>>
>>>> at virtual address 75010000le kernel paging requestproc filesystem
>>> would be nice to figure out why it crashes - unfortunately i cannot
>>> trigger it. Could it be some build tool incompatibility perhaps? Some
>>> sizing issue (some module struct gets too large)?
>> Been looking a bit deeper into this again:
> [snip]
>> c03879e8 r __ksymtab_find_next_bit
>> c03879f0 r __ksymtab_find_next_zero_bit
>> c03879f8 R __write_lock_failed
>> c0387a18 R __read_lock_failed
>> c0387a2c r __ksymtab___delay
>> c0387a34 r __ksymtab___const_udelay
>> c0387a3c r __ksymtab___udelay
>> c0387a44 r __ksymtab___ndelay
>>
>> That __read/__write_lock_failed bit looks wrong.
>
>
> So it seems gcc 3.4.4 misplaces the __write_lock_failed function into
> the ksymtab. It doesn't happen w/ 4.0.3.
>
> Anyway, this patch explicitly defines the section and fixes the issue
> for me. Would the other reporters of this issue give it a whirl as well?
>
> thanks
> -john
>

John,

This fixes my problem on my fc3 box here at home. I will check my other
development boxes at work tomorrow. Nice catch and thanks for effort.


--
kr
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Fri Sep 29, 2006 2:40 pm    Post subject: Re: 2.6.18-rt1 [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* john stultz <johnstul.TakeThisOut@us.ibm.com> wrote:

> > That __read/__write_lock_failed bit looks wrong.
>
> So it seems gcc 3.4.4 misplaces the __write_lock_failed function into
> the ksymtab. It doesn't happen w/ 4.0.3.
>
> Anyway, this patch explicitly defines the section and fixes the issue
> for me. Would the other reporters of this issue give it a whirl as
> well?

nice catch! applied.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Ingo Molnar
External


Since: May 15, 2006
Posts: 3111



PostPosted: Fri Sep 29, 2006 2:50 pm    Post subject: Re: 2.6.18-rt1 [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* john stultz <johnstul.TakeThisOut@us.ibm.com> wrote:

> [snip]
> > c03879e8 r __ksymtab_find_next_bit
> > c03879f0 r __ksymtab_find_next_zero_bit
> > c03879f8 R __write_lock_failed
> > c0387a18 R __read_lock_failed
> > c0387a2c r __ksymtab___delay
> > c0387a34 r __ksymtab___const_udelay
> > c0387a3c r __ksymtab___udelay
> > c0387a44 r __ksymtab___ndelay
> >
> > That __read/__write_lock_failed bit looks wrong.
>
>
> So it seems gcc 3.4.4 misplaces the __write_lock_failed function into
> the ksymtab. It doesn't happen w/ 4.0.3.

i think it's an ld bug, and it's related to the nested
..section/.previous use in the LOCK_PREFIX macro in
include/asm-i386/alternatives.h.

this is the second time i have seen smp-alternatives related linker
breakage.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Display posts from previous:   
Post new topic   General Reply to Topic (not reply to a specific post)    Forums Home -> Kernel (archive) All times are: Eastern Time (US & Canada) (change)
Goto page Previous  1, 2, 3, 4, 5
Page 4 of 5

 
You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum