[PATCH 1/3] sched: Enable wake balancing for the SMT/HT domain

Arjan van de Ven
Posted: Sat Oct 24, 2009 5:10 pm    Post subject: [PATCH 1/3] sched: Enable wake balancing for the SMT/HT domain

Subject: sched: Enable wake balancing for the SMT/HT domain
From: Arjan van de Ven <arjan@linux.intel.com>

Logical CPUs that are part of a hyperthreading/SMT set are equivalent
in terms of where to execute a task; after all they share pretty much
all resources including the L1 cache.

This means that if task A wakes up task B, we should really consider
all logical CPUs in the SMT/HT set to run task B, not just the CPU that
task A is running on; in case task A keeps running, task B now gets to
execute with no latency. In the case where task A then immediately goes
to wait for a response from task B, nothing is lost due to the aforementioned
equivalency.

This patch turns on the "balance on wakeup" and turns off "affine wakeups"
for the SMT/HT scheduler domain to get this lower latency behavior.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

diff --git a/include/linux/topology.h b/include/linux/topology.h
index fc0bf3e..3665dc2 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -95,8 +95,8 @@ int arch_update_cpu_topology(void);
  | 1*SD_BALANCE_NEWIDLE \
  | 1*SD_BALANCE_EXEC \
  | 1*SD_BALANCE_FORK \
- | 0*SD_BALANCE_WAKE \
- | 1*SD_WAKE_AFFINE \
+ | 1*SD_BALANCE_WAKE \
+ | 0*SD_WAKE_AFFINE \
  | 1*SD_SHARE_CPUPOWER \
  | 0*SD_POWERSAVINGS_BALANCE \
  | 0*SD_SHARE_PKG_RESOURCES \
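
The `1*FLAG | 0*FLAG` idiom in the hunk above keeps every flag listed in the initializer and uses the multiplier as an on/off switch. A minimal sketch of what the change does to the resulting mask, assuming made-up bit values (the kernel's real values are not shown in this thread) and two hypothetical helper names:

```c
/* Illustrative bit values only; not the kernel's actual definitions. */
enum {
	SD_BALANCE_WAKE   = 0x01,
	SD_WAKE_AFFINE    = 0x02,
	SD_SHARE_CPUPOWER = 0x04,
};

/* SMT-domain flags before the patch: affine wakeups on, wake balancing off. */
static unsigned int smt_flags_before(void)
{
	return 0*SD_BALANCE_WAKE | 1*SD_WAKE_AFFINE | 1*SD_SHARE_CPUPOWER;
}

/* SMT-domain flags after the patch: the two multipliers are swapped. */
static unsigned int smt_flags_after(void)
{
	return 1*SD_BALANCE_WAKE | 0*SD_WAKE_AFFINE | 1*SD_SHARE_CPUPOWER;
}
```

Because `*` binds tighter than `|`, a flag multiplied by 0 simply contributes no bits, so flipping a multiplier is a one-character change that stays easy to read in a diff.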


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Mike Galbraith
Posted: Sun Oct 25, 2009 4:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Sat, 2009-10-24 at 13:07 -0700, Arjan van de Ven wrote:
> Subject: sched: Disable affine wakeups by default
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> The global affine wakeup scheduler feature sounds nice, but there is a problem
> with this: This is ALSO a per scheduler domain feature already.
> By having the global scheduler feature enabled by default, the scheduler domains
> no longer have the option to opt out.

? The affine decision is qualified by SD_WAKE_AFFINE.

	if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
	    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {

		affine_sd = tmp;
		want_affine = 0;
	}
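
The per-domain qualification can be sketched in user space as a walk up the domain hierarchy. This is a simplified model, not the kernel's code: `struct domain`, its `span` bitmask and `pick_affine_domain()` are hypothetical stand-ins, and the flag value is illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

#define SD_WAKE_AFFINE 0x02u	/* illustrative bit value */

struct domain {			/* hypothetical stand-in for struct sched_domain */
	unsigned int flags;
	unsigned long span;	/* bitmask of the CPUs this domain covers */
	struct domain *parent;	/* next larger domain, NULL at the top */
};

/*
 * Walk from the smallest domain upward and return the first one that both
 * opts in to affine wakeups and contains prev_cpu; NULL if there is none.
 */
static const struct domain *pick_affine_domain(const struct domain *sd,
					       bool want_affine, int prev_cpu)
{
	for (; sd; sd = sd->parent) {
		if (want_affine && (sd->flags & SD_WAKE_AFFINE) &&
		    (sd->span & (1UL << prev_cpu)))
			return sd;
	}
	return NULL;
}
```

In this model a domain that clears the flag is skipped, but a larger domain that still sets it is selected instead, so opting out at one level does not by itself prevent an affine wakeup.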

> There are domains (for example the HT/SMT domain) that have good reason to want
> to opt out of this feature.

Even if you're sharing a cache, there are reasons to wake affine. If
the wakee can preempt the waker while it's still eligible to run, wakee
not only eats toasty warm data, it can hand the cpu back to the waker so
it can make more and repeat this procedure for a while without someone
else getting in between, and trashing cache. Also, for a task which
wakes another, then checks to see if it has more work, sleeps if not,
this preemption can keep that task running, saving wakeups. If you put
the wakee on a runqueue where it may have to wait even a tiny bit, buddy
goes to sleep, so that benefit is gone. These things have a HUGE effect
on scalability, as you can see below.

There are times when not waking affine is good, eg immediately after
fork(), it's _generally_ a good idea to not wake affine, because there
may be more on the way, a work generator like make, for example, doing
its thing, and fork() also frequently means an exec is on the way.
That's not usually a producer/consumer situation.

At low load, with producer/consumer, iff you can hit a shared cache,
it's a good idea to not wake affine, any waker/wakee overlap is pure
performance loss in that case. On my Q6600, there's a 1:3 chance of
hitting if left to random chance. You can see that case happening in
the pgsql+oltp numbers below. That wants further examination.
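
The 1:3 figure can be reproduced by counting cache peers, assuming the Q6600 is laid out as two dual-core dies that each share one L2 cache. The `cache_id` array and the helper name below are hypothetical, chosen only for this sketch.

```c
/*
 * cache_id[i] names the shared cache that CPU i sits behind.
 * Returns how many CPUs other than `cpu` share that cache.
 */
static int cache_sharing_peers(const int *cache_id, int ncpus, int cpu)
{
	int peers = 0;

	for (int i = 0; i < ncpus; i++)
		if (i != cpu && cache_id[i] == cache_id[cpu])
			peers++;
	return peers;
}
```

With `cache_id = {0, 0, 1, 1}`, each CPU has one peer among the three other CPUs, so a wakee placed at random on another CPU shares the waker's cache one time in three.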

> With this patch they can opt out, while all other domains currently default to
> the affine setting anyway.

Patch globally disabled affine wakeups. Not good.

Oh, btw, wrt affinity vs interrupt, a long time ago, I tried disabling
affine wakeups in hard/soft and both contexts. In all cases, it was a
losing proposition here.

One thing that would be nice for some mixed loads, including the desktop
is, if a cpu is doing high frequency sync/affine wakeups, try to keep
other things away from that cpu by considering synchronous tasks to
count as two instead of one load balancing wise.

(damn, i'm rambling.. time to shut up;)

Sorry for verbosity, numbers probably would have sufficed. I've been
overdosing on boring affinity/scalability testing ;)

tip v2.6.32-rc5-1691-g9a8523b

tbench 4
tip             936.314 MB/sec 8 procs
tip+patches     869.153 MB/sec 8 procs
ratio              .928

vmark
tip              125307 messages per second
tip+patches      103743 messages per second
ratio              .827

mysql+oltp
clients               1         2         4         8        16        32        64       128       256
tip            10013.90  18526.84  34900.38  34420.14  33069.83  32083.40  30578.30  28010.71  25605.47
tip+patches     8436.34  17826.34  34524.32  31471.92  29188.59  27896.10  26036.43  23774.57  19524.33
ratio              .842      .962      .989      .914      .882      .869      .851      .848      .762

pgsql+oltp
clients               1         2         4         8        16        32        64       128       256
tip            13907.85  27135.87  52951.98  52514.04  51742.52  50705.43  49947.97  48374.19  46227.94
tip+patches    15277.63  23050.99  51943.13  51937.16  42246.60  38397.86  34998.71  31154.21  26335.68
ratio             1.098      .849      .980      .989      .816      .757      .700      .644      .569
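
The unlabeled third row under each benchmark is the patched result divided by the unpatched one. A small sketch of that arithmetic, with a hypothetical helper name:

```c
/* Throughput with the patches applied, relative to the unpatched tip kernel. */
static double throughput_ratio(double patched, double baseline)
{
	return patched / baseline;
}
```

Values below 1 mean the patched kernel was slower. For example, `throughput_ratio(869.153, 936.314)` gives roughly 0.928, the tbench figure, while the single-client pgsql+oltp case is the only one in these numbers that comes out above 1.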


Peter Zijlstra
Posted: Sun Oct 25, 2009 5:10 am    Post subject: Re: [PATCH 1/3] sched: Enable wake balancing for the SMT/HT domain

On Sat, 2009-10-24 at 12:58 -0700, Arjan van de Ven wrote:
> Subject: sched: Enable wake balancing for the SMT/HT domain
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> Logical CPUs that are part of a hyperthreading/SMT set are equivalent
> in terms of where to execute a task; after all they share pretty much
> all resources including the L1 cache.
>
> This means that if task A wakes up task B, we should really consider
> all logical CPUs in the SMT/HT set to run task B, not just the CPU that
> task A is running on; in case task A keeps running, task B now gets to
> execute with no latency. In the case where task A then immediately goes
> to wait for a response from task B, nothing is lost due to the aforementioned
> equivalency.
>
> This patch turns on the "balance on wakeup" and turns off "affine wakeups"
> for the SMT/HT scheduler domain to get this lower latency behavior.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
>
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index fc0bf3e..3665dc2 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -95,8 +95,8 @@ int arch_update_cpu_topology(void);
> | 1*SD_BALANCE_NEWIDLE \
> | 1*SD_BALANCE_EXEC \
> | 1*SD_BALANCE_FORK \
> - | 0*SD_BALANCE_WAKE \
> - | 1*SD_WAKE_AFFINE \
> + | 1*SD_BALANCE_WAKE \
> + | 0*SD_WAKE_AFFINE \
> | 1*SD_SHARE_CPUPOWER \
> | 0*SD_POWERSAVINGS_BALANCE \
> | 0*SD_SHARE_PKG_RESOURCES \
>

So you're poking at SD_SIBLING_INIT, right?

That seems to make sense. Now doing the same for a cache level domain
(MC is almost that) might also make sense.

Peter Zijlstra
Posted: Sun Oct 25, 2009 5:10 am    Post subject: Re: [PATCH 2/3] sched: Add aggressive load balancing for certain situations

On Sat, 2009-10-24 at 13:04 -0700, Arjan van de Ven wrote:
> Subject: sched: Add aggressive load balancing for certain situations
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> The scheduler, in its "find idlest group" function currently has an unconditional
> threshold for an imbalance, before it will consider moving a task.
>
> However, there are situations where this is undesirable, and we want to opt in to a
> more aggressive load balancing algorithm to minimize latencies.
>
> This patch adds the infrastructure for this and also adds two cases for which
> we select the aggressive approach
> 1) From interrupt context. Events that happen in irq context are very likely,
> as a heuristic, to show latency sensitive behavior
> 2) When doing a wake_up() and the scheduler domain we're investigating has the
> flag set that opts in to load balancing during wake_up()
> (for example the SMT/HT domain)
>
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>



> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 4e777b4..fe9b95b 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1246,7 +1246,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
> */
> static struct sched_group *
> find_idlest_group(struct sched_domain *sd, struct task_struct *p,
> - int this_cpu, int load_idx)
> + int this_cpu, int load_idx, int agressive)
> {

can't we fold that into load_idx? like -1 or something?

Peter Zijlstra
Posted: Sun Oct 25, 2009 5:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Sat, 2009-10-24 at 13:07 -0700, Arjan van de Ven wrote:
> Subject: sched: Disable affine wakeups by default
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> The global affine wakeup scheduler feature sounds nice, but there is a problem
> with this: This is ALSO a per scheduler domain feature already.
> By having the global scheduler feature enabled by default, the scheduler domains
> no longer have the option to opt out.
>
> There are domains (for example the HT/SMT domain) that have good reason to want
> to opt out of this feature.
>
> With this patch they can opt out, while all other domains currently default to
> the affine setting anyway.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
>

Hell no, that'll destroy many workloads. What you could possibly do is
disable it for sched domains that are known to share cache, maybe.

Peter Zijlstra
Posted: Sun Oct 25, 2009 8:10 am    Post subject: Re: [PATCH 2/3] sched: Add aggressive load balancing for certain situations

On Sun, 2009-10-25 at 09:01 +0100, Peter Zijlstra wrote:
>
> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > index 4e777b4..fe9b95b 100644
> > --- a/kernel/sched_fair.c
> > +++ b/kernel/sched_fair.c
> > @@ -1246,7 +1246,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
> > */
> > static struct sched_group *
> > find_idlest_group(struct sched_domain *sd, struct task_struct *p,
> > - int this_cpu, int load_idx)
> > + int this_cpu, int load_idx, int agressive)
> > {
>
> can't we fold that into load_idx? like -1 or something?

A better alternative might be passing imbalance along instead.
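
Passing the imbalance along would replace the boolean with the threshold itself, so an "aggressive" caller simply supplies a lower number. A minimal sketch, assuming the threshold is a percentage applied to the local and idlest group loads; the helper name and the exact comparison are illustrative and not taken from kernel/sched_fair.c:

```c
/*
 * Hypothetical helper: decide whether a waking task should leave the local
 * group. imbalance_pct is the threshold in percent; 100 means "move as soon
 * as the local group is at least as loaded as the idlest one".
 */
static int should_move_to_idlest(unsigned long this_load,
				 unsigned long min_load,
				 unsigned int imbalance_pct)
{
	return 100UL * this_load >= (unsigned long)imbalance_pct * min_load;
}
```

With loads of 110 (local) and 100 (idlest), a threshold of 112 keeps the task local, while a threshold of 100 moves it. The caller picks the policy and the function needs no extra flag.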

Arjan van de Ven
Posted: Sun Oct 25, 2009 2:10 pm    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Sun, 25 Oct 2009 07:55:25 +0100
Mike Galbraith <efault@gmx.de> wrote:
> Even if you're sharing a cache, there are reasons to wake affine. If
> the wakee can preempt the waker while it's still eligible to run,
> wakee not only eats toasty warm data, it can hand the cpu back to the
> waker so it can make more and repeat this procedure for a while
> without someone else getting in between, and trashing cache.

and on the flipside, and this is the workload I'm looking at,
this is halving your performance roughly due to one core being totally
busy while the other one is idle.

My workload is a relatively simple situation: firefox is starting up
and talking to X. I suspect this is representative for many X using
applications in the field. The application sends commands to X, but is
not (yet) going to wait for a response, it has more work to do.
In this case the affine behavior does not only cause latency, but it
also eats the throughput performance.

This is due to a few things that compound, but a key one is this code:

	if (sd_flag & SD_BALANCE_WAKE) {
		if (sched_feat(AFFINE_WAKEUPS) &&
		    cpumask_test_cpu(cpu, &p->cpus_allowed))
			want_affine = 1;
		new_cpu = prev_cpu;
	}

the problem is that

	if (affine_sd && wake_affine(affine_sd, p, sync)) {
		new_cpu = cpu;
		goto out;
	}

this then will trigger later, as long as there is any domain that has
SD_WAKE_AFFINE set ;(

(part of that problem is that the code that sets affine_sd is done
before the
	if (!(tmp->flags & sd_flag))
		continue;
test)
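
The effect of that ordering can be shown with a reduced model of the domain loop. This is a sketch under assumptions: the flag values are illustrative, `first_affine_domain()` is a hypothetical helper, and the `filter_first` switch models a reordering that is not proposed anywhere in this thread.

```c
#include <stddef.h>

#define SD_BALANCE_WAKE 0x01u	/* illustrative bit values */
#define SD_WAKE_AFFINE  0x02u

/*
 * flags[] lists the domain flags from the lowest level to the highest.
 * Returns the index of the first domain that would become affine_sd, or -1.
 * filter_first != 0 skips domains lacking sd_flag before the affine test;
 * filter_first == 0 runs the affine test on every domain, which is the
 * ordering described above.
 */
static int first_affine_domain(const unsigned int *flags, size_t n,
			       unsigned int sd_flag, int filter_first)
{
	for (size_t i = 0; i < n; i++) {
		if (filter_first && !(flags[i] & sd_flag))
			continue;
		if (flags[i] & SD_WAKE_AFFINE)
			return (int)i;
	}
	return -1;
}
```

When the lowest domain sets only `SD_WAKE_AFFINE`, the unfiltered walk picks it even though it never opted in to `sd_flag`; the filtered walk passes over it and picks the next domain that sets both.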


The numbers you posted are for a database, and only measure throughput.
There's more to the world than just databases / throughput-only
computing, and I'm trying to find low impact ways to reduce the latency
aspect of things. One obvious candidate is hyperthreading/SMT where it
IS basically free to switch to a sibling, so wake-affine does not
really make sense there.

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Mike Galbraith
Posted: Sun Oct 25, 2009 2:10 pm    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Sun, 2009-10-25 at 09:51 -0700, Arjan van de Ven wrote:
> On Sun, 25 Oct 2009 07:55:25 +0100
> Mike Galbraith <efault@gmx.de> wrote:
> > Even if you're sharing a cache, there are reasons to wake affine. If
> > the wakee can preempt the waker while it's still eligible to run,
> > wakee not only eats toasty warm data, it can hand the cpu back to the
> > waker so it can make more and repeat this procedure for a while
> > without someone else getting in between, and trashing cache.
>
> and on the flipside, and this is the workload I'm looking at,
> this is halving your performance roughly due to one core being totally
> busy while the other one is idle.

Yeah, the "one pgsql+oltp pair" in the numbers I posted show that
problem really well. If you can hit an idle shared cache at low load,
go for it every time. The rest of the numbers just show how big the
penalty is if you solve affinity problems with an 8" howitzer :)

> My workload is a relatively simple situation: firefox is starting up
> and talking to X. I suspect this is representative for many X using
> applications in the field. The application sends commands to X, but is
> not (yet) going to wait for a response, it has more work to do.
> In this case the affine behavior does not only cause latency, but it
> also eats the throughput performance.

Yeah. Damned if you do, damned if you don't.

> This is due to a few things that compound, but a key one is this code:
>
> 	if (sd_flag & SD_BALANCE_WAKE) {
> 		if (sched_feat(AFFINE_WAKEUPS) &&
> 		    cpumask_test_cpu(cpu, &p->cpus_allowed))
> 			want_affine = 1;
> 		new_cpu = prev_cpu;
> 	}
>
> the problem is that
>
> 	if (affine_sd && wake_affine(affine_sd, p, sync)) {
> 		new_cpu = cpu;
> 		goto out;
> 	}
>
> this then will trigger later, as long as there is any domain that has
> SD_WAKE_AFFINE set ;(

And the task looks like a synchronous task.

> (part of that problem is that the code that sets affine_sd is done
> before the
> 	if (!(tmp->flags & sd_flag))
> 		continue;
> test)

Hm. That looks like a bug, but after any task has scheduled a few
times, if it looks like a synchronous task, it'll glue itself to its
waker's runqueue regardless. Initial wakeup may disperse, but it will
come back if it's not overlapping.

> The numbers you posted are for a database, and only measure throughput.
> There's more to the world than just databases / throughput-only
> computing, and I'm trying to find low impact ways to reduce the latency
> aspect of things. One obvious candidate is hyperthreading/SMT where it
> IS basically free to switch to a sibling, so wake-affine does not
> really make sense there.

It's also almost free on my Q6600 if we aimed for idle shared cache.

I agree fully that affinity decisions could be more perfect than they
are. Getting it wrong is very expensive either way.

-Mike

Arjan van de Ven
Posted: Sun Oct 25, 2009 4:10 pm    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Sun, 25 Oct 2009 18:38:09 +0100
Mike Galbraith <efault@gmx.de> wrote:
> > > Even if you're sharing a cache, there are reasons to wake
> > > affine. If the wakee can preempt the waker while it's still
> > > eligible to run, wakee not only eats toasty warm data, it can
> > > hand the cpu back to the waker so it can make more and repeat
> > > this procedure for a while without someone else getting in
> > > between, and trashing cache.
> >
> > and on the flipside, and this is the workload I'm looking at,
> > this is halving your performance roughly due to one core being
> > totally busy while the other one is idle.
>
> Yeah, the "one pgsql+oltp pair" in the numbers I posted show that
> problem really well. If you can hit an idle shared cache at low load,
> go for it every time.

sadly the current code does not do this ;(
my patch might be too big an axe for it, but it does solve this part ;)

I'll keep digging to see if we can do a more micro-incursion.

> Hm. That looks like a bug, but after any task has scheduled a few
> times, if it looks like a synchronous task, it'll glue itself to its
> waker's runqueue regardless. Initial wakeup may disperse, but it will
> come back if it's not overlapping.

the problem is the "synchronous to WHAT" question.
It may be synchronous to the disk for example; in the testcase I'm
looking at, we get "send message to X. do some more code. hit a page
cache miss and do IO" quite a bit.

> > The numbers you posted are for a database, and only measure
> > throughput. There's more to the world than just databases /
> > throughput-only computing, and I'm trying to find low impact ways
> > to reduce the latency aspect of things. One obvious candidate is
> > hyperthreading/SMT where it IS basically free to switch to a
> > sibling, so wake-affine does not really make sense there.
>
> It's also almost free on my Q6600 if we aimed for idle shared cache.

yeah multicore with shared cache falls for me in the same bucket.

> I agree fully that affinity decisions could be more perfect than they
> are. Getting it wrong is very expensive either way.

Looks like we agree on a key principle:
If there is a free cpu "close enough" (SMT or MC basically), the
wakee should just run on that.

we may not agree on what to do if there's no completely free logical
cpu, but a much lighter loaded one instead.
but first we need to let code speak ;)

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Peter Zijlstra
Posted: Sun Oct 25, 2009 11:10 pm    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Sun, 2009-10-25 at 23:04 +0100, Mike Galbraith wrote:
> if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
> - cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
> + (level == SD_LV_SIBLING || level == SD_LV_MC)) {

quick comment without actually having looked at the patch, we should
really get rid of sd->level and encode properties of the sched domains
in sd->flags.

Mike Galbraith
Posted: Mon Oct 26, 2009 1:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Mon, 2009-10-26 at 02:53 +0100, Peter Zijlstra wrote:
> On Sun, 2009-10-25 at 23:04 +0100, Mike Galbraith wrote:
> > if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
> > - cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
> > + (level == SD_LV_SIBLING || level == SD_LV_MC)) {
>
> quick comment without actually having looked at the patch, we should
> really get rid of sd->level and encode properties of the sched domains
> in sd->flags.

Yeah, sounds right, while writing that, it looked kinda ugly. I suppose
arch land needs to encode cache property somehow if I really want to be
able to target cache on multicore. Booting becomes.. exciting when I
tinker down there.

While tinkering with this, I noticed that when mysql+oltp starts
tripping over itself, if you move to any momentarily idle cpu, it helps
get the load moving again, the tail improves. Not hugely, but quite
measurable. There seems to be benefit to be had throughout the load
spectrum, just gotta figure out how to retrieve it without losing
anything.

-Mike

Arjan van de Ven
Posted: Mon Oct 26, 2009 2:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Mon, 26 Oct 2009 05:38:27 +0100
Mike Galbraith <efault@gmx.de> wrote:

> On Mon, 2009-10-26 at 02:53 +0100, Peter Zijlstra wrote:
> > On Sun, 2009-10-25 at 23:04 +0100, Mike Galbraith wrote:
> > > if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
> > > - cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
> > > + (level == SD_LV_SIBLING || level == SD_LV_MC)) {
> >
> > quick comment without actually having looked at the patch, we should
> > really get rid of sd->level and encode properties of the sched
> > domains in sd->flags.
>
> Yeah, sounds right, while writing that, it looked kinda ugly. I
> suppose arch land needs to encode cache property somehow if I really
> want to be able to target cache on multicore. Booting becomes..
> exciting when I tinker down there.

or we just use SD_WAKE_AFFINE / SD_BALANCE_WAKE for this...



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Mike Galbraith
Posted: Mon Oct 26, 2009 2:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Sun, 2009-10-25 at 21:52 -0700, Arjan van de Ven wrote:
> On Mon, 26 Oct 2009 05:38:27 +0100
> Mike Galbraith <efault@gmx.de> wrote:
>
> > On Mon, 2009-10-26 at 02:53 +0100, Peter Zijlstra wrote:
> > > On Sun, 2009-10-25 at 23:04 +0100, Mike Galbraith wrote:
> > > > if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
> > > > - cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
> > > > + (level == SD_LV_SIBLING || level == SD_LV_MC)) {
> > >
> > > quick comment without actually having looked at the patch, we should
> > > really get rid of sd->level and encode properties of the sched
> > > domains in sd->flags.
> >
> > Yeah, sounds right, while writing that, it looked kinda ugly. I
> > suppose arch land needs to encode cache property somehow if I really
> > want to be able to target cache on multicore. Booting becomes..
> > exciting when I tinker down there.
>
> or we just use SD_WAKE_AFFINE / SD_BALANCE_WAKE for this...

I don't see how. Oh, you mean another domain level, top level being
cache property, and turn off when degenerating? That looks like it'd be
a problem, but adding SD_CACHE_SIBLING or whatnot should work. Problem
is how to gain knowledge of whether multicores share a cache or not.
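
The difference between testing the domain level and testing a property flag can be sketched as follows. `SD_CACHE_SIBLING` is the hypothetical flag named above and has no definition in this thread; its bit value, `struct domain` and `SD_LV_OTHER` are made up for illustration.

```c
#define SD_CACHE_SIBLING 0x100u	/* hypothetical flag, illustrative bit value */

enum sd_level { SD_LV_SIBLING, SD_LV_MC, SD_LV_OTHER };	/* SD_LV_OTHER is made up */

struct domain {			/* stand-in for struct sched_domain */
	enum sd_level level;
	unsigned int flags;
};

/* Level-based test, as in the quoted patch fragment. */
static int shares_cache_by_level(const struct domain *sd)
{
	return sd->level == SD_LV_SIBLING || sd->level == SD_LV_MC;
}

/* Flag-based test: the domain itself states whether its CPUs share a cache. */
static int shares_cache_by_flag(const struct domain *sd)
{
	return (sd->flags & SD_CACHE_SIBLING) != 0;
}
```

The level test assumes every multicore domain shares a cache. The flag test leaves that decision to whoever builds the domain, which is why the open question becomes how the arch code learns whether the cores share a cache.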

-Mike

Arjan van de Ven
Posted: Mon Oct 26, 2009 2:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Mon, 26 Oct 2009 06:08:54 +0100
Mike Galbraith <efault@gmx.de> wrote:

> On Sun, 2009-10-25 at 21:52 -0700, Arjan van de Ven wrote:
> > On Mon, 26 Oct 2009 05:38:27 +0100
> > Mike Galbraith <efault@gmx.de> wrote:
> >
> > > On Mon, 2009-10-26 at 02:53 +0100, Peter Zijlstra wrote:
> > > > On Sun, 2009-10-25 at 23:04 +0100, Mike Galbraith wrote:
> > > > > if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
> > > > > - cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
> > > > > + (level == SD_LV_SIBLING || level == SD_LV_MC)) {
> > > >
> > > > quick comment without actually having looked at the patch, we
> > > > should really get rid of sd->level and encode properties of the
> > > > sched domains in sd->flags.
> > >
> > > Yeah, sounds right, while writing that, it looked kinda ugly. I
> > > suppose arch land needs to encode cache property somehow if I
> > > really want to be able to target cache on multicore. Booting
> > > becomes.. exciting when I tinker down there.
> >
> > or we just use SD_WAKE_AFFINE / SD_BALANCE_WAKE for this...
>
> I don't see how. Oh, you mean another domain level, top level being
> cache property, and turn off when degenerating? That looks like it'd
> be a problem, but adding SD_CACHE_SIBLING or whatnot should work.
> Problem is how to gain knowledge of whether multicores share a cache
> or not.

Actually I meant setting the SD_BALANCE_WAKE flag for the SMT and MC
domains (and then making sure that "MC" really means "shares LLC" in
the arch code), and then using this as indication in the sched code..

if you're a multicore domain you better have a shared cache.. that's
what it should mean. If it does not we should fix that.

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Mike Galbraith
Posted: Mon Oct 26, 2009 2:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Sun, 2009-10-25 at 22:36 -0700, Arjan van de Ven wrote:
> On Mon, 26 Oct 2009 06:08:54 +0100
> Mike Galbraith <efault@gmx.de> wrote:

> > >
> > > or we just use SD_WAKE_AFFINE / SD_BALANCE_WAKE for this...
> >
> > I don't see how. Oh, you mean another domain level, top level being
> > cache property, and turn off when degenerating? That looks like it'd
> > be a problem, but adding SD_CACHE_SIBLING or whatnot should work.
> > Problem is how to gain knowledge of whether multicores share a cache
> > or not.
>
> Actually I meant setting the SD_BALANCE_WAKE flag for the SMT and MC
> domains (and then making sure that "MC" really means "shares LLC" in
> the arch code), and then using this as indication in the sched code..

I don't think we can do that, because SD_WAKE_BALANCE already has a
different meaning. SD_WAKE_AFFINE could be used though, affine wakeups
have always been a cache thing, and for trying to keep things affine to
a package or whatnot, we have SD_PREFER_LOCAL. Sounds clean to me.

> if you're a multicore domain you better have a shared cache.. that's
> what it should mean. If it does not we should fix that.

Sounds reasonable to me. I'll go make explosions.

-Mike

Mike Galbraith
External


Since: May 26, 2006
Posts: 368



PostPosted: Mon Oct 26, 2009 3:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Mon, 2009-10-26 at 06:47 +0100, Mike Galbraith wrote:
> On Sun, 2009-10-25 at 22:36 -0700, Arjan van de Ven wrote:

> > if you're a multicore domain you better have a shared cache.. that's
> > what it should mean. If it does not we should fix that.
>
> Sounds reasonable to me. I'll go make explosions.

(Actually, if multicore and sibling does indeed mean shared cache, no
arch tinkering should be necessary, just reset SD_WAKE_AFFINE when
degenerating should work fine. Only thing is multicore with siblings..
and test test test)

-Mike

Arjan van de Ven
External


Since: May 15, 2006
Posts: 901



PostPosted: Mon Oct 26, 2009 4:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Mon, 26 Oct 2009 08:01:12 +0100
Ingo Molnar <mingo.TakeThisOut@elte.hu> wrote:

>
> * Mike Galbraith <efault.TakeThisOut@gmx.de> wrote:
>
> > On Mon, 2009-10-26 at 06:47 +0100, Mike Galbraith wrote:
> > > On Sun, 2009-10-25 at 22:36 -0700, Arjan van de Ven wrote:
> >
> > > > if you're a multicore domain you better have a shared cache..
> > > > that's what it should mean. If it does not we should fix that.
> > >
> > > Sounds reasonable to me. I'll go make explosions.
> >
> > (Actually, if multicore and sibling does indeed mean shared cache,
> > no arch tinkering should be necessary, just reset SD_WAKE_AFFINE
> > when degenerating should work fine. Only thing is multicore with
> > siblings.. and test test test)
>
> Correct. There's a few cpus where multicore means separate caches but
> all modern CPUs have shared caches for cores so we want to tune for
> that.

for those cpus where mc means separate caches we should fix the arch
code to set up separate MC domains to be honest..
I can look into that in a bit..


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
Ingo Molnar
External


Since: May 15, 2006
Posts: 3112



PostPosted: Mon Oct 26, 2009 4:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

* Mike Galbraith <efault.RemoveThis@gmx.de> wrote:

> On Mon, 2009-10-26 at 06:47 +0100, Mike Galbraith wrote:
> > On Sun, 2009-10-25 at 22:36 -0700, Arjan van de Ven wrote:
>
> > > if you're a multicore domain you better have a shared cache..
> > > that's what it should mean. If it does not we should fix that.
> >
> > Sounds reasonable to me. I'll go make explosions.
>
> (Actually, if multicore and sibling does indeed mean shared cache, no
> arch tinkering should be necessary, just reset SD_WAKE_AFFINE when
> degenerating should work fine. Only thing is multicore with
> siblings.. and test test test)

Correct. There's a few cpus where multicore means separate caches but
all modern CPUs have shared caches for cores so we want to tune for
that.

Ingo
Suresh Siddha
External


Since: Jan 12, 2009
Posts: 22



PostPosted: Mon Oct 26, 2009 8:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Mon, 2009-10-26 at 00:05 -0700, Arjan van de Ven wrote:
> On Mon, 26 Oct 2009 08:01:12 +0100
> Ingo Molnar <mingo RemoveThis @elte.hu> wrote:
>
> > Correct. There's a few cpus where multicore means separate caches but
> > all modern CPUs have shared caches for cores so we want to tune for
> > that.
>
> for those cpus where mc means separate caches we should fix the arch
> code to set up separate MC domains to be honest..
> I can look into that in a bit..

In the default performance mode, multi-core domain is populated with
only cores sharing last-level cache. In the case where the cores don't
share caches, we represent them in the smp domain.

thanks,
suresh
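
The grouping described above can be illustrated with a toy model. This is only a sketch, not the kernel's topology code: `NR_CPUS`, the `llc_id[]` array, and `mc_span()` are hypothetical names invented for the example, and a plain bitmask stands in for a cpumask. Each CPU's multi-core span is the set of CPUs reporting the same last-level cache id.

```c
#include <assert.h>

#define NR_CPUS 8	/* hypothetical toy machine size */

/*
 * Toy model: build the multi-core span for 'cpu' as a bitmask of all
 * CPUs whose last-level cache id matches. llc_id[] is a made-up input,
 * not a kernel data structure.
 */
static unsigned int mc_span(const int llc_id[NR_CPUS], int cpu)
{
	unsigned int mask = 0;
	int i;

	assert(cpu >= 0 && cpu < NR_CPUS);
	for (i = 0; i < NR_CPUS; i++)
		if (llc_id[i] == llc_id[cpu])
			mask |= 1u << i;
	return mask;
}
```

With two four-core packages that each share one cache, every CPU's span covers its own package. If every core has a private cache, each span collapses to a single CPU, so in this model the cores would first meet at the next wider level.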

Mike Galbraith
External


Since: May 26, 2006
Posts: 368



PostPosted: Tue Oct 27, 2009 11:10 am    Post subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default

On Mon, 2009-10-26 at 02:53 +0100, Peter Zijlstra wrote:
> On Sun, 2009-10-25 at 23:04 +0100, Mike Galbraith wrote:
> > if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
> > - cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
> > + (level == SD_LV_SIBLING || level == SD_LV_MC)) {
>
> quick comment without actually having looked at the patch, we should
> really get rid of sd->level and encode properties of the sched domains
> in sd->flags.

I used SD_PREFER_SIBLING in the below. Did I break anything?

(wonder what it does for pgsql+oltp on beefy box with siblings)

tip v2.6.32-rc5-1724-g77a088c

mysql+oltp
clients          1         2         4         8        16        32        64       128       256
tip        9999.77  18472.11  34931.60  34412.09  33006.76  32104.36  30700.47  28111.31  25535.09
          10082.75  18625.12  34928.17  34476.91  33088.70  32002.36  30695.77  28173.94  25551.05
           9949.05  18466.54  34942.66  34420.74  33092.45  32041.10  30666.43  28090.90  25467.63
tip avg   10010.52  18521.25  34934.14  34436.58  33062.63  32049.27  30687.55  28125.38  25517.92

tip+       9622.23  18297.65  34496.12  34230.85  32704.20  31796.54  30480.45  27740.20  25394.12
          10207.79  18275.83  34622.39  34222.47  32996.69  31936.48  30551.29  28144.48  25616.62
          10225.32  18515.02  34538.41  34278.06  33014.14  31965.31  30363.90  28089.41  25531.81
tip+ avg  10018.44  18362.83  34552.30  34243.79  32905.01  31899.44  30465.21  27991.36  25514.18
vs tip       1.000      .991      .989      .994      .995      .995      .992      .995      .999

pgsql+oltp
clients          1         2         4         8        16        32        64       128       256
tip       13945.42  26973.91  52504.18  52613.32  51310.82  50442.61  49826.52  48760.62  45570.45
          13921.41  27021.48  52722.64  52565.16  51483.19  50638.83  49499.51  48621.31  46115.77
          13924.94  26961.02  52624.45  52365.49  51384.91  50499.44  49622.83  48065.03  45743.14
tip avg   13930.59  26985.47  52617.09  52514.65  51392.97  50526.96  49649.62  48482.32  45809.78

tip+      15259.79  29162.31  52609.01  52562.16  51578.48  50631.90  49537.41  48376.23  46058.95
          15156.54  29114.10  52760.02  52524.86  51412.94  50656.30  48774.34  47968.77  45905.02
          15118.64  29190.73  52929.34  52503.58  51574.34  50232.27  49599.15  48283.42  45766.74
tip+ avg  15178.32  29155.71  52766.12  52530.20  51521.92  50506.82  49303.63  48209.47  45910.23
vs tip       1.089     1.080     1.002     1.000     1.002      .999      .993      .994     1.002
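
The "vs tip" rows appear to be the tip+ average divided by the tip average at each client count. For pgsql+oltp with one client that is 15178.32 / 13930.59, roughly 1.0896, which the table shows as 1.089, so the figures look truncated, not rounded. A minimal sketch of that arithmetic follows; the helper names `mean()` and `vs_baseline()` are made up for illustration and are not part of the benchmark tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n throughput samples (n must be > 0). */
static double mean(const double *samples, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/* Ratio of the patched average to the baseline average. */
static double vs_baseline(const double *patched, const double *base, size_t n)
{
	return mean(patched, n) / mean(base, n);
}
```

Feeding in the three single-client pgsql+oltp runs from each kernel gives a ratio between 1.089 and 1.090.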

sched: check for an idle shared cache in select_task_rq_fair()

When waking affine, check for an idle shared cache, and if found, wake to
that CPU/sibling instead of the waker's CPU. This improves pgsql+oltp
ramp up by roughly 8%. Possibly more for other loads, depending on overlap.
The trade-off is a roughly 1% peak downturn if tasks are truly synchronous.

Signed-off-by: Mike Galbraith <efault.RemoveThis@gmx.de>
Cc: Ingo Molnar <mingo.RemoveThis@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra.RemoveThis@chello.nl>
LKML-Reference: <new-submission>

---
kernel/sched_fair.c | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1398,11 +1398,36 @@ static int select_task_rq_fair(struct ta
 				want_sd = 0;
 		}

-		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
-		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
+		if (want_affine && (tmp->flags & SD_WAKE_AFFINE)) {
+			int candidate = -1, i;

-			affine_sd = tmp;
-			want_affine = 0;
+			if (cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
+				candidate = cpu;
+
+			/*
+			 * Check for an idle shared cache.
+			 */
+			if (tmp->flags & SD_PREFER_SIBLING) {
+				if (candidate == cpu) {
+					if (!cpu_rq(prev_cpu)->cfs.nr_running)
+						candidate = prev_cpu;
+				}
+
+				if (candidate == -1 || candidate == cpu) {
+					for_each_cpu(i, sched_domain_span(tmp)) {
+						if (!cpu_rq(i)->cfs.nr_running) {
+							candidate = i;
+							break;
+						}
+					}
+				}
+			}
+
+			if (candidate >= 0) {
+				affine_sd = tmp;
+				want_affine = 0;
+				cpu = candidate;
+			}
 		}

 		if (!want_sd && !want_affine)
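
For readers who want to experiment with the selection order outside the kernel, the candidate logic in the hunk above can be mirrored in a standalone toy. This is a sketch under stated assumptions, not kernel code: `NR_CPUS`, `pick_wake_cpu()`, the `nr_running[]` array, and the `span` bitmask are hypothetical stand-ins for `cpu_rq(i)->cfs.nr_running` and `sched_domain_span(tmp)`, and `shares_cache` stands in for the `SD_PREFER_SIBLING` test.

```c
#include <assert.h>

#define NR_CPUS 8	/* hypothetical toy machine size */

/*
 * Toy model of the candidate selection for one domain level.
 * Returns the chosen CPU, or -1 if this domain yields no candidate.
 */
static int pick_wake_cpu(const unsigned int nr_running[NR_CPUS],
			 unsigned int span, int shares_cache,
			 int cpu, int prev_cpu)
{
	int candidate = -1;
	int i;

	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(prev_cpu >= 0 && prev_cpu < NR_CPUS);

	/* Plain affine wakeup: prev_cpu lies inside this domain. */
	if (span & (1u << prev_cpu))
		candidate = cpu;

	if (shares_cache) {
		/* Prefer the task's previous CPU if it is idle. */
		if (candidate == cpu && !nr_running[prev_cpu])
			candidate = prev_cpu;

		/* Otherwise take the first idle CPU in the span. */
		if (candidate == -1 || candidate == cpu) {
			for (i = 0; i < NR_CPUS; i++) {
				if ((span & (1u << i)) && !nr_running[i]) {
					candidate = i;
					break;
				}
			}
		}
	}
	return candidate;
}
```

The order matters: an idle previous CPU wins over any other idle CPU in the span, and the waker's CPU is only kept when nothing sharing the cache is idle.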

