Help!

[PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

 
  

Goto page Previous  1, 2, 3
Post new topic   General Reply to Topic (not reply to a specific post)    Forums Home -> Kernel (archive) RSS
Next:  [PATCH] i386-pda: Initialize the PDA early, befor..  
Author Message
Jakob Oestergaard
External


Since: May 16, 2006
Posts: 16



PostPosted: Thu Sep 14, 2006 9:50 am    Post subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction [Login to view extended thread Info.]
Archived from groups: linux>kernel (more info?)

On Wed, Sep 13, 2006 at 12:17:55PM -0700, Dan Williams wrote:
....
> >Out of curiosity; how does accelerated compare to non-accelerated?
>
> One quick example:
> 4-disk SATA array rebuild on iop321 without acceleration - 'top'
> reports md0_resync and md0_raid5 dueling for the CPU each at ~50%
> utilization.
>
> With acceleration - 'top' reports md0_resync cpu utilization at ~90%
> with the rest split between md0_raid5 and md0_raid5_ops.
>
> The sync speed reported by /proc/mdstat is ~40% higher in the accelerated
> case.

Ok, nice Smile

>
> That being said, array resync is a special case, so your mileage may
> vary with other applications.

Every-day usage I/O performance data would be nice indeed Smile

> I will put together some data from bonnie++, iozone, maybe contest,
> and post it on SourceForge.

Great!

--

/ jakob

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Olof Johansson
External


Since: Aug 16, 2005
Posts: 34



PostPosted: Fri Sep 15, 2006 4:50 pm    Post subject: Re: [PATCH 09/19] dmaengine: reduce backend address permutations [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Hi,

On Mon, 11 Sep 2006 16:18:23 -0700 Dan Williams <dan.j.williams DeleteThis @intel.com> wrote:

> From: Dan Williams <dan.j.williams DeleteThis @intel.com>
>
> Change the backend dma driver API to accept a 'union dmaengine_addr'. The
> intent is to be able to support a wide range of frontend address type
> permutations without needing an equal number of function type permutations
> on the backend.

Please do the cleanup of existing code before you apply new function.
Earlier patches in this series added code that you're modifying here.
If you modify the existing code first it's less churn for everyone to
review.


Thanks,

Olof
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Olof Johansson
External


Since: Aug 16, 2005
Posts: 34



PostPosted: Fri Sep 15, 2006 5:00 pm    Post subject: Re: [PATCH 16/19] dmaengine: Driver for the Intel IOP 32x, 33x, and 13xx RAID engines [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Hi,

On Mon, 11 Sep 2006 16:19:00 -0700 Dan Williams <dan.j.williams DeleteThis @intel.com> wrote:

> From: Dan Williams <dan.j.williams DeleteThis @intel.com>
>
> This is a driver for the iop DMA/AAU/ADMA units which are capable of pq_xor,
> pq_update, pq_zero_sum, xor, dual_xor, xor_zero_sum, fill, copy+crc, and copy
> operations.

You implement a bunch of different functions here. I agree with Jeff's
feedback related to the lack of scalability the way the API is going
right now.

Another example of this is that the driver is doing it's own self-test
of the functions. This means that every backend driver will need to
duplicate this code. Wouldn't it be easier for everyone if the common
infrastructure did a test call at the time of registration of a
function instead, and return failure if it doesn't pass?

> drivers/dma/Kconfig | 27 +
> drivers/dma/Makefile | 1
> drivers/dma/iop-adma.c | 1501 +++++++++++++++++++++++++++++++++++
> include/asm-arm/hardware/iop_adma.h | 98 ++

ioatdma.h is currently under drivers/dma/. If the contents is strictly
device-related please add them under drivers/dma.


-Olof
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Olof Johansson
External


Since: Aug 16, 2005
Posts: 34



PostPosted: Fri Sep 15, 2006 6:50 pm    Post subject: Re: [PATCH 08/19] dmaengine: enable multiple clients and operations [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Mon, 11 Sep 2006 19:44:16 -0400 Jeff Garzik <jeff.DeleteThis@garzik.org> wrote:

> Dan Williams wrote:
> > @@ -759,8 +755,10 @@ #endif
> > device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
> > device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
> > device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
> > - device->common.device_memcpy_complete = ioat_dma_is_complete;
> > - device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
> > + device->common.device_operation_complete = ioat_dma_is_complete;
> > + device->common.device_xor_pgs_to_pg = dma_async_xor_pgs_to_pg_err;
> > + device->common.device_issue_pending = ioat_dma_memcpy_issue_pending;
> > + device->common.capabilities = DMA_MEMCPY;
>
>
> Are we really going to add a set of hooks for each DMA engine whizbang
> feature?
>
> That will get ugly when DMA engines support memcpy, xor, crc32, sha1,
> aes, and a dozen other transforms.


Yes, it will be unmaintainable. We need some sort of multiplexing with
per-function registrations.

Here's a first cut at it, just very quick. It could be improved further
but it shows that we could exorcise most of the hardcoded things pretty
easily.

Dan, would this fit with your added XOR stuff as well? If so, would you
mind rebasing on top of something like this (with your further cleanups
going in before added function, please. Smile

(Build tested only, since I lack Intel hardware).


It would be nice if we could move the type specification to only be
needed in the channel allocation. I don't know how well that fits the
model for some of the hardware platforms though, since a single channel
might be shared for different types of functions. Maybe we need a
different level of abstraction there instead, i.e. divorce the hardware
channel and software channel model and have several software channels
map onto a hardware one.





Clean up the DMA API a bit, allowing each engine to register an array
of supported functions instead of allocating static names for each possible
function.


Signed-off-by: Olof Johansson <olof.DeleteThis@lixom.net>


diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 1527804..282ce85 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -80,7 +80,7 @@ static ssize_t show_memcpy_count(struct
int i;

for_each_possible_cpu(i)
- count += per_cpu_ptr(chan->local, i)->memcpy_count;
+ count += per_cpu_ptr(chan->local, i)->count;

return sprintf(buf, "%lu\n", count);
}
@@ -105,7 +105,7 @@ static ssize_t show_in_use(struct class_
}

static struct class_device_attribute dma_class_attrs[] = {
- __ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
+ __ATTR(count, S_IRUGO, show_memcpy_count, NULL),
__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
__ATTR(in_use, S_IRUGO, show_in_use, NULL),
__ATTR_NULL
@@ -402,11 +402,11 @@ subsys_initcall(dma_bus_init);
EXPORT_SYMBOL(dma_async_client_register);
EXPORT_SYMBOL(dma_async_client_unregister);
EXPORT_SYMBOL(dma_async_client_chan_request);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_complete);
-EXPORT_SYMBOL(dma_async_memcpy_issue_pending);
+EXPORT_SYMBOL(dma_async_buf_to_buf);
+EXPORT_SYMBOL(dma_async_buf_to_pg);
+EXPORT_SYMBOL(dma_async_pg_to_pg);
+EXPORT_SYMBOL(dma_async_complete);
+EXPORT_SYMBOL(dma_async_issue_pending);
EXPORT_SYMBOL(dma_async_device_register);
EXPORT_SYMBOL(dma_async_device_unregister);
EXPORT_SYMBOL(dma_chan_cleanup);
diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index dbd4d6c..6cbed42 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -40,6 +40,7 @@
#define to_ioat_device(dev) container_of(dev, struct ioat_device, common)
#define to_ioat_desc(lh) container_of(lh, struct ioat_desc_sw, node)

+
/* internal functions */
static int __devinit ioat_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
static void __devexit ioat_remove(struct pci_dev *pdev);
@@ -681,6 +682,14 @@ out:
return err;
}

+struct dma_function ioat_memcpy_functions = {
+ .buf_to_buf = ioat_dma_memcpy_buf_to_buf,
+ .buf_to_pg = ioat_dma_memcpy_buf_to_pg,
+ .pg_to_pg = ioat_dma_memcpy_pg_to_pg,
+ .complete = ioat_dma_is_complete,
+ .issue_pending = ioat_dma_memcpy_issue_pending,
+};
+
static int __devinit ioat_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -756,11 +765,8 @@ static int __devinit ioat_probe(struct p

device->common.device_alloc_chan_resources = ioat_dma_alloc_chan_resources;
device->common.device_free_chan_resources = ioat_dma_free_chan_resources;
- device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
- device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
- device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
- device->common.device_memcpy_complete = ioat_dma_is_complete;
- device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
+ device->common.funcs[DMAFUNC_MEMCPY] = &ioat_memcpy_functions;
+
printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
device->common.chancnt);

diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
index d637555..8a2f642 100644
--- a/drivers/dma/iovlock.c
+++ b/drivers/dma/iovlock.c
@@ -151,11 +151,8 @@ static dma_cookie_t dma_memcpy_to_kernel
while (len > 0) {
if (iov->iov_len) {
int copy = min_t(unsigned int, iov->iov_len, len);
- dma_cookie = dma_async_memcpy_buf_to_buf(
- chan,
- iov->iov_base,
- kdata,
- copy);
+ dma_cookie = dma_async_buf_to_buf(DMAFUNC_MEMCPY, chan,
+ iov->iov_base, kdata, copy);
kdata += copy;
len -= copy;
iov->iov_len -= copy;
@@ -210,7 +207,7 @@ dma_cookie_t dma_memcpy_to_iovec(struct
copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
copy = min_t(int, copy, iov[iovec_idx].iov_len);

- dma_cookie = dma_async_memcpy_buf_to_pg(chan,
+ dma_cookie = dma_async_buf_to_pg(DMAFUNC_MEMCPY, chan,
page_list->pages[page_idx],
iov_byte_offset,
kdata,
@@ -274,7 +271,7 @@ dma_cookie_t dma_memcpy_pg_to_iovec(stru
copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
copy = min_t(int, copy, iov[iovec_idx].iov_len);

- dma_cookie = dma_async_memcpy_pg_to_pg(chan,
+ dma_cookie = dma_async_pg_to_pg(DMAFUNC_MEMCPY, chan,
page_list->pages[page_idx],
iov_byte_offset,
page,
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index c94d8f1..317a7f2 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -67,14 +67,14 @@ enum dma_status {
/**
* struct dma_chan_percpu - the per-CPU part of struct dma_chan
* @refcount: local_t used for open-coded "bigref" counting
- * @memcpy_count: transaction counter
+ * @count: transaction counter
* @bytes_transferred: byte counter
*/

struct dma_chan_percpu {
local_t refcount;
/* stats */
- unsigned long memcpy_count;
+ unsigned long count;
unsigned long bytes_transferred;
};

@@ -157,6 +157,34 @@ struct dma_client {
struct list_head global_node;
};

+enum dma_function_type {
+ DMAFUNC_MEMCPY = 0,
+ DMAFUNC_XOR,
+ DMAFUNC_MAX
+};
+
+/* struct dma_function
+ * @buf_to_pg: buf pointer to struct page
+ * @pg_to_pg: struct page/offset to struct page/offset
+ * @complete: poll the status of a DMA transaction
+ * @issue_pending: push appended descriptors to hardware
+ */
+struct dma_function {
+ dma_cookie_t (*buf_to_buf)(struct dma_chan *chan,
+ void *dest, void *src, size_t len);
+ dma_cookie_t (*buf_to_pg)(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len);
+ dma_cookie_t (*pg_to_pg)(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off,
+ size_t len);
+ enum dma_status (*complete)(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used);
+ void (*issue_pending)(struct dma_chan *chan);
+};
+
/**
* struct dma_device - info on the entity supplying DMA services
* @chancnt: how many DMA channels are supported
@@ -168,14 +196,8 @@ struct dma_client {
* @device_alloc_chan_resources: allocate resources and return the
* number of allocated descriptors
* @device_free_chan_resources: release DMA channel's resources
- * @device_memcpy_buf_to_buf: memcpy buf pointer to buf pointer
- * @device_memcpy_buf_to_pg: memcpy buf pointer to struct page
- * @device_memcpy_pg_to_pg: memcpy struct page/offset to struct page/offset
- * @device_memcpy_complete: poll the status of an IOAT DMA transaction
- * @device_memcpy_issue_pending: push appended descriptors to hardware
*/
struct dma_device {
-
unsigned int chancnt;
struct list_head channels;
struct list_head global_node;
@@ -185,20 +207,10 @@ struct dma_device {

int dev_id;

+ struct dma_function *funcs[DMAFUNC_MAX];
+
int (*device_alloc_chan_resources)(struct dma_chan *chan);
void (*device_free_chan_resources)(struct dma_chan *chan);
- dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
- void *dest, void *src, size_t len);
- dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata,
- size_t len);
- dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off,
- struct page *src_pg, unsigned int src_off, size_t len);
- enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last,
- dma_cookie_t *used);
- void (*device_memcpy_issue_pending)(struct dma_chan *chan);
};

/* --- public DMA engine API --- */
@@ -209,7 +221,7 @@ void dma_async_client_chan_request(struc
unsigned int number);

/**
- * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * dma_async_buf_to_buf - offloaded copy between virtual addresses
* @chan: DMA channel to offload copy to
* @dest: destination address (virtual)
* @src: source address (virtual)
@@ -220,19 +232,24 @@ void dma_async_client_chan_request(struc
* Both @dest and @src must stay memory resident (kernel memory or locked
* user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
- void *dest, void *src, size_t len)
+static inline dma_cookie_t dma_async_buf_to_buf(enum dma_function_type type,
+ struct dma_chan *chan, void *dest, void *src, size_t len)
{
- int cpu = get_cpu();
+ int cpu;
+
+ if (!chan->device->funcs[type])
+ return -ENXIO;
+
+ cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
+ return chan->device->funcs[type]->buf_to_buf(chan, dest, src, len);
}

/**
- * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
+ * dma_async_buf_to_pg - offloaded copy from address to page
* @chan: DMA channel to offload copy to
* @page: destination page
* @offset: offset in page to copy to
@@ -244,20 +261,26 @@ static inline dma_cookie_t dma_async_mem
* Both @page/@offset and @kdata must stay memory resident (kernel memory or
* locked user space pages)
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata, size_t len)
+static inline dma_cookie_t dma_async_buf_to_pg(enum dma_function_type type,
+ struct dma_chan *chan, struct page *page, unsigned int offset,
+ void *kdata, size_t len)
{
- int cpu = get_cpu();
+ int cpu;
+
+ if (!chan->device->funcs[type])
+ return -ENXIO;
+
+ cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
- kdata, len);
+ return chan->device->funcs[type]->buf_to_pg(chan, page, offset,
+ kdata, len);
}

/**
- * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
+ * dma_async_pg_to_pg - offloaded copy from page to page
* @chan: DMA channel to offload copy to
* @dest_pg: destination page
* @dest_off: offset in page to copy to
@@ -270,33 +293,40 @@ static inline dma_cookie_t dma_async_mem
* Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
* (kernel memory or locked user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
- unsigned int src_off, size_t len)
+static inline dma_cookie_t dma_async_pg_to_pg(enum dma_function_type type,
+ struct dma_chan *chan, struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off, size_t len)
{
- int cpu = get_cpu();
+ int cpu;
+
+ if (!chan->device->funcs[type])
+ return -ENXIO;
+
+ cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
- src_pg, src_off, len);
+ return chan->device->funcs[type]->pg_to_pg(chan, dest_pg, dest_off,
+ src_pg, src_off, len);
}

/**
- * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * dma_async_issue_pending - flush pending copies to HW
* @chan: target DMA channel
*
* This allows drivers to push copies to HW in batches,
* reducing MMIO writes where possible.
*/
-static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+static inline void dma_async_issue_pending(enum dma_function_type type,
+ struct dma_chan *chan)
{
- return chan->device->device_memcpy_issue_pending(chan);
+ if (chan->device->funcs[type])
+ return chan->device->funcs[type]->issue_pending(chan);
}

/**
- * dma_async_memcpy_complete - poll for transaction completion
+ * dma_async_complete - poll for transaction completion
* @chan: DMA channel
* @cookie: transaction identifier to check status of
* @last: returns last completed cookie, can be NULL
@@ -306,10 +336,14 @@ static inline void dma_async_memcpy_issu
* internal state and can be used with dma_async_is_complete() to check
* the status of multiple cookies without re-checking hardware state.
*/
-static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
+static inline enum dma_status dma_async_complete(enum dma_function_type type,
+ struct dma_chan *chan, dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used)
{
- return chan->device->device_memcpy_complete(chan, cookie, last, used);
+ if (!chan->device->funcs[type])
+ return -ENXIO;
+ else
+ return chan->device->funcs[type]->complete(chan, cookie, last, used);
}

/**
@@ -318,7 +352,7 @@ static inline enum dma_status dma_async_
* @last_complete: last know completed transaction
* @last_used: last cookie value handed out
*
- * dma_async_is_complete() is used in dma_async_memcpy_complete()
+ * dma_async_is_complete() is used in dma_async_complete()
* the test logic is seperated for lightweight testing of multiple cookies
*/
static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
diff --git a/net/core/dev.c b/net/core/dev.c
index d4a1ec3..e8a8ee9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1945,7 +1945,7 @@ out:
struct dma_chan *chan;
rcu_read_lock();
list_for_each_entry_rcu(chan, &net_dma_client->channels, client_node)
- dma_async_memcpy_issue_pending(chan);
+ dma_async_issue_pending(DMAFUNC_MEMCPY, chan);
rcu_read_unlock();
}
#endif
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 934396b..c270837 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1431,9 +1431,9 @@ skip_copy:
struct sk_buff *skb;
dma_cookie_t done, used;

- dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+ dma_async_issue_pending(DMAFUNC_MEMCPY, tp->ucopy.dma_chan);

- while (dma_async_memcpy_complete(tp->ucopy.dma_chan,
+ while (dma_async_complete(DMAFUNC_MEMCPY, tp->ucopy.dma_chan,
tp->ucopy.dma_cookie, &done,
&used) == DMA_IN_PROGRESS) {
/* do partial cleanup of sk_async_wait_queue */


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Olof Johansson
External


Since: Aug 16, 2005
Posts: 34



PostPosted: Fri Sep 15, 2006 9:50 pm    Post subject: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations) [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Fri, 15 Sep 2006 11:38:17 -0500 Olof Johansson <olof.DeleteThis@lixom.net> wrote:

> On Mon, 11 Sep 2006 19:44:16 -0400 Jeff Garzik <jeff.DeleteThis@garzik.org> wrote:

> > Are we really going to add a set of hooks for each DMA engine whizbang
> > feature?
> >
> > That will get ugly when DMA engines support memcpy, xor, crc32, sha1,
> > aes, and a dozen other transforms.
>
>
> Yes, it will be unmaintainable. We need some sort of multiplexing with
> per-function registrations.
>
> Here's a first cut at it, just very quick. It could be improved further
> but it shows that we could exorcise most of the hardcoded things pretty
> easily.

Ok, that was obviously a naive and not so nice first attempt, but I
figured it was worth it to show how it can be done.

This is a little more proper: Specify at client registration time what
the function the client will use is, and make the channel use it. This
way most of the error checking per call can be removed too.

Chris/Dan: Please consider picking this up as a base for the added
functionality and cleanups.





Clean up dmaengine a bit. Make the client registration specify which
channel functions ("type") the client will use. Also, make devices
register which functions they will provide.

Also exorcise most of the memcpy-specific references from the generic
dma engine code. There's still some left in the iov stuff.


Signed-off-by: Olof Johansson <olof.DeleteThis@lixom.net>

Index: linux-2.6/drivers/dma/dmaengine.c
===================================================================
--- linux-2.6.orig/drivers/dma/dmaengine.c
+++ linux-2.6/drivers/dma/dmaengine.c
@@ -73,14 +73,14 @@ static LIST_HEAD(dma_client_list);

/* --- sysfs implementation --- */

-static ssize_t show_memcpy_count(struct class_device *cd, char *buf)
+static ssize_t show_count(struct class_device *cd, char *buf)
{
struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
unsigned long count = 0;
int i;

for_each_possible_cpu(i)
- count += per_cpu_ptr(chan->local, i)->memcpy_count;
+ count += per_cpu_ptr(chan->local, i)->count;

return sprintf(buf, "%lu\n", count);
}
@@ -105,7 +105,7 @@ static ssize_t show_in_use(struct class_
}

static struct class_device_attribute dma_class_attrs[] = {
- __ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
+ __ATTR(count, S_IRUGO, show_count, NULL),
__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
__ATTR(in_use, S_IRUGO, show_in_use, NULL),
__ATTR_NULL
@@ -142,6 +142,10 @@ static struct dma_chan *dma_client_chan_

/* Find a channel, any DMA engine will do */
list_for_each_entry(device, &dma_device_list, global_node) {
+ /* Skip devices that don't provide the right function */
+ if (!device->funcs[client->type])
+ continue;
+
list_for_each_entry(chan, &device->channels, device_node) {
if (chan->client)
continue;
@@ -241,7 +245,8 @@ static void dma_chans_rebalance(void)
* dma_async_client_register - allocate and register a &dma_client
* @event_callback: callback for notification of channel addition/removal
*/
-struct dma_client *dma_async_client_register(dma_event_callback event_callback)
+struct dma_client *dma_async_client_register(enum dma_function_type type,
+ dma_event_callback event_callback)
{
struct dma_client *client;

@@ -254,6 +259,7 @@ struct dma_client *dma_async_client_regi
client->chans_desired = 0;
client->chan_count = 0;
client->event_callback = event_callback;
+ client->type = type;

mutex_lock(&dma_list_mutex);
list_add_tail(&client->global_node, &dma_client_list);
@@ -402,11 +408,11 @@ subsys_initcall(dma_bus_init);
EXPORT_SYMBOL(dma_async_client_register);
EXPORT_SYMBOL(dma_async_client_unregister);
EXPORT_SYMBOL(dma_async_client_chan_request);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_complete);
-EXPORT_SYMBOL(dma_async_memcpy_issue_pending);
+EXPORT_SYMBOL(dma_async_buf_to_buf);
+EXPORT_SYMBOL(dma_async_buf_to_pg);
+EXPORT_SYMBOL(dma_async_pg_to_pg);
+EXPORT_SYMBOL(dma_async_complete);
+EXPORT_SYMBOL(dma_async_issue_pending);
EXPORT_SYMBOL(dma_async_device_register);
EXPORT_SYMBOL(dma_async_device_unregister);
EXPORT_SYMBOL(dma_chan_cleanup);
Index: linux-2.6/drivers/dma/ioatdma.c
===================================================================
--- linux-2.6.orig/drivers/dma/ioatdma.c
+++ linux-2.6/drivers/dma/ioatdma.c
@@ -681,6 +682,14 @@ out:
return err;
}

+struct dma_function ioat_memcpy_functions = {
+ .buf_to_buf = ioat_dma_memcpy_buf_to_buf,
+ .buf_to_pg = ioat_dma_memcpy_buf_to_pg,
+ .pg_to_pg = ioat_dma_memcpy_pg_to_pg,
+ .complete = ioat_dma_is_complete,
+ .issue_pending = ioat_dma_memcpy_issue_pending,
+};
+
static int __devinit ioat_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -756,11 +765,8 @@ static int __devinit ioat_probe(struct p

device->common.device_alloc_chan_resources = ioat_dma_alloc_chan_resources;
device->common.device_free_chan_resources = ioat_dma_free_chan_resources;
- device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
- device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
- device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
- device->common.device_memcpy_complete = ioat_dma_is_complete;
- device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
+ device->common.funcs[DMAFUNC_MEMCPY] = &ioat_memcpy_functions;
+
printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
device->common.chancnt);

Index: linux-2.6/include/linux/dmaengine.h
===================================================================
--- linux-2.6.orig/include/linux/dmaengine.h
+++ linux-2.6/include/linux/dmaengine.h
@@ -67,14 +67,14 @@ enum dma_status {
/**
* struct dma_chan_percpu - the per-CPU part of struct dma_chan
* @refcount: local_t used for open-coded "bigref" counting
- * @memcpy_count: transaction counter
+ * @count: transaction counter
* @bytes_transferred: byte counter
*/

struct dma_chan_percpu {
local_t refcount;
/* stats */
- unsigned long memcpy_count;
+ unsigned long count;
unsigned long bytes_transferred;
};

@@ -138,6 +138,15 @@ static inline void dma_chan_put(struct d
typedef void (*dma_event_callback) (struct dma_client *client,
struct dma_chan *chan, enum dma_event event);

+/*
+ * dma_function_type - one entry for every possible function type provided
+ */
+enum dma_function_type {
+ DMAFUNC_MEMCPY = 0,
+ DMAFUNC_XOR,
+ DMAFUNC_MAX
+};
+
/**
* struct dma_client - info on the entity making use of DMA services
* @event_callback: func ptr to call when something happens
@@ -152,11 +161,35 @@ struct dma_client {
unsigned int chan_count;
unsigned int chans_desired;

+ enum dma_function_type type;
+
spinlock_t lock;
struct list_head channels;
struct list_head global_node;
};

+/* struct dma_function
+ * @buf_to_pg: buf pointer to struct page
+ * @pg_to_pg: struct page/offset to struct page/offset
+ * @complete: poll the status of a DMA transaction
+ * @issue_pending: push appended descriptors to hardware
+ */
+struct dma_function {
+ dma_cookie_t (*buf_to_buf)(struct dma_chan *chan,
+ void *dest, void *src, size_t len);
+ dma_cookie_t (*buf_to_pg)(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len);
+ dma_cookie_t (*pg_to_pg)(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off,
+ size_t len);
+ enum dma_status (*complete)(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used);
+ void (*issue_pending)(struct dma_chan *chan);
+};
+
/**
* struct dma_device - info on the entity supplying DMA services
* @chancnt: how many DMA channels are supported
@@ -168,14 +201,8 @@ struct dma_client {
* @device_alloc_chan_resources: allocate resources and return the
* number of allocated descriptors
* @device_free_chan_resources: release DMA channel's resources
- * @device_memcpy_buf_to_buf: memcpy buf pointer to buf pointer
- * @device_memcpy_buf_to_pg: memcpy buf pointer to struct page
- * @device_memcpy_pg_to_pg: memcpy struct page/offset to struct page/offset
- * @device_memcpy_complete: poll the status of an IOAT DMA transaction
- * @device_memcpy_issue_pending: push appended descriptors to hardware
*/
struct dma_device {
-
unsigned int chancnt;
struct list_head channels;
struct list_head global_node;
@@ -185,31 +212,24 @@ struct dma_device {

int dev_id;

+ struct dma_function *funcs[DMAFUNC_MAX];
+
int (*device_alloc_chan_resources)(struct dma_chan *chan);
void (*device_free_chan_resources)(struct dma_chan *chan);
- dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
- void *dest, void *src, size_t len);
- dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata,
- size_t len);
- dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off,
- struct page *src_pg, unsigned int src_off, size_t len);
- enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last,
- dma_cookie_t *used);
- void (*device_memcpy_issue_pending)(struct dma_chan *chan);
};

+#define CHAN2FUNCS(chan) (chan->device->funcs[chan->client->type])
+
/* --- public DMA engine API --- */

-struct dma_client *dma_async_client_register(dma_event_callback event_callback);
+struct dma_client *dma_async_client_register(enum dma_function_type type,
+ dma_event_callback event_callback);
void dma_async_client_unregister(struct dma_client *client);
void dma_async_client_chan_request(struct dma_client *client,
unsigned int number);

/**
- * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * dma_async_buf_to_buf - offloaded copy between virtual addresses
* @chan: DMA channel to offload copy to
* @dest: destination address (virtual)
* @src: source address (virtual)
@@ -220,19 +240,19 @@ void dma_async_client_chan_request(struc
* Both @dest and @src must stay memory resident (kernel memory or locked
* user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
- void *dest, void *src, size_t len)
+static inline dma_cookie_t dma_async_buf_to_buf(struct dma_chan *chan,
+ void *dest, void *src, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
+ return CHAN2FUNCS(chan)->buf_to_buf(chan, dest, src, len);
}

/**
- * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
+ * dma_async_buf_to_pg - offloaded copy from address to page
* @chan: DMA channel to offload copy to
* @page: destination page
* @offset: offset in page to copy to
@@ -244,20 +264,21 @@ static inline dma_cookie_t dma_async_mem
* Both @page/@offset and @kdata must stay memory resident (kernel memory or
* locked user space pages)
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata, size_t len)
+static inline dma_cookie_t dma_async_buf_to_pg(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
- kdata, len);
+ return CHAN2FUNCS(chan)->buf_to_pg(chan, page, offset,
+ kdata, len);
}

/**
- * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
+ * dma_async_pg_to_pg - offloaded copy from page to page
* @chan: DMA channel to offload copy to
* @dest_pg: destination page
* @dest_off: offset in page to copy to
@@ -270,33 +291,33 @@ static inline dma_cookie_t dma_async_mem
* Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
* (kernel memory or locked user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
- unsigned int src_off, size_t len)
+static inline dma_cookie_t dma_async_pg_to_pg( struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
- src_pg, src_off, len);
+ return CHAN2FUNCS(chan)->pg_to_pg(chan, dest_pg, dest_off,
+ src_pg, src_off, len);
}

/**
- * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * dma_async_issue_pending - flush pending copies to HW
* @chan: target DMA channel
*
* This allows drivers to push copies to HW in batches,
* reducing MMIO writes where possible.
*/
-static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+static inline void dma_async_issue_pending(struct dma_chan *chan)
{
- return chan->device->device_memcpy_issue_pending(chan);
+ return CHAN2FUNCS(chan)->issue_pending(chan);
}

/**
- * dma_async_memcpy_complete - poll for transaction completion
+ * dma_async_complete - poll for transaction completion
* @chan: DMA channel
* @cookie: transaction identifier to check status of
* @last: returns last completed cookie, can be NULL
@@ -306,10 +327,11 @@ static inline void dma_async_memcpy_issu
* internal state and can be used with dma_async_is_complete() to check
* the status of multiple cookies without re-checking hardware state.
*/
-static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
+static inline enum dma_status dma_async_complete(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used)
{
- return chan->device->device_memcpy_complete(chan, cookie, last, used);
+ return CHAN2FUNCS(chan)->complete(chan, cookie, last, used);
}

/**
@@ -318,7 +340,7 @@ static inline enum dma_status dma_async_
* @last_complete: last know completed transaction
* @last_used: last cookie value handed out
*
- * dma_async_is_complete() is used in dma_async_memcpy_complete()
+ * dma_async_is_complete() is used in dma_async_complete()
* the test logic is seperated for lightweight testing of multiple cookies
*/
static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
Index: linux-2.6/net/core/dev.c
===================================================================
--- linux-2.6.orig/net/core/dev.c
+++ linux-2.6/net/core/dev.c
@@ -1945,7 +1945,7 @@ out:
struct dma_chan *chan;
rcu_read_lock();
list_for_each_entry_rcu(chan, &net_dma_client->channels, client_node)
- dma_async_memcpy_issue_pending(chan);
+ dma_async_issue_pending(chan);
rcu_read_unlock();
}
#endif
@@ -3467,7 +3467,7 @@ static void netdev_dma_event(struct dma_
static int __init netdev_dma_register(void)
{
spin_lock_init(&net_dma_event_lock);
- net_dma_client = dma_async_client_register(netdev_dma_event);
+ net_dma_client = dma_async_client_register(DMAFUNC_MEMCPY, netdev_dma_event);
if (net_dma_client == NULL)
return -ENOMEM;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Olof Johansson
External


Since: Aug 16, 2005
Posts: 34



PostPosted: Fri Sep 15, 2006 10:10 pm    Post subject: [PATCH] [v2] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations) [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

[Bad day, forgot a quilt refresh.]




Clean up dmaengine a bit. Make the client registration specify which
channel functions ("type") the client will use. Also, make devices
register which functions they will provide.

Also exorcise most of the memcpy-specific references from the generic
dma engine code. There's still some left in the iov stuff.


Signed-off-by: Olof Johansson <olof.TakeThisOut@lixom.net>

Index: linux-2.6/drivers/dma/dmaengine.c
===================================================================
--- linux-2.6.orig/drivers/dma/dmaengine.c
+++ linux-2.6/drivers/dma/dmaengine.c
@@ -73,14 +73,14 @@ static LIST_HEAD(dma_client_list);

/* --- sysfs implementation --- */

-static ssize_t show_memcpy_count(struct class_device *cd, char *buf)
+static ssize_t show_count(struct class_device *cd, char *buf)
{
struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
unsigned long count = 0;
int i;

for_each_possible_cpu(i)
- count += per_cpu_ptr(chan->local, i)->memcpy_count;
+ count += per_cpu_ptr(chan->local, i)->count;

return sprintf(buf, "%lu\n", count);
}
@@ -105,7 +105,7 @@ static ssize_t show_in_use(struct class_
}

static struct class_device_attribute dma_class_attrs[] = {
- __ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
+ __ATTR(count, S_IRUGO, show_count, NULL),
__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
__ATTR(in_use, S_IRUGO, show_in_use, NULL),
__ATTR_NULL
@@ -142,6 +142,10 @@ static struct dma_chan *dma_client_chan_

/* Find a channel, any DMA engine will do */
list_for_each_entry(device, &dma_device_list, global_node) {
+ /* Skip devices that don't provide the right function */
+ if (!device->funcs[client->type])
+ continue;
+
list_for_each_entry(chan, &device->channels, device_node) {
if (chan->client)
continue;
@@ -241,7 +245,8 @@ static void dma_chans_rebalance(void)
* dma_async_client_register - allocate and register a &dma_client
* @event_callback: callback for notification of channel addition/removal
*/
-struct dma_client *dma_async_client_register(dma_event_callback event_callback)
+struct dma_client *dma_async_client_register(enum dma_function_type type,
+ dma_event_callback event_callback)
{
struct dma_client *client;

@@ -254,6 +259,7 @@ struct dma_client *dma_async_client_regi
client->chans_desired = 0;
client->chan_count = 0;
client->event_callback = event_callback;
+ client->type = type;

mutex_lock(&dma_list_mutex);
list_add_tail(&client->global_node, &dma_client_list);
@@ -402,11 +408,11 @@ subsys_initcall(dma_bus_init);
EXPORT_SYMBOL(dma_async_client_register);
EXPORT_SYMBOL(dma_async_client_unregister);
EXPORT_SYMBOL(dma_async_client_chan_request);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_complete);
-EXPORT_SYMBOL(dma_async_memcpy_issue_pending);
+EXPORT_SYMBOL(dma_async_buf_to_buf);
+EXPORT_SYMBOL(dma_async_buf_to_pg);
+EXPORT_SYMBOL(dma_async_pg_to_pg);
+EXPORT_SYMBOL(dma_async_complete);
+EXPORT_SYMBOL(dma_async_issue_pending);
EXPORT_SYMBOL(dma_async_device_register);
EXPORT_SYMBOL(dma_async_device_unregister);
EXPORT_SYMBOL(dma_chan_cleanup);
Index: linux-2.6/drivers/dma/ioatdma.c
===================================================================
--- linux-2.6.orig/drivers/dma/ioatdma.c
+++ linux-2.6/drivers/dma/ioatdma.c
@@ -40,6 +40,7 @@
#define to_ioat_device(dev) container_of(dev, struct ioat_device, common)
#define to_ioat_desc(lh) container_of(lh, struct ioat_desc_sw, node)

+
/* internal functions */
static int __devinit ioat_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
static void __devexit ioat_remove(struct pci_dev *pdev);
@@ -681,6 +682,14 @@ out:
return err;
}

+struct dma_function ioat_memcpy_functions = {
+ .buf_to_buf = ioat_dma_memcpy_buf_to_buf,
+ .buf_to_pg = ioat_dma_memcpy_buf_to_pg,
+ .pg_to_pg = ioat_dma_memcpy_pg_to_pg,
+ .complete = ioat_dma_is_complete,
+ .issue_pending = ioat_dma_memcpy_issue_pending,
+};
+
static int __devinit ioat_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -756,11 +765,8 @@ static int __devinit ioat_probe(struct p

device->common.device_alloc_chan_resources = ioat_dma_alloc_chan_resources;
device->common.device_free_chan_resources = ioat_dma_free_chan_resources;
- device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
- device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
- device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
- device->common.device_memcpy_complete = ioat_dma_is_complete;
- device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
+ device->common.funcs[DMAFUNC_MEMCPY] = &ioat_memcpy_functions;
+
printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
device->common.chancnt);

Index: linux-2.6/include/linux/dmaengine.h
===================================================================
--- linux-2.6.orig/include/linux/dmaengine.h
+++ linux-2.6/include/linux/dmaengine.h
@@ -67,14 +67,14 @@ enum dma_status {
/**
* struct dma_chan_percpu - the per-CPU part of struct dma_chan
* @refcount: local_t used for open-coded "bigref" counting
- * @memcpy_count: transaction counter
+ * @count: transaction counter
* @bytes_transferred: byte counter
*/

struct dma_chan_percpu {
local_t refcount;
/* stats */
- unsigned long memcpy_count;
+ unsigned long count;
unsigned long bytes_transferred;
};

@@ -138,6 +138,15 @@ static inline void dma_chan_put(struct d
typedef void (*dma_event_callback) (struct dma_client *client,
struct dma_chan *chan, enum dma_event event);

+/*
+ * dma_function_type - one entry for every possible function type provided
+ */
+enum dma_function_type {
+ DMAFUNC_MEMCPY = 0,
+ DMAFUNC_XOR,
+ DMAFUNC_MAX
+};
+
/**
* struct dma_client - info on the entity making use of DMA services
* @event_callback: func ptr to call when something happens
@@ -152,11 +161,35 @@ struct dma_client {
unsigned int chan_count;
unsigned int chans_desired;

+ enum dma_function_type type;
+
spinlock_t lock;
struct list_head channels;
struct list_head global_node;
};

+/* struct dma_function
+ * @buf_to_pg: buf pointer to struct page
+ * @pg_to_pg: struct page/offset to struct page/offset
+ * @complete: poll the status of a DMA transaction
+ * @issue_pending: push appended descriptors to hardware
+ */
+struct dma_function {
+ dma_cookie_t (*buf_to_buf)(struct dma_chan *chan,
+ void *dest, void *src, size_t len);
+ dma_cookie_t (*buf_to_pg)(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len);
+ dma_cookie_t (*pg_to_pg)(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off,
+ size_t len);
+ enum dma_status (*complete)(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used);
+ void (*issue_pending)(struct dma_chan *chan);
+};
+
/**
* struct dma_device - info on the entity supplying DMA services
* @chancnt: how many DMA channels are supported
@@ -168,14 +201,8 @@ struct dma_client {
* @device_alloc_chan_resources: allocate resources and return the
* number of allocated descriptors
* @device_free_chan_resources: release DMA channel's resources
- * @device_memcpy_buf_to_buf: memcpy buf pointer to buf pointer
- * @device_memcpy_buf_to_pg: memcpy buf pointer to struct page
- * @device_memcpy_pg_to_pg: memcpy struct page/offset to struct page/offset
- * @device_memcpy_complete: poll the status of an IOAT DMA transaction
- * @device_memcpy_issue_pending: push appended descriptors to hardware
*/
struct dma_device {
-
unsigned int chancnt;
struct list_head channels;
struct list_head global_node;
@@ -185,31 +212,24 @@ struct dma_device {

int dev_id;

+ struct dma_function *funcs[DMAFUNC_MAX];
+
int (*device_alloc_chan_resources)(struct dma_chan *chan);
void (*device_free_chan_resources)(struct dma_chan *chan);
- dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
- void *dest, void *src, size_t len);
- dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata,
- size_t len);
- dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off,
- struct page *src_pg, unsigned int src_off, size_t len);
- enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last,
- dma_cookie_t *used);
- void (*device_memcpy_issue_pending)(struct dma_chan *chan);
};

+#define CHAN2FUNCS(chan) (chan->device->funcs[chan->client->type])
+
/* --- public DMA engine API --- */

-struct dma_client *dma_async_client_register(dma_event_callback event_callback);
+struct dma_client *dma_async_client_register(enum dma_function_type type,
+ dma_event_callback event_callback);
void dma_async_client_unregister(struct dma_client *client);
void dma_async_client_chan_request(struct dma_client *client,
unsigned int number);

/**
- * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * dma_async_buf_to_buf - offloaded copy between virtual addresses
* @chan: DMA channel to offload copy to
* @dest: destination address (virtual)
* @src: source address (virtual)
@@ -220,19 +240,19 @@ void dma_async_client_chan_request(struc
* Both @dest and @src must stay memory resident (kernel memory or locked
* user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
- void *dest, void *src, size_t len)
+static inline dma_cookie_t dma_async_buf_to_buf(struct dma_chan *chan,
+ void *dest, void *src, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
+ return CHAN2FUNCS(chan)->buf_to_buf(chan, dest, src, len);
}

/**
- * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
+ * dma_async_buf_to_pg - offloaded copy from address to page
* @chan: DMA channel to offload copy to
* @page: destination page
* @offset: offset in page to copy to
@@ -244,20 +264,21 @@ static inline dma_cookie_t dma_async_mem
* Both @page/@offset and @kdata must stay memory resident (kernel memory or
* locked user space pages)
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata, size_t len)
+static inline dma_cookie_t dma_async_buf_to_pg(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
- kdata, len);
+ return CHAN2FUNCS(chan)->buf_to_pg(chan, page, offset,
+ kdata, len);
}

/**
- * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
+ * dma_async_pg_to_pg - offloaded copy from page to page
* @chan: DMA channel to offload copy to
* @dest_pg: destination page
* @dest_off: offset in page to copy to
@@ -270,33 +291,33 @@ static inline dma_cookie_t dma_async_mem
* Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
* (kernel memory or locked user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
- unsigned int src_off, size_t len)
+static inline dma_cookie_t dma_async_pg_to_pg( struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
- src_pg, src_off, len);
+ return CHAN2FUNCS(chan)->pg_to_pg(chan, dest_pg, dest_off,
+ src_pg, src_off, len);
}

/**
- * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * dma_async_issue_pending - flush pending copies to HW
* @chan: target DMA channel
*
* This allows drivers to push copies to HW in batches,
* reducing MMIO writes where possible.
*/
-static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+static inline void dma_async_issue_pending(struct dma_chan *chan)
{
- return chan->device->device_memcpy_issue_pending(chan);
+ return CHAN2FUNCS(chan)->issue_pending(chan);
}

/**
- * dma_async_memcpy_complete - poll for transaction completion
+ * dma_async_complete - poll for transaction completion
* @chan: DMA channel
* @cookie: transaction identifier to check status of
* @last: returns last completed cookie, can be NULL
@@ -306,10 +327,11 @@ static inline void dma_async_memcpy_issu
* internal state and can be used with dma_async_is_complete() to check
* the status of multiple cookies without re-checking hardware state.
*/
-static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
+static inline enum dma_status dma_async_complete(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used)
{
- return chan->device->device_memcpy_complete(chan, cookie, last, used);
+ return CHAN2FUNCS(chan)->complete(chan, cookie, last, used);
}

/**
@@ -318,7 +340,7 @@ static inline enum dma_status dma_async_
* @last_complete: last know completed transaction
* @last_used: last cookie value handed out
*
- * dma_async_is_complete() is used in dma_async_memcpy_complete()
+ * dma_async_is_complete() is used in dma_async_complete()
* the test logic is seperated for lightweight testing of multiple cookies
*/
static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
Index: linux-2.6/net/core/dev.c
===================================================================
--- linux-2.6.orig/net/core/dev.c
+++ linux-2.6/net/core/dev.c
@@ -1945,7 +1945,7 @@ out:
struct dma_chan *chan;
rcu_read_lock();
list_for_each_entry_rcu(chan, &net_dma_client->channels, client_node)
- dma_async_memcpy_issue_pending(chan);
+ dma_async_issue_pending(chan);
rcu_read_unlock();
}
#endif
@@ -3467,7 +3467,7 @@ static void netdev_dma_event(struct dma_
static int __init netdev_dma_register(void)
{
spin_lock_init(&net_dma_event_lock);
- net_dma_client = dma_async_client_register(netdev_dma_event);
+ net_dma_client = dma_async_client_register(DMAFUNC_MEMCPY, netdev_dma_event);
if (net_dma_client == NULL)
return -ENOMEM;

Index: linux-2.6/drivers/dma/iovlock.c
===================================================================
--- linux-2.6.orig/drivers/dma/iovlock.c
+++ linux-2.6/drivers/dma/iovlock.c
@@ -151,7 +151,7 @@ static dma_cookie_t dma_memcpy_to_kernel
while (len > 0) {
if (iov->iov_len) {
int copy = min_t(unsigned int, iov->iov_len, len);
- dma_cookie = dma_async_memcpy_buf_to_buf(
+ dma_cookie = dma_async_buf_to_buf(
chan,
iov->iov_base,
kdata,
@@ -210,7 +210,7 @@ dma_cookie_t dma_memcpy_to_iovec(struct
copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
copy = min_t(int, copy, iov[iovec_idx].iov_len);

- dma_cookie = dma_async_memcpy_buf_to_pg(chan,
+ dma_cookie = dma_async_buf_to_pg(chan,
page_list->pages[page_idx],
iov_byte_offset,
kdata,
@@ -274,7 +274,7 @@ dma_cookie_t dma_memcpy_pg_to_iovec(stru
copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
copy = min_t(int, copy, iov[iovec_idx].iov_len);

- dma_cookie = dma_async_memcpy_pg_to_pg(chan,
+ dma_cookie = dma_async_pg_to_pg(chan,
page_list->pages[page_idx],
iov_byte_offset,
page,
Index: linux-2.6/net/ipv4/tcp.c
===================================================================
--- linux-2.6.orig/net/ipv4/tcp.c
+++ linux-2.6/net/ipv4/tcp.c
@@ -1431,11 +1431,11 @@ skip_copy:
struct sk_buff *skb;
dma_cookie_t done, used;

- dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+ dma_async_issue_pending(tp->ucopy.dma_chan);

- while (dma_async_memcpy_complete(tp->ucopy.dma_chan,
- tp->ucopy.dma_cookie, &done,
- &used) == DMA_IN_PROGRESS) {
+ while (dma_async_complete(tp->ucopy.dma_chan,
+ tp->ucopy.dma_cookie, &done,
+ &used) == DMA_IN_PROGRESS) {
/* do partial cleanup of sk_async_wait_queue */
while ((skb = skb_peek(&sk->sk_async_wait_queue)) &&
(dma_async_is_complete(skb->dma_cookie, done,
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Dan Williams
External


Since: Aug 26, 2006
Posts: 11



PostPosted: Tue Sep 19, 2006 1:00 am    Post subject: Re: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations) [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On 9/15/06, Olof Johansson <olof DeleteThis @lixom.net> wrote:
> On Fri, 15 Sep 2006 11:38:17 -0500 Olof Johansson <olof DeleteThis @lixom.net> wrote:
>
> > On Mon, 11 Sep 2006 19:44:16 -0400 Jeff Garzik <jeff DeleteThis @garzik.org> wrote:
>
> > > Are we really going to add a set of hooks for each DMA engine whizbang
> > > feature?
> > >
> > > That will get ugly when DMA engines support memcpy, xor, crc32, sha1,
> > > aes, and a dozen other transforms.
> >
> >
> > Yes, it will be unmaintainable. We need some sort of multiplexing with
> > per-function registrations.
> >
> > Here's a first cut at it, just very quick. It could be improved further
> > but it shows that we could exorcise most of the hardcoded things pretty
> > easily.
>
> Ok, that was obviously a naive and not so nice first attempt, but I
> figured it was worth it to show how it can be done.
>
> This is a little more proper: Specify at client registration time what
> the function the client will use is, and make the channel use it. This
> way most of the error checking per call can be removed too.
>
> Chris/Dan: Please consider picking this up as a base for the added
> functionality and cleanups.
>
Thanks for this Olof it has sparked some ideas about how to redo
support for multiple operations.

>
>
>
>
> Clean up dmaengine a bit. Make the client registration specify which
> channel functions ("type") the client will use. Also, make devices
> register which functions they will provide.
>
> Also exorcise most of the memcpy-specific references from the generic
> dma engine code. There's still some left in the iov stuff.
I think we should keep the operation type in the function name but
drop all the [buf|pg|dma]_to_[buf|pg|dma] permutations. The buffer
type can be handled generically across all operation types. Something
like the following for a pg_to_buf memcpy.

struct dma_async_op_memcpy *op;
struct page *pg;
void *buf;
size_t len;

dma_async_op_init_src_pg(op, pg);
dma_async_op_init_dest_buf(op, buf);
dma_async_memcpy(chan, op, len);

-Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Olof Johansson
External


Since: Aug 16, 2005
Posts: 34



PostPosted: Tue Sep 19, 2006 3:10 am    Post subject: Re: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations) [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Mon, 18 Sep 2006 15:56:37 -0700 "Dan Williams" <dan.j.williams RemoveThis @gmail.com> wrote:

> On 9/15/06, Olof Johansson <olof RemoveThis @lixom.net> wrote:
> > On Fri, 15 Sep 2006 11:38:17 -0500 Olof Johansson <olof RemoveThis @lixom.net> wrote:

> > Chris/Dan: Please consider picking this up as a base for the added
> > functionality and cleanups.
> >
> Thanks for this Olof it has sparked some ideas about how to redo
> support for multiple operations.

Good. Smile

> I think we should keep the operation type in the function name but
> drop all the [buf|pg|dma]_to_[buf|pg|dma] permutations. The buffer
> type can be handled generically across all operation types. Something
> like the following for a pg_to_buf memcpy.
>
> struct dma_async_op_memcpy *op;
> struct page *pg;
> void *buf;
> size_t len;
>
> dma_async_op_init_src_pg(op, pg);
> dma_async_op_init_dest_buf(op, buf);
> dma_async_memcpy(chan, op, len);

I'm generally for a more generic interface, especially in the address
permutation cases like above. However, I think it'll be a mistake to
keep the association between the API and the function names and types
so close.

What's the benefit of keeping a memcpy-specific dma_async_memcpy()
instead of a more generic dma_async_commit() (or similar)? We'll know
based on how the client/channel was allocated what kind of function is
requested, won't we?

Same goes for the dma_async_op_memcpy. Make it an union that has a type
field if you need per-operation settings. But as before, we'll know
what kind of op structure gets passed in since we'll know what kind of
operation is to be performed on it.

Finally, yet again the same goes for the op_init settings. I would even
prefer it to not be function-based, instead just direct union/struct
assignments.

struct dma_async_op op;
....

op.src_type = PG; op.src = pg;
op.dest_type = BUF; op.dest = buf;
op.len = len;
dma_async_commit(chan, op);

op might have to be dynamically allocated, since it'll outlive the
scope of this function. But the idea would be the same.


-Olof
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Alan Cox
External


Since: Sep 11, 2004
Posts: 1608



PostPosted: Tue Sep 19, 2006 1:00 pm    Post subject: Re: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations) [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Ar Llu, 2006-09-18 am 20:05 -0500, ysgrifennodd Olof Johansson:
> On Mon, 18 Sep 2006 15:56:37 -0700 "Dan Williams" <dan.j.williams.TakeThisOut@gmail.com> wrote:

> op.src_type = PG; op.src = pg;
> op.dest_type = BUF; op.dest = buf;
> op.len = len;
> dma_async_commit(chan, op);

At OLS Linus suggested it should distinguish between sync and async
events for locking reasons.

if(dma_async_commit(foo) == SYNC_COMPLETE) {
finalise_stuff();
}

else /* will call foo->callback(foo->dev_id) */

because otherwise you have locking complexities - the callback wants to
take locks to guard the object it works on but if it is called
synchronously - eg if hardware is busy and we fall back - it might
deadlock with the caller of dmaa_async_foo() who also needs to hold the
lock.

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Olof Johansson
External


Since: Aug 16, 2005
Posts: 34



PostPosted: Tue Sep 19, 2006 6:40 pm    Post subject: Re: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations) [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Tue, 19 Sep 2006 12:20:09 +0100 Alan Cox <alan.DeleteThis@lxorguk.ukuu.org.uk> wrote:

> Ar Llu, 2006-09-18 am 20:05 -0500, ysgrifennodd Olof Johansson:
> > On Mon, 18 Sep 2006 15:56:37 -0700 "Dan Williams" <dan.j.williams.DeleteThis@gmail.com> wrote:
>
> > op.src_type = PG; op.src = pg;
> > op.dest_type = BUF; op.dest = buf;
> > op.len = len;
> > dma_async_commit(chan, op);
>
> At OLS Linus suggested it should distinguish between sync and async
> events for locking reasons.
>
> if(dma_async_commit(foo) == SYNC_COMPLETE) {
> finalise_stuff();
> }
>
> else /* will call foo->callback(foo->dev_id) */
>
> because otherwise you have locking complexities - the callback wants to
> take locks to guard the object it works on but if it is called
> synchronously - eg if hardware is busy and we fall back - it might
> deadlock with the caller of dmaa_async_foo() who also needs to hold the
> lock.

Good point, sounds very reasonable to me.


-Olof
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Neil Brown
External


Since: Mar 13, 2006
Posts: 356



PostPosted: Mon Oct 09, 2006 2:40 am    Post subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Monday September 11, dan.j.williams.TakeThisOut@intel.com wrote:
> Neil,
>
> The following patches implement hardware accelerated raid5 for the Intel
> XscaleŽ series of I/O Processors. The MD changes allow stripe
> operations to run outside the spin lock in a work queue. Hardware
> acceleration is achieved by using a dma-engine-aware work queue routine
> instead of the default software only routine.

Hi Dan,
Sorry for the delay in replying.
I've looked through these patches at last (mostly the raid-specific
bits) and while there is clearly a lot of good stuff here, it does
'feel' right - it just seems too complex.

The particular issues that stand out to me are:
- 33 new STRIPE_OP_* flags. I'm sure there doesn't need to be that
many new flags.
- the "raid5 dma client" patch moves far too much internal
knowledge about raid5 into drivers/dma.

Clearly there are some complex issues being dealt with and some
complexity is to be expected, but I feel there must be room for some
serious simplification.

Let me try to describe how I envisage it might work.

As you know, the theory-of-operation of handle_stripe is that it
assesses the state of a stripe deciding what actions to perform and
then performs them. Synchronous actions (e.g. current parity calcs)
are performed 'in-line'. Async actions (reads, writes) and actions
that cannot be performed under a spinlock (->b_end_io) are recorded
as being needed and then are initiated at the end of handle_stripe
outside of the sh->lock.

The proposal is to bring the parity and other bulk-memory operations
out of the spinlock and make them optionally asynchronous.

The set of tasks that might be needed to be performed on a stripe
are:
Clear a target cache block
pre-xor various cache blocks into a target
copy data out of bios into cache blocks. (drain)
post-xor various cache blocks into a target
copy data into bios out of cache blocks (fill)
test if a cache block is all zeros
start a read on a cache block
start a write on a cache block

(There is also a memcpy when expanding raid5. I think I would try to
simply avoid that copy and move pointers around instead).

Some of these steps require sequencing. e.g.
clear, pre-xor, copy, post-xor, write
for a rwm cycle.
We could require handle_stripe to be called again for each step.
i.e. first call just clears the target and flags it as clear. Next
call initiates the pre-xor and flags that as done. Etc. However I
think that would make the non-offloaded case too slow, or at least
too clumsy.

So instead we set flags to say what needs to be done and have a
workqueue system that does it.

(so far this is all quite similar to what you have done.)

So handle_stripe would set various flag and other things (like
identify which block was the 'target' block) and run the following
in a workqueue:

raid5_do_stuff(struct stripe_head *sh)
{
raid5_cont_t *conf = sh->raid_conf;

if (test_bit(CLEAR_TARGET, &sh->ops.pending)) {
struct page = *p->sh->dev[sh->ops.target].page;
rv = async_memset(p, 0, 0, PAGE_SIZE, ops_done, sh);
if (rv != BUSY)
clear_bit(CLEAR_TARGET, &sh->ops.pending);
if (rv != COMPLETE)
goto out;
}

while (test_bit(PRE_XOR, &sh->ops.pending)) {
struct page *plist[XOR_MAX];
int offset[XOR_MAX];
int pos = 0;
int d;

for (d = sh->ops.nextdev;
d < conf->raid_disks && pos < XOR_MAX ;
d++) {
if (sh->ops.nextdev == sh->ops.target)
continue;
if (!test_bit(R5_WantPreXor, &sh->dev[d].flags))
continue;
plist[pos] = sh->dev[d].page;
offset[pos++] = 0;
}
if (pos) {
struct page *p = sh->dev[sh->ops.target].page;
rv = async_xor(p, 0, plist, offset, pos, PAGE_SIZE,
ops_done, sh);
if (rv != BUSY)
sh->ops.nextdev = d;
if (rv != COMPLETE)
goto out;
} else {
clear_bit(PRE_XOR, &sh->ops.pending);
sh->ops.nextdev = 0;
}
}

while (test_bit(COPY_IN, &sh0>ops.pending)) {
...
}
....

if (test_bit(START_IO, &sh->ops.pending)) {
int d;
for (d = 0 ; d < conf->raid_disk ; d++) {
/* all that code from the end of handle_stripe */
}

release_stripe(conf, sh);
return;

out:
if (rv == BUSY) {
/* wait on something and try again ???*/
}
return;
}

ops_done(struct stripe_head *sh)
{
queue_work(....whatever..);
}


Things to note:
- we keep track of where we are up to in sh->ops.
.pending is flags saying what is left to be done
.next_dev is the next device to process for operations that
work on several devices
.next_bio, .next_iov will be needed for copy operations that
cross multiple bios and iovecs.

- Each sh->dev has R5_Want flags reflecting which multi-device
operations are wanted on each device.

- async bulk-memory operations take pages, offsets, and lengths,
and can return COMPLETE (if the operation was performed
synchronously) IN_PROGRESS (if it has been started, or at least
queued) or BUSY if it couldn't even be queued. Exactly what to do
in that case I'm not sure. Probably we need a waitqueue to wait
on.

- The interface between the client and the ADMA hardware is a
collection of async_ functions. async_memcpy, async_xor,
async_memset etc.

I gather there needs to be some understanding
about whether the pages are already appropriately mapped for DMA or
whether a mapping is needed. Maybe an extra flag argument should
be passed.

I imagine that any piece of ADMA hardware would register with the
'async_*' subsystem, and a call to async_X would be routed as
appropriate, or be run in-line.

This approach introduces 8 flags for sh->ops.pending and maybe two or
three new R5_Want* flags. It also keeps the raid5 knowledge firmly in
the raid5 code base. So it seems to keep the complexity under control

Would this approach make sense to you? Is there something really
important I have missed?

(I'll try and be more responsive next time).

Thanks,
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Dan Williams
External


Since: Sep 12, 2006
Posts: 26



PostPosted: Tue Oct 10, 2006 8:30 pm    Post subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On 10/8/06, Neil Brown <neilb RemoveThis @suse.de> wrote:
>
>
> On Monday September 11, dan.j.williams RemoveThis @intel.com wrote:
> > Neil,
> >
> > The following patches implement hardware accelerated raid5 for the Intel
> > Xscale(r) series of I/O Processors. The MD changes allow stripe
> > operations to run outside the spin lock in a work queue. Hardware
> > acceleration is achieved by using a dma-engine-aware work queue routine
> > instead of the default software only routine.
>
> Hi Dan,
> Sorry for the delay in replying.
> I've looked through these patches at last (mostly the raid-specific
> bits) and while there is clearly a lot of good stuff here, it does
> 'feel' right - it just seems too complex.
>
> The particular issues that stand out to me are:
> - 33 new STRIPE_OP_* flags. I'm sure there doesn't need to be that
> many new flags.
> - the "raid5 dma client" patch moves far too much internal
> knowledge about raid5 into drivers/dma.
>
> Clearly there are some complex issues being dealt with and some
> complexity is to be expected, but I feel there must be room for some
> serious simplification.
A valid criticism. There was definitely a push to just get it
functional, so I can now see how the complexity crept into the
implementation. The primary cause was the choice to explicitly handle
channel switching in raid5-dma. However, relieving "client" code from
this responsibility is something I am taking care of in the async api
changes.

>
> Let me try to describe how I envisage it might work.
>
> As you know, the theory-of-operation of handle_stripe is that it
> assesses the state of a stripe deciding what actions to perform and
> then performs them. Synchronous actions (e.g. current parity calcs)
> are performed 'in-line'. Async actions (reads, writes) and actions
> that cannot be performed under a spinlock (->b_end_io) are recorded
> as being needed and then are initiated at the end of handle_stripe
> outside of the sh->lock.
>
> The proposal is to bring the parity and other bulk-memory operations
> out of the spinlock and make them optionally asynchronous.
>
> The set of tasks that might be needed to be performed on a stripe
> are:
> Clear a target cache block
> pre-xor various cache blocks into a target
> copy data out of bios into cache blocks. (drain)
> post-xor various cache blocks into a target
> copy data into bios out of cache blocks (fill)
> test if a cache block is all zeros
> start a read on a cache block
> start a write on a cache block
>
> (There is also a memcpy when expanding raid5. I think I would try to
> simply avoid that copy and move pointers around instead).
>
> Some of these steps require sequencing. e.g.
> clear, pre-xor, copy, post-xor, write
> for a rwm cycle.
> We could require handle_stripe to be called again for each step.
> i.e. first call just clears the target and flags it as clear. Next
> call initiates the pre-xor and flags that as done. Etc. However I
> think that would make the non-offloaded case too slow, or at least
> too clumsy.
>
> So instead we set flags to say what needs to be done and have a
> workqueue system that does it.
>
> (so far this is all quite similar to what you have done.)
>
> So handle_stripe would set various flag and other things (like
> identify which block was the 'target' block) and run the following
> in a workqueue:
>
> raid5_do_stuff(struct stripe_head *sh)
> {
> raid5_cont_t *conf = sh->raid_conf;
>
> if (test_bit(CLEAR_TARGET, &sh->ops.pending)) {
> struct page = *p->sh->dev[sh->ops.target].page;
> rv = async_memset(p, 0, 0, PAGE_SIZE, ops_done, sh);
> if (rv != BUSY)
> clear_bit(CLEAR_TARGET, &sh->ops.pending);
> if (rv != COMPLETE)
> goto out;
> }
>
> while (test_bit(PRE_XOR, &sh->ops.pending)) {
> struct page *plist[XOR_MAX];
> int offset[XOR_MAX];
> int pos = 0;
> int d;
>
> for (d = sh->ops.nextdev;
> d < conf->raid_disks && pos < XOR_MAX ;
> d++) {
> if (sh->ops.nextdev == sh->ops.target)
> continue;
> if (!test_bit(R5_WantPreXor, &sh->dev[d].flags))
> continue;
> plist[pos] = sh->dev[d].page;
> offset[pos++] = 0;
> }
> if (pos) {
> struct page *p = sh->dev[sh->ops.target].page;
> rv = async_xor(p, 0, plist, offset, pos, PAGE_SIZE,
> ops_done, sh);
> if (rv != BUSY)
> sh->ops.nextdev = d;
> if (rv != COMPLETE)
> goto out;
> } else {
> clear_bit(PRE_XOR, &sh->ops.pending);
> sh->ops.nextdev = 0;
> }
> }
>
> while (test_bit(COPY_IN, &sh0>ops.pending)) {
> ...
> }
> ....
>
> if (test_bit(START_IO, &sh->ops.pending)) {
> int d;
> for (d = 0 ; d < conf->raid_disk ; d++) {
> /* all that code from the end of handle_stripe */
> }
>
> release_stripe(conf, sh);
> return;
>
> out:
> if (rv == BUSY) {
> /* wait on something and try again ???*/
> }
> return;
> }
>
> ops_done(struct stripe_head *sh)
> {
> queue_work(....whatever..);
> }
>
>
> Things to note:
> - we keep track of where we are up to in sh->ops.
> .pending is flags saying what is left to be done
> .next_dev is the next device to process for operations that
> work on several devices
> .next_bio, .next_iov will be needed for copy operations that
> cross multiple bios and iovecs.
>
> - Each sh->dev has R5_Want flags reflecting which multi-device
> operations are wanted on each device.
>
> - async bulk-memory operations take pages, offsets, and lengths,
> and can return COMPLETE (if the operation was performed
> synchronously) IN_PROGRESS (if it has been started, or at least
> queued) or BUSY if it couldn't even be queued. Exactly what to do
> in that case I'm not sure. Probably we need a waitqueue to wait
> on.
>
> - The interface between the client and the ADMA hardware is a
> collection of async_ functions. async_memcpy, async_xor,
> async_memset etc.
>
> I gather there needs to be some understanding
> about whether the pages are already appropriately mapped for DMA or
> whether a mapping is needed. Maybe an extra flag argument should
> be passed.
>
> I imagine that any piece of ADMA hardware would register with the
> 'async_*' subsystem, and a call to async_X would be routed as
> appropriate, or be run in-line.
>
> This approach introduces 8 flags for sh->ops.pending and maybe two or
> three new R5_Want* flags. It also keeps the raid5 knowledge firmly in
> the raid5 code base. So it seems to keep the complexity under control
>
> Would this approach make sense to you?
Definitely.

> Is there something really important I have missed?
No, nothing important jumps out. Just a follow up question/note about
the details.

You imply that the async path and the sync path are unified in this
implementation. I think it is doable but it will add some complexity
since the sync case is not a distinct subset of the async case. For
example "Clear a target cache block" is required for the sync case,
but it can go away when using hardware engines. Engines typically
have their own accumulator buffer to store the temporary result,
whereas software only operates on memory.

What do you think of adding async tests for these situations?
test_bit(XOR, &conf->async)

Where a flag is set if calls to async_<operation> may be routed to
hardware engine? Otherwise skip any async specific details.

>
> (I'll try and be more responsive next time).
Thanks for shepherding this along.

>
> Thanks,
> NeilBrown

Regards,
Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Dan Williams
External


Since: Sep 12, 2006
Posts: 26



PostPosted: Wed Oct 11, 2006 3:50 am    Post subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On 9/14/06, Jakob Oestergaard <jakob RemoveThis @unthought.net> wrote:
> On Wed, Sep 13, 2006 at 12:17:55PM -0700, Dan Williams wrote:
> ...
> > >Out of curiosity; how does accelerated compare to non-accelerated?
> >
> > One quick example:
> > 4-disk SATA array rebuild on iop321 without acceleration - 'top'
> > reports md0_resync and md0_raid5 dueling for the CPU each at ~50%
> > utilization.
> >
> > With acceleration - 'top' reports md0_resync cpu utilization at ~90%
> > with the rest split between md0_raid5 and md0_raid5_ops.
> >
> > The sync speed reported by /proc/mdstat is ~40% higher in the accelerated
> > case.
>
> Ok, nice Smile
>
> >
> > That being said, array resync is a special case, so your mileage may
> > vary with other applications.
>
> Every-day usage I/O performance data would be nice indeed Smile
>
> > I will put together some data from bonnie++, iozone, maybe contest,
> > and post it on SourceForge.
>
> Great!
>
I have posted some Iozone data and graphs showing the performance
impact of the patches across the three iop processors iop321, iop331,
and iop341. The general take away from the data is that using dma
engines extends the region that Iozone calls the "buffer cache
effect". Write performance benefited the most as expected, but read
performance showed some modest gains as well. There are some regions
(smaller file size and record length) that show a performance
disadvantage but it is typically less than 5%.

The graphs map the relative performance multiplier that the raid
patches generate ('2.6.18-rc6 performance' x 'performance multiplier'
= '2.6.18-rc6-raid performance') . A value of '1' designates equal
performance. The large cliff that drops to zero is a "not measured"
region, i.e. the record length is larger than the file size. Iozone
outputs to Excel, but I have also made pdf's of the graphs available.
Note: Openoffice-calc can view the data but it does not support the 3D
surface graphs that Iozone uses.

Excel:
http://prdownloads.sourceforge.net/xscaleiop/iozone_raid_accel.xls?download

PDF Graphs:
http://prdownloads.sourceforge.net/xscaleiop/iop-iozone-graphs-2006101...ar.bz2?

Regards,
Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Neil Brown
External


Since: Mar 13, 2006
Posts: 356



PostPosted: Wed Oct 11, 2006 4:50 am    Post subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

[dropped akpm from the Cc: as current discussion isn't directly
relevant to him]
On Tuesday October 10, dan.j.williams RemoveThis @intel.com wrote:
> On 10/8/06, Neil Brown <neilb RemoveThis @suse.de> wrote:
>
> > Is there something really important I have missed?
> No, nothing important jumps out. Just a follow up question/note about
> the details.
>
> You imply that the async path and the sync path are unified in this
> implementation. I think it is doable but it will add some complexity
> since the sync case is not a distinct subset of the async case. For
> example "Clear a target cache block" is required for the sync case,
> but it can go away when using hardware engines. Engines typically
> have their own accumulator buffer to store the temporary result,
> whereas software only operates on memory.
>
> What do you think of adding async tests for these situations?
> test_bit(XOR, &conf->async)
>
> Where a flag is set if calls to async_<operation> may be routed to
> hardware engine? Otherwise skip any async specific details.

I'd rather try to come up with an interface that was equally
appropriate to both offload and inline. I appreciate that it might
not be possible to get an interface that gets best performance out of
both, but I'd like to explore that direction first.

I'd guess from what you say that the dma engine is given a bunch of
sources and a destination and it xor's all the sources together into
an accumulation buffer, and then writes the accum buffer to the
destination. Would that be right? Can you use the destination as one
of the sources?

That can obviously be done inline too with some changes to the xor
code, and avoiding the initial memset might be good for performance
too.

So I would suggest we drop the memset idea, and define the async_xor
interface to xor a number of sources into a destination, where the
destination is allowed to be the same as the first source, but
doesn't need to be.
Then the inline version could use a memset followed by the current xor
operations, or could use newly written xor operations, and the offload
version could equally do whatever is appropriate.

Another place where combining operations might make sense is copy-in
and post-xor. In some cases it might be more efficient to only read
the source once, and both write it to the destination and xor it into
the target. Would your DMA engine be able to optimise this
combination? I think current processors could certainly do better if
the two were combined.

So there is definitely room to move, but would rather avoid flags if I
could.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Display posts from previous:   
Post new topic   General Reply to Topic (not reply to a specific post)    Forums Home -> Kernel (archive) All times are: Eastern Time (US & Canada) (change)
Goto page Previous  1, 2, 3
Page 3 of 3

 
You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum