[dpdk-dev] [PATCH v3 15/29] crypto/qat: use eal I/O device memory read/write API

Jerin Jacob jerin.jacob at caviumnetworks.com
Fri Jan 13 09:17:05 CET 2017


On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote:
> Hi Jerin,
> 
> On 1/12/2017 9:17 AM, Jerin Jacob wrote:
> <...>
> 
> > +#include <rte_io.h>
> > +
> >  /* CSR write macro */
> > -#define ADF_CSR_WR(csrAddr, csrOffset, val) \
> > -	(void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \
> > -			= (val)))
> > +#define ADF_CSR_WR(csrAddr, csrOffset, val)		\
> > +	rte_write32(val, (((uint8_t *)csrAddr) + csrOffset))
> 
> For IA, this update introduces an extra compiler barrier (rte_io_wmb()),
> which is indeed not a must, is this correct?

AFAIK, Compiler barrier is required for IA. I am not an IA expert, if
someone thinks it needs to changed then I can fix it in following commit
in this patch series by making rte_io_wmb() and rte_io_rmb() as empty.

Let me know.

AFAIK, Linux kernel code has a barrier in readl/writel for IA.

Typically we don't use any non relaxed versions in fast path.In fast
typically all the drivers has explicit write barrier for doorbell write
and followed by a relaxed version of write. IMO, In any event, it won't
generate performance regression.

[dpdk-master] $ git show
70c343bdc8c33a51a9db23cd58122bdfc120a58f
commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f
Author: Jerin Jacob <jerin.jacob at caviumnetworks.com>
Date:   Mon Dec 5 06:36:49 2016 +0530

    eal/x86: define I/O device memory barriers for IA

    The patch does not provide any functional change for IA.
    I/O barriers are mapped to existing smp barriers.

    CC: Bruce Richardson <bruce.richardson at intel.com>
    CC: Konstantin Ananyev <konstantin.ananyev at intel.com>
    Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>

diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
index 00b1cdf..4eac666 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
@@ -61,6 +61,12 @@ extern "C" {
 
 #define rte_smp_rmb() rte_compiler_barrier()
 
+#define rte_io_mb() rte_mb()
+
+#define rte_io_wmb() rte_compiler_barrier()
+
+#define rte_io_rmb() rte_compiler_barrier()
+
 /*------------------------- 16 bit atomic operations
 * -------------------------*/
 
 #ifndef RTE_FORCE_INTRINSICS

> 
> If so, does it make sense to override these functions for x86, and make
> rte_writeX = rte_writeX_relaxed
> rte_readX = rte_readX_relaxed
> 
> >  
> >  /* CSR read macro */
> > -#define ADF_CSR_RD(csrAddr, csrOffset) \
> > -	(*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)))
> > +#define ADF_CSR_RD(csrAddr, csrOffset)			\
> > +	rte_read32((((uint8_t *)csrAddr) + csrOffset))
> 
> This patchset both introduces new rte_readX/rte_writeX functions, also
> applies them into drivers.
> 
> While applying them, it changes the behavior.
> Like above code was doing a read, but after update it does read and
> read_memory_barrier.
> 
> What do you think this patchset updates usage in a manner that keeps
> behavior exact same. Like using rte_read32_relaxed for this case.
> And doing architecture related updates in a different patchset?

Need to use rte_read32 at this commit otherwise it will break for ARM.
That's was all point for this patchset.
For performance regression, we can always verify by taking delta
between this changeset and the previous changeset. If you think, I need
to make rte_io_wmb()/rte_io_rmb() as empty for IA then I could do that
as well.


> 
> This both makes easy to see architecture specific updates, and makes
> easy to trace any possible performance issues by this patchset.
> 
> >  
> >  #define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL
> >  #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL
> > 
> 


More information about the dev mailing list