[dpdk-dev] [PATCH v3 15/29] crypto/qat: use eal I/O device memory read/write API

Jerin Jacob jerin.jacob at caviumnetworks.com
Fri Jan 13 17:21:53 CET 2017


On Fri, Jan 13, 2017 at 03:50:59PM +0000, Ferruh Yigit wrote:
> On 1/13/2017 2:57 PM, Jerin Jacob wrote:
> > On Fri, Jan 13, 2017 at 11:32:29AM +0000, Ferruh Yigit wrote:
> >> On 1/13/2017 8:17 AM, Jerin Jacob wrote:
> >>> On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote:
> >>>> Hi Jerin,
> >>>>
> >>>> On 1/12/2017 9:17 AM, Jerin Jacob wrote:
> >>>> <...>
> >>>>
> >>>>> +#include <rte_io.h>
> >>>>> +
> >>>>>  /* CSR write macro */
> >>>>> -#define ADF_CSR_WR(csrAddr, csrOffset, val) \
> >>>>> -	(void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \
> >>>>> -			= (val)))
> >>>>> +#define ADF_CSR_WR(csrAddr, csrOffset, val)		\
> >>>>> +	rte_write32(val, (((uint8_t *)csrAddr) + csrOffset))
> >>>>
> >>>> For IA, this update introduces an extra compiler barrier (rte_io_wmb()),
> >>>> which is indeed not a must, is this correct?
> >>>
> >>> AFAIK, Compiler barrier is required for IA. I am not an IA expert, if
> >>> someone thinks it needs to changed then I can fix it in following commit
> >>> in this patch series by making rte_io_wmb() and rte_io_rmb() as empty.
> >>>
> >>> Let me know.
> >>>
> >>> AFAIK, Linux kernel code has a barrier in readl/writel for IA.
> >>>
> >>> Typically we don't use any non relaxed versions in fast path.In fast
> >>> typically all the drivers has explicit write barrier for doorbell write
> >>> and followed by a relaxed version of write. IMO, In any event, it won't
> >>> generate performance regression.
> >>>
> >>> [dpdk-master] $ git show
> >>> 70c343bdc8c33a51a9db23cd58122bdfc120a58f
> >>> commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f
> >>> Author: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> >>> Date:   Mon Dec 5 06:36:49 2016 +0530
> >>>
> >>>     eal/x86: define I/O device memory barriers for IA
> >>>
> >>>     The patch does not provide any functional change for IA.
> >>>     I/O barriers are mapped to existing smp barriers.
> >>>
> >>>     CC: Bruce Richardson <bruce.richardson at intel.com>
> >>>     CC: Konstantin Ananyev <konstantin.ananyev at intel.com>
> >>>     Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> >>>
> >>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >>> b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >>> index 00b1cdf..4eac666 100644
> >>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >>> @@ -61,6 +61,12 @@ extern "C" {
> >>>  
> >>>  #define rte_smp_rmb() rte_compiler_barrier()
> >>>  
> >>> +#define rte_io_mb() rte_mb()
> >>> +
> >>> +#define rte_io_wmb() rte_compiler_barrier()
> >>> +
> >>> +#define rte_io_rmb() rte_compiler_barrier()
> >>> +
> >>>  /*------------------------- 16 bit atomic operations
> >>>  * -------------------------*/
> >>>  
> >>>  #ifndef RTE_FORCE_INTRINSICS
> >>>
> >>>>
> >>>> If so, does it make sense to override these functions for x86, and make
> >>>> rte_writeX = rte_writeX_relaxed
> >>>> rte_readX = rte_readX_relaxed
> >>>>
> >>>>>  
> >>>>>  /* CSR read macro */
> >>>>> -#define ADF_CSR_RD(csrAddr, csrOffset) \
> >>>>> -	(*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)))
> >>>>> +#define ADF_CSR_RD(csrAddr, csrOffset)			\
> >>>>> +	rte_read32((((uint8_t *)csrAddr) + csrOffset))
> >>>>
> >>>> This patchset both introduces new rte_readX/rte_writeX functions, also
> >>>> applies them into drivers.
> >>>>
> >>>> While applying them, it changes the behavior.
> >>>> Like above code was doing a read, but after update it does read and
> >>>> read_memory_barrier.
> >>>>
> >>>> What do you think this patchset updates usage in a manner that keeps
> >>>> behavior exact same. Like using rte_read32_relaxed for this case.
> >>>> And doing architecture related updates in a different patchset?
> >>>
> >>> Need to use rte_read32 at this commit otherwise it will break for ARM.
> >>> That's was all point for this patchset.
> >>
> >> Why it breaks the ARM, is it because rte_*mb() updated for ARM in this
> >> patchset (patch 7/29) ?
> > 
> > Yes.
> > 
> > 
> >>
> >> I believe it is good to make these modifications in two phase:
> > 
> > It is in two phases only. First introduced the API with implementation and
> > enabled in each driver. Why did you think other-way around it is better?
> 
> For two things:
> 1- If something goes wrong, find the source of problem easier.

How?

Are you suggesting like this below,
0) Introduce rte_io_?mb()
1) Introduce readxx_relaxed and writexx_relaxed
2) Change all the drivers with readxx_relaxed and writexx_relaxed(15
change sets) to keep the same behavior
3) Introduce readxx and writexx
4) revert step 2 changes and make driver based on readxx and
writexx(again 15 change sets)


Instead of(existing one)
0) Introduce rte_io_?mb()
1) Introduce readxx_relaxed and writexx_relaxed
2) Introduce readxx and writexx
3) Change all the drivers with readxx_[relaxed] and writexx_[relaxed]

Proposed scheme makes driver authors to review two check-ins.
And git bisect fail on fourth case of instead of thrird case with
existing one.

Not sure what we soloving here?

Thomas,
Any thoughts?


> 2- Make architectural changes obvious, right now it is a little hard to
> see, and this again for item 1.
> 
> But I also would like to hear more comments before you change/try anything.
> 
> > I can rework and test if there is any value addition. If you concerned
> > about git bisect ability then I don't think we are loosing that in this
> > model.
> > 
> > Thoughts?
> > 
> >> - First replace old usage with rte_readX/rte_writeX while keeping exact
> >> same behavior
> >>
> >> - Second, do architecture specific changes. Both in eal and drivers
> >> level if required.
> >>
> >> Thanks,
> >> ferruh
> >>
> >>> For performance regression, we can always verify by taking delta
> >>> between this changeset and the previous changeset. If you think, I need
> >>> to make rte_io_wmb()/rte_io_rmb() as empty for IA then I could do that
> >>> as well.
> >>>
> >>>
> >>>>
> >>>> This both makes easy to see architecture specific updates, and makes
> >>>> easy to trace any possible performance issues by this patchset.
> >>>>
> >>>>>  
> >>>>>  #define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL
> >>>>>  #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL
> >>>>>
> >>>>
> >>
> 


More information about the dev mailing list