[dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size
David Christensen
drc at linux.vnet.ibm.com
Fri Jan 17 19:10:59 CET 2020
>>> +static __rte_always_inline void
>>> +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
>>> + const void *obj_table, uint32_t n)
>>> +{
>>> + unsigned int i;
>>> + const uint32_t size = r->size;
>>> + uint32_t idx = prod_head & r->mask;
>>> + rte_int128_t *ring = (rte_int128_t *)&r[1];
>>> + const rte_int128_t *obj = (const rte_int128_t *)obj_table;
>>> + if (likely(idx + n < size)) {
>>> + for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>>> + memcpy((void *)(ring + idx),
>>> + (const void *)(obj + i), 32);
>>> + switch (n & 0x1) {
>>> + case 1:
>>> + memcpy((void *)(ring + idx),
>>> + (const void *)(obj + i), 16);
>>> + }
>>> + } else {
>>> + for (i = 0; idx < size; i++, idx++)
>>> + memcpy((void *)(ring + idx),
>>> + (const void *)(obj + i), 16);
>>> + /* Start at the beginning */
>>> + for (idx = 0; i < n; i++, idx++)
>>> + memcpy((void *)(ring + idx),
>>> + (const void *)(obj + i), 16);
>>> + }
>>> +}
>>> +
>>> +/* the actual enqueue of elements on the ring.
>>> + * Placed here since identical code needed in both
>>> + * single and multi producer enqueue functions.
>>> + */
>>> +static __rte_always_inline void
>>> +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
>>> +		uint32_t esize, uint32_t num)
>>> +{
>>> + /* 8B and 16B copies implemented individually to retain
>>> + * the current performance.
>>> + */
>>> + if (esize == 8)
>>> + enqueue_elems_64(r, prod_head, obj_table, num);
>>> + else if (esize == 16)
>>> + enqueue_elems_128(r, prod_head, obj_table, num);
>>> + else {
>>> + uint32_t idx, scale, nr_idx, nr_num, nr_size;
>>> +
>>> + /* Normalize to uint32_t */
>>> + scale = esize / sizeof(uint32_t);
>>> + nr_num = num * scale;
>>> + idx = prod_head & r->mask;
>>> + nr_idx = idx * scale;
>>> + nr_size = r->size * scale;
>>> + enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
>>> + }
>>> +}
>>
>> Following Konstantin's comment on v7, enqueue_elems_128() was modified to
>> ensure it won't crash if the object is unaligned. Are we sure that this same
>> problem cannot also occur with 64b copies on all supported architectures? (I
>> mean a 64b access that is only aligned on 32b)
> Konstantin mentioned that the 64b load/store instructions on x86 can handle unaligned access. On aarch64, the load/store instructions (non-atomic, which is what will be used in this case) can also handle unaligned access.
>
> + David Christensen to comment for PPC
The vectorized version of memcpy for Power can handle unaligned access
as well.
Dave