[Patch v4] net/mana: use rte_pktmbuf_alloc_bulk for allocating RX mbufs
Ferruh Yigit
ferruh.yigit at amd.com
Tue Feb 6 19:06:50 CET 2024
On 2/2/2024 1:19 AM, longli at linuxonhyperv.com wrote:
> From: Long Li <longli at microsoft.com>
>
> Instead of allocating mbufs one by one during RX, use
> rte_pktmbuf_alloc_bulk() to allocate them in a batch.
>
> With this patch, there are no measurable performance improvements in
> benchmarks. However, this patch should improve CPU cycles and reduce
> potential locking conflicts in real-world applications.
>
> Signed-off-by: Long Li <longli at microsoft.com>
<...>
> @@ -120,20 +113,33 @@ mana_alloc_and_post_rx_wqe(struct mana_rxq *rxq)
> /*
> * Post work requests for a Rx queue.
> */
> +#define MANA_MBUF_BULK 32u
> static int
> -mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq)
> +mana_alloc_and_post_rx_wqes(struct mana_rxq *rxq, uint32_t count)
> {
> int ret;
> - uint32_t i;
> + uint32_t i, batch_count;
> + struct rte_mbuf *mbufs[MANA_MBUF_BULK];
> +
> +more_mbufs:
> + batch_count = RTE_MIN(count, MANA_MBUF_BULK);
> + ret = rte_pktmbuf_alloc_bulk(rxq->mp, mbufs, batch_count);
> + if (ret) {
> + DP_LOG(ERR, "failed to allocate mbufs for RX");
> + rxq->stats.nombuf += count;
> +
> + /* Bail out to ring doorbell for posted packets */
> + goto out;
> + }
>
> #ifdef RTE_ARCH_32
> rxq->wqe_cnt_to_short_db = 0;
> #endif
> - for (i = 0; i < rxq->num_desc; i++) {
> - ret = mana_alloc_and_post_rx_wqe(rxq);
> + for (i = 0; i < batch_count; i++) {
> + ret = mana_post_rx_wqe(rxq, mbufs[i]);
> if (ret) {
> DP_LOG(ERR, "failed to post RX ret = %d", ret);
> - return ret;
> + break;
>
Hi Long,
Assume that if "count > MANA_MBUF_BULK", and int the first iteration of
the loop 'mana_post_rx_wqe()' failed, but in second iteration it is
successful, this will cause function to return success in spite of
failure in first iteration.
As mbufs posted Rx queue, it may be OK to consider above case as
success, but since 'count' number not posted this may be misleading.
I just want to double check if this is done intentionally.
With the limitation of VLA code become more complex, and if there is no
performance benefit of the allocating 'mbufs' array from stack, you may
prefer to switch back to allocating dynamic memory, up to you.
More information about the dev
mailing list