[PATCH v20 09/25] net/pcap: fix error accounting and backpressure on transmit

Bruce Richardson bruce.richardson at intel.com
Mon Mar 16 13:24:17 CET 2026


On Tue, Mar 10, 2026 at 09:09:47AM -0700, Stephen Hemminger wrote:
> The error handling when pcap_sendpacket() failed was incorrect.
> When the underlying kernel socket buffer got full, the send was counted as
> an error. Malformed multi-segment mbufs where pkt_len exceeds actual
> data were silently accepted.
> 
> On Linux, pcap_sendpacket() calls send() on a blocking PF_PACKET
> socket with default kernel buffer sizes and no TX ring (PACKET_TX_RING).
> The send() call only blocks when the kernel socket send buffer is full,
> providing limited backpressure. Backpressure is not an error.
> 

I think we need a clearer explanation of what this patch does; it didn't
make sense to me on reading it. See also the inline comments below. I think we
can get a better patch with a clearer explanation by focusing only on the
changes to eth_pcap_tx. If you want to update the tx_dumper function, put
that in a separate patch with its own explanation.
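To make the intended eth_pcap_tx behaviour concrete, here is a minimal
standalone sketch of the loop as I read the patch. It is not the driver code:
struct fake_pkt, fake_send() and FAKE_SNAPSHOT_LEN are hypothetical stand-ins
for rte_mbuf, pcap_sendpacket() and RTE_ETH_PCAP_SNAPSHOT_LEN. An oversized
multi-segment packet is a real error, so it is dropped and counted. A send
failure just ends the burst and the function returns the number of packets
consumed, leaving the rest with the caller as with rte_eth_tx_burst().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf. */
struct fake_pkt {
	uint32_t pkt_len;   /* total length claimed by the packet */
	int contiguous;     /* 1 if the data sits in a single segment */
	int freed;          /* set once the driver has consumed the packet */
};

/* Hypothetical TX queue: "room" models free space in the socket buffer. */
struct fake_txq {
	unsigned int room;
	uint64_t pkts, bytes, err_pkts;
};

#define FAKE_SNAPSHOT_LEN 65535u /* assumption: mirrors RTE_ETH_PCAP_SNAPSHOT_LEN */

/* Hypothetical stand-in for pcap_sendpacket(): 0 on success, -1 when full. */
static int fake_send(struct fake_txq *q, const struct fake_pkt *p)
{
	(void)p;
	if (q->room == 0)
		return -1;
	q->room--;
	return 0;
}

/*
 * Returns the number of leading packets consumed (sent or dropped).
 * Packets from the returned index onward still belong to the caller.
 */
static uint16_t fake_tx_burst(struct fake_txq *q, struct fake_pkt **bufs, uint16_t nb_pkts)
{
	uint16_t i;

	assert(nb_pkts == 0 || bufs != NULL);
	for (i = 0; i < nb_pkts; i++) {
		struct fake_pkt *p = bufs[i];

		/* Malformed/oversized packet: a genuine error, drop and count it. */
		if (!p->contiguous && p->pkt_len > FAKE_SNAPSHOT_LEN) {
			q->err_pkts++;
			p->freed = 1;
			continue;
		}

		/* Send failure (e.g. buffer full): stop, but not an error. */
		if (fake_send(q, p) != 0)
			break;

		q->pkts++;
		q->bytes += p->pkt_len;
		p->freed = 1;
	}
	return i;
}
```

With room for two sends and a burst of {ok, oversized, ok, ok}, the call
returns 3: two packets sent, one dropped as an error, and the last one left
unfreed for the caller without touching err_pkts.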

> Fixes: fbbbf553f268 ("net/pcap: fix concurrent multiseg Tx")
> Cc: stable at dpdk.org
> 
> Signed-off-by: Stephen Hemminger <stephen at networkplumber.org>

With the commit log clarified, I'm ok with this, but would prefer a smaller
patch.

Acked-by: Bruce Richardson <bruce.richardson at intel.com>


> ---
>  drivers/net/pcap/pcap_ethdev.c | 58 +++++++++++++++++++++-------------
>  1 file changed, 36 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/net/pcap/pcap_ethdev.c b/drivers/net/pcap/pcap_ethdev.c
> index 72a297d423..47a050df11 100644
> --- a/drivers/net/pcap/pcap_ethdev.c
> +++ b/drivers/net/pcap/pcap_ethdev.c
> @@ -407,7 +407,8 @@ eth_pcap_tx_dumper(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  	 * dumper */
>  	for (i = 0; i < nb_pkts; i++) {
>  		struct rte_mbuf *mbuf = bufs[i];
> -		size_t len, caplen;
> +		uint32_t len, caplen;
> +		const uint8_t *data;
>  
>  		len = rte_pktmbuf_pkt_len(mbuf);
>  		caplen = RTE_MIN(len, RTE_ETH_PCAP_SNAPSHOT_LEN);
> @@ -415,15 +416,16 @@ eth_pcap_tx_dumper(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  		calculate_timestamp(&header.ts);
>  		header.len = len;
>  		header.caplen = caplen;
> -		/* rte_pktmbuf_read() returns a pointer to the data directly
> -		 * in the mbuf (when the mbuf is contiguous) or, otherwise,
> -		 * a pointer to temp_data after copying into it.
> -		 */
> -		pcap_dump((u_char *)dumper, &header,
> -			rte_pktmbuf_read(mbuf, 0, caplen, temp_data));
> +
> +		data = rte_pktmbuf_read(mbuf, 0, caplen, temp_data);
> +
> +		/* This could only happen if mbuf is bogus pkt_len > data_len */
> +		RTE_ASSERT(data != NULL);
> +		pcap_dump((u_char *)dumper, &header, data);
>  
>  		num_tx++;
>  		tx_bytes += caplen;
> +
>  		rte_pktmbuf_free(mbuf);
>  	}
>  
> @@ -435,9 +437,8 @@ eth_pcap_tx_dumper(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  	pcap_dump_flush(dumper);
>  	dumper_q->tx_stat.pkts += num_tx;
>  	dumper_q->tx_stat.bytes += tx_bytes;
> -	dumper_q->tx_stat.err_pkts += nb_pkts - num_tx;
>  
> -	return nb_pkts;
> +	return i;
>  }

While not wrong or problematic, the changes here are unnecessary IMHO,
since there is no error that I can see in this function. The changes I see
are:
* change datatype from size_t to uint32_t - ok but the original code wasn't
  really wrong for this.
* having an RTE_ASSERT for the data being NULL - I suppose this is useful
  for certain debug builds, but only if the user knows to define
  RTE_ENABLE_ASSERT.
* extra whitespace line - unnecessary
* drop stats increment - the num_tx value would always equal nb_pkts, so
  this was an increment of zero. Removing it is good, as it saves a store.
* change return value from nb_pkts to i - this is equivalent, since the loop
  always completes with i == nb_pkts, so the change is not needed.
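On the RTE_ASSERT point: my understanding of rte_debug.h is that RTE_ASSERT()
expands to an empty statement unless RTE_ENABLE_ASSERT is defined, so in a
default build a NULL from rte_pktmbuf_read() would still be passed straight
on. A small standalone illustration of that off-by-default pattern, using
hypothetical MY_ASSERT/MY_ENABLE_ASSERT names rather than the real macros:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical mirror of the RTE_ASSERT pattern (modelled on, not copied
 * from, rte_debug.h). The check only exists in builds that define
 * MY_ENABLE_ASSERT; otherwise it compiles to nothing.
 */
#ifdef MY_ENABLE_ASSERT
#define MY_ASSERT(exp) do { \
	if (!(exp)) { \
		fprintf(stderr, "assert failed: %s\n", #exp); \
		abort(); \
	} \
} while (0)
#else
#define MY_ASSERT(exp) do { } while (0)
#endif

/* Returns 1 if the data pointer was usable, 0 otherwise. */
static int use_data(const unsigned char *data)
{
	MY_ASSERT(data != NULL);	/* vanishes unless MY_ENABLE_ASSERT is set */
	if (data == NULL)
		return 0;		/* with asserts off, only an explicit check helps */
	return 1;
}
```

Built without MY_ENABLE_ASSERT, use_data(NULL) returns 0 only because of the
explicit check; the macro on its own contributes nothing.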

The change I would keep is the removal of the error increment, since it was
always zero. However, it doesn't go far enough, as num_tx can be removed from
the function completely. By function end, i == num_tx == nb_pkts, so
let's just use nb_pkts directly, and we can also make "i" a loop-local
variable.

>  
>  /*
> @@ -462,7 +463,17 @@ eth_tx_drop(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  }
>  
>  /*
> - * Callback to handle sending packets through a real NIC.
> + * Send a burst of packets to a pcap device.
> + *
> + * On Linux, pcap_sendpacket() calls send() on a blocking PF_PACKET
> + * socket with default kernel buffer sizes and no TX ring (PACKET_TX_RING).
> + * The send() call only blocks when the kernel socket send buffer is full,
> + * providing limited backpressure.
> + *
> + * On error, pcap_sendpacket() returns non-zero and the loop breaks,
> + * leaving remaining packets unsent.
> + *
> + * Bottom line: backpressure is not an error.
>   */
>  static uint16_t
>  eth_pcap_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> @@ -484,34 +495,37 @@ eth_pcap_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  
>  	for (i = 0; i < nb_pkts; i++) {
>  		struct rte_mbuf *mbuf = bufs[i];
> -		size_t len = rte_pktmbuf_pkt_len(mbuf);
> -		int ret;
> +		uint32_t len = rte_pktmbuf_pkt_len(mbuf);
> +		const uint8_t *data;
>  
> -		if (unlikely(!rte_pktmbuf_is_contiguous(mbuf) &&
> -				len > RTE_ETH_PCAP_SNAPSHOT_LEN)) {
> +		if (unlikely(!rte_pktmbuf_is_contiguous(mbuf) && len > RTE_ETH_PCAP_SNAPSHOT_LEN)) {
>  			PMD_LOG(ERR,
> -				"Dropping multi segment PCAP packet. Size (%zd) > max size (%u).",
> +				"Dropping multi segment PCAP packet. Size (%u) > max size (%u).",
>  				len, RTE_ETH_PCAP_SNAPSHOT_LEN);
> +			tx_queue->tx_stat.err_pkts++;
>  			rte_pktmbuf_free(mbuf);
>  			continue;
>  		}
>  
> -		/* rte_pktmbuf_read() returns a pointer to the data directly
> -		 * in the mbuf (when the mbuf is contiguous) or, otherwise,
> -		 * a pointer to temp_data after copying into it.
> +		data = rte_pktmbuf_read(mbuf, 0, len, temp_data);
> +		RTE_ASSERT(data != NULL);
> +
> +		/*
> +		 * No good way to separate back pressure from failure here
> +		 * Assume it is EBUSY, ENOMEM, or EINTR, something that can be retried.
>  		 */
> -		ret = pcap_sendpacket(pcap,
> -			rte_pktmbuf_read(mbuf, 0, len, temp_data), len);
> -		if (unlikely(ret != 0))
> +		if (pcap_sendpacket(pcap, data, len) != 0) {
> +			PMD_LOG(ERR, "pcap_sendpacket() failed: %s", pcap_geterr(pcap));
>  			break;
> +		}
>  		num_tx++;
>  		tx_bytes += len;
> +
>  		rte_pktmbuf_free(mbuf);
>  	}
>  
>  	tx_queue->tx_stat.pkts += num_tx;
>  	tx_queue->tx_stat.bytes += tx_bytes;
> -	tx_queue->tx_stat.err_pkts += i - num_tx;
>  
>  	return i;
>  }
> -- 
> 2.51.0
> 

