[dpdk-dev] How to approach packet TX lockups
Stephen Hemminger
stephen at networkplumber.org
Tue Nov 17 01:12:01 CET 2015
On Mon, 16 Nov 2015 17:48:35 -0600
Matt Laswell <laswell at infiniteio.com> wrote:
> Hey Folks,
>
> I sent this to the users email list, but I'm not sure how many people are
> actively reading that list at this point. I'm dealing with a situation in
> which my application loses the ability to transmit packets out of a port
> during times of moderate stress. I'd love to hear suggestions for how to
> approach this problem, as I'm a bit at a loss at the moment.
>
> Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on Haswell
> processors. I'm using the 82599 controller, configured to spread packets
> across multiple queues. Each queue is accessed by a different lcore in my
> application; there is therefore concurrent access to the controller, but
> not to any of the queues. We're binding the ports to the igb_uio driver.
> The symptoms I see are these:
>
>
> - All transmit out of a particular port stops
> - rte_eth_tx_burst() indicates that it is sending all of the packets
>   that I give to it
> - rte_eth_stats_get() gives me stats indicating that no packets are
>   being sent on the affected port. Also, no tx errors, and no pause frames
>   sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
> - All other ports continue to work normally
> - The affected port continues to receive packets without problems; only
>   TX is affected
> - Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
>   restores things and packets can flow again
> - The problem is replicable on multiple devices, and doesn't follow one
>   particular port
>
> I've tried calling rte_mbuf_sanity_check() on all packets before sending
> them. I've also instrumented my code to look for packets that have already
> been sent or freed, as well as cycles in chained packets being sent. I
> also put a lock around all accesses to rte_eth* calls to synchronize access
> to the NIC. Given some recent discussion here, I also tried changing the
> TX RS threshold from 0 to 32, 16, and 1. None of these strategies proved
> effective.
>
> Like I said at the top, I'm a little at a loss at this point. If you were
> dealing with this set of symptoms, how would you proceed?
>
I remember some issues in the old DPDK 1.6 release with some of the
prefetch thresholds on the 82599. You would be better off moving to a
later DPDK version.
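
Until an upgrade is possible, the symptoms described above (the burst call
accepts packets, yet opackets stops advancing) could be turned into a simple
watchdog that decides when to apply the stop/start workaround. The following
is only a minimal sketch under stated assumptions: struct tx_watch,
tx_watch_init() and tx_watch_check() are hypothetical names, not DPDK APIs.
The caller is assumed to pass in the opackets value read via
rte_eth_stats_get() and the number of packets rte_eth_tx_burst() reported as
accepted since the previous check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port watchdog state; not part of DPDK. */
struct tx_watch {
    uint64_t last_opackets;   /* opackets value seen at the previous check */
    unsigned stalled_checks;  /* consecutive busy checks with no progress */
    unsigned limit;           /* busy checks without progress before flagging */
};

static void tx_watch_init(struct tx_watch *w, uint64_t opackets, unsigned limit)
{
    assert(w != NULL);
    w->last_opackets = opackets;
    w->stalled_checks = 0;
    w->limit = limit;
}

/*
 * opackets_now: current opackets counter for the port (assumed to come
 *               from the port statistics).
 * accepted:     packets the TX burst call reported as accepted since the
 *               previous check.
 * Returns 1 when a TX stall is suspected, 0 otherwise.
 */
static int tx_watch_check(struct tx_watch *w, uint64_t opackets_now,
                          uint64_t accepted)
{
    assert(w != NULL);

    if (opackets_now != w->last_opackets) {
        /* The hardware counter moved, so TX is making progress. */
        w->last_opackets = opackets_now;
        w->stalled_checks = 0;
        return 0;
    }

    if (accepted == 0) {
        /* Nothing was queued; a flat counter is expected when idle. */
        return 0;
    }

    /* Packets were accepted but the counter did not advance. */
    w->stalled_checks++;
    return w->stalled_checks >= w->limit;
}
```

When tx_watch_check() returns 1, the application could apply the
rte_eth_dev_stop() / rte_eth_dev_start() reset that the original report says
restores traffic, then call tx_watch_init() again with fresh statistics.
Requiring several consecutive checks (limit) is a design choice to avoid
reacting to a single sampling interval in which packets were queued but not
yet counted.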