From renata.saiakhova at ekinops.com Fri Aug 1 12:00:08 2025 From: renata.saiakhova at ekinops.com (Renata Saiakhova) Date: Fri, 1 Aug 2025 10:00:08 +0000 Subject: Support for Forcing Speed/Duplex/Autoneg on i225 with DPDK igc Driver Message-ID: Hi all, I'm trying to configure Layer 1 attributes (speed, duplex, and autonegotiation) for Intel i225 interfaces in both kernel and DPDK modes, and I've run into some issues. In kernel driver mode, I normally use: "ethtool -s eth5 speed 100 duplex half autoneg off". However, this results in the kernel message: "igc 0000:04:00.0 eth5: Force mode currently not supported". When the interface is bound to DPDK and connected to an OVS bridge, I configure it like this: ovs-vsctl set Interface 1.extra2 \ type=dpdk \ options:dpdk-devargs=0000:04:00.0 \ options:dpdk-speed=100 \ options:dpdk-autoneg=false \ options:dpdk-duplex=half This should pass the speed/autoneg/duplex settings to the igc DPDK driver via devargs. However, it appears these options are not applied - the interface continues to negotiate its settings as usual, and I can't find any code in the igc DPDK driver that processes these devargs parameters. Is support for forced speed/duplex/autoneg available at all for i225 in DPDK (or even kernel) mode? Or is this a hardware limitation? Any insight, documentation pointers, or confirmation of support status especially in case of DPDK mode would be greatly appreciated. Best regards, Renata Saiakhova SW ARCHITECT renata.saiakhova at ekinops.com Tel: +32 16 799 970 [https://www.ekinops.com/images/public-communication/mail-signature/logo_96dpi.gif] [https://www.ekinops.com/images/public-communication/mail-signature/current.png] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ivan.malov at arknetworks.am Fri Aug 1 15:10:44 2025 From: ivan.malov at arknetworks.am (Ivan Malov) Date: Fri, 1 Aug 2025 17:10:44 +0400 (+04) Subject: Support for Forcing Speed/Duplex/Autoneg on i225 with DPDK igc Driver In-Reply-To: References: Message-ID: <403ff976-7b56-0489-10ab-adb6822e801a@arknetworks.am> Hi Renata, On Fri, 1 Aug 2025, Renata Saiakhova wrote: > > Hi all, > > I?m trying to configure Layer 1 attributes (speed, duplex, and autonegotiation) for Intel i225 interfaces in both kernel and DPDK modes, and I?ve run into some issues. > > In kernel driver mode, I normally use: ?ethtool -s eth5 speed 100 duplex half autoneg off?. > > However, this results in the kernel message: ?igc 0000:04:00.0 eth5: Force mode currently not supported?. > > ? > > When the interface is bound to DPDK and connected to an OVS bridge, I configure it like this: > > ovs-vsctl set Interface 1.extra2 \ > > ? type=dpdk \ > > ? options:dpdk-devargs=0000:04:00.0 \ > > ? options:dpdk-speed=100 \ > > ? options:dpdk-autoneg=false \ > > ? options:dpdk-duplex=half > > This should pass the speed/autoneg/duplex settings to the igc DPDK driver via devargs. Should it? Even if these were on the list of supported devargs for the PMD in question, they would belong in comma-separated tokens after '0000:04:00.0'. I don't see these to be parsed in the OvS, neither do I see such in the PMD. Or are you looking at some custom OvS version? > > However, it appears these options are not applied ? the interface continues to negotiate its settings as usual, and I can?t find any code in the igc DPDK driver that processes these > devargs parameters. If these were passed via 'dpdk-devargs', the PMD would've complained perhaps. Being passed as OvS's own options, they're likely just ignored by the OvS. > > Is support for forced speed/duplex/autoneg available at all for i225 in DPDK (or even kernel) mode? Or is this a hardware limitation? 
As per [1], 'forced' mode may not be supported in the DPDK PMD. But the code suggests one can possibly try to pass, say, 'RTE_ETH_LINK_SPEED_100M_HD' without accompanying it with either 'AUTONEG' or 'FIXED', but that would still come across as 'advertised' mode, not a 'forced' one. I may be wrong, though.

[1] https://github.com/DPDK/dpdk/blob/1b3bf1128d9bda5595861814792f74b8f57160c8/drivers/net/igc/igc_ethdev.c#L1088

> > Any insight, documentation pointers, or confirmation of support status especially in case of DPDK mode would be greatly appreciated.

I take it OvS most likely needs to be augmented with extra code to recognise such options and translate them into DPDK link settings. For an example of how a DPDK application can do that, one should refer to the 'test-pmd' implementation.

Thank you.

> > Best regards,
> >
> > Renata Saiakhova
> > SW ARCHITECT
> > renata.saiakhova at ekinops.com
> > Tel: +32 16 799 970
> >
> > [logo_96dpi.gif]
> > [current.png]

From stephen at networkplumber.org Sat Aug 2 00:01:59 2025
From: stephen at networkplumber.org (Stephen Hemminger)
Date: Fri, 1 Aug 2025 15:01:59 -0700
Subject: Support for Forcing Speed/Duplex/Autoneg on i225 with DPDK igc Driver
In-Reply-To: <403ff976-7b56-0489-10ab-adb6822e801a@arknetworks.am>
References: <403ff976-7b56-0489-10ab-adb6822e801a@arknetworks.am>
Message-ID: <20250801150159.3ef1e3d4@hermes.local>

On Fri, 1 Aug 2025 17:10:44 +0400 (+04) Ivan Malov wrote:

> Hi Renata,
>
> On Fri, 1 Aug 2025, Renata Saiakhova wrote:
> >
> > Hi all,
> >
> > I'm trying to configure Layer 1 attributes (speed, duplex, and autonegotiation) for Intel i225 interfaces in both kernel and DPDK modes, and I've run into some issues.
> >
> > In kernel driver mode, I normally use: "ethtool -s eth5 speed 100 duplex half autoneg off".
> >
> > However, this results in the kernel message: "igc 0000:04:00.0 eth5: Force mode currently not supported".
> > > > When the interface is bound to DPDK and connected to an OVS bridge, I configure it like this: > > > > ovs-vsctl set Interface 1.extra2 \ > > > > ? type=dpdk \ > > > > ? options:dpdk-devargs=0000:04:00.0 \ > > > > ? options:dpdk-speed=100 \ > > > > ? options:dpdk-autoneg=false \ > > > > ? options:dpdk-duplex=half > > > > This should pass the speed/autoneg/duplex settings to the igc DPDK driver via devargs. > > Should it? Even if these were on the list of supported devargs for the PMD in > question, they would belong in comma-separated tokens after '0000:04:00.0'. > I don't see these to be parsed in the OvS, neither do I see such in the PMD. > > Or are you looking at some custom OvS version? > > > > > However, it appears these options are not applied ? the interface continues to negotiate its settings as usual, and I can?t find any code in the igc DPDK driver that processes these > > devargs parameters. > > If these were passed via 'dpdk-devargs', the PMD would've complained perhaps. > Being passed as OvS's own options, they're likely just ignored by the OvS. > > > > > Is support for forced speed/duplex/autoneg available at all for i225 in DPDK (or even kernel) mode? Or is this a hardware limitation? > > As per [1], 'forced' mode may not be supported in the DPDK PMD. But the code > suggests one can possibly try to pass, say, 'RTE_ETH_LINK_SPEED_100M_HD', > without being accompanied by neither 'AUTONEG' nor 'FIXED', but that would still > come as 'advertised' mode, not a 'forced' one. I may be wrong, though. > > [1] https://github.com/DPDK/dpdk/blob/1b3bf1128d9bda5595861814792f74b8f57160c8/drivers/net/igc/igc_ethdev.c#L1088 > > > > > Any insight, documentation pointers, or confirmation of support status especially in case of DPDK mode would be greatly appreciated. > > I take it OvS most likely needs to be augmented with extra code to recognise > such options and translate those into DPDK link settings. 
> For an example of how a DPDK application can do that, one should refer to the 'test-pmd' implementation.
>
> Thank you.
>
> > Best regards,
> >
> > Renata Saiakhova
> > SW ARCHITECT
> > renata.saiakhova at ekinops.com
> > Tel: +32 16 799 970
> >
> > [logo_96dpi.gif]
> > [current.png]

Use of driver-specific devargs is strongly discouraged. There is link_speeds in rte_eth_conf, passed to rte_eth_dev_configure(), which is the correct way to set a fixed speed. It may be that the driver doesn't interpret it correctly?

From renata.saiakhova at ekinops.com Wed Aug 6 16:10:29 2025
From: renata.saiakhova at ekinops.com (Renata Saiakhova)
Date: Wed, 6 Aug 2025 14:10:29 +0000
Subject: Support for Forcing Speed/Duplex/Autoneg on i225 with DPDK igc Driver
In-Reply-To: <20250801150159.3ef1e3d4@hermes.local>
References: <403ff976-7b56-0489-10ab-adb6822e801a@arknetworks.am> <20250801150159.3ef1e3d4@hermes.local>
Message-ID: 

Hi all, hi Ivan and Stephen,

Thank you for your comments and clarifications. Indeed, the correct way to set speed and autoneg for a DPDK port is by configuring the link_speeds field in the struct rte_eth_conf passed to rte_eth_dev_configure(). However, after reviewing the Open vSwitch (version 3.3.2) code, I found that OVS does not currently expose any mechanism to set these Layer 1 attributes (speed, autoneg or duplex) via ovs-vsctl or OVSDB. The relevant fields in rte_eth_conf are not settable through OVS configuration, and there is no code to parse or apply such options from the OVSDB or the command line.

To support this, OVS would need to be extended to parse these options and set the corresponding fields in rte_eth_conf before calling rte_eth_dev_configure(). So, at present, there is no way to configure these parameters through OVS without adding custom code. However, for our customer we need to add this implementation. Do you think that the community would benefit from this implementation?
Kind regards, Renata Saiakhova -----Original Message----- From: Stephen Hemminger Sent: Saturday, 2 August 2025 00:02 To: Ivan Malov Cc: Renata Saiakhova ; users at dpdk.org Subject: Re: Support for Forcing Speed/Duplex/Autoneg on i225 with DPDK igc Driver On Fri, 1 Aug 2025 17:10:44 +0400 (+04) Ivan Malov wrote: > Hi Renata, > > On Fri, 1 Aug 2025, Renata Saiakhova wrote: > > > > > Hi all, > > > > I?m trying to configure Layer 1 attributes (speed, duplex, and autonegotiation) for Intel i225 interfaces in both kernel and DPDK modes, and I?ve run into some issues. > > > > In kernel driver mode, I normally use: ?ethtool -s eth5 speed 100 duplex half autoneg off?. > > > > However, this results in the kernel message: ?igc 0000:04:00.0 eth5: Force mode currently not supported?. > > > > ? > > > > When the interface is bound to DPDK and connected to an OVS bridge, I configure it like this: > > > > ovs-vsctl set Interface 1.extra2 \ > > > > ? type=dpdk \ > > > > ? options:dpdk-devargs=0000:04:00.0 \ > > > > ? options:dpdk-speed=100 \ > > > > ? options:dpdk-autoneg=false \ > > > > ? options:dpdk-duplex=half > > > > This should pass the speed/autoneg/duplex settings to the igc DPDK driver via devargs. > > Should it? Even if these were on the list of supported devargs for the > PMD in question, they would belong in comma-separated tokens after '0000:04:00.0'. > I don't see these to be parsed in the OvS, neither do I see such in the PMD. > > Or are you looking at some custom OvS version? > > > > > However, it appears these options are not applied ? the interface > > continues to negotiate its settings as usual, and I can?t find any code in the igc DPDK driver that processes these devargs parameters. > > If these were passed via 'dpdk-devargs', the PMD would've complained perhaps. > Being passed as OvS's own options, they're likely just ignored by the OvS. > > > > > Is support for forced speed/duplex/autoneg available at all for i225 in DPDK (or even kernel) mode? 
Or is this a hardware limitation? > > As per [1], 'forced' mode may not be supported in the DPDK PMD. But > the code suggests one can possibly try to pass, say, > 'RTE_ETH_LINK_SPEED_100M_HD', without being accompanied by neither > 'AUTONEG' nor 'FIXED', but that would still come as 'advertised' mode, not a 'forced' one. I may be wrong, though. > > [1] > https://github.com/DPDK/dpdk/blob/1b3bf1128d9bda5595861814792f74b8f571 > 60c8/drivers/net/igc/igc_ethdev.c#L1088 > > > > > Any insight, documentation pointers, or confirmation of support status especially in case of DPDK mode would be greatly appreciated. > > I take it OvS most likely needs to be augmented with extra code to > recognise such options and translate those into DPDK link settings. > For an example of how a DPDK application can do that, one should refer to 'test-pmd' implementation. > > Thank you. > > > > > Best regards, > > > > ? > > > > ? > > > > Renata Saiakhova > > SW ARCHITECT > > renata.saiakhova at ekinops.com > > Tel: +32 16 799 970 > > > > [logo_96dpi.gif] > > > > [current.png] > > > > ? > > > > > > Use of driver specific devargs is strongly discouraged. There is link_speeds in rte_eth_conf passed to rte_eth_dev_configure() which is the correct way to set fixed speed. It maybe that the driver doesn't interpret it correctly? From lencse at hit.bme.hu Thu Aug 7 17:32:02 2025 From: lencse at hit.bme.hu (=?UTF-8?Q?G=C3=A1bor_LENCSE?=) Date: Thu, 7 Aug 2025 17:32:02 +0200 Subject: How to calculate ICMPv6 checksum? Message-ID: <40938de8-49b3-46d0-964b-9cd296000d10@hit.bme.hu> Dear All, I am working on adding ARP/NDP support to my SIIT / Stateful NAT64 benchmarking tool, siitperf [1]. (So far, the ARP / NDP table entries had to be set manually at the device under test, as siitperf was not able to reply to ARP / NDP requests). The ARP reply functionality seems to work fine, but I have a problem with NDP. As ICMPv6 messages contain checksum, I would need a function that computes it. 
However, I only found the rte_ipv6_udptcp_cksum() function, but I did not find a similar one for calculating the ICMPv6 checksum.

I have been checking the functions shown here: https://doc.dpdk.org/api/rte__ip6_8h.html

Could you please advise me about the function to use for ICMPv6 checksum calculation?

Best regards,

Gábor

[1] https://github.com/lencsegabor/siitperf

From stephen at networkplumber.org Thu Aug 7 19:57:03 2025
From: stephen at networkplumber.org (Stephen Hemminger)
Date: Thu, 7 Aug 2025 10:57:03 -0700
Subject: How to calculate ICMPv6 checksum?
In-Reply-To: <40938de8-49b3-46d0-964b-9cd296000d10@hit.bme.hu>
References: <40938de8-49b3-46d0-964b-9cd296000d10@hit.bme.hu>
Message-ID: <20250807105703.22de669d@hermes.local>

On Thu, 7 Aug 2025 17:32:02 +0200 Gábor LENCSE wrote:

> Dear All,
>
> I am working on adding ARP/NDP support to my SIIT / Stateful NAT64 benchmarking tool, siitperf [1]. (So far, the ARP / NDP table entries had to be set manually at the device under test, as siitperf was not able to reply to ARP / NDP requests).
>
> The ARP reply functionality seems to work fine, but I have a problem with NDP. As ICMPv6 messages contain a checksum, I would need a function that computes it. However, I only found the rte_ipv6_udptcp_cksum() function, but I did not find a similar one for calculating the ICMPv6 checksum.
>
> I have been checking the functions shown here: https://doc.dpdk.org/api/rte__ip6_8h.html
>
> Could you please advise me about the function to use for ICMPv6 checksum calculation?
>
> Best regards,
>
> Gábor
>
> [1] https://github.com/lencsegabor/siitperf

The pseudo-header part is different.

https://www.rfc-editor.org/rfc/rfc4443

2.3. Message Checksum Calculation

   The checksum is the 16-bit one's complement of the one's complement
   sum of the entire ICMPv6 message, starting with the ICMPv6 message
   type field, and prepended with a "pseudo-header" of IPv6 header
   fields, as specified in [IPv6, Section 8.1].
The Next Header value used in the pseudo-header is 58. (The inclusion of a pseudo-header in the ICMPv6 checksum is a change from IPv4; see [IPv6] for the rationale for this change.) For computing the checksum, the checksum field is first set to zero. https://www.rfc-editor.org/rfc/rfc2460#section-8.1 8.1 Upper-Layer Checksums Any transport or other upper-layer protocol that includes the addresses from the IP header in its checksum computation must be modified for use over IPv6, to include the 128-bit IPv6 addresses instead of 32-bit IPv4 addresses. In particular, the following illustration shows the TCP and UDP "pseudo-header" for IPv6: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + + | | + Source Address + | | + + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + + | | + Destination Address + | | + + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Upper-Layer Packet Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | zero | Next Header | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ o If the IPv6 packet contains a Routing header, the Destination Address used in the pseudo-header is that of the final destination. At the originating node, that address will be in the last element of the Routing header; at the recipient(s), that address will be in the Destination Address field of the IPv6 header. o The Next Header value in the pseudo-header identifies the upper-layer protocol (e.g., 6 for TCP, or 17 for UDP). It will differ from the Next Header value in the IPv6 header if there are extension headers between the IPv6 header and the upper- layer header. o The Upper-Layer Packet Length in the pseudo-header is the length of the upper-layer header and data (e.g., TCP header plus TCP data). 
Some upper-layer protocols carry their own length information (e.g., the Length field in the UDP header); for such protocols, that is the length used in the pseudo- header. Other protocols (such as TCP) do not carry their own length information, in which case the length used in the pseudo-header is the Payload Length from the IPv6 header, minus the length of any extension headers present between the IPv6 header and the upper-layer header. o Unlike IPv4, when UDP packets are originated by an IPv6 node, the UDP checksum is not optional. That is, whenever originating a UDP packet, an IPv6 node must compute a UDP checksum over the packet and the pseudo-header, and, if that computation yields a result of zero, it must be changed to hex FFFF for placement in the UDP header. IPv6 receivers must discard UDP packets containing a zero checksum, and should log the error. The IPv6 version of ICMP [ICMPv6] includes the above pseudo-header in its checksum computation; this is a change from the IPv4 version of ICMP, which does not include a pseudo-header in its checksum. The reason for the change is to protect ICMP from misdelivery or corruption of those fields of the IPv6 header on which it depends, which, unlike IPv4, are not covered by an internet-layer checksum. The Next Header field in the pseudo-header for ICMP contains the value 58, which identifies the IPv6 version of ICMP. From lencse at hit.bme.hu Fri Aug 8 20:56:33 2025 From: lencse at hit.bme.hu (=?UTF-8?Q?G=C3=A1bor_LENCSE?=) Date: Fri, 8 Aug 2025 20:56:33 +0200 Subject: How to calculate ICMPv6 checksum? In-Reply-To: <20250807105703.22de669d@hermes.local> References: <40938de8-49b3-46d0-964b-9cd296000d10@hit.bme.hu> <20250807105703.22de669d@hermes.local> Message-ID: Dear Stephen, Thank you very much for your answer. It helps me a lot, but I have further questions. Please see my comments inline. > The pseudo-header part is different. 
If I understand it correctly, then it means that I need to write the ICMPv6 checksum function myself. To that end, I reviewed the source code of the "rte_ipv6_udptcp_cksum()" function so that I can learn from it. However, I did not find where it differs from the one that I need. I took the below source code from here: https://doc.dpdk.org/api/rte__ip6_8h_source.html#l00610

static inline uint16_t
rte_ipv6_udptcp_cksum(const struct rte_ipv6_hdr *ipv6_hdr, const void *l4_hdr)
{
	uint16_t cksum = __rte_ipv6_udptcp_cksum(ipv6_hdr, l4_hdr);

	cksum = ~cksum;

	/*
	 * Per RFC 768: If the computed checksum is zero for UDP,
	 * it is transmitted as all ones
	 * (the equivalent in one's complement arithmetic).
	 */
	if (cksum == 0 && ipv6_hdr->proto == IPPROTO_UDP)
		cksum = 0xffff;

	return cksum;
}

It is the highest level. It calls an internal function, and at the end it considers the protocol number (in other words, the next header field of the IPv6 header) when it handles the UDP-specific case, thus I think that this time it does not cause any problem in the case of ICMPv6.

This is the source code of the internal function:

static inline uint16_t
__rte_ipv6_udptcp_cksum(const struct rte_ipv6_hdr *ipv6_hdr, const void *l4_hdr)
{
	uint32_t cksum;
	uint32_t l4_len;

	l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);

	cksum = rte_raw_cksum(l4_hdr, l4_len);
	cksum += rte_ipv6_phdr_cksum(ipv6_hdr, 0);

	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);

	return (uint16_t)cksum;
}

It calculates the checksum for the L4 part and also for the pseudo-header separately. The latter could be different than what I need for ICMPv6. I also checked the source code of "rte_ipv6_phdr_cksum(ipv6_hdr, 0)", please see it below the figure from RFC 2460.

> https://www.rfc-editor.org/rfc/rfc4443
>
> 2.3.
> Message Checksum Calculation
>
> The checksum is the 16-bit one's complement of the one's complement
> sum of the entire ICMPv6 message, starting with the ICMPv6 message
> type field, and prepended with a "pseudo-header" of IPv6 header
> fields, as specified in [IPv6, Section 8.1]. The Next Header value
> used in the pseudo-header is 58. (The inclusion of a pseudo-header
> in the ICMPv6 checksum is a change from IPv4; see [IPv6] for the
> rationale for this change.)
>
> For computing the checksum, the checksum field is first set to zero.
>
> https://www.rfc-editor.org/rfc/rfc2460#section-8.1
>
> 8.1 Upper-Layer Checksums
>
> Any transport or other upper-layer protocol that includes the
> addresses from the IP header in its checksum computation must be
> modified for use over IPv6, to include the 128-bit IPv6 addresses
> instead of 32-bit IPv4 addresses. In particular, the following
> illustration shows the TCP and UDP "pseudo-header" for IPv6:
>
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                                                               |
> +                                                               +
> |                                                               |
> +                         Source Address                        +
> |                                                               |
> +                                                               +
> |                                                               |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                                                               |
> +                                                               +
> |                                                               |
> +                      Destination Address                      +
> |                                                               |
> +                                                               +
> |                                                               |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                   Upper-Layer Packet Length                   |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                      zero                     |  Next Header  |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

So this is what I need. And it seems to me that the below source code does exactly the same:

static inline uint16_t
rte_ipv6_phdr_cksum(const struct rte_ipv6_hdr *ipv6_hdr, uint64_t ol_flags)
{
	uint32_t sum;
	struct {
		rte_be32_t len;   /* L4 length. */
		rte_be32_t proto; /* L4 protocol - top 3 bytes must be zero */
	} psd_hdr;

	psd_hdr.proto = (uint32_t)(ipv6_hdr->proto << 24);
	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
		psd_hdr.len = 0;
	else
		psd_hdr.len = ipv6_hdr->payload_len;

	sum = __rte_raw_cksum(&ipv6_hdr->src_addr,
		sizeof(ipv6_hdr->src_addr) + sizeof(ipv6_hdr->dst_addr),
		0);
	sum = __rte_raw_cksum(&psd_hdr, sizeof(psd_hdr), sum);
	return __rte_raw_cksum_reduce(sum);
}

As required, it handles the length field on 32 bits, and it shifts the protocol field (containing the value of 58) to the left by 24 bits, which is the same as placing the "next header" field in the topmost 8 bits of a 32-bit number in the drawing. Then it does a "trick": it uses the source and destination IPv6 addresses directly from the IPv6 packet (likely to spare copying them).

Thus, I did not find anything that I would need to do differently. However, on the other hand, _there should be something_, because I tried using the "rte_ipv6_udptcp_cksum()" function (of course, I set the checksum field to 0 before using it), but Wireshark said that the checksum was incorrect. Both tshark and Wireshark decode my NA message perfectly, but the Linux kernel of the device under test does not accept it, and this is why it keeps sending further NS messages.

This is a tshark capture on the device under test:

root at dut:~# tshark -i eno1
Running as user "root" and group "root". This could be dangerous.
Capturing on 'eno1'
    1 0.000000000 fe80::baca:3aff:fe5e:25a8 → ff02::16    ICMPv6 170 Multicast Listener Report Message v2
    2 0.379986848 fe80::baca:3aff:fe5e:25a8 → ff02::16    ICMPv6 170 Multicast Listener Report Message v2
    3 4.156047617    2001:2::2 → 2001:2:0:8000::2 UDP 80 58488 → 27971 Len=18
    4 4.156066982 fe80::baca:3aff:fe5e:25a8 → ff02::1:ff00:2 ICMPv6 86 Neighbor Solicitation for 2001:2::2 from b8:ca:3a:5e:25:a8
    5 4.156092949    2001:2::2 → fe80::baca:3aff:fe5e:25a8 ICMPv6 86 Neighbor Advertisement 2001:2::2 (ovr) is at 24:6e:96:3c:3f:40
    6 5.183987802 fe80::baca:3aff:fe5e:25a8 → ff02::1:ff00:2 ICMPv6 86 Neighbor Solicitation for 2001:2::2 from b8:ca:3a:5e:25:a8
    7 5.184007499    2001:2::2 →
fe80::baca:3aff:fe5e:25a8 ICMPv6 86 Neighbor Advertisement 2001:2::2 (ovr) is at 24:6e:96:3c:3f:40
    8 6.203987286 fe80::baca:3aff:fe5e:25a8 → ff02::1:ff00:2 ICMPv6 86 Neighbor Solicitation for 2001:2::2 from b8:ca:3a:5e:25:a8
    9 6.204007429    2001:2::2 → fe80::baca:3aff:fe5e:25a8 ICMPv6 86 Neighbor Advertisement 2001:2::2 (ovr) is at 24:6e:96:3c:3f:40
   10 7.232005250    2001:2::1 → ff02::1:ff00:2 ICMPv6 86 Neighbor Solicitation for 2001:2::2 from b8:ca:3a:5e:25:a8
   11 8.251987771 fe80::baca:3aff:fe5e:25a8 → ff02::1:ff00:2 ICMPv6 86 Neighbor Solicitation for 2001:2::2 from b8:ca:3a:5e:25:a8
   12 9.275986860 fe80::baca:3aff:fe5e:25a8 → ff02::1:ff00:2 ICMPv6 86 Neighbor Solicitation for 2001:2::2 from b8:ca:3a:5e:25:a8

And Wireshark says: "Checksum: 0x1baf incorrect, should be 0x035d".

Could you please advise me what I could have overlooked?

Best regards,

Gábor

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stephen at networkplumber.org Mon Aug 11 17:13:54 2025
From: stephen at networkplumber.org (Stephen Hemminger)
Date: Mon, 11 Aug 2025 08:13:54 -0700
Subject: All non-dpdk application threads assigned to the same core after calling rte_eal_init()
In-Reply-To: 
References: 
Message-ID: <20250811081332.4d25980c@hermes.local>

On Mon, 28 Jul 2025 09:57:28 -0500 Daniel May wrote:

> Hello,
>
> I have a C++ application which uses DPDK to receive packets over two 100 Gbps interfaces from an FPGA device.
>
> I've isolated cores 0-2. DPDK is using cores 0-2, main lcore on 0, and two receive threads on cores 1 and 2. The receive threads read packets from mbufs and move the packets to a couple of circular buffers. This works as expected.
>
> The application starts another set of threads (worker threads) to read from the circular buffers and handle the data. However, these threads are all being assigned to core 0, instead of cores 3-n.
If you are creating the threads from another thread, the new thread inherits the affinity mask of the parent. In your example, it looks like you are spawning the threads from the main lcore (0).

> If I start the worker threads first, before calling rte_eal_init(), they get assigned to non-isolated cores as expected. However, any thread I start from the main thread after calling rte_eal_init() gets assigned to the main lcore (0). I can set affinity manually, but I'd rather the kernel scheduler do its thing.
>
> Is there something I need to do to hand control back to the kernel scheduler for assigning threads to cores after initializing DPDK?

The way to fix it is to set the affinity yourself, to tell the kernel what you want.

From ivan.malov at arknetworks.am Mon Aug 11 18:08:38 2025
From: ivan.malov at arknetworks.am (Ivan Malov)
Date: Mon, 11 Aug 2025 20:08:38 +0400 (+04)
Subject: [DPDK 24.11.3-rc1] rte_flow_async_create() stucks in while loop (infinite loop)
In-Reply-To: 
References: 
Message-ID: <9c57e90b-4216-cc15-6ff3-b8ed8cd322d5@arknetworks.am>

Hi,

On Mon, 28 Jul 2025, Seongjong Bae wrote:

> Hello commit authors (and maintainers),
>
> I'm currently working with rte_flow_async_create() using the postpone flag, along with rte_flow_push/pull() for batching, in a scenario involving thousands of flows on a BlueField-2 system.
>
> My goal is to implement hardware steering such that ingress traffic bypasses the ARM core of the BF2, and egress traffic does the same.
>
> According to the DPDK documentation, rte_flow_push/pull() seems to be intended for use as a batch operation, wrapping a large for loop that issues multiple flow operations, and then committing them to hardware in one go.
>
> However, I've observed that when multiple cores simultaneously insert flow rules, using rte_flow_push/pull() in such a batched way can result in the rule insertion operations not being
Specifically, the internal function mlx5dr_send_all_dep_wqe() ends up getting stuck in its while loop. > > Interestingly, if I call rte_flow_push/pull() after each individual rte_flow_async_create() operation, even though that usage seems contrary to the intended batching model, the infinite > loop issue is significantly mitigated. The frequency of getting stuck in mlx5dr_send_all_dep_wqe() drops drastically?though it still occurs occasionally. > > In summary, calling rte_flow_push/pull() after each rte_flow_async_create() seems to avoid the infinite loop, but I?m unsure if this is an expected usage pattern. I would like to ask: > > * > > Is this behavior intentional? > > * > > Am I misunderstanding the design or usage expectations for rte_flow_push/pull() in multi-core scenarios? > Perhaps my question is a bit out of place and wrong, but, given the fact there are no code snippets to take a look at, are you using separate flow queues for submitting the operations, one flow queue per lcore? Thank you. > Thank you for your time and support. > > Sincerely, > Seongjong Bae?M.S. Student?T-Networking Lab. > [AIorK4yCWXBmHrQ1GGSZ1Kc18irHfB1S9x_FqTeAHsxNIdnf_olG-PRjFVlItUw34zr1tnNwkP5AlPTomK87] > Email > sjbae1999 at gmail.com > Mobile > (+82)01089640524 > Web. > https://tnet.snu.ac.kr/ > [a81b6766e3d5b6518dc4010493c7772f5a46f598.png?u=11013800] > > From bingz at nvidia.com Tue Aug 12 10:30:49 2025 From: bingz at nvidia.com (Bing Zhao) Date: Tue, 12 Aug 2025 08:30:49 +0000 Subject: [DPDK 24.11.3-rc1] rte_flow_async_create() stucks in while loop (infinite loop) In-Reply-To: <9c57e90b-4216-cc15-6ff3-b8ed8cd322d5@arknetworks.am> References: <9c57e90b-4216-cc15-6ff3-b8ed8cd322d5@arknetworks.am> Message-ID: @Ivan Malov, which version of DPDK are you using? The last year RC? @Erez Shitrit, could you help to confirm if the GCC loop expansion bug of some arm compiler is also present in this branch? 
I remember there was a GCC bug to always compare with 1 and jump into an infinite loop. Thanks > -----Original Message----- > From: Ivan Malov > Sent: Tuesday, August 12, 2025 12:09 AM > To: ??? > Cc: users at dpdk.org; Dariusz Sosnowski ; Slava > Ovsiienko ; Bing Zhao ; Ori Kam > ; Suanming Mou ; Matan Azrad > > Subject: Re: [DPDK 24.11.3-rc1] rte_flow_async_create() stucks in while > loop (infinite loop) > > External email: Use caution opening links or attachments > > > Hi, > > On Mon, 28 Jul 2025, ??? wrote: > > > Hello commit authors (and maintainers), > > > > I'm currently working with rte_flow_async_create() using the postpone > > flag, along with rte_flow_push/pull() for batching, in a scenario > involving thousands of flows on a BlueField-2 system. > > > > My goal is to implement hardware steering such that ingress traffic > bypasses the ARM core of the BF2, and egress traffic does the same. > > > > According to the DPDK documentation, rte_flow_push/pull() seems to be > > intended for use as a batch operation, wrapping a large for loop that > issues multiple flow operations, and then committing them to hardware in > one go. > > > > However, I?ve observed that when multiple cores simultaneously insert > > flow rules, using rte_flow_push/pull() in such a batched way can result > in the rule insertion operations not being properly transmitted to the > hardware. Specifically, the internal function mlx5dr_send_all_dep_wqe() > ends up getting stuck in its while loop. > > > > Interestingly, if I call rte_flow_push/pull() after each individual > > rte_flow_async_create() operation, even though that usage seems contrary > to the intended batching model, the infinite loop issue is significantly > mitigated. The frequency of getting stuck in mlx5dr_send_all_dep_wqe() > drops drastically?though it still occurs occasionally. 
> >
> > In summary, calling rte_flow_push/pull() after each rte_flow_async_create() seems to avoid the infinite loop, but I'm unsure if this is an expected usage pattern. I would like to ask:
> >
> > * Is this behavior intentional?
> >
> > * Am I misunderstanding the design or usage expectations for rte_flow_push/pull() in multi-core scenarios?
>
> Perhaps my question is a bit out of place and wrong, but, given the fact that there are no code snippets to take a look at, are you using separate flow queues for submitting the operations, one flow queue per lcore?
>
> Thank you.
>
> > Thank you for your time and support.
> >
> > Sincerely,
> > Seongjong Bae, M.S. Student, T-Networking Lab.
> > Email: sjbae1999 at gmail.com
> > Mobile: (+82)01089640524
> > Web: https://tnet.snu.ac.kr/

From stephen at networkplumber.org Wed Aug 13 01:57:06 2025
From: stephen at networkplumber.org (Stephen Hemminger)
Date: Tue, 12 Aug 2025 16:57:06 -0700
Subject: How to calculate ICMPv6 checksum?
In-Reply-To: References: <40938de8-49b3-46d0-964b-9cd296000d10@hit.bme.hu> <20250807105703.22de669d@hermes.local>
Message-ID: <20250812165706.52ac3b50@hermes.local>

On Fri, 8 Aug 2025 20:56:33 +0200 Gábor LENCSE wrote:

> Dear Stephen,
>
> Thank you very much for your answer. It helps me a lot, but I have further questions. Please see my comments inline.
>
> > The pseudo-header part is different.
>
> If I understand it correctly, then it means that I need to write the ICMPv6 checksum function myself. To that end, I reviewed the source code of the "rte_ipv6_udptcp_cksum()" function so that I can learn from it. However, I did not find where it differs from the one that I need.
> I took the below source code from here:
> https://doc.dpdk.org/api/rte__ip6_8h_source.html#l00610
>
> static inline uint16_t
> rte_ipv6_udptcp_cksum(const struct rte_ipv6_hdr *ipv6_hdr, const void *l4_hdr)
> {
>     uint16_t cksum = __rte_ipv6_udptcp_cksum(ipv6_hdr, l4_hdr);
>
>     cksum = ~cksum;
>
>     /*
>      * Per RFC 768: If the computed checksum is zero for UDP,
>      * it is transmitted as all ones
>      * (the equivalent in one's complement arithmetic).
>      */
>     if (cksum == 0 && ipv6_hdr->proto == IPPROTO_UDP)
>         cksum = 0xffff;
>
>     return cksum;
> }
>
> It is the highest-level function. It calls an internal function, and at the end it considers the protocol number (in other words, the next header field of the IPv6 header) when it handles UDP-specific things; thus I think that this time it does not cause any problem in the case of ICMPv6.
>
> This is the source code of the internal function:
>
> static inline uint16_t
> __rte_ipv6_udptcp_cksum(const struct rte_ipv6_hdr *ipv6_hdr, const void *l4_hdr)
> {
>     uint32_t cksum;
>     uint32_t l4_len;
>
>     l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);
>
>     cksum = rte_raw_cksum(l4_hdr, l4_len);
>     cksum += rte_ipv6_phdr_cksum(ipv6_hdr, 0);
>
>     cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
>
>     return (uint16_t)cksum;
> }

Yes, this is similar, but in the UDP/TCP case the UDP/TCP header is included in the checksum. l4_hdr points to the UDP/TCP header. l4_len is the payload length, that is, the TCP/UDP header and the associated data. The pseudo header is done by rte_ipv6_phdr_cksum().

For ICMPv6 you would need to point l4_hdr at the ICMPv6 header, even though ICMP is not really an L4 protocol.

https://en.wikipedia.org/wiki/ICMPv6#Checksum

From lencse at hit.bme.hu Thu Aug 14 16:43:55 2025
From: lencse at hit.bme.hu (=?UTF-8?Q?G=C3=A1bor_LENCSE?=)
Date: Thu, 14 Aug 2025 16:43:55 +0200
Subject: Solved! :-) -- Re: How to calculate ICMPv6 checksum?
In-Reply-To: <20250812165706.52ac3b50@hermes.local>
References: <40938de8-49b3-46d0-964b-9cd296000d10@hit.bme.hu> <20250807105703.22de669d@hermes.local> <20250812165706.52ac3b50@hermes.local>
Message-ID: <691b98f2-6dc6-47cd-a5a0-437f9858d115@hit.bme.hu>

Dear Stephen,

On 8/13/2025 1:57 AM, Stephen Hemminger wrote:

[...]

> Yes, this is similar, but in the UDP/TCP case the UDP/TCP header is included in the checksum. l4_hdr points to the UDP/TCP header. l4_len is the payload length, that is, the TCP/UDP header and the associated data.

Yes, when I tried using the rte_ipv6_udptcp_cksum() function, I supplied a pointer to the ICMPv6 header as the second argument. My code line was:

reply_icmpv6_hdr->checksum = rte_ipv6_udptcp_cksum(reply_ipv6_hdr, reply_icmpv6_hdr);

And the internal function reads out the payload length from the IPv6 header as follows:

    l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);

This is also correct.

> The pseudo header is done by rte_ipv6_phdr_cksum().
>
> For ICMPv6 you would need to point l4_hdr at the ICMPv6 header, even though ICMP is not really an L4 protocol.
>
> https://en.wikipedia.org/wiki/ICMPv6#Checksum

Yes, I checked the drawing; this is the same as https://www.rfc-editor.org/rfc/rfc2460#section-8.1 and the code calculates exactly the same (with some trick, as I mentioned earlier).

*And the calculated checksum is CORRECT! :-)*

It turned out that I had made a programming error. (My calculation of the address of the checksum field was incorrect, and thus I manipulated a wrong field.)

Anyway, thank you very much for all your help! I learnt a lot from checking how the ICMPv6 checksum is calculated. :-)

And I hope that it will be useful information for others that the rte_ipv6_udptcp_cksum() function is perfectly suitable for calculating the ICMPv6 checksum, too. :-)

Best regards,

Gábor
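The computation settled on in this thread can be sanity-checked outside DPDK. Below is a minimal standalone sketch of the ICMPv6 checksum (upper-layer data plus the IPv6 pseudo-header of RFC 8200 section 8.1, with Next Header 58 for ICMPv6). The helper names (csum_add, csum_fold, icmp6_cksum) are illustrative, not DPDK APIs:

```c
#include <stdint.h>
#include <stddef.h>

/* One's-complement sum of a byte buffer (RFC 1071 style, big-endian 16-bit words). */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8; /* odd trailing byte, zero-padded */
    return sum;
}

/* Fold carries into 16 bits and take the one's complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* ICMPv6 checksum: pseudo-header (src, dst, upper-layer length,
 * Next Header = 58) followed by the ICMPv6 message itself.
 * The checksum field inside the message must be zero while summing. */
static uint16_t icmp6_cksum(const uint8_t src[16], const uint8_t dst[16],
                            const uint8_t *icmp6, uint16_t len)
{
    uint32_t sum = 0;

    sum = csum_add(sum, src, 16);
    sum = csum_add(sum, dst, 16);
    sum += len; /* upper-layer packet length */
    sum += 58;  /* Next Header value for ICMPv6 */
    sum = csum_add(sum, icmp6, len);
    return csum_fold(sum);
}
```

A handy self-check, which the misaddressed checksum field in the message above would have failed: re-running the computation over a message that already carries the correct checksum folds to zero.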
From sjbae1999 at gmail.com Wed Aug 13 05:45:42 2025
From: sjbae1999 at gmail.com (=?UTF-8?B?67Cw7ISx7KKF?=)
Date: Wed, 13 Aug 2025 12:45:42 +0900
Subject: [DPDK 24.11.3-rc1] rte_flow_async_create() gets stuck in while loop (infinite loop)
In-Reply-To: References: <9c57e90b-4216-cc15-6ff3-b8ed8cd322d5@arknetworks.am>
Message-ID:

Hello,

@Ivan Malov, I use one flow queue per lcore.

Sincerely,
Seongjong Bae, M.S. Student, T-Networking Lab.
Email: sjbae1999 at gmail.com
Mobile: (+82)01089640524
Web: https://tnet.snu.ac.kr/

On Tue, Aug 12, 2025 at 5:30 PM, Bing Zhao wrote:

> @Ivan Malov, which version of DPDK are you using? Last year's RC?
>
> @Erez Shitrit, could you help confirm whether the GCC loop-expansion bug of some Arm compilers is also present in this branch?
> I remember there was a GCC bug that made the code always compare with 1 and jump into an infinite loop.
>
> Thanks
>
> > -----Original Message-----
> > From: Ivan Malov
> > Sent: Tuesday, August 12, 2025 12:09 AM
> > To: Seongjong Bae
> > Cc: users at dpdk.org; Dariusz Sosnowski ; Slava Ovsiienko ; Bing Zhao ; Ori Kam ; Suanming Mou ; Matan Azrad
> > Subject: Re: [DPDK 24.11.3-rc1] rte_flow_async_create() gets stuck in while loop (infinite loop)
> >
> > External email: Use caution opening links or attachments
> >
> > Hi,
> >
> > On Mon, 28 Jul 2025, Seongjong Bae wrote:
> >
> > > Hello commit authors (and maintainers),
> > >
> > > I'm currently working with rte_flow_async_create() using the postpone flag, along with rte_flow_push/pull() for batching, in a scenario involving thousands of flows on a BlueField-2 system.
> > >
> > > My goal is to implement hardware steering such that ingress traffic bypasses the ARM core of the BF2, and egress traffic does the same.
> > >
> > > According to the DPDK documentation, rte_flow_push/pull() seems to be intended for use as a batch operation, wrapping a large for loop that issues multiple flow operations, and then committing them to hardware in one go.
> > > However, I've observed that when multiple cores simultaneously insert flow rules, using rte_flow_push/pull() in such a batched way can result in the rule insertion operations not being properly transmitted to the hardware. Specifically, the internal function mlx5dr_send_all_dep_wqe() ends up getting stuck in its while loop.
> > >
> > > Interestingly, if I call rte_flow_push/pull() after each individual rte_flow_async_create() operation, even though that usage seems contrary to the intended batching model, the infinite loop issue is significantly mitigated. The frequency of getting stuck in mlx5dr_send_all_dep_wqe() drops drastically, though it still occurs occasionally.
> > >
> > > In summary, calling rte_flow_push/pull() after each rte_flow_async_create() seems to avoid the infinite loop, but I'm unsure if this is an expected usage pattern. I would like to ask:
> > >
> > > * Is this behavior intentional?
> > >
> > > * Am I misunderstanding the design or usage expectations for rte_flow_push/pull() in multi-core scenarios?
> >
> > Perhaps my question is a bit out of place and wrong, but, given the fact that there are no code snippets to take a look at, are you using separate flow queues for submitting the operations, one flow queue per lcore?
> >
> > Thank you.
> >
> > > Thank you for your time and support.
> > >
> > > Sincerely,
> > > Seongjong Bae, M.S. Student, T-Networking Lab.
> > > Email: sjbae1999 at gmail.com
> > > Mobile: (+82)01089640524
> > > Web: https://tnet.snu.ac.kr/
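For onlookers, the queue-capacity contract at issue in this thread can be sketched without DPDK. In the async flow API, each queue has a fixed depth chosen at configure time, and postponed operations occupy slots until they are pushed and their completions pulled. The toy model below (all names hypothetical, not DPDK APIs) shows why a queue that is never drained eventually refuses, or in a busy-wait design spins on, further enqueues; each lcore would own one such structure, matching the one-queue-per-lcore setup confirmed above:

```c
#include <errno.h>

#define QUEUE_SIZE 4 /* analogous to a fixed flow-queue depth */

struct op_queue {
    int pending;   /* ops enqueued but not yet pushed toward hardware */
    int in_flight; /* ops pushed whose completions were not yet pulled */
};

/* Enqueue one postponed operation; fails when the queue is full.
 * A caller that retries in a tight loop without ever draining the
 * queue would spin here forever. */
static int op_enqueue(struct op_queue *q)
{
    if (q->pending + q->in_flight >= QUEUE_SIZE)
        return -EAGAIN;
    q->pending++;
    return 0;
}

/* Commit all pending ops (the push step of the batching model). */
static void op_push(struct op_queue *q)
{
    q->in_flight += q->pending;
    q->pending = 0;
}

/* Harvest completions, freeing queue slots (the pull step). */
static int op_pull(struct op_queue *q)
{
    int done = q->in_flight;
    q->in_flight = 0;
    return done;
}
```

The takeaway mirrors the advice in the thread: batching is fine, but the batch size must stay within the queue depth, and completions must be pulled on the same lcore's queue before more operations are submitted.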
From samjhanatandukar46 at gmail.com Sat Aug 16 15:35:47 2025
From: samjhanatandukar46 at gmail.com (Samjhana Tandukar)
Date: Sat, 16 Aug 2025 19:20:47 +0545
Subject: [dpdk-users] [dpdk-dev] VMXNET3 EAL : Requested device XXX cannot be used
Message-ID:

Bitu jbj

From viettd at dft.vn Sat Aug 23 03:16:16 2025
From: viettd at dft.vn (=?utf-8?B?VOG7kW5nIMSQ4bupYyBWaeG7h3Q=?=)
Date: Sat, 23 Aug 2025 01:16:16 +0000
Subject: [DPDK 21.11 + Intel E810 DDP Comms] Issue with creating flow rules for TCP 5060 and SCTP 3868
Message-ID:

Hi DPDK community,

I am testing rte_flow rules on DPDK 21.11 with an Intel E810 NIC using the DDP Comms package. My goal is to classify SIP and Diameter traffic by L4 ports (TCP 5060 and SCTP 3868). However, I see inconsistent behavior when creating flow rules with testpmd.

1. TCP 5060
- If I create a rule matching TCP src port 5060 and another rule matching TCP dst port 5060:

flow create 0 ingress pattern eth / ipv4 / tcp src is 5060 / end \
     actions mark id 1 / end
flow create 0 ingress pattern eth / ipv4 / tcp dst is 5060 / end \
     actions mark id 2 / end

-> The second rule fails if I use the "mark" action (seems overlapping).

- If I switch to a queue-based action:

actions queue index N / end

-> Both rules can coexist and work correctly.

2. SCTP 3868
- I cannot create either rule (src or dst). Example:

flow create 0 ingress pattern eth / ipv4 / sctp src is 3868 / end \
     actions mark id 3 / end
flow create 0 ingress pattern eth / ipv4 / sctp dst is 3868 / end \
     actions mark id 4 / end

-> Both commands fail with "Invalid argument". Even "flow validate" shows not supported.

So it looks like:
- For TCP, port matching works partially depending on action type (mark vs queue).
- For SCTP, port field matching is not supported at all, even with the queue action.

Questions:
- Is this a known limitation of Intel E810 with the DDP Comms package?
- Is SCTP src/dst port matching expected to be supported, or should I handle this in software (by sending all SCTP traffic to one queue)?
- Is there any updated DDP package or workaround (e.g., raw pattern) for SCTP 3868 filtering?

Thanks a lot for your support.

Best regards,
Viet

From julien.marcin.tech at gmail.com Sat Aug 23 17:38:03 2025
From: julien.marcin.tech at gmail.com (Julien)
Date: Sat, 23 Aug 2025 17:38:03 +0200
Subject: [mlx5 driver] Usage of mlx5 with unprivileged LXC container
Message-ID:

Hello,

I have a question about using the mlx5 driver with LXC. I'm trying to use dpdk-testpmd in an LXC container whose root user isn't mapped to the host's root user.
Note: The entire physical interface is given to the LXC container, not a virtual interface.

The following error occurred:

mlx5_common: DevX create TIS failed errno=22 status=0 syndrome=0
mlx5_net: Failed to create TIS 0/0 for [bonding] device mlx5_2.
mlx5_net: TIS allocation failure
mlx5_net: probe of PCI device 0000:27:00.0 aborted after encountering an error: Cannot allocate memory
mlx5_common: Failed to load driver mlx5_eth
EAL: Requested device 0000:27:00.0 cannot be used
EAL: Bus (pci) probe failed.

The "transport_domain" is created, and the mlx5_devx_cmd_create_td() function runs normally. The call to mlx5dv_devx_obj_create() receives an errno of 22.

I don't encounter any problems when the container's root user is mapped to the host's root user. Has anyone experienced this before? Is it possible to use the driver in an unprivileged LXC container?
Dpdk version: 23.11 Linux Kernel: 5.15 Stack: libibverbs.so.1!execute_ioctl(struct ibv_context * context, struct ibv_context * context at entry, struct ibv_command_buffer * cmd, struct ibv_command_buffer * cmd at entry) (\rdma-core-49.0\libibverbs\cmd_ioctl.c:147) libmlx5.so.1!_mlx5dv_devx_obj_create(struct ibv_context * context, const void * in, size_t inlen, void * out, size_t outlen) (\rdma-core-49.0\providers\mlx5\verbs.c:5794) libmlx5.so.1!mlx5dv_devx_obj_create(struct ibv_context * context, const void * in, size_t inlen, void * out, size_t outlen) (\rdma-core-49.0\providers\mlx5\verbs.c:5819) librte_common_mlx5.so.24!mlx5_glue_devx_obj_create(struct ibv_context * ctx, const void * in, size_t inlen, void * out, size_t outlen) (\dpdk-23.11\drivers\common\mlx5\linux\mlx5_glue.c:1045) librte_common_mlx5.so.24!mlx5_devx_cmd_create_tis(void * ctx, struct mlx5_devx_tis_attr * tis_attr) (\dpdk-23.11\drivers\common\mlx5\mlx5_devx_cmds.c:2037) librte_net_mlx5.so!mlx5_setup_tis(struct mlx5_dev_ctx_shared * sh) (\dpdk-23.11\drivers\net\mlx5\mlx5.c:1343) librte_net_mlx5.so!mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data * spawn, struct mlx5_kvargs_ctrl * mkvlist) (\dpdk-23.11\drivers\net\mlx5\mlx5.c:1784) librte_net_mlx5.so!mlx5_dev_spawn(struct rte_device * dpdk_dev, struct mlx5_dev_spawn_data * spawn, struct rte_eth_devargs * eth_da, struct mlx5_kvargs_ctrl * mkvlist) (\dpdk-23.11\drivers\net\mlx5\linux\mlx5_os.c:1169) librte_net_mlx5.so!mlx5_os_pci_probe_pf(struct mlx5_common_device * cdev, struct rte_eth_devargs * req_eth_da, uint16_t owner_id, struct mlx5_kvargs_ctrl * mkvlist) (\dpdk-23.11\drivers\net\mlx5\linux\mlx5_os.c:2648) librte_net_mlx5.so!mlx5_os_pci_probe(struct mlx5_common_device * cdev, struct mlx5_kvargs_ctrl * mkvlist) (\dpdk-23.11\drivers\net\mlx5\linux\mlx5_os.c:2797) librte_net_mlx5.so!mlx5_os_net_probe(struct mlx5_common_device * cdev, struct mlx5_kvargs_ctrl * mkvlist) (\dpdk-23.11\drivers\net\mlx5\linux\mlx5_os.c:2881) 
librte_common_mlx5.so.24!drivers_probe(struct mlx5_common_device * cdev, uint32_t user_classes, struct mlx5_kvargs_ctrl * mkvlist) (\dpdk-23.11\drivers\common\mlx5\mlx5_common.c:938)
librte_common_mlx5.so.24!mlx5_common_dev_probe(struct rte_device * eal_dev) (\dpdk-23.11\drivers\common\mlx5\mlx5_common.c:1028)
librte_common_mlx5.so.24!mlx5_common_pci_probe(struct rte_pci_driver * pci_drv, struct rte_pci_device * pci_dev) (\dpdk-23.11\drivers\common\mlx5\mlx5_common_pci.c:168)
librte_bus_pci.so.24!rte_pci_probe_one_driver(struct rte_pci_driver * dr, struct rte_pci_device * dev) (\dpdk-23.11\drivers\bus\pci\pci_common.c:312)
librte_bus_pci.so.24!pci_probe_all_drivers(struct rte_pci_device * dev) (\dpdk-23.11\drivers\bus\pci\pci_common.c:396)
librte_bus_pci.so.24!pci_probe() (\dpdk-23.11\drivers\bus\pci\pci_common.c:423)
librte_eal.so.24!rte_bus_probe() (\dpdk-23.11\lib\eal\common\eal_common_bus.c:78)
librte_eal.so.24!rte_eal_init(int argc, char ** argv) (\dpdk-23.11\lib\eal\linux\eal.c:1287)

Best regards,
Julien Marcin

From david.marchand at redhat.com Mon Aug 25 14:24:11 2025
From: david.marchand at redhat.com (David Marchand)
Date: Mon, 25 Aug 2025 14:24:11 +0200
Subject: [mlx5 driver] Usage of mlx5 with unprivileged LXC container
In-Reply-To: References: Message-ID:

On Mon, 25 Aug 2025 at 14:09, Julien wrote:
>
> Hello,
> I have a question about using the mlx5 driver with LXC.
> I'm trying to use dpdk-testpmd in an LXC container whose root user isn't mapped to the host's root user.
> Note: The entire physical interface is given to the LXC container, not a virtual interface.
>
> The following error occurred:
> mlx5_common: DevX create TIS failed errno=22 status=0 syndrome=0
> mlx5_net: Failed to create TIS 0/0 for [bonding] device mlx5_2.
> mlx5_net: TIS allocation failure
> mlx5_net: probe of PCI device 0000:27:00.0 aborted after encountering an error: Cannot allocate memory
> mlx5_common: Failed to load driver mlx5_eth
> EAL: Requested device 0000:27:00.0 cannot be used
> EAL: Bus (pci) probe failed.
>
> The "transport_domain" is created, and the mlx5_devx_cmd_create_td() function runs normally.
> The call to mlx5dv_devx_obj_create() receives an errno of 22.
>
> I don't encounter any problems when the container's root user is mapped to the host's root user.
> Has anyone experienced this before?
> Is it possible to use the driver in an unprivileged LXC container?

There is probably something missing in terms of capabilities. I don't know how LXC behaves in this regard.

I suggest you look at "5.5.1.5. Run as Non-Root" in https://doc.dpdk.org/guides/platform/mlx5.html.

-- 
David Marchand

From viettd at dft.vn Fri Aug 29 04:34:21 2025
From: viettd at dft.vn (=?utf-8?B?VOG7kW5nIMSQ4bupYyBWaeG7h3Q=?=)
Date: Fri, 29 Aug 2025 02:34:21 +0000
Subject: [DPDK 21.11 + Intel E810 DDP Comms] Issue with creating flow rules for TCP 5060 and SCTP 3868
Message-ID:

Hello,

I am testing rte_flow rules on DPDK 21.11 with an Intel E810 NIC using the DDP Comms package. My goal is to classify SIP and Diameter traffic by L4 ports (TCP 5060 and SCTP 3868). However, I see inconsistent behavior when creating flow rules with testpmd.

1. TCP 5060
- If I create a rule matching TCP src port 5060 and another rule matching TCP dst port 5060:

flow create 0 ingress pattern eth / ipv4 / tcp src is 5060 / end \
     actions mark id 1 / end
flow create 0 ingress pattern eth / ipv4 / tcp dst is 5060 / end \
     actions mark id 2 / end

-> The second rule fails if I use the "mark" action (seems overlapping).

- If I switch to a queue-based action:

actions queue index N / end

-> Both rules can coexist and work correctly.

2. SCTP 3868
- I cannot create either rule (src or dst).
Example:

flow create 0 ingress pattern eth / ipv4 / sctp src is 3868 / end \
     actions mark id 3 / end
flow create 0 ingress pattern eth / ipv4 / sctp dst is 3868 / end \
     actions mark id 4 / end

-> Both commands fail with "Invalid argument". Even "flow validate" shows not supported.

So it looks like:
- For TCP, port matching works partially depending on action type (mark vs queue).
- For SCTP, port field matching is not supported at all, even with the queue action.

Questions:
- Is this a known limitation of Intel E810 with the DDP Comms package?
- Is SCTP src/dst port matching expected to be supported, or should I handle this in software (by sending all SCTP traffic to one queue)?
- Is there any updated DDP package or workaround (e.g., raw pattern) for SCTP 3868 filtering?

Thanks a lot for your support.

Best regards,
Viet
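Regarding the software fallback raised in the questions above: if the hardware cannot match SCTP ports, steering all SCTP traffic to one queue and classifying by port in software is straightforward, since the SCTP common header begins with the source and destination ports just like TCP/UDP. A minimal sketch follows (plain C, not DPDK; in a real application the frame pointer would come from something like rte_pktmbuf_mtod(), and is_sctp_port is an illustrative name):

```c
#include <stdint.h>
#include <stddef.h>

#define ETHERTYPE_IPV4   0x0800
#define IPPROTO_SCTP_NUM 132

/* Return 1 if the Ethernet frame is IPv4/SCTP with src or dst port == `port`. */
static int is_sctp_port(const uint8_t *frame, size_t len, uint16_t port)
{
    if (len < 14 + 20 + 4) /* Ethernet + minimal IPv4 + SCTP ports */
        return 0;
    uint16_t ethertype = ((uint16_t)frame[12] << 8) | frame[13];
    if (ethertype != ETHERTYPE_IPV4)
        return 0;
    const uint8_t *ip = frame + 14;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4; /* IPv4 header length in bytes */
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != IPPROTO_SCTP_NUM)
        return 0;
    if (len < 14 + ihl + 4)
        return 0;
    const uint8_t *sctp = ip + ihl; /* SCTP common header: src port, dst port */
    uint16_t src = ((uint16_t)sctp[0] << 8) | sctp[1];
    uint16_t dst = ((uint16_t)sctp[2] << 8) | sctp[3];
    return src == port || dst == port;
}
```

This runs per packet only on the one queue that receives SCTP traffic, so the cost is confined to the Diameter path rather than the whole ingress pipeline.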