Enabling multiport eswitch (mlx5) breaks PF1 bifurcation immediately
Dariusz Sosnowski
dsosnowski at nvidia.com
Mon Jun 24 14:57:52 CEST 2024
> -----Original Message-----
> From: Guelce, Guevenc <guevenc.guelce at sap.com>
> Sent: Thursday, June 20, 2024 17:34
> To: Dariusz Sosnowski <dsosnowski at nvidia.com>; users at dpdk.org; Li, Tao
> <tao.li06 at sap.com>
> Subject: RE: Enabling multiport eswitch (mlx5) breaks PF1 bifurcation immediately
>
> Hi Dariusz,
>
> Thanks a lot for looking into this.
> I am attaching the info you requested to this email. I reproduced the issue
> described below on another machine, which has two Nvidia cards and
> a newer ConnectX-6 firmware.
> The card I used for testing and reproducing is the ConnectX-6 at PCI addresses
> 0000:3b:00.0 and 0000:3b:00.1. I ran the commands I mentioned below in the
> email, and PF1 traffic from this card to the Linux kernel was cut off.
>
> ----<test environment>----
> pci/0000:3b:00.0:
> driver mlx5_core
> versions:
> fixed:
> fw.psid MT_0000000359
> running:
> fw.version 22.41.1000
> fw 22.41.1000
> stored:
> fw.version 22.41.1000
> fw 22.41.1000
> auxiliary/mlx5_core.eth.0:
> driver mlx5_core.eth
> pci/0000:3b:00.1:
> driver mlx5_core
> versions:
> fixed:
> fw.psid MT_0000000359
> running:
> fw.version 22.41.1000
> fw 22.41.1000
> stored:
> fw.version 22.41.1000
> fw 22.41.1000
>
> Linux Kernel Version: 6.6.12
> ----</test environment>----
>
> We didn’t configure any LAG but we enabled this firmware setting
> "LAG_RESOURCE_ALLOCATION"
> as it is needed for multiport eswitch per documentation here:
> https://doc.dpdk.org/guides/nics/mlx5.html#id1
>
>
> Linux logs and sysfs / devlink outputs are attached as a text file.
>
> Thanks & Regards,
>
> Guvenc Gulce
>
>
> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski at nvidia.com>
> Sent: Wednesday, 19 June 2024 20:13
> To: Guelce, Guevenc <guevenc.guelce at sap.com>; users at dpdk.org
> Subject: RE: Enabling multiport eswitch (mlx5) breaks PF1 bifurcation immediately
>
> Hi,
>
> > From: Guelce, Guevenc <guevenc.guelce at sap.com>
> > Sent: Friday, June 14, 2024 11:18
> > To: users at dpdk.org
> > Cc: Dariusz Sosnowski <dsosnowski at nvidia.com>
> > Subject: Enabling multiport eswitch (mlx5) breaks PF1 bifurcation immediately
> >
> > Hi all, Hi Dariusz,
> >
> >
> > Thanks a lot for your help so far. We really appreciate it.
> > I just want to touch base on this question, which was asked by my colleague
> > Tao a while back.
> >
> > Our question is actually quite simple. Issuing the commands listed
> > below on a ConnectX-6 Dx card breaks the bifurcated nature of the mlx5
> > driver in the Linux kernel for PF1. (No traffic is forwarded to the Linux
> > kernel on PF1 anymore.) You don’t need to start testpmd or any DPDK
> > application. Just issuing the commands below already breaks PF1 in the
> > Linux kernel.
> >
> > sudo devlink dev eswitch set pci/0000:8a:00.0 mode switchdev
> > sudo devlink dev eswitch set pci/0000:8a:00.1 mode switchdev
> > sudo devlink dev param set pci/0000:8a:00.0 name esw_multiport value true cmode runtime
> > sudo devlink dev param set pci/0000:8a:00.1 name esw_multiport value true cmode runtime
> >
> >
> > ----<test environment>-----
> > pci/0000:8a:00.0:
> > driver mlx5_core
> > versions:
> > fixed:
> > fw.psid MT_0000000359
> > running:
> > fw.version 22.39.2048
> > fw 22.39.2048
> > Linux kernel version: 6.6.16
> > DPDK: 23.11 (But not really needed to reproduce the issue)
> > ----</test environment>------
> >
> >
> > This makes the eswitch multiport feature unusable for us. Could you please
> > advise whether we are missing something here?
> > We are really keen to use this feature.
>
> Could you please send us the following info? It would help with debugging the
> issue.
>
> - Besides the Multiport E-Switch configuration, do you configure any additional
> bonding?
> - Output of commands:
> - sudo devlink dev param show
> - for f in /sys/kernel/debug/mlx5/0000:8a:00.0/lag/*; do echo $f; cat $f; done
> - for f in /sys/kernel/debug/mlx5/0000:8a:00.1/lag/*; do echo $f; cat $f; done
> - Output of dmesg, ideally all logs since boot.
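
For anyone reproducing this, the information requested above could be gathered in one pass with a small script along these lines. This is only a sketch: the function names are hypothetical, the commands and debugfs paths are the ones quoted in the request, and it assumes root privileges and a mounted debugfs.

```shell
#!/bin/sh
# Sketch: gather the diagnostics requested in this thread for one or more PFs.
# Function names are illustrative only; they are not part of devlink or mlx5.

# Print the mlx5 LAG debugfs directory for a PCI address (path from the thread).
lag_debugfs_dir() {
    printf '/sys/kernel/debug/mlx5/%s/lag\n' "$1"
}

# Print each file in the LAG debugfs directory, preceded by its path.
dump_lag_state() {
    dir=$(lag_debugfs_dir "$1")
    for f in "$dir"/*; do
        [ -e "$f" ] || continue   # skip if the directory is missing or empty
        echo "$f"
        cat "$f"
    done
}

# Collect devlink parameters, LAG debugfs state per PF, and the kernel log.
collect_mlx5_diag() {
    echo "== devlink dev param show =="
    devlink dev param show
    for pci in "$@"; do
        echo "== LAG debugfs state for $pci =="
        dump_lag_state "$pci"
    done
    echo "== dmesg =="
    dmesg
}
```

Once the functions are loaded in a root shell, `collect_mlx5_diag 0000:8a:00.0 0000:8a:00.1` prints everything to standard output, which can then be redirected to a text file and attached to the reply.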

Thank you for all the logs.

From the logs and command outputs, it looks like the Multiport E-Switch was configured correctly.
All devlink configurations also seem correct, so receiving traffic on PF1 in the kernel should be working.

This issue appears to be related to the kernel driver, so it would require digging into what happens inside it.
Could you please reach out to the upstream maintainers of the mlx5 Linux kernel driver?
They would be able to assist you much better in that case.

Best regards,
Dariusz Sosnowski