Error in rte_eal_init() when multiple PODs over single node of K8 cluster
    Bruce Richardson 
    bruce.richardson at intel.com
       
    Wed Mar 27 15:55:36 CET 2024
    
    
  
On Wed, Mar 27, 2024 at 12:42:55PM +0000, Avijit Pandey wrote:
>    Hello Devs,
> 
> 
>    I hope this email finds you well.
> 
>    I am reaching out to seek assistance regarding an issue I am facing in
>    DPDK within my Kubernetes cluster.
> 
> 
>    I have deployed a Kubernetes cluster v1.26.0, and I am currently
>    running network testing through DPPD-PRoX ([1]commit/02425932) using
>    DPDK (v22.11.0). I have deployed 3 pairs of PODs (3 server pods and 3
>    client pods) on a single K8 node. The server generates and sends
>    traffic to the receiver pod.
> 
> 
>    During the automated testing, I encounter an error: "Error in
>    rte_eal_init()." This error occurs randomly, and I am unable to
>    determine the root cause. However, this issue does not occur when I use
>    a single pair of PODs (1 server pod and 1 client pod). The traffic is
>    sent and received through the sriov NICs.
> 
> 
<snip> 
>            With master core index 23, full core mask is 0x2800000
> 
>            EAL command line: /opt/samplevnf/VNFs/DPPD-PROX/build/prox
>    -c0x2800000 --main-lcore=23 -n4 --allow 0000:86:04.6
> 
>    error   Error in rte_eal_init()
> 
> 
Not sure what the problem is exactly, without a better error message. Can
you provide the EAL output from the failure case, perhaps using the
--log-level flag to raise the log level if the error is not clear from the
default output?

Also, when running multiple instances of DPDK on a single system, I'd
generally recommend passing the --in-memory flag to each instance to avoid
issues with conflicts over hugepage files. (This will disable support for
DPDK multi-process operation, so don't use the flag if that is a feature
you are using.)
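
To make the two suggestions above concrete, here is a minimal Python sketch
that appends the extra flags to an existing EAL argument list. The helper
name debug_eal_args is hypothetical, and the log level value
"lib.eal:debug" is only an assumption about which component is worth
turning up; adjust both to suit your launcher.

```python
def debug_eal_args(base_args, in_memory=True, log_level="lib.eal:debug"):
    """Return a copy of base_args with extra EAL flags for debugging.

    Hypothetical helper for illustration; not part of DPDK or PROX.
    """
    args = list(base_args)
    # Raise EAL verbosity so the reason for the init failure is logged.
    # The default value here is an assumed choice of component and level.
    args.append(f"--log-level={log_level}")
    if in_memory:
        # Avoids hugepage file conflicts between instances, but disables
        # DPDK multi-process support, so skip it if you rely on that.
        args.append("--in-memory")
    return args


if __name__ == "__main__":
    original = ["-c0x2800000", "--main-lcore=23", "-n4",
                "--allow", "0000:86:04.6"]
    print(" ".join(debug_eal_args(original)))
```

The original argument list is left untouched, so the same base arguments
can be reused with and without the extra flags.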

/Bruce

PS: A couple of other comments on your command line that may be of interest,
since it's a little longer than it needs to be :-)
 - We'd generally recommend, for clarity, using "-l" flag rather than "-c"
   for passing core masks. In your case "-c 0x2800000" should be equivalent
   to the more comprehensible "-l 23,25".
 - DPDK always uses the lowest core number as the main lcore, so in the
   example above --main-lcore=23 should be superfluous and can be omitted.
 - For mempool creation, -n 4 is the default in DPDK if unspecified, so
   again that flag can be dropped without impact, unless something specific
   in the app depends on it in some other way.
 - If you want to shorten your allow list a little, the "0000:" can be
   dropped from the PCI address. So "--allow 0000:86:04.6" can be
   "-a 86:04.6".
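
As a quick check of the mask arithmetic in the first point, here is a
minimal Python sketch that expands a "-c" hex mask into the core list that
"-l" expects. The function name coremask_to_list is purely illustrative
and is not part of DPDK.

```python
def coremask_to_list(mask_hex: str) -> str:
    """Expand a hex core mask (as passed to -c) into a comma-separated
    core list (as passed to -l). Illustrative helper, not a DPDK API."""
    mask = int(mask_hex, 16)  # accepts an optional "0x" prefix
    # Collect the index of every bit that is set in the mask.
    cores = [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]
    return ",".join(str(core) for core in cores)


if __name__ == "__main__":
    print(coremask_to_list("0x2800000"))  # prints 23,25
```

0x2800000 has only bits 23 and 25 set, which is why the two forms select
the same cores.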
    
    