[dpdk-dev] [Bug 389] Crash in librte_kni driver due to noncontiguous pages
    bugzilla at dpdk.org 
    Mon Feb  3 14:32:04 CET 2020
    
    
  
https://bugs.dpdk.org/show_bug.cgi?id=389
            Bug ID: 389
           Summary: Crash in librte_kni driver due to noncontiguous pages
           Product: DPDK
           Version: 18.11
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: critical
          Priority: Normal
         Component: core
          Assignee: dev at dpdk.org
          Reporter: scott_wasson at affirmednetworks.com
  Target Milestone: ---
We’re seeing a continuous crash since upgrading to 18.11; the kni FIFOs
apparently aren’t contiguous.  From user space’s perspective, the kni’s tx_q
straddles the 2MB page boundary at 0x17a600000.  The mbuf pointers in the ring
before this address are valid.  The tx_q’s write pointer indicates that there
are mbufs at 0x17a600000 and beyond, but those pointers are all NULL.
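To make the straddling condition concrete, here is a minimal sketch of the
check. The helper name region_crosses_page is hypothetical and not part of
DPDK, and the region sizes mentioned below are illustrative, not the real
KNI_FIFO_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK: returns true when the byte range
 * [addr, addr + len) touches more than one page of size page_sz.
 * page_sz must be a power of two and len must be non-zero.
 */
static bool
region_crosses_page(uint64_t addr, size_t len, uint64_t page_sz)
{
	assert(len > 0);
	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);

	/* Page-aligned base of the first and of the last byte in the range. */
	uint64_t first_page = addr & ~(page_sz - 1);
	uint64_t last_page = (addr + len - 1) & ~(page_sz - 1);

	return first_page != last_page;
}
```

With a 2MB page size, an illustrative 0x2000-byte region starting at
0x17a5ff000 crosses the boundary at 0x17a600000, while the same region
starting at 0x17a600000, or ending exactly at it, does not.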
Because the rte_kni kernel module is loaded, this code in eal.c takes effect:
      /* Workaround for KNI which requires physical address to work */
      if (iova_mode == RTE_IOVA_VA &&
          rte_eal_check_module("rte_kni") == 1) {
              if (phys_addrs) {
                 iova_mode = RTE_IOVA_PA;
As a result, iova_mode is automatically forced to PA.
We determined that enabling --legacy-mem made the problem go away.  However,
it also moved the kni’s data structures, so they no longer straddle a
hugepage boundary.  Our concern is that the furniture may move around again
and bring us back to where we were.  Being tied to --legacy-mem is undesirable
in the long term anyway.
We also found that the following code patch helps (even without --legacy-mem):
diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 3d2ffb2..5cc9d69 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -143,31 +143,31 @@ kni_reserve_mz(struct rte_kni *kni)
        char mz_name[RTE_MEMZONE_NAMESIZE];
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_TX_Q_MZ_NAME_FMT, kni->name);
-       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_tx_q == NULL, tx_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RX_Q_MZ_NAME_FMT, kni->name);
-       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_rx_q == NULL, rx_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_ALLOC_Q_MZ_NAME_FMT, kni->name);
-       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_alloc_q == NULL, alloc_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_FREE_Q_MZ_NAME_FMT, kni->name);
-       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_free_q == NULL, free_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_REQ_Q_MZ_NAME_FMT, kni->name);
-       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_req_q == NULL, req_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RESP_Q_MZ_NAME_FMT, kni->name);
-       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_resp_q == NULL, resp_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_SYNC_ADDR_MZ_NAME_FMT, kni->name);
-       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_sync_addr == NULL, sync_addr_fail);
        return 0;
After I removed --legacy-mem, the tx_q still straddles the same 2MB page
boundary, yet now everything seems OK.
This would seem to follow precedent in rte_mempool.c:
    /* if we're trying to reserve contiguous memory, add appropriate
     * memzone flag.
     */
    if (try_contig)
        flags |= RTE_MEMZONE_IOVA_CONTIG;
which I think explains why our mbufs haven’t seen data truncation issues.
Could you please explain why RTE_MEMZONE_IOVA_CONTIG is necessary in PA mode?
Isn’t contiguity a fundamental property of physical addressing?
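One possible reading, offered as an assumption rather than a confirmed
diagnosis: a region that is contiguous in virtual address space can be backed
by hugepages that are not adjacent in physical memory. The toy model below
uses a made-up page table (struct page_map, toy_virt2phys and
toy_is_phys_contig are all hypothetical names, and the physical addresses are
invented) to show how such a region fails a physical-contiguity check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ (2ULL * 1024 * 1024) /* 2MB hugepage */

/* One entry of a made-up page table: a virtual page and its physical page. */
struct page_map {
	uint64_t va;
	uint64_t pa;
};

/* Translate va through the toy table; returns UINT64_MAX when unmapped. */
static uint64_t
toy_virt2phys(const struct page_map *tbl, size_t n, uint64_t va)
{
	uint64_t page = va & ~(PAGE_SZ - 1);

	for (size_t i = 0; i < n; i++)
		if (tbl[i].va == page)
			return tbl[i].pa + (va - page);
	return UINT64_MAX;
}

/* True when [va, va + len) maps to one unbroken physical address range. */
static bool
toy_is_phys_contig(const struct page_map *tbl, size_t n,
		uint64_t va, size_t len)
{
	assert(len > 0);

	uint64_t base_pa = toy_virt2phys(tbl, n, va);
	if (base_pa == UINT64_MAX)
		return false;

	uint64_t first_page = va & ~(PAGE_SZ - 1);
	uint64_t last_page = (va + len - 1) & ~(PAGE_SZ - 1);

	/* Each later page must start exactly where the previous one ended. */
	for (uint64_t p = first_page + PAGE_SZ; p <= last_page; p += PAGE_SZ) {
		uint64_t pa = toy_virt2phys(tbl, n, p);

		if (pa == UINT64_MAX || pa != base_pa + (p - va))
			return false;
	}
	return true;
}
```

In this model, a region spanning the virtual pages at 0x17a400000 and
0x17a600000 passes the check only when the second page's physical address
immediately follows the first page's. Code that takes the physical address of
the start of the region and indexes past the page boundary would otherwise
read unrelated memory.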
Are we still potentially vulnerable with --legacy-mem and without the above
code change?  Did we just get lucky because the furniture moved and doesn’t
straddle a page boundary at the moment?
We also tested with 19.11 and did not see the crash.  However, the 19.11
release notes say:
    * Changed mempool allocation behavior.
      Changed the mempool allocation behaviour so that objects no longer cross
      pages by default. Note, this may consume more memory when using small
      memory pages.
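The memory cost mentioned in that release note can be sketched with simple
arithmetic. The two helpers below are hypothetical (they are not DPDK
functions), and the object size used in the note that follows is an arbitrary
example, not a real mbuf size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: pages needed for obj_num objects of obj_sz bytes
 * when no object is allowed to cross a page boundary.
 */
static size_t
pages_needed_no_cross(size_t obj_num, size_t obj_sz, size_t page_sz)
{
	assert(obj_sz > 0 && obj_sz <= page_sz);

	size_t per_page = page_sz / obj_sz; /* whole objects per page */

	return (obj_num + per_page - 1) / per_page;
}

/* Hypothetical helper: pages needed when objects are packed back to back. */
static size_t
pages_needed_packed(size_t obj_num, size_t obj_sz, size_t page_sz)
{
	assert(page_sz > 0);

	return (obj_num * obj_sz + page_sz - 1) / page_sz;
}
```

For 1000 objects of 2304 bytes each, 4KB pages hold only one whole object
apiece, so the no-crossing layout needs 1000 pages against 563 when packed.
With 2MB pages both layouts need 2 pages, which is why the overhead matters
mainly for small pages.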
Thanks!
-Scott