[dpdk-dev] Enhance KNI DPDK-app-side to be	Multi-Producer/Consumer
    Zhou, Danny 
    danny.zhou at intel.com
       
    Sat Nov 15 01:04:45 CET 2014
    
    
  
It will be always good if you can submit the RFC patch in terms of KNI optimization.
On the other hand, do you have any perf. data to prove that your patchset could improve
KNI performance which is the concern that most customers care about? We introduced
multiple-threaded KNI kernel support last year, if I remember correctly, the key perform
bottle-neck we found is the skb alloc/free and memcpy between skb and mbuf. Would be 
very happy if your patchset can approve I am wrong.
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sanford, Robert
> Sent: Saturday, November 15, 2014 5:06 AM
> To: Thomas Monjalon; dev at dpdk.org
> Subject: [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer
> 
> Hello Thomas,
> 
> I want to discuss a small enhancement to KNI that we developed. We wanted
> to send/receive mbufs between one KNI device and multiple cores, but we
> wanted to keep the changes simple. So, here were our requirements:
> 
> 1. Don't use heavy synchronization when reading/writing the FIFOs in
> shared memory.
> 2. Don't make any one core (data or control plane) perform too much work
> shuffling mbufs to/from the FIFOs.
> 3. Don't use an intermediate RTE ring to drop off mbufs when another core
> is already accessing the same KNI FIFO.
> 4. Since (for our private use case) we don't need MP/MC on the kernel
> side, don't change the kernel KNI code at all.
> 5. Don't change the request/reply logic. It stays single-threaded on both
> sides.
> 
> Here is what we came up with:
> 
> 1. Borrow heavily from the librte_ring implementation.
> 2. In librte_kni structure rte_kni, supplement each rte_kni_fifo (tx, rx,
> alloc, and free q) with another private, corresponding structure that
> contains a head, tail, mask, and size field.
> 3. Create kni_fifo_put_mp function with merged logic of kni_fifo_put and
> __rte_ring_mp_do_enqueue. After we update the tail index (which is private
> to the DPDK-app), we update the FIFO write index (shared between app and
> kernel).
> 4. Create kni_fifo_get_mc function with merged logic of kni_fifo_get and
> __rte_ring_mc_do_dequeue. After we update the tail index, update the FIFO
> read index.
> 5. In rte_kni_tx_burst and kni_alloc_mbufs, call kni_fifo_put_mp instead
> of kni_fifo_put.
> 6. In rte_kni_rx_burst and kni_free_bufs, call kni_fifo_get_mc instead of
> kni_fifo_get.
> 
> We believe this is a common use case, and thus would like to contribute it
> to dpdk.org.
> Questions/comments:
> 1. Are you interested for us to send the changes as an RFC?
> 2. Do you agree with this approach, or would it be better, say, to rewrite
> both sides of the interface to be more like librte_ring?
> 3. Perhaps we could improve upon kni_allocate_mbufs, as it blindly
> attempts to allocate and enqueue a constant number of mbufs. We have not
> really focused on the kernel ==> app data path, because we were only
> interested in app ==> kernel.
> 
> --
> Regards,
> Robert Sanford
    
    
More information about the dev
mailing list