Issues around packet capture when secondary process is doing rx/tx

Ferruh Yigit ferruh.yigit at amd.com
Wed Apr 3 13:42:13 CEST 2024


On 1/8/2024 3:13 PM, Konstantin Ananyev wrote:
> 
> 
>> I have been looking at a problem reported by Sandesh
>> where packet capture does not work if rx/tx burst is done in secondary process.
>>
>> The root cause is that existing rx/tx callback model just doesn't work
>> unless the process doing the rx/tx burst calls is the same one that
>> registered the callbacks.
>>
>> An example sequence would be:
>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
>> 	2. secondary process calls rx_burst.
>> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
>> 	   at same location in primary and secondary process.
>> 	4. indirect function call in secondary to bad location likely causes crash.
> 
> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> Right now RX/TX callbacks are private for the process, different process simply should not
> see/execute them.
> I.e. the callback list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> between processes.
> It should be normal when, for the same port/queue, you end up with a different list of callbacks
> for different processes.
> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
> From my understanding secondary process will never see/call primary's callbacks.
> 

Ack. There must be another reason for the crash.
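
To make the point above concrete, here is a minimal sketch in plain C. It is
not the real ethdev code. The names shared_dev_data, local_dev, rx_cb,
add_rx_cb, run_rx_cbs and count_cb are hypothetical stand-ins: local_dev models
the per-process 'struct rte_eth_dev', and shared_dev_data models the
rte_eth_dev.data that lives in shared memory. It only illustrates that a
callback added in one process is not visible to a burst call made through
another process's copy of the device.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of rte_eth_dev.data: one copy, shared by all processes. */
struct shared_dev_data {
	uint16_t port_id;
};

typedef uint16_t (*rx_cb_fn)(uint16_t port, uint16_t nb_pkts, void *user);

/* One node of a process-private callback list. */
struct rx_cb {
	rx_cb_fn fn;
	void *user;
	struct rx_cb *next;
};

/* Hypothetical model of struct rte_eth_dev: one copy per process. */
struct local_dev {
	struct shared_dev_data *data; /* points into shared memory */
	struct rx_cb *rx_cbs;         /* private to this process */
};

/* Add a callback at the head of this process's list only. */
static void add_rx_cb(struct local_dev *dev, struct rx_cb *cb,
		      rx_cb_fn fn, void *user)
{
	cb->fn = fn;
	cb->user = user;
	cb->next = dev->rx_cbs;
	dev->rx_cbs = cb;
}

/* What a burst call would do after receiving nb_pkts packets:
 * walk the callbacks registered in the calling process. */
static uint16_t run_rx_cbs(const struct local_dev *dev, uint16_t nb_pkts)
{
	const struct rx_cb *cb;

	for (cb = dev->rx_cbs; cb != NULL; cb = cb->next)
		nb_pkts = cb->fn(dev->data->port_id, nb_pkts, cb->user);
	return nb_pkts;
}

/* Example callback: counts the packets it has been shown. */
static uint16_t count_cb(uint16_t port, uint16_t nb_pkts, void *user)
{
	(void)port;
	*(unsigned int *)user += nb_pkts;
	return nb_pkts;
}
```

With two local_dev instances pointing at the same shared_dev_data, a callback
added through one of them is never run by run_rx_cbs() on the other, which is
the behaviour Konstantin describes.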


> About pdump itself, it has been a while since I last looked at it, but as I remember, for it to work
> the server process has to call rte_pdump_init(), which in turn registers the PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> though I am not sure such an option is supported right now.
>  

Currently testpmd calls 'rte_pdump_init()'. Both the primary and the
secondary testpmd processes call this API and both register the PDUMP_MP
handler; I think this is OK.

When the pdump secondary process sends the MP message, both the primary
and the secondary testpmd processes should register callbacks with the
provided ring and mempool information.

I don't know whether the primary and secondary process callbacks running
simultaneously is what causes this problem; otherwise I expect it to work.

>>
>> Some possible workarounds.
>> 	1. Keep callback list per-process: messy, but won't crash. Capture won't work
>> 	   without other changes. In this case the primary would register callbacks,
>> 	   but secondaries would not use them in rx/tx burst.
>>
>> 	2. Replace use of rx/tx callback in pdump with a change to rte_ethdev to have
>> 	   a capture flag (i.e. don't use indirection). Likely ABI problems.
>> 	   Basically, ignore the rx/tx callback mechanism. This is my preferred
>> 	   solution.
> 
> It is not only the capture flag, it is also what to do with the captured packets
> (copy? If yes, then where to? examine? drop?, do something else?).
> It is probably not the best choice to add all these things into ethdev API.
> 
>> 	3. Some fix up mechanism (in EAL mp support?) to have each process fixup
>> 	   its callback mechanism.
>  
> Probably the easiest way to fix that - pass to rte_pdump_enable() extra information
> that  would allow it to distinguish on what exact process (local, remote)
> we want to enable pdump functionality. Then it could act accordingly.
> 
>>
>> 	4. Do something in pdump_init to register the callback in same process context
>> 	   (probably need callbacks to be per-process). Would mean callback is always
>> 	   on independent of capture being enabled.
>>
>> 	5. Get rid of indirect function call pointer, and replace it by index into
>> 	   a static table of callback functions. Every process would have same code
>> 	   (in this case pdump_rx) but at different address. Requires all callbacks
>> 	   to be statically defined at build time.
> 
> Doesn't look like a good approach - it will break many things. 
>  
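
For reference, option 5 could look roughly like the sketch below. This is plain
C under stated assumptions, not a proposal for the ethdev API: cb_id, cb_table,
shared_queue_state, run_shared_cb and pdump_rx_stub are hypothetical names, and
pdump_rx_stub only stands in for the real pdump_rx. The idea shown is that
shared memory holds a small index, and each process resolves that index through
its own copy of a table built at compile time.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*burst_cb_fn)(uint16_t nb_pkts, void *user);

/* Identifiers that are valid in every process (hypothetical). */
enum cb_id {
	CB_NONE = 0,
	CB_PDUMP_RX,
	CB_MAX
};

/* Stand-in for pdump_rx: just counts the packets it is shown. */
static uint16_t pdump_rx_stub(uint16_t nb_pkts, void *user)
{
	*(uint64_t *)user += nb_pkts;
	return nb_pkts;
}

/* Every process has this table, possibly at a different address,
 * but the indexes mean the same thing everywhere. */
static const burst_cb_fn cb_table[CB_MAX] = {
	[CB_NONE] = NULL,
	[CB_PDUMP_RX] = pdump_rx_stub,
};

/* Hypothetical per-queue state kept in shared memory:
 * an index, never a function pointer. */
struct shared_queue_state {
	uint32_t cb_id;
};

/* Resolve the shared index to a local function and call it.
 * Unknown or empty slots leave the burst untouched.
 * Note: 'user' must itself be valid in the calling process. */
static uint16_t run_shared_cb(const struct shared_queue_state *q,
			      uint16_t nb_pkts, void *user)
{
	uint32_t id = q->cb_id;

	if (id >= CB_MAX || cb_table[id] == NULL)
		return nb_pkts;
	return cb_table[id](nb_pkts, user);
}
```

The sketch also shows the limitation raised in the option itself: anything not
listed in cb_table at build time cannot be registered, so application-defined
callbacks would no longer fit this model.
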
>> The existing rx/tx callback is not safe if rx/tx burst is called from a different process
>> than the one where the callback is registered.
>  
> 


