[dpdk-dev] [PATCH v3 2/3] eal: add synchronous multi-process communication

Burakov, Anatoly anatoly.burakov at intel.com
Thu Jan 25 19:02:57 CET 2018


On 25-Jan-18 5:10 PM, Tan, Jianfeng wrote:
> 
> 
> On 1/26/2018 12:22 AM, Burakov, Anatoly wrote:
>> On 25-Jan-18 3:03 PM, Ananyev, Konstantin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly
>>>> Sent: Thursday, January 25, 2018 1:10 PM
>>>> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Tan, 
>>>> Jianfeng <jianfeng.tan at intel.com>; dev at dpdk.org
>>>> Cc: Richardson, Bruce <bruce.richardson at intel.com>; thomas at monjalon.net
>>>> Subject: Re: [dpdk-dev] [PATCH v3 2/3] eal: add synchronous 
>>>> multi-process communication
>>>>
>>>> On 25-Jan-18 1:05 PM, Burakov, Anatoly wrote:
>>>>> On 25-Jan-18 1:00 PM, Ananyev, Konstantin wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Burakov, Anatoly
>>>>>>> Sent: Thursday, January 25, 2018 12:26 PM
>>>>>>> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Tan, 
>>>>>>> Jianfeng
>>>>>>> <jianfeng.tan at intel.com>; dev at dpdk.org
>>>>>>> Cc: Richardson, Bruce <bruce.richardson at intel.com>; 
>>>>>>> thomas at monjalon.net
>>>>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process
>>>>>>> communication
>>>>>>>
>>>>>>> On 25-Jan-18 12:19 PM, Ananyev, Konstantin wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Burakov, Anatoly
>>>>>>>>> Sent: Thursday, January 25, 2018 12:00 PM
>>>>>>>>> To: Tan, Jianfeng <jianfeng.tan at intel.com>; dev at dpdk.org
>>>>>>>>> Cc: Richardson, Bruce <bruce.richardson at intel.com>; Ananyev,
>>>>>>>>> Konstantin <konstantin.ananyev at intel.com>; thomas at monjalon.net
>>>>>>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process
>>>>>>>>> communication
>>>>>>>>>
>>>>>>>>> On the overall patch,
>>>>>>>>>
>>>>>>>>> Reviewed-by: Anatoly Burakov <anatoly.burakov at intel.com>
>>>>>>>>>
>>>>>>>>> For request(), returning the number of replies received actually
>>>>>>>>> makes sense, because we can then use the value to read our replies
>>>>>>>>> if we are a primary process sending messages to secondary processes.
>>>>>>>>
>>>>>>>> Yes, I also think it is good to return the number of sends.
>>>>>>>> The caller can then compare the number of requests sent with the
>>>>>>>> number of replies received and decide whether to treat it as a
>>>>>>>> failure or not.
>>>>>>>>
>>>>>>>
>>>>>>> Well, OK, that might make sense. However, I think it would be of
>>>>>>> more value to make the API consistent (0/-1 on success/failure) and
>>>>>>> put the number of sent messages into the reply, like the number
>>>>>>> received, i.e. something like
>>>>>>>
>>>>>>> struct reply {
>>>>>>>       int nb_sent;
>>>>>>>       int nb_received;
>>>>>>> };
>>>>>>>
>>>>>>> We do it for the latter already, so why not the former?
>>>>>>
>>>>>> The question is what to treat as success/failure?
>>>>>> Let's say we sent 2 requests (of 3 possible) and got back 1 response...
>>>>>> Should we consider it a success or a failure?
>>>>>>
>>>>>
>>>>> I think "failure" is "something went wrong", not "secondary processes
>>>>> didn't respond". For example, invalid parameters, or our socket
>>>>> suddenly being closed, or some other error that prevents us from
>>>>> sending requests to secondaries.
>>>>>
>>>>> As far as I can tell from the code, there's no way to know if the
>>>>> secondary process is running other than by attempting to connect to
>>>>> it and getting a response. So a failed connection should not be a
>>>>> failure condition, because we can't know if we *can* connect to the
>>>>> process until we do. The process may have ended, but the socket files
>>>>> will still be around, and there's nothing we can do about that. So I
>>>>> wouldn't consider inability to send a message a failure condition.
>>>>>
>>>>
>>>> Just to clarify - I'm suggesting leaving this decision up to the user.
>>>> If a user expects there to be "n" processes running, but only "m"
>>>> responses were received, they could treat it as an error. Another user
>>>> might simply send periodic updates/polls to secondaries, for whatever
>>>> reason (say, stats display), and won't really care if one of them just
>>>> died, so there's no error for that user.
>>>>
>>>> However, all of this has nothing to do with the API. If we're able to
>>>> send messages - it's not a failure. If we can't - it is. That's the
>>>> part the API should be concerned about, and that's what the return
>>>> value should indicate, IMO.
>>>
>>> OK, so to clarify, you are suggesting:
>>> we have N peers - if send_msg() returns success for all N, return
>>> success (no matter whether we got a reply or not).
>>> Otherwise return a failure?
>>> Konstantin
>>
>> More along the lines of, return -1 if and only if something went 
>> wrong. That might be invalid parameters, or that might be an error 
>> with our own socket,
> 
> To check if the error is caused by our own socket, we check the errno 
> after sendmsg?
> 
> Like for remote socket errors, we check:
> - ECONNRESET
> - ECONNREFUSED
> - ENOBUFS
> 
> Right?
> 
> Thanks,
> Jianfeng

Well, that was only an example. If it doesn't make much sense to do so
in this case, then don't, and only return -1 on invalid parameters.
AFAIU we're using connectionless sockets, so a bunch of these errors
won't be applicable to us. Maybe ENOBUFS, but I'm not sure it's worth
checking for that.
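
To make the distinction concrete, here is a minimal sketch of how an errno
left by sendmsg() could be sorted into "the peer is gone or busy" versus
"something is wrong on our side". The enum and the helper name
mp_classify_send_errno() are hypothetical and not part of the patch. The
set of "remote" errors is simply the list proposed in the quoted text, and
whether each of them can actually occur on the socket type used here is an
assumption.

```c
#include <errno.h>

/* Hypothetical classification, for illustration only. */
enum mp_send_status {
	MP_SEND_OK,          /* message was handed to the kernel */
	MP_SEND_PEER_ERROR,  /* peer unreachable or busy: not an API failure */
	MP_SEND_LOCAL_ERROR  /* problem on our side: the API would return -1 */
};

/*
 * Classify the errno value observed after a sendmsg() call.
 * Pass 0 when sendmsg() succeeded.
 */
static enum mp_send_status
mp_classify_send_errno(int err)
{
	switch (err) {
	case 0:
		return MP_SEND_OK;
	/* Errors listed in the thread as coming from the remote socket. */
	case ECONNRESET:
	case ECONNREFUSED:
	case ENOBUFS:
		return MP_SEND_PEER_ERROR;
	/* Anything else is assumed to be a problem on our side. */
	default:
		return MP_SEND_LOCAL_ERROR;
	}
}
```

With a helper like this, a request loop would skip peers classified as
MP_SEND_PEER_ERROR and keep going, and would only bail out with -1 on
MP_SEND_LOCAL_ERROR.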

> 
> 
>> or something else to that effect. In all other cases, return 0 (that
>> includes cases where we sent N messages but received M replies, where
>> N != M). So, in other words, return 0 if we *could have succeeded* if
>> nothing went wrong on the other side, and only return -1 if something
>> went wrong on our side.
>>
>>>
>>>
>>>>
>>>> -- 
>>>> Thanks,
>>>> Anatoly
>>
>>
> 
> 
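
The return convention described in the quoted text above (0 whenever the
call itself worked, -1 only for a problem on our side, with the counts
reported through the reply) could look roughly like the sketch below. All
names here are hypothetical. The per-peer outcome array stands in for the
real send and receive path, which is not shown, and the reply struct only
mirrors the nb_sent/nb_received idea from earlier in the thread.

```c
#include <stddef.h>

/* Hypothetical reply summary, mirroring the struct suggested earlier. */
struct mp_reply_sketch {
	int nb_sent;     /* requests that were actually sent */
	int nb_received; /* replies that came back */
};

/* Hypothetical stand-in for what happened with one peer. */
struct mp_peer_outcome {
	int sent;    /* non-zero if the request reached this peer */
	int replied; /* non-zero if this peer answered */
};

/*
 * Return -1 only for errors on our side (here: invalid parameters).
 * Return 0 otherwise, even if nb_sent != nb_received, and let the
 * caller decide whether missing replies matter.
 */
static int
mp_request_sketch(const struct mp_peer_outcome *peers, int nb_peers,
		struct mp_reply_sketch *reply)
{
	int i;

	if (reply == NULL || nb_peers < 0 || (peers == NULL && nb_peers > 0))
		return -1;

	reply->nb_sent = 0;
	reply->nb_received = 0;

	for (i = 0; i < nb_peers; i++) {
		if (!peers[i].sent)
			continue; /* peer is gone: not an API failure */
		reply->nb_sent++;
		if (peers[i].replied)
			reply->nb_received++;
	}
	return 0;
}
```

A caller that expects "n" secondaries would compare reply.nb_received
against n and treat a mismatch as an error; a caller that only polls for
stats would ignore the difference.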

-- 
Thanks,
Anatoly

