[dpdk-dev] [PATCH v4] eal: fix proc type auto detection

Burakov, Anatoly anatoly.burakov at intel.com
Thu Oct 24 18:07:00 CEST 2019


On 12-Aug-19 11:03 AM, David Marchand wrote:
> On Wed, Jul 24, 2019 at 6:08 PM Anatoly Burakov
> <anatoly.burakov at intel.com> wrote:
>>
>> Currently, primary process holds an exclusive lock on the config
>> file, thereby preventing other primaries from spinning up. However,
>> when the primary dies, the lock is no longer being held, even though
>> there might be other secondary processes still running.
>>
>> The fix is two-fold. First of all, downgrade the primary process's
>> exclusive lock to a shared lock once we have it. Second of all,
>> also take out shared locks on the config from the secondaries. We
>> are using fcntl() locks, which get dropped when the file handle is
>> closed, so also remove the closure of config file handle.
>>
>> Fixes: af75078fece3 ("first public release")
>> Cc: stable at dpdk.org
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>> ---
>>

Hi David,

I've been investigating how to improve this patch, and i've hit a dead end.

The problems here are two-fold. Using fcntl() and flock() locks together 
is not advisable, so both primary-secondary detection and 
rte_eal_primary_proc_alive() (as per Harry's point) have to use the same 
method for checking locks.

Using flock() would work for this purpose. Unfortunately, on FreeBSD, 
converting exclusive lock into a shared lock involves unlocking first 
[1] (creating a race). On Linux it doesn't specifically say that it does 
that, but it does mention that it is not guaranteed to be atomic [2]. 
So, we can't use flock() here.

It seems that fcntl() lock conversions are atomic, however fcntl() locks 
on both Linux and FreeBSD are implemented in a very stupid way in that 
/any/ closure of a file descriptor drops /all/ locks on that file. 
Meaning, the moment secondary does the check in primary_proc_alive() and 
closes the config file fd, the process-wide lock drops. Mind you, 
primary_proc_alive() is implemented using lockf() rather than fcntl(), 
which is an issue in itself, but on Linux, lockf() is implemented on top 
of fcntl() locks and thus suffers from the same issue.

So, unless you have better ideas, i think this patch can be marked as 
rejected.

[1] https://www.freebsd.org/cgi/man.cgi?query=flock&sektion=2
[2] https://linux.die.net/man/2/flock

-- 
Thanks,
Anatoly


More information about the dev mailing list