[dpdk-dev] [PATCH v4] eal: fix proc type auto detection

David Marchand david.marchand at redhat.com
Sun Oct 27 20:41:05 CET 2019


On Thu, Oct 24, 2019 at 6:07 PM Burakov, Anatoly
<anatoly.burakov at intel.com> wrote:
>
> On 12-Aug-19 11:03 AM, David Marchand wrote:
> > On Wed, Jul 24, 2019 at 6:08 PM Anatoly Burakov
> > <anatoly.burakov at intel.com> wrote:
> >>
> >> Currently, primary process holds an exclusive lock on the config
> >> file, thereby preventing other primaries from spinning up. However,
> >> when the primary dies, the lock is no longer being held, even though
> >> there might be other secondary processes still running.
> >>
> >> The fix is two-fold. First of all, downgrade the primary process's
> >> exclusive lock to a shared lock once we have it. Second of all,
> >> also take out shared locks on the config from the secondaries. We
> >> are using fcntl() locks, which get dropped when the file handle is
> >> closed, so also remove the closure of config file handle.
> >>
> >> Fixes: af75078fece3 ("first public release")
> >> Cc: stable at dpdk.org
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> >> ---
> >>
>
> Hi David,
>
> I've been investigating how to improve this patch, and i've hit a dead end.
>
> The problems here are two-fold. Using fcntl() and flock() locks together
> is not advisable, so both primary-secondary detection and
> rte_eal_primary_proc_alive() (as per Harry's point) have to use the same
> method for checking locks.
>
> Using flock() would work for this purpose. Unfortunately, on FreeBSD,
> converting exclusive lock into a shared lock involves unlocking first
> [1] (creating a race). On Linux it doesn't specifically say that it does
> that, but it does mention that it is not guaranteed to be atomic [2].
> So, we can't use flock() here.
>
> It seems that fcntl() lock conversions are atomic, however fcntl() locks
> on both Linux and FreeBSD are implemented in a very stupid way in that
> /any/ closure of a file descriptor drops /all/ locks on that file.
> Meaning, the moment secondary does the check in primary_proc_alive() and
> closes the config file fd, the process-wide lock drops. Mind you,
> primary_proc_alive() is implemented using lockf() rather than fcntl(),
> which is an issue in itself, but on Linux, lockf() is implemented on top
> of fcntl() locks and thus suffers from the same issue.
>
> So, unless you have better ideas, i think this patch can be marked as
> rejected.
>
> [1] https://www.freebsd.org/cgi/man.cgi?query=flock&sektion=2
> [2] https://linux.die.net/man/2/flock

Sorry, hard to digest, I would need to look at this again later.
If you have no easy solution, let's revisit after 19.11.

Thanks Anatoly.


-- 
David Marchand



More information about the dev mailing list