[dpdk-dev] [PATCH v4 1/4] lib/rcu: add resource reclamation APIs
Ananyev, Konstantin
konstantin.ananyev at intel.com
Mon Apr 20 10:19:41 CEST 2020
> > > +
> > > + /* Enqueue the token and resource. Generating the token
> > > + * and enqueuing (token + resource) on the queue is not an
> > > + * atomic operation. This might result in tokens enqueued
> > > + * out of order on the queue. So, some tokens might wait
> > > + * longer than necessary before being reclaimed.
> > > + */
> > > + char data[dq->esize];
> > > + memcpy(data, &token, __RTE_QSBR_TOKEN_SIZE);
> > > + memcpy(data + __RTE_QSBR_TOKEN_SIZE, e,
> > > + dq->esize - __RTE_QSBR_TOKEN_SIZE);
> > > + /* Check the status as enqueue might fail since the other thread
> > > + * might have used up the freed space.
> > > + * Enqueue uses the configured flags when the DQ was created.
> > > + */
> > > + if (rte_ring_enqueue_elem(dq->r, data, dq->esize) != 0) {
> > > + rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > + "%s(): Enqueue failed\n", __func__);
> > > + /* Note that the token generated above is not used.
> > > + * Other than wasting tokens, it should not cause any
> > > + * other issues.
> > > + */
> > > + rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > + "%s(): Skipped enqueuing token = %"PRIu64"\n",
> > > + __func__, token);
> > > +
> > > + rte_errno = ENOSPC;
> > > + return 1;
> > > + }
> >
> >
> > Just as a thought: in theory if we'll use MP_HTS(/SP) ring we can avoid
> > wasting RCU tokens:
> >
> > if (rte_ring_enqueue_bulk_elem_start(dq->r, 1, NULL) != 0) {
> > token = rte_rcu_qsbr_start(dq->v);
> > memcpy(data, &token, __RTE_QSBR_TOKEN_SIZE);
> > rte_ring_enqueue_elem_finish(dq->r, data, dq->esize, 1);
> > }
> >
> > Though it might slowdown things if we'll have a lot of parallel dq_enqueue.
> > So not sure is it worth it or not.
> Adding peek APIs for RTS would be better. That should take care of the parallel dq_enqueue case. I am not sure if I already gave you this
> comment. My ring patch supported these APIs.
AFAIK, a peek API is not possible for RTS mode.
Probably you are talking about the Scatter-Gather API introduced in your RFC
(_reserve_; update ring entries manually; _commit_)?
Anyway, if there is not much value in my idea above, then feel free to drop it.
>
> >
> > > +
> > > + rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > + "%s(): Enqueued token = %"PRIu64"\n", __func__, token);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/* Reclaim resources from the defer queue. */
> > > +int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq, unsigned int n,
> > > + unsigned int *freed, unsigned int *pending)
> > > +{
> > > + uint32_t cnt;
> > > + uint64_t token;
> > > +
> > > + if (dq == NULL || n == 0) {
> > > + rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > + "%s(): Invalid input parameter\n", __func__);
> > > + rte_errno = EINVAL;
> > > +
> > > + return 1;
> > > + }
> > > +
> > > + cnt = 0;
> > > +
> > > + char e[dq->esize];
> > > + /* Check reader threads quiescent state and reclaim resources */
> > > + while ((cnt < n) &&
> > > + (rte_ring_dequeue_bulk_elem_start(dq->r, e,
> > > + dq->esize, 1, NULL) != 0)) {
> >
> > Another thought - any point to use burst_elem_start() here to retrieve more
> > then 1 elem in one go? Something like:
> I think it makes sense.
>
> > char e[32][dq->esize];
> > while (cnt < n) {
> > k = RTE_MIN(32, n - cnt);
> > k = rte_ring_dequeue_burst_elem_start(dq->r, e, dq->esize, k, NULL);
> > if (k == 0)
> > break;
> > for (i = 0; i != k; i++) {
> > memcpy(&token, e[i], sizeof(uint64_t));
> > if (rte_rcu_qsbr_check(dq->v, token, false) != 1)
> > break;
> > }
> > k = i;
> > rte_ring_dequeue_elem_finish(dq->r, k);
> > for (i = 0; i != k; i++)
> > dq->free_fn(dq->p, e[i] + __RTE_QSBR_TOKEN_SIZE);
> I think it also makes sense to change the free_fn to take 'n' number of tokens.
>
> > cnt += k;
> > if (k == 0)
> > break;
> >
> > ?
> > Also if at enqueue we guarantee strict ordrer (via
> > enqueue_start/enqueue_finish), then here we probably can do _check_ from
> > the last retrieved token here?
> > In theory that might help to minimize number of checks.
> > I.E. do:
> > for (i = k; i-- !=0; ) {
> > memcpy(&token, e[i], sizeof(uint64_t));
> > if (rte_rcu_qsbr_check(dq->v, token, false) != 1)
> There is a higher chance that later tokens are not yet acked, so checking from the last token would introduce more polling of the counters.
> rte_rcu_qsbr_check has an optimization: while acking the current token, it also caches the greatest token acked and uses the cached
> token for subsequent calls. I think this provides a better optimization.
Ok.