[dpdk-dev] [PATCH v2 2/2] librte_ip_frag: add mbuf counter

Alex Kiselev alex at therouter.net
Wed Aug 22 11:47:48 CEST 2018


Hi Konstantin.

Could we please make a final decision about counting mbufs, since It still feels to me like an unfinished business?
Below are my final argumens. if they are not sound to you, just nack ;)

> Hi Konstantin.
>> Hi Alex,
>> Sorry for delay in reply.


>>> >> There might be situations (kind of attack when a lot of
>>> >> fragmented packets are sent to a dpdk application in order
>>> >> to flood the fragmentation table) when no additional mbufs
>>> >> must be added to the fragmentations table since it already
>>> >> contains to many of them. Currently there is no way to
>>> >> determine the number of mbufs holded int the fragmentation
>>> >> table. This patch allows to keep track of the number of mbufs
>>> >> holded in the fragmentation table.

>>> > I understand your intention, but still not sure it is worth it.
>>> > My thought was that you can estimate by upper limit (num_entries * entries_per_bucket) or so.
>>> No, I can't. The estimation error might be so big that there would be no difference at all.

>> Not sure why? If you'll use upper limit, then worst thing could happen -
>> you would start your table cleanup a bit earlier.
> Since bucket size is 4, an estimation error might be 400%.
> So, for example, if I want to setup the upper limit (max number mbufs that
> can be stored in frag table) to 20% of all my available mbufs
> I have to be ready that 80% of all mbufs might end up in a frag table
> (every bucket is full). Or if I take into account bucket size, and devide 20%
> by 4 in order the number mbufs to be exactly 20% in the worse case when every bucket is full,
> I could end up in the opposite border situation when exactly single mbuf is stored
> in every bucket, so upper limit of mbufs would be 20 / 4 = 5%. Both ways are not
> good since either you have to reserve extra mbufs just to correct estimation error
> or you upper limit would to small and you will be dropping good fragments.


>>> > Probably another way to account number of mbufs without changes in the lib -
>>> > apply something like that(assuming that your fragmets are not multisegs):

>>> > uint32_t mbuf_in_frag_table = 0;
>>> > ....

>>> n= dr->>cnt;
>>> > mb = rte_ipv4_frag_reassemble_packet(...);
>>> > if (mb != NULL)
>>> >    mbuf_in_frag_table += mb->nb_segs;
>>> > mbuf_in_frag_table += dr->cnt - n + 1;

>> Sorry, my bad, I think it should be 
>> mbuf_in_frag_table  -= dr->cnt - n + 1;


>>> > In theory that could be applied even if fragments might be multisegs, but for that,
>>> > we'll need to change rte_ip_frag_free_death_row() to return total number of freed segments.

>>> That should be a little bit more complicated wrapper code:

>>> uint32_t mbuf_in_frag_table = 0;
>>> ....

>>> n= dr->cnt;
>>> reassembled_mbuf = rte_ipv4_frag_reassemble_packet(..., fragmented_mbuf, ...);
>>> if (reassembled_mbuf == NULL)
>>>     mbuf_in_frag_table += fragmented_mbuf->nb_segs;

>> We don't know for sure here.
>> fragmented_mbuf could be in death row by now.
> Yes. That's exactly why you have to keep track of
> mbufs here and later after rte_ip_frag_free_death_row().

> User have to think about frag table and death row as a single entity, 
> kind of a black box, since it's impossible to say where 
> (in the frag table or in the death row) your mbuf will be
> after you call rte_ipv4_frag_reassemble_packet(). So, a caller/user should
> keep track of mbuf on every border/interface of that black box.
> One interface is rte_ipv4_frag_reassemble_packet and the other is 
> rte_ip_frag_free_death_row.

> So, that's why it's easier to keep track of mbufs inside the library.
>   

>>> else
>>>     mbuf_in_frag_table -= reassembled_mbuf->nb_segs;
>>> mbuf_in_frag_table += dr->cnt - n;


>>> Also, in that case every rte_ip_frag_free_death_row() needs a wrapper code too.

>>> n= dr->cnt;
>>> rte_ip_frag_free_death_row(..)
>>> mbuf_in_frag_table += dr->cnt - n;

>> I don't think it is necessary.
>> After packet is put in the death-row it is no longer in the table.
> It's critical, since from a user point of view death row and frag table
> is a black box due rte_ipv4_frag_reassemble_packet() doesn't indicate a caller
> where his packet has been stored (in the frag table or death row).

>> Konstantin



>>> I think my approach is simplier.

>>> > Konstantin


>>> >> Signed-off-by: Alex Kiselev <alex at therouter.net>
>>> >> ---
>>> >>  lib/librte_ip_frag/ip_frag_common.h        | 16 +++++++++-------
>>> >>  lib/librte_ip_frag/ip_frag_internal.c      | 16 +++++++++-------
>>> >>  lib/librte_ip_frag/rte_ip_frag.h           | 18 +++++++++++++++++-
>>> >>  lib/librte_ip_frag/rte_ip_frag_common.c    |  1 +
>>> >>  lib/librte_ip_frag/rte_ip_frag_version.map |  1 +
>>> >>  lib/librte_ip_frag/rte_ipv4_reassembly.c   |  2 +-
>>> >>  lib/librte_ip_frag/rte_ipv6_reassembly.c   |  2 +-
>>> >>  7 files changed, 39 insertions(+), 17 deletions(-)

>>> >> diff --git a/lib/librte_ip_frag/ip_frag_common.h b/lib/librte_ip_frag/ip_frag_common.h
>>> >> index 0fdcc7d0f..9fe5c0559 100644
>>> >> --- a/lib/librte_ip_frag/ip_frag_common.h
>>> >> +++ b/lib/librte_ip_frag/ip_frag_common.h
>>> >> @@ -32,15 +32,15 @@
>>> >>  #endif /* IP_FRAG_TBL_STAT */

>>> >>  /* internal functions declarations */
>>> >> -struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp,
>>> >> -             struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb,
>>> >> -             uint16_t ofs, uint16_t len, uint16_t more_frags);
>>> >> +struct rte_mbuf *ip_frag_process(struct rte_ip_frag_tbl *tbl,
>>> >> +     struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>>> >> +     struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags);

>>> >> -struct ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl,
>>> >> +struct ip_frag_pkt *ip_frag_find(struct rte_ip_frag_tbl *tbl,
>>> >>               struct rte_ip_frag_death_row *dr,
>>> >>               const struct ip_frag_key *key, uint64_t tms);

>>> >> -struct ip_frag_pkt * ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
>>> >> +struct ip_frag_pkt *ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
>>> >>       const struct ip_frag_key *key, uint64_t tms,
>>> >>       struct ip_frag_pkt **free, struct ip_frag_pkt **stale);

>>> >> @@ -91,7 +91,8 @@ ip_frag_key_cmp(const struct ip_frag_key * k1, const struct ip_frag_key * k2)

>>> >>  /* put fragment on death row */
>>> >>  static inline void
>>> >> -ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr)
>>> >> +ip_frag_free(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp,
>>> >> +     struct rte_ip_frag_death_row *dr)
>>> >>  {
>>> >>       uint32_t i, k;

>>> >> @@ -100,6 +101,7 @@ ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr)
>>> >>               if (fp->frags[i].mb != NULL) {
>>> >>                       dr->row[k++] = fp->frags[i].mb;
>>> >>                       fp->frags[i].mb = NULL;
>>> >> +                     tbl->nb_mbufs--;
>>> >>               }
>>> >>       }

>>> >> @@ -160,7 +162,7 @@ static inline void
>>> >>  ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *dr,
>>> >>       struct ip_frag_pkt *fp)
>>> >>  {
>>> >> -     ip_frag_free(fp, dr);
>>> >> +     ip_frag_free(tbl, fp, dr);
>>> >>       ip_frag_key_invalidate(&fp->key);
>>> >>       TAILQ_REMOVE(&tbl->lru, fp, lru);
>>> >>       tbl->use_entries--;
>>> >> diff --git a/lib/librte_ip_frag/ip_frag_internal.c b/lib/librte_ip_frag/ip_frag_internal.c
>>> >> index 97470a872..4c47d3fb4 100644
>>> >> --- a/lib/librte_ip_frag/ip_frag_internal.c
>>> >> +++ b/lib/librte_ip_frag/ip_frag_internal.c
>>> >> @@ -29,14 +29,13 @@ static inline void
>>> >>  ip_frag_tbl_reuse(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *dr,
>>> >>       struct ip_frag_pkt *fp, uint64_t tms)
>>> >>  {
>>> >> -     ip_frag_free(fp, dr);
>>> >> +     ip_frag_free(tbl, fp, dr);
>>> >>       ip_frag_reset(fp, tms);
>>> >>       TAILQ_REMOVE(&tbl->lru, fp, lru);
>>> >>       TAILQ_INSERT_TAIL(&tbl->lru, fp, lru);
>>> >>       IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, reuse_num, 1);
>>> >>  }

>>> >> -
>>> >>  static inline void
>>> >>  ipv4_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
>>> >>  {
>>> >> @@ -88,8 +87,9 @@ ipv6_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
>>> >>  }

>>> >>  struct rte_mbuf *
>>> >> -ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>>> >> -     struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags)
>>> >> +ip_frag_process(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp,
>>> >> +     struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint16_t ofs,
>>> >> +     uint16_t len, uint16_t more_frags)
>>> >>  {
>>> >>       uint32_t idx;

>>> >> @@ -147,7 +147,7 @@ ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>>> >>                               fp->frags[IP_LAST_FRAG_IDX].len);

>>> >>               /* free all fragments, invalidate the entry. */
>>> >> -             ip_frag_free(fp, dr);
>>> >> +             ip_frag_free(tbl, fp, dr);
>>> >>               ip_frag_key_invalidate(&fp->key);
>>> >>               IP_FRAG_MBUF2DR(dr, mb);

>>> >> @@ -157,6 +157,7 @@ ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>>> >>       fp->frags[idx].ofs = ofs;
>>> >>       fp->frags[idx].len = len;
>>> >>       fp->frags[idx].mb = mb;
>>> >> +     tbl->nb_mbufs++;

>>> >>       mb = NULL;

>>> >> @@ -205,8 +206,9 @@ ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>>> >>                               fp->frags[IP_LAST_FRAG_IDX].len);

>>> >>               /* free associated resources. */
>>> >> -             ip_frag_free(fp, dr);
>>> >> -     }
>>> >> +             ip_frag_free(tbl, fp, dr);
>>> >> +     } else
>>> >> +             tbl->nb_mbufs -= fp->last_idx;

>>> >>       /* we are done with that entry, invalidate it. */
>>> >>       ip_frag_key_invalidate(&fp->key);
>>> >> diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
>>> >> index 7f425f610..623934d87 100644
>>> >> --- a/lib/librte_ip_frag/rte_ip_frag.h
>>> >> +++ b/lib/librte_ip_frag/rte_ip_frag.h
>>> >> @@ -96,6 +96,7 @@ struct rte_ip_frag_tbl {
>>> >>       uint32_t             bucket_entries;  /**< hash associativity. */
>>> >>       uint32_t             nb_entries;      /**< total size of the table. */
>>> >>       uint32_t             nb_buckets;      /**< num of associativity lines. */
>>> >> +     uint32_t             nb_mbufs;        /**< num of mbufs holded in the tbl. */
>>> >>       struct ip_frag_pkt *last;         /**< last used entry. */
>>> >>       struct ip_pkt_list lru;           /**< LRU list for table entries. */
>>> >>       struct ip_frag_tbl_stat stat;     /**< statistics counters. */
>>> >> @@ -329,8 +330,23 @@ void
>>> >>  rte_ip_frag_table_statistics_dump(FILE * f, const struct rte_ip_frag_tbl *tbl);

>>> >>  /**
>>> >> - * Delete expired fragments
>>> >> + * Number of mbufs holded in the fragmentation table.
>>> >> + *
>>> >> + * @param tbl
>>> >> + *   Fragmentation table
>>> >>   *
>>> >> + * @return
>>> >> + *   Number of mbufs holded in the fragmentation table.
>>> >> + */
>>> >> +static inline uint32_t __rte_experimental
>>> >> +rte_frag_table_mbuf_count(const struct rte_ip_frag_tbl *tbl)
>>> >> +{
>>> >> +     return tbl->nb_mbufs;
>>> >> +}
>>> >> +
>>> >> +/**
>>> >> + * Delete expired fragments
>>> >> + *
>>> >>   * @param tbl
>>> >>   *   Table to delete expired fragments from
>>> >>   * @param dr
>>> >> diff --git a/lib/librte_ip_frag/rte_ip_frag_common.c b/lib/librte_ip_frag/rte_ip_frag_common.c
>>> >> index a23f6f24f..46c2df84a 100644
>>> >> --- a/lib/librte_ip_frag/rte_ip_frag_common.c
>>> >> +++ b/lib/librte_ip_frag/rte_ip_frag_common.c
>>> >> @@ -75,6 +75,7 @@ rte_ip_frag_table_create(uint32_t bucket_num, uint32_t bucket_entries,
>>> >>       tbl->nb_buckets = bucket_num;
>>> >>       tbl->bucket_entries = bucket_entries;
>>> >>       tbl->entry_mask = (tbl->nb_entries - 1) & ~(tbl->bucket_entries  - 1);
>>> >> +     tbl->nb_mbufs = 0;

>>> >>       TAILQ_INIT(&(tbl->lru));
>>> >>       return tbl;
>>> >> diff --git a/lib/librte_ip_frag/rte_ip_frag_version.map b/lib/librte_ip_frag/rte_ip_frag_version.map
>>> >> index d40d5515f..f4700f460 100644
>>> >> --- a/lib/librte_ip_frag/rte_ip_frag_version.map
>>> >> +++ b/lib/librte_ip_frag/rte_ip_frag_version.map
>>> >> @@ -23,4 +23,5 @@ EXPERIMENTAL {
>>> >>       global:

>>> >>       rte_frag_table_del_expired_entries;
>>> >> +     rte_frag_table_mbuf_count;
>>> >>  };
>>> >> diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c b/lib/librte_ip_frag/rte_ipv4_reassembly.c
>>> >> index 4956b99ea..fbdfd860a 100644
>>> >> --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
>>> >> +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
>>> >> @@ -146,7 +146,7 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,


>>> >>       /* process the fragmented packet. */
>>> >> -     mb = ip_frag_process(fp, dr, mb, ip_ofs, ip_len, ip_flag);
>>> >> +     mb = ip_frag_process(tbl, fp, dr, mb, ip_ofs, ip_len, ip_flag);
>>> >>       ip_frag_inuse(tbl, fp);

>>> >>       IP_FRAG_LOG(DEBUG, "%s:%d:\n"
>>> >> diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c b/lib/librte_ip_frag/rte_ipv6_reassembly.c
>>> >> index db249fe60..dda5a57b7 100644
>>> >> --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c
>>> >> +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c
>>> >> @@ -186,7 +186,7 @@ rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,


>>> >>       /* process the fragmented packet. */
>>> >> -     mb = ip_frag_process(fp, dr, mb, ip_ofs, ip_len,
>>> >> +     mb = ip_frag_process(tbl, fp, dr, mb, ip_ofs, ip_len,
>>> >>                       MORE_FRAGS(frag_hdr->frag_data));
>>> >>       ip_frag_inuse(tbl, fp);

>>> >> --
>>> >> 2.16.1.windows.1




>>> --
>>> Alex







-- 
Alex



More information about the dev mailing list