[dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev subsystem
Jerin Jacob Kollanukkaran
jerinj at marvell.com
Tue Sep 10 10:05:39 CEST 2019
Hi Xiang,
Sorry for the delay in responding (I was busy with the 19.11 proposal deadline). Please see inline.
>
> Reply to Xiang's queries in main thread:
>
> Hi all,
>
> Some questions regarding APIs. Could you please give more insights?
>
> 1) rte_regex_ops
> a) rsp_flags
> These two flags RTE_REGEX_OPS_RSP_PMI_SOJ_F and
> RTE_REGEX_OPS_RSP_PMI_EOJ_F are used for cross buffer scan.
> RTE_REGEX_OPS_RSP_PMI_EOJ_F tells whether we have a partial match
> at the end of current buffer after scan.
> What's the purpose of having RTE_REGEX_OPS_RSP_PMI_SOJ_F?
>
> [Jerin] Since we need three states to represent partial match buffer,
> RTE_REGEX_OPS_RSP_PMI_SOJ_F to
> represent start of the buffer, intermediate buffers with no flag, and end of
> the buffer with RTE_REGEX_OPS_RSP_PMI_EOJ
> [Xiang] How could a user leverage these flags for matching? Suppose a large
> buffer is divided into multiple chunks. Will RTE_REGEX_OPS_RSP_PMI_SOJ_F
> cause an early quit once it isn't set after scan the first chunk. Similarly,
> RTE_REGEX_OPS_RSP_PMI_EOJ tells a user whether to stop matching future
> buffers after finish the last chunk?
Let me describe it with an example.
Assume:
1) struct rte_regex_dev_info::max_payload_size is set to 1024
2) rte_regex_dev_config::dev_cfg_flags is configured with RTE_REGEX_DEV_CFG_CROSS_BUFFER_SCAN_F
3) The device is programmed to match the pattern "hello\s+world"
4) The user enqueues an op with struct rte_regex_ops::buf_addr pointing to the following "data" and struct rte_regex_ops::scan_size = 1024
data[0..1021] = data that does not contain the hello world pattern
data[1022] = 'h'
data[1023] = 'e'
5) The user enqueues an op with struct rte_regex_ops::buf_addr pointing to the following "data" and struct rte_regex_ops::scan_size = 9
data[0] = 'l'
data[1] = 'l'
data[2] = 'o'
data[3] = ' '
data[4] = 'w'
data[5] = 'o'
data[6] = 'r'
data[7] = 'l'
data[8] = 'd'
If so,
the response to 4) will have RTE_REGEX_OPS_RSP_PMI_SOJ_F set in rte_regex_ops::rsp_flags on dequeue,
with rte_regex_match::offset 1022 and len 2.
The response to 5) will have RTE_REGEX_OPS_RSP_PMI_EOJ_F set in rte_regex_ops::rsp_flags on dequeue,
with rte_regex_match::offset 0 and len 9.
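To make the flag handling concrete, here is a rough sketch of how an application might stitch the two responses together. The flag values, the chunk_rsp and xbuf_state structs, and xbuf_update() are all hypothetical stand-ins for illustration, not the RFC header definitions. It also assumes that an intermediate buffer with no flag reports the number of matched bytes it contributed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the real bit positions are defined by the RFC header. */
#define RTE_REGEX_OPS_RSP_PMI_SOJ_F (1u << 0) /* partial match starts here */
#define RTE_REGEX_OPS_RSP_PMI_EOJ_F (1u << 1) /* partial match ends here */

/* Hypothetical view of what the application reads from one dequeued op. */
struct chunk_rsp {
	uint16_t rsp_flags; /* rte_regex_ops::rsp_flags */
	uint16_t offset;    /* rte_regex_match::offset */
	uint16_t len;       /* rte_regex_match::len */
};

/* Application-side state carried from one buffer to the next. */
struct xbuf_state {
	int in_partial;        /* a match is currently open across buffers */
	int complete;          /* the open match was closed by an EOJ response */
	uint32_t start_offset; /* offset of the match in the first buffer */
	uint32_t total_len;    /* matched bytes accumulated so far */
};

static void
xbuf_update(struct xbuf_state *st, const struct chunk_rsp *rsp)
{
	assert(st != NULL && rsp != NULL);

	if (rsp->rsp_flags & RTE_REGEX_OPS_RSP_PMI_SOJ_F) {
		/* First buffer: remember where the partial match began. */
		st->in_partial = 1;
		st->complete = 0;
		st->start_offset = rsp->offset;
		st->total_len = rsp->len;
	} else if (st->in_partial) {
		/* Intermediate (no flag) or last (EOJ) buffer. */
		st->total_len += rsp->len;
		if (rsp->rsp_flags & RTE_REGEX_OPS_RSP_PMI_EOJ_F) {
			st->in_partial = 0;
			st->complete = 1;
		}
	}
}
```

Feeding the responses to 4) and 5) through xbuf_update() in order leaves the state with start_offset 1022 and total_len 2 + 9 = 11, i.e. the full "hello world" match.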
>
> RTE_REGEX_OPS_RSP_MAX_PREFIX_F: This looks like a definition for a
> specific hardware implementation. I am wondering what this PREFIX refers
> to:)?
>
> [Jerin] Yes. Looks like it is for hardware specific implementation. Introduced
> rte_regex_dev_attr_set/get functions to make it portable and
> To add new implementation specific fields.
> For example, if a rule is
> /ABCDEF.*XYZ/, ABCD is considered the prefix, and EF.*XYZ is considered the
> factor. The prefix is a literal
> string, while the factor can contain complex regular expression constructs. As
> a result, rule matching occurs in
> two stages: prefix matching and factor matching.
>
> b) user_id or user_ptr
> Under what kind of circumstances should an application pass value into
> these variables for enqueue and dequeuer operations?
>
> [Jerin] Just like rte_crypto_ops, struct rte_regex_ops also allocated using
> mempool normally, on enqueue, user can specify user_id
> If needed to in order identify the op on dequeue if required. The use case
> could be to store the sequence number from application
> POV or storing the mbuf ptr in which pattern is requested etc.
>
>
> 2) rte_regex_match
> a) offset; /**< Starting Byte Position for matched rule. */ and uint16_t
> len; /**< Length of match in bytes */
> Looks like the matching offset is defined as *starting matching offset*
> instead of *end matching offset*, e.g. report the offset of "a" instead of "c"
> for pattern "abc".
> If so, this makes it hard to integrate software regex libraries such as
> Hyperscan and RE2 as they only report *end matching offset* without length
> of match.
> Although Hyperscan has API for *starting matching offset*, it only delivers
> partial syntax support. So I think we have to define *end of matching offset*
> for software solutions.
>
> [Jerin] I understand the hyperscan's HS_FLAG_SOM_LEFTMOST tradeoffs. I
> thought application would need always the length of the match.
> Probably we will see how other HW implementation (from Mellanox) etc. We
> will try to abstract it, probably we can make it as function of "user
> requested".
> [Xiang] Yes, it will be good to make it per user request. At least from
> Hyperscan user's point of view, start of match and match length are not
> mandatory.
OK. I think we can introduce RTE_REGEX_DEV_CFG_MATCH_AS_START
in device configure.
Since offset + len == end, we can introduce the following generic inline function:
static inline uint32_t
rte_regex_match_end(struct rte_regex_match *match)
{
	return match->offset + match->len;
}
Example: the pattern to match is "hello\s+world" and the data is as follows:
data[4] = 'h'
data[5] = 'e'
data[6] = 'l'
data[7] = 'l'
data[8] = 'o'
data[9] = ' '
data[10] = 'w'
data[11] = 'o'
data[12] = 'r'
data[13] = 'l'
data[14] = 'd'
If the device is configured with RTE_REGEX_DEV_CFG_MATCH_AS_START:
match->offset returns 4
match->len returns 11
If the device is NOT configured with RTE_REGEX_DEV_CFG_MATCH_AS_START,
the driver MAY return the following (in the hyperscan case):
match->offset returns 0
match->len returns 11 + 4
In both cases (irrespective of the flag, to make the application's life easy) rte_regex_match_end() would return 15.
If the application asks for MATCH_AS_START, the driver can return match->offset 4 and match->len 11,
i.e. set HS_FLAG_SOM_LEFTMOST in the hyperscan driver. But the application should always use rte_regex_match_end()
to find the end of the match, so that it works in all cases.
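As a quick check of the arithmetic above, a minimal sketch follows. The two-field struct layout and its uint16_t widths are assumptions made for illustration (only len is stated as uint16_t in this thread), and end_with_start()/end_without_start() are hypothetical helpers that just encode the two cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed minimal layout; the real struct may carry more fields. */
struct rte_regex_match {
	uint16_t offset; /* start of match, or 0 when start is not tracked */
	uint16_t len;    /* length such that offset + len is the match end */
};

static inline uint32_t
rte_regex_match_end(const struct rte_regex_match *match)
{
	assert(match != NULL);
	/* Widen before adding so the sum cannot wrap at 16 bits. */
	return (uint32_t)match->offset + match->len;
}

/* Device configured with RTE_REGEX_DEV_CFG_MATCH_AS_START. */
static uint32_t
end_with_start(void)
{
	struct rte_regex_match m = { .offset = 4, .len = 11 };
	return rte_regex_match_end(&m);
}

/* Device without the flag: driver reports offset 0 and len 11 + 4. */
static uint32_t
end_without_start(void)
{
	struct rte_regex_match m = { .offset = 0, .len = 11 + 4 };
	return rte_regex_match_end(&m);
}
```

Both helpers yield the same end position, so an application that only calls rte_regex_match_end() does not need to know which mode the driver is in.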
Is it OK?
>
> 3) rte_regex_rule_db_update()
> Does this mean we can dynamically add or delete rules for an already
> generated database without recompile from scratch for hardware Regex
> implementation?
> If so, this isn't possible for software solutions as they don't support
> dynamic database update and require recompile.
>
> [Jerin] rte_regex_rule_db_update() internally it would call recompile
> function for both HW and SW.
> See rte_regex_dev_config::rule_db in rte_regex_dev_configure() for
> precompiled rule database case.
> [Xiang] OK, sounds like we have to save the original rule-set for the device in
> order to do recompile. I see both ADD and REMOVE operators from
> rte_regex_rule.
> For rules with REMOVE operator, what's the expected behavior to handle
> them for the old rule-set? Do we need to go through the old rule-set and
> remove corresponding rules before doing recompile?
Yes.
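In other words, the driver would keep the original rule-set, apply the ADD and REMOVE entries to it, and only then recompile. A rough sketch of that bookkeeping is below. Every name in it (rule_op, struct rule, struct rule_set, rule_set_apply(), MAX_RULES) is hypothetical and only illustrates the idea. It assumes rules are identified by a numeric rule id, and it leaves the recompile step out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ADD/REMOVE operators of rte_regex_rule. */
enum rule_op { RULE_OP_ADD, RULE_OP_REMOVE };

struct rule {
	enum rule_op op;
	uint32_t rule_id;    /* assumed key used to identify a rule */
	const char *pattern; /* unused for REMOVE entries */
};

#define MAX_RULES 64

/* The saved rule-set the driver recompiles from. */
struct rule_set {
	struct rule rules[MAX_RULES];
	size_t count;
};

/* Apply updates in order. Returns 0 on success, -1 if the set is full. */
static int
rule_set_apply(struct rule_set *set, const struct rule *updates, size_t n)
{
	assert(set != NULL && (updates != NULL || n == 0));

	for (size_t i = 0; i < n; i++) {
		if (updates[i].op == RULE_OP_REMOVE) {
			/* Walk the old rule-set and drop every matching id. */
			size_t kept = 0;
			for (size_t j = 0; j < set->count; j++)
				if (set->rules[j].rule_id != updates[i].rule_id)
					set->rules[kept++] = set->rules[j];
			set->count = kept;
		} else {
			if (set->count == MAX_RULES)
				return -1;
			set->rules[set->count++] = updates[i];
		}
	}
	/* A real driver would recompile the database from set->rules here. */
	return 0;
}
```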