|SUCCESS| [v1] crypto/cnxk: add ML crypto support

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Sat May 30 02:00:13 CEST 2026
Previous message (by thread): |WARNING| pw164558 [PATCH] raw/gdtc : update MAINTAINERS entry
Next message (by thread): |SUCCESS| net/zxdh : update MAINTAINERS entry
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164561

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-29

I've reviewed the patch for adding ML-KEM and ML-DSA support to the DPDK cnxk crypto driver. Here are my findings:

commit (git sha not available - patch file format)
Author: Gowrishankar Muthukrishnan <gmuthukrishn at marvell.com>

crypto/cnxk: add ML crypto support

This patch adds ML-KEM and ML-DSA cryptographic algorithm support to the
cnxk driver for CN20K platform.

> diff --git a/drivers/common/cnxk/roc_platform.h b/drivers/common/cnxk/roc_platform.h
> index e22a50d47a..19f1d6005b 100644
> --- a/drivers/common/cnxk/roc_platform.h
> +++ b/drivers/common/cnxk/roc_platform.h
> @@ -217,6 +218,10 @@ plt_thread_is_valid(plt_thread_t thr)
>  #define plt_memory_order_release  rte_memory_order_release
>  #define plt_memory_order_acquire  rte_memory_order_acquire
>  #define plt_memory_order_relaxed  rte_memory_order_relaxed
> +#define plt_memory_order_seq_cst  rte_memory_order_seq_cst
> +
> +#define plt_atomic_fetch_add_explicit rte_atomic_fetch_add_explicit
> +#define plt_atomic_fetch_sub_explicit rte_atomic_fetch_sub_explicit

Does this code add excessive abstraction layers for platform portability
that are not consistently applied across the codebase? The patch
introduces new plt_* wrappers for atomic operations while using
PLT_ATOMIC and plt_atomic_fetch_*_explicit directly in the ML tables
code. Should the existing PLT_ATOMIC macro be used more consistently
rather than adding separate wrappers?

[ ... ]

> diff --git a/drivers/common/cnxk/roc_re_ml_tables.c b/drivers/common/cnxk/roc_re_ml_tables.c
> new file mode 100644
> index 0000000000..933e35e3c9
> --- /dev/null
> +++ b/drivers/common/cnxk/roc_re_ml_tables.c

> +int
> +roc_re_ml_zeta_get(uint64_t *tbl)
> +{
> +	int len = (RE_MLKEM_ZETA_LEN + RE_MLDSA_ZETA_LEN);
> +	const char name[] = RE_ML_TBL_NAME;
> +	const struct plt_memzone *mz;
> +	struct re_ml_tbl *ml;
> +	uint8_t *data;
> +
> +	if (tbl == NULL)
> +		return -EINVAL;
> +
> +	mz = plt_memzone_lookup(name);
> +	if (mz == NULL) {
> +		/* Create memzone first time */
> +		mz = plt_memzone_reserve_cache_align(name, sizeof(struct re_ml_tbl) + len);
> +		if (mz == NULL)
> +			return -ENOMEM;
> +	}
> +
> +	ml = (struct re_ml_tbl *)mz->addr;
> +	if (plt_atomic_fetch_add_explicit(&ml->refcount, 1, plt_memory_order_seq_cst) != 0)
> +		return 0;

Can roc_re_ml_zeta_get() have a race condition? If two threads
simultaneously call this function and both see refcount==0 before the
fetch_add, the lookup table could be initialized twice. The first
thread to execute fetch_add will see 0 and proceed with memcpy, but if
the second thread has already passed the lookup check and is also in the
memcpy section, does this corrupt the shared memory region?

> +	data = PLT_PTR_ADD(mz->addr, sizeof(uint64_t));
> +	memcpy(data, re_ml_zeta_tbl[0].data, re_ml_zeta_tbl[0].len);
> +	tbl[0] = plt_cpu_to_be_64((uintptr_t)data);
> +
> +	data = PLT_PTR_ADD(data, re_ml_zeta_tbl[0].len);
> +	memcpy(data, re_ml_zeta_tbl[1].data, re_ml_zeta_tbl[1].len);
> +	tbl[1] = plt_cpu_to_be_64((uintptr_t)data);

Does this code violate strict aliasing rules? The pointer arithmetic
using PLT_PTR_ADD on mz->addr, which points to struct re_ml_tbl, to
access raw data after the header could be problematic. Should the code
use the ml_tbl[] flexible array member from struct re_ml_tbl instead of
manual pointer arithmetic?

> +
> +	return 0;
> +}
> +
> +void
> +roc_re_ml_zeta_put(void)
> +{
> +	const char name[] = RE_ML_TBL_NAME;
> +	const struct plt_memzone *mz;
> +	struct re_ml_tbl *ml;
> +
> +	mz = plt_memzone_lookup(name);
> +	if (mz == NULL)
> +		return;
> +
> +	ml = (struct re_ml_tbl *)mz->addr;
> +	if (plt_atomic_fetch_sub_explicit(&ml->refcount, 1, plt_memory_order_seq_cst) == 1)
> +		plt_memzone_free(mz);

Can roc_re_ml_zeta_put() race with roc_re_ml_zeta_get()? If thread A
decrements refcount to 0 and is about to call plt_memzone_free, while
thread B calls roc_re_ml_zeta_get, thread B could look up the memzone,
increment refcount to 1, and proceed to use the memory just as thread A
frees it. Does this cause use-after-free?

[ ... ]

> diff --git a/drivers/crypto/cnxk/cnxk_ae.h b/drivers/crypto/cnxk/cnxk_ae.h
> index 21a0c8068a..691f9bfce5 100644
> --- a/drivers/crypto/cnxk/cnxk_ae.h
> +++ b/drivers/crypto/cnxk/cnxk_ae.h

> +static __rte_always_inline int __rte_hot
> +cnxk_ae_enqueue_ml_kem_op(struct rte_crypto_op *op, struct roc_ae_buf_ptr *meta_buf,
> +		       struct cnxk_ae_sess *sess, struct cpt_inst_s *inst)
> +{
> +	size_t metabuf_len = cnxk_cpt_asym_get_mlen(), reqbuf_len;

[ ... ]

> +	switch (mlkem->op) {
> +	case RTE_CRYPTO_ML_KEM_OP_KEYGEN:
> +		reqbuf_len = mlkem->keygen.d.length + mlkem->keygen.z.length;
> +		if (reqbuf_len > (metabuf_len - dlen)) {
> +			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
> +			return -ENOMEM;
> +		}
> +
> +		memcpy(dptr, mlkem->keygen.d.data, mlkem->keygen.d.length);
> +		dptr += mlkem->keygen.d.length;
> +		memcpy(dptr, mlkem->keygen.z.data, mlkem->keygen.z.length);
> +		dptr += mlkem->keygen.z.length;

Does cnxk_ae_enqueue_ml_kem_op() properly validate buffer lengths? The
code checks if reqbuf_len exceeds available space but does not validate
that mlkem->keygen.d.length and mlkem->keygen.z.length are within
expected bounds for the ML-KEM type. Can an oversized length value from
userspace cause memcpy to write beyond the metabuf boundary even after
the reqbuf_len check?

> +
> +		dlen = mlkem->keygen.d.length + mlkem->keygen.z.length;

[ ... ]

> +	case RTE_CRYPTO_ML_KEM_OP_ENCAP:
> +		reqbuf_len = mlkem->encap.message.length + mlkem->encap.ek.length;
> +		if (reqbuf_len > (metabuf_len - dlen)) {
> +			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
> +			return -ENOMEM;
> +		}
> +
> +		memcpy(dptr, mlkem->encap.message.data, mlkem->encap.message.length);
> +		dptr += mlkem->encap.message.length;
> +		memcpy(dptr, mlkem->encap.ek.data, mlkem->encap.ek.length);

Does the overflow check use the correct initial dlen value? In
cnxk_ae_enqueue_ml_kem_op(), dlen is initialized to 0 before the
switch statement, so the check '(metabuf_len - dlen)' is equivalent to
'metabuf_len'. However, if this function is extended or refactored, does
the pattern risk incorrect overflow checking?

[ ... ]

> +static __rte_always_inline int __rte_hot
> +cnxk_ae_enqueue_ml_dsa_op(struct rte_crypto_op *op, struct roc_ae_buf_ptr *meta_buf,
> +		       struct cnxk_ae_sess *sess, struct cpt_inst_s *inst)
> +{
> +	size_t metabuf_len = cnxk_cpt_asym_get_mlen(), reqbuf_len;

[ ... ]

> +	case RTE_CRYPTO_ML_DSA_OP_SIGN:

[ ... ]

> +		if (!sign_deterministic) {
> +			if (!mldsa->siggen.seed.length) {
> +				op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
> +				return -EINVAL;
> +			}
> +
> +			memcpy(dptr, mldsa->siggen.seed.data, mldsa->siggen.seed.length);
> +			dptr += mldsa->siggen.seed.length;
> +			dlen += mldsa->siggen.seed.length;
> +		}
> +
> +		memcpy(dptr, mldsa->siggen.privkey.data, mldsa->siggen.privkey.length);
> +		dptr += mldsa->siggen.privkey.length;
> +		dlen += mldsa->siggen.privkey.length;

Does cnxk_ae_enqueue_ml_dsa_op() check the total accumulated length
correctly? The code accumulates data into dlen without rechecking
against metabuf_len after each addition. The single reqbuf_len check at
the start covers all fields, but if any field length is maliciously or
accidentally larger than expected, can this overflow the meta buffer?

[ ... ]

> diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c

> @@ -182,8 +196,15 @@ cnxk_cpt_dev_config(struct rte_cryptodev *dev, struct rte_cryptodev_config *conf
>  		ret = roc_ae_ec_grp_get(vf->ec_grp);
>  		if (ret) {
>  			plt_err("Could not get EC grp table");
> -			roc_ae_fpm_put();
> -			return ret;
> +			goto fpm_put;
> +		}
> +
> +		if (roc_model_is_cn20k()) {
> +			ret = roc_re_ml_zeta_get(vf->cnxk_ml_iova);
> +			if (ret) {
> +				plt_err("Could not initialize RE ML lookup table");
> +				goto ec_grp_put;
> +			}
>  		}

Does the error path properly clean up all resources? The goto chain
ec_grp_put->fpm_put looks correct, but does the success path in
cnxk_cpt_dev_clear() correctly call roc_re_ml_zeta_put() before
roc_ae_ec_grp_put()? Reviewing cnxk_cpt_dev_clear(), it does call
roc_re_ml_zeta_put() before roc_ae_ec_grp_put(), which is the reverse
order of acquisition.

[ ... ]

> @@ -1007,6 +1043,7 @@ cnxk_ae_session_cfg(struct rte_cryptodev *dev, struct rte_crypto_asym_xform *xfo
>  
>  	priv->cpt_inst_w7 = w7.u64;
>  	priv->cnxk_fpm_iova = vf->cnxk_fpm_iova;
> +	priv->cnxk_ml_iova = vf->cnxk_ml_iova;

Does cnxk_ae_session_cfg() correctly handle the ML IOVA pointer for
non-CN20K platforms? The code unconditionally assigns vf->cnxk_ml_iova
to priv->cnxk_ml_iova even when roc_model_is_cn20k() is false, meaning
the ML table was never initialized. Can this lead to use of an
uninitialized pointer in session operations on non-CN20K platforms?
Previous message (by thread): |WARNING| pw164558 [PATCH] raw/gdtc : update MAINTAINERS entry
Next message (by thread): |SUCCESS| net/zxdh : update MAINTAINERS entry
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the test-report mailing list