|SUCCESS| [v1] crypto/cnxk: add ML crypto support
dpdklab at iol.unh.edu
dpdklab at iol.unh.edu
Sat May 30 02:00:13 CEST 2026
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164561
_AI Code Review_
=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-29
I've reviewed the patch for adding ML-KEM and ML-DSA support to the DPDK cnxk crypto driver. Here are my findings:
commit (git sha not available - patch file format)
Author: Gowrishankar Muthukrishnan <gmuthukrishn at marvell.com>
crypto/cnxk: add ML crypto support
This patch adds ML-KEM and ML-DSA cryptographic algorithm support to the
cnxk driver for CN20K platform.
> diff --git a/drivers/common/cnxk/roc_platform.h b/drivers/common/cnxk/roc_platform.h
> index e22a50d47a..19f1d6005b 100644
> --- a/drivers/common/cnxk/roc_platform.h
> +++ b/drivers/common/cnxk/roc_platform.h
> @@ -217,6 +218,10 @@ plt_thread_is_valid(plt_thread_t thr)
> #define plt_memory_order_release rte_memory_order_release
> #define plt_memory_order_acquire rte_memory_order_acquire
> #define plt_memory_order_relaxed rte_memory_order_relaxed
> +#define plt_memory_order_seq_cst rte_memory_order_seq_cst
> +
> +#define plt_atomic_fetch_add_explicit rte_atomic_fetch_add_explicit
> +#define plt_atomic_fetch_sub_explicit rte_atomic_fetch_sub_explicit
Does this code add excessive abstraction layers for platform portability
that are not consistently applied across the codebase? The patch
introduces new plt_* wrappers for atomic operations while using
PLT_ATOMIC and plt_atomic_fetch_*_explicit directly in the ML tables
code. Should the existing PLT_ATOMIC macro be used more consistently
rather than adding separate wrappers?
[ ... ]
> diff --git a/drivers/common/cnxk/roc_re_ml_tables.c b/drivers/common/cnxk/roc_re_ml_tables.c
> new file mode 100644
> index 0000000000..933e35e3c9
> --- /dev/null
> +++ b/drivers/common/cnxk/roc_re_ml_tables.c
> +int
> +roc_re_ml_zeta_get(uint64_t *tbl)
> +{
> + int len = (RE_MLKEM_ZETA_LEN + RE_MLDSA_ZETA_LEN);
> + const char name[] = RE_ML_TBL_NAME;
> + const struct plt_memzone *mz;
> + struct re_ml_tbl *ml;
> + uint8_t *data;
> +
> + if (tbl == NULL)
> + return -EINVAL;
> +
> + mz = plt_memzone_lookup(name);
> + if (mz == NULL) {
> + /* Create memzone first time */
> + mz = plt_memzone_reserve_cache_align(name, sizeof(struct re_ml_tbl) + len);
> + if (mz == NULL)
> + return -ENOMEM;
> + }
> +
> + ml = (struct re_ml_tbl *)mz->addr;
> + if (plt_atomic_fetch_add_explicit(&ml->refcount, 1, plt_memory_order_seq_cst) != 0)
> + return 0;
Can roc_re_ml_zeta_get() have a race condition? If two threads
simultaneously call this function and both see refcount==0 before the
fetch_add, the lookup table could be initialized twice. The first
thread to execute fetch_add will see 0 and proceed with memcpy, but if
the second thread has already passed the lookup check and is also in the
memcpy section, does this corrupt the shared memory region?
> + data = PLT_PTR_ADD(mz->addr, sizeof(uint64_t));
> + memcpy(data, re_ml_zeta_tbl[0].data, re_ml_zeta_tbl[0].len);
> + tbl[0] = plt_cpu_to_be_64((uintptr_t)data);
> +
> + data = PLT_PTR_ADD(data, re_ml_zeta_tbl[0].len);
> + memcpy(data, re_ml_zeta_tbl[1].data, re_ml_zeta_tbl[1].len);
> + tbl[1] = plt_cpu_to_be_64((uintptr_t)data);
Does this code violate strict aliasing rules? The pointer arithmetic
using PLT_PTR_ADD on mz->addr, which points to struct re_ml_tbl, to
access raw data after the header could be problematic. Should the code
use the ml_tbl[] flexible array member from struct re_ml_tbl instead of
manual pointer arithmetic?
> +
> + return 0;
> +}
> +
> +void
> +roc_re_ml_zeta_put(void)
> +{
> + const char name[] = RE_ML_TBL_NAME;
> + const struct plt_memzone *mz;
> + struct re_ml_tbl *ml;
> +
> + mz = plt_memzone_lookup(name);
> + if (mz == NULL)
> + return;
> +
> + ml = (struct re_ml_tbl *)mz->addr;
> + if (plt_atomic_fetch_sub_explicit(&ml->refcount, 1, plt_memory_order_seq_cst) == 1)
> + plt_memzone_free(mz);
Can roc_re_ml_zeta_put() race with roc_re_ml_zeta_get()? If thread A
decrements refcount to 0 and is about to call plt_memzone_free, while
thread B calls roc_re_ml_zeta_get, thread B could look up the memzone,
increment refcount to 1, and proceed to use the memory just as thread A
frees it. Does this cause use-after-free?
[ ... ]
> diff --git a/drivers/crypto/cnxk/cnxk_ae.h b/drivers/crypto/cnxk/cnxk_ae.h
> index 21a0c8068a..691f9bfce5 100644
> --- a/drivers/crypto/cnxk/cnxk_ae.h
> +++ b/drivers/crypto/cnxk/cnxk_ae.h
> +static __rte_always_inline int __rte_hot
> +cnxk_ae_enqueue_ml_kem_op(struct rte_crypto_op *op, struct roc_ae_buf_ptr *meta_buf,
> + struct cnxk_ae_sess *sess, struct cpt_inst_s *inst)
> +{
> + size_t metabuf_len = cnxk_cpt_asym_get_mlen(), reqbuf_len;
[ ... ]
> + switch (mlkem->op) {
> + case RTE_CRYPTO_ML_KEM_OP_KEYGEN:
> + reqbuf_len = mlkem->keygen.d.length + mlkem->keygen.z.length;
> + if (reqbuf_len > (metabuf_len - dlen)) {
> + op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
> + return -ENOMEM;
> + }
> +
> + memcpy(dptr, mlkem->keygen.d.data, mlkem->keygen.d.length);
> + dptr += mlkem->keygen.d.length;
> + memcpy(dptr, mlkem->keygen.z.data, mlkem->keygen.z.length);
> + dptr += mlkem->keygen.z.length;
Does cnxk_ae_enqueue_ml_kem_op() properly validate buffer lengths? The
code checks if reqbuf_len exceeds available space but does not validate
that mlkem->keygen.d.length and mlkem->keygen.z.length are within
expected bounds for the ML-KEM type. Can an oversized length value from
userspace cause memcpy to write beyond the metabuf boundary even after
the reqbuf_len check?
> +
> + dlen = mlkem->keygen.d.length + mlkem->keygen.z.length;
[ ... ]
> + case RTE_CRYPTO_ML_KEM_OP_ENCAP:
> + reqbuf_len = mlkem->encap.message.length + mlkem->encap.ek.length;
> + if (reqbuf_len > (metabuf_len - dlen)) {
> + op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
> + return -ENOMEM;
> + }
> +
> + memcpy(dptr, mlkem->encap.message.data, mlkem->encap.message.length);
> + dptr += mlkem->encap.message.length;
> + memcpy(dptr, mlkem->encap.ek.data, mlkem->encap.ek.length);
Does the overflow check use the correct initial dlen value? In
cnxk_ae_enqueue_ml_kem_op(), dlen is initialized to 0 before the
switch statement, so the check '(metabuf_len - dlen)' is equivalent to
'metabuf_len'. However, if this function is extended or refactored, does
the pattern risk incorrect overflow checking?
[ ... ]
> +static __rte_always_inline int __rte_hot
> +cnxk_ae_enqueue_ml_dsa_op(struct rte_crypto_op *op, struct roc_ae_buf_ptr *meta_buf,
> + struct cnxk_ae_sess *sess, struct cpt_inst_s *inst)
> +{
> + size_t metabuf_len = cnxk_cpt_asym_get_mlen(), reqbuf_len;
[ ... ]
> + case RTE_CRYPTO_ML_DSA_OP_SIGN:
[ ... ]
> + if (!sign_deterministic) {
> + if (!mldsa->siggen.seed.length) {
> + op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
> + return -EINVAL;
> + }
> +
> + memcpy(dptr, mldsa->siggen.seed.data, mldsa->siggen.seed.length);
> + dptr += mldsa->siggen.seed.length;
> + dlen += mldsa->siggen.seed.length;
> + }
> +
> + memcpy(dptr, mldsa->siggen.privkey.data, mldsa->siggen.privkey.length);
> + dptr += mldsa->siggen.privkey.length;
> + dlen += mldsa->siggen.privkey.length;
Does cnxk_ae_enqueue_ml_dsa_op() check the total accumulated length
correctly? The code accumulates data into dlen without rechecking
against metabuf_len after each addition. The single reqbuf_len check at
the start covers all fields, but if any field length is maliciously or
accidentally larger than expected, can this overflow the meta buffer?
[ ... ]
> diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
> @@ -182,8 +196,15 @@ cnxk_cpt_dev_config(struct rte_cryptodev *dev, struct rte_cryptodev_config *conf
> ret = roc_ae_ec_grp_get(vf->ec_grp);
> if (ret) {
> plt_err("Could not get EC grp table");
> - roc_ae_fpm_put();
> - return ret;
> + goto fpm_put;
> + }
> +
> + if (roc_model_is_cn20k()) {
> + ret = roc_re_ml_zeta_get(vf->cnxk_ml_iova);
> + if (ret) {
> + plt_err("Could not initialize RE ML lookup table");
> + goto ec_grp_put;
> + }
> }
Does the error path properly clean up all resources? The goto chain
ec_grp_put->fpm_put looks correct, but does the success path in
cnxk_cpt_dev_clear() correctly call roc_re_ml_zeta_put() before
roc_ae_ec_grp_put()? Reviewing cnxk_cpt_dev_clear(), it does call
roc_re_ml_zeta_put() before roc_ae_ec_grp_put(), which is the reverse
order of acquisition.
[ ... ]
> @@ -1007,6 +1043,7 @@ cnxk_ae_session_cfg(struct rte_cryptodev *dev, struct rte_crypto_asym_xform *xfo
>
> priv->cpt_inst_w7 = w7.u64;
> priv->cnxk_fpm_iova = vf->cnxk_fpm_iova;
> + priv->cnxk_ml_iova = vf->cnxk_ml_iova;
Does cnxk_ae_session_cfg() correctly handle the ML IOVA pointer for
non-CN20K platforms? The code unconditionally assigns vf->cnxk_ml_iova
to priv->cnxk_ml_iova even when roc_model_is_cn20k() is false, meaning
the ML table was never initialized. Can this lead to use of an
uninitialized pointer in session operations on non-CN20K platforms?
More information about the test-report
mailing list