[PATCH v2 0/9] riscv: implement accelerated crc using zbc
David Marchand
david.marchand at redhat.com
Fri Jul 12 19:19:57 CEST 2024
On Fri, Jul 12, 2024 at 5:47 PM Daniel Gregory
<daniel.gregory at bytedance.com> wrote:
>
> The RISC-V Zbc extension adds carry-less multiplication instructions
> that we can use to accelerate CRC computation. This patch set contains
> two new implementations:
>
> - one in lib/hash/rte_crc_riscv64.h that uses a Barrett reduction to
>   implement the four rte_hash_crc_* functions
> - one in lib/net/net_crc_zbc.c that uses repeated single-folds to reduce
>   the buffer until it is small enough for a Barrett reduction to
>   implement rte_crc16_ccitt_zbc_handler and rte_crc32_eth_zbc_handler
>
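
For readers unfamiliar with carry-less multiplication, here is a minimal
portable sketch of what the operation computes. It is not code from the
patch set: the helper names are hypothetical, and the loops are only a
software model of the low and high halves of a 64x64-bit carry-less
product, which is my understanding of what the Zbc clmul and clmulh
instructions return on RV64.

```c
#include <stdint.h>

/*
 * Software model of a carry-less multiply (low 64 bits of the product).
 * Hypothetical helper for illustration; not part of the patch set.
 */
static inline uint64_t
clmul_lo_sketch(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (unsigned int i = 0; i < 64; i++) {
        /* Where bit i of b is set, XOR in a copy of a shifted left by i. */
        if ((b >> i) & 1)
            result ^= a << i;
    }
    return result;
}

/*
 * Software model of the high 64 bits of the same 128-bit product.
 * Bit 0 of b contributes nothing to the high half, so start at i = 1.
 */
static inline uint64_t
clmul_hi_sketch(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (unsigned int i = 1; i < 64; i++) {
        /* Bits of (a << i) that overflow 64 bits land in the high half. */
        if ((b >> i) & 1)
            result ^= a >> (64 - i);
    }
    return result;
}
```

Because partial products are combined with XOR rather than addition, this
is multiplication of polynomials over GF(2), which is the arithmetic CRC
folding and Barrett reduction are built on.
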
> My approach is largely based on Intel's "Fast CRC Computation Using
> PCLMULQDQ Instruction" white paper
> https://www.researchgate.net/publication/263424619_Fast_CRC_computation
> and a post about "Optimizing CRC32 for small payload sizes on x86"
> https://mary.rs/lab/crc32/
>
> Whether these new implementations are enabled is controlled by new
> build-time and run-time detection of the RISC-V extensions supported by
> the compiler and present on the target system.
>
> I have carried out some performance comparisons between the generic
> table implementations and the new hardware implementations. Listed below
> is the number of cycles it takes to compute the CRC hash for buffers of
> various sizes (as reported by rte_get_timer_cycles()). These results
> were collected on a Kendryte K230 and averaged over 20 samples:
>
> | Buffer    | CRC32-ETH (lib/net) | CRC32C (lib/hash)   |
> | Size (MB) | Table    | Hardware | Table    | Hardware |
> |-----------|----------|----------|----------|----------|
> |         1 |   155168 |    11610 |    73026 |    18385 |
> |         2 |   311203 |    22998 |   145586 |    35886 |
> |         3 |   466744 |    34370 |   218536 |    53939 |
> |         4 |   621843 |    45536 |   291574 |    71944 |
> |         5 |   777908 |    56989 |   364152 |    89706 |
> |         6 |   932736 |    68023 |   437016 |   107726 |
> |         7 |  1088756 |    79236 |   510197 |   125426 |
> |         8 |  1243794 |    90467 |   583231 |   143614 |
>
> These results suggest a speed-up of roughly 13x for lib/net and roughly
> 4x for lib/hash on these buffer sizes.
>
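
The speed-up figures can be checked directly from the cycle counts in the
table. The sketch below is illustrative only: the array and function names
are hypothetical, and the numbers are copied from the quoted table rather
than measured again.

```c
#include <stddef.h>

/* Cycle counts copied from the quoted table, 1 MB to 8 MB buffers. */
static const double net_table_cycles[] = {
    155168, 311203, 466744, 621843, 777908, 932736, 1088756, 1243794
};
static const double net_hw_cycles[] = {
    11610, 22998, 34370, 45536, 56989, 68023, 79236, 90467
};
static const double hash_table_cycles[] = {
    73026, 145586, 218536, 291574, 364152, 437016, 510197, 583231
};
static const double hash_hw_cycles[] = {
    18385, 35886, 53939, 71944, 89706, 107726, 125426, 143614
};

/* Mean of the per-row ratios table[i] / hw[i] over n rows. */
static double
mean_speedup(const double *table, const double *hw, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += table[i] / hw[i];
    return sum / (double)n;
}
```

Calling mean_speedup(net_table_cycles, net_hw_cycles, 8) gives a value
between 13 and 14, and the lib/hash arrays give a value close to 4, which
matches the summary in the cover letter.
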
> I have also run the hash_functions_autotest benchmark in dpdk_test,
> which measures the performance of the lib/hash implementation on small
> buffers, getting the following times:
>
> | Key Length | Time (ticks/op)     |
> | (bytes)    | Table    | Hardware |
> |------------|----------|----------|
> |          1 |     0.47 |     0.85 |
> |          2 |     0.57 |     0.87 |
> |          4 |     0.99 |     0.88 |
> |          8 |     1.35 |     0.88 |
> |          9 |     1.20 |     1.09 |
> |         13 |     1.76 |     1.35 |
> |         16 |     1.87 |     1.02 |
> |         32 |     2.96 |     0.98 |
> |         37 |     3.35 |     1.45 |
> |         40 |     3.49 |     1.12 |
> |         48 |     4.02 |     1.25 |
> |         64 |     5.08 |     1.54 |
Thanks for the submission.
This series came late for v24.07 and received no review, so it is
deferred to v24.11.
Cc: Sachin for info.
--
David Marchand