|WARNING| [2/2] app/test: add stress tests for rwlock and pflock

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Wed Jun 10 20:55:03 CEST 2026


Test-Label: ai-code-review
Test-Status: WARNING
https://dpdk.org/patch/165258

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-06-10

# DPDK Patch Review

## Patch 1/2: eal/pflock: add API to downgrade from wr to rd lock

### Errors

None identified.

### Warnings

1. **Missing experimental API marking**
   The new function `rte_pflock_write_downgrade()` is a public API addition but lacks the `__rte_experimental` tag.
   New API functions must be marked experimental.
   ```c
   __rte_experimental
   static inline void
   rte_pflock_write_downgrade(rte_pflock_t *pf)
   ```

2. **Missing release notes update**
   This patch adds a new public API function but does not include an update to the release notes (`doc/guides/rel_notes/`).
   API additions require documentation in the release notes.

3. **Missing Doxygen for caller preconditions**
   The function comment does not document the precondition that the caller must already hold a write lock.
   While the text mentions "Note that caller is expected to obtain wrlock," this should be in the Doxygen `@param` or `@note` section for clarity.
   ```c
   /**
    * Release a pflock held for writing, while keeping lock for reading.
    *
    * @note Caller must hold the write lock before calling this function.
    *
    * @param pf
    *   A pointer to a pflock structure.
    */
   ```

4. **Memory ordering justification**
   The first atomic operation uses `rte_memory_order_acq_rel` which is stronger than necessary for a simple increment.
   Consider whether `rte_memory_order_release` suffices, or document why acquire semantics are needed when adding a reader.

5. **Missing symbol export macro**
   If this function is intended to be non-inline (though it is static inline here), it would need an `RTE_EXPORT_EXPERIMENTAL_SYMBOL` macro in the `.c` file.
   Since it's inline, this is acceptable, but verify the ABI policy for inline functions.

6. **Trailing blank lines**
   The patch adds two blank lines at the end of the file.
   Files should end with exactly one newline.

### Info

- The atomic operations appear correct for downgrading a write lock to a read lock: migrating to read phase, clearing the write bit, then releasing the write lock.

---

## Patch 2/2: app/test: add stress tests for rwlock and pflock

### Errors

1. **Use-after-free in error handling (Critical)**
   In `handle_error()`, when `write_lock == false`, the function does not unlock the reader but the comment says "Don't unlock here as it's already unlocked by the calling function."
   However, in `handle_reader_work()` at line 224, after detecting `s->counter != local_counter`, the code calls `handle_error()` WITHOUT unlocking first, then returns.
   This leaves the read lock held.
   The same issue occurs at line 215 when array verification fails.
   ```c
   /* Bad - lock still held */
   if (s->counter_array[i] != local_counter) {
       handle_error(s, lcore_id, false, __func__, __LINE__);
       return -1;
   }
   
   /* Good - unlock before error */
   if (s->counter_array[i] != local_counter) {
       s->lock.ops->read_unlock(&s->lock);
       handle_error(s, lcore_id, false, __func__, __LINE__);
       return -1;
   }
   ```

2. **Incorrect write-downgrade path unlock**
   In `handle_writer_work()`, the `DOWNGRADE_TEST` path calls `write_downgrade()` then `read_unlock()`.
   However, if array verification fails inside the downgraded read section (line 188-191), `handle_error()` is called with `write_lock = false`, which does NOT unlock.
   So the read lock is leaked when downgrade verification fails.

3. **Undefined behavior: data race on shared counter**
   The `counter` and `counter_array` fields are marked `volatile`, but the code accesses them from multiple threads without atomic operations.
   `volatile` does not provide atomicity or memory ordering.
   These should use `rte_atomic_load_explicit()` and `rte_atomic_store_explicit()`.
   ```c
   /* Current (wrong) */
   volatile uint64_t counter;
   s->counter += delta;
   
   /* Should be atomic or document that the lock protects them */
   ```
   Note: If the intent is that the lock always protects these fields, then `volatile` is unnecessary and should be removed.
   The test appears to rely on lock protection, so `volatile` is misleading.

4. **Missing error propagation in worker function**
   In `lcore_function()`, when `handle_writer_work()` or `handle_reader_work()` returns `-1`, the code does `continue` instead of breaking the loop or returning.
   This means after an error is detected and `s->stop` is set, the thread keeps running until `s->stop` is observed in the loop condition, potentially performing more incorrect operations.

5. **Implicit integer narrowing in percentage calculation**
   Line 136: `uint32_t readers_pct = (num_readers * 100) / (total_lcores - 1);`
   If `num_readers` is close to `UINT32_MAX`, `num_readers * 100` will overflow before the division.
   Use a 64-bit intermediate: `(uint64_t)num_readers * 100`.

6. **Missing bounds check on lcore_id**
   Throughout the code, `s->stats[lcore_id]` is accessed without verifying `lcore_id < RTE_MAX_LCORE`.
   While DPDK's `rte_lcore_id()` should never return an out-of-range value, defensive programming or an assertion would prevent potential out-of-bounds access.

### Warnings

1. **Missing release notes update**
   New tests are added but no release notes entry.
   While test-only changes typically don't require release notes, stress tests that validate new API (the `write_downgrade` function) should be mentioned in the context of the API's release notes entry.

2. **Inconsistent error handling between reader and writer**
   The `handle_error()` function unlocks for writers but not readers, with a comment explaining readers are "already unlocked by the calling function."
   This is fragile -- the calling function at line 215 and 224 does NOT unlock before calling `handle_error()`.
   Refactor so `handle_error()` consistently handles unlocking based on the lock state.

3. **Missing test registration documentation**
   The `REGISTER_STRESS_TEST` macro is used but not defined in the patch.
   Verify this macro exists and is the correct registration mechanism for stress tests.

4. **Shared state not declared in shared memory**
   `struct rwlock_test_shared` is allocated on the stack in `test_rwlock_stress_impl()` and passed to remote lcores.
   Stack-allocated memory is process-private.
   Multi-process stress tests would fail.
   If only single-process testing is intended, this is acceptable but should be documented.
   If multi-process is intended, use `rte_zmalloc()`.

5. **Test duration hardcoded**
   `TEST_DURATION_SEC` is fixed at 5 seconds.
   Consider making this configurable or documenting why 5 seconds is the right duration.

6. **Missing `const` on ops tables**
   `pflock_ops` and `rwlock_ops` are function pointer arrays that are never modified at runtime.
   They should be declared `const`:
   ```c
   static const struct rwlock_ops pflock_ops = {
   ```
   (Both are already `const` -- this is correct, no issue.)

7. **Function pointer `write_downgrade` can be NULL**
   In `handle_writer_work()`, the code checks `if (s->lock.ops->write_downgrade)` before calling it.
   However, the `rwlock_ops` table does not provide this function pointer, leaving it NULL.
   This is correct, but the code should document that NULL means "downgrade not supported" and that `DOWNGRADE_TEST` should only run on locks that provide this operation.

8. **Missing header documentation**
   The header `test_rwlock_stress_impl.h` lacks a high-level comment describing its purpose (generic stress test infrastructure) and usage pattern (include and define ops table).

9. **Potential integer overflow in acquire_time**
   `acquire_time` accumulates the maximum wait time across all operations.
   If wait times are consistently large, `uint64_t` should be sufficient, but document the units (TSC cycles).

10. **Statistics structure padding**
    `struct lcore_stats` is marked `__rte_cache_aligned` to avoid false sharing.
    Verify each field is accessed only by its owning lcore to prevent cross-lcore cache coherency traffic.

### Info

- The test design (shared counter, array consistency checks, reader/writer roles) is sound for stress testing lock correctness.
- The use of a generic `rwlock_ops` interface to test both `pflock` and `rwlock` is good code reuse.
- Test registration uses `REGISTER_STRESS_TEST` which appears to be a custom macro not in the standard DPDK `REGISTER_FAST_TEST` / `REGISTER_PERF_TEST` set -- ensure this is defined in the test framework.

---

## Summary

**Patch 1/2**: Missing experimental tag, release notes, and Doxygen clarifications.

**Patch 2/2**: Critical correctness bugs in error-path lock handling (locks not released before returning) and inappropriate use of `volatile` instead of atomics (or remove `volatile` if locks always protect the fields). Also missing release notes.

Both patches require updates to release notes and API tagging. Patch 2/2 requires fixing error-path unlocking and clarifying the memory model for shared counters.


More information about the test-report mailing list