bench: per-chip TX throughput/latency harness; replace README "TX + RX" with real numbers #108

Closed
josephnef wants to merge 1 commit into master from feat/throughput-benchmark

@josephnef

Collaborator

What

Replaces the generic "TX + RX" band cells in the README "Hardware landscape" table with measured devourer TX throughput, and adds the harness that produces it.

New: tests/bench_tput.py

Per-chip TX throughput + per-frame latency, devourer vs the host kernel driver, across bands (2.4 / UNII-1 / UNII-2·3) and PSDU sizes (1500 / 3994 B). Reuses regress.py for DUT discovery, kernel bind/unbind, USB power-cycle, process hygiene and log parsers. Resumable, --quick smoke, CSV + markdown output.

Method — the clean metric: TX rate = usbmon bulk-OUT completions at the source chip (true frames-accepted rate). Counting frames at a sniffer instead measures the sniffer's RX ceiling (~336 fps here) — a trap that makes every transmitter look identical. devourer has no host-side TX backpressure (it pipelines URBs), so latency is taken from a separate non-saturating pass.
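
As a rough illustration of the metric (not the harness's actual parser, which this description does not show), the sketch below counts successful bulk-OUT completions for one device in usbmon text output and turns them into a frame rate and a bit rate. The function name `bulk_out_rate` is hypothetical, and it assumes one bulk-OUT URB carries exactly one frame and that the timestamp does not wrap during the capture.

```python
from typing import Iterable, Tuple


def bulk_out_rate(lines: Iterable[str], bus: int, dev: int) -> Tuple[float, float]:
    """Return (frames_per_second, mbps) for one device from usbmon text lines.

    usbmon text fields: URB tag, timestamp (us), event type, address
    (e.g. "Bo:1:005:2"), status, length, ...
    Assumption: one completed bulk-OUT URB == one transmitted frame.
    """
    prefix = f"Bo:{bus}:{dev:03d}:"  # bulk OUT, any endpoint of this device
    first_us = last_us = None
    frames = 0
    bytes_after_first = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        # Keep only completions ("C") of bulk-OUT URBs for the chosen device.
        if fields[2] != "C" or not fields[3].startswith(prefix):
            continue
        if fields[4] != "0":
            continue  # non-zero status: the URB did not complete successfully
        ts = int(fields[1])
        if first_us is None:
            first_us = ts  # the first completion only opens the time window
        else:
            bytes_after_first += int(fields[5])
        last_us = ts
        frames += 1
    if frames < 2 or last_us == first_us:
        return 0.0, 0.0
    seconds = (last_us - first_us) / 1e6
    fps = (frames - 1) / seconds
    mbps = bytes_after_first * 8 / seconds / 1e6
    return fps, mbps
```

The lines would typically come from the usbmon text interface under debugfs (for example `/sys/kernel/debug/usb/usbmon/1u` for bus 1), read as root. The byte count here is the USB transfer length, which includes any descriptor the driver prepends, so the Mbps figure from this sketch is an upper bound on the PSDU rate.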

Driver/injector support

  • txdemo: DEVOURER_TX_PAYLOAD_BYTES=N pads the 802.11 PSDU to N bytes (on-wire N+40; PKT_SIZE is 16-bit, so 3994 fits).
  • inject_beacon.py: --size N + --max-rate (blocking AF_PACKET blaster ≈ kernel TX-completion rate).
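
The size arithmetic behind the two bullets can be sketched as follows. The 40-byte overhead and the 16-bit PKT_SIZE limit come from the description above; the helper names `on_wire_bytes` and `txdemo_env` are hypothetical, and checking the on-wire total against the 16-bit limit is a conservative assumption, since the description does not say which of the two sizes the field holds.

```python
import os
from typing import Dict

PKT_SIZE_MAX = 0xFFFF   # PKT_SIZE is a 16-bit field
ON_WIRE_OVERHEAD = 40   # on-wire size is N + 40, per the PR description


def on_wire_bytes(psdu_bytes: int) -> int:
    """Return the on-wire size for a PSDU padded to psdu_bytes."""
    if psdu_bytes <= 0:
        raise ValueError("PSDU size must be positive")
    total = psdu_bytes + ON_WIRE_OVERHEAD
    # Conservative check: require even the larger figure to fit in 16 bits.
    if total > PKT_SIZE_MAX:
        raise ValueError(f"{total} B does not fit a 16-bit PKT_SIZE")
    return total


def txdemo_env(psdu_bytes: int) -> Dict[str, str]:
    """Build an environment for txdemo with the padding variable set."""
    on_wire_bytes(psdu_bytes)  # validate before handing the value to txdemo
    env = dict(os.environ)
    env["DEVOURER_TX_PAYLOAD_BYTES"] = str(psdu_bytes)
    return env
```

With these assumptions, the two benchmark sizes map to 1540 B and 4034 B on the wire, both far below the 65535 B limit.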

Headline results (HT MCS7, 20 MHz, monitor injection)

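| Band | Part | devourer TX, 1500 / 3994 B (Mbps) | kernel TX, 1500 / 3994 B (Mbps) | devourer latency (µs) |
|---|---|---|---|---|
| 2.4 GHz | RTL8812AU | 46 / 58 | 0.9 / — | 116 |
| 2.4 GHz | RTL8814AU | 0.3 / 22 ⚠ | 1.0 / — | 29 |
| 2.4 GHz | RTL8821AU | 41 / 60 | 0.8 / — | 128 |
| UNII-1 | RTL8812AU | 49 / 61 | 5.8 / — | 116 |
| UNII-2/3 | RTL8812AU | 49 / 62 | 5.9 / — | 115 |
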
  • devourer direct-USB TX is 8–60× faster than kernel AF_PACKET monitor injection (it pipelines bulk-OUT URBs; the kernel path blocks on the TX ring per frame).
  • The kernel monitor path can't inject 3994 B frames at all (AF_PACKET > iface MTU); devourer sends them at 58–62 Mbps.
  • Throughput scales with frame size; devourer per-frame latency 17–128 µs.
  • 8814 TX is the family's least reliable (high run-to-run variance — flagged ⚠).
  • 8821AU 2.4 GHz on par with 8812; ~½ throughput at 5 GHz UNII (documented UNII-2/3 asymmetry).
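
To relate the table to the sniffer ceiling mentioned under Method, the sketch below converts between Mbps and frames per second and computes the devourer/kernel ratio for the 1500 B rows. It assumes the Mbps figures count PSDU bytes only (the description does not state this), and the function names are illustrative rather than taken from the harness.

```python
def frames_per_second(mbps: float, psdu_bytes: int) -> float:
    """Frame rate implied by a throughput, assuming Mbps counts PSDU bytes."""
    return mbps * 1e6 / (psdu_bytes * 8)


def mbps_from_fps(fps: float, psdu_bytes: int) -> float:
    """Throughput implied by a frame rate, under the same assumption."""
    return fps * psdu_bytes * 8 / 1e6


# 1500 B rows from the headline table: (devourer Mbps, kernel Mbps).
HEADLINE_1500 = {
    ("2.4 GHz", "RTL8812AU"): (46.0, 0.9),
    ("2.4 GHz", "RTL8814AU"): (0.3, 1.0),   # flagged row, high variance
    ("2.4 GHz", "RTL8821AU"): (41.0, 0.8),
    ("UNII-1", "RTL8812AU"): (49.0, 5.8),
    ("UNII-2/3", "RTL8812AU"): (49.0, 5.9),
}


def speedups() -> dict:
    """devourer / kernel throughput ratio for each 1500 B row."""
    return {key: dev / ker for key, (dev, ker) in HEADLINE_1500.items()}


if __name__ == "__main__":
    # A ~336 fps sniffer ceiling corresponds to only about 4 Mbps at 1500 B,
    # far below the 46 Mbps (about 3833 fps) measured at the source chip.
    print(f"sniffer ceiling: {mbps_from_fps(336, 1500):.2f} Mbps")
    print(f"8812 2.4 GHz:    {frames_per_second(46, 1500):.0f} fps")
    for (band, part), ratio in speedups().items():
        print(f"{band:9s} {part}: {ratio:.1f}x")
```

On these five rows the ratio runs from about 8x (UNII bands) to about 51x (2.4 GHz), with the flagged 8814 row as the exception where the 1500 B devourer figure falls below the kernel one.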

RX — honestly, not tabulated

RX throughput cannot be measured cleanly on a 2-USB-bus bench, for three reasons: same-bus TX/RX pairs contend (8812 + 8821 share a host controller); the only reliable cross-bus flooder (8812 → 8814) saturates the receiver at full TX rate; and the 8814 RX path is itself intermittent. A clean cross-bus flood at moderate rate (8812 → 8814) does receive ~3100 frames in 12 s (about 258 fps), which confirms that RX works. A capacity number needs a 3-bus rig or a calibrated SDR transmitter. Full caveats are in tests/README.md.

Test

Build with cmake --build build; ctest passes. Benchmark: run sudo tests/bench_tput.py --quick as a smoke test, then sudo tests/bench_tput.py --directions tx for the full TX run. Run on RTL8812AU/8814AU/8821AU.

🤖 Generated with Claude Code

…X" with numbers

Adds tests/bench_tput.py — a per-chip TX throughput + per-frame latency
benchmark (devourer vs host kernel driver) across bands (2.4/UNII-1/UNII-2-3)
and PSDU sizes (1500 / 3994 B). TX rate is measured from usbmon bulk-OUT
completions at the source chip (the true frames-accepted rate; counting at a
sniffer measures the sniffer's RX ceiling instead — a trap). Reuses regress.py
for DUT discovery, kernel bind/unbind, USB power-cycle, process hygiene and log
parsers.

Driver/injector support:
- txdemo: DEVOURER_TX_PAYLOAD_BYTES=N pads the 802.11 PSDU to N bytes (on-wire
  N+40; PKT_SIZE is 16-bit) so we can TX 1500/3994 B frames.
- inject_beacon.py: --size N (matching sized PSDU) and --max-rate (blocking
  AF_PACKET blaster ~= the kernel TX-completion rate).

README "Hardware landscape": the generic "TX + RX" band cells are replaced with
measured devourer TX throughput (Mbps @ 1500 / 3994 B), plus a Measured
throughput subsection with the kernel-driver comparison and latency.

Headline results (HT MCS7, 20 MHz, monitor injection): devourer direct-USB TX is
8-60x faster than kernel AF_PACKET monitor injection (e.g. 8812 2.4 GHz: 46 vs
0.9 Mbps); the kernel monitor path cannot inject 3994 B frames at all (AF_PACKET
> MTU) while devourer hits 58-62 Mbps; throughput scales with frame size;
devourer per-frame latency 17-128 us; the 8814 TX path is the family's least
reliable (high variance, flagged). RX is not tabulated — it cannot be measured
cleanly on a 2-USB-bus rig (same-bus contention, flooder saturation, flaky 8814
RX); methodology + caveats in tests/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>