feat: Add L2-L5 CPU kernels (WHT, FWHT/ACDC, Tropical, HRR) with dispatch integration by peder1981 · Pull Request #570 · microsoft/BitNet

peder1981 · 2026-06-21T19:15:58Z

Add L2–L5 algebraic kernels for CPU-only 1.58-bit inference

This PR adds four new algebraic kernels for the CPU-only inference path:

Level	Algebra	Kernel	Saves
L2	Walsh–Hadamard (no multiplications)	`ggml-bitnet-wht`	Replaces 256 maddubs with adds/subs in `vec_dot`
L3	ACDC (FWHT + diagonal)	`ggml-bitnet-fwht`	O(n log n) GEMV; needs ACDC-diagonalizable W
L4	Tropical (max, +)	`ggml-bitnet-tropical`	O(n·d + K·d) attention via top-K softmax over keys
L5	Holographic Reduced Repr. (FFT)	`ggml-bitnet-hrr`	d-dim vector stores N ≪ d "memories"

Files

src/ggml-bitnet-{wht,fwht,tropical,hrr,common,dispatch,kv-cache,rag}.cpp — kernel implementations
include/ggml-bitnet-{wht,fwht,tropical,hrr,common,dispatch,kv-cache,rag}.h — headers
CMakeLists.txt, src/CMakeLists.txt, src/ggml-bitnet-mad.cpp — build integration
patches/llama.cpp/ (5 patches) + scripts/apply-dispatch-patches.sh — Llama dispatch
.gitmodules — ignore = dirty for local patch workflow

Design

All kernels are opt-in via env vars (default = untouched I2_S GEMV)
No GPU, no telemetry, no cloud calls
Submodule pinned to 1f86f05 (same as upstream)

Part of a split from original PR #567. This is the core code portion.

peder1981 · 2026-06-27T19:59:40Z

CI Workflow — Approval Requested

The kernel-ci workflow (run #27916512618) is currently in action_required state, awaiting maintainer approval to execute.

What has been validated:

✅ YAML syntax — parsed without errors
✅ All 20+ referenced files (scripts, tests, patches, CMakeLists) exist on branch pr/1-kernels
✅ 16 add_test entries confirmed in tests/CMakeLists.txt
✅ Submodule Eddie-Wang1120/llama.cpp is public, branch merge-dev at commit 1f86f05 (matches patch base)
✅ Python venv installs numpy, scipy, safetensors — covers all test imports
✅ Air-gapped boot test skips gracefully (exit 0) when no model/binary is present
✅ NO-06/NO-07 telemetry/cloud audits use || true + 2>/dev/null to avoid false positives
✅ Cross-validation script is self-contained (numpy only), handles missing binaries gracefully
✅ Runner ubuntu-24.04 has clang-18, libstdc++-14-dev, ninja-build available

Request: Could a maintainer please click "Approve and run" on the latest workflow run so CI can execute? The two older runs (#27915939133, #27916000655) from previous pushes are obsolete and have been cleaned up.

Thank you!

feat: L2-L5 kernels + CMake + 16 tests + CI + fix test import

bc00b82

peder1981 force-pushed the pr/1-kernels branch from 9a35702 to bc00b82 Compare June 21, 2026 20:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add L2-L5 CPU kernels (WHT, FWHT/ACDC, Tropical, HRR) with dispatch integration#570

feat: Add L2-L5 CPU kernels (WHT, FWHT/ACDC, Tropical, HRR) with dispatch integration#570
peder1981 wants to merge 1 commit into
microsoft:mainfrom
peder1981:pr/1-kernels

peder1981 commented Jun 21, 2026

Uh oh!

peder1981 commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

peder1981 commented Jun 21, 2026

Add L2–L5 algebraic kernels for CPU-only 1.58-bit inference

Files

Design

Uh oh!

peder1981 commented Jun 27, 2026

CI Workflow — Approval Requested

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant