
feat: Add L2-L5 CPU kernels (WHT, FWHT/ACDC, Tropical, HRR) with dispatch integration#570

Open
peder1981 wants to merge 1 commit into microsoft:main from peder1981:pr/1-kernels

@peder1981

Add L2–L5 algebraic kernels for CPU-only 1.58-bit inference

This PR adds four new algebraic kernels for the CPU-only inference path:

| Level | Algebra | Kernel | Saves |
|-------|---------|--------|-------|
| L2 | Walsh–Hadamard (no multiplications) | ggml-bitnet-wht | Replaces 256 maddubs with adds/subs in vec_dot |
| L3 | ACDC (FWHT + diagonal) | ggml-bitnet-fwht | O(n log n) GEMV; needs ACDC-diagonalizable W |
| L4 | Tropical (max, +) | ggml-bitnet-tropical | O(n·d + K·d) attention via top-K softmax over keys |
| L5 | Holographic Reduced Repr. (FFT) | ggml-bitnet-hrr | d-dim vector stores N ≪ d "memories" |
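The L2 row rests on a simple identity: with ternary weights in {-1, 0, +1}, every product w·x is +x, -x or 0, so a dot product needs no multiplier at all. The sketch below shows that idea in scalar form only; the function name ternary_dot is hypothetical, weights are assumed to be unpacked into int8_t, and it is not the packed SIMD code in ggml-bitnet-wht.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Multiplication-free dot product for ternary weights w[i] in {-1, 0, +1}.
// Assumption (hypothetical layout): weights are unpacked into int8_t for
// readability; a real kernel would operate on packed 1.58-bit blocks.
int32_t ternary_dot(const std::vector<int8_t>& w, const std::vector<int8_t>& x) {
    assert(w.size() == x.size());
    int32_t acc = 0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        if (w[i] > 0) {
            acc += x[i];  // weight +1: add the activation
        } else if (w[i] < 0) {
            acc -= x[i];  // weight -1: subtract the activation
        }                 // weight 0: contributes nothing
    }
    return acc;
}
```

For w = {1, -1, 0, 1} and x = {3, 5, 7, -2} this yields 3 - 5 - 2 = -4, the same value a multiply-accumulate would give.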
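For the L3 row, here is a sketch of an ACDC-style GEMV, assuming the weight matrix is factored as W = A·H·D·H / n, where A and D are diagonal and H is the n×n Sylvester-ordered Hadamard matrix (so H·H = n·I). The names fwht and acdc_gemv are hypothetical, and the ordering and normalization used by ggml-bitnet-fwht may differ. Each FWHT pass costs n·log2(n) additions/subtractions, so the product is O(n log n) instead of O(n²); only matrices of this structured form are represented exactly, which is presumably what "needs ACDC-diagonalizable W" refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place fast Walsh-Hadamard transform (unnormalized), O(n log n).
// Only additions and subtractions; n must be a power of two.
void fwht(std::vector<float>& v) {
    const std::size_t n = v.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t h = 1; h < n; h <<= 1) {
        for (std::size_t i = 0; i < n; i += 2 * h) {
            for (std::size_t j = i; j < i + h; ++j) {
                const float a = v[j], b = v[j + h];
                v[j] = a + b;      // butterfly: sum
                v[j + h] = a - b;  // butterfly: difference
            }
        }
    }
}

// Structured GEMV y = A * H * D * H * x / n with diagonals stored as vectors.
// Assumed factorization; dividing by n undoes the H * H = n * I scaling.
std::vector<float> acdc_gemv(const std::vector<float>& a,
                             const std::vector<float>& d,
                             std::vector<float> x) {
    const std::size_t n = x.size();
    assert(a.size() == n && d.size() == n);
    fwht(x);                                         // H x
    for (std::size_t i = 0; i < n; ++i) x[i] *= d[i];  // D (H x)
    fwht(x);                                         // H D H x
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= a[i] / static_cast<float>(n);        // A (H D H x) / n
    }
    return x;
}
```

With A = D = I the two transforms cancel and acdc_gemv returns x unchanged, which makes a convenient sanity check.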
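For the L4 row, a sketch of single-query top-K attention: all n keys are scored (O(n·d)), only the K highest scores are kept, and the softmax and value mixing run over those K rows (O(K·d)). The max that stabilizes the softmax is the (max, +) step, and with K = 1 the output is exactly the value row of the best-scoring key. The function name, the row-major layout and the 1/√d scaling are assumptions, not taken from ggml-bitnet-tropical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Top-K attention for one query. keys and values are row-major n x d.
std::vector<float> topk_attention(const std::vector<float>& q,
                                  const std::vector<float>& keys,
                                  const std::vector<float>& values,
                                  std::size_t n, std::size_t d, std::size_t K) {
    assert(q.size() == d && keys.size() == n * d && values.size() == n * d);
    assert(K >= 1 && K <= n);
    // Score every key: O(n * d).
    std::vector<float> score(n);
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    for (std::size_t i = 0; i < n; ++i) {
        float s = 0.0f;
        for (std::size_t j = 0; j < d; ++j) s += q[j] * keys[i * d + j];
        score[i] = s * scale;
    }
    // Partially order indices so the first K hold the K highest scores.
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::nth_element(idx.begin(), idx.begin() + (K - 1), idx.end(),
                     [&](std::size_t a, std::size_t b) { return score[a] > score[b]; });
    // (max, +) step: the largest kept score stabilizes the exponentials.
    float m = score[idx[0]];
    for (std::size_t t = 1; t < K; ++t) m = std::max(m, score[idx[t]]);
    // Softmax over the K survivors and weighted sum of their values: O(K * d).
    std::vector<float> out(d, 0.0f);
    float z = 0.0f;
    for (std::size_t t = 0; t < K; ++t) {
        const float w = std::exp(score[idx[t]] - m);
        z += w;
        for (std::size_t j = 0; j < d; ++j) out[j] += w * values[idx[t] * d + j];
    }
    for (float& o : out) o /= z;
    return out;
}
```

With K = n this reduces to ordinary softmax attention, so K trades accuracy for the smaller O(K·d) mixing cost.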
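For the L5 row, a sketch of the two Holographic Reduced Representation operations, assuming Plate-style HRRs: binding is circular convolution and approximate unbinding is circular correlation, both computed through the FFT in O(d log d). The recursive radix-2 FFT and the names fft, hrr_bind and hrr_unbind are illustrative only, not the ggml-bitnet-hrr API.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Recursive radix-2 FFT (size must be a power of two).
// inverse = true computes the unnormalized inverse transform.
void fft(cvec& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n == 1) return;
    cvec even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, inverse);
    fft(odd, inverse);
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = sign * 2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const std::complex<double> t = std::polar(1.0, angle) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// FFT both inputs, multiply pointwise, inverse FFT. correlate = true
// conjugates the first spectrum, turning convolution into correlation.
std::vector<double> circular_op(const std::vector<double>& a,
                                const std::vector<double>& b, bool correlate) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    cvec fa(a.begin(), a.end()), fb(b.begin(), b.end());
    fft(fa, false);
    fft(fb, false);
    for (std::size_t k = 0; k < n; ++k) {
        fa[k] = (correlate ? std::conj(fa[k]) : fa[k]) * fb[k];
    }
    fft(fa, true);
    std::vector<double> out(n);
    for (std::size_t k = 0; k < n; ++k) out[k] = fa[k].real() / static_cast<double>(n);
    return out;
}

// Bind: out[m] = sum_i key[i] * value[(m - i) mod d]  (circular convolution).
std::vector<double> hrr_bind(const std::vector<double>& key, const std::vector<double>& value) {
    return circular_op(key, value, false);
}

// Unbind: out[m] = sum_i key[i] * trace[(i + m) mod d]  (circular correlation).
std::vector<double> hrr_unbind(const std::vector<double>& key, const std::vector<double>& trace) {
    return circular_op(key, trace, true);
}
```

A memory trace would be the element-wise sum of bound pairs, e.g. hrr_bind(k1, v1) + hrr_bind(k2, v2); hrr_unbind(k1, trace) then returns v1 plus crosstalk noise that grows with the number of stored pairs relative to d, which is why the table states N ≪ d. Clean retrieval assumes random keys with i.i.d. N(0, 1/d) entries.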

Files

  • src/ggml-bitnet-{wht,fwht,tropical,hrr,common,dispatch,kv-cache,rag}.cpp — kernel implementations
  • include/ggml-bitnet-{wht,fwht,tropical,hrr,common,dispatch,kv-cache,rag}.h — headers
  • CMakeLists.txt, src/CMakeLists.txt, src/ggml-bitnet-mad.cpp — build integration
  • patches/llama.cpp/ (5 patches) + scripts/apply-dispatch-patches.sh — Llama dispatch
  • .gitmodules with ignore = dirty, for the local patch workflow

Design

  • All kernels are opt-in via env vars (default = untouched I2_S GEMV)
  • No GPU, no telemetry, no cloud calls
  • Submodule pinned to 1f86f05 (same as upstream)
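The opt-in design can be sketched as a selector that keeps the default I2_S GEMV unless an environment variable names another kernel. The variable name BITNET_KERNEL, its values and the Kernel enum are hypothetical; the actual env vars used by the PR are not listed here.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical kernel choices; I2S is the untouched default path.
enum class Kernel { I2S, WHT, FWHT, Tropical, HRR };

// Map an env-var value to a kernel. Unset or unknown values keep the
// default, so behavior only changes when the user explicitly opts in.
Kernel kernel_from_string(const char* value) {
    if (value == nullptr) return Kernel::I2S;
    const std::string s(value);
    if (s == "wht") return Kernel::WHT;
    if (s == "fwht") return Kernel::FWHT;
    if (s == "tropical") return Kernel::Tropical;
    if (s == "hrr") return Kernel::HRR;
    return Kernel::I2S;
}

// Hypothetical variable name; an integration might read it once and cache it.
Kernel kernel_from_env() {
    return kernel_from_string(std::getenv("BITNET_KERNEL"));
}
```

Falling back on unknown values rather than aborting is a choice of this sketch; a stricter loader could reject typos instead.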

Part of a split of the original PR #567. This is the core code portion.

@peder1981

Author

CI Workflow — Approval Requested

The kernel-ci workflow (run #27916512618) is currently in the action_required state, awaiting maintainer approval to execute.

What has been validated:

  • ✅ YAML syntax — parsed without errors
  • ✅ All 20+ referenced files (scripts, tests, patches, CMakeLists) exist on branch pr/1-kernels
  • ✅ 16 add_test entries confirmed in tests/CMakeLists.txt
  • ✅ Submodule Eddie-Wang1120/llama.cpp is public, branch merge-dev at commit 1f86f05 (matches patch base)
  • ✅ Python venv installs numpy, scipy, safetensors — covers all test imports
  • ✅ Air-gapped boot test skips gracefully (exit 0) when no model/binary is present
  • ✅ NO-06/NO-07 telemetry/cloud audits use || true + 2>/dev/null to avoid false positives
  • ✅ Cross-validation script is self-contained (numpy only), handles missing binaries gracefully
  • ✅ The ubuntu-24.04 runner has clang-18, libstdc++-14-dev and ninja-build available

Request: Could a maintainer please click "Approve and run" on the latest workflow run so CI can execute? The two older runs (#27915939133, #27916000655) from previous pushes are obsolete and have been cleaned up.

Thank you!
