feature: Add validation layer Timing Checker for host-side API timing #481
MichalMrozek wants to merge 1 commit into
Pull request overview
Adds a new validation-layer “Timing Checker” that measures host-side (CPU) duration of Level Zero API calls and reports per-API aggregated statistics, with optional live stderr logging and CSV export. This fits into the validation layer’s existing checker framework (global checker registration + generated per-API overrides).
Changes:
- Introduces a new `timing` checker (engine + registration) that records Prologue/Epilogue timestamps and aggregates per-function timing stats, printing a teardown summary and optionally exporting CSV.
- Wires build-time generation of per-API timing override headers via a new Mako template and updates the code generator to ensure output directories exist.
- Adds documentation and loader/validation-layer unit tests to validate transparency and live per-call timing output.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/loader_validation_layer.cpp | Adds gtest coverage for timing checker transparency and live per-call output parsing |
| test/CMakeLists.txt | Registers new timing-checker-focused CTest entries with required env vars |
| source/layers/validation/README.md | Documents enabling and usage of the new timing checker mode |
| source/layers/validation/checkers/timing/zel_timing_checker.h | Declares timing checker and generated entrypoint wrappers |
| source/layers/validation/checkers/timing/zel_timing_checker.cpp | Implements timestamping, aggregation, stderr output, CSV writing, and handler registration |
| source/layers/validation/checkers/timing/zel_global_timing_state.h | Defines shared aggregation state and APIs for timing collection/output |
| source/layers/validation/checkers/timing/timing_checker.md | Adds detailed user documentation and examples for summary/CSV/live modes |
| source/layers/validation/checkers/timing/CMakeLists.txt | Adds timing checker sources to the validation layer build |
| source/layers/validation/checkers/CMakeLists.txt | Hooks the new timing checker subdirectory into the checker build |
| scripts/templates/validation/timing.h.mako | Adds generated Prologue/Epilogue overrides that call into GlobalTimingState |
| scripts/generate_code.py | Registers timing template output and ensures generated output subdirs are created |

    static LARGE_INTEGER frequency = {};
    if (frequency.QuadPart == 0) {
        QueryPerformanceFrequency(&frequency);
    }
    LARGE_INTEGER ticks;
    QueryPerformanceCounter(&ticks);
    if (frequency.QuadPart == 0) {
        return 0;
    }
    return static_cast<uint64_t>(ticks.QuadPart) * (NSEC_IN_SEC / static_cast<uint64_t>(frequency.QuadPart));
    #else

Fixed in d50e8a3. The Windows path now converts ticks to nanoseconds without truncation or overflow by splitting into whole seconds plus the sub-second remainder: (t / freq) * NSEC_IN_SEC + ((t % freq) * NSEC_IN_SEC) / freq. This stays exact when the counter frequency does not evenly divide 1e9 and avoids the ticks * 1e9 overflow.
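The split conversion described in this reply can be sketched as a standalone function. This is a minimal illustration, not the code from d50e8a3: the helper name `ticksToNanoseconds` and its signature are hypothetical, and `NSEC_IN_SEC` is assumed to be 1,000,000,000.

```cpp
#include <cstdint>

// Assumed value; the PR refers to NSEC_IN_SEC without showing its definition.
constexpr uint64_t NSEC_IN_SEC = 1000000000ULL;

// Hypothetical helper: converts a raw tick count to nanoseconds.
// Whole seconds and the sub-second remainder are converted separately, so
// ticks * NSEC_IN_SEC is never formed and no precision is lost when freq
// does not evenly divide NSEC_IN_SEC.
uint64_t ticksToNanoseconds(uint64_t ticks, uint64_t freq) {
    if (freq == 0) {
        return 0; // mirrors the guard in the reviewed code
    }
    const uint64_t wholeSeconds = ticks / freq;
    const uint64_t remainder = ticks % freq; // always < freq
    return wholeSeconds * NSEC_IN_SEC + (remainder * NSEC_IN_SEC) / freq;
}
```

With a frequency of 3,579,545 Hz, one second of ticks converts to exactly 1,000,000,000 ns here, whereas `ticks * (NSEC_IN_SEC / freq)` would give 998,693,055 ns. The remainder term can still overflow if the frequency exceeds roughly 1.8e10 Hz, which is far above the frequencies this sketch assumes.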

    timing_checker.zetValidation = zetChecker;
    timing_checker.zesValidation = zesChecker;
    timing_checker.zerValidation = zerChecker;
    validation_layer::context.getInstance().validationHandlers.push_back(&timing_checker);

Fixed in d50e8a3. The checker is now inserted at the front of validationHandlers (handlers.insert(handlers.begin(), &timing_checker)) instead of push_back, so its Prologue/Epilogue run first and every call is measured even when a later handler's Epilogue returns a non-success result.
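The effect of the ordering change can be illustrated with a toy model. This is only a sketch under a stated assumption: the dispatch loop is assumed to stop at the first handler whose Epilogue reports a non-success result, which is what the reply implies but the thread does not show. `Handler` and `runEpilogues` are hypothetical names, not the validation layer's real types.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a validation handler; not the loader's real type.
struct Handler {
    std::string name;
    bool epilogueOk; // whether this handler's Epilogue reports success
};

// Assumed dispatch behaviour: run Epilogues in list order and stop at the
// first failure. Returns the names of the handlers whose Epilogue ran.
std::vector<std::string> runEpilogues(const std::vector<Handler*>& handlers) {
    std::vector<std::string> ran;
    for (const Handler* h : handlers) {
        ran.push_back(h->name);
        if (!h->epilogueOk) {
            break; // handlers after this one never see the call
        }
    }
    return ran;
}
```

Under that assumption, a timing handler appended with `push_back` behind a failing handler is skipped for the call, while one placed with `insert(handlers.begin(), ...)` always runs first and so always records the call.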
Add a new validation-layer checker (ZEL_ENABLE_TIMING_CHECKER) that measures the host-side (CPU) duration of every Level Zero API call and aggregates per-API statistics (call count, total, min, max, average, and percentage share of total host time).

For each API the checker stamps a high-resolution monotonic timestamp in the Prologue and reads it again in the Epilogue (QueryPerformanceCounter on Windows, CLOCK_MONOTONIC_RAW elsewhere). The checker is registered at the front of the validation handler list so it measures every call, even when a later handler reports an error.

All APIs are covered through per-API override headers generated from scripts/templates/validation/timing.h.mako (wired into scripts/generate_code.py) and checked in under the checker's generated/ directory, consistent with the other generated validation-layer files.

Output is written directly to stderr and is independent of the loader logging system, so no ZEL_*_LOGGING variables are required.

Three independently controlled modes:
- summary table printed at teardown, sorted by percentage share (default)
- ZEL_TIMING_CHECKER_CSV: export per-API stats to a PID-suffixed CSV
- ZEL_TIMING_CHECKER_LIVE: print each call's duration as it happens

Adds README documentation, a dedicated usage guide (checkers/timing/timing_checker.md), and unit tests asserting both result transparency and correct per-API timing data.

Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
252b248 to d50e8a3

    putenv_safe(const_cast<char *>("ZE_ENABLE_VALIDATION_LAYER=1"));
    putenv_safe(const_cast<char *>("ZEL_ENABLE_TIMING_CHECKER=1"));
    putenv_safe(const_cast<char *>("ZEL_TIMING_CHECKER_LIVE=1"));
    putenv_safe(const_cast<char *>("ZE_ENABLE_NULL_DRIVER=1"));
    putenv_safe(const_cast<char *>("ZEL_TEST_NULL_DRIVER_TYPE=GPU"));

While this works, we have typically put the environment configuration in the CMakeLists.txt in the /test folder, so this may be duplicated.
Summary
Adds a new validation-layer checker, enabled with `ZEL_ENABLE_TIMING_CHECKER=1`, that measures the host-side (CPU) duration of every Level Zero API call and aggregates per-API statistics: call count, total, min, max, average, and percentage share of total host time.

For each API the checker stamps a high-resolution monotonic timestamp in the Prologue and reads it again in the Epilogue (`QueryPerformanceCounter` on Windows, `clock_gettime(CLOCK_MONOTONIC_RAW)` elsewhere). The measured span is dominated by the underlying driver call and is consistent across calls, making it suitable for relative host-cost analysis.

Output
Output is written directly to `stderr` and is independent of the loader logging system, so the `ZEL_*_LOGGING` variables are not required. Summary rows are sorted by percentage share of total host time, highest first.

- `ZEL_ENABLE_TIMING_CHECKER`: `0`
- `ZEL_TIMING_CHECKER_CSV=<path>`
- `ZEL_TIMING_CHECKER_LIVE`: `0`

Example summary:
Changes
- New checker under `source/layers/validation/checkers/timing/` (engine, registration, CMake) and a usage guide `timing_checker.md`, referenced from the validation-layer README.
- Per-API override headers are generated from `scripts/templates/validation/timing.h.mako` (wired into `scripts/generate_code.py`) and are not checked in.
- Tests in `test/loader_validation_layer.cpp` covering result transparency and per-API timing data.
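
The per-API statistics named in the Summary (call count, total, min, max, average, and percentage share) can be sketched as a small aggregator. This is an illustrative model only: `ApiStats` and `TimingAggregator` are hypothetical names, not the contents of `zel_global_timing_state.h`, and the sketch is not thread-safe, which a real implementation serving concurrent API calls would presumably need to be.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical per-API record; field names are illustrative.
struct ApiStats {
    uint64_t count = 0;
    uint64_t totalNs = 0;
    uint64_t minNs = std::numeric_limits<uint64_t>::max();
    uint64_t maxNs = 0;

    double averageNs() const {
        return count ? static_cast<double>(totalNs) / count : 0.0;
    }
};

// Hypothetical aggregator keyed by API name.
class TimingAggregator {
public:
    // Folds one measured call duration into the stats for that API.
    void record(const std::string& api, uint64_t durationNs) {
        ApiStats& s = stats_[api];
        ++s.count;
        s.totalNs += durationNs;
        s.minNs = std::min(s.minNs, durationNs);
        s.maxNs = std::max(s.maxNs, durationNs);
        grandTotalNs_ += durationNs;
    }

    // Percentage of all recorded host time spent in this API.
    double sharePercent(const std::string& api) const {
        auto it = stats_.find(api);
        if (it == stats_.end() || grandTotalNs_ == 0) {
            return 0.0;
        }
        return 100.0 * it->second.totalNs / grandTotalNs_;
    }

    // Throws std::out_of_range if the API was never recorded.
    const ApiStats& get(const std::string& api) const { return stats_.at(api); }

private:
    std::map<std::string, ApiStats> stats_;
    uint64_t grandTotalNs_ = 0;
};
```

In this model, each Epilogue would call `record` with the difference between the Epilogue and Prologue timestamps, and the teardown summary would sort rows by `sharePercent`.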