test: add LCOWV2 feature flag and v2 LCOW test surface #2747
Open
shreyanshjain7174 wants to merge 2 commits into
Introduce a `-feature LCOWV2` gate across the functional and cri-containerd test suites so the v2 LCOW controller can be exercised end-to-end without disturbing the existing v1 LCOW pipeline.

Functional suite:
* LCOWV2 implies LCOW in TestMain so featureLCOW-gated tests are reachable, then defaultLCOWOptions calls requireV1Only to short-circuit every v1 path cleanly. Net effect: only TestLCOW_V2_* runs.
* Add helpers_v2_test.go and lcow_v2_test.go covering the v2 surface via internal/builder/vm/lcow + the v2 controller in-process.
* Export LCOWBootFilesPath from test/pkg/uvm so v2 tests can resolve boot files without going through v1 *uvm.OptionsLCOW.

cri-containerd suite:
* Mirror the LCOWV2-implies-LCOW pattern; thread RuntimeHandler onto the CRI ImageSpec when pulling LCOW images so containerd selects the windows-lcow snapshotter and linux/amd64 platform (the sandbox-platform label alone is not honored by containerd >=2.0).
* Add lcow_v2_test.go and the runhcs-lcow-v2 runtime handler constant.

Flag plumbing:
* Add IncludesExplicit and Include to IncludeExcludeStringSet so test TestMain hooks can implement feature implications safely after flag.Parse without breaking default-when-unset semantics.

CI:
* New `Build and run functional testing binary (LCOWV2)` step that invokes functional.test.exe -feature=LCOWV2. continue-on-error while the v2 surface is being grown. CRI v2 testing is intentionally deferred to a follow-up alongside the integration-tests v2 setup.
* Build and upload containerd-shim-lcow-v2.exe as a test artifact.

Signed-off-by: Shreyansh Sancheti <shsancheti@microsoft.com>
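The "feature implications ... without breaking default-when-unset semantics" point can be illustrated with a minimal sketch. The `featureSet` type below is a hypothetical, simplified stand-in for `IncludeExcludeStringSet`; its fields, the `IsSet` behaviour, and the `applyImplications` helper are assumptions made for illustration, not the actual `test/pkg/flag` API. The sketch assumes that an empty include list means "everything not excluded is enabled", which is why the implication checks for an explicit include rather than calling `IsSet`.

```go
package main

import (
	"fmt"
	"strings"
)

// featureSet is a hypothetical, simplified stand-in for IncludeExcludeStringSet.
type featureSet struct {
	included map[string]bool // names passed explicitly, e.g. via -feature
	excluded map[string]bool // names passed via -exclude
}

func newFeatureSet() *featureSet {
	return &featureSet{included: map[string]bool{}, excluded: map[string]bool{}}
}

// IsSet reports whether name is enabled. Assumption: when nothing was
// explicitly included, every non-excluded name counts as enabled.
func (s *featureSet) IsSet(name string) bool {
	k := strings.ToLower(name)
	if s.excluded[k] {
		return false
	}
	if len(s.included) == 0 {
		return true
	}
	return s.included[k]
}

// IncludesExplicit reports whether name was explicitly included,
// ignoring the default-when-unset behaviour of IsSet.
func (s *featureSet) IncludesExplicit(name string) bool {
	return s.included[strings.ToLower(name)]
}

// Include adds name to the explicit include list.
func (s *featureSet) Include(name string) {
	s.included[strings.ToLower(name)] = true
}

// applyImplications is a hypothetical TestMain hook, run after flag parsing:
// LCOWV2 implies LCOW, but only when LCOWV2 was requested explicitly.
func applyImplications(s *featureSet) {
	if s.IncludesExplicit("LCOWV2") {
		s.Include("LCOW")
	}
}

func main() {
	unset := newFeatureSet() // no -feature flag given
	applyImplications(unset)
	fmt.Println(unset.IsSet("WCOW")) // defaults are left untouched

	v2 := newFeatureSet() // as if -feature LCOWV2 was given
	v2.Include("LCOWV2")
	applyImplications(v2)
	fmt.Println(v2.IsSet("LCOW"), v2.IsSet("WCOW"))
}
```

In this simplified model, gating the implication on `IsSet("LCOWV2")` would be true for an unset flag, and the resulting `Include("LCOW")` would silently turn "all features" into "only LCOW".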
c90f032 to
b034f3f
Compare
v2.1.0 fails to typecheck cross-module imports from test/ that resolve back into the parent module via a replace directive, producing:

    could not import github.com/Microsoft/hcsshim/internal/... (-: build constraints exclude all Go files in ...)

This only affects the (windows, test) lint matrix entry; the same imports typecheck cleanly under (windows, "") and locally with v2.11.x. v2.5+ resolves this.

Signed-off-by: Shreyansh Sancheti <shsancheti@microsoft.com>
What
Adds an opt-in `-feature LCOWV2` gate across the functional and cri-containerd test suites so the in-progress v2 LCOW controller (`internal/controller/vm` + `internal/builder/vm/lcow`) can be exercised end-to-end without disturbing the existing v1 LCOW pipeline.

Why
The v2 controller landed across recent PRs (#2627, #2629, the VM SCSI/VPMem controllers, etc.) but lacked a dedicated functional surface. Existing LCOW tests are tightly coupled to v1 `*uvm.OptionsLCOW` and call `testuvm.CreateLCOW` / `CreateAndStartLCOWFromOpts`; they cannot drive the v2 controller without refactoring every test. A feature flag lets us:
* `TestLCOW_V2_*`;
* `LCOWV2` gate select the v2 scenarios both here and in the downstream pipelines that actually run LCOW UVMs (see "Where v2 is actually validated" below).

Changes
Functional suite (`test/functional`)
* `LCOWV2` implies `LCOW` in `TestMain` so `featureLCOW`-gated tests are reachable; then `defaultLCOWOptions` calls `requireV1Only` and every v1 path short-circuits cleanly. Net effect under `-feature LCOWV2`: only the new `TestLCOW_V2_*` tests actually run.
* Add `helpers_v2_test.go` and `lcow_v2_test.go` covering the v2 surface via `internal/builder/vm/lcow` + the v2 controller in-process.
* Export `LCOWBootFilesPath` from `test/pkg/uvm` so v2 tests can resolve boot files without going through v1 `*uvm.OptionsLCOW`.

cri-containerd suite (`test/cri-containerd`)
* Mirror the LCOWV2-implies-LCOW pattern in `TestMain`.
* Thread `RuntimeHandler` onto the CRI `ImageSpec` when pulling LCOW images so containerd selects the `windows-lcow` snapshotter and `linux/amd64` platform (the `sandbox-platform` label alone is not honored by containerd ≥ 2.0).
* Add the `runhcs-lcow-v2` runtime handler constant and `lcow_v2_test.go` scaffold. This handler name is the integration contract with the downstream deploy configuration that registers the v2 shim; keeping it identical here and there means the same `-feature LCOWV2` test set runs in both places.

Flag plumbing (`test/pkg/flag`)
* Add `IncludesExplicit` and `Include` on `IncludeExcludeStringSet` so test `TestMain` hooks can implement feature implications safely after `flag.Parse` without breaking default-when-unset semantics.

CI (`.github/workflows/ci.yml`)
* New `Build and run functional testing binary (LCOWV2)` step that invokes `functional.test.exe -feature=LCOWV2 -exclude=LCOWIntegrity`. Marked `continue-on-error: true`; see below.
* Build and upload `containerd-shim-lcow-v2.exe` as a test artifact so downstream pipelines can consume it.

Where v2 is actually validated (testing architecture)
This is the part reviewers usually ask about, so calling it out explicitly:
* `-exclude=LCOW,LCOWIntegrity`, "Windows uVM tests will be run on 1ES runner pool"). The new `LCOWV2` step is in the same boat: it builds the v2 shim and exercises the in-process/host-side surface, but anything requiring a live LCOW UVM is skipped on these runners. That is why it is `continue-on-error: true` and not a merge gate.
* `hcsshim` build artifacts (including `containerd-shim-lcow-v2.exe` from this PR) and run the functional/CloudTest lanes on hosts with nested virt (WS2022 / WS2025 images). Those lanes select the v2 scenarios through the same `LCOWV2` feature gate this PR introduces.
* `LCOWV2` step stays a non-blocking smoke build and the real signal comes from the downstream pipelines.

In short: this PR provides the test scaffolding + feature gate; the downstream pipelines provide the execution environment. The two are kept in sync by the shared `LCOWV2` flag and the `runhcs-lcow-v2` handler name.

Out of scope (follow-up)
* `Test_V2_LCOW_*` in `test/cri-containerd`) requires a CI step that starts containerd with `snapshotter = "windows-lcow"` set on both the `runhcs-lcow` AND `runhcs-lcow-v2` runtime blocks. That step is deferred to a follow-up PR alongside the integration-tests v2 setup.
* `TestLCOW_V2_*` surface beyond the initial scaffold.
* `LCOWV2` step a hard gate: only meaningful once GitHub-triggered CloudTest (or a nested-virt runner pool) is available.

Validation
* `functional.test.exe -feature=LCOWV2` on a Windows host with nested virt.
* `-feature LCOW` invocations route to the same code paths as before.
* `LCOWV2` gate are consumed by the internal ContainerPlat/azcri CloudTest lanes (WS2022/WS2025); that v2 lane is still being stabilized, which is the other reason the GitHub step is non-blocking.
* `platforms.DefaultSpec()` in `image_pull.go` and ignores the `runtime_platforms` mapping; gating those tests on CI is intentionally deferred.

DCO
Signed-off-by in commit.
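
For readers unfamiliar with the gating described under "Functional suite", here is a rough sketch of how a `requireV1Only`-style helper could short-circuit v1 paths. The signature, the `lcowV2Enabled` variable, the `skipper` interface, and the `recorder` fake are all assumptions for illustration; the helper in this PR may look different. The interface is limited to two methods that `*testing.T` already provides, so the same helper shape would work inside a real test.

```go
package main

import "fmt"

// skipper is the subset of testing.TB this sketch needs;
// *testing.T and *testing.B both satisfy it.
type skipper interface {
	Helper()
	Skipf(format string, args ...any)
}

// lcowV2Enabled is a hypothetical stand-in for "-feature LCOWV2 was passed".
var lcowV2Enabled bool

// requireV1Only (hypothetical signature) skips the calling test when the
// v2 gate is active, so v1-only setup code never runs under LCOWV2.
func requireV1Only(tb skipper) {
	tb.Helper()
	if lcowV2Enabled {
		tb.Skipf("v1 LCOW path skipped under -feature LCOWV2")
	}
}

// recorder is a fake skipper so the sketch runs outside `go test`.
// Unlike testing.T.Skipf, it only records the skip and does not stop
// the calling goroutine.
type recorder struct {
	skipped bool
	reason  string
}

func (r *recorder) Helper() {}

func (r *recorder) Skipf(format string, args ...any) {
	r.skipped = true
	r.reason = fmt.Sprintf(format, args...)
}

func main() {
	for _, v2 := range []bool{false, true} {
		lcowV2Enabled = v2
		r := &recorder{}
		requireV1Only(r)
		fmt.Printf("LCOWV2=%v skipped=%v\n", v2, r.skipped)
	}
}
```

Placing the check in a shared options constructor such as `defaultLCOWOptions`, as the description states, means existing v1 tests need no per-test edits to be skipped under the v2 gate.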