Add Java low-level tool definition E2E test and skill [1/6] #1721
Add a new Java failsafe integration test and replay snapshot that exercise the current explicit tool-definition APIs before ergonomic annotations are added.
Related to issue #1682 but does not fix #1682.
Changes:
- Add `LowLevelToolDefinitionIT` covering `create`, `createOverride`, `getArgumentsAs(record)`, `getArguments()`, and `ToolSet` available tools
- Add `test/snapshots/tools/low_level_tool_definition.yaml` with multi-turn tool call and final response replay conversations
- Add `.github/skills/new-java-e2e-test-yaml-and-test` skill documenting the workflow for creating new Java E2E tests with handcrafted YAML snapshots
- Fix `test/snapshots/abort/should_abort_during_active_streaming.yaml` to handle cleared-history recovery request (adds second conversation entry for the 2-message recovery code path)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR adds baseline Java E2E (failsafe) coverage for the current “low-level” tool-definition APIs, along with a new replay-proxy snapshot and a reusable Copilot skill documenting how to add Java E2E tests backed by handcrafted YAML snapshots.
Changes:
- Added a new Java failsafe integration test that defines custom tools via `ToolDefinition.create(...)`/`createOverride(...)` and validates tool execution via a replay snapshot.
- Added a new replay-proxy YAML snapshot for the low-level tool-definition scenario.
- Added a new repository Copilot skill documenting the workflow for adding Java E2E tests + YAML snapshots.
- Extended an existing abort snapshot with an additional conversation entry for a recovery path.
| File | Description |
|---|---|
| test/snapshots/tools/low_level_tool_definition.yaml | New replay snapshot for low-level tool definition E2E flow (custom tool calls + results). |
| test/snapshots/abort/should_abort_during_active_streaming.yaml | Adds an additional conversation entry to cover an abort recovery path. |
| java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java | New Java failsafe IT that configures the proxy snapshot and exercises low-level tool-definition APIs. |
| .github/skills/new-java-e2e-test-yaml-and-test/SKILL.md | New Copilot skill describing how to add Java E2E tests and craft YAML snapshots. |
| .github/skills/new-java-e2e-test-yaml-and-test/examples.md | Examples supporting the new skill documentation. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 4
- Validate and assert the `search_items` keyword in `LowLevelToolDefinitionIT` so `getArguments()` is meaningfully exercised.
- Correct skill docs to require explicit snapshot base names (no camelCase to snake_case conversion assumption).
- Correct replay matching description to 'next assistant message after matched request prefix' semantics.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
modified: .github/skills/java-coding-skill/SKILL.md - While working on #1721, I discovered and hereby fix this important omission. Signed-off-by: Ed Burns <edburns@microsoft.com>
Cross-SDK Consistency Review ✅
I reviewed this PR against all six SDK implementations (Node.js, Python, Go, .NET, Rust, Java) for cross-language consistency.
Summary
No significant cross-SDK consistency gaps found. This PR is well-structured as part of an intentional incremental series.
Changes analyzed
Feature parity check for APIs exercised in the new Java test
All APIs under test have consistent equivalents across SDKs (accounting for language idioms):
Abort snapshot fix (lines 31–37 added)
This fix is additive: it adds a second conversation entry to handle the cleared-history recovery code path. All SDKs with abort tests (Go, Node.js, Python, .NET, Rust) share this snapshot. The change is backward-compatible: the first conversation still handles the normal case; the second is a fallback for when session history is cleared after abort. No action needed in other SDKs.
The incremental PR structure (6 PRs, one per SDK) is a sound approach: the shared snapshot is already in place so PRs #2–#6 can each add their equivalent test independently.
* On branch edburns/java-add-spotless-to-java-coding-skill
  modified: .github/skills/java-coding-skill/SKILL.md
  - While working on #1721, I discovered and hereby fix this important omission.
  Signed-off-by: Ed Burns <edburns@microsoft.com>
* Potential fix for pull request finding
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Signed-off-by: Ed Burns <edburns@microsoft.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Add Node.js low-level tool-definition E2E test
  Related to issue #1682 but does not fix #1682.
  Align low_level_tool_definition coverage with PR #1721 snapshot behavior by only defining tools exercised by the shared snapshot.
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix Node.js PR formatting and scope
  - Apply Prettier formatting to tools.e2e.test.ts so Node ubuntu format check passes.
  - Drop session lifecycle carryover from this PR by restoring Node session lifecycle files to upstream/main content, keeping this PR focused on low-level tool-definition coverage.
  Related to issue #1682 but does not fix #1682.
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add Go low-level tool-definition E2E test
  Related to issue #1682 but does not fix #1682.
  Align low_level_tool_definition coverage with PR #1721 snapshot behavior by only defining tools exercised by the shared snapshot.
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address Go PR review suggestions for low-level tool test
  Synchronize handler-updated state with a mutex and move keyword assertion to the main test goroutine to avoid calling t.Fatalf from a tool handler goroutine.
  Related to issue #1682 but does not fix #1682.
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add .NET low-level tool-definition E2E test
  Related to issue #1682 but does not fix #1682.
  Align low_level_tool_definition coverage with PR #1721 snapshot behavior by only defining tools exercised by the shared snapshot.
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix .NET session lifecycle replay mismatch in PR 1728
  Restore the second lifecycle prompt to 'Say world' to match the existing session_lifecycle snapshot and avoid replay cache misses in CI.
  Related to issue #1682 but does not fix #1682.
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add Rust low-level tool-definition E2E test
  Related to issue #1682 but does not fix #1682.
  Align low_level_tool_definition coverage with PR #1721 snapshot behavior by only defining tools exercised by the shared snapshot.
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: revert session_lifecycle.rs Say hi -> Say world to match snapshot
  The snapshot expects 'Say world' but the branch had changed it to 'Say hi', causing 'No cached response found' failures across all three OS variants.
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds a new Java failsafe integration test (LowLevelToolDefinitionIT) and the accompanying replay proxy YAML snapshot that exercise the current explicit tool-definition APIs.
This is the first of six focused PRs that break apart the changes originally combined in #1692. It does not fix issue #1682; rather it establishes baseline E2E coverage of the low-level tool-definition API before the ergonomic annotations are introduced.
Changes
New: Java E2E integration test
Covers `CopilotTool.create`, `createOverride`, `getArgumentsAs(record)`,
`getArguments()`, available-tools filtering (`custom:*` + `builtin:web_fetch`),
and mutable handler state (`currentPhase`) asserted after tool execution.
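As a rough illustration of the ideas this test covers (typed versus untyped argument access, plus handler state that the test inspects afterwards), here is a minimal self-contained sketch. It does not use the SDK: `Invocation`, `SearchArgs`, the mapper-based `getArgumentsAs`, and the phase values `"idle"`/`"searched"` are all hypothetical stand-ins, and the real `getArgumentsAs(record)` signature may well differ.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

public class LowLevelToolSketch {

    // Hypothetical typed view of the tool arguments (not an SDK type).
    record SearchArgs(String keyword) {}

    // Hypothetical stand-in for a tool invocation carrying raw arguments.
    record Invocation(Map<String, Object> arguments) {
        // Untyped access, similar in spirit to getArguments().
        Map<String, Object> getArguments() {
            return arguments;
        }

        // Typed access, similar in spirit to getArgumentsAs(record);
        // here the caller supplies the conversion explicitly.
        <T> T getArgumentsAs(Function<Map<String, Object>, T> mapper) {
            return mapper.apply(arguments);
        }
    }

    // Mutable handler state; AtomicReference keeps the update visible
    // even if the handler runs on a different thread than the test.
    static final AtomicReference<String> currentPhase = new AtomicReference<>("idle");

    // Hypothetical handler for a search_items-style tool.
    static String searchItems(Invocation invocation) {
        SearchArgs args = invocation.getArgumentsAs(
                m -> new SearchArgs((String) m.get("keyword")));
        currentPhase.set("searched");
        return "results for " + args.keyword();
    }

    public static void main(String[] args) {
        Invocation call = new Invocation(Map.of("keyword", "copilot"));
        System.out.println(searchItems(call));                      // results for copilot
        System.out.println(call.getArguments().get("keyword"));    // copilot
        System.out.println(currentPhase.get());                     // searched
    }
}
```

Asserting on `currentPhase` from the test thread, rather than inside the handler, keeps a failure attributable to the test itself.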
New: Replay snapshot
Multi-turn replay snapshot: first exchange triggers the tool call, second
exchange supplies the tool result and receives the final response.
This snapshot is also used by PRs #2–#6 (go, nodejs, python, rust, dotnet).
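The multi-turn replay can be pictured with the following sketch of the "next assistant message after matched request prefix" semantics mentioned in the review fixes. This is an assumption-laden model, not the replay proxy's code: `Message`, `replay`, `sampleSnapshot`, and the message contents are hypothetical, and the real proxy may normalize or match requests differently.

```java
import java.util.List;
import java.util.Optional;

public class ReplaySketch {

    // Hypothetical simplified chat message (role + content only).
    record Message(String role, String content) {}

    // Returns the message that follows the request in the first recorded
    // conversation that starts with exactly the request's messages,
    // provided that next message is from the assistant.
    static Optional<Message> replay(List<List<Message>> conversations, List<Message> request) {
        int n = request.size();
        for (List<Message> conversation : conversations) {
            if (conversation.size() > n
                    && conversation.subList(0, n).equals(request)
                    && conversation.get(n).role().equals("assistant")) {
                return Optional.of(conversation.get(n));
            }
        }
        return Optional.empty();
    }

    // Hypothetical two-exchange recording: tool call first, final answer second.
    static List<List<Message>> sampleSnapshot() {
        return List.of(List.of(
                new Message("user", "Search for copilot"),
                new Message("assistant", "tool_call: search_items"),
                new Message("tool", "results for copilot"),
                new Message("assistant", "Here is what I found.")));
    }

    public static void main(String[] args) {
        List<List<Message>> snapshot = sampleSnapshot();
        List<Message> recorded = snapshot.get(0);

        // First exchange: only the user prompt has been sent so far.
        System.out.println(replay(snapshot, recorded.subList(0, 1))
                .map(Message::content).orElse("No cached response found"));

        // Second exchange: the prompt, the tool call, and the tool result.
        System.out.println(replay(snapshot, recorded.subList(0, 3))
                .map(Message::content).orElse("No cached response found"));

        // A prompt that was never recorded produces a miss.
        System.out.println(replay(snapshot, List.of(new Message("user", "Say hi")))
                .map(Message::content).orElse("No cached response found"));
    }
}
```

Under this model, a request that diverges from every recorded prefix has no match, which is consistent with the 'No cached response found' failures described in the Rust and .NET follow-up commits.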
New: Copilot skill
Packages the knowledge of creating net-new Java E2E integration tests with
handcrafted YAML snapshots into a reusable Copilot skill.
Fix: abort snapshot
Adds a second conversation entry to handle the cleared-history recovery
code path that can occur after an abort during active streaming.
Related