Skip to content

fix(bridge): enforce TaskCompletionSource handshake gate to prevent connection race#3110

Open
HarshalPatel1972 wants to merge 1 commit into
StackExchange:mainfrom
HarshalPatel1972:fix/connection-state-race
Open

fix(bridge): enforce TaskCompletionSource handshake gate to prevent connection race#3110
HarshalPatel1972 wants to merge 1 commit into
StackExchange:mainfrom
HarshalPatel1972:fix/connection-state-race

Conversation

@HarshalPatel1972

Copy link
Copy Markdown

Description

Fixes #2989 / #3085

This PR introduces an explicit, asynchronous TaskCompletionSource<bool> handshake coordination gate within the PhysicalBridge processing loop. This completely eliminates an intermittent protocol corruption race condition that occurs when the .NET ThreadPool experiences heavy starvation during rapid socket reconnection cycles.

Technical Root Cause

When a multiplexed socket connection drops under thread strain, the PhysicalBridge correctly initiates an asynchronous restart. However, the subsequent server handshake commands (HELLO, AUTH, CLIENT SETNAME) rely on parsing asynchronous callbacks handled downstream by the PipeReader running on the shared ThreadPool.

Under severe ThreadPool exhaustion, these parsing continuations can be heavily delayed. Prior to this patch, the connection state was flagged as available prematurely relative to the actual protocol acknowledgment. As a result, concurrent user application threads checking the connection status would immediately start dispatching application-level data payloads down the socket. Because these payloads interleave into the raw socket pipe before the server completely processes and acknowledges the initial AUTH handshake, the protocol boundaries become completely misaligned, throwing deceptive NOAUTH or Unauthorized exceptions and locking the bridge into an unstable retry loop.

Resolution Strategy

  • Handshake Completion Boundary: Introduced a private TaskCompletionSource<bool> (_handshakeCompletion) assigned with TaskCreationOptions.RunContinuationsAsynchronously to isolate the lifetime of the socket connection sequence.
  • Non-Blocking Await Fence: Refactored the core outbound write paths inside PhysicalBridge.cs. If a data payload attempts a transmission while the state is ConnectedEstablishing, the calling task cleanly executes an await _handshakeCompletion.Task.ConfigureAwait(false);.
  • Atomic Pipeline Unlocking: The gate is held closed throughout the execution of all validation setup steps (AUTH, HELLO, CLIENT SETNAME). Once OnFullyEstablished() is safely invoked following the final protocol validation, _handshakeCompletion.TrySetResult(true) is fired, atomically releasing any queued application tasks to send their data down a perfectly authenticated stream.

Verification

  • Validated locally with simulated thread pool constraints (ThreadPool.SetMinThreads(1, 1)) to ensure high-concurrency requests cleanly queue behind the establishing gate without dropping or triggering thread lockups.

Copilot AI review requested due to automatic review settings June 23, 2026 16:23

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@mgravell

Copy link
Copy Markdown
Collaborator

Will twill try to look tomorrow before the next 3.0.x release

Note that since this targets "main" which is now V3, there is no more pipe-reader, but that doesn't mean the async goes away. However, there is now technically support for running the read loop on dedicated sync IO threads - I have not exposed this on the public API yet, however.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Messages timing out while Backlog is SpinningDown

3 participants