fix(metrics): correct GFE metrics extraction and enable by default #17561
sinhasubham wants to merge 1 commit
Conversation
Code Review
This pull request introduces deferred metrics recording for streaming and async streaming RPC responses in Google Cloud Spanner by wrapping those responses in dedicated wrapper classes. It also implements GFE latency extraction from response metadata and adds corresponding tests. The review feedback highlights four robustness improvements:
- Refine streaming response detection to check for iterator methods (`__next__` and `__anext__`) rather than iterable methods, so that standard iterables are not wrapped incorrectly.
- Wrap telemetry and metrics recording in try-except blocks so that telemetry failures do not disrupt the main application flow.
- Validate metadata elements before unpacking them to avoid unpacking errors.
- Decode bytes metadata values before regex matching.
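The four review points can be illustrated with a minimal sketch. This is not the PR's code: the helper names `is_streaming_response`, `extract_gfe_latency_ms`, and `record_safely` are hypothetical, and the `gfet4t7; dur=<ms>` value format for the `server-timing` header is an assumption.

```python
import re

# Assumption: the server-timing value looks like "gfet4t7; dur=<milliseconds>".
_GFE_TIMING = re.compile(r"gfet4t7;\s*dur=(\d+(?:\.\d+)?)")


def is_streaming_response(response):
    """Return True only for iterators, not for plain iterables such as lists."""
    return hasattr(response, "__next__") or hasattr(response, "__anext__")


def extract_gfe_latency_ms(metadata):
    """Return the GFE latency in ms from (key, value) pairs, or None if absent."""
    for item in metadata or ():
        # Skip malformed entries instead of failing on tuple unpacking.
        if not isinstance(item, (tuple, list)) or len(item) != 2:
            continue
        key, value = item
        # Metadata keys and values may arrive as bytes; decode before matching.
        if isinstance(key, bytes):
            key = key.decode("utf-8", errors="replace")
        if isinstance(value, bytes):
            value = value.decode("utf-8", errors="replace")
        if not isinstance(key, str) or not isinstance(value, str):
            continue
        if key.lower() != "server-timing":
            continue
        match = _GFE_TIMING.search(value)
        if match:
            return float(match.group(1))
    return None


def record_safely(record, *args):
    """Invoke a metrics recorder; swallow errors so telemetry never breaks the RPC."""
    try:
        record(*args)
    except Exception:
        pass
```

Checking for `__next__` rather than `__iter__` matters because a unary response that happens to be a list or a repeated field is iterable but should not be treated as a stream.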
@sinhasubham, please merge main into this branch. The presubmit failures were addressed in #17578.
b0c3ca5 to 4226490
4226490 to ad028db
Description
This PR resolves a critical issue where Spanner GFE (Google Front End) latency metrics were not being properly captured, and ensures these metrics are always enabled by default.
Key Changes:
- Fixed a bug in `MetricsInterceptor` where `initial_metadata()` returning standard headers (e.g., `content-type`) masked `trailing_metadata()`. The `server-timing` header is now properly extracted from both initial and trailing metadata.
- Added `_StreamingResponseWrapper` and `_AsyncStreamingResponseWrapper` to the interceptor. This correctly defers metrics recording until the streaming iterators finish, ensuring `trailing_metadata` is fully populated and available before attempting extraction.
- Removed the `gfe_enabled` toggle in `SpannerMetricsTracerFactory`. GFE metrics capture is now always-on whenever OpenTelemetry tracing is enabled.
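The deferral idea in the key changes can be sketched roughly as follows. This is a simplified, synchronous illustration under stated assumptions, not the PR's implementation: `DeferredStreamingResponse` and `collect_metadata` are hypothetical names, and the wrapped object is assumed to expose `initial_metadata()` and `trailing_metadata()` the way a gRPC call object does.

```python
class DeferredStreamingResponse:
    """Hypothetical stand-in for a streaming wrapper: runs a callback once the stream ends."""

    def __init__(self, response, on_done):
        self._response = response
        self._on_done = on_done
        self._finished = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._response)
        except Exception:
            # StopIteration (normal end) and RPC errors both terminate the stream.
            self._finish()
            raise

    def _finish(self):
        # Run the callback at most once, and never let it break the caller.
        if self._finished:
            return
        self._finished = True
        try:
            self._on_done(self._response)
        except Exception:
            pass


def collect_metadata(call):
    """Merge initial and trailing metadata so neither source masks the other."""
    pairs = []
    for getter_name in ("initial_metadata", "trailing_metadata"):
        getter = getattr(call, getter_name, None)
        if not callable(getter):
            continue
        try:
            pairs.extend(getter() or ())
        except Exception:
            continue
    return pairs
```

Recording only after the iterator is exhausted is what makes the trailing metadata usable here; reading it before the stream finishes would either block or return nothing useful. An async variant would apply the same pattern with `__aiter__` and `__anext__`.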