🔴 Required Information
Describe the Bug:
GeminiContextCacheManager contains a method, _find_count_of_contents_to_cache, specifically designed to exclude the dynamic instruction_provider content from the cache fingerprint, but it is never called. As a result, using the documented static_instruction + instruction pattern for context caching produces a permanently unstable fingerprint, so the cache never hits.
Root cause (code walkthrough):
static_instruction + instruction is the ADK-recommended pattern for context caching: the static part goes to system_instruction (stable, fingerprinted), while the instruction provider result is appended to llm_request.contents as a user-role Content (dynamic, should be excluded from the fingerprint).
The intended mechanism for excluding this dynamic content exists in gemini_context_cache_manager.py:
def _find_count_of_contents_to_cache(self, contents):
    """Find the number of contents to cache based on user content strategy.

    Strategy: Find the last continuous batch of user contents and cache
    all contents before them.
    """
    last_user_batch_start = len(contents)
    for i in range(len(contents) - 1, -1, -1):
        if contents[i].role == "user":
            last_user_batch_start = i
        else:
            break
    return last_user_batch_start
At turn 1, with instruction_provider appending a user-role block at the end, all contents are user-role → this function returns N=0 → fingerprint = hash(system_instruction + tools) only → stable across all turns.
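To make the N=0 claim concrete, here is a minimal sketch that runs the same loop on sample data. The `Content` dataclass is a hypothetical stand-in with only the fields needed here, not the real ADK/GenAI type, and the function is a free-standing copy of the method's logic.

```python
from dataclasses import dataclass


@dataclass
class Content:
    """Hypothetical stand-in for the real Content type; only `role` matters here."""
    role: str
    text: str


def find_count_of_contents_to_cache(contents: list[Content]) -> int:
    """Return the index where the trailing run of user-role contents starts."""
    last_user_batch_start = len(contents)
    for i in range(len(contents) - 1, -1, -1):
        if contents[i].role == "user":
            last_user_batch_start = i
        else:
            break
    return last_user_batch_start


# Turn 1: the user message plus the appended dynamic block, both user-role.
turn_1 = [Content("user", "user_msg_1"), Content("user", "dynamic_ctx_t1")]

# Turn 2: a model response now precedes the trailing user batch.
turn_2 = [
    Content("user", "user_msg_1"),
    Content("model", "model_resp_1"),
    Content("user", "user_msg_2"),
    Content("user", "dynamic_ctx_t2"),
]

print(find_count_of_contents_to_cache(turn_1))  # 0
print(find_count_of_contents_to_cache(turn_2))  # 2
```

On turn 1 the whole list is one trailing user batch, so nothing from contents would enter the fingerprint window.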
However, in handle_context_caching, the actual fingerprint count is computed as:
# No existing cache metadata - return fingerprint-only metadata
total_contents_count = len(llm_request.contents)  # ← bug: should use _find_count_of_contents_to_cache
fingerprint = self._generate_cache_fingerprint(llm_request, total_contents_count)
return CacheMetadata(fingerprint=fingerprint, contents_count=total_contents_count)
_find_count_of_contents_to_cache is defined but never called anywhere in the codebase.
Why this breaks turn-by-turn:
- Turn 1 contents (after instruction_provider appends): [user_msg_1, dynamic_ctx_t1] → N=2, fingerprint covers both
- Turn 2 contents (first N=2): [user_msg_1, model_resp_1] → model response now occupies the slot where dynamic_ctx_t1 was
- Fingerprint mismatch → N reset to 4 (total contents) → same problem repeats every turn
- Cache is never created
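The turn-by-turn failure can be simulated without ADK. In the sketch below, `fingerprint` is a hypothetical stand-in that hashes a system instruction plus the first N contents; it is not the real `_generate_cache_fingerprint`, and the (role, text) tuples are illustrative only.

```python
import hashlib


def fingerprint(system_instruction: str, contents: list[tuple[str, str]], n: int) -> str:
    """Hypothetical stand-in: hash the system instruction plus the first n contents."""
    h = hashlib.sha256(system_instruction.encode("utf-8"))
    for role, text in contents[:n]:
        h.update(f"{role}:{text}".encode("utf-8"))
    return h.hexdigest()


SYSTEM = "static system instruction"

turn_1 = [("user", "user_msg_1"), ("user", "dynamic_ctx_t1")]
turn_2 = [
    ("user", "user_msg_1"),
    ("model", "model_resp_1"),
    ("user", "user_msg_2"),
    ("user", "dynamic_ctx_t2"),
]

# Turn 1: no metadata yet, so the window is the full contents list (N=2).
n = len(turn_1)
stored = fingerprint(SYSTEM, turn_1, n)

# Turn 2: re-hash the first N contents and compare with the stored value.
recomputed = fingerprint(SYSTEM, turn_2, n)
mismatch = stored != recomputed
print("mismatch on turn 2:", mismatch)  # True

# With a window of 0, only the system instruction is hashed, so it stays stable.
stable = fingerprint(SYSTEM, turn_1, 0) == fingerprint(SYSTEM, turn_2, 0)
print("stable with N=0:", stable)  # True
```

The second slot of the window holds dynamic_ctx_t1 on turn 1 and model_resp_1 on turn 2, which is enough to change the hash.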
Steps to Reproduce:
- Create an LlmAgent with static_instruction (stable string) and instruction (dynamic provider returning session-dependent content)
- Enable ContextCacheConfig on the App
- Run a multi-turn conversation
- Enable GOOGLE_ADK_LOG_LEVEL=DEBUG and observe logs
Expected Behavior:
The instruction_provider content (user-role, appended at end of contents) is excluded from the cache fingerprint. The fingerprint covers only system_instruction + tools, which is stable across turns. The cache is created on turn 2 and reused on subsequent turns as long as system_instruction and tools do not change.
Observed Behavior:
The fingerprint includes the instruction_provider content (via len(llm_request.contents)). Since that content changes each turn (or is displaced by the model's response in the first-N window), the fingerprint changes on every turn. Debug logs show:
Cache content fingerprint mismatch
Fingerprints don't match, returning fingerprint-only metadata
The cache is never created. cache_hit_pct = 0%.
Proposed Fix:
In handle_context_caching, replace len(llm_request.contents) with the existing (but uncalled) _find_count_of_contents_to_cache:
# Before (buggy):
total_contents_count = len(llm_request.contents)
# After (fix):
total_contents_count = self._find_count_of_contents_to_cache(llm_request.contents)
This aligns the implementation with the documented static_instruction + instruction pattern and with the evident design intent of _find_count_of_contents_to_cache.
Environment Details:
- ADK Library Version: google-adk==1.32.0
- Desktop OS: macOS (Darwin 24.6.0)
- Python Version: 3.13.11
Model Information:
- LiteLLM: No
- Model: gemini-2.0-flash-lite (Gemini API)
🟡 Optional Information
Minimal Reproduction Code:
from google.adk.agents import LlmAgent
from google.adk.apps.app import App
from google.adk.agents.context_cache_config import ContextCacheConfig
from google.adk.agents.readonly_context import ReadonlyContext
from google.adk.models import Gemini

_STATIC_PROMPT = "You are a helpful assistant. " * 300  # large enough to exceed 4096 tokens with tools


def dynamic_instruction(context: ReadonlyContext) -> str:
    # Simulates per-turn dynamic content (e.g. session state)
    return f"<session_state>turn_data={context.state.get('turn', 0)}</session_state>"


agent = LlmAgent(
    name="test_agent",
    model=Gemini(model="gemini-2.0-flash-lite"),
    static_instruction=_STATIC_PROMPT,
    instruction=dynamic_instruction,
)

app = App(
    name="test",
    root_agent=agent,
    context_cache_config=ContextCacheConfig(ttl_seconds=1800, min_tokens=4096),
)

# Run multi-turn: observe "fingerprint mismatch" in DEBUG logs on every turn
How often has this issue occurred?: Always (100%)
Additional Context:
The workaround is to inject dynamic content via a before_model_callback that calls llm_request.contents.insert(0, ...) instead of using instruction_provider. Because the dynamic block is then at position 0 on every turn, the first-N fingerprint window consistently starts with it, and the fingerprint is stable as long as the dynamic content itself doesn't change. This is semantically equivalent to the intended instruction_provider behavior but should not be necessary.
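Why prepending helps can be shown with a small sketch. `FakeRequest` and `inject_dynamic_context` are hypothetical names used only for illustration; a real before_model_callback would operate on the actual llm_request and Content objects, which are not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for an LLM request; only `contents` is modelled."""
    contents: list[str] = field(default_factory=list)


def inject_dynamic_context(request: FakeRequest, dynamic_block: str) -> None:
    """Prepend the dynamic block, as the workaround does with insert(0, ...)."""
    request.contents.insert(0, dynamic_block)


turn_1 = FakeRequest(["user_msg_1"])
turn_2 = FakeRequest(["user_msg_1", "model_resp_1", "user_msg_2"])
for request in (turn_1, turn_2):
    inject_dynamic_context(request, "dynamic_ctx")

# Turn 1 sets the fingerprint window to the full contents length.
n = len(turn_1.contents)

# Same dynamic block on both turns: the first-n window is identical.
same_prefix = turn_1.contents[:n] == turn_2.contents[:n]
print(same_prefix)  # True

# A changed dynamic block alters the window, so the fingerprint would change too.
turn_2_changed = FakeRequest(["user_msg_1", "model_resp_1", "user_msg_2"])
inject_dynamic_context(turn_2_changed, "dynamic_ctx_v2")
changed_prefix = turn_1.contents[:n] == turn_2_changed.contents[:n]
print(changed_prefix)  # False
```

The history behind the prepended block only grows at the end, so the leading window stays fixed until the dynamic block itself changes.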