feat: capture Gemini cache and thinking tokens in instrument_google_genai #1961
JonathanTsen wants to merge 3 commits into
…enai Patches the upstream OTel GoogleGenAiSdkInstrumentor to also extract cached_content_token_count, thoughts_token_count and tool_use_prompt_token_count from response.usage_metadata, and computes operation.cost via genai-prices when available. Closes the gap where direct google.genai users (not going through pydantic-ai) were missing cache and thinking metrics in their spans.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Adds a test that exercises the early-return branch when the Gemini response has no usage_metadata, and marks the defensive `usage_data.model is not None` check (unreachable in practice: the google extractor raises LookupError for unknown models rather than returning model=None) with `# pragma: no branch`. Restores 100% coverage broken by the previous commit.
References the official Gemini API pages for context caching, thinking tokens, function calling, and pricing, plus the python-genai field description that documents prompt_token_count already including cached tokens.
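For context, the extraction these commits describe can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `extract_usage_attributes` is a hypothetical helper, and `SimpleNamespace` stands in for the SDK's `usage_metadata` object. Only the field names and span attribute names come from the PR.

```python
from types import SimpleNamespace

# Gemini usage_metadata field -> span attribute (names taken from the PR description).
FIELD_TO_ATTRIBUTE = {
    'cached_content_token_count': 'gen_ai.usage.cache_read.input_tokens',
    'thoughts_token_count': 'gen_ai.usage.details.thoughts_tokens',
    'tool_use_prompt_token_count': 'gen_ai.usage.details.tool_use_prompt_tokens',
}


def extract_usage_attributes(response):
    """Hypothetical helper: collect the extra token counts, skipping missing or zero values."""
    metadata = getattr(response, 'usage_metadata', None)
    if metadata is None:
        # Early return: the response carries no usage metadata.
        return {}
    attributes = {}
    for field, attribute in FIELD_TO_ATTRIBUTE.items():
        if value := getattr(metadata, field, None):
            attributes[attribute] = value
    return attributes


# Stand-in response: prompt_token_count already includes the cached tokens,
# so the cached count is reported separately and never added on top.
response = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=1200,
        cached_content_token_count=1000,
        thoughts_token_count=0,
    )
)
print(extract_usage_attributes(response))
```

In this sketch, zero and missing counts are simply omitted, so a span only carries the attributes the response actually reported.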
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="docs/integrations/llms/google-genai.md">
<violation number="1" location="docs/integrations/llms/google-genai.md:59">
P2: Documentation for `operation.cost` should clarify that the attribute is only present when `genai-prices` is installed and the model pricing is known, since the implementation silently skips cost calculation otherwise.</violation>
</file>
- `gen_ai.usage.cache_read.input_tokens` — tokens served from [context cache](https://ai.google.dev/gemini-api/docs/caching) (cache hit)
- `gen_ai.usage.details.thoughts_tokens` — [reasoning tokens](https://ai.google.dev/gemini-api/docs/thinking) (Gemini 2.5 / 3.x)
- `gen_ai.usage.details.tool_use_prompt_tokens` — tokens used for [tool definitions](https://ai.google.dev/gemini-api/docs/function-calling)
- `operation.cost` — calculated price in USD using the [official Gemini pricing tables](https://ai.google.dev/gemini-api/docs/pricing) via [`genai-prices`](https://pypi.org/project/genai-prices/)
Suggested change:
Before: - `operation.cost` — calculated price in USD using the [official Gemini pricing tables](https://ai.google.dev/gemini-api/docs/pricing) via [`genai-prices`](https://pypi.org/project/genai-prices/)
After: - `operation.cost` — calculated price in USD using the [official Gemini pricing tables](https://ai.google.dev/gemini-api/docs/pricing) via [`genai-prices`](https://pypi.org/project/genai-prices/) (only present when the package is installed and model pricing is known)
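The "only present when the package is installed and model pricing is known" behaviour could look something like the sketch below. It deliberately avoids calling `genai-prices` itself: `pricer` and `fake_pricer` are hypothetical stand-ins for whatever pricing call the implementation makes, and the price table is made up. The one assumption carried over from the PR is that an unknown model surfaces as a `LookupError`.

```python
def add_operation_cost(attributes, response_data, pricer):
    """Set operation.cost only when a pricer is available and knows the model."""
    if pricer is None:
        # Stand-in for "genai-prices is not installed": leave the attribute out.
        return attributes
    try:
        cost = pricer(response_data)
    except LookupError:
        # Unknown model: skip the attribute rather than fail the span.
        return attributes
    attributes['operation.cost'] = float(cost)
    return attributes


def fake_pricer(response_data):
    # Hypothetical price table for illustration; real prices come from genai-prices.
    prices = {'known-model': 0.002}
    # A missing key raises KeyError, which is a subclass of LookupError.
    return prices[response_data['modelVersion']]


print(add_operation_cost({}, {'modelVersion': 'known-model'}, fake_pricer))
print(add_operation_cost({}, {'modelVersion': 'mystery-model'}, fake_pricer))
print(add_operation_cost({}, {'modelVersion': 'known-model'}, None))
```

Because both failure paths return the attributes unchanged, a reader of the span cannot tell a missing package from an unknown model, which is why the docs wording matters here.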
Note that, unlike Anthropic, the Gemini API's `prompt_token_count` already includes
the cached tokens; Logfire does not sum them again. This is documented in the
[`GenerateContentResponseUsageMetadata.prompt_token_count`](https://googleapis.github.io/python-genai/genai.html#genai.types.GenerateContentResponseUsageMetadata.prompt_token_count)
field description: *"When `cached_content` is set, this also includes the number
of tokens in the cached content."*
I don't think this is needed
    self._lf_thoughts = thoughts
if tool_use := getattr(metadata, 'tool_use_prompt_token_count', None):
    self._lf_tool_use_prompt = tool_use
self._lf_response = response
It looks like `_lf_response` is the only thing that needs to be stored; the rest can be retrieved in `_wrapped_create_final_attributes`.
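One way to read this suggestion, as a rough sketch only: keep the response on the helper and derive the token counts when the final attributes are built. `HelperSketch`, `on_response` and `final_attributes` are hypothetical names for illustration and do not mirror the upstream class or the PR's wrapper.

```python
from types import SimpleNamespace


class HelperSketch:
    """Hypothetical stand-in for the patched helper; not the upstream class."""

    def __init__(self):
        self._lf_response = None

    def on_response(self, response):
        # Store only the response, as the review suggests.
        self._lf_response = response

    def final_attributes(self):
        # Derive the token counts lazily from the stored response.
        metadata = getattr(self._lf_response, 'usage_metadata', None)
        attributes = {}
        if thoughts := getattr(metadata, 'thoughts_token_count', None):
            attributes['gen_ai.usage.details.thoughts_tokens'] = thoughts
        if tool_use := getattr(metadata, 'tool_use_prompt_token_count', None):
            attributes['gen_ai.usage.details.tool_use_prompt_tokens'] = tool_use
        return attributes


helper = HelperSketch()
helper.on_response(
    SimpleNamespace(usage_metadata=SimpleNamespace(thoughts_token_count=7, tool_use_prompt_token_count=None))
)
print(helper.final_attributes())
```

This keeps a single piece of state on the helper instead of one attribute per token type, at the cost of reading `usage_metadata` again at the end.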
    return _generate


def _build_fake_genai_response(
is extra mocking needed? can we stick to vcr for the new tests?
please don't edit this file
Summary
- Patches `_GenerateContentInstrumentationHelper` to extract `cached_content_token_count`, `thoughts_token_count` and `tool_use_prompt_token_count` from `response.usage_metadata` and emit them as `gen_ai.usage.cache_read.input_tokens`, `gen_ai.usage.details.thoughts_tokens` and `gen_ai.usage.details.tool_use_prompt_tokens`.
- Computes `operation.cost` via `genai-prices` when available (silent failure if the package is missing or the model is unknown).
- Closes the gap where direct `google.genai` users (not going through `pydantic-ai`) were missing cache and thinking metrics in their spans.

Implementation notes
- … (`_maybe_update_token_counts`, `create_final_attributes`), following the same pattern already used elsewhere in `logfire/_internal/integrations/google_genai.py`. Wrapped in `try`/`except` at module load so a future upstream rename keeps the base instrumentor working.
- … `opentelemetry-instrumentation-google-genai` 0.7b0; current pinning in `pyproject.toml` is `>= 0.4b0`.
- … `None`/`0`, so partial chunks don't overwrite a final value.
- `prompt_token_count` already includes cached tokens, so we expose `cache_read` separately rather than summing.
- `genai-prices` is invoked with `response.model_dump(by_alias=True)` because the extractor expects camelCase JSON keys (`usageMetadata`, `modelVersion`).

Test plan
- `uv run pytest tests/otel_integrations/test_google_genai.py`: 8 passing (3 new + 5 existing; existing VCR snapshots updated to include the new attributes that real Gemini responses already carry).
- `make lint`
- `make typecheck`
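The note about partial chunks can be pictured with a small sketch. It assumes the rule is "ignore `None` and `0` when updating a stored count", which is an interpretation of the implementation notes rather than the PR's actual code; `merge_usage` is a hypothetical helper.

```python
def merge_usage(current, chunk_value):
    """Keep the last non-empty value so a None/0 in a partial chunk never overwrites it."""
    return chunk_value if chunk_value else current


# Simulated thoughts_token_count values across streamed chunks (made-up data).
thoughts = None
for chunk_value in [None, 0, 42, None]:
    thoughts = merge_usage(thoughts, chunk_value)

print(thoughts)
```

The trailing `None` in the simulated stream leaves the stored count untouched, which is the behaviour the note describes.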