Turn on extended thinking for Anthropic and OpenAI with one switch, control depth and whether the trace is returned, and read the reasoning tokens back from every run.
Reasoning and thinking
Modern Anthropic and OpenAI models can spend extra tokens "thinking" before they answer. dendrux exposes this as a small, uniform set of controls that work the same way across both vendors, and it threads the resulting reasoning — token counts and, where the vendor returns it, the summary text — through the same run-result, persistence, streaming, and dashboard rails as everything else.
It is off by default. Every existing app keeps its exact behavior until you opt in. Everything below is read from the dendrux==0.2.0a4 source.
The controls
The reasoning knobs are constructor arguments on the provider, and every one of them can also be overridden per call via agent.run(...) / agent.stream(...) kwargs.
from dendrux import Agent
from dendrux.llm.anthropic import AnthropicProvider
agent = Agent(
    prompt="…",
    provider=AnthropicProvider(model="claude-opus-4-8", thinking=True, effort="high"),
)
# Per-call override: raise the effort for one hard question; other calls keep the provider defaults.
await agent.run("Prove this edge case is unreachable.", effort="max")
effort is the cross-vendor vocabulary. max is Anthropic-only; OpenAI additionally accepts none / minimal. Each provider normalizes the value to its own API field, so you write one word and it lands correctly on either vendor.
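The vendor split in the effort vocabulary can be pictured with a small validation sketch. This is not dendrux code: validate_effort and both lookup tables are hypothetical, the shared low / medium / high values are an assumption (the text above names only high, max, none, and minimal), and how dendrux itself treats an unsupported value is not stated here. The sketch simply raises.

```python
# Hypothetical sketch, not dendrux's implementation.
# Assumed shared vocabulary; only "high" appears in the text above.
SHARED_EFFORTS = {"low", "medium", "high"}

# Vendor-specific additions named in the text above.
VENDOR_EXTRAS = {
    "anthropic": {"max"},           # max is Anthropic-only
    "openai": {"none", "minimal"},  # OpenAI additionally accepts these
}


def validate_effort(vendor: str, effort: str) -> str:
    """Return effort unchanged if the vendor accepts it, else raise ValueError."""
    allowed = SHARED_EFFORTS | VENDOR_EXTRAS[vendor]
    if effort not in allowed:
        raise ValueError(
            f"effort={effort!r} is not valid for {vendor}; "
            f"expected one of {sorted(allowed)}"
        )
    return effort
```

Under these assumptions, validate_effort("anthropic", "max") passes, while the same value for "openai" raises.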
What each provider does
The switch is uniform; what the vendor returns is not. The key split: token counts are reported everywhere, summary text only where the API offers it.
Adaptive thinking is the only mode current Anthropic models (4.6–4.8 / Fable / Mythos) support; thinking_budget exists for older models and fixed-cost workloads. See Models and providers for picking between the two OpenAI providers.
Reading reasoning back
Reasoning rides the existing token-and-evidence rails, so it shows up everywhere usage already does.
On the run result. RunResult.usage.reasoning_tokens is the run total (None if the provider did not report it):
result = await agent.run("…")
print(result.usage.reasoning_tokens)  # e.g. 72
While streaming. When the vendor returns a trace, agent.stream(...) emits REASONING_DELTA events alongside the usual text deltas; the chunk text is on event.text.
from dendrux.types import RunEventType
async for event in agent.stream("…"):
    if event.type == RunEventType.REASONING_DELTA:
        print(event.text, end="")
From the store, after the fact. Per-call reasoning is on LLMCall and the run total is on RunDetail.
These are backed by three nullable columns added without breaking the schema — llm_interactions.reasoning_tokens, token_usage.reasoning_tokens, and agent_runs.total_reasoning_tokens (see State persistence). The dashboard surfaces the same numbers: a run-header total, a per-call stat, and the summary text in the payload inspector.
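Why a nullable column does not break existing rows can be shown with the standard-library sqlite3 module. This is a sketch, not the dendrux migration: the table is a simplified stand-in, the total_output_tokens column and the INTEGER type are assumptions, and only the agent_runs.total_reasoning_tokens name comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the table as it existed before reasoning support.
# total_output_tokens is a hypothetical column used only for illustration.
conn.execute(
    "CREATE TABLE agent_runs (id TEXT PRIMARY KEY, total_output_tokens INTEGER)"
)
conn.execute("INSERT INTO agent_runs VALUES ('old-run', 120)")

# A nullable column (no NOT NULL, no default) can be added in place.
conn.execute("ALTER TABLE agent_runs ADD COLUMN total_reasoning_tokens INTEGER")
conn.execute("INSERT INTO agent_runs VALUES ('new-run', 300, 72)")

# Rows written before the column existed read back as NULL (None in Python).
rows = conn.execute(
    "SELECT id, total_reasoning_tokens FROM agent_runs ORDER BY id"
).fetchall()
print(rows)
```

The old row keeps its data and reports None for the new column, which matches the "None if the provider did not report it" convention on the run result.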
Reasoning tokens and your budget
Reasoning tokens are billed inside output_tokens — the provider already counts them there. dendrux reports reasoning_tokens as an informational breakdown and never adds it on top of the output total, so a budget is not double-charged. Turning thinking on will raise output-token spend (that is the point); the budget sees that spend through the normal output count, with the reasoning portion broken out so you can see how much of it was thinking.
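The accounting rule above comes down to a few lines of arithmetic. The Usage class below is a hypothetical stand-in, not the dendrux type: only the output_tokens and reasoning_tokens names come from this page, and input_tokens and both helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """Hypothetical stand-in for a usage record."""
    input_tokens: int
    output_tokens: int                       # already includes reasoning tokens
    reasoning_tokens: Optional[int] = None   # informational breakdown, may be None


def billable_tokens(usage: Usage) -> int:
    # Reasoning is NOT added again: it is already inside output_tokens.
    return usage.input_tokens + usage.output_tokens


def non_reasoning_output(usage: Usage) -> int:
    # Portion of the output that was not thinking; None is treated as 0.
    return usage.output_tokens - (usage.reasoning_tokens or 0)


usage = Usage(input_tokens=50, output_tokens=200, reasoning_tokens=72)
print(billable_tokens(usage))       # 250, not 322
print(non_reasoning_output(usage))  # 128
```

Adding reasoning_tokens on top of output_tokens would give 322 here and charge the thinking portion twice.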
Backward compatibility
thinking defaults to False, the new persistence columns are nullable, and the cross-vendor effort values are opt-in — so an app that never sets these controls sends the exact same requests and stores the exact same rows it did before reasoning existed. A reasoning kwarg meant for one provider (e.g. thinking on the Chat Completions provider, which has no trace to return) is accepted and ignored rather than forwarded into a request that would break.
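The accept-and-ignore behavior can be sketched as a filter applied before a request is built. This is an illustrative sketch under stated assumptions, not the dendrux code path: build_request, REASONING_KWARGS, and the supported parameter are all hypothetical names.

```python
# Hypothetical names throughout; a sketch of "accepted and ignored".
REASONING_KWARGS = {"thinking", "effort", "thinking_budget"}


def build_request(base: dict, supported: set, **kwargs) -> dict:
    """Merge kwargs into a request, dropping reasoning kwargs the provider lacks."""
    request = dict(base)  # copy so the caller's dict is not mutated
    for key, value in kwargs.items():
        if key in REASONING_KWARGS and key not in supported:
            continue  # accepted by the caller-facing API, never forwarded
        request[key] = value
    return request


# A provider with no reasoning support silently drops `thinking`.
plain = build_request({"model": "m"}, supported=set(), thinking=True, temperature=0.2)
print(plain)
```

With supported={"thinking"} the same call would forward thinking=True; non-reasoning kwargs such as temperature pass through either way.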