
Reasoning Content Handling

Last verified: 2026-05-17

Reasoning-capable models (DeepSeek R1, GLM thinking, Qwen thinking, Kimi thinking, Magistral, gpt-oss, etc.) return their chain-of-thought separately from the final answer. Kai handles reasoning along two axes: wire-side (whether to echo the trace back to the provider on the next request) and display-side (whether to show it to the user in the chat UI). When a turn also contains tool_calls, some providers require the chain-of-thought to be echoed back to preserve reasoning continuity across the tool round-trip — and others strictly reject the same field. This page documents what each provider does, what Kai sends, and where we trade fidelity for simplicity.
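
For concreteness, an echoed turn on a reasoning_content-style provider looks roughly like the sketch below (built with kotlinx.serialization; the tool call, model behavior, and reasoning text are invented for illustration, not taken from a real exchange):

```kotlin
import kotlinx.serialization.json.*

// Roughly what the next request's assistant turn carries when the provider
// expects the chain-of-thought echoed back alongside the tool call it produced.
val echoedAssistantTurn = buildJsonObject {
    put("role", "assistant")
    put("content", "")  // no final answer yet; the model stopped to call a tool
    put("reasoning_content", "The user wants the weather, so call get_weather first...")
    putJsonArray("tool_calls") {
        addJsonObject {
            put("id", "call_0")  // illustrative id
            put("type", "function")
            putJsonObject("function") {
                put("name", "get_weather")
                put("arguments", """{"city":"Oslo"}""")
            }
        }
    }
}
```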

Why per-provider handling

OpenAI's chat-completions schema has no standardized place for prior-turn reasoning, so each provider invented their own. The result is three sources of variation:

  • Field name — reasoning_content (DeepSeek lineage), reasoning (OpenRouter/Cerebras lineage), reasoning_details[] (preserves thought signatures for Anthropic/Gemini via OpenRouter), or embedded <think>...</think> tags inside content (MiniMax M2 native, older Magistral).
  • Whether echoing back is required — some providers reconstruct internal state from it (Z.AI Coding Plan, OpenCode Zen via DeepSeek route, Kimi k2.6 with thinking.keep="all"); others accept it as a no-op; others reject any unknown field outright.
  • Paired flags — preservation sometimes requires an additional request flag (Z.AI clear_thinking: false, Fireworks reasoning_history: "preserved").
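
As a sketch of where those paired flags sit in the request body (Kai does not currently send either flag; model ids are illustrative, flag names are from the provider docs cited in the matrix below):

```kotlin
import kotlinx.serialization.json.buildJsonObject
import kotlinx.serialization.json.put
import kotlinx.serialization.json.putJsonArray

// Z.AI standard endpoint: opt in to Preserved Thinking alongside the usual chat fields.
val zaiStyleRequest = buildJsonObject {
    put("model", "glm-4.7")                   // illustrative model id
    putJsonArray("messages") { /* chat turns */ }
    put("clear_thinking", false)              // keep Preserved Thinking active
}

// Fireworks: ask for reasoning to be preserved across turns.
val fireworksStyleRequest = buildJsonObject {
    put("model", "accounts/fireworks/models/deepseek-r1")  // illustrative model id
    putJsonArray("messages") { /* chat turns */ }
    put("reasoning_history", "preserved")
}
```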

Outgoing assistant-message behavior matrix

Behavior of each provider when an assistant-role message with prior tool_calls carries a reasoning_content field.

| Provider | Status | Wire field expected | Notes | Source |
| --- | --- | --- | --- | --- |
| Groq | Rejected | n/a — strip on send | Strict schema validator: 400 ('messages.N': property 'reasoning_content' is unsupported) on any unknown assistant-message field | console.groq.com/docs/reasoning |
| DeepSeek | Rejected | n/a — strip on send | Reasoner model rejects reasoning_content on input and does not support function calling at all | api-docs.deepseek.com/guides/reasoning_model |
| Cerebras | Rejected (wrong field) | reasoning (not reasoning_content) | GLM-4.7 multi-step returns 400 when sent reasoning_content; expects reasoning. Kai sends neither (mode = NONE) | inference-docs.cerebras.ai/api-reference/chat-completions |
| Z.AI Coding Plan | Required | reasoning_content | Preserved Thinking is on by default on /api/coding/paas/v4; dropping the field breaks reasoning coherence | docs.z.ai/guides/capabilities/thinking-mode |
| OpenCode Zen (DeepSeek route) | Required | reasoning_content | Pass-through gateway. When routing to DeepSeek-V4 Pro thinking, once any assistant message carries reasoning, all subsequent assistant messages must include it | opencode.ai/docs/zen/ |
| Moonshot / Kimi | Required for kimi-k2.6 with thinking.keep="all" | reasoning_content | Only that specific model + flag combination requires the echo. Other Kimi thinking models accept it as a no-op. Kai does not currently set thinking.keep | platform.kimi.com/docs/api/chat |
| Fireworks AI | Accepted (documented) | reasoning_content | Officially supported field on ChatMessage. Full preservation also requires reasoning_history: "preserved" on the request, which Kai does not set | docs.fireworks.ai/api-reference/post-chatcompletions |
| Z.AI standard | Accepted (documented, inert without flag) | reasoning_content | Preserved Thinking is opt-in on /api/paas/v4; without clear_thinking: false (which Kai does not send) the echo is ignored | docs.z.ai/guides/capabilities/thinking-mode |
| OpenRouter | Accepted (alias) | reasoning (canonical); reasoning_content is a documented alias | Anthropic/Gemini-via-OpenRouter need reasoning_details[] with thought signatures, which Kai does not send | openrouter.ai/docs/guides/best-practices/reasoning-tokens |
| LongCat | Tolerated (undocumented) | none documented | Schema documents role + content only; the field is passed through silently | longcat.chat/platform/docs/APIDocs.html |
| Venice AI | Tolerated (undocumented) | none documented | Pass-through policy: "Request fields not listed may be passed through but are not validated" | docs.venice.ai |
| MiniMax M2 | Tolerated but wrong mechanism | <think>...</think> inside content (native) or reasoning_details (split) | Top-level reasoning_content is undocumented and likely ignored | platform.minimax.io/docs/guides/text-m2-function-call |
| xAI, NVIDIA, Mistral, Ollama Cloud, Together, HuggingFace, DeepInfra, AIHubMix, Public AI, OpenAI, Free, OpenAI-Compatible API | Accepted (silent ignore) | n/a | Either documented to accept the field or permissive OpenAI-compatible proxies that drop unknown fields | per-provider docs |
| Anthropic, Gemini, LiteRT | Out of scope | n/a | These use entirely separate request DTOs (Anthropic Messages API, Gemini Generative Language API, on-device LiteRT); reasoning is handled inside those code paths, not via reasoning_content | n/a |

What Kai does today

Kai gates the field on Service.reasoningRequestMode (NONE or REASONING_CONTENT). When REASONING_CONTENT is set and the prior assistant turn carried tool_calls, Kai emits the field on the next request.
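
A simplified sketch of that gate (invented types; the real mapper is History.toGroqMessageDto() in ChatUiState.kt and its types differ):

```kotlin
// Simplified sketch of the wire-side gate.
enum class ReasoningRequestMode { NONE, REASONING_CONTENT }

data class AssistantTurn(
    val content: String?,
    val reasoningContent: String?,  // trace captured from the provider's response
    val toolCalls: List<String>?,   // stand-in for the real tool-call DTOs
)

// Emit the echo only for opted-in services, and only when the turn carried tool calls.
fun AssistantTurn.wireReasoningContent(mode: ReasoningRequestMode): String? =
    reasoningContent.takeIf {
        mode == ReasoningRequestMode.REASONING_CONTENT && !toolCalls.isNullOrEmpty()
    }
```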

Services currently set to REASONING_CONTENT: OpenRouter, LongCat, Venice, Moonshot, Z.AI, Z.AI Coding Plan, MiniMax, Fireworks, OpenCode.

All other services use the default NONE (the field is stripped on send). This is the safe default — any service we don't yet have evidence about will not regress.
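
As a sketch of the per-service assignment (reusing the ReasoningRequestMode sketch above; the real enum lives in data/Service.kt and the constant names here are guesses):

```kotlin
// Sketch only; the real Service enum has many more members.
enum class Service {
    OPENROUTER, LONGCAT, VENICE, MOONSHOT, ZAI, ZAI_CODING_PLAN,
    MINIMAX, FIREWORKS, OPENCODE, GROQ, DEEPSEEK, CEREBRAS, // ...
}

val Service.reasoningRequestMode: ReasoningRequestMode
    get() = when (this) {
        Service.OPENROUTER, Service.LONGCAT, Service.VENICE, Service.MOONSHOT,
        Service.ZAI, Service.ZAI_CODING_PLAN, Service.MINIMAX,
        Service.FIREWORKS, Service.OPENCODE -> ReasoningRequestMode.REASONING_CONTENT
        else -> ReasoningRequestMode.NONE
    }
```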

The chain-of-thought is preserved on History.reasoningContent regardless of the wire-side decision so the UI can render thinking traces independently of what gets transmitted on the next request. This applies to assistant turns received over the OpenAI-compatible path; the Anthropic and Gemini paths have their own thinking handling and do not currently populate this field. Capture happens going forward — conversations saved before the persistence support was added will not retroactively gain reasoning content on reload.
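
That capture relies on normalizing the two response-side field-name lineages into one value. A rough sketch (the real DTO is OpenAICompatibleChatResponseDto and has many more fields; which lineage wins when both arrive is an assumption here):

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable

@Serializable
data class AssistantMessageDto(
    val role: String = "assistant",
    val content: String? = null,
    @SerialName("reasoning_content") val reasoningContent: String? = null,
    val reasoning: String? = null,
) {
    // Collapse both lineages into one value for history and the UI.
    val effectiveReasoning: String? get() = reasoningContent ?: reasoning
}
```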

Display in chat UI

The chain-of-thought is always rendered when present. Each assistant bubble with reasoning content prepends a collapsible "Thinking" section above the answer: collapsed by default with the first line of the most recent reasoning segment as a preview, and expanding it reveals the full trace in a dim blockquote. Thinking-only turns (where the model returned reasoning but no answer, typically as a precursor to a tool call) surface as standalone reasoning bubbles while in flight; once the answer arrives, they are absorbed into the answer's grouped section, so a multi-tool response shows a single "Thinking" disclosure rather than several.
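
A minimal Compose sketch of that disclosure behavior (composable and parameter names are invented; the real rendering lives in BotMessage.kt):

```kotlin
import androidx.compose.foundation.clickable
import androidx.compose.foundation.layout.Column
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Text
import androidx.compose.runtime.*
import androidx.compose.ui.Modifier

// Shows only the collapsed-preview / expanded-trace toggle.
@Composable
fun ThinkingDisclosure(reasoning: String) {
    var expanded by remember { mutableStateOf(false) }
    Column(Modifier.clickable { expanded = !expanded }) {
        Text("Thinking", style = MaterialTheme.typography.labelMedium)
        if (expanded) {
            // Full trace, dimmed relative to the answer text.
            Text(reasoning, color = MaterialTheme.colorScheme.onSurfaceVariant)
        } else {
            // Collapsed: first line of the trace as a one-line preview.
            Text(
                reasoning.lineSequence().firstOrNull().orEmpty(),
                maxLines = 1,
                color = MaterialTheme.colorScheme.onSurfaceVariant,
            )
        }
    }
}
```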

Reasoning is only visible on messages captured after persistence support landed, and only on the OpenAI-compatible path. Old conversations saved before persistence appear unchanged.

Known gaps

Documented here so future work has a starting point. None of these are bugs today; they are fidelity improvements:

  • OpenRouter reasoning_details[] — needed to preserve thought signatures for Anthropic and Gemini models routed via OpenRouter. Without them, those models lose continuity across tool calls.
  • MiniMax M2 native mode — should embed <think>...</think> inside content rather than (or in addition to) sending a top-level field; currently the field is silently ignored.
  • MiniMax M2 split mode — alternative path uses reasoning_details instead.
  • Z.AI standard clear_thinking: false — without this paired request flag, Preserved Thinking is off and our echo is a no-op.
  • Fireworks reasoning_history: "preserved" — without this paired request flag, Fireworks does not actually preserve reasoning across turns.
  • Cerebras reasoning field — Cerebras uses reasoning, not reasoning_content. Mode is currently NONE because we have no reasoning field on the request DTO. Adding it would unlock GLM-4.7 multi-step on Cerebras.
  • Per-model dispatch — reasoningRequestMode is per-service. Moonshot is set to REASONING_CONTENT because of Kimi k2.6, but the flag is inert for older Kimi thinking models. Per-model precision would tighten this.

Adding any of these means widening ReasoningRequestMode (new enum values), adding fields to OpenAICompatibleChatRequestDto, adding paired-flag plumbing on the request side, or moving to a per-model handler. None of these have been done because the current binary dispatch covers all known live-broken cases.
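
For orientation, one possible shape of a widened mode enum; none of the new values exist today and the names are invented:

```kotlin
// Hypothetical widening — only NONE and REASONING_CONTENT exist in Service.kt today.
enum class ReasoningRequestMode {
    NONE,                   // strip reasoning on send (current default)
    REASONING_CONTENT,      // echo reasoning_content (current opt-in)
    REASONING,              // top-level `reasoning` (Cerebras, OpenRouter canonical)
    REASONING_DETAILS,      // OpenRouter reasoning_details[] with thought signatures
    THINK_TAGS_IN_CONTENT,  // MiniMax M2 native <think>...</think> embedded in content
}
```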

Key Files

| File | Purpose |
| --- | --- |
| composeApp/src/commonMain/.../data/Service.kt | ReasoningRequestMode enum + per-service mode assignment |
| composeApp/src/commonMain/.../ui/chat/ChatUiState.kt | History.toGroqMessageDto() — gates emission of reasoning_content based on mode |
| composeApp/src/commonMain/.../data/RemoteDataRepository.kt | buildOpenAIMessages() — passes Service.reasoningRequestMode into the DTO mapper |
| composeApp/src/commonMain/.../network/dtos/openaicompatible/OpenAICompatibleChatRequestDto.kt | Request DTO with @SerialName("reasoning_content") on assistant messages |
| composeApp/src/commonMain/.../network/dtos/openaicompatible/OpenAICompatibleChatResponseDto.kt | Response DTO; reads reasoning_content and reasoning and normalizes to effectiveReasoning |
| composeApp/src/commonTest/.../ui/chat/ToGroqMessageDtoReasoningTest.kt | Guards the per-mode emission behavior |
| composeApp/src/commonMain/.../data/Conversation.kt | Conversation.Message.reasoningContent — persisted reasoning trace for round-tripping across app restarts |
| composeApp/src/commonMain/.../ui/chat/composables/BotMessage.kt | Renders the dim-blockquote reasoning section above the answer when reasoningContent is supplied |
| composeApp/src/commonMain/.../ui/chat/ChatScreen.kt | Groups all reasoning segments in a response under the answer-bearing assistant message; renders standalone thinking-only bubbles for in-flight turns |