Multi-Service

Last verified: 2026-05-14

Kai supports 26 LLM providers (plus a built-in Free tier). Each provider uses one of three API formats: OpenAI-compatible (most services), Gemini native, or Anthropic native -- plus LiteRT on-device for local inference. Users can configure multiple service instances, reorder them, and Kai automatically falls back through the chain on failure.

Concepts

Service

A supported LLM provider. Each service is defined by:

A unique name and ID
Whether it requires an API key
Which API format it uses (OpenAI-compatible, Gemini native, or Anthropic native)
Links to the provider's API-key management page

Service Instance

A configured connection to a service. Users can add multiple instances of the same service (e.g. two OpenAI accounts with different keys). Each instance stores its own:

API key
Selected model
Base URL (relevant for the OpenAI-Compatible API service)

Free Tier

A built-in service that requires no API key. Free is never shown in the service picker — it is used as:

The sole service when no other services are configured
A last-resort fallback when "Use as fallback" is enabled (default)

Fallback Chain

Configured instances are tried in the order the user arranged them
Only instances with valid API keys are considered
If no instances are configured, the Free tier is used as the only service
If instances exist and "Use as fallback" is enabled (default), the Free tier is appended as the last resort
Each service attempt retries up to 2 times with increasing delays before moving to the next service in the chain
On failure, the next instance in the chain is tried; if all fail, the last error is shown
If a fallback succeeds, the response indicates which service answered
While the chain is being walked, the thinking indicator shows per-attempt status — the name of the service currently being tried, or the reason the previous one failed before moving on — so silent fallbacks are visible to the user
Entries whose context window can't fit the current chat history are skipped during the walk
On-device (Local Model) failures are not silently absorbed — they short-circuit the fallback chain so the user sees the actual error rather than being quietly bumped to a cloud service
Certain non-retryable errors (notably Anthropic's "insufficient credits" and quota-exhausted responses from OpenAI-compatible providers) bypass both the per-service retry loop and the fallback chain — the error surfaces immediately

API Formats

Most services use the OpenAI-compatible chat completions format. Gemini uses Google's native Generative Language API. Anthropic uses its own Messages API with x-api-key header authentication and a different request/response structure. LiteRT runs inference on-device using Google's LiteRT LM SDK -- no HTTP, no API key, fully offline.

The OpenAI-Compatible API service supports a custom base URL, defaulting to localhost:11434/v1 for local Ollama setups. The base URL should include the version path segment (e.g., http://localhost:11434/v1 or https://my-provider.com/api/v1), following the OpenAI SDK convention. Kai appends only /chat/completions or /models to this base URL.

Supported Services

Service	`id`	Requires API Key	API Type
Free	`free`	No	OpenAI-compatible
Gemini	`gemini`	Yes	Gemini native
Anthropic	`anthropic`	Yes	Anthropic native
OpenAI	`openai`	Yes	OpenAI-compatible
DeepSeek	`deepseek`	Yes	OpenAI-compatible
Mistral	`mistral`	Yes	OpenAI-compatible
xAI	`xai`	Yes	OpenAI-compatible
OpenRouter	`openrouter`	Yes	OpenAI-compatible
GroqCloud	`groqcloud`	Yes	OpenAI-compatible
NVIDIA	`nvidia`	Yes	OpenAI-compatible
Cerebras	`cerebras`	Yes	OpenAI-compatible
Ollama Cloud	`ollamacloud`	Yes	OpenAI-compatible
LongCat	`longcat`	Yes	OpenAI-compatible (ships with a curated default model list; also exposes a `/models` endpoint used during validation)
Together AI	`together`	Yes	OpenAI-compatible
Hugging Face	`huggingface`	Yes	OpenAI-compatible
Venice AI	`venice`	Yes	OpenAI-compatible
Moonshot AI	`moonshot`	Yes	OpenAI-compatible
Z.AI	`zai`	Yes	OpenAI-compatible
Z.AI Coding Plan	`zai-coding-plan`	Yes	OpenAI-compatible
MiniMax	`minimax`	Yes	OpenAI-compatible
AIHubMix	`aihubmix`	Yes	OpenAI-compatible
Deep Infra	`deepinfra`	Yes	OpenAI-compatible
Fireworks AI	`fireworksai`	Yes	OpenAI-compatible
OpenCode	`opencode`	Yes	OpenAI-compatible
Public AI	`publicai`	Yes	OpenAI-compatible
OpenAI-Compatible API	`openai-compatible`	No (optional)	OpenAI-compatible
Local Model	`litert`	No	On-device (LiteRT LM)

Connection Validation

When the user enters or changes an API key (or base URL), the app validates the connection after an 800 ms debounce and shows a status indicator: checking, connected, invalid key, quota exhausted, rate limited, or connection failed. Validation also runs for all services when the settings screen opens. Services validate by fetching their model list — Gemini, Anthropic, and OpenAI-compatible services (including LongCat) each call their respective models endpoint. On a successful connection, the available model list is refreshed.

Model Selection

When a connection is validated and models are fetched, the app auto-selects a model if none is chosen — first checking for a per-service default model (e.g. LongCat defaults to "LongCat-Flash-Lite"), then preferring "kimi-k2.5" if available, otherwise the first model in the list. Services filter their model lists: - OpenAI shows only chat-oriented models (prefix filter) - GroqCloud shows only models marked as active - Together AI filters by type == "chat" to exclude non-chat models (embedding, code, etc.) - Other services show all non-retired models

Model Cards

The model picker modal shows each candidate as a card with consistent metadata regardless of provider:

Title (top left) — a human-readable display name from the curated catalog or the provider's API; falls back to the raw model id only when no display name is available
Arena score (top right) — LMArena Elo rating as colored text, gradient from green (>= 1400) through lime/yellow to orange (< 1250)
Detail line (below the title) — release date, parameter count, and context window joined into a single muted line separated by · (e.g. Mar 2025 · 70B · 200K ctx); any missing field is simply omitted from the line

The card representing the currently selected model is highlighted with a filled accent background so users can identify their current choice at a glance when reopening the picker.

The modal includes sort chips (Date, Score, Ctx) below the search field. Tapping a chip switches which field is active; all sorts are descending (highest or most recent first), with no ascending option. Default sort is by score.

Context window and release date come from two sources, merged by the mapping layer: a bundled curated catalog of well-known models, and whatever the provider's own models endpoint returns (e.g. OpenAI-compat context_window and created, Anthropic created_at). The curated catalog wins; provider-supplied values are used only as a fallback when the catalog has no entry for that model. The catalog is hand-maintained to correct inconsistencies in what providers report. Models not present in the catalog still render — they just use whatever the API provided, and any unknown fields are hidden.

Chat Screen Service Toggle

The chat screen builds an "interactive services" list that includes two distinct Free entries — Free FAST and Free EXPERT — followed by every configured non-Free service whose currently selected model supports tool use (agentic flows). When this list contains more than one entry, a circular service icon button appears to the right of the chat input, next to the send/stop button. Because the two Free modes alone already count as two entries, the toggle can appear even when no non-Free services are configured. The icon represents the current primary service (each service has its own simplified vector icon). Tapping it opens a dropdown listing the eligible services with their icons, names, and model IDs; the current primary is highlighted with a primary container background.

Selecting a non-Free entry reorders the configured list so the chosen service becomes first (primary). Selecting a Free entry (FAST or EXPERT) instead flips an "is Free primary" flag and records the chosen Free mode — Free can be promoted to primary independently of the configured fallback order, without rearranging the non-Free chain. The fallback chain picks up the new state automatically. The fallback walker also skips any entry whose context window can't fit the current chat history, so very long conversations may transparently move past services that would otherwise be eligible.

Attachments

Image attachments are broadly supported across cloud services. PDF attachments are advertised only by services with native document support — currently Anthropic, Gemini, OpenAI, and OpenRouter — and only those services accept PDFs in a chat turn. The on-device Local Model hides file attachment affordances entirely; users running purely locally don't see attachment buttons.

Settings UI

Users manage services through the settings screen: - Add — pick from the list of available services (can add the same service multiple times); the OpenAI-Compatible API and the on-device Local Model are pinned to the top of the picker, with the remaining providers sorted alphabetically - Remove — delete an instance and its stored credentials; deletion is deferred with a snackbar "Undo" option (~4 seconds) before the service is permanently removed - Reorder — drag to change priority (first = primary, rest = fallbacks) - Configure — per-instance API key, model selection, base URL (OpenAI-Compatible only) - Free fallback toggle — controls whether Free is appended as last resort

Key Files

File	Purpose
`composeApp/src/commonMain/.../data/Service.kt`	Service definitions, all provider metadata
`composeApp/src/commonMain/.../data/ModelCatalog.kt`	Curated context window / release date for well-known models
`composeApp/src/commonMain/.../data/ModelTransformations.kt`	Maps provider model DTOs to `SettingsModel`, merges with catalog
`composeApp/src/commonMain/.../data/AppSettings.kt`	Service instance storage, credential persistence, migration
`composeApp/src/commonMain/.../data/RemoteDataRepository.kt`	Fallback chain, request orchestration
`composeApp/src/commonMain/.../network/Requests.kt`	HTTP clients for all three API formats
`composeApp/src/commonMain/.../network/dtos/anthropic/`	Anthropic Messages API DTOs
`composeApp/src/commonMain/.../ui/settings/SettingsViewModel.kt`	Connection validation, service management UI logic
`composeApp/src/commonMain/.../ui/chat/ChatScreen.kt`	Chat screen, renders ServiceSelector
`composeApp/src/commonMain/.../ui/chat/composables/ServiceSelector.kt`	Compact service toggle dropdown
`composeApp/src/commonMain/.../ui/chat/ChatViewModel.kt`	Wires service selection and reordering