Recommended Models
Which AI models work best with Continuum Writer, by use case and performance.
AI models evolve quickly. The recommendations below were accurate at the time of writing, but new models are released regularly. Check with your provider for the latest available models — the best option today may have a better successor tomorrow.
Understanding the Three Model Roles
Continuum Writer uses three separate AI models, each with a different job. You can configure these independently in Settings.
- Chat Model — The main model you interact with. It answers questions, suggests edits, creates content, and uses tools to search your project. This should be your most capable model, and ideally one that supports tool use (though tool use can be disabled in Settings).
- Router Model — Classifies the intent behind each message so the app knows how to handle it. This runs behind the scenes and doesn't need to be a large model.
- Summarise Model — Handles entity extraction and summarisation tasks (e.g. identifying characters and locations during import). Again, a smaller model works fine here.
For cloud providers, you can use the same model for all three roles if you prefer simplicity. For local models, using a smaller model for the Router and Summarise roles keeps things fast.
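As an illustration, the three roles might map to models like this. This is a hypothetical sketch only — Continuum Writer is configured through its Settings screen, and these key names are assumptions, not the app's actual configuration format:

```yaml
# Hypothetical role-to-model mapping (illustrative only; set these in Settings)
chat_model: gpt-4o           # most capable; ideally supports tool use for project search
router_model: gpt-4o-mini    # small, fast model for intent classification
summarise_model: gpt-4o-mini # small model for entity extraction and summarisation
```

The same pattern applies to local setups — for example, a larger model for the chat role and a small one shared by the Router and Summarise roles.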
Cloud Providers (Best Experience)
Cloud models give the most reliable results, especially for longer projects, tool use, and consistent writing quality.
OpenAI
- Chat: GPT-4o or the latest flagship model — best overall quality for understanding story context, suggesting edits, and creative writing assistance.
- Router / Summarise: GPT-4o-mini or equivalent — fast and affordable, more than capable for classification and extraction tasks.
Anthropic
- Chat: Claude Sonnet (latest version) — excellent at creative and nuanced writing tasks, particularly good at maintaining character voice.
- Router / Summarise: Claude Haiku (latest version) — fast and affordable, good for quick classification and extraction.
Local Models (Via Ollama)
Running models locally gives you full control, privacy, and no per-message cost — but performance depends heavily on your hardware.
Run Locally
For the Chat Model, use a capable model that supports tool use. The size you can run depends on your hardware:
- 16GB RAM: `qwen3.5:9b` or `gemma3:12b`
- 32GB+ RAM: `qwen3.5:27b`, `qwen3.6:35b`, or `gemma4:27b`
For the Router and Summarise Models, a smaller model works well — it doesn't need tool support or thinking capabilities:
- `qwen3.5:4b` — fast and lightweight, good enough for classification and extraction
Ollama Cloud Models
Ollama also offers cloud-hosted models that don't require powerful hardware. These are free or very low cost, and can be a good option if your machine struggles with larger local models:
- Kimi K2 — strong general-purpose model, good for the chat role
- DeepSeek — capable reasoning model, good for the chat role
- Gemma 4 — good balance of quality and speed
How to install a model via Ollama
To install a model, open Terminal and run:

```shell
ollama pull [model name]
```

For example, to install the `qwen3.5:9b` model, run:

```shell
ollama pull qwen3.5:9b
```