Update Jul 16, 2026 tracked by Updatify

What’s Changed

Improved Gemma 4 tool calling and multi-turn reasoning, including more reliable tool-response continuations
Fixed a recurrent MLX model cache leak that could increase memory use across requests, and improved cache snapshot performance
MLX text model loading now respects OLLAMA_LOAD_TIMEOUT
Agent web search and fetch now tell users to run ollama signin when authentication is required
The interactive agent now receives the current working directory for better project context
Fixed ollama launch so choosing Pick another model for a deprecated model passed with --model opens the model picker
Updated VS Code setup documentation for the official Ollama extension

Full Changelog: https://github.com/ollama/ollama/compare/v0.32.0…v0.32.1-rc0

Update Jul 11, 2026 tracked by Updatify

What’s Changed

New interactive agent experience: running ollama now launches an agent to help you code and delegate work

❯ ollama
Ollama 0.32.0

▸ Chat, Code, & Work (glm-5.2:cloud)
Chat with models, code, search the web, and delegate real work

Renamed the Codex App integration to ChatGPT: use ollama launch chatgpt (and –restore to return to your usual ChatGPT profile)
Simplified integration selection: the ollama launch menu now only offers the most popular integrations (other integrations can be accessed through ollama launch
Warns before launching older agent models: CodeLlama, Qwen2.5(-coder), Llama 3.x, Mistral, StarCoder, and the base DeepSeek-R1 tags now prompt a deprecation warning before ollama launch continues

Full Changelog: https://github.com/ollama/ollama/compare/v0.31.2…v0.32.0

Original source ↗

Update Jul 6, 2026 tracked by Updatify

What’s Changed

Enabled flash attention on older NVIDIA GPUs (compute capability 6.x)
iGPU can now offload vision models with padding to fit available memory
Fixed structured output for thinking models when thinking is disabled
Hardened GGUF model creation
ollama launch for Claude Code now disables telemetry by default
Fixed loading models on paths with non-UTF-8 characters
Updated the MLX and llama.cpp engines

New Contributors

@kevinpark1217 made their first contribution in https://github.com/ollama/ollama/pull/16949

Full Changelog: https://github.com/ollama/ollama/compare/v0.31.1…v0.31.2

Original source ↗

Update Jun 30, 2026 tracked by Updatify

Faster Gemma 4 on Apple Silicon

Gemma 4 is now significantly faster in Ollama on Apple Silicon, generating tokens nearly 90% faster on average across a coding-agent benchmark by leveraging multi-token prediction (MTP). Ollama auto-tunes how many tokens to draft as it runs, so the speedup is on by default, requires no configuration, and does not change the model’s output.

What’s Changed

Tightened Gemma 4 MoE model loading in the MLX engine
Updated the MLX engine to the latest version, including a new small-batch matmul kernel
Updated the underlying llama.cpp engine to build 9840
Improved Gemma 4 multi-token prediction (MTP) performance

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.12…v0.31.1

Original source ↗

Update Jun 30, 2026 tracked by Updatify

v0.31.1

What’s Changed

mlx: tighten up gemma4 moe loading code by @pdevine in https://github.com/ollama/ollama/pull/16964
mlx: bump to latest version to include new small batch matmul kernel @jessegross @dhiltgen
llama.cpp: bump to b9840 @dhiltgen
improved gemma4 MTP performance @jessegross

Full Changelog: https://github.com/ollama/ollama/compare/v0.31.0…v0.31.1

Original source ↗

Update Jun 25, 2026 tracked by Updatify

v0.30.11

What’s Changed

launch: add thinking capability detection to opencode by @hoyyeva in https://github.com/ollama/ollama/pull/15434
launch: auto-install Claude Code by @hoyyeva in https://github.com/ollama/ollama/pull/16802
launch: auto-install opencode when missing by @hoyyeva in https://github.com/ollama/ollama/pull/16806
discover: fix inverted iGPU/dGPU Vulkan classification on Windows hybrid graphics by @Sahil170595 in https://github.com/ollama/ollama/pull/16669
mlxrunner: unify and tune speculative decoding by @jessegross in https://github.com/ollama/ollama/pull/16791
launch/codex: detect model drift when Codex App UI switches by @BruceMacD in https://github.com/ollama/ollama/pull/16864
llama: add sm_86 architecture to cuda_v13_windows preset by @anishesg in https://github.com/ollama/ollama/pull/16834
llm: size mmproj offload by projector memory by @dhiltgen in https://github.com/ollama/ollama/pull/16866
docs: document max think level by @ParthSareen in https://github.com/ollama/ollama/pull/16877
llm: preserve generation headroom for shifted prompts by @ParthSareen in https://github.com/ollama/ollama/pull/16856
llama: default qwen2.5vl window attention metadata by @dhiltgen in https://github.com/ollama/ollama/pull/16868
llm: use host Vulkan loader on Windows by @dhiltgen in https://github.com/ollama/ollama/pull/16869
mlx: update and fix CUDA JIT packaging by @dhiltgen in https://github.com/ollama/ollama/pull/16871
llm: fix ollama ps double-counting mmap’d weights on partial offload by @discobot in https://github.com/ollama/ollama/pull/16709
docs: redesign docs landing and integrations overview by @hoyyeva in https://github.com/ollama/ollama/pull/16807
server: align generate with native chat templates by @dhiltgen in https://github.com/ollama/ollama/pull/16878
jetson: add CC 87 for CUDA v13 by @dhiltgen in https://github.com/ollama/ollama/pull/16628
llama.cpp version update by @dhiltgen in https://github.com/ollama/ollama/pull/16548

New Contributors

@Sahil170595 made their first contribution in https://github.com/ollama/ollama/pull/16669
@anishesg made their first contribution in https://github.com/ollama/ollama/pull/16834
@discobot made their first contribution in https://github.com/ollama/ollama/pull/16709

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.10…v0.30.11-rc0

Original source ↗

Update Jun 17, 2026 tracked by Updatify

v0.30.10

What’s Changed

Command A and North family models now run on Apple Silicon with the MLX engine
Updated the underlying llama.cpp engine to build 9672
Fixed build artifacts for MLX

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.9…v0.30.10

Original source ↗

Update Jun 15, 2026 tracked by Updatify

v0.30.9

What’s Changed

Support for Cohere2Moe architecture
Fixed LFM2 parser/render for cases where thinking was not emitted
Fixed issue where ollama launch claude and other coding agent or assistant use cases would only output one token
Ollama will now return an error if a single message is larger than the current context window

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.8…v0.30.9-rc1

Original source ↗

Update Jun 12, 2026 tracked by Updatify

v0.30.8

What’s Changed

Fixed ollama launch selecting the wrong provider in some cases
Improved prompt caching by decoupling it from context shift for better KV cache reuse
More stable MLX inference with hardened linear and embedding layers
MLX runner now creates snapshots during prompt processing and speculative decoding for improved reliability
Improved recurrent model support with per-boundary states from the gated-delta kernels

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.7…v0.30.8

Original source ↗

Update Jun 7, 2026 tracked by Updatify

v0.30.7

Ollama Launch now supports Hermes Desktop, a native desktop interface for the Hermes agent. Run it alongside your Hermes agent to get a visual interface for managing conversations, integrations, and messaging apps.

ollama launch hermes-desktop

What’s Changed

Hermes Desktop is now available via ollama launch hermes-desktop with native Windows configuration path support
OpenAI-compatible API models list now aligns with available model tags
Added documentation describing the llama.cpp update process
Updated Zod schema examples to use the native toJSONSchema helper

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.6…v0.30.7

Original source ↗

Update Jun 5, 2026 tracked by Updatify

v0.30.6

New models

Gemma 4 QAT weights: the Gemma 4 family is now optimized with Quantization-Aware Training (QAT) to dramatically reduce memory requirements and maximize on-device performance. Look for the tags ending in -qat:
- gemma4:e2b-it-qat
- gemma4:e4b-it-qat
- gemma4:12b-it-qat
- gemma4:26b-a4b-it-qat
- gemma4:31b-it-qat

What’s Changed

ollama launch omp now integrates with Oh My Pi, an AI coding agent with IDE integration
MLX embedding layers now use NVFP4 global scale for improved quantization on Apple Silicon

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.5…v0.30.6

Original source ↗

Update Jun 4, 2026 tracked by Updatify

v0.30.5

What’s Changed

Fixed the gemma4:12b floating point exception crash on x86, CUDA, Linux, and Windows systems.
ollama launch hermes-desktop now launches Hermes Desktop and can skip rebuilding when a packaged desktop app is already installed.
ollama launch hermes now supports native Windows installs through the Hermes PowerShell installer.
Added Cline CLI integration docs.

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.4…v0.30.5

Original source ↗

Update Jun 3, 2026 tracked by Updatify

v0.30.4

New models

Nemotron-3-Ultra: NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.

What’s Changed

Fixed multimodal models not using GPU on the llama.cpp backend can now use Metal GPU offload on Apple Silicon, improving multimodal performance on supported Macs.
ollama create --experimental now respects REQUIRES in Modelfiles for MLX-based models.
ollama launch codex now cleans up old conflicting Codex profile config before launching.
ollama launch pi now migrates users from the legacy Pi package to the official package and preserves the correct npm install prefix.
Pi web search setup now updates only when a newer package is available.
Windows cleanup now terminates the llama.cpp backend more reliably.
Updated the llama.cpp backend.

Known Issues

gemma4:12b crashes with floating point exception

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.3…v0.30.4

Original source ↗

Update Jun 3, 2026 tracked by Updatify

v0.30.3

New models

Gemma 4 12B: high-performance multimodal intelligence that runs directly on laptops, combining efficiency with advanced reasoning.

What’s Changed

Added support for gemma4:12b.

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.2…v0.30.3

Original source ↗

Update Jun 3, 2026 tracked by Updatify

v0.30.2

What’s Changed

ollama launch now supports Qwen Code and can guide users through installing the Cline CLI when it is missing.
ollama launch codex now uses an isolated launch configuration, avoiding conflicts with a user’s existing Codex settings.
Added llama.cpp backend compatibility support for Poolside’s Laguna architecture.
The llama.cpp backend now includes cached prompt tokens in token accounting, improving usage reporting for requests with prompt cache hits.
The llama.cpp backend now ignores SSE ping comments, improving streaming compatibility with newer backend behavior.
The llama.cpp backend now detects load stalls from server output so failed model loads surface more reliably instead of hanging.
Radeon 8060S integrated GPUs are now allowed by default.
Template details are included in logs to make troubleshooting model prompts easier.
Added Hermes Desktop configuration docs.
Fixed a build issue in the Laguna compatibility patch, restoring Laguna support in release builds.

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.0…v0.30.2

Original source ↗

Update May 14, 2026 tracked by Updatify

v0.24.0

Codex App

Ollama 0.24 includes support for the Codex App, OpenAI’s desktop experience for working on Codex threads in parallel with built-in worktree support and git functionality.

ollama launch codex-app

Built-in browser

Codex can load local servers and sites in its built-in browser, enabling you to directly annotate on the page to request changes.

Review mode

Review code inside the app, leave comments, and iterate without leaving your workspace.

Choosing a model

For difficult coding and agentic tasks:

kimi-k2.6 (with vision support)
glm-5.1

For local use without an Ollama Cloud subscription:

nemotron-3-super
gemma4:31b
qwen3.6

Restore anytime

To restore the previous configuration of Codex App, run:

ollama launch codex-app --restore

What’s Changed

Reworked the MLX sampler for improved generation quality on Apple Silicon

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.0…v0.24.0

Original source ↗

Update May 13, 2026 tracked by Updatify

v0.23.4

What’s Changed

ollama launch opencode now supports vision models with image inputs
Fixed formatting of Claude tool results when using local image paths

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.3…v0.23.4

Original source ↗

Update May 13, 2026 tracked by Updatify

v0.30.0

Ollama 0.30 is now available, with improved compatibility and performance using llama.cpp. This augments the MLX engine on Apple Silicon, bringing support to a wider range of hardware.

This release brings support for a wider range of models, including GGUF-based models from Hugging Face and your own fine-tuned models along with faster performance on NVIDIA hardware.

Known issues:

laguna-xs.2 is not yet supported on Windows/Linux.
llama3.2-vision is not yet supported
nomic-embed-text now converts inputs to lowercase per the model card where prior Ollama versions incorrectly preserved mixed case

Original source ↗

Update May 12, 2026 tracked by Updatify

v0.23.3

What’s Changed

mlx: refined model push behavior by @dhiltgen in https://github.com/ollama/ollama/pull/15431
test: integration test hardening by @dhiltgen in https://github.com/ollama/ollama/pull/13532
app: harden update flows by @dhiltgen in https://github.com/ollama/ollama/pull/16100
mlx: update the imagegen runner for mlx thread affinity by @pdevine in https://github.com/ollama/ollama/pull/16096
mlx: avoid status timeout during inference by @dhiltgen in https://github.com/ollama/ollama/pull/16086
mlx: fix macOS 26 target leakage in v3 metallib by @dhiltgen in https://github.com/ollama/ollama/pull/16053

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.2…v0.23.3

Original source ↗

Update May 7, 2026 tracked by Updatify

v0.23.2

What’s Changed

ollama launch no longer includes Claude Desktop due to the third-party integration being limited to Anthropic models.
Use ollama launch claude-desktop --restore to restore Claude Desktop to its normal state.
/api/show responses are now cached, improving median latency by ~6.7x which will increase load speed for integrations like VS Code.
Improved backup workflow when managing launch integrations
Cleaner image generation layout in the MLX runner

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.1…v0.23.2

Original source ↗

Update May 5, 2026 tracked by Updatify

v0.23.1

Gemma 4 MTP (Multi-token Processing) for the MLX runner

Gemma 4 MTP speculative decoding is now supported on Macs. This can give over a 2x speed increase for the Gemma 4 31B model on coding tasks.

ollama run gemma4:31b-coding-mtp-bf16

What’s Changed

Update MLX and MLX-C with threading fixes by @dhiltgen in https://github.com/ollama/ollama/pull/15845
go: bump to 1.26 by @ParthSareen in https://github.com/ollama/ollama/pull/15904
Add Gemma 4 MTP speculative decoding by @pdevine in https://github.com/ollama/ollama/pull/15980

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.0…v0.23.1

Original source ↗

Update May 3, 2026 tracked by Updatify

v0.23.0

Claude Desktop

Claude Desktop is now supported with Ollama Launch.

Claude Cowork and Claude Code are supported within the Claude Desktop App.

ollama launch claude-desktop

Claude Cowork

Claude Code

Claude Code on the terminal can still be accessed through the CLI with:

ollama launch claude

Not supported yet

Web Search (coming soon)
Extensions

What’s Changed

Launch Claude Desktop with ollama launch claude-desktop
The Ollama app now surfaces featured models from server-driven recommendations
Fixed OpenClaw gateway timeout on Windows by enforcing IPv4 loopback (thanks @UniquePratham)
Hardened Metal initialization to gracefully handle ggml kernel compilation failures

New Contributors

@UniquePratham made their first contribution in https://github.com/ollama/ollama/pull/15726

Full Changelog: https://github.com/ollama/ollama/compare/v0.22.1…v0.23.0

Original source ↗

Update Apr 28, 2026 tracked by Updatify

v0.22.1

What’s Changed

Updated the Gemma 4 renderer for thinking and tool calling improvements
Model recommendations are now updated without updating Ollama
Aligned the desktop app’s launch page with ollama launch integrations
Fixed the Poolside integration title in ollama launch

Full Changelog: https://github.com/ollama/ollama/compare/v0.22.0…v0.22.1

Original source ↗

Update Apr 28, 2026 tracked by Updatify

v0.22.0

New models

NVIDIA’s Nemotron 3 Omni
Poolside’s first open-weight coding model - Laguna XS.2

Full Changelog: https://github.com/ollama/ollama/compare/v0.21.2…v0.22.0

Original source ↗

Update Apr 24, 2026 tracked by Updatify

v0.21.3

What’s Changed

api: accept “max” as a think value by @ParthSareen in https://github.com/ollama/ollama/pull/15787
openai: map responses reasoning effort to think by @ParthSareen in https://github.com/ollama/ollama/pull/15789

Full Changelog: https://github.com/ollama/ollama/compare/v0.21.2…v0.21.3-rc0

Original source ↗

Update Apr 23, 2026 tracked by Updatify

v0.21.2

What’s Changed

Improved reliability of the OpenClaw onboarding flow in ollama launch
Recommended models in ollama launch now appear in a fixed, canonical order
OpenClaw integration now bundles Ollama’s web search plugin in OpenClaw

New Contributors

@madflow made their first contribution in https://github.com/ollama/ollama/pull/15733

Full Changelog: https://github.com/ollama/ollama/compare/v0.21.1…v0.21.2

Original source ↗

Update Apr 22, 2026 tracked by Updatify

v0.21.1

What’s Changed

Kimi CLI

You can now install and run the Kimi CLI through Ollama.

ollama launch kimi --model kimi-k2.6:cloud

Kimi CLI with Kimi K2.6 excels at long horizon agentic execution tasks through a multi-agent system.

MLX runner adds logprobs support for compatible models
Faster MLX sampling with fused top-P and top-K in a single sort pass, plus repeat penalties applied in the sampler
Improved MLX prompt tokenization by moving tokenization into request handler goroutines
Better MLX thread safety for array management
GLM4 MoE Lite performance improvement with a fused sigmoid router head
Fixed model picker showing stale model after switching chats in the macOS app
Fixed structured outputs for Gemma 4 when think=false

Full Changelog: https://github.com/ollama/ollama/compare/v0.21.0…v0.21.1

Original source ↗

Update Apr 16, 2026 tracked by Updatify

v0.21.0

Hermes Agent

ollama launch hermes

Hermes learns with you, automatically creating skills to better serve your workflows. Great for research and engineering tasks.

What’s Changed

Gemma 4 on MLX. Added support for running Gemma 4 via MLX on Apple Silicon, including a text-only MLX runtime for the model. The MLX backend also picked up mixed-precision quantization, better capability detection, and a batch of new op wrappers (Conv2d, Pad, activations, trig, masked SDPA, and RoPE-with-freqs).
Hermes and GitHub Copilot CLI in ollama launch. Added both integrations, which can now be configured in one command alongside the rest of the supported coding agents.
OpenCode moved to inline config. ollama launch opencode now writes its config inline rather than to a separate file, matching how other integrations are handled.
ollama launch no longer rewrites config when nothing changed. Pressing → on a configured multi-model integration, or passing --model with the current primary, used to trigger a confirmation prompt and rewrite both the editor’s config file and config.json. Now it’s a no-op when the resolved model list matches what’s already saved.
Fixed ollama launch openclaw --yes so it correctly skips the channels configuration step, so non-interactive setups complete cleanly.
Restored the Gemma 4 nothink renderer with the e2b-style prompt.
Fixed the Gemma 4 compiler error that was breaking Metal builds.
Fixed macOS cross-compiles so they no longer trigger generate, which was breaking cmake builds on some Xcode versions.
Quieted cgo builds by suppressing deprecated warnings during go build.

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.7…v0.21.0

Original source ↗

Update Apr 13, 2026 tracked by Updatify

v0.20.7

What’s Changed

Fix quality of gemma:e2b and gemma:e4b when thinking is disabled
ROCm: Update to ROCm 7.2.1 on Linux by @saman-amd in https://github.com/ollama/ollama/pull/15483

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.6…v0.20.7

Original source ↗

Update Apr 12, 2026 tracked by Updatify

v0.20.6

What’s Changed

Gemma 4 tool calling ability is improved and updated to use Google’s latest post-launch fixes
Parallel tool calling improved for streaming responses
Hermes agent Ollama integration guide is now available
Ollama app is updated to fix image attachment errors

New Contributors

@matteocelani made their first contribution in #15272

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.5…v0.20.6

Original source ↗

Update Apr 9, 2026 tracked by Updatify

v0.20.5

OpenClaw channel setup with `ollama launch`

What’s Changed

OpenClaw channel setup: connect WhatsApp, Telegram, Discord, and other messaging channels through ollama launch openclaw
Enable flash attention for Gemma 4 on compatible GPUs
ollama launch opencode now detects curl-based OpenCode installs at ~/.opencode/bin
Fix /save command for models imported from safetensors

New Contributors

@sjhddh made their first contribution in https://github.com/ollama/ollama/pull/15424

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.4…v0.20.5

Original source ↗

Update Apr 7, 2026 tracked by Updatify

v0.20.4

What’s Changed

mlx: Improve M5 performance with NAX
gemma4: enable flash attention

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.3…v0.20.4

Original source ↗

Update Apr 7, 2026 tracked by Updatify

v0.20.3

What’s Changed

Gemma 4 Tool Calling improvements
Added latest models to Ollama App
OpenClaw fixes for launching TUI

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.2…v0.20.3

Original source ↗

Update Apr 4, 2026 tracked by Updatify

v0.20.2

What’s Changed

app: default app home view to new chat instead of launch by @jmorganca in https://github.com/ollama/ollama/pull/15312

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.1…v0.20.2

Original source ↗

Update Apr 2, 2026 tracked by Updatify

v0.20.0

Gemma 4

Effective 2B (E2B)

ollama run gemma4:e2b

Effective 4B (E4B)

ollama run gemma4:e4b

26B (Mixture of Experts model with 4B active parameters)

ollama run gemma4:26b

31B (Dense)

ollama run gemma4:31b

What’s Changed

docs: update pi docs by @ParthSareen in https://github.com/ollama/ollama/pull/15152
mlx: respect tokenizer add_bos_token setting in pipeline by @dhiltgen in https://github.com/ollama/ollama/pull/15185
tokenizer: add SentencePiece-style BPE support by @dhiltgen in https://github.com/ollama/ollama/pull/15162

Full Changelog: https://github.com/ollama/ollama/compare/v0.19.0…v0.20.0-rc0

Original source ↗

Update Mar 27, 2026 tracked by Updatify

v0.19.0

Ollama is now powered by MLX on Apple Silicon in preview

Ollama on Apple silicon is now built on top of Apple’s machine learning framework, MLX, to take advantage of its unified memory architecture.

https://github.com/user-attachments/assets/600297b0-3167-46a5-8e3a-fefda3a51b84

What’s Changed

Ollama’s app will now no longer incorrectly show “model is out of date”
ollama launch pi now includes web search plugin that uses Ollama’s web search
Improved KV cache hit rate when using the Anthropic-compatible API
Fixed tool call parsing issue with Qwen3.5 where tool calls would be output in thinking
MLX runner will now create periodic snapshots during prompt processing
Fixed KV cache snapshot memory leak in MLX runner
Fixed issue where flash attention would be incorrectly enabled for grok models
Fixed qwen3-next:80b not loading in Ollama

New Contributors

@amatas made their first contribution in https://github.com/ollama/ollama/pull/15022

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.3…v0.19.0

Original source ↗

Update Mar 26, 2026 tracked by Updatify

v0.18.4

What’s Changed

ggml: force flash attention off for grok by @rick-github in https://github.com/ollama/ollama/pull/15050
mlx: fix KV cache snapshot memory leak by @jessegross in https://github.com/ollama/ollama/pull/15065
mlxrunner: schedule periodic snapshots during prefill by @jessegross in https://github.com/ollama/ollama/pull/15058
doc: update vscode doc by @hoyyeva in https://github.com/ollama/ollama/pull/15064

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.3…v0.18.4-rc0

Original source ↗

Update Mar 25, 2026 tracked by Updatify

v0.18.3

Visual Studio Code

Microsoft Visual Studio Code now directly integrates with Ollama via GitHub Copilot.

If you have Ollama installed, any local or cloud model from Ollama can be selected for use within visual studio code.

Ollama screenshot 2026-03-26 at 01 43 57@2x

What’s Changed

GLM parser improvements for tool calls
OpenClaw integration improvements for gateway checks

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.2…v0.18.3

Original source ↗

Update Mar 18, 2026 tracked by Updatify

v0.18.2

What’s Changed

Add extra check to ensure npm and git are installed before installing OpenClaw
Claude Code will now be faster when run locally, due to preventing cache breakages
Fix to correctly support ollama launch openclaw --model <model>
Register Ollama’s websearch package correctly for OpenClaw

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.1…v0.18.2

Original source ↗

Update Mar 17, 2026 tracked by Updatify

v0.18.1

Web Search and Fetch in OpenClaw

Ollama now ships with web search and web fetch plugin for OpenClaw. This allows Ollama’s models (local or cloud) to search the web for the latest content and news. This also allows OpenClaw with Ollama to be able to fetch the web and extract readable content for processing. This feature does not execute JavaScript.

When using local models with web search in OpenClaw, ensure you are signed into Ollama with ollama signin

ollama launch openclaw

You can install web search directly into OpenClaw as a plugin if you already have OpenClaw configured and working:

Ollama web search plugin

openclaw plugins install @ollama/openclaw-web-search

Non-interactive (headless) mode for ollama launch

ollama launch can now run in non-interactive mode.

Perfect for:

Docker/containers: spin up an integration as a pipeline step to run evals, test prompts, or validate model behavior as part of your build. Tear it down when the job ends.
CI/CD: Generate code reviews, security checks, and other tasks within your CI
Scripts/automation: Kick off automated tasks with Ollama and claude code
--model must be specified to run in headless mode
--yes flag will auto-pull the model and skip any selectors

Try with: ollama launch claude --model kimi-k2.5:cloud --yes -- -p "how does this repository work?"

Use non-interactive mode in OpenClaw

You can ask your OpenClaw to run tasks using claude with subagents:

ollama launch claude --model kimi-k2.5:cloud --yes -- -p "how does this repository work?" using a subagent

What’s Changed

ollama launch openclaw will now use the official Ollama auth and model provider for OpenClaw
Improvements to Ollama’s benchmarking tool in ./cmd/bench
ollama launch openclaw will now skip --install-daemon when systemd is unavailable

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.0…v0.18.1

Original source ↗

Update Mar 14, 2026 tracked by Updatify

v0.18.0

Ollama 0.18 includes improved performance for OpenClaw and Ollama’s cloud models, including the new Nemotron-3-Super model by NVIDIA designed for high-performance agentic reasoning tasks.

Improved OpenClaw performance with Kimi-K2.5

This release of Ollama improves performance of cloud models and their reliability.

Up to 2x faster speeds with Kimi-K2.5
Tool calling accuracy has been improved

ollama launch openclaw --model kimi-k2.5

Ollama is now a provider in OpenClaw

Ollama can now be selected as an authentication and model provider during OpenClaw onboarding (thanks @BruceMacD for contributing and @steipete for reviewing!)

openclaw onboard --auth-choice ollama

More information: https://docs.openclaw.ai/providers/ollama

Nemotron-3-Super

Nemotron-3-Super: is a new 122B parameter model with strong reasoning and tool calling capability, while having top performance when run on modern hardware:

ollama run nemotron-3-super:cloud
ollama run nemotron-3-super to run locally (requires 96GB+ of VRAM)

Nemotron-3-Super scores highest of any open model on PinchBench, a benchmark suite that measures how successful models are at completing tasks when used with OpenClaw.

ollama launch openclaw --model nemotron-3-super:cloud

Or using OpenClaw’s onboarding:

openclaw onboard \
	--auth-choice ollama \
	--custom-model-id nemotron-3-super:cloud

Non-interactive task support

ollama launch now supports non-interactive tasks by passing in --yes. This enables using Claude, Codex, Pi and more in scripts, GitHub Actions, and other non-interactive environments.

ollama launch claude \
	--model glm-5:cloud \
	--yes \
	-- "Do a quick code review of this pull request and respond on GitHub with a comment summarizing your feedback."

Lower latency on MiniMax-M2.5 and Qwen3.5 on Ollama’s cloud

For customers in North America, MiniMax-M2.5 and Qwen3.5 on Ollama’s cloud now respond much faster, up to 10x and up to 2x faster respectively, and often in less than a second. This is ideal for tasks that require a fast Time To First Token (TTFT) when needing quick answers from OpenClaw or quick back-to-back coding tasks.

ollama launch claude --model minimax-m2.5

Driver updates required for ROCm 7

This version of Ollama ships with ROCm 7, and requires updating drivers to the latest version for continued support.

What’s Changed

Ollama’s cloud models no longer require downloading via ollama pull. Setting :cloud as a tag will now automatically connect to cloud models.
New --yes flag for ollama launch that skips all prompts, making it possible to run AI assistants and other tools in non-interactive environments
Fixed issue where “Reset to Defaults” in Ollama’s app would disable downloading automatic updates.
Ollama will now ensure context compaction occurs at the correct context length for each model when using ollama launch claude

New Contributors

@flipbit03 made their first contribution in https://github.com/ollama/ollama/pull/14821
@shivamtiwari3 made their first contribution in https://github.com/ollama/ollama/pull/14825

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.7…v0.18.0

Original source ↗

Update Mar 10, 2026 tracked by Updatify

v0.17.8

What’s Changed

parsers: repair unclosed arg_value tags in GLM tool calls by @BruceMacD in https://github.com/ollama/ollama/pull/14656
Reapply “don’t require pulling stubs for cloud models” again by @jmorganca in https://github.com/ollama/ollama/pull/14608
docs: format compat docs by @mxyng in https://github.com/ollama/ollama/pull/14678
create: fix localhost handling by @dhiltgen in https://github.com/ollama/ollama/pull/14681
build: smarter docker parallelism by @dhiltgen in https://github.com/ollama/ollama/pull/14653
mlx: int4 groupsize 64 by @pdevine in https://github.com/ollama/ollama/pull/14682
cloud_proxy: handle stream disconnects gracefully by @drifkin in https://github.com/ollama/ollama/pull/14685
x/mlxrunner: replace sampler interface chain with single stateful Sampler by @pdevine in https://github.com/ollama/ollama/pull/14652
rocm: update linux to v7.2 by @dhiltgen in https://github.com/ollama/ollama/pull/14391
app: fix reset to defaults disabling auto-update by @hoyyeva in https://github.com/ollama/ollama/pull/14741
mlx: get parameters from modelfile during model creation by @pdevine in https://github.com/ollama/ollama/pull/14747
MLX: add header vendoring and remove go build tag by @dhiltgen in https://github.com/ollama/ollama/pull/14642
ci: Fix windows build by @dhiltgen in https://github.com/ollama/ollama/pull/14754

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.7…v0.17.8-rc1

Original source ↗

Update Mar 5, 2026 tracked by Updatify

v0.17.7

What’s Changed

Allow thinking levels such as "medium" to correctly interpreted in Ollama’s API for all thinking models
Add context length to support compaction when using ollama launch

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.6…v0.17.7

Original source ↗

Update Mar 4, 2026 tracked by Updatify

v0.17.6

What’s Changed

Fixed issue where GLM-OCR would not work due to incorrect prompt rendering
Fixed tool calling parsing and rendering for Qwen 3.5 models

New Contributors

@Victor-Quqi made their first contribution in https://github.com/ollama/ollama/pull/14584

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.5…v0.17.6

Original source ↗

Update Mar 2, 2026 tracked by Updatify

v0.17.5

New models

Qwen3.5: the small Qwen 3.5 model series is now available in 0.8B, 2B, 4B and 9B parameter sizes.

What’s Changed

Fixed crash in Qwen 3.5 models when split over GPU & CPU
Fixed issue where Qwen 3.5 models would repeat themselves due to no presence penalty (note: you may have to redownload the qwen3.5 models: ollama pull qwen3.5:35b for example)
ollama run --verbose will now show peak memory usage when using Ollama’s MLX engine
Fixed memory issues and crashes in MLX runner
Fixed issue where Ollama would not be able to run models imported from Qwen3.5 GGUF files

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.4…v0.17.5

Original source ↗

Update Feb 27, 2026 tracked by Updatify

v0.17.4

New models

Qwen 3.5: a family of open-source multimodal models that delivers exceptional utility and performance.
LFM 2: LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.

Note: for users on 0.17.1, this version will not automatically update. Re-downloading is required to receive the latest version of Ollama.

What’s Changed

Tool call indices will now be included in parallel tool calls

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.3…v0.17.4

Original source ↗

Update Feb 27, 2026 tracked by Updatify

v0.17.3

What’s Changed

Fixed issue where tool calls in the Qwen 3 and Qwen 3.5 model families would not be parsed correctly if emitted during thinking

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.2…v0.17.3

Original source ↗

Update Feb 26, 2026 tracked by Updatify

v0.17.2

What’s Changed

Fixed issue where Ollama’s app on Windows would crash when a new update has been downloaded

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.1…v0.17.2

Original source ↗

Update Feb 24, 2026 tracked by Updatify

v0.17.1

What’s Changed

Nemotron architecture support in Ollama’s engine
MLX engine now has improved memory usage
Ollama’s app will now allow models that support tools to use web search capabilities
Improved LFM2 and LFM2.5 models in Ollama’s engine
ollama create will no longer default to affine quantization for unquantized models when using the MLX engine
Added configuration for disabling automatic update downloading

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.0…v0.17.1

Original source ↗

Update Feb 21, 2026 tracked by Updatify

v0.17.0

OpenClaw

OpenClaw can now be installed and configured automatically via Ollama, making it the easiest way to get up and running with OpenClaw with open models like Kimi-K2.5, GLM-5, and Minimax-M2.5.

Get started

ollama launch openclaw

Web search in OpenClaw

When using cloud models, websearch is enabled - allowing OpenClaw to search the internet.

What’s Changed

Improved tokenizer performance
Ollama’s macOS and Windows apps will now default to a context length based on available VRAM

New Contributors

@natl-set made their first contribution in https://github.com/ollama/ollama/pull/14322

Full Changelog: https://github.com/ollama/ollama/compare/v0.16.3…v0.17.0

Original source ↗

What’s Changed

What’s Changed

What’s Changed

New Contributors

Faster Gemma 4 on Apple Silicon

What’s Changed

What’s Changed

What’s Changed

New Contributors

What’s Changed

What’s Changed

What’s Changed

New models

What’s Changed

What’s Changed

New models

What’s Changed

Known Issues

New models

What’s Changed

What’s Changed

Codex App

Built-in browser

Review mode

Choosing a model

Restore anytime

What’s Changed

What’s Changed

Known issues:

What’s Changed

What’s Changed

Gemma 4 MTP (Multi-token Processing) for the MLX runner

What’s Changed

Claude Desktop

Claude Cowork

Claude Code

Not supported yet

What’s Changed

New Contributors

What’s Changed

New models

What’s Changed

What’s Changed

New Contributors

What’s Changed

Kimi CLI

Hermes Agent

What’s Changed

What’s Changed

What’s Changed

New Contributors

OpenClaw channel setup with ollama launch

What’s Changed

New Contributors

v0.20.4

What’s Changed

What’s Changed

What’s Changed

Gemma 4

What’s Changed

Ollama is now powered by MLX on Apple Silicon in preview

What’s Changed

New Contributors

What’s Changed

Visual Studio Code

What’s Changed

What’s Changed

Web Search and Fetch in OpenClaw

Ollama web search plugin

Non-interactive (headless) mode for ollama launch

Use non-interactive mode in OpenClaw

What’s Changed

Improved OpenClaw performance with Kimi-K2.5

Ollama is now a provider in OpenClaw

Nemotron-3-Super

Non-interactive task support

Lower latency on MiniMax-M2.5 and Qwen3.5 on Ollama’s cloud

Driver updates required for ROCm 7

What’s Changed

New Contributors

OpenClaw channel setup with `ollama launch`