Updatify / Ollama | Release notes

Open-source framework that allows you to download, set up, and run Large Language Models (LLMs)—like Llama, Mistral, and DeepSeek

Update Jun 5, 2026 tracked by Updatify

v0.30.6

New models

  • Gemma 4 QAT weights: the Gemma 4 family is now optimized with Quantization-Aware Training (QAT) to dramatically reduce memory requirements and maximize on-device performance. Look for the tags ending in -qat:
    • gemma4:e2b-it-qat
    • gemma4:e4b-it-qat
    • gemma4:12b-it-qat
    • gemma4:26b-a4b-it-qat
    • gemma4:31b-it-qat

What’s Changed

  • ollama launch omp now integrates with Oh My Pi, an AI coding agent with IDE integration
  • MLX embedding layers now use NVFP4 global scale for improved quantization on Apple Silicon

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.5…v0.30.6

Update Jun 4, 2026 tracked by Updatify

v0.30.5

What’s Changed

  • Fixed the gemma4:12b floating point exception crash on x86, CUDA, Linux, and Windows systems.
  • ollama launch hermes-desktop now launches Hermes Desktop and can skip rebuilding when a packaged desktop app is already installed.
  • ollama launch hermes now supports native Windows installs through the Hermes PowerShell installer.
  • Added Cline CLI integration docs.

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.4…v0.30.5

Update Jun 3, 2026 tracked by Updatify

v0.30.4

New models

  • Nemotron-3-Ultra: NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.

What’s Changed

  • Fixed multimodal models not using the GPU on the llama.cpp backend: they can now use Metal GPU offload on Apple Silicon, improving multimodal performance on supported Macs.
  • ollama create --experimental now respects REQUIRES in Modelfiles for MLX-based models.
  • ollama launch codex now cleans up old conflicting Codex profile config before launching.
  • ollama launch pi now migrates users from the legacy Pi package to the official package and preserves the correct npm install prefix.
  • Pi web search setup now updates only when a newer package is available.
  • Windows cleanup now terminates the llama.cpp backend more reliably.
  • Updated the llama.cpp backend.

Known Issues

  • gemma4:12b crashes with floating point exception

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.3…v0.30.4

Update Jun 3, 2026 tracked by Updatify

v0.30.2

What’s Changed

  • ollama launch now supports Qwen Code and can guide users through installing the Cline CLI when it is missing.
  • ollama launch codex now uses an isolated launch configuration, avoiding conflicts with a user’s existing Codex settings.
  • Added llama.cpp backend compatibility support for Poolside’s Laguna architecture.
  • The llama.cpp backend now includes cached prompt tokens in token accounting, improving usage reporting for requests with prompt cache hits.
  • The llama.cpp backend now ignores SSE ping comments, improving streaming compatibility with newer backend behavior.
  • The llama.cpp backend now detects load stalls from server output so failed model loads surface more reliably instead of hanging.
  • Radeon 8060S integrated GPUs are now allowed by default.
  • Template details are included in logs to make troubleshooting model prompts easier.
  • Added Hermes Desktop configuration docs.
  • Fixed a build issue in the Laguna compatibility patch, restoring Laguna support in release builds.
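
As a rough illustration of the SSE change listed above: in the Server-Sent Events format, a line that begins with a colon is a comment, and servers often send such lines as keep-alive pings. The Python sketch below shows one way a client might skip them while collecting data payloads. The function name iter_sse_data and the sample lines are hypothetical, and this is not Ollama's actual implementation.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, skipping comment lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            # Comment line (for example ": ping"): ignore it entirely.
            continue
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            buffer.append(value)
    if buffer:
        # Simplification: flush a trailing event that had no closing blank line.
        yield "\n".join(buffer)


sample = [": ping", 'data: {"a": 1}', "", ": ping", "data: hello", ""]
events = list(iter_sse_data(sample))
```

With the sample input, both ping comments are dropped and only the two data payloads remain in events.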

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.0…v0.30.2

Update May 14, 2026 tracked by Updatify

v0.24.0

Codex App

Ollama 0.24 includes support for the Codex App, OpenAI’s desktop experience for working on Codex threads in parallel with built-in worktree support and git functionality.

ollama launch codex-app

Built-in browser

Codex can load local servers and sites in its built-in browser, enabling you to directly annotate on the page to request changes.

Review mode

Review code inside the app, leave comments, and iterate without leaving your workspace.

Choosing a model

For difficult coding and agentic tasks:

  • kimi-k2.6 (with vision support)
  • glm-5.1

For local use without an Ollama Cloud subscription:

  • nemotron-3-super
  • gemma4:31b
  • qwen3.6

Restore anytime

To restore the previous configuration of Codex App, run:

ollama launch codex-app --restore

What’s Changed

  • Reworked the MLX sampler for improved generation quality on Apple Silicon

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.0…v0.24.0

Update May 13, 2026 tracked by Updatify

v0.30.0

Ollama 0.30 is now available, with improved compatibility and performance using llama.cpp. This augments the MLX engine on Apple Silicon, bringing support to a wider range of hardware.

This release brings support for a wider range of models, including GGUF-based models from Hugging Face and your own fine-tuned models along with faster performance on NVIDIA hardware.

Known issues:

  • laguna-xs.2 is not yet supported on Windows/Linux.
  • llama3.2-vision is not yet supported
  • nomic-embed-text now converts inputs to lowercase per the model card where prior Ollama versions incorrectly preserved mixed case

Update May 12, 2026 tracked by Updatify

v0.23.3

What’s Changed

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.2…v0.23.3

Update May 7, 2026 tracked by Updatify

v0.23.2

What’s Changed

  • ollama launch no longer includes Claude Desktop due to the third-party integration being limited to Anthropic models.
  • Use ollama launch claude-desktop --restore to restore Claude Desktop to its normal state.
  • /api/show responses are now cached, improving median latency by ~6.7x and speeding up load times for integrations like VS Code.
  • Improved backup workflow when managing launch integrations
  • Cleaner image generation layout in the MLX runner

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.1…v0.23.2

Update May 5, 2026 tracked by Updatify

v0.23.1

Gemma 4 MTP (Multi-Token Prediction) for the MLX runner

Gemma 4 MTP speculative decoding is now supported on Macs. This can give over a 2x speed increase for the Gemma 4 31B model on coding tasks.

ollama run gemma4:31b-coding-mtp-bf16
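
For readers unfamiliar with speculative decoding, the toy Python sketch below shows the general idea under simplifying assumptions: a cheap draft step proposes several tokens, the main model checks them, and the output stays identical to what the main model would have produced on its own. The functions toy_target and toy_draft are hypothetical stand-ins for real models, verification is done one token at a time rather than in a single batched pass, and none of this reflects how the MLX runner is actually implemented.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: ask the target model for one token at a time."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes up to k tokens, target verifies."""
    out = list(prompt)
    produced = 0
    while produced < n:
        # 1. The draft proposes a short run of tokens.
        proposal: List[int] = []
        for _ in range(min(k, n - produced)):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed token in order.
        for token in proposal:
            expected = target(out)
            out.append(expected)  # always keep the target's choice
            produced += 1
            if expected != token:
                break  # first mismatch: discard the rest of the proposal
    return out[len(prompt):]


def toy_target(seq: List[int]) -> int:
    # Hypothetical deterministic "model" over a 7-token vocabulary.
    return (seq[-1] * 3 + 1) % 7


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except at every fourth position.
    guess = toy_target(seq)
    return (guess + 1) % 7 if len(seq) % 4 == 0 else guess


baseline = greedy_decode(toy_target, [1], 10)
speculated = speculative_decode(toy_target, toy_draft, [1], 10)
```

The speedup in a real system comes from verifying the whole proposal in one forward pass of the large model. The sketch only demonstrates that accepting matching tokens and falling back on a mismatch leaves the output unchanged.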

What’s Changed

Full Changelog: https://github.com/ollama/ollama/compare/v0.23.0…v0.23.1

Update May 3, 2026 tracked by Updatify

v0.23.0

Claude Desktop

Claude Desktop is now supported with Ollama Launch.

Claude Cowork and Claude Code are supported within the Claude Desktop App.

ollama launch claude-desktop

Claude Cowork

Claude Code

Claude Code on the terminal can still be accessed through the CLI with:

ollama launch claude

Not supported yet

  • Web Search (coming soon)
  • Extensions

What’s Changed

  • Launch Claude Desktop with ollama launch claude-desktop
  • The Ollama app now surfaces featured models from server-driven recommendations
  • Fixed OpenClaw gateway timeout on Windows by enforcing IPv4 loopback (thanks @UniquePratham)
  • Hardened Metal initialization to gracefully handle ggml kernel compilation failures

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.22.1…v0.23.0

Update Apr 22, 2026 tracked by Updatify

v0.21.1

What’s Changed

Kimi CLI

You can now install and run the Kimi CLI through Ollama.

ollama launch kimi --model kimi-k2.6:cloud

Kimi CLI with Kimi K2.6 excels at long horizon agentic execution tasks through a multi-agent system.

  • MLX runner adds logprobs support for compatible models
  • Faster MLX sampling with fused top-P and top-K in a single sort pass, plus repeat penalties applied in the sampler
  • Improved MLX prompt tokenization by moving tokenization into request handler goroutines
  • Better MLX thread safety for array management
  • GLM4 MoE Lite performance improvement with a fused sigmoid router head
  • Fixed model picker showing stale model after switching chats in the macOS app
  • Fixed structured outputs for Gemma 4 when think=false
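
To illustrate what combining top-K and top-P in a single sort pass can look like, here is a minimal Python sketch. It sorts the candidate tokens once by probability, truncates to the top K, then walks the same sorted order to find the top-P cutoff. The function name top_k_top_p_filter, the choice to apply top-P to the original (not renormalized) probabilities, and the sample distribution are all assumptions made for illustration, not the MLX runner's code.

```python
from typing import Dict, List


def top_k_top_p_filter(probs: List[float], top_k: int, top_p: float) -> Dict[int, float]:
    """Return renormalized probabilities for the tokens kept by top-K then top-P."""
    # One sort, highest probability first, shared by both filters.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    kept: List[int] = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break  # smallest prefix whose mass reaches top_p

    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


filtered = top_k_top_p_filter([0.05, 0.5, 0.15, 0.3], top_k=3, top_p=0.7)
```

In this example, top-K first drops the least likely token (index 0), and top-P then keeps only indices 1 and 3, whose combined mass reaches the 0.7 threshold.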

Full Changelog: https://github.com/ollama/ollama/compare/v0.21.0…v0.21.1

Update Apr 16, 2026 tracked by Updatify

v0.21.0

Hermes Agent

ollama launch hermes

Hermes learns with you, automatically creating skills to better serve your workflows. Great for research and engineering tasks.

What’s Changed

  • Gemma 4 on MLX. Added support for running Gemma 4 via MLX on Apple Silicon, including a text-only MLX runtime for the model. The MLX backend also picked up mixed-precision quantization, better capability detection, and a batch of new op wrappers (Conv2d, Pad, activations, trig, masked SDPA, and RoPE-with-freqs).
  • Hermes and GitHub Copilot CLI in ollama launch. Added both integrations, which can now be configured in one command alongside the rest of the supported coding agents.
  • OpenCode moved to inline config. ollama launch opencode now writes its config inline rather than to a separate file, matching how other integrations are handled.
  • ollama launch no longer rewrites config when nothing changed. Pressing → on a configured multi-model integration, or passing --model with the current primary, used to trigger a confirmation prompt and rewrite both the editor’s config file and config.json. Now it’s a no-op when the resolved model list matches what’s already saved.
  • Fixed ollama launch openclaw --yes so that it correctly skips the channels configuration step, letting non-interactive setups complete cleanly.
  • Restored the Gemma 4 nothink renderer with the e2b-style prompt.
  • Fixed the Gemma 4 compiler error that was breaking Metal builds.
  • Fixed macOS cross-compiles so they no longer trigger generate, which was breaking cmake builds on some Xcode versions.
  • Quieted cgo builds by suppressing deprecated warnings during go build.

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.7…v0.21.0

Update Apr 9, 2026 tracked by Updatify

v0.20.5

OpenClaw channel setup with ollama launch

What’s Changed

  • OpenClaw channel setup: connect WhatsApp, Telegram, Discord, and other messaging channels through ollama launch openclaw
  • Enable flash attention for Gemma 4 on compatible GPUs
  • ollama launch opencode now detects curl-based OpenCode installs at ~/.opencode/bin
  • Fix /save command for models imported from safetensors

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.20.4…v0.20.5

Update Apr 2, 2026 tracked by Updatify

v0.20.0

Gemma 4

Effective 2B (E2B)

ollama run gemma4:e2b

Effective 4B (E4B)

ollama run gemma4:e4b

26B (Mixture of Experts model with 4B active parameters)

ollama run gemma4:26b

31B (Dense)

ollama run gemma4:31b

What’s Changed

Full Changelog: https://github.com/ollama/ollama/compare/v0.19.0…v0.20.0-rc0

Update Mar 27, 2026 tracked by Updatify

v0.19.0

Ollama is now powered by MLX on Apple Silicon in preview

Ollama on Apple silicon is now built on top of Apple’s machine learning framework, MLX, to take advantage of its unified memory architecture.

https://github.com/user-attachments/assets/600297b0-3167-46a5-8e3a-fefda3a51b84

Read more: https://ollama.com/blog/mlx

What’s Changed

  • Ollama’s app will no longer incorrectly show “model is out of date”
  • ollama launch pi now includes a web search plugin that uses Ollama’s web search
  • Improved KV cache hit rate when using the Anthropic-compatible API
  • Fixed tool call parsing issue with Qwen3.5 where tool calls would be output in thinking
  • MLX runner will now create periodic snapshots during prompt processing
  • Fixed KV cache snapshot memory leak in MLX runner
  • Fixed issue where flash attention would be incorrectly enabled for grok models
  • Fixed qwen3-next:80b not loading in Ollama

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.3…v0.19.0

Update Mar 26, 2026 tracked by Updatify

v0.18.4

What’s Changed

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.3…v0.18.4-rc0

Update Mar 17, 2026 tracked by Updatify

v0.18.1

Web Search and Fetch in OpenClaw

Ollama now ships with a web search and web fetch plugin for OpenClaw. This allows Ollama’s models (local or cloud) to search the web for the latest content and news. It also lets OpenClaw with Ollama fetch web pages and extract readable content for processing. This feature does not execute JavaScript.

When using local models with web search in OpenClaw, ensure you are signed in to Ollama with ollama signin.

ollama launch openclaw

You can install web search directly into OpenClaw as a plugin if you already have OpenClaw configured and working:

Ollama web search plugin

openclaw plugins install @ollama/openclaw-web-search

Non-interactive (headless) mode for ollama launch

ollama launch can now run in non-interactive mode.

Perfect for:

  • Docker/containers: spin up an integration as a pipeline step to run evals, test prompts, or validate model behavior as part of your build. Tear it down when the job ends.

  • CI/CD: Generate code reviews, security checks, and other tasks within your CI

  • Scripts/automation: Kick off automated tasks with Ollama and Claude Code

  • --model must be specified to run in headless mode

  • --yes flag will auto-pull the model and skip any selectors

Try with: ollama launch claude --model kimi-k2.5:cloud --yes -- -p "how does this repository work?"
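
For scripted use, the same invocation can be assembled programmatically. The Python sketch below builds the argument list from the flags described above (--model, --yes, and the -- separator before arguments passed through to the integration). The helper name build_launch_command is hypothetical.

```python
import shlex
from typing import List, Sequence


def build_launch_command(integration: str, model: str,
                         passthrough: Sequence[str] = ()) -> List[str]:
    """Build an argv list for a non-interactive `ollama launch` run."""
    if not model:
        # Headless mode requires an explicit model.
        raise ValueError("--model must be specified to run in headless mode")
    command = ["ollama", "launch", integration, "--model", model, "--yes"]
    if passthrough:
        # Everything after "--" is handed to the launched integration.
        command += ["--", *passthrough]
    return command


cmd = build_launch_command(
    "claude",
    "kimi-k2.5:cloud",
    ["-p", "how does this repository work?"],
)
shell_line = shlex.join(cmd)  # shell-quoted form, handy for logging
```

Passing cmd to subprocess.run(cmd, check=True) would then start the run from a script or CI job. Using an argument list avoids shell quoting problems with the prompt text.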

Use non-interactive mode in OpenClaw

You can ask your OpenClaw to run tasks using claude with subagents:

ollama launch claude --model kimi-k2.5:cloud --yes -- -p "how does this repository work?" using a subagent

What’s Changed

  • ollama launch openclaw will now use the official Ollama auth and model provider for OpenClaw
  • Improvements to Ollama’s benchmarking tool in ./cmd/bench
  • ollama launch openclaw will now skip --install-daemon when systemd is unavailable

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.0…v0.18.1

Update Mar 14, 2026 tracked by Updatify

v0.18.0

Ollama 0.18 includes improved performance for OpenClaw and Ollama’s cloud models, including the new Nemotron-3-Super model by NVIDIA designed for high-performance agentic reasoning tasks.

Improved OpenClaw performance with Kimi-K2.5

This release of Ollama improves performance of cloud models and their reliability.

  • Up to 2x faster speeds with Kimi-K2.5
  • Tool calling accuracy has been improved

ollama launch openclaw --model kimi-k2.5

Ollama is now a provider in OpenClaw

Ollama can now be selected as an authentication and model provider during OpenClaw onboarding (thanks @BruceMacD for contributing and @steipete for reviewing!)

openclaw onboard --auth-choice ollama

More information: https://docs.openclaw.ai/providers/ollama

Nemotron-3-Super

Nemotron-3-Super is a new 122B parameter model with strong reasoning and tool calling capability that also delivers top performance on modern hardware:

  • ollama run nemotron-3-super:cloud
  • ollama run nemotron-3-super to run locally (requires 96GB+ of VRAM)

Nemotron-3-Super scores highest of any open model on PinchBench, a benchmark suite that measures how successful models are at completing tasks when used with OpenClaw.

ollama launch openclaw --model nemotron-3-super:cloud

Or using OpenClaw’s onboarding:

openclaw onboard \
	--auth-choice ollama \
	--custom-model-id nemotron-3-super:cloud

Non-interactive task support

ollama launch now supports non-interactive tasks by passing in --yes. This enables using Claude, Codex, Pi and more in scripts, GitHub Actions, and other non-interactive environments.

ollama launch claude \
	--model glm-5:cloud \
	--yes \
	-- "Do a quick code review of this pull request and respond on GitHub with a comment summarizing your feedback."

Lower latency on MiniMax-M2.5 and Qwen3.5 on Ollama’s cloud

For customers in North America, MiniMax-M2.5 and Qwen3.5 on Ollama’s cloud now respond much faster, up to 10x and up to 2x faster respectively, and often in less than a second. This is ideal for tasks that require a fast Time To First Token (TTFT) when needing quick answers from OpenClaw or quick back-to-back coding tasks.

ollama launch claude --model minimax-m2.5

Driver updates required for ROCm 7

This version of Ollama ships with ROCm 7, and requires updating drivers to the latest version for continued support.

What’s Changed

  • Ollama’s cloud models no longer require downloading via ollama pull. Setting :cloud as a tag will now automatically connect to cloud models.
  • New --yes flag for ollama launch that skips all prompts, making it possible to run AI assistants and other tools in non-interactive environments
  • Fixed issue where “Reset to Defaults” in Ollama’s app would disable downloading automatic updates.
  • Ollama will now ensure context compaction occurs at the correct context length for each model when using ollama launch claude

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.7…v0.18.0

Update Mar 10, 2026 tracked by Updatify

v0.17.8

What’s Changed

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.7…v0.17.8-rc1

Update Mar 2, 2026 tracked by Updatify

v0.17.5

New models

  • Qwen3.5: the small Qwen 3.5 model series is now available in 0.8B, 2B, 4B and 9B parameter sizes.

What’s Changed

  • Fixed crash in Qwen 3.5 models when split over GPU & CPU
  • Fixed issue where Qwen 3.5 models would repeat themselves due to no presence penalty (note: you may have to redownload the qwen3.5 models: ollama pull qwen3.5:35b for example)
  • ollama run --verbose will now show peak memory usage when using Ollama’s MLX engine
  • Fixed memory issues and crashes in MLX runner
  • Fixed issue where Ollama would not be able to run models imported from Qwen3.5 GGUF files

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.4…v0.17.5

Update Feb 27, 2026 tracked by Updatify

v0.17.4

New models

  • Qwen 3.5: a family of open-source multimodal models that delivers exceptional utility and performance.
  • LFM 2: LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.

Note: for users on 0.17.1, this version will not automatically update. Re-downloading is required to receive the latest version of Ollama.

What’s Changed

  • Tool call indices will now be included in parallel tool calls

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.3…v0.17.4

Update Feb 24, 2026 tracked by Updatify

v0.17.1

What’s Changed

  • Nemotron architecture support in Ollama’s engine
  • MLX engine now has improved memory usage
  • Ollama’s app will now allow models that support tools to use web search capabilities
  • Improved LFM2 and LFM2.5 models in Ollama’s engine
  • ollama create will no longer default to affine quantization for unquantized models when using the MLX engine
  • Added configuration for disabling automatic update downloading

Full Changelog: https://github.com/ollama/ollama/compare/v0.17.0…v0.17.1

Update Feb 21, 2026 tracked by Updatify

v0.17.0

OpenClaw

OpenClaw can now be installed and configured automatically via Ollama, making it the easiest way to get up and running with OpenClaw with open models like Kimi-K2.5, GLM-5, and Minimax-M2.5.

Get started

ollama launch openclaw

Web search in OpenClaw

When using cloud models, web search is enabled, allowing OpenClaw to search the internet.

What’s Changed

  • Improved tokenizer performance
  • Ollama’s macOS and Windows apps will now default to a context length based on available VRAM

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.16.3…v0.17.0

Update Feb 14, 2026 tracked by Updatify

v0.16.2

What’s Changed

  • ollama launch claude now supports searching the web when using :cloud models
  • Fixed rendering issue when running ollama in PowerShell
  • New setting in Ollama’s app makes it easier to disable cloud models for sensitive and private tasks where data cannot leave your computer. For Linux or when running ollama serve manually, set OLLAMA_NO_CLOUD=1.
  • Fixed issue where experimental image generation models would not run in 0.16.0 and 0.16.1

Full Changelog: https://github.com/ollama/ollama/compare/v0.16.1…v0.16.2-rc0

Update Feb 12, 2026 tracked by Updatify

v0.16.0

New models

  • GLM-5: A strong reasoning and agentic model from Z.ai with 744B total parameters (40B active), built for complex systems engineering and long-horizon tasks.
  • MiniMax-M2.5: a new state-of-the-art large language model designed for real-world productivity and coding tasks.

New ollama

The new ollama command makes it easy to launch your favorite apps with models using Ollama

What’s Changed

  • Launch Pi with ollama launch pi
  • Improvements to Ollama’s MLX runner to support GLM-4.7-Flash
  • Ctrl+G will now allow for editing text prompts in a text editor when running a model

Full Changelog: https://github.com/ollama/ollama/compare/v0.15.6…v0.16.0

Update Feb 3, 2026 tracked by Updatify

v0.15.5

New models

  • Qwen3-Coder-Next: a coding-focused language model from Alibaba’s Qwen team, optimized for agentic coding workflows and local development.
  • GLM-OCR: GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture.

Improvements to ollama launch

  • ollama launch can now be provided arguments, for example ollama launch claude -- --resume
  • ollama launch will now run subagents when using ollama launch claude
  • Ollama will now set context limits for a set of models when using ollama launch opencode

What’s Changed

  • Sub-agent support for ollama launch for planning, deep research, and similar tasks
  • ollama signin will now open a browser window to make signing in easier
  • Ollama will now default to the following context lengths based on VRAM:
    • < 24 GiB VRAM: 4,096 context
    • 24-48 GiB VRAM: 32,768 context
    • >= 48 GiB VRAM: 262,144 context
  • GLM-4.7-Flash support on Ollama’s experimental MLX engine
  • ollama signin will now open the browser to the connect page
  • Fixed off by one error when using num_predict in the API
  • Fixed issue where tokens from a previous sequence would be returned when hitting num_predict
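
The VRAM-based default context lengths listed above amount to a threshold lookup. The Python sketch below expresses it that way. The function name default_context_length is hypothetical, and treating exactly 48 GiB as belonging to the largest tier is an assumption based on how the tiers are written.

```python
def default_context_length(vram_gib: float) -> int:
    """Map available VRAM (in GiB) to the default context length tier."""
    if vram_gib < 24:
        return 4096
    if vram_gib < 48:
        return 32768
    return 262144


# Example lookups for a few common GPU memory sizes.
examples = {size: default_context_length(size) for size in (8, 24, 48, 80)}
```

A machine with 8 GiB of VRAM falls in the smallest tier, while 24 GiB is enough for the middle one.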

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.15.4…v0.15.5

Update Feb 1, 2026 tracked by Updatify

v0.15.3

What’s Changed

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.15.2…v0.15.3

Update Jan 24, 2026 tracked by Updatify

v0.15.1

What’s Changed

  • GLM-4.7-Flash performance and correctness improvements, fixing repetitive answers and tool calling quality
  • Fixed performance issues on macOS and arm64 Linux
  • Fixed issue where ollama launch would not detect claude and would incorrectly update opencode configurations

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.15.0…v0.15.1

Update Jan 21, 2026 tracked by Updatify

v0.15.0

ollama launch

A new ollama launch command to use Ollama’s models with Claude Code, Codex, OpenCode, and Droid without separate configuration.

What’s Changed

  • New ollama launch command for Claude Code, Codex, OpenCode, and Droid
  • Fixed issue where creating multi-line strings with """ would not work when using ollama run
  • Ctrl+J and Shift+Enter now work for inserting newlines in ollama run
  • Reduced memory usage for GLM-4.7-Flash models

Update Jan 16, 2026 tracked by Updatify

v0.14.3

  • Z-Image Turbo: 6 billion parameter text-to-image model from Alibaba’s Tongyi Lab. It generates high-quality photorealistic images.
  • Flux.2 Klein: Black Forest Labs’ fastest image-generation models to date.

New models

  • GLM-4.7-Flash: As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
  • LFM2.5-1.2B-Thinking: LFM2.5 is a new family of hybrid models designed for on-device deployment.

What’s Changed

  • Fixed issue where Ollama’s macOS app would interrupt system shutdown
  • Fixed ollama create and ollama show commands for experimental models
  • The /api/generate API can now be used for image generation
  • Fixed minor issues in Nemotron-3-Nano tool parsing
  • Fixed issue where removing an image generation model would cause it to first load
  • Fixed issue where ollama rm would only stop the first model in the list if it were running

Full Changelog: https://github.com/ollama/ollama/compare/v0.14.2…v0.14.3

Update Jan 16, 2026 tracked by Updatify

v0.14.2

New models

  • TranslateGemma: A new collection of open translation models built on Gemma 3, helping people communicate across 55 languages.

What’s Changed

  • Shift+Enter (or Ctrl+J) will now enter a newline in Ollama’s CLI
  • Improved the /v1/responses API to better conform to the OpenResponses specification

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.14.1…v0.14.2

Update Jan 14, 2026 tracked by Updatify

v0.14.1

Image generation models (experimental)

Experimental image generation models are available for macOS and Linux (CUDA) in Ollama:

Available models

ollama run x/z-image-turbo

Note: x is a username on ollama.com where experimental models are uploaded

More models coming soon:

  1. Qwen-Image-2512
  2. Qwen-Image-Edit-2511
  3. GLM-Image

What’s Changed

  • Fixed macOS auto-update signature verification failure

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.14.0…v0.14.1

Update Jan 10, 2026 tracked by Updatify

v0.14.0

What’s Changed

  • ollama run --experimental CLI will now open a new Ollama CLI that includes an agent loop and the bash tool
  • Anthropic API compatibility: support for the /v1/messages API
  • A new REQUIRES command for the Modelfile allows declaring which version of Ollama is required for the model
  • For older models, Ollama will avoid an integer underflow on low VRAM systems during memory estimation
  • More accurate VRAM measurements for AMD iGPUs
  • Ollama’s app will now highlight Swift source code
  • An error will now return when embeddings return NaN or -Inf
  • Ollama’s Linux install bundle files now use zst compression
  • New experimental support for image generation models, powered by MLX
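
As a sketch of what calling the Anthropic-compatible /v1/messages endpoint might look like, the Python snippet below builds a request in the Anthropic Messages shape (model, max_tokens, messages). The base URL is the default local address used in other examples in these notes. The model name gemma3 and the helper build_messages_request are placeholders, and whether extra Anthropic-style headers are needed is not stated here, so treat the header set as an assumption.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def build_messages_request(model: str, prompt: str,
                           max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /v1/messages endpoint."""
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{OLLAMA_URL}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_messages_request("gemma3", "Why is the sky blue?")
```

Passing request to urllib.request.urlopen sends it to a running Ollama server. The snippet stops short of that so it can be inspected without one.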

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.13.5…v0.14.0-rc2

Update Dec 18, 2025 tracked by Updatify

v0.13.5

New Models

  • FunctionGemma: a specialized version of Google’s Gemma 3 270M model fine-tuned explicitly for function calling.

What’s Changed

  • bert architecture models now run on Ollama’s engine
  • Added built-in renderer & tool parsing capabilities for DeepSeek-V3.1
  • Fixed issue where nested properties in tools may not have been rendered properly

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.13.4…v0.13.5

Update Dec 13, 2025 tracked by Updatify

v0.13.4

New Models

  • Nemotron 3 Nano: A new Standard for Efficient, Open, and Intelligent Agentic Models
  • Olmo 3 and Olmo 3.1: A series of Open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.

What’s Changed

  • Enable Flash Attention automatically for models by default
  • Fixed handling of long contexts with Gemma 3 models
  • Fixed issue that would occur with Gemma 3 QAT models or other models imported with the Gemma 3 architecture

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.13.3…v0.13.4-rc0

Update Dec 9, 2025 tracked by Updatify

v0.13.3

New models

  • Devstral-Small-2: 24B model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
  • rnj-1: Rnj-1 is a family of 8B parameter open-weight, dense models trained from scratch by Essential AI, optimized for code and STEM with capabilities on par with SOTA open-weight models.
  • nomic-embed-text-v2: nomic-embed-text-v2-moe is a multilingual MoE text embedding model that excels at multilingual retrieval.

What’s Changed

  • Improved truncation logic when using /api/embed and /v1/embeddings
  • Extend Gemma 3 architecture to support rnj-1 model
  • Fix error that would occur when running qwen2.5vl with image input

Full Changelog: https://github.com/ollama/ollama/compare/v0.13.2…v0.13.3

Update Dec 4, 2025 tracked by Updatify

v0.13.2

New models

  • Qwen3-Next: The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed.

What’s Changed

  • Flash attention is now enabled by default for vision models such as mistral-3, gemma3, qwen3-vl and more. This improves memory utilization and performance when providing images as input.
  • Fixed GPU detection on multi-GPU CUDA machines
  • Fixed issue where deepseek-v3.1 would always think even when thinking is disabled in Ollama’s app

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.13.1…v0.13.2

Update Nov 27, 2025 tracked by Updatify

v0.13.1

New models

  • Ministral-3: The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.
  • Mistral-Large-3: A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.

What’s Changed

  • nomic-embed-text will now use Ollama’s engine by default
  • Tool calling support for cogito-v2.1
  • Fixed issues with CUDA VRAM discovery
  • Fixed link to docs in Ollama’s app
  • Fixed issue where models would be evicted on CPU-only systems
  • Ollama will now better render errors instead of showing Unmarshal: errors
  • Fixed issue where CUDA GPUs would fail to be detected with older GPUs
  • Added thinking and tool parsing for cogito-v2.1

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.13.0…v0.13.1

Update Nov 19, 2025 tracked by Updatify

v0.13.0

New models

  • DeepSeek-OCR: DeepSeek-OCR uses optical 2D mapping to compress long contexts, achieving high OCR precision with reduced vision tokens and demonstrating practical value in document processing.
  • Cogito-V2.1: instruction tuned generative models, currently the best open-weight LLM by a US company

DeepSeek-OCR

DeepSeek-OCR is now available on Ollama. Example inputs:

ollama run deepseek-ocr "/path/to/image\n<|grounding|>Given the layout of the image."
ollama run deepseek-ocr "/path/to/image\nFree OCR."
ollama run deepseek-ocr "/path/to/image\nParse the figure."
ollama run deepseek-ocr "/path/to/image\nExtract the text in the image."
ollama run deepseek-ocr "/path/to/image\n<|grounding|>Convert the document to markdown."

New bench tool

Ollama’s GitHub repo now includes a bench tool that can be used to test model performance. For the time being this is a separate tool that can be built in the Ollama GitHub repository:

First, install Go. Then from the root of the Ollama repository run:

go run ./cmd/bench -model gpt-oss:20b

For more information see the tool’s documentation

What’s Changed

  • DeepSeek-OCR is now supported
  • DeepSeek-V3.1 architecture is now supported in Ollama’s engine
  • Fixed performance issues that arose in Ollama 0.12.11 on CUDA
  • Fixed issue where Linux install packages were missing required Vulkan libraries
  • Improved CPU and memory detection while in containers/cgroups
  • Improved VRAM information detection for AMD GPUs
  • Improved KV cache performance to no longer require defragmentation

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.11…v0.13.0

Update Nov 12, 2025 tracked by Updatify

v0.12.11

Logprobs

Ollama’s API and OpenAI-compatible API now support log probabilities. Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. This is useful for different use cases:

  1. Classification tasks
  2. Retrieval (Q&A) evaluation
  3. Autocomplete
  4. Token highlighting and outputting bytes
  5. Calculating perplexity

To enable Logprobs, provide "logprobs": true to Ollama’s API:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?",
  "logprobs": true
}'

When log probabilities are requested, response chunks include a "logprobs" field with each token, its log probability, and its raw bytes (useful when a token contains only part of a Unicode character).

{
  "model": "gemma3",
  "created_at": "2025-11-14T22:17:56.598562Z",
  "response": "Okay",
  "done": false,
  "logprobs": [
    {
      "token": "Okay",
      "logprob": -1.3434503078460693,
      "bytes": [
        79,
        107,
        97,
        121
      ]
    }
  ]
}
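As a rough illustration of the perplexity and byte-output use cases listed earlier, the following Python sketch works on chunks shaped like the sample response. It assumes the logprob values are natural logarithms (the usual convention in OpenAI-style APIs, not something these notes state); perplexity and decode_tokens are hypothetical helper names, not part of Ollama.

```python
import math


def perplexity(chunks):
    """Perplexity over all tokens in the chunks: exp of the mean negative logprob.

    Assumption: each "logprob" is a natural logarithm.
    """
    logprobs = [
        entry["logprob"]
        for chunk in chunks
        for entry in chunk.get("logprobs") or []
    ]
    if not logprobs:
        raise ValueError("no logprobs found in the given chunks")
    return math.exp(-sum(logprobs) / len(logprobs))


def decode_tokens(chunks):
    """Join the raw bytes of every token, then decode once as UTF-8.

    Decoding after joining matters because a single character may be
    split across several tokens.
    """
    raw = bytearray()
    for chunk in chunks:
        for entry in chunk.get("logprobs") or []:
            raw.extend(entry["bytes"])
    return raw.decode("utf-8", errors="replace")


# Chunk copied from the sample response above.
sample = {
    "model": "gemma3",
    "response": "Okay",
    "done": False,
    "logprobs": [
        {"token": "Okay", "logprob": -1.3434503078460693, "bytes": [79, 107, 97, 121]}
    ],
}

print(decode_tokens([sample]))
print(round(perplexity([sample]), 3))
```

In a real stream, every chunk received so far would be passed in, so the perplexity covers the whole generated sequence rather than a single token.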

top_logprobs

When setting "top_logprobs", a number of most-likely tokens are also provided, making it possible to introspect alternative tokens. Below is an example request.

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?",
  "logprobs": true,
  "top_logprobs": 3
}'

This will generate a stream of response chunks with the following fields:

{
  "model": "gemma3",
  "created_at": "2025-11-14T22:26:10.466324Z",
  "response": "The",
  "done": false,
  "logprobs": [
    {
      "token": "The",
      "logprob": -0.8361086845397949,
      "bytes": [
        84,
        104,
        101
      ],
      "top_logprobs": [
        {
          "token": "The",
          "logprob": -0.8361086845397949,
          "bytes": [
            84,
            104,
            101
          ]
        },
        {
          "token": "Okay",
          "logprob": -1.2590975761413574,
          "bytes": [
            79,
            107,
            97,
            121
          ]
        },
        {
          "token": "That",
          "logprob": -1.2686877250671387,
          "bytes": [
            84,
            104,
            97,
            116
          ]
        }
      ]
    }
  ]
}
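To make the alternatives easier to inspect, a short Python sketch can turn each "top_logprobs" list into ranked probabilities. As before, it assumes natural-log values, and ranked_alternatives is a hypothetical helper name.

```python
import math


def ranked_alternatives(entry):
    """Return (token, probability) pairs from "top_logprobs", most likely first."""
    alternatives = sorted(
        entry.get("top_logprobs", []),
        key=lambda alt: alt["logprob"],
        reverse=True,
    )
    return [(alt["token"], math.exp(alt["logprob"])) for alt in alternatives]


# Entry abbreviated from the sample response above ("bytes" omitted).
entry = {
    "token": "The",
    "logprob": -0.8361086845397949,
    "top_logprobs": [
        {"token": "The", "logprob": -0.8361086845397949},
        {"token": "Okay", "logprob": -1.2590975761413574},
        {"token": "That", "logprob": -1.2686877250671387},
    ],
}

for token, probability in ranked_alternatives(entry):
    print(f"{token}: {probability:.1%}")
```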

Special thanks

Thank you @baptistejamin for adding Logprobs to Ollama’s API.

Vulkan support (opt-in)

Ollama 0.12.11 includes support for Vulkan acceleration. Vulkan brings support for a broad range of GPUs, including those from AMD and Intel as well as iGPUs. Vulkan support is not yet enabled by default; to opt in, run Ollama with a custom environment variable:

OLLAMA_VULKAN=1 ollama serve

In PowerShell, use:

$env:OLLAMA_VULKAN="1"
ollama serve
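When Ollama is started from a script rather than a shell, the same opt-in can be expressed once for every platform. The Python sketch below is only an illustration: vulkan_env and serve_with_vulkan are hypothetical helper names, and it assumes the ollama binary is on PATH.

```python
import os
import subprocess


def vulkan_env(base=None):
    """Return a copy of the environment with OLLAMA_VULKAN=1 set."""
    env = dict(os.environ if base is None else base)
    env["OLLAMA_VULKAN"] = "1"
    return env


def serve_with_vulkan():
    """Start `ollama serve` as a child process with Vulkan opted in."""
    return subprocess.Popen(["ollama", "serve"], env=vulkan_env())
```

Copying the environment instead of mutating os.environ keeps the opt-in scoped to the child process.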

For issues or feedback on using Vulkan with Ollama, create an issue labelled Vulkan and make sure to include server logs where possible to aid in debugging.

What’s Changed

  • Ollama’s API and the OpenAI-compatible API now support Logprobs
  • Ollama’s new app now supports WebP images
  • Improved rendering performance in Ollama’s new app, especially when rendering code
  • The "required" field in tool definitions will now be omitted if not specified
  • Fixed issue where "tool_call_id" would be omitted when using the OpenAI-compatible API.
  • Fixed issue where ollama create would import data from both consolidated.safetensors and other safetensor files.
  • Ollama will now prefer dedicated GPUs over iGPUs when scheduling models
  • Vulkan can now be enabled by setting OLLAMA_VULKAN=1. For example: OLLAMA_VULKAN=1 ollama serve

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.10…v0.12.11

Update Nov 5, 2025 tracked by Updatify

v0.12.10

ollama run now works with embedding models

ollama run can now run embedding models to generate vector embeddings from text:

ollama run embeddinggemma "Hello world"

Content can also be provided to ollama run via standard input:

echo "Hello world" | ollama run embeddinggemma
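A common next step is comparing two embeddings. The sketch below is illustrative only: it assumes each vector has already been parsed into a list of floats (these notes do not describe the exact output format of ollama run), and cosine_similarity is a hypothetical helper rather than anything Ollama provides.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Values near 1 indicate that two texts point in nearly the same direction in embedding space; values near 0 indicate unrelated texts.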

What’s Changed

  • Fixed errors when running qwen3-vl:235b and qwen3-vl:235b-instruct
  • Enable flash attention for Vulkan (currently needs to be built from source)
  • Add Vulkan memory detection for Intel GPU using DXGI+PDH
  • Ollama will now return tool call IDs from the /api/chat API
  • Fixed hanging due to CPU discovery
  • Ollama will now show login instructions when switching to a cloud model in interactive mode
  • Fix reading stale VRAM data
  • ollama run now works with embedding models

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.9…v0.12.10

Update Oct 30, 2025 tracked by Updatify

v0.12.8

What’s Changed

  • qwen3-vl performance improvements, including flash attention support by default
  • qwen3-vl will now output less leading whitespace in the response when thinking
  • Fixed issue where deepseek-v3.1 thinking could not be disabled in Ollama’s new app
  • Fixed issue where qwen3-vl would fail to interpret images with transparent backgrounds
  • Ollama will now stop running a model before removing it via ollama rm
  • Fixed issue where prompt processing would be slower on Ollama’s engine
  • Ignore unsupported iGPUs when doing device discovery on Windows

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.7…v0.12.8

Update Oct 29, 2025 tracked by Updatify

v0.12.7

New models

  • Qwen3-VL: Qwen3-VL is now available in all parameter sizes ranging from 2B to 235B
  • MiniMax-M2: a 230-billion-parameter model built for coding and agentic workflows, available on Ollama’s cloud

Add files and adjust thinking levels in Ollama’s new app

Ollama’s new app now includes a way to add one or more files when prompting the model.

For better responses, thinking levels can now be adjusted for the gpt-oss models.

New API documentation

New API documentation is available for Ollama’s API: https://docs.ollama.com/api

What’s Changed

  • Model load failures now include more information on Windows
  • Fixed embedding results being incorrect when running embeddinggemma
  • Fixed gemma3n on Vulkan backend
  • Increased time allocated for ROCm to discover devices
  • Fixed truncation error when generating embeddings
  • Fixed request status code when running cloud models
  • The OpenAI-compatible /v1/embeddings endpoint now supports encoding_format parameter
  • Ollama will now parse tool calls that don’t conform to {"name": name, "arguments": args} (thanks @rick-github!)
  • Fixed prompt processing reporting in the llama runner
  • Increase speed when scheduling models
  • Fixed issue where FROM <model> would not inherit RENDERER or PARSER commands
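For the encoding_format change listed above, a client that requests base64 output has to unpack the payload itself. The Python sketch below assumes the endpoint mirrors OpenAI's convention, where a base64 embedding is a packed array of little-endian 32-bit floats; that layout is an assumption about Ollama's behavior, and decode_base64_embedding is a hypothetical helper name.

```python
import base64
import struct


def decode_base64_embedding(data):
    """Decode a base64 string of little-endian float32 values into a list."""
    raw = base64.b64decode(data)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round trip with values that float32 represents exactly.
encoded = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.0)).decode("ascii")
print(decode_base64_embedding(encoded))
```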

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.6…v0.12.7

Update Oct 15, 2025 tracked by Updatify

v0.12.6

What’s Changed

  • Ollama’s app now supports searching when running DeepSeek-V3.1, Qwen3 and other models that support tool calling.
  • Flash attention is now enabled by default for Gemma 3, improving performance and memory utilization
  • Fixed issue where Ollama would hang while generating responses
  • Fixed issue where qwen3-coder would act in raw mode when using /api/generate or ollama run qwen3-coder <prompt>
  • Fixed qwen3-embedding providing invalid results
  • Ollama will now evict models correctly when num_gpu is set
  • Fixed issue where tool_index with a value of 0 would not be sent to the model

Experimental Vulkan Support

Experimental support for Vulkan is now available when you build locally from source. This enables additional GPUs from AMD and Intel that are not currently supported by Ollama. To build locally, install the Vulkan SDK and set VULKAN_SDK in your environment, then follow the developer instructions. In a future release, Vulkan support will be included in the binary release as well. Please file issues if you run into any problems.

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.5…v0.12.6

Update Oct 10, 2025 tracked by Updatify

v0.12.5

What’s Changed

  • Thinking models now support structured outputs when using the /api/chat API
  • Ollama’s app will now wait until Ollama is running to allow for a conversation to be started
  • Fixed issue where "think": false would show an error instead of being silently ignored
  • Fixed deepseek-r1 output issues
  • macOS 12 Monterey and macOS 13 Ventura are no longer supported
  • AMD gfx900 and gfx906 (MI50, MI60, etc) GPUs are no longer supported via ROCm. We’re working to support these GPUs via Vulkan in a future release.

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.4…v0.12.5-rc0

Update Oct 3, 2025 tracked by Updatify

v0.12.4

What’s Changed

  • Flash attention is now enabled by default for Qwen 3 and Qwen 3 Coder
  • Fixed minor memory estimation issues when scheduling models on NVIDIA GPUs
  • Fixed an issue where keep_alive in the API would accept different values for the /api/chat and /api/generate endpoints
  • Fixed tool calling rendering with qwen3-coder
  • More reliable and accurate VRAM detection
  • OLLAMA_FLASH_ATTENTION can now be overridden to 0 for models that have flash attention enabled by default
  • macOS 12 Monterey and macOS 13 Ventura are no longer supported
  • Fixed crash where templates were not correctly defined
  • Fix memory calculations on NVIDIA iGPUs
  • AMD gfx900 and gfx906 (MI50, MI60, etc) GPUs are no longer supported via ROCm. We’re working to support these GPUs via Vulkan in a future release.

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.3…v0.12.4-rc3

Update Sep 26, 2025 tracked by Updatify

v0.12.3

New models

  • DeepSeek-V3.1-Terminus: DeepSeek-V3.1-Terminus is a hybrid model that supports both thinking mode and non-thinking mode. It delivers more stable & reliable outputs across benchmarks compared to the previous version:

    Run on Ollama’s cloud:

    ollama run deepseek-v3.1:671b-cloud

    Run locally (requires 500GB+ of VRAM):

    ollama run deepseek-v3.1
  • Kimi-K2-Instruct-0905: Kimi K2-Instruct-0905 is the latest, most capable version of Kimi K2. It is a state-of-the-art mixture-of-experts (MoE) language model, featuring 32 billion activated parameters and a total of 1 trillion parameters.

    ollama run kimi-k2:1t-cloud

What’s Changed

  • Fixed issue where tool calls provided as stringified JSON would not be parsed correctly
  • ollama push will now provide a URL to follow to sign in
  • Fixed issues where qwen3-coder would output unicode characters incorrectly
  • Fix issue where loading a model with /load would crash

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.2…v0.12.3

Update Sep 24, 2025 tracked by Updatify

v0.12.2

Web search

A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama’s cloud. This web search capability can augment models with the latest information from the web to reduce hallucinations and improve accuracy.

What’s Changed

  • Models with Qwen3’s architecture including MoE now run in Ollama’s new engine
  • Fixed issue where built-in tools for gpt-oss were not being rendered correctly
  • Support multi-regex pretokenizers in Ollama’s new engine
  • Ollama’s new engine can now load tensors by matching a prefix or suffix

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.1…v0.12.2

Update Sep 21, 2025 tracked by Updatify

v0.12.1

What’s Changed

  • Qwen3-Coder now supports tool calling
  • Ollama’s app will no longer erroneously show “connection lost” when connecting to cloud models
  • Fixed issue where Gemma3 QAT models would not output correct tokens
  • Fix issue where & characters in Qwen3-Coder would not be parsed correctly when function calling
  • Fixed issues where ollama signin would not work properly on Linux

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.0…v0.12.1

Update Sep 18, 2025 tracked by Updatify

v0.12.0

Cloud models

Cloud models are now available in preview, allowing you to run a group of larger models with fast, datacenter-grade hardware.

To run a cloud model, use:

ollama run qwen3-coder:480b-cloud

What’s Changed

  • Models with the Bert architecture now run on Ollama’s engine
  • Models with the Qwen 3 architecture now run on Ollama’s engine
  • Fix issue where older NVIDIA GPUs would not be detected if newer drivers were installed
  • Fixed issue where models would not be imported correctly with ollama create
  • Ollama will skip parsing the initial <think> if provided in the prompt for /api/generate by @rick-github

Full Changelog: https://github.com/ollama/ollama/compare/v0.11.11…v0.12.0

Update Sep 11, 2025 tracked by Updatify

v0.11.11

Full Changelog: https://github.com/ollama/ollama/compare/v0.11.10…v0.11.11

Update Sep 2, 2025 tracked by Updatify

v0.11.9

What’s Changed

  • Improved performance via overlapping GPU and CPU computations
  • Fixed issues where unrecognized AMD GPU would cause an error
  • Reduce crashes due to unhandled errors in some Mac and Linux installations of Ollama

Full Changelog: https://github.com/ollama/ollama/compare/v0.11.8…v0.11.9-rc0

Update Aug 25, 2025 tracked by Updatify

v0.11.7

DeepSeek-V3.1

DeepSeek-V3.1 is now available to run via Ollama.

This model supports hybrid thinking, meaning thinking can be enabled or disabled by setting think in Ollama’s API:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-v3.1",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ],
  "think": true
}'

In Ollama’s CLI, thinking can be enabled or disabled by running the /set think or /set nothink commands.
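The curl request above can also be built from Python using only the standard library. This is a minimal sketch rather than an official client: build_chat_request and stream_chat are hypothetical helper names, and the default host is the local address used in the curl example.

```python
import json
import urllib.request


def build_chat_request(model, prompt, think, host="http://localhost:11434"):
    """Build (but do not send) a POST to /api/chat with the "think" flag set."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def stream_chat(request):
    """Send the request and yield each newline-delimited JSON chunk."""
    with urllib.request.urlopen(request) as response:
        for line in response:
            if line.strip():
                yield json.loads(line)
```

With a local server running, iterating over stream_chat(build_chat_request("deepseek-v3.1", "why is the sky blue?", True)) yields the response chunks; passing False instead disables thinking.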

Turbo (in preview)

DeepSeek-V3.1 has over 671B parameters, and so a large amount of VRAM is required to run it. Ollama’s Turbo mode (in preview) provides access to powerful hardware in the cloud you can use to run the model.

Turbo via Ollama’s app

  1. Download Ollama for macOS or Windows
  2. Select deepseek-v3.1:671b from the model selector
  3. Enable Turbo

Turbo via Ollama’s CLI and libraries

  1. Create an account on ollama.com/signup
  2. Follow the docs for Ollama’s CLI to authenticate your Ollama installation
  3. Run the following:
OLLAMA_HOST=ollama.com ollama run deepseek-v3.1

For instructions on using Turbo with Ollama’s Python and JavaScript library, see the docs

What’s Changed

  • Fixed issue where multiple models would not be loaded on CPU-only systems
  • Ollama will now work with models that skip outputting the initial <think> tag (e.g. DeepSeek-V3.1)
  • Fixed issue where text would be emitted when there is no opening <think> tag from a model
  • Fixed issue where tool calls containing { or } would not be parsed correctly

Full Changelog: https://github.com/ollama/ollama/compare/v0.11.6…v0.11.7

Update Aug 15, 2025 tracked by Updatify

v0.11.5

What’s Changed

  • Performance improvements for the gpt-oss models
  • New memory management: this release of Ollama includes improved memory management for scheduling models on GPUs, leading to better VRAM utilization, better model performance, and fewer out-of-memory errors. These new memory estimations can be enabled with OLLAMA_NEW_ESTIMATES=1 ollama serve and will soon be enabled by default.
  • Improved multi-GPU scheduling and reduced VRAM allocation when using more than 2 GPUs
  • Ollama’s new app will now remember default selections for default model, Turbo and Web Search between restarts
  • Fix error when parsing bad harmony tool calls
  • OLLAMA_FLASH_ATTENTION=1 will also enable flash attention for pure-CPU models
  • Fixed OpenAI-compatible API not supporting reasoning_effort
  • Reduced size of installation on Windows and Linux

Full Changelog: https://github.com/ollama/ollama/compare/v0.11.4…v0.11.5

Update Aug 7, 2025 tracked by Updatify

v0.11.4

Full Changelog: https://github.com/ollama/ollama/compare/v0.11.3…v0.11.4

Update Aug 5, 2025 tracked by Updatify

v0.11.0

Welcome OpenAI’s gpt-oss models

Ollama partners with OpenAI to bring its latest state-of-the-art open weight models to Ollama. The two models, 20B and 120B, bring a whole new local chat experience, and are designed for powerful reasoning, agentic tasks, and versatile developer use cases.

Feature highlights

  • Agentic capabilities: Use the models’ native capabilities for function calling, web browsing (Ollama provides a built-in web search that can optionally be enabled to augment the model with the latest information), Python tool calls, and structured outputs.
  • Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs.
  • Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
  • Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
  • Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.

Quantization - MXFP4 format

OpenAI utilizes quantization to reduce the memory footprint of the gpt-oss models. The models are post-trained with quantization of the mixture-of-experts (MoE) weights to MXFP4 format, where the weights are quantized to 4.25 bits per parameter. The MoE weights are responsible for 90+% of the total parameter count, and quantizing these to MXFP4 enables the smaller model to run on systems with as little as 16GB memory, and the larger model to fit on a single 80GB GPU.
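The 4.25-bit figure can be checked with a little arithmetic. The Python sketch below assumes the block layout of the OCP Microscaling (MX) format, which these notes do not spell out: 32 four-bit elements share one 8-bit scale. It also uses the nominal 20B and 120B parameter counts and takes 90% as the MoE share, so the results are rough lower-bound estimates for the MoE weights only, not measured model sizes. Both function names are hypothetical.

```python
def mxfp4_bits_per_weight(block_size=32, element_bits=4, scale_bits=8):
    """Average bits per weight when each block shares a single scale."""
    return (block_size * element_bits + scale_bits) / block_size


def moe_weight_gigabytes(total_params, moe_fraction=0.9):
    """Approximate size (in GB, 1e9 bytes) of the MXFP4-quantized MoE weights."""
    bits = total_params * moe_fraction * mxfp4_bits_per_weight()
    return bits / 8 / 1e9


print(mxfp4_bits_per_weight())
for name, params in [("20B", 20e9), ("120B", 120e9)]:
    print(name, round(moe_weight_gigabytes(params), 1), "GB of MoE weights")
```

The remaining weights, the KV cache, and runtime buffers come on top of these numbers, which is why the stated memory targets are higher than the MoE estimate alone.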

Ollama supports the MXFP4 format natively, without additional quantization or conversion. New kernels were developed for Ollama’s new engine to support the format.

Ollama collaborated with OpenAI to benchmark against their reference implementations to ensure Ollama’s implementations have the same quality.

Get started

You can get started by downloading the latest Ollama version (v0.11)

The model can be downloaded directly in Ollama’s new app or via the terminal:

ollama run gpt-oss:20b

ollama run gpt-oss:120b

Full Changelog: https://github.com/ollama/ollama/compare/v0.10.1…v0.11.0

Update Jul 18, 2025 tracked by Updatify

v0.10.0

Ollama’s new app

Ollama’s new app is available for macOS and Windows: Download Ollama

What’s Changed

  • ollama ps will now show the context length of loaded models
  • Improved performance in gemma3n models by 2-3x
  • Parallel request processing now defaults to 1. For more details, see the FAQ
  • Fixed issue where tool calling would not work correctly with granite3.3 and mistral-nemo models
  • Fixed issue where Ollama’s tool calling would not work correctly if a tool’s name was part of another one, such as add and get_address
  • Improved performance when using multiple GPUs by 10-30%
  • Ollama’s OpenAI-compatible API will now support WebP images
  • Fixed issue where ollama show would report an error
  • ollama run will more gracefully display errors

Full Changelog: https://github.com/ollama/ollama/compare/v0.9.6…v0.10.0

Update Jul 2, 2025 tracked by Updatify

v0.9.5

Updates to Ollama for macOS and Windows

A new version of Ollama’s macOS and Windows applications is now available. New improvements to the apps will be introduced over the coming releases.

New features

Expose Ollama on the network

Ollama can now be exposed on the network, allowing others to access Ollama on other devices or even over the internet. This is useful for having Ollama running on a powerful Mac, PC or Linux computer while making it accessible to less powerful devices.

Model directory

The directory in which models are stored can now be modified! This allows models to be stored on external hard disks or in directories other than the default.

Smaller footprint and faster starting on macOS

The macOS app is now a native application and starts much faster while requiring a much smaller installation.

Additional changes in 0.9.5

  • Fixed issue where the ollama CLI would not be installed by Ollama on macOS on startup
  • Fixed issue where files in ollama-darwin.tgz were not notarized
  • Add NativeMind to Community Integrations by @xukecheng in https://github.com/ollama/ollama/pull/11242
  • Ollama for macOS now requires version 12 (Monterey) or newer

Update Jun 27, 2025 tracked by Updatify

v0.9.4

Updates to Ollama for macOS and Windows

A new version of Ollama’s macOS and Windows applications is now available. New improvements to the apps will be introduced over the coming releases.

New features

Expose Ollama on the network

Ollama can now be exposed on the network, allowing others to access Ollama on other devices or even over the internet. This is useful for having Ollama running on a powerful Mac, PC or Linux computer while making it accessible to less powerful devices.

Model directory

The directory in which models are stored can now be modified! This allows models to be stored on external hard disks or in directories other than the default.

Smaller footprint and faster starting on macOS

The macOS app is now a native application and starts much faster while requiring a much smaller installation.

What’s Changed

  • Reduced download size and startup time for Ollama on macOS
  • Tool calling with empty parameters will now work correctly
  • Fixed issue when quantizing models with the Gemma 3n architecture
  • Ollama for macOS should no longer ask for root privileges when updating unless required
  • Ollama for macOS now requires version 12 (Monterey) or newer

Full Changelog: https://github.com/ollama/ollama/compare/v0.9.3…v0.9.4

Update Jun 25, 2025 tracked by Updatify

v0.9.3

Gemma 3n

Ollama now supports Gemma 3n.

Gemma 3n models are designed for efficient execution on everyday devices such as laptops, tablets or phones. These models were trained with data in over 140 spoken languages.

Effective 2B

ollama run gemma3n:e2b

Effective 4B

ollama run gemma3n:e4b

What’s Changed

  • Fixed issue where errors would not be properly reported on Apple Silicon Macs
  • Ollama will now limit context length to what the model was trained against to avoid strange overflow behavior

Full Changelog: https://github.com/ollama/ollama/compare/v0.9.2…v0.9.3

Update Jun 9, 2025 tracked by Updatify

v0.9.1

Tool calling improvements

New tool calling support

The following models now support tool calling:

Tool calling reliability has also been improved for the following models:

To re-download the models, use ollama pull.

New Ollama for macOS and Windows preview

A new version of Ollama’s macOS and Windows applications is available to test for early feedback. New improvements to the apps will be introduced over the coming releases.

If you have feedback, please create an issue on GitHub with the app label. These apps will automatically update themselves to future versions of Ollama, so you may have to redownload new preview versions in the future.

New features

Expose Ollama on the network

Ollama can now be exposed on the network, allowing others to access Ollama on other devices or even over the internet. This is useful for having Ollama running on a powerful Mac, PC or Linux computer while making it accessible to less powerful devices.

Allow local browser access

Enabling this allows websites to access your local installation of Ollama. This is handy for developing browser-based applications using Ollama’s JavaScript library.

Model directory

The directory in which models are stored can now be modified! This allows models to be stored on external hard disks or in directories other than the default.

Smaller footprint and faster starting on macOS

The macOS app is now a native application and starts much faster while requiring a much smaller installation.

What’s Changed

  • Magistral now supports disabling thinking mode. Note: it is also recommended to change the system prompt when doing so.
  • Error messages that previously showed POST predict will now be more informative
  • Improved tool calling reliability for some models
  • Fixed issue on Windows where ollama run would not start Ollama automatically

Full Changelog: https://github.com/ollama/ollama/compare/v0.9.0…v0.9.1

Update May 29, 2025 tracked by Updatify

v0.9.0

New models

  • DeepSeek-R1-0528: DeepSeek-R1 has received a minor version upgrade to DeepSeek-R1-0528 for the 8 billion parameter distilled model and the full 671 billion parameter model. In this update, DeepSeek R1 has significantly improved its reasoning and inference capabilities.

Thinking

Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model’s thinking behavior for different applications and use cases.

When thinking is enabled, the output will separate the model’s thinking from the model’s output. When thinking is disabled, the model will not think and will output the content directly.

Models that support thinking:

When running a model that supports thinking, Ollama will now display the model’s thoughts:

% ollama run deepseek-r1
>>> How many Rs are in strawberry
Thinking...
First, I need to understand what the question is asking. It's asking how many letters 'R' are present in the word "strawberry."

Next, I'll examine each letter in the word individually.

I'll start from the beginning and count every occurrence of the letter 'R.'

After reviewing all the letters, I determine that there are three instances where the letter 'R' appears in the word "strawberry."
...done thinking.

There are three **Rs** in the word **"strawberry"**.

In Ollama’s API, a model’s thinking is now returned as a separate thinking field for easy parsing:

{
  "message": {
    "role": "assistant",
    "thinking": "First, I need to understand what the question is asking. It's asking how many letters 'R' are present in the word \"strawberry...",
    "content": "There are **3** instances of the letter **R** in the word **\"strawberry.\"**"
  }
}
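Because the thinking arrives in its own field, a client can keep it apart from the answer without parsing tags. The Python sketch below assumes streamed chunks that each carry partial "thinking" and "content" strings under "message", as in the example above; collect_message is a hypothetical helper name.

```python
def collect_message(chunks):
    """Concatenate the "thinking" and "content" parts across response chunks."""
    thinking_parts = []
    content_parts = []
    for chunk in chunks:
        message = chunk.get("message", {})
        thinking_parts.append(message.get("thinking", ""))
        content_parts.append(message.get("content", ""))
    return "".join(thinking_parts), "".join(content_parts)


# Hand-written chunks for illustration, not captured server output.
chunks = [
    {"message": {"role": "assistant", "thinking": "First, I count "}},
    {"message": {"role": "assistant", "thinking": "each letter."}},
    {"message": {"role": "assistant", "content": "There are three Rs."}},
]

thinking, content = collect_message(chunks)
print("thinking:", thinking)
print("content:", content)
```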

Turning thinking on and off

In the API, thinking can be enabled by passing "think": true and disabled by passing "think": false:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ],
  "think": true
}'

In Ollama’s CLI, use /set think and /set nothink to enable and disable thinking.

What’s Changed

  • Add thinking support to Ollama

Full Changelog: https://github.com/ollama/ollama/compare/v0.8.0…v0.9.0

Update May 21, 2025 tracked by Updatify

v0.7.1

What’s Changed

  • Improved model memory management to allocate sufficient memory to prevent crashes when running multimodal models in certain situations
  • Enhanced memory estimation for models to prevent unintended memory offloading
  • ollama show will now show ... when data is truncated
  • Fixed crash that would occur with qwen2.5vl
  • Fixed crash on Nvidia’s CUDA for llama3.2-vision
  • Support for Alibaba’s Qwen 3 and Qwen 2 architectures in Ollama’s new multimodal engine

Full Changelog: https://github.com/ollama/ollama/compare/v0.7.0…v0.7.1

Update May 13, 2025 tracked by Updatify

v0.7.0

Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models:

What’s Changed

  • Ollama now supports providing WebP images as input to multimodal models
  • Fixed issue where a blank terminal window would appear when running models on Windows
  • Fixed error that would occur when running llama4 on NVIDIA GPUs
  • Reduced log level of key not found message
  • Ollama will now correctly remove quotes from image paths when sending images as input with ollama run
  • Improved performance of importing safetensors models via ollama create
  • Improved prompt processing speeds of Qwen3 MoE on macOS
  • Fixed issue where providing large JSON schemas in structured output requests would result in an error
  • Ollama’s API will now return code 405 instead of 404 for methods that are not allowed
  • Fixed issue where ollama processes would continue to run after a model was unloaded

Full Changelog: https://github.com/ollama/ollama/compare/v0.6.8…v0.7.0

Update May 3, 2025 tracked by Updatify

v0.6.8

What’s Changed

  • Performance improvements for Qwen 3 MoE models (30b-a3b and 235b-a22b) on NVIDIA and AMD GPUs
  • Fixed GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed issue caused by conflicting installations
  • Fixed a memory leak that occurred when providing images as input
  • ollama show will now correctly label older vision models such as llava
  • Reduced out of memory errors by improving worst-case memory estimations
  • Fix issue that resulted in a context canceled error

Full Changelog: https://github.com/ollama/ollama/compare/v0.6.7…v0.6.8

Update Apr 26, 2025 tracked by Updatify

v0.6.7

New models

  • Qwen 3: Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
  • Phi 4 reasoning and Phi-4-mini-reasoning: New state-of-the-art reasoning models from Microsoft
  • Llama 4: state-of-the-art multi-modal models from Meta

What’s Changed

  • Add support for Meta’s Llama 4 multimodal models
  • Add support for Microsoft’s Phi 4 reasoning models, and Phi 4 mini reasoning model
  • Increased default context window to 4096 tokens
  • Fixed issue where image paths would not be recognized with ~ when being provided to ollama run
  • Improved output quality when using JSON mode in certain scenarios
  • Fixed tensor->op == GGML_OP_UNARY errors when running a model due to conflicting inference libraries
  • Fixed issue where model would be stuck in the Stopping... state

Full Changelog: https://github.com/ollama/ollama/compare/v0.6.6…v0.6.7