Update Jun 3, 2026 tracked by Updatify

v0.30.4

New models

  • Nemotron-3-Ultra: NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.

What’s Changed

  • Multimodal models on the llama.cpp backend can now use Metal GPU offload on Apple Silicon, improving multimodal performance on supported Macs.
  • ollama create --experimental now respects REQUIRES in Modelfiles for MLX-based models.
  • ollama launch codex now cleans up old conflicting Codex profile config before launching.
  • ollama launch pi now migrates users from the legacy Pi package to the official package and preserves the correct npm install prefix.
  • Pi web search setup now updates only when a newer package is available.
  • Windows cleanup now terminates the llama.cpp backend more reliably.
  • Updated the llama.cpp backend.

Known Issues

  • gemma4:12b crashes with a floating point exception.

Full Changelog: https://github.com/ollama/ollama/compare/v0.30.3...v0.30.4