Update Jun 3, 2026 tracked by Updatify
v0.30.4
New models
- Nemotron-3-Ultra: NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.
What’s Changed
- Multimodal models on the llama.cpp backend can now use Metal GPU offload on Apple Silicon, improving multimodal performance on supported Macs.
- ollama create --experimental now respects REQUIRES in Modelfiles for MLX-based models.
- ollama launch codex now cleans up old conflicting Codex profile config before launching.
- ollama launch pi now migrates users from the legacy Pi package to the official package and preserves the correct npm install prefix.
- Pi web search setup now updates only when a newer package is available.
- Windows cleanup now terminates the llama.cpp backend more reliably.
- Updated the llama.cpp backend.
Known Issues
- gemma4:12b crashes with floating point exception
Full Changelog: https://github.com/ollama/ollama/compare/v0.30.3…v0.30.4