Update Aug 15, 2025 tracked by Updatify
v0.11.5
What’s Changed
-
Performance improvements for the
gpt-ossmodels -
New memory management: this release of Ollama includes improved memory management for scheduling models on GPUs, leading to better VRAM utilization, model performance and less out of memory errors. These new memory estimations can be enabled with
OLLAMA_NEW_ESTIMATES=1 ollama serveand will soon be enabled by default. - Improved multi-GPU scheduling and reduced VRAM allocation when using more than 2 GPUs
- Ollama’s new app will now remember default selections for default model, Turbo and Web Search between restarts
- Fix error when parsing bad harmony tool calls
-
OLLAMA_FLASH_ATTENTION=1will also enable flash attention for pure-CPU models -
Fixed OpenAI-compatible API not supporting
reasoning_effort - Reduced size of installation on Windows and Linux
New Contributors
- @vorburger made their first contribution in https://github.com/ollama/ollama/pull/11755
- @dan-and made their first contribution in https://github.com/ollama/ollama/pull/10678
- @youzichuan made their first contribution in https://github.com/ollama/ollama/pull/11880
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.4…v0.11.5