v0.11.5 — Ollama - Product release notes & changelog tool

What’s Changed

Performance improvements for the gpt-oss models
New memory management: this release of Ollama includes improved memory management for scheduling models on GPUs, leading to better VRAM utilization, model performance and less out of memory errors. These new memory estimations can be enabled with OLLAMA_NEW_ESTIMATES=1 ollama serve and will soon be enabled by default.
Improved multi-GPU scheduling and reduced VRAM allocation when using more than 2 GPUs
Ollama’s new app will now remember default selections for default model, Turbo and Web Search between restarts
Fix error when parsing bad harmony tool calls
OLLAMA_FLASH_ATTENTION=1 will also enable flash attention for pure-CPU models
Fixed OpenAI-compatible API not supporting reasoning_effort
Reduced size of installation on Windows and Linux

@vorburger made their first contribution in https://github.com/ollama/ollama/pull/11755
@dan-and made their first contribution in https://github.com/ollama/ollama/pull/10678
@youzichuan made their first contribution in https://github.com/ollama/ollama/pull/11880

Full Changelog: https://github.com/ollama/ollama/compare/v0.11.4…v0.11.5