Update Aug 15, 2025 tracked by Updatify

v0.11.5

What’s Changed

  • Performance improvements for the gpt-oss models
  • New memory management: this release of Ollama includes improved memory management for scheduling models on GPUs, leading to better VRAM utilization, model performance and less out of memory errors. These new memory estimations can be enabled with OLLAMA_NEW_ESTIMATES=1 ollama serve and will soon be enabled by default.
  • Improved multi-GPU scheduling and reduced VRAM allocation when using more than 2 GPUs
  • Ollama’s new app will now remember default selections for default model, Turbo and Web Search between restarts
  • Fix error when parsing bad harmony tool calls
  • OLLAMA_FLASH_ATTENTION=1 will also enable flash attention for pure-CPU models
  • Fixed OpenAI-compatible API not supporting reasoning_effort
  • Reduced size of installation on Windows and Linux

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.11.4…v0.11.5