Update Dec 4, 2025 tracked by Updatify
v0.13.2
New models
- Qwen3-Next: The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed.
What’s Changed
-
Flash attention is now enabled by default for vision models such as
mistral-3,gemma3,qwen3-vland more. This improves memory utilization and performance when providing images as input. - Fixed GPU detection on multi-GPU CUDA machines
-
Fixed issue where
deepseek-v3.1would always think even with thinking is disabled in Ollama’s app
New Contributors
- @chengcheng84 made their first contribution in https://github.com/ollama/ollama/pull/13265
- @nathan-hook made their first contribution in https://github.com/ollama/ollama/pull/13256
Full Changelog: https://github.com/ollama/ollama/compare/v0.13.1…v0.13.2