Update Nov 5, 2025 tracked by Updatify
v0.12.10
ollama run now works with embedding models
ollama run can now run embedding models to generate vector embeddings from text:
ollama run embeddinggemma "Hello world"
Content can also be provided to ollama run via standard input:
echo "Hello world" | ollama run embeddinggemma
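Because ollama run accepts input on standard input, it can be scripted over a file of texts. The following is a minimal sketch, not part of the release: embed_lines is a hypothetical helper name, and it assumes each invocation writes its embedding to standard output, which is passed through unchanged.

```shell
#!/usr/bin/env bash
# Sketch only: embed_lines is a hypothetical helper, not an Ollama command.
# It pipes each non-empty line of a text file to `ollama run <model>` via stdin.
embed_lines() {
  local model="$1" file="$2" line
  # `|| [ -n "$line" ]` also handles a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines so no empty input is sent to the model.
    [ -z "$line" ] && continue
    # The pipe gives ollama its own stdin, so the loop's file input is not consumed.
    printf '%s\n' "$line" | ollama run "$model"
  done < "$file"
}
```

For example, embed_lines embeddinggemma notes.txt would run one invocation per non-empty line of notes.txt (a placeholder file name) and print the results in order.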
What’s Changed
- Fixed errors when running qwen3-vl:235b and qwen3-vl:235b-instruct
- Enable flash attention for Vulkan (currently needs to be built from source)
- Add Vulkan memory detection for Intel GPU using DXGI+PDH
- Ollama will now return tool call IDs from the /api/chat API
- Fixed hanging due to CPU discovery
- Ollama will now show login instructions when switching to a cloud model in interactive mode
- Fix reading stale VRAM data
- ollama run now works with embedding models
New Contributors
- @ryanycoleman made their first contribution in https://github.com/ollama/ollama/pull/11740
- @Rajathbail made their first contribution in https://github.com/ollama/ollama/pull/12929
- @virajwad made their first contribution in https://github.com/ollama/ollama/pull/12664
- @AXYZdong made their first contribution in https://github.com/ollama/ollama/pull/8601
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.9...v0.12.10