Update Nov 5, 2025 tracked by Updatify

v0.12.10

ollama run now works with embedding models

ollama run can now run embedding models to generate vector embeddings from text:

ollama run embeddinggemma "Hello world"

Content can also be provided to ollama run via standard input:

echo "Hello world" | ollama run embeddinggemma

What’s Changed

  • Fixed errors when running qwen3-vl:235b and qwen3-vl:235b-instruct
  • Enable flash attention for Vulkan (currently needs to be built from source)
  • Add Vulkan memory detection for Intel GPU using DXGI+PDH
  • Ollama will now return tool call IDs from the /api/chat API
  • Fixed hanging due to CPU discovery
  • Ollama will now show login instructions when switching to a cloud model in interactive mode
  • Fix reading stale VRAM data
  • ollama run now works with embedding models

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.9…v0.12.10