v0.15.5

New models

Qwen3-Coder-Next: a coding-focused language model from Alibaba’s Qwen team, optimized for agentic coding workflows and local development.
GLM-OCR: a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture.

ollama launch can now pass arguments through to the launched tool, for example: ollama launch claude -- --resume
ollama launch will now run subagents when using ollama launch claude
Ollama will now set context limits for a set of models when using ollama launch opencode

Subagent support in ollama launch for planning, deep research, and similar tasks
ollama signin will now open a browser window to make signing in easier
Ollama will now default to the following context lengths based on VRAM:
- Less than 24 GiB VRAM: 4,096 tokens
- At least 24 GiB but less than 48 GiB VRAM: 32,768 tokens
- 48 GiB VRAM or more: 262,144 tokens
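The tiers above amount to a simple threshold lookup. The sketch below restates them in Python; default_context_length is a hypothetical helper, not an Ollama API. It assumes VRAM is given as a single number in GiB, treats exactly 24 GiB as the middle tier and exactly 48 GiB as the top tier (following the "< 24" and ">= 48" wording), and does not model how Ollama actually measures VRAM (for example across multiple GPUs).

```python
# Hypothetical helper: restates the default context-length tiers from these
# release notes. It is NOT Ollama's implementation or API.
def default_context_length(vram_gib: float) -> int:
    """Return the default context length (in tokens) for a given VRAM size in GiB."""
    if vram_gib < 24:
        return 4_096      # less than 24 GiB
    if vram_gib < 48:
        return 32_768     # at least 24 GiB, less than 48 GiB
    return 262_144        # 48 GiB or more


if __name__ == "__main__":
    for gib in (8, 24, 40, 48, 80):
        print(f"{gib} GiB -> {default_context_length(gib)}")
    # 8 GiB -> 4096
    # 24 GiB -> 32768
    # 40 GiB -> 32768
    # 48 GiB -> 262144
    # 80 GiB -> 262144
```

Writing the tiers as ordered "less than" checks makes each boundary value fall into exactly one tier.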
GLM-4.7-Flash support on Ollama’s experimental MLX engine
ollama signin will now open the browser to the connect page
Fixed an off-by-one error when using num_predict in the API
Fixed an issue where tokens from a previous sequence could be returned when the num_predict limit was reached
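For API users, num_predict is set in the options object of a request to /api/generate, and the final (non-streamed) response reports the number of generated tokens in eval_count. The following is a minimal standard-library sketch, assuming a server at the default http://localhost:11434 address. The helpers build_generate_request, generate, and within_limit and the model name placeholder are hypothetical, for illustration only. within_limit is just one way to sanity-check that a response stays within the requested limit.

```python
import json
import urllib.request

# Assumption: default local Ollama address; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, num_predict: int) -> urllib.request.Request:
    """Hypothetical helper: build a non-streaming /api/generate request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                         # return a single JSON object
        "options": {"num_predict": num_predict}, # cap on generated tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, num_predict: int, timeout: float = 120.0) -> dict:
    """Hypothetical helper: send the request and decode the JSON response."""
    req = build_generate_request(model, prompt, num_predict)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def within_limit(result: dict, num_predict: int) -> bool:
    """Check that the reported number of generated tokens does not exceed num_predict."""
    return result.get("eval_count", 0) <= num_predict


# Example usage (requires a running server; "your-model-name" is a placeholder):
# result = generate("your-model-name", "Write a haiku about GPUs.", num_predict=32)
# print(result["response"], within_limit(result, 32))
```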

@avukmirovich made their first contribution in https://github.com/ollama/ollama/pull/13934

Full Changelog: https://github.com/ollama/ollama/compare/v0.15.4...v0.15.5