The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it runs fast and efficiently out of the box.
A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for […]
Ollama now includes a significantly improved model scheduling system, reducing crashes due to out-of-memory issues, maximizing GPU utilization […]
Cloud models are now in preview, letting you run larger models with fast, datacenter-grade hardware. You can keep using your […]
Secure Minions is a protocol built by Stanford’s Hazy Research lab to allow encrypted local-remote communication.
Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model’s thinking […]
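As a minimal sketch of how the toggle is used, the snippet below builds a request body for Ollama's `/api/chat` endpoint with the `think` flag set. The model name and prompt are illustrative; thinking requires a thinking-capable model.

```python
import json

def build_chat_payload(model: str, prompt: str, think: bool) -> dict:
    """Build a request body for Ollama's /api/chat endpoint.

    The `think` flag toggles the model's thinking trace on or off
    for models that support it.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,
        "stream": False,
    }

# Disable thinking for a quick, direct answer:
payload = build_chat_payload("deepseek-r1", "Why is the sky blue?", think=False)
print(json.dumps(payload, indent=2))
```

Setting `think` to `True` instead returns the model's reasoning in a separate `thinking` field, so applications can show or hide it independently of the answer.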
Ollama now supports streaming responses with tool calling. This enables all chat applications to stream content and also call tools […]
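A sketch of the consumer side: streamed chat chunks carry incremental `content`, and some chunks may carry `tool_calls` instead. The chunk shape below mirrors Ollama's streamed `/api/chat` responses, simulated here with plain dicts rather than a live connection.

```python
def consume_stream(chunks):
    """Accumulate streamed text and collect tool calls from chat chunks.

    Each chunk is a dict with a `message` that may hold a `content`
    delta and/or a list of `tool_calls`.
    """
    text_parts, tool_calls = [], []
    for chunk in chunks:
        msg = chunk.get("message", {})
        if msg.get("content"):
            text_parts.append(msg["content"])
        tool_calls.extend(msg.get("tool_calls", []))
    return "".join(text_parts), tool_calls

# Simulated stream: content deltas interleaved with one tool call.
stream = [
    {"message": {"content": "Let me check "}},
    {"message": {"content": "the weather."}},
    {"message": {"tool_calls": [
        {"function": {"name": "get_weather", "arguments": {"city": "Toronto"}}}
    ]}},
    {"done": True},
]
text, calls = consume_stream(stream)
```

This pattern lets a chat UI render text as it arrives while still dispatching any tool calls that appear mid-stream.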
Ollama now supports new multimodal models with its new engine.
Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christopher Ré’s Stanford Hazy Research lab, along with Avner May, Scott Linderman, […]
Ollama now supports structured outputs, making it possible to constrain a model’s output to a specific format defined by a […]
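In practice, the constraint is a JSON schema passed in the request's `format` field. The sketch below builds such a request and checks a sample conforming reply; the schema, model name, and reply content are illustrative.

```python
import json

# An illustrative JSON schema constraining the model's reply.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
    },
    "required": ["name", "capital"],
}

# Request body for /api/chat with the schema in the `format` field.
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Tell me about Canada."}],
    "format": schema,
    "stream": False,
}

# A conforming reply parses directly into the expected keys:
sample_reply = '{"name": "Canada", "capital": "Ottawa"}'
data = json.loads(sample_reply)
assert all(key in data for key in schema["required"])
```

Because the output is guaranteed to match the schema, downstream code can parse it with `json.loads` and index fields directly instead of scraping free-form text.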
With Ollama Python library version 0.4, functions can now be provided as tools. The library now also has full typing […]
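The library derives a tool schema from a function's type hints and docstring. The helper below is a simplified illustration of that conversion, not the library's actual implementation: it inspects a typed function and emits the JSON tool shape the chat API accepts.

```python
import inspect

def add_two_numbers(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

def function_to_tool(fn) -> dict:
    """Sketch of converting a typed Python function into a tool schema."""
    sig = inspect.signature(fn)
    type_names = {int: "integer", float: "number", str: "string", bool: "boolean"}
    props = {
        name: {"type": type_names.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": props,
                "required": list(props),
            },
        },
    }

tool = function_to_tool(add_two_numbers)
```

With the real library, the conversion is automatic: the function object itself can be passed in the `tools` list, and the typing information does the rest.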
Bespoke-Minicheck is a new grounded factuality checking model developed by Bespoke Labs that is now available in Ollama. It can […]
Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given […]
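A minimal sketch of the application side of tool calling: a tool is declared in the JSON shape the chat API accepts, and when the model returns a tool call, the application routes it to the matching local function. The weather function here is a toy stand-in.

```python
def get_current_weather(city: str) -> str:
    """Toy stand-in for a real weather lookup."""
    return f"Sunny in {city}"

# Tool definition in the JSON shape Ollama's chat API accepts.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

available = {"get_current_weather": get_current_weather}

def dispatch(tool_call: dict) -> str:
    """Route a model-issued tool call to the matching local function."""
    fn = available[tool_call["function"]["name"]]
    return fn(**tool_call["function"]["arguments"])

# Simulate a tool call as it would appear in the model's response:
result = dispatch({"function": {"name": "get_current_weather",
                                "arguments": {"city": "Toronto"}}})
```

The function's result is then appended to the conversation as a `tool` message so the model can compose its final answer from it.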
Continue enables you to easily create your own coding assistant directly inside Visual Studio Code and JetBrains with open-source LLMs.
At Google IO 2024, Google announced Ollama support in Firebase Genkit, a new open-source framework for developers to build, deploy […]
Compared to Llama 2, Llama 3 feels much less censored. Meta has substantially lowered false refusal rates. Llama 3 will […]
Llama 3 is now available to run on Ollama. This model is the next generation of Meta’s state-of-the-art large language […]
Embedding models are available in Ollama, making it easy to generate vector embeddings for use in search and retrieval augmented […]
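Once vectors come back from the embeddings endpoint, retrieval typically reduces to ranking documents by cosine similarity against the query vector. The snippet below shows that step with toy three-dimensional vectors standing in for real embedding output.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding output.
query = [0.1, 0.9, 0.2]
docs = {
    "doc_a": [0.1, 0.8, 0.3],  # semantically close to the query
    "doc_b": [0.9, 0.1, 0.0],  # unrelated
}

# Rank documents by similarity to the query, most similar first:
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
```

In a retrieval-augmented generation pipeline, the top-ranked documents are then inserted into the prompt as context for the chat model.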
Ollama now supports AMD graphics cards in preview on Windows and Linux. All the features of Ollama can now be […]
Ollama is now available on Windows in preview, making it possible to pull, run and create large language models in […]
Ollama now has initial compatibility with the OpenAI Chat Completions API, making it possible to use existing tooling built for […]
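Concretely, existing OpenAI tooling can be repointed at the local server by changing the base URL; the request body keeps the standard Chat Completions shape. The sketch below builds that request without sending it (model name and messages are illustrative).

```python
# Point any OpenAI-compatible client at the local Ollama server:
BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint
API_KEY = "ollama"  # required by OpenAI clients, but ignored by Ollama

# Standard Chat Completions request body, unchanged from OpenAI tooling:
body = {
    "model": "llama3",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}
request_url = f"{BASE_URL}/chat/completions"
```

With the official `openai` SDK, the same effect comes from constructing the client with `base_url=BASE_URL` and any placeholder API key, leaving the rest of the application code untouched.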
New vision models are now available: LLaVA 1.6, in 7B, 13B and 34B parameter sizes. These models support higher resolution […]
The initial versions of the Ollama Python and JavaScript libraries are now available, making it easy to integrate your Python […]
Recreate one of the most popular LangChain use-cases with open source, locally running software – a chain that performs Retrieval-Augmented […]
Ollama can now run with Docker Desktop on the Mac, and run inside Docker containers with GPU acceleration on Linux.