Ollama Turbo: Blazing Fast Open-Source LLMs

2025-08-06

Ollama Turbo is a new way to run large open-source language models on datacenter-grade hardware. Many new models are too large for typical GPUs, or run too slowly on them. Ollama Turbo offers fast execution and is compatible with Ollama's app, CLI, API, and JavaScript/Python libraries. Currently in preview, it supports gpt-oss-20b and gpt-oss-120b. Importantly, Ollama does not log or retain any queries made in Turbo mode, and all hardware is US-based. Hourly and daily usage limits are in place to manage capacity, with usage-based pricing coming soon.
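As a rough sketch of what API access to a hosted endpoint like Turbo could look like, the snippet below builds a request against Ollama's `/api/chat` REST endpoint. The host URL, bearer-token auth scheme, and `gpt-oss:120b` model tag are assumptions based on the preview announcement, not documented specifics:

```python
# Sketch: constructing an /api/chat request pointed at a hosted Turbo endpoint.
# Host URL, auth scheme, and model tag are assumptions from the preview post.
import json
import urllib.request

TURBO_HOST = "https://ollama.com"  # assumed Turbo endpoint


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    # Same request shape as a local Ollama server; only host and auth differ.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{TURBO_HOST}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical API key
        },
        method="POST",
    )


# Sending would be: urllib.request.urlopen(build_chat_request(key, model, prompt))
req = build_chat_request("sk-demo", "gpt-oss:120b", "Say hello.")
```

Because the request shape matches a local Ollama server, switching between local and Turbo execution would only mean changing the host and adding the auth header.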

Ollama Launches Desktop App for Easier LLM Interaction

2025-07-31

Ollama has released a new desktop application for macOS and Windows, offering a more streamlined way to interact with large language models. The app supports drag-and-drop uploads of text files and PDFs, making it easy to process documents; for larger files, the context length can be increased in settings (at the cost of more memory). Multimodal support allows sending images to compatible models such as Google DeepMind's Gemma 3, and code files can be submitted for the model to analyze. A command-line interface version is also available.
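Under the hood, sending an image to a multimodal model amounts to attaching base64-encoded image data to a chat message, following Ollama's `/api/chat` conventions. The sketch below builds such a payload; the `gemma3` model tag is an assumption for illustration:

```python
# Sketch: an /api/chat payload attaching an image for a multimodal model such
# as Gemma 3. The base64 "images" field follows Ollama's REST API conventions;
# the model tag is an assumption.
import base64
import json


def image_chat_payload(model: str, prompt: str, image_bytes: bytes) -> str:
    message = {
        "role": "user",
        "content": prompt,
        # Images travel as base64 strings alongside the text content.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    }
    return json.dumps({"model": model, "messages": [message], "stream": False})


payload = image_chat_payload("gemma3", "Describe this image.", b"\x89PNG...")
```

In practice the desktop app handles this encoding transparently when a user drags an image into the chat.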

Ollama's New Multimodal Engine: Local Inference for Vision Models

2025-05-16

Ollama has launched a new engine supporting local inference for multimodal models, starting with vision models like Llama 4 Scout and Gemma 3. Addressing limitations of the ggml library for multimodal models, the engine improves model modularity, accuracy, and memory management for reliable and efficient inference with large images and complex architectures (including Mixture-of-Experts models). This focus on accuracy and reliability lays the foundation for future support of speech, image generation, and longer contexts.

Google's Gemma: A Lightweight Multimodal Model Family

2025-03-12

Google unveiled Gemma, a lightweight family of multimodal models built on Gemini technology. Gemma 3 models process text and images, boast a 128K context window, and support over 140 languages. Available in 1B, 4B, 12B, and 27B parameter sizes, they excel at question answering, summarization, and reasoning, while their compact design enables deployment on resource-constrained devices. Benchmark results demonstrate strong performance across various tasks, particularly in multilingual and multimodal capabilities.

Microsoft Releases Phi-4: A 14B Parameter Open-Source Language Model

2025-01-12

Microsoft has unveiled Phi-4, a new 14-billion-parameter open-source language model. Trained on a blend of synthetic data, filtered public-domain websites, academic books, and Q&A datasets, Phi-4 underwent a rigorous enhancement and alignment process to ensure accurate instruction following and robust safety. With a context length of 16k tokens, it is designed for general-purpose AI systems and applications (primarily in English) that run in memory- and compute-constrained environments, require low latency, and need strong reasoning and logic capabilities. Microsoft emphasizes that developers should consider the limitations of language models and mitigate for accuracy, safety, and fairness, especially in high-risk scenarios.
