Improved Ollama Model Atom Feed Scraper with Gemini 2.5 Pro

2025-03-26

This post details the creation of an Atom feed, powered by GitHub Actions and GitHub Pages, that scrapes recent model data from Ollama's latest models page. Initially built with Claude to convert the HTML to Atom, the script was then refined using Google's Gemini 2.5 Pro. The upgrade splits the output into two feeds: one containing all models and another with only the 20 most recent, improving efficiency and usability.
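
A rough sketch of the two-feed split (entry fields, helper names, and file names here are illustrative, not the actual script's):

```python
# Minimal sketch: given model entries already scraped from the Ollama
# page (newest first), write a full feed and a latest-20 feed.
from xml.sax.saxutils import escape

def render_atom(entries, title):
    items = "".join(
        "<entry>"
        f"<title>{escape(e['name'])}</title>"
        f'<link href="{escape(e["url"])}"/>'
        f"<id>{escape(e['url'])}</id>"
        f"<updated>{e['updated']}</updated>"
        "</entry>"
        for e in entries
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        f"<title>{escape(title)}</title>"
        f"<id>tag:example.com,2025:{escape(title)}</id>"
        f"<updated>{entries[0]['updated']}</updated>"
        f"{items}</feed>"
    )

entries = [  # stand-in for the scraped data
    {"name": "gemma3", "url": "https://ollama.com/library/gemma3",
     "updated": "2025-03-26T00:00:00Z"},
]
with open("atom.xml", "w") as f:            # every model
    f.write(render_atom(entries, "Ollama models"))
with open("atom-recent-20.xml", "w") as f:  # only the 20 newest
    f.write(render_atom(entries[:20], "Ollama models (recent)"))
```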

Read more
Development, Model Scraping

Open-Source OLMo-2 Outperforms GPT-3.5? Mac-Friendly Setup!

2025-03-18

The open-source language model OLMo-2, with 32 billion parameters, claims to outperform GPT-3.5-Turbo and GPT-4o mini. All data, code, weights, and training details are freely available. This post details a simple setup for running it on a Mac using the llm-mlx plugin: download the 17GB model with a few commands, then chat with it interactively; the example shows it generating an SVG of a pelican riding a bicycle.
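
A minimal sketch of that flow via the `llm` Python API, assuming the llm-mlx plugin is installed and the model already downloaded (the model ID follows the post; check `llm models` locally for the exact name):

```python
import llm

# The 4-bit mlx-community quantization is the ~17GB download mentioned above
model = llm.get_model("mlx-community/OLMo-2-0325-32B-Instruct-4bit")
response = model.prompt("Generate an SVG of a pelican riding a bicycle")
print(response.text())
```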

Read more
AI

Aider's Ingenious Installation: Bypassing Virtual Environments

2025-03-06

Paul Gauthier's Aider CLI tool offers an innovative installation method that spares end users the complexities of virtual environments. A simple `pip install aider-install && aider-install` command leverages the `uv` tool to provision a self-contained Python 3.12 environment, installs Aider into it, and configures the PATH automatically. The result is a safe, easy installation experience for novice Python users, with no complex setup steps.
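
Conceptually the installer is a thin wrapper around `uv`; here is a hypothetical sketch of the pattern (the real aider-install package's code may differ in detail):

```python
# Illustrative: a tiny PyPI package whose console script asks uv to
# provision Python 3.12 and install the actual tool into an isolated,
# PATH-visible environment.
import subprocess

def main():
    # uv downloads a managed CPython 3.12 if none is available locally
    subprocess.run(["uv", "python", "install", "3.12"], check=True)
    # install aider-chat into its own uv-managed tool environment
    subprocess.run(
        ["uv", "tool", "install", "--python", "3.12", "aider-chat"],
        check=True,
    )

if __name__ == "__main__":
    main()
```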

Read more
Development

LLM Code Hallucinations: Not the End of the World

2025-03-02

A common complaint among developers using LLMs for code is the occurrence of 'hallucinations': the LLM inventing methods or libraries that don't exist. The author argues this isn't a fatal flaw, since code hallucinations are easily detected by compiler or interpreter errors and can be fixed, sometimes automatically by more advanced systems. The real risk lies in errors that go undetected until runtime, which demands robust manual testing and QA skills. The author advises developers to strengthen their code reading, understanding, and review capabilities, and offers tips to reduce hallucinations, such as trying different models, using context effectively, and choosing established technologies. The ability to review LLM-generated code is presented as valuable skill-building.

Read more
Development

Sub-100MB LLM Now Pip-installable: Introducing llm-smollm2

2025-02-07

A new plugin, llm-smollm2, bundles a quantized SmolLM2-135M-Instruct LLM under 100MB, making it pip-installable. The author details the creation process, from finding a suitable sub-100MB model (limited by PyPI size restrictions) to suppressing verbose logging from llama-cpp-python and packaging for PyPI. While the model's capabilities are limited, it's presented as a valuable learning tool for understanding LLM technology.
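
A common way to silence C-level output from llama-cpp-python is to point the stderr file descriptor at /dev/null while the model loads; a hedged sketch of the technique (the plugin's actual approach may differ):

```python
import os
from contextlib import contextmanager

@contextmanager
def suppress_stderr():
    saved_fd = os.dup(2)                        # keep a copy of real stderr
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, 2)                     # route fd 2 to /dev/null
        yield
    finally:
        os.dup2(saved_fd, 2)                    # restore stderr
        os.close(devnull)
        os.close(saved_fd)

# Hypothetical usage: the C library's log spew is hidden during model load
# with suppress_stderr():
#     model = Llama(model_path="SmolLM2-135M-Instruct.Q4_K_M.gguf")
```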

Read more
Development, Model Quantization

Lost IBM Training Doc: Computers Can't Be Held Accountable (1979)

2025-02-03

A legendary page from a 1979 internal IBM training document resurfaced online, stating 'A computer can never be held accountable; therefore a computer must never make a management decision.' The original source is lost, reportedly destroyed in a flood. The statement resonates powerfully in our AI-driven age, prompting reflection on AI responsibility and decision-making.

Read more

OpenAI's o3-mini: A Budget-Friendly LLM Powerhouse

2025-02-01

OpenAI has released o3-mini, a new language model that excels on the Codeforces competitive programming benchmark, significantly outperforming GPT-4o and o1. While not universally superior across all metrics, its low price ($1.10/million input tokens, $4.40/million output tokens) and exceptionally high output limit (100,000 tokens) make it highly competitive. OpenAI plans to integrate it into ChatGPT for web search and summarization. Support is already available in the LLM CLI (version 0.21), though API access is currently limited to Tier 3 users (at least $100 spent on the API). o3-mini offers developers a powerful and cost-effective option.
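
For a sense of scale, a quick calculation at the quoted prices:

```python
# o3-mini list prices, USD per million tokens (as quoted above)
INPUT_PER_M, OUTPUT_PER_M = 1.10, 4.40

def cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1e6

# A 10,000-token prompt that uses the full 100,000-token output limit:
print(f"${cost(10_000, 100_000):.3f}")  # $0.451
```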

Read more
AI

llama.cpp WASM Gets 2x Speedup Thanks to Optimized SIMD

2025-01-28

Simon Willison's blog post highlights a significant performance improvement in llama.cpp: a 2x speedup for the WASM version, achieved by optimizing its SIMD instructions. Remarkably, 99% of the code was written by DeepSeek R1, which 'thought' for 3-5 minutes per prompt. The model also helped the developer improve the llm_groq.py plugin, elegantly eliminating its model_map and streamlining the code. This showcases the potential of AI in code optimization and refactoring.

Read more

Alibaba's Qwen 2.5: A 1M Token Context LLM

2025-01-26

Alibaba released a major update to its open-source large language model, Qwen 2.5, boasting a staggering 1 million token context window, achieved through a new technique called Dual Chunk Attention. Two models are available on Hugging Face, in 7B and 14B parameter versions, and both demand significant VRAM for full-length contexts: at least 120GB for the 7B and 320GB for the 14B model. Shorter tasks work with standard tooling, but Alibaba recommends their custom vLLM framework for long contexts. GGUF quantized versions are emerging at smaller sizes, though they may not support the full context length. A blogger attempted running the GGUF version on a Mac using Ollama, encountering some challenges and promising a future update.

Read more

AI/LLM Predictions: 1, 3, and 6 Years Out

2025-01-11

Simon Willison shared his predictions for AI/LLM development over the next 1, 3, and 6 years on the Oxide and Friends podcast. He anticipates that general-purpose AI agents won't materialize soon, but code and research assistants will flourish. Within three years, AI-assisted investigative reporting could win a Pulitzer Prize, alongside stricter privacy laws. Six years out, AI might produce amazing art, but could also lead to widespread civil unrest depending on the development and economic impact of AGI/ASI. Willison emphasizes his low confidence in these predictions, offering them as an interesting point of future reflection.

Read more

My Linkblogging Workflow: 7,607 Posts and Counting

2025-01-06

Simon Willison shares his approach to running a successful link blog spanning over two decades. He details his methods for curating and presenting links, emphasizing the value of adding insightful commentary, giving proper credit to creators, and using technology (Django, Markdown, Claude) to enhance the experience. He argues link blogging is a low-effort, high-reward way to contribute meaningfully to online discourse and encourages others to adopt the practice.

Read more

Apple's $95M Siri Settlement: More Misinformation Than Microphone Spying?

2025-01-03

Apple settled a lawsuit for $95 million over claims that Siri recordings were used for targeted advertising, despite denying wrongdoing. The author argues that the accuracy of ad targeting is more likely due to app data collection than microphone spying. However, anecdotal evidence of ads matching conversations will likely fuel conspiracy theories surrounding microphone surveillance, regardless of the truth.

Read more

LLMs in 2024: A Year of Breakthroughs and Challenges

2024-12-31

2024 witnessed a remarkable evolution in Large Language Models (LLMs). Multiple organizations surpassed GPT-4's performance while dramatically increasing efficiency, even enabling LLMs to run on personal laptops. Multimodal models became commonplace, with voice and video capabilities emerging. Prompt-driven app generation became a commodity, yet universal access to top-tier models lasted only months. While 'agents' remained elusive, evaluation became paramount. Apple's MLX library excelled, in contrast to the underwhelming 'Apple Intelligence' features. Inference-scaling models rose; per-prompt costs and energy use fell, even as the massive infrastructure buildout raised fresh environmental concerns. Synthetic training data proved highly effective, but LLMs remained hard to use well, knowledge of them remained unevenly distributed, and better critical evaluation is still needed.

Read more

Alibaba Unveils QvQ: A New Visual Reasoning Model

2024-12-25

Alibaba recently released QvQ-72B-Preview, a new visual reasoning model under the Apache 2.0 license. Designed to enhance AI's visual reasoning capabilities, QvQ builds upon the inference-scaling model QwQ by adding vision processing. It accepts images and prompts, generating detailed, step-by-step reasoning processes. Blogger Simon Willison tested QvQ, finding it successful in tasks like counting pelicans but less accurate on complex reasoning problems. Currently available on Hugging Face Spaces, future plans include local deployment and broader platform support.

Read more

LLM Benchmark: Pelican on a Bicycle

2024-12-16

Simon Willison created a unique LLM benchmark: generating an SVG image of a pelican riding a bicycle. This unusual prompt aimed to test the models' creative abilities without relying on pre-existing training data. He tested 16 models from OpenAI, Anthropic, Google Gemini, and Meta (Llama on Cerebras), revealing significant variations in the quality of generated SVGs. Some models produced surprisingly good results, while others struggled.
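
Reproducing the benchmark is straightforward with the `llm` Python API; a sketch assuming the relevant models and API keys are configured (model IDs here are illustrative):

```python
import llm

PROMPT = "Generate an SVG of a pelican riding a bicycle"

# Loop over whichever models are installed; save each result for comparison
for model_id in ["gpt-4o-mini", "claude-3-5-sonnet-latest", "gemini-1.5-flash"]:
    svg = llm.get_model(model_id).prompt(PROMPT).text()
    with open(f"pelican-{model_id}.svg", "w") as f:
        f.write(svg)
```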

Read more

Storing Times for Human Events: Best Practices and Challenges

2024-12-12

This blog post discusses best practices for storing event times on event websites. The author argues that directly storing UTC time loses crucial information, such as the user's original intent and location. A better approach is to store the user's intended time and the event location, then derive the UTC time. Examples like user error, international timezone adjustments, and the 2007 Microsoft Exchange DST update illustrate the importance of storing the user's intended time. The author recommends designing a clear and user-friendly interface to help users accurately set event times and locations, emphasizing the importance of maintaining the user's original intent to avoid errors caused by timezone changes.
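
A minimal Python sketch of the recommended pattern (field names are illustrative): persist the wall-clock time the user chose plus the event's timezone, and derive UTC only when needed:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# What gets stored: the user's stated intent, not a derived instant
stored = {
    "local_time": "2025-06-01T19:00:00",  # exactly what the user entered
    "timezone": "America/New_York",       # derived from the event's location
}

def to_utc(record: dict) -> datetime:
    naive = datetime.fromisoformat(record["local_time"])
    localized = naive.replace(tzinfo=ZoneInfo(record["timezone"]))
    return localized.astimezone(ZoneInfo("UTC"))

# Recompute whenever timezone rules or the venue change; intent is preserved
print(to_utc(stored))  # 2025-06-01 23:00:00+00:00
```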

Read more