Sub-100MB LLM Now Pip-installable: Introducing llm-smollm2

2025-02-07

A new plugin, llm-smollm2, bundles a quantized SmolLM2-135M-Instruct LLM under 100MB, making it pip-installable. The author details the creation process, from finding a suitable sub-100MB model (limited by PyPI size restrictions) to suppressing verbose logging from llama-cpp-python and packaging for PyPI. While the model's capabilities are limited, it's presented as a valuable learning tool for understanding LLM technology.
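
To try it out, the plugin plugs into the LLM tool's standard Python API. Here is a minimal sketch, assuming the plugin registers the model under the ID "SmolLM2" (check `llm models` if your installed version uses a different ID):

```python
# Minimal sketch of querying the bundled model through LLM's Python API.
# Assumes: pip install llm llm-smollm2
import llm

model = llm.get_model("SmolLM2")  # assumed plugin-registered model ID
response = model.prompt("Tell me a joke about pelicans")
print(response.text())  # a 135M-parameter model, so expect rough output
```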


Lost IBM Training Doc: Computers Can't Be Held Accountable (1979)

2025-02-03

A legendary page from a 1979 internal IBM training document resurfaced online, stating: 'A computer can never be held accountable; therefore a computer must never make a management decision.' The original source is lost, reportedly destroyed in a flood. The statement resonates powerfully in today's AI-driven age, prompting reflection on responsibility and decision-making in automated systems.


OpenAI's o3-mini: A Budget-Friendly LLM Powerhouse

2025-02-01

OpenAI has released o3-mini, a new language model that excels on the Codeforces competitive programming benchmark, significantly outperforming GPT-4o and o1. While not universally superior across all metrics, its low price ($1.10/million input tokens, $4.40/million output tokens) and exceptionally high output limit (100,000 tokens) make it highly competitive. OpenAI plans to use it in ChatGPT for web search and summarization, and support is already available in LLM 0.21, though API access is currently restricted to accounts at tier 3 or above (at least $100 spent on the API). o3-mini offers developers a powerful and cost-effective LLM option.
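
To make the pricing concrete, here is a quick back-of-the-envelope cost calculation using the per-million-token rates quoted above (a sketch only; it ignores any cached-input discounts or other billing adjustments):

```python
# Rough o3-mini cost estimate from the quoted rates:
# $1.10 per million input tokens, $4.40 per million output tokens.
INPUT_RATE = 1.10 / 1_000_000   # dollars per input token
OUTPUT_RATE = 4.40 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for a single o3-mini call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A large job: 50,000 input tokens plus the full 100,000-token output limit.
print(f"${estimate_cost(50_000, 100_000):.4f}")  # -> $0.4950
```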


llama.cpp WASM Gets 2x Speedup Thanks to Optimized SIMD

2025-01-28

Simon Willison's blog post highlights a significant performance improvement in llama.cpp: a 2x speedup for the WASM build, achieved by optimizing its SIMD instructions. Remarkably, 99% of the code was generated by the DeepSeek R1 reasoning model, which spent 3-5 minutes 'thinking' about each prompt. The same approach helped the developer improve the llm_groq.py plugin, elegantly eliminating its model_map and streamlining the code. This showcases the potential of AI for code optimization and refactoring.
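
The post doesn't reproduce the refactored plugin code, but the pattern it describes (replacing a hard-coded model_map with model data fetched from the provider) looks roughly like the hypothetical before/after sketch below; the endpoint and names are illustrative only, not llm_groq.py's actual code:

```python
# Hypothetical sketch of the refactor pattern: drop a hard-coded model_map
# and ask the provider's API which models exist. Names and endpoint are
# illustrative only, not taken from llm_groq.py.
import httpx

# Before: every supported model duplicated in a dict that goes stale.
MODEL_MAP = {
    "groq-llama3": "llama3-8b-8192",
    "groq-mixtral": "mixtral-8x7b-32768",
}

# After: derive the model list dynamically from a models endpoint.
def list_models(api_key: str, base_url: str = "https://api.example.com") -> list[str]:
    """Fetch available model IDs instead of hard-coding them."""
    response = httpx.get(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    response.raise_for_status()
    return [item["id"] for item in response.json()["data"]]
```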


Alibaba's Qwen 2.5: A 1M Token Context LLM

2025-01-26

Alibaba released a major update to its open-source large language model family: Qwen 2.5 now supports a 1 million token context window, achieved through a new technique called Dual Chunk Attention. Two models are available on Hugging Face, in 7B and 14B parameter versions, but running them at the full context length demands significant VRAM: at least 120GB for the 7B and 320GB for the 14B model. They remain usable with less memory for shorter tasks, and Alibaba recommends serving them with its custom vLLM framework. GGUF quantized versions are emerging, offering smaller file sizes, but may have compatibility issues at the full context length. Simon Willison attempted running a GGUF version on a Mac using Ollama, encountered some challenges, and promised a future update.
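
Those VRAM figures are dominated by the KV cache, which grows linearly with context length. A rough estimate of the cache alone makes the scale clear; the hyperparameters below are illustrative placeholders, not Qwen 2.5's published configuration:

```python
# Back-of-the-envelope KV-cache size for a decoder-only transformer:
# 2 (keys and values) * layers * kv_heads * head_dim * bytes_per_value * tokens.
# Hyperparameters here are illustrative, not Qwen 2.5's actual config.
def kv_cache_gb(tokens: int, layers: int = 28, kv_heads: int = 4,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """Estimate KV-cache size in GB at a given context length."""
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * tokens
    return total_bytes / 1024**3

print(f"{kv_cache_gb(1_000_000):.1f} GB")  # ~53.4 GB for the cache alone
```

Model weights and activations come on top of that, which is how the totals climb into the hundreds of gigabytes.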


AI/LLM Predictions: 1, 3, and 6 Years Out

2025-01-11

Simon Willison shared his predictions for AI/LLM development over the next 1, 3, and 6 years on the Oxide and Friends podcast. He anticipates that general-purpose AI agents won't materialize soon, but that code and research assistants will flourish. Within three years, AI-assisted investigative reporting could win a Pulitzer Prize, and stricter privacy laws could arrive. Six years out, AI might produce amazing art, but could also lead to widespread civil unrest, depending on how AGI/ASI develops and its economic impact. Willison emphasizes his low confidence in these predictions, offering them as an interesting point for future reflection.


My Linkblogging Workflow: 7,607 Posts and Counting

2025-01-06

Simon Willison shares his approach to running a successful link blog spanning over two decades. He details his methods for curating and presenting links, emphasizing the value of adding insightful commentary, giving proper credit to creators, and using technology (Django, Markdown, Claude) to enhance the experience. He argues link blogging is a low-effort, high-reward way to contribute meaningfully to online discourse and encourages others to adopt the practice.


Apple's $95M Siri Settlement: More Misinformation Than Microphone Spying?

2025-01-03

Apple settled a lawsuit for $95 million over claims that Siri recordings were used for targeted advertising, despite denying wrongdoing. The author argues that the accuracy of ad targeting is more likely due to app data collection than microphone spying. However, anecdotal evidence of ads matching conversations will likely fuel conspiracy theories surrounding microphone surveillance, regardless of the truth.


LLMs in 2024: A Year of Breakthroughs and Challenges

2024-12-31

2024 witnessed a remarkable evolution in Large Language Models (LLMs). Multiple organizations surpassed GPT-4's performance, and efficiency increased dramatically, to the point that capable LLMs now run on personal laptops. Multimodal models became commonplace, with voice and video capabilities emerging. Prompt-driven app generation became a commodity, yet universal access to the best models lasted only a few months. While 'agents' remained elusive, evaluation became paramount. Apple's MLX library excelled, in contrast with its underwhelming 'Apple Intelligence' features. Inference-scaling models rose, and per-prompt costs and energy use fell sharply, even as the massive build-out of new AI infrastructure raised fresh environmental concerns. Synthetic training data proved highly effective, but LLMs remained hard to use well, knowledge about them remained unevenly distributed, and better critical evaluation is still needed.


Alibaba Unveils QvQ: A New Visual Reasoning Model

2024-12-25

Alibaba recently released QvQ-72B-Preview, a new visual reasoning model under the Apache 2.0 license. Designed to enhance AI's visual reasoning capabilities, QvQ builds on the inference-scaling model QwQ by adding vision processing: it accepts images and prompts and generates detailed, step-by-step reasoning. Simon Willison tested QvQ and found it handled tasks like counting pelicans well but was less accurate on more complex reasoning problems. The model can currently be tried on Hugging Face Spaces, with local deployment and broader platform support planned.


LLM Benchmark: Pelican on a Bicycle

2024-12-16

Simon Willison created a unique LLM benchmark: generating an SVG image of a pelican riding a bicycle. This unusual prompt aimed to test the models' creative abilities without relying on pre-existing training data. He tested 16 models from OpenAI, Anthropic, Google Gemini, and Meta (Llama on Cerebras), revealing significant variations in the quality of generated SVGs. Some models produced surprisingly good results, while others struggled.
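
The benchmark is easy to reproduce with the llm Python library; a minimal sketch (substitute any model ID you have installed and configured):

```python
# Re-running the pelican benchmark: same prompt, any installed model.
import llm

PROMPT = "Generate an SVG of a pelican riding a bicycle"

model = llm.get_model("gpt-4o-mini")  # any installed model ID works here
svg = model.prompt(PROMPT).text()

# Save the raw response; open it in a browser to judge the result.
# (Some models wrap the SVG in markdown fences, so light cleanup may be needed.)
with open("pelican.svg", "w") as f:
    f.write(svg)
```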


Storing Times for Human Events: Best Practices and Challenges

2024-12-12

This blog post discusses best practices for storing event times on event websites. The author argues that storing only the UTC time loses crucial information, such as the user's original intent and the event's location. A better approach is to store the user's intended wall-clock time plus the event location, then derive the UTC time from those. Examples like user error, governments changing their timezone rules, and the 2007 Microsoft Exchange DST update illustrate why the user's intended time must be preserved. The author recommends designing a clear, user-friendly interface to help users accurately set event times and locations, so that the original intent survives later timezone changes.
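
In code, the recommended approach amounts to persisting the intended wall-clock time plus the event's timezone, and deriving UTC on demand. A minimal sketch using Python's standard zoneinfo module (the field names are illustrative):

```python
# Store the user's stated intent (local wall-clock time + the event
# location's timezone) and derive UTC when needed, instead of storing
# UTC alone and losing the original intent.
from datetime import datetime
from zoneinfo import ZoneInfo

stored_local = "2025-06-01 19:00"   # intended wall-clock time (persisted)
stored_tz = "America/New_York"      # derived from the event location (persisted)

def to_utc(local_str: str, tz_name: str) -> datetime:
    """Derive UTC from stored intent; recompute if timezone rules change."""
    naive = datetime.strptime(local_str, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(ZoneInfo("UTC"))

print(to_utc(stored_local, stored_tz))  # 2025-06-01 23:00:00+00:00
```

Because UTC is derived rather than stored, a later change to timezone rules or to the event's location just means recomputing it.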
