Webtagr - Technology News Summarizer

Databricks' TAO: Outperforming Fine-tuning with Unlabeled Data

2025-03-26

Databricks introduces TAO (Test-time Adaptive Optimization), a novel model tuning method requiring only unlabeled usage data. Unlike traditional fine-tuning, TAO leverages test-time compute and reinforcement learning to improve model performance based on past input examples. Surprisingly, TAO surpasses traditional fine-tuning, bringing open-source models like Llama to a quality comparable to expensive proprietary models like GPT-4. This breakthrough is available in preview for Databricks customers and will power future products.
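The summary names TAO's ingredients but not its mechanics. As a rough illustration of the test-time-compute half only, the sketch below samples several candidate responses for each unlabeled prompt, scores them with a reward function, and keeps the best one as a training target. The function names (`best_of_n`, `generate_candidates`, `toy_reward`) are hypothetical and are not Databricks APIs. Best-of-N filtering is also a simplified stand-in for TAO's actual reinforcement-learning step.

```python
import random
from typing import Callable, List, Tuple

def best_of_n(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    n: int = 4,
) -> List[Tuple[str, str]]:
    """For each unlabeled prompt, sample n candidates and keep the highest-scoring one."""
    pairs = []
    for prompt in prompts:
        candidates = generate(prompt, n)  # extra inference-time compute: n samples per prompt
        best = max(candidates, key=lambda r: score(prompt, r))
        pairs.append((prompt, best))      # (prompt, best response) pairs for a later tuning step
    return pairs

# Hypothetical stand-in for the model being tuned.
def generate_candidates(prompt: str, n: int) -> List[str]:
    rng = random.Random(prompt)  # seeded by the prompt so the demo is reproducible
    return [f"{prompt} -> draft {rng.randint(0, 9)}" for _ in range(n)]

# Hypothetical stand-in for a reward model: it simply prefers higher draft numbers.
def toy_reward(prompt: str, response: str) -> float:
    return float(response.split()[-1])

if __name__ == "__main__":
    usage_logs = ["summarize Q3 sales", "write SQL for churn by region"]
    for prompt, response in best_of_n(usage_logs, generate_candidates, toy_reward):
        print(prompt, "=>", response)
```

In a real pipeline the selected pairs (or the scores themselves) would feed a tuning step. That way the extra sampling cost is paid during training rather than on every production request.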

(www.databricks.com)

AI Model Tuning

Model Context Protocol (MCP): A USB-C for AI

2025-03-26

The Model Context Protocol (MCP) is an open protocol that standardizes how applications provide context to LLMs. Think of it as a USB-C port for AI: it connects AI models to various data sources and tools. OpenAI's Agents SDK supports MCP, so Agents can be equipped with tools from a wide range of MCP servers. The SDK handles two types of servers: stdio servers, which run locally, and HTTP over SSE servers, which run remotely. Because the SDK otherwise asks a server for its tool list on every agent run, caching that list reduces latency. Complete examples are available in the examples/mcp directory.
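The summary notes that caching the tool list minimizes latency. According to the SDK documentation, its MCP server classes accept a `cache_tools_list=True` option and provide an `invalidate_tools_cache()` method for when the tool list changes. The library-independent sketch below shows the same idea. `CachedToolList` and `FakeServer` are hypothetical names, and `FakeServer` stands in for a real MCP server.

```python
import time
from typing import Callable, List, Optional

class CachedToolList:
    """Reuse the result of a list_tools() call until it is explicitly invalidated."""

    def __init__(self, list_tools: Callable[[], List[str]]):
        self._list_tools = list_tools
        self._cache: Optional[List[str]] = None

    def get(self) -> List[str]:
        if self._cache is None:  # first call, or first call after invalidate(): ask the server
            self._cache = self._list_tools()
        return self._cache

    def invalidate(self) -> None:
        self._cache = None  # use when the server's tool set may have changed

class FakeServer:
    """Hypothetical stand-in for an MCP server; counts how often it is queried."""

    def __init__(self) -> None:
        self.calls = 0

    def list_tools(self) -> List[str]:
        self.calls += 1
        time.sleep(0.01)  # simulate a round trip
        return ["read_file", "search_docs"]

server = FakeServer()
tools = CachedToolList(server.list_tools)
for _ in range(3):   # three agent runs share a single server round trip
    tools.get()
print(server.calls)  # prints 1
```

Caching is only safe when the server's tools are stable. Otherwise a stale list can hide newly added tools until the cache is invalidated.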

(openai.github.io)

AI

StarVector: A Transformer-based Image-to-SVG Vectorization Model

2025-03-26

StarVector is a Transformer-based image-to-SVG vectorization model, with 8B and 1B parameter models released on Hugging Face. It achieves state-of-the-art results on the SVG-Bench benchmark, excelling at vectorizing icons, logos, and technical diagrams, demonstrating superior performance in handling complex graphical details. The model leverages extensive datasets for training, encompassing a wide range of vector graphic styles, from simple icons to intricate colored illustrations. Compared to traditional vectorization methods, StarVector generates cleaner, more accurate SVG code, better preserving image details and structural information.

(starvector.github.io)

AI image vectorization Transformer model

AI's Unexpected Revolution: Brevity Trumps Verbosity

2025-03-26

The proliferation of Large Language Models (LLMs) initially caused panic in schools and businesses, which feared that the models would replace written assignments and professional communication. However, the author argues that the true impact of LLMs lies in their potential to change how we communicate and program. LLMs expose the underlying simplicity of verbose business emails and complex code, pushing us toward concise communication. This could eventually make LLMs themselves obsolete, giving rise to more efficient, streamlined business communication and programming languages. In the author's view, this shift toward brevity could change the world.

(thomashunter.name)

AI Communication Revolution

Dapr Agents: A Framework for Building Scalable, Resilient AI Agent Systems

2025-03-26

Dapr Agents is a developer framework for building production-grade, resilient AI agent systems that operate at scale. Built on the battle-tested Dapr project, it enables developers to create AI agents that reason, act, and collaborate using Large Language Models (LLMs). Built-in observability and stateful workflow execution ensure agentic workflows complete successfully, regardless of complexity. Key features include efficient multi-agent execution, automatic retry mechanisms, Kubernetes native deployment, diverse data source integration, secure multi-agent collaboration, platform readiness, cost-effectiveness, and vendor neutrality.

(github.com)

AI

Gemini 2.5 Pro: An AI That Knows Its Limits

2025-03-26

The author asked Gemini 2.5 Pro to recreate ReBirth RB-338, the famous 1990s software synthesizer. Surprisingly, instead of attempting the impossible, Gemini 2.5 Pro assessed the task's difficulty and explained why it was infeasible, demonstrating strong reasoning capabilities. The author then negotiated a simpler but still functional synthesizer. The episode suggests AI is making progress toward understanding its own limitations and making rational judgments.

(everything.intellectronica.net)

AI

Reinforcement Learning: From AlphaGo to AlphaGo Zero

2025-03-26

This article provides a comprehensive overview of reinforcement learning (RL), starting with the captivating story of AlphaGo defeating human Go champions. It explains core RL concepts like MDPs, Bellman equations, dynamic programming, Monte Carlo methods, TD learning (SARSA, Q-learning, DQN), policy gradient methods (REINFORCE, Actor-Critic, A3C), and evolutionary strategies. The article delves into the details of each algorithm, using AlphaGo Zero as a compelling case study to illustrate RL's practical applications and its power in solving complex problems.
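To make the TD-learning part of the overview concrete, here is a minimal tabular Q-learning sketch. It applies the standard update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)] to a made-up five-state corridor in which only the rightmost state pays a reward. The environment, function names, and hyperparameters are illustrative assumptions, not taken from the article.

```python
import random
from typing import Dict, List, Tuple

N_STATES = 5          # corridor states 0..4; reaching state 4 ends the episode with reward 1
ACTIONS = (-1, 1)     # move left or move right

def step(state: int, action: int) -> Tuple[int, float, bool]:
    nxt = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes: int = 1000, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> Dict[Tuple[int, int], float]:
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:                       # explore
                a = rng.choice(ACTIONS)
            else:                                            # exploit, breaking ties randomly
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a)
            # Q-learning target: reward plus discounted value of the best next action
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])        # TD update toward the target
            s = s2
    return q

def greedy_policy(q: Dict[Tuple[int, int], float]) -> List[int]:
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]

print(greedy_policy(q_learning()))
```

After training, the greedy policy should move right from every non-terminal state, because every value estimate traces back to the single reward at the right end.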

(lilianweng.github.io)

AI

Whisper's Embeddings Surprisingly Align with Human Brain Activity During Speech

2025-03-26

A study reveals a surprising alignment between OpenAI's Whisper speech recognition model and neural activity in the human brain during natural conversations. By comparing Whisper's embeddings to brain activity in regions like the inferior frontal gyrus (IFG) and superior temporal gyrus (STG), researchers found that language embeddings peaked before speech embeddings during speech production, and vice versa during comprehension. This suggests that Whisper, despite not being designed with brain mechanisms in mind, captures key aspects of language processing. The findings also highlight a 'soft hierarchy' in brain language processing: higher-order areas like the IFG prioritize semantic and syntactic information but also process lower-level auditory features, while lower-order areas like the STG prioritize acoustic and phonemic processing but also capture word-level information.

(research.google)

AI

Model Context Protocol (MCP): The USB-C Moment for AI?

2025-03-26

Anthropic's Model Context Protocol (MCP), released in late 2024, is taking the AI world by storm. Think of it as the USB-C of AI integrations: it lets Large Language Models (LLMs) like Claude or ChatGPT communicate with external data sources and tools (Obsidian, Gmail, calendars, etc.) without a separate custom integration for each one. MCP uses a three-tier architecture of hosts, clients, and servers to enable secure, reliable data access and action triggering, significantly simplifying development and spawning innovative applications. Examples include connecting LLMs to personal databases, code repositories, and even real-time stock data. Because it is open source, MCP has become a hot topic in the developer community and has been integrated into numerous AI apps. The author sees it as heralding a revolutionary shift in how we interact with AI applications.
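Under the host/client/server split, the messages a client sends to a server are JSON-RPC 2.0 requests, and the MCP specification defines methods such as `tools/list` and `tools/call`. The sketch below only builds the message payloads. It omits the initialization handshake and the transport (stdio or HTTP). The helper name `jsonrpc_request` and the `get_stock_price` tool with its arguments are hypothetical.

```python
import itertools
import json
from typing import Any, Dict, Optional

_request_ids = itertools.count(1)  # JSON-RPC requests need an id that is unique within a session

def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind an MCP client sends to a server."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

# Ask the server which tools it exposes.
list_request = jsonrpc_request("tools/list")

# Invoke one tool; "get_stock_price" and its arguments are hypothetical.
call_request = jsonrpc_request(
    "tools/call",
    {"name": "get_stock_price", "arguments": {"ticker": "AAPL"}},
)

print(list_request)
print(call_request)
```

Because every server speaks this same message format, a host can add a new data source by connecting another server rather than by writing new integration code.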

(pieces.app)

AI

Google's Gemini 2.5: A Thinking AI Model Takes the Lead

2025-03-25

Google unveiled Gemini 2.5, its most intelligent AI model yet. An experimental version, 2.5 Pro, achieves top ranking on LMArena, significantly outperforming competitors. Gemini 2.5's key innovation is its 'thinking' capabilities: it reasons before responding, leading to enhanced accuracy and performance. This reasoning extends beyond simple classification and prediction; it involves analyzing information, drawing logical conclusions, understanding context and nuance, and making informed decisions. Building upon prior work with reinforcement learning and chain-of-thought prompting, Gemini 2.5 combines an improved base model with advanced post-training. Google plans to integrate these thinking capabilities into all future models, enabling them to tackle more complex tasks and power more sophisticated, context-aware agents.

(blog.google)

AI

Apple to Use Apple Maps Imagery for AI Model Training

2025-03-25

Apple recently updated its website, revealing that starting March 2025, it will use imagery and data collected for its Apple Maps Look Around feature to train AI models for image recognition, creation, and enhancement. This data, gathered by vehicles and backpacks equipped with cameras, sensors, and iPhones/iPads, has faces and license plates blurred. Apple states only blurred imagery will be used, and it accepts requests to blur houses. This will enhance AI capabilities in Apple products and services, such as the Photos app's cleanup tool and search functionality.

(www.theverge.com)

AI

Google Unveils Gemini 2.5: A Giant Leap in AI Reasoning

2025-03-25

Google has introduced Gemini 2.5, its most intelligent AI model yet. The experimental 2.5 Pro version boasts top performance across various benchmarks, achieving the #1 spot on LMArena by a considerable margin. Gemini 2.5 models are 'thinking' models, capable of reasoning through their responses, leading to enhanced accuracy and performance. This reasoning extends beyond simple classification and prediction, encompassing information analysis, logical conclusions, contextual understanding, and informed decision-making. Building on prior work with reinforcement learning and chain-of-thought prompting, Gemini 2.5 represents a significant leap forward, combining a vastly improved base model with enhanced post-training. Google plans to integrate these thinking capabilities into all future models, enabling them to tackle more complex problems and support more sophisticated agents.

(blog.google)

AI

Sam Altman on OpenAI: An Accidental Consumer Tech Giant

2025-03-25

This Stratechery interview features OpenAI CEO Sam Altman, detailing OpenAI's journey from a research lab to a consumer tech giant, and the unexpected success of ChatGPT. Altman candidly discusses OpenAI's business model shift, its relationship with Microsoft, views on AI safety and regulation, and the future of AGI. The interview also touches on OpenAI's open-source strategy, GPT-5 development, and the implications of AI across various industries. Altman believes a billion-user AI platform will be more valuable than cutting-edge models, hinting at potential alternative monetization strategies beyond advertising.

(stratechery.com)

AI

VGGT: Lightning-Fast 3D Scene Reconstruction from Images

2025-03-25

Facebook Research introduces VGGT (Visual Geometry Grounded Transformer), a feed-forward neural network that infers all key 3D attributes of a scene (extrinsic and intrinsic camera parameters, point maps, depth maps, and 3D point tracks) from one, a few, or hundreds of views in seconds. The Transformer-based model is easy to use and comes with an interactive 3D visualization tool. Surprisingly, VGGT also performs well at single-view reconstruction, achieving results competitive with state-of-the-art monocular methods despite not being explicitly trained for this task.
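The outputs VGGT predicts are tied together by standard pinhole-camera geometry: a depth map plus intrinsics and extrinsics determines a world-space point map. The NumPy sketch below shows that relationship for a single view. It is textbook geometry rather than VGGT's code. It assumes world-to-camera extrinsics (X_cam = R X_world + t), and the function name and example camera are hypothetical. Check the repository for the exact convention VGGT uses.

```python
import numpy as np

def depth_to_world_points(depth: np.ndarray, K: np.ndarray,
                          R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Unproject an (H, W) depth map into an (H, W, 3) world-space point map.

    Assumes a pinhole camera with 3x3 intrinsics K and world-to-camera
    extrinsics X_cam = R @ X_world + t.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))                 # u = column, v = row
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pixels @ np.linalg.inv(K).T                             # K^-1 [u, v, 1] per pixel
    cam_points = rays * depth[..., None]                           # scale each ray by its depth
    return (cam_points - t) @ R                                    # R^T (X_cam - t), row-vector form

# Hypothetical 3x3-pixel camera: focal length 100, principal point at pixel (1, 1).
K = np.array([[100.0, 0.0, 1.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 3), 2.0)
points = depth_to_world_points(depth, K, np.eye(3), np.zeros(3))
print(points[1, 1])
```

The centre pixel lands on the optical axis at depth 2. Predicting depth, cameras, and point maps jointly lets each output be checked against the others through this relation.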

(github.com)

AI

The Phony Comfort of AI Optimism: A Critique of Casey Newton and Kevin Roose

2025-03-25

This article critiques what it calls the blindly optimistic views of tech journalists Casey Newton and Kevin Roose on generative AI. The author argues that their positive predictions lack a factual basis and merely cater to market demand and self-interest, and that Roose's claims about the imminent arrival of AGI and Newton's excessive praise for OpenAI models lack rigorous argument. In the author's view, this 'cautiously optimistic' attitude is a cowardly avoidance of reality that ignores numerous problems and potential risks of AI technology, such as model hallucinations, the manipulability of benchmarks, and the impact on the creative industries. The article also uses CoreWeave as an example of overheated investment and the lack of sustainable business models in AI, urging readers to think critically and face the challenges of AI development.

(www.wheresyoured.at)

AI AI optimism tech critique media criticism

AlexNet Source Code Released: The Dawn of the Deep Learning Revolution

2025-03-25

In 2012, AlexNet, created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, demonstrated for the first time the massive potential of deep neural networks for image recognition, ushering in the era of deep learning. The Computer History Museum, in collaboration with Google, has now released AlexNet's source code. AlexNet's success stemmed from scale: a large convolutional neural network trained with immense computing power on the ImageNet dataset, overcoming previous limitations of deep learning. This breakthrough fueled more than a decade of AI innovation, leading to companies like OpenAI and applications like ChatGPT that have transformed the world.

(www.zdnet.com)

AI

Unlocking Infantile Amnesia: A Year-Old's Hippocampus Lights Up

2025-03-25

A new study used fMRI to scan the brains of 26 infants aged 4 to 25 months, attempting to solve the century-old mystery of infantile amnesia. The researchers found that around the age of one, the hippocampus, which is responsible for memory formation, becomes active in a way that tracks what the infants later remembered in memory tests. This suggests that babies begin encoding memories around their first birthday, even while the hippocampus is still developing. The study offers valuable clues about early brain development and memory formation, hinting that we may one day be able to retrieve lost memories from infancy.

(singularityhub.com)

AI infantile amnesia hippocampus memory formation

AI Chatbots and Loneliness: A Double-Edged Sword

2025-03-25

Two new studies reveal a potential dark side to heavy AI chatbot use: increased loneliness and emotional dependence, particularly among power users. Researchers found that lonely individuals are more likely to seek emotional bonds with AI, echoing earlier research on social media. While AI chatbots can offer emotional support, platforms must prioritize user well-being, preventing over-reliance and emotional exploitation, and implementing measures to identify and intervene in unhealthy usage patterns. Lawmakers should also address this emerging issue, developing appropriate regulations.

(www.platformer.news)

AI

Newton's Method Gets a Modern Upgrade: A Faster, Broader Optimization Algorithm

2025-03-25

Over 300 years ago, Isaac Newton developed an algorithm for finding the minimum values of functions. Now, Amir Ali Ahmadi of Princeton University and his students have improved this algorithm to efficiently handle a broader class of functions. This breakthrough uses higher-order derivatives and cleverly transforms the Taylor expansion into a convex sum-of-squares form, achieving faster convergence than traditional gradient descent. While currently computationally expensive, future advancements in computing could allow this algorithm to surpass gradient descent in fields like machine learning, becoming a powerful tool for optimization problems.
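For reference, the sketch below compares the classic second-order Newton step with first-order gradient descent on a one-dimensional convex function. It does not implement Ahmadi's higher-order sum-of-squares method. The test function f(x) = x^4 + x^2 - 2x, the starting point, and the learning rate are arbitrary choices for illustration.

```python
from typing import Callable, Tuple

Fn = Callable[[float], float]

def newton_minimize(df: Fn, d2f: Fn, x0: float, tol: float = 1e-10,
                    max_iter: int = 100) -> Tuple[float, int]:
    """Classic Newton's method for 1-D minimization: solve f'(x) = 0 using f''."""
    x = x0
    for i in range(1, max_iter + 1):
        x -= df(x) / d2f(x)  # jump to the minimum of the local quadratic (2nd-order Taylor) model
        if abs(df(x)) < tol:
            return x, i
    return x, max_iter

def gradient_descent(df: Fn, x0: float, lr: float = 0.05, tol: float = 1e-10,
                     max_iter: int = 10_000) -> Tuple[float, int]:
    """First-order baseline: step against the derivative with a fixed learning rate."""
    x = x0
    for i in range(1, max_iter + 1):
        x -= lr * df(x)
        if abs(df(x)) < tol:
            return x, i
    return x, max_iter

# Derivatives of the illustrative convex function f(x) = x^4 + x^2 - 2x.
df = lambda x: 4 * x**3 + 2 * x - 2
d2f = lambda x: 12 * x**2 + 2

x_newton, n_newton = newton_minimize(df, d2f, x0=1.5)
x_gd, n_gd = gradient_descent(df, x0=1.5)
print(f"Newton: x={x_newton:.10f} in {n_newton} steps; "
      f"gradient descent: x={x_gd:.10f} in {n_gd} steps")
```

Newton's method should reach the tolerance in far fewer iterations. However, each step needs second-derivative information (a full Hessian in higher dimensions). That is the same per-step cost trade-off the summary mentions for the new algorithm.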

(www.quantamagazine.org)

AI optimization algorithm Newton's method

Ant Group Cuts AI Training Costs by 20% Using Chinese Chips

2025-03-25

Ant Group, backed by Jack Ma, has developed AI model training techniques that use domestically produced semiconductors, including chips from Alibaba and Huawei, cutting costs by 20%. Although Ant still uses Nvidia chips, it relies mainly on AMD and Chinese alternatives for its latest models, with results comparable to those obtained on Nvidia's H800. This highlights China's efforts to reduce its reliance on high-end Nvidia chips. Ant's newly developed language models, Ling-Plus and Ling-Lite, even outperformed Meta's Llama on some benchmarks. The models, intended for healthcare and finance applications, mark a significant step toward cost-effective AI development in China.

(au.finance.yahoo.com)

AI domestic chips cost reduction

ARC-AGI-2: The AGI Benchmark That's Easier for Humans, Harder for AI

2025-03-24

The ARC Prize 2025 competition returns with ARC-AGI-2, a significantly harder AGI benchmark for AI while remaining relatively easy for humans. Focusing on tasks simple for humans but difficult for AI, ARC-AGI-2 highlights capability gaps not addressed by simply scaling up existing models. With a $1 million prize pool, the competition encourages open-source innovation towards efficient, general AI systems, aiming to bridge the human-AI gap and achieve true AGI.
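ARC-style tasks present a few input/output grid pairs and ask for the output grid of a new input, and a prediction counts only if it matches exactly. The sketch below mirrors the public ARC JSON layout (`train` and `test` lists of `input`/`output` grids of integers 0 to 9). The task, the two candidate programs, and the brute-force search are invented for illustration. The number of attempts allowed per test output is an assumption to check against the ARC Prize 2025 rules.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]

# Hypothetical toy task whose hidden rule is "mirror each row left to right".
task: Dict[str, List[Dict[str, Grid]]] = {
    "train": [
        {"input": [[1, 0, 0], [2, 2, 0]], "output": [[0, 0, 1], [0, 2, 2]]},
        {"input": [[3, 4]], "output": [[4, 3]]},
    ],
    "test": [{"input": [[5, 0, 6]], "output": [[6, 0, 5]]}],
}

def mirror(grid: Grid) -> Grid:
    return [list(reversed(row)) for row in grid]

def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]

def consistent_programs(task, candidates: List[Callable[[Grid], Grid]]):
    """Keep only candidate programs that reproduce every training output exactly."""
    return [p for p in candidates
            if all(p(ex["input"]) == ex["output"] for ex in task["train"])]

def score(task, attempts: List[List[Grid]]) -> float:
    """Fraction of test outputs matched exactly by at least one attempt."""
    hits = sum(any(a == ex["output"] for a in tries)
               for ex, tries in zip(task["test"], attempts))
    return hits / len(task["test"])

MAX_ATTEMPTS = 2  # assumption: attempts allowed per test output
programs = consistent_programs(task, [transpose, mirror])
attempts = [[p(ex["input"]) for p in programs[:MAX_ATTEMPTS]] for ex in task["test"]]
print(score(task, attempts))  # prints 1.0
```

Searching a fixed list of programs works here only because the right rule was supplied in advance. ARC-AGI-2 is designed so that rules like this must be inferred from just a few examples.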

(arcprize.org)

AI

Qwen2.5-VL-32B: A 32B Parameter Visual-Language Model That's More Human-Friendly

2025-03-24

Following the widespread acclaim for the Qwen2.5-VL series, the Qwen team has open-sourced a new 32-billion-parameter visual-language model, Qwen2.5-VL-32B-Instruct. The model brings significant improvements in mathematical reasoning, fine-grained image understanding, and alignment with human preferences. Benchmarks show it outperforming comparable models on multimodal tasks (such as MMMU, MMMU-Pro, and MathVista), even beating the larger 72-billion-parameter Qwen2-VL-72B-Instruct. It also achieves top-tier pure-text performance for its size.

(qwenlm.github.io)

AI visual language model

AMD Unveils Instella: A Family of Fully Open 3B Parameter Language Models

2025-03-24

AMD has announced Instella, a family of fully open, state-of-the-art 3-billion-parameter language models (LLMs) trained from scratch on AMD Instinct™ MI300X GPUs. Instella outperforms existing fully open models of similar size and achieves competitive results against leading open-weight models like Llama-3.2-3B. AMD is open-sourcing all model artifacts, including weights, training configurations, datasets, and code, to foster collaboration and innovation within the AI community. The models leverage efficient training techniques and a multi-stage training pipeline.

(rocm.blogs.amd.com)

AI

GPT-4o mini TTS: Text-to-Speech Made Easy

2025-03-24

This tool uses OpenAI's GPT-4o mini TTS API to turn text into natural-sounding speech in three steps: enter your text, customize the settings (six voices and adjustable speed), and generate high-quality audio. The audio streams directly to your browser and, according to the site, is never stored on its servers. You can experiment with different voices and speeds to find the best fit.

(hitts.cc)

AI

CUDA at 18: Nvidia's Secret Sauce and AI Dominance

2025-03-24

Nvidia's CUDA platform, celebrating its 18th anniversary, is far more than a programming language or API; it's the core of Nvidia's software ecosystem, powering numerous "embarrassingly parallel" computing tasks from AI to cryptocurrency mining. CUDA's success stems from Nvidia's consistent long-term investment and steady updates, a stark contrast to competitors like AMD. The success of AlexNet highlighted CUDA's early influence in deep learning, and today, it's the de facto standard in AI, forming a strong competitive moat for Nvidia.
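"Embarrassingly parallel" means each piece of output can be computed without reference to any other. The plain-Python simulation below illustrates the CUDA programming model for such a workload (SAXPY, out = a·x + y): each simulated thread derives a global index from its block and thread IDs and handles one element. It is not real CUDA and does not run on a GPU. The function names are hypothetical, and only the index formula mirrors CUDA's built-in `blockIdx`, `blockDim`, and `threadIdx` variables.

```python
from typing import List

def saxpy_kernel(block_idx: int, block_dim: int, thread_idx: int,
                 a: float, x: List[float], y: List[float], out: List[float]) -> None:
    """Work done by one 'thread': compute a single element, independent of all others."""
    i = block_idx * block_dim + thread_idx  # mirrors blockIdx.x * blockDim.x + threadIdx.x
    if i < len(x):                          # the last block may run past the end of the data
        out[i] = a * x[i] + y[i]

def launch(a: float, x: List[float], y: List[float], block_dim: int = 4) -> List[float]:
    """Simulate a 1-D kernel launch by looping over every (block, thread) pair."""
    out = [0.0] * len(x)
    grid_dim = (len(x) + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    for b in range(grid_dim):        # on a GPU these blocks and threads run in parallel;
        for t in range(block_dim):   # since no element depends on another, order does not matter
            saxpy_kernel(b, block_dim, t, a, x, y, out)
    return out

print(launch(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0] * 5, block_dim=2))
# prints [12.0, 14.0, 16.0, 18.0, 20.0]
```

Because no thread waits on another, the same kernel scales from a handful of elements to millions simply by launching more blocks.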

(thechipletter.substack.com)

AI

beeFormer: Bridging the Semantic and Interaction Gap in Recommender Systems

2025-03-24

The beeFormer project introduces a novel approach to recommender systems designed to tackle the cold-start problem. It trains language models on interaction data to learn user behavior patterns and transfers this knowledge to unseen items. Unlike traditional content-based filtering, which relies on item attributes, beeFormer learns from interaction patterns, so it can recommend items that match user interests even when those items have no interaction data yet. Experiments show significant performance improvements. The project provides detailed training steps and pre-trained models, and it supports datasets such as MovieLens, GoodBooks, and Amazon Books.
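beeFormer's idea is to train the text encoder on interaction data, so that embeddings of item descriptions already reflect which items users tend to consume together. A brand-new item can then be placed among existing items from its text alone. The sketch below shows only that final step, under simplifying assumptions. The vectors are hand-made stand-ins for trained encoder outputs, and mean-profile cosine scoring is a generic choice, not beeFormer's actual recommendation model.

```python
import numpy as np
from typing import Dict, List

def recommend(history: List[str], item_emb: Dict[str, np.ndarray],
              candidates: List[str], k: int = 1) -> List[str]:
    """Rank candidates by cosine similarity to the mean embedding of the user's history."""
    unit = {name: v / np.linalg.norm(v) for name, v in item_emb.items()}
    profile = np.mean([unit[name] for name in history], axis=0)
    scores = {c: float(unit[c] @ profile) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)[:k]

# Hand-made vectors standing in for text-encoder outputs (hypothetical values).
item_emb = {
    "Dune": np.array([1.0, 0.1, 0.0]),
    "Foundation": np.array([0.9, 0.2, 0.0]),
    "Pride and Prejudice": np.array([0.0, 0.1, 1.0]),
    "Hyperion": np.array([0.95, 0.1, 0.05]),  # cold-start item: text only, no interactions
    "Emma": np.array([0.05, 0.1, 0.9]),       # cold-start item: text only, no interactions
}

print(recommend(["Dune", "Foundation"], item_emb, ["Emma", "Hyperion"], k=2))
# prints ['Hyperion', 'Emma']
```

With an untrained, purely semantic encoder, the same scoring would match items by topic. The training on interaction data is what makes the similarity reflect user behavior.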

(github.com)

AI Recommender Systems Cold-Start Language Models

LangManus: An Open-Source AI Automation Framework for Multi-Agent Collaboration

2025-03-23

LangManus is a community-driven open-source AI automation framework that integrates language models with tools for web search, crawling, and Python code execution. Developed by former colleagues in their spare time, this project aims to explore multi-agent and deep research, participating in the GAIA leaderboard. LangManus employs a hierarchical multi-agent system with roles such as Coordinator, Planner, Supervisor, Researcher, Coder, Browser, and Reporter, supporting various LLM integrations including Qwen and OpenAI-compatible models. The project is open-sourced under the MIT license and welcomes community contributions.
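The hierarchical design in the summary can be pictured with a small control-flow sketch. It assumes that a Coordinator receives requests, a Planner breaks them into steps, and a Supervisor dispatches each step to a specialist agent. Only the role names come from the summary. The functions are hypothetical stubs standing in for LLM calls, and the Browser role is omitted for brevity. None of this reflects LangManus's actual code.

```python
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]

# Hypothetical stand-ins for LLM-backed agents; each returns a short text result.
def researcher(step: str) -> str:
    return f"notes on {step}"

def coder(step: str) -> str:
    return f"script output for {step}"

def reporter(step: str) -> str:
    return f"final report on {step}"

WORKERS: Dict[str, Worker] = {"researcher": researcher, "coder": coder, "reporter": reporter}

def planner(request: str) -> List[Tuple[str, str]]:
    """Break a request into (role, step) pairs; a real planner would ask an LLM."""
    return [("researcher", f"background for {request}"),
            ("coder", f"data analysis for {request}"),
            ("reporter", request)]

def supervisor(plan: List[Tuple[str, str]]) -> List[str]:
    """Dispatch each planned step to the worker named in the plan, in order."""
    return [f"{role}: {WORKERS[role](step)}" for role, step in plan]

def coordinator(request: str) -> List[str]:
    """Entry point: take the user request, plan it, and hand the plan to the supervisor."""
    return supervisor(planner(request))

for line in coordinator("GPU price trends"):
    print(line)
```

Keeping planning separate from dispatch means new specialist roles can be added to the worker table without changing how requests are planned.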

(github.com)

AI AI automation multi-agent system

Improved Crosscoder Unveils Secrets of LLM Fine-tuning

2025-03-23

Researchers introduce a novel method, the 'tied crosscoder,' for comparing a base large language model (LLM) with its fine-tuned chat version. Unlike standard crosscoders, the tied crosscoder allows the same latent to fire at different times in the base and chat models, making it more effective at identifying features that are new in the chat model. Experiments show that this approach gives clearer explanations of how chat behavior emerges from base-model capabilities and yields more monosemantic latents. The research offers new insight into LLM fine-tuning and could guide future model improvements.

(www.lesswrong.com)

AI crosscoder model fine-tuning

Formal Verification of ML Models in Lean 4

2025-03-23

The `formal_verif_ml` project offers a Lean 4 framework for formally verifying properties (robustness, fairness, interpretability) of machine learning models. It includes a Lean library, model translator, web interface, and CI/CD pipeline, supporting various model types. An interactive web portal lets users upload models, view generated Lean code, trigger proof compilation, and visualize the architecture.

(github.com)

AI

Compute Wins: The New Paradigm in AI Development

2025-03-23

This article explores a new trend in AI development: the supremacy of compute. The author uses personal experiences and analogies to illustrate that over-engineered AI systems are like meticulously cared-for plants that struggle to adapt to changing environments, while large-scale compute-based AI systems, like naturally growing plants, can learn and adapt autonomously. By comparing rule-based, limited-compute, and scale-out approaches to building customer service automation systems, the author demonstrates the superiority of the scale-out solution. The rise of Reinforcement Learning (RL) further confirms this trend, as it explores multiple solutions through massive computation, ultimately achieving results that surpass human design. In the future, the role of AI engineers will shift from crafting perfect algorithms to building systems that can effectively leverage massive computational resources.

(ankitmaloo.com)

AI Compute

Category: AI