Category: AI

Google Unveils Ironwood: A 7th-Gen TPU for the Inference Age

2025-04-09

At Google Cloud Next '25, Google announced Ironwood, its seventh-generation Tensor Processing Unit (TPU) and its most powerful, scalable custom AI accelerator yet, designed specifically for inference. Ironwood marks a shift toward an "age of inference," in which AI models proactively generate insights and interpretations rather than simply presenting information. Scaling up to 9,216 liquid-cooled chips interconnected via breakthrough Inter-Chip Interconnect (ICI) networking, with a full pod drawing nearly 10 MW, Ironwood is a key component of Google Cloud's AI Hypercomputer architecture. Developers can leverage Google's Pathways software stack to harness the combined power of tens of thousands of Ironwood TPUs.

Agent2Agent (A2A): A New Era of AI Agent Interoperability

2025-04-09

Google launches Agent2Agent (A2A), an open protocol enabling seamless collaboration between AI agents built by different vendors or using different frameworks. Supported by over 50 tech partners and service providers, A2A allows secure information exchange and coordinated actions, boosting productivity and lowering costs. Built on existing standards, A2A supports multiple modalities, prioritizes security, and handles long-running tasks. Use cases range from automating hiring processes (e.g., candidate sourcing and interview scheduling) to streamlining complex workflows across various enterprise applications. Its open-source nature fosters a thriving ecosystem of collaborative AI agents.
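
As a rough illustration of the kind of exchange A2A standardizes, the sketch below shows one agent handing a task to another over a JSON-RPC-style HTTP call; the endpoint, method name, and message fields here are simplified assumptions for illustration, not the official specification.

```python
# Hypothetical sketch of an A2A-style task hand-off between two agents.
# Endpoint, method name, and fields are illustrative assumptions, not the official A2A spec.
import json
import urllib.request

def send_task(agent_url: str, task_text: str) -> dict:
    """Ask a remote agent to perform a task and return its structured reply."""
    request_body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tasks/send",          # assumed method name
        "params": {
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": task_text}],
            }
        },
    }
    req = urllib.request.Request(
        agent_url,
        data=json.dumps(request_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example: a hiring agent delegating interview scheduling to a calendar agent.
# reply = send_task("https://calendar-agent.example.com/a2a",
#                   "Schedule a 45-minute interview with the shortlisted candidate next week.")
```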

DeepCoder-14B: Open-Source Code Reasoning Model Matches OpenAI's o3-mini

2025-04-09

Agentica and Together AI have released DeepCoder-14B-Preview, a code reasoning model fine-tuned from DeepSeek-R1-Distill-Qwen-14B via distributed reinforcement learning (RL). Achieving an impressive 60.6% Pass@1 accuracy on LiveCodeBench, it rivals OpenAI's o3-mini with only 14B parameters. The project open-sources its dataset, code, training logs, and system optimizations, showcasing a robust training recipe built on high-quality data and algorithmic improvements to GRPO. This advancement democratizes access to high-performing code-generation models.
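
For context on the headline number, Pass@k on code benchmarks is typically estimated from n sampled completions per problem, of which c pass the unit tests; the sketch below shows the standard unbiased estimator (Pass@1 reduces to c/n), not DeepCoder's exact evaluation harness.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per problem, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Pass@1 is simply the fraction of samples that pass: c / n.
print(pass_at_k(16, 9, 1))   # 0.5625
print(pass_at_k(16, 9, 8))   # probability that at least one of 8 drawn samples passes
```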

Gemini 2.5 Pro Experimental: Deep Research Just Got a Whole Lot Smarter

2025-04-09

Gemini Advanced subscribers can now access Deep Research powered by Gemini 2.5 Pro Experimental, deemed the world's most capable AI model by industry benchmarks and Chatbot Arena. This personal AI research assistant significantly improves every stage of the research process. In testing, raters preferred reports generated by Gemini 2.5 Pro over competitors by more than a 2:1 margin, citing improvements in analytical reasoning, information synthesis, and insightful report generation. Access detailed, easy-to-read reports on any topic across web, Android, and iOS, saving hours of work. Plus, try the new Audio Overviews feature for on-the-go listening. Learn more and try it now by selecting Gemini 2.5 Pro (experimental) and choosing 'Deep Research' in the prompt bar.

Cyc: The $200M AI That Never Was

2025-04-08

This essay details the 40-year history of Cyc, Douglas Lenat's ambitious project to build artificial general intelligence (AGI) by scaling symbolic logic. Despite a $200 million investment and 2,000 person-years of effort, Cyc failed to achieve intellectual maturity. The article unveils its secretive history, highlighting the project's insularity and rejection of alternative AI approaches as key factors contributing to its failure. Cyc's long, slow demise serves as a powerful indictment of the symbolic-logic approach to AGI.

Meta's Llama 4: Second Place Ranking and a Messy Launch

2025-04-08

Meta released two new Llama 4 models: Scout and Maverick. Maverick secured the number two spot on LMArena, outperforming GPT-4o and Gemini 2.0 Flash. However, Meta admitted that LMArena tested a specially optimized "experimental chat version," not the publicly available one. This sparked controversy, leading LMArena to update its policies to prevent similar incidents. Meta explained that it was experimenting with different versions, but the move raised questions about its strategy in the AI race and the unusual timing of the Llama 4 release. Ultimately, the incident highlights the limitations of AI benchmarks and the complex strategies of large tech companies in the competition.

One-Minute Videos from Text Storyboards using Test-Time Training Transformers

2025-04-08

Current Transformer models struggle with generating one-minute videos due to the inefficiency of self-attention layers for long contexts. This paper explores Test-Time Training (TTT) layers, whose hidden states are themselves neural networks, offering greater expressiveness. Adding TTT layers to a pre-trained Transformer allows for the generation of one-minute videos from text storyboards. Experiments using a Tom and Jerry cartoon dataset show that TTT layers significantly improve video coherence and storytelling compared to baselines like Mamba 2 and Gated DeltaNet, achieving a 34 Elo point advantage in human evaluation. While artifacts remain, likely due to limitations of the 5B parameter model, this work demonstrates a promising approach scalable to longer videos and more complex narratives.
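
The core idea is that a TTT layer's hidden state is itself a small model whose weights are updated by gradient steps on a self-supervised loss while the sequence is processed. The toy sketch below illustrates that mechanism under simplified assumptions (token-by-token updates, a linear inner model); it is not the paper's implementation.

```python
import torch

class ToyTTTLayer(torch.nn.Module):
    """Minimal sketch of a Test-Time Training layer: the 'hidden state' is the
    weight matrix W of a small inner model, updated by gradient descent on a
    self-supervised reconstruction loss as the sequence is processed."""

    def __init__(self, dim: int, inner_lr: float = 0.1):
        super().__init__()
        self.dim = dim
        self.inner_lr = inner_lr
        # Projections defining the inner model's self-supervised task.
        self.to_key = torch.nn.Linear(dim, dim, bias=False)
        self.to_value = torch.nn.Linear(dim, dim, bias=False)
        self.to_query = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim); tokens are processed one at a time for clarity.
        W = torch.zeros(self.dim, self.dim, device=x.device)  # inner model's weights
        outputs = []
        for t in range(x.shape[0]):
            k, v, q = self.to_key(x[t]), self.to_value(x[t]), self.to_query(x[t])
            err = k @ W - v                        # inner model's prediction error
            W = W - self.inner_lr * torch.outer(k, err)   # one inner gradient step
            outputs.append(q @ W)                  # read out with the updated state
        return torch.stack(outputs)

layer = ToyTTTLayer(dim=64)
out = layer(torch.randn(10, 64))
print(out.shape)  # torch.Size([10, 64])
```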

Multimodal AI Image Generation: A Visual Revolution Begins

2025-04-08

Google and OpenAI's recent release of multimodal image generation capabilities marks a revolution in AI image generation. Unlike previous methods that sent text prompts to separate image generation tools, multimodal models control the image creation process directly, building images token by token, much as LLMs generate text. This allows the AI to produce more precise and impressive images and to iterate based on user feedback. The article showcases these capabilities through examples such as generating infographics, modifying image details, and creating virtual product advertisements. However, it also highlights challenges, including copyright and ethical concerns as well as potential misuse such as deepfakes. Ultimately, the author argues that multimodal AI will profoundly change the landscape of visual creation, and that we need to think carefully about how to guide the transformation.

Real-time Neuroplasticity: Giving Pre-trained LLMs Real-time Learning

2025-04-08

This experimental technique, called "Neural Graffiti," uses a plug-in called the "Spray Layer" to inject memory traces directly into the final inference stage of pre-trained large language models (LLMs) without fine-tuning or retraining. Mimicking the neuroplasticity of the brain, it subtly alters the model's "thinking" by modifying vector embeddings, influencing its generative token predictions. Through interaction, the model gradually learns and evolves. While not forcing specific word outputs, it biases the model towards associated concepts with repeated interaction. The aim is to give AI models more proactive behavior, focused personality, and enhanced curiosity, ultimately helping them achieve a form of self-awareness at the neuron level.
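
Mechanically, the idea amounts to blending a slowly drifting memory vector into the final hidden state just before the language-model head. The sketch below is a rough reconstruction of that "Spray Layer" concept under assumed update rules, not the project's actual code.

```python
import torch

class SprayLayer(torch.nn.Module):
    """Rough sketch of a Neural Graffiti-style memory injection layer.
    A memory vector drifts toward recent context embeddings and is blended
    into the final hidden state before the LM head (assumed update rule)."""

    def __init__(self, hidden_dim: int, drift: float = 0.1, strength: float = 0.2):
        super().__init__()
        self.register_buffer("memory", torch.zeros(hidden_dim))
        self.drift = drift        # how fast memory follows new interactions
        self.strength = strength  # how strongly memory biases the hidden state

    @torch.no_grad()
    def update_memory(self, context_embedding: torch.Tensor) -> None:
        # Exponential moving average toward what the user just talked about.
        self.memory.mul_(1 - self.drift).add_(self.drift * context_embedding)

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # Nudge the final hidden state toward the accumulated memory trace.
        return hidden_state + self.strength * self.memory

# Usage sketch: wrap the base model's final hidden state before the LM head.
# spray = SprayLayer(hidden_dim=4096)
# spray.update_memory(embedding_of_last_user_turn)   # hypothetical helper
# logits = lm_head(spray(final_hidden_state))
```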

Background Music Listening Habits Differ Between Neurotypical Adults and Those Screened for ADHD

2025-04-08

An online survey of 910 young adults (17–30 years old) compared background music (BM) listening habits and subjective effects between neurotypical individuals and those who screened positive for ADHD across tasks with varying cognitive demands. The ADHD group showed a significantly higher preference for BM in specific situations, such as studying and exercising, and a stronger preference for stimulating music. However, no significant differences were found in subjective effects of BM on cognitive and emotional functioning between the groups. The study highlights the importance of adjusting BM use based on individual arousal needs and available cognitive resources, offering a novel perspective on music interventions for ADHD.

LLMs Hit a Wall: Llama 4's Failure and the AI Hype Cycle

2025-04-08

The release of Llama 4 signals that large language models may have hit a performance ceiling. Meta's massive investment in Llama 4 failed to deliver expected breakthroughs, with rumors suggesting potential data manipulation to meet targets. This mirrors the struggles faced by OpenAI, Google, and others in their pursuit of GPT-5-level AI. Industry disappointment with Llama 4's performance is widespread, further solidified by the departure of Meta's AI VP, Joelle Pineau. The article highlights issues like data leakage and contamination within the AI industry, accusing prominent figures of overly optimistic predictions while ignoring real-world failures.

Do LLMs Understand Nulls? Probing the Internal Representations of Code-Generating Models

2025-04-07

Large language models (LLMs) have shown remarkable progress in code generation, but their true understanding of code remains a question. This work investigates LLMs' comprehension of nullability in code, employing both external evaluation (code completion) and internal probing (model activation analysis). Results reveal LLMs learn and apply rules about null values, with performance varying based on rule complexity and model size. The study also illuminates how LLMs internally represent nullability and how this understanding evolves during training.
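
Internal probing of this kind usually means training a small classifier on hidden activations to predict whether a variable may be null at a given program point. The sketch below shows that generic recipe with a linear probe, assuming activations and labels have already been collected; it is not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assume we have already run the code model over many programs and recorded:
#   activations: (n_examples, hidden_dim) hidden states at variable-use positions
#   labels:      (n_examples,) 1 if the variable may be null there, else 0
rng = np.random.default_rng(0)
activations = rng.normal(size=(2000, 512))    # placeholder data
labels = rng.integers(0, 2, size=2000)        # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# If the probe beats chance, nullability is linearly decodable from that layer.
print("probe accuracy:", probe.score(X_test, y_test))
```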

LLM Elimination Game: Social Reasoning, Strategy, and Deception

2025-04-07

Researchers created a multiplayer "elimination game" benchmark to evaluate Large Language Models (LLMs) in social reasoning, strategy, and deception. Eight LLMs compete, engaging in public and private conversations, forming alliances, and voting to eliminate opponents until only two remain. A jury of eliminated players then decides the winner. Analyzing conversation logs, voting patterns, and rankings reveals how LLMs balance shared knowledge with hidden intentions, forging alliances or betraying them strategically. The benchmark goes beyond simple dialogue, forcing models to navigate public vs. private dynamics, strategic voting, and jury persuasion. GPT-4.5 Preview emerged as the top performer.
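
A minimal skeleton of such a benchmark loop might look like the following; the player interface, voting rule, and jury logic are simplified assumptions rather than the authors' harness.

```python
import random
from collections import Counter

def make_bot(name: str):
    """Placeholder 'model': votes for a random opponent. A real benchmark would
    prompt an LLM with the public and private conversation history instead."""
    def vote(state: dict) -> str:
        return random.choice([p for p in state["active"] if p != name])
    return vote

def elimination_game(players: dict) -> str:
    """players maps a name to a callable that, given the game state, returns
    the name of the opponent it votes to eliminate (assumed interface)."""
    active, eliminated = list(players), []
    while len(active) > 2:
        votes = Counter(players[name]({"active": active, "eliminated": eliminated})
                        for name in active)
        top = max(votes.values())
        voted_out = random.choice([p for p, v in votes.items() if v == top])
        active.remove(voted_out)
        eliminated.append(voted_out)
    # Eliminated players form the jury and pick the winner from the final two.
    jury = Counter(random.choice(active) for _ in eliminated)   # placeholder jury logic
    return jury.most_common(1)[0][0]

bots = {f"model_{i}": make_bot(f"model_{i}") for i in range(8)}
print("winner:", elimination_game(bots))
```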

AI Agent Solves Minecraft's Diamond Challenge Without Human Guidance

2025-04-07

Researchers at Google DeepMind have developed Dreamer, an AI system that learned to autonomously collect diamonds in Minecraft without any prior human instruction. This represents a significant advancement in AI's ability to generalize knowledge. Dreamer uses reinforcement learning and a world model to predict future scenarios, enabling it to effectively plan and execute the complex task of diamond collection without pre-programmed rules or demonstrations. The research paves the way for creating robots capable of learning and adapting in the real world.

The Great LLM Hype: Benchmarks vs. Reality

2025-04-06

A startup using AI models for code security scanning found limited practical improvements despite rising benchmark scores since June 2024. The author argues that advancements in large language models haven't translated into economic usefulness or generalizability, contradicting public claims. This raises concerns about AI model evaluation methods and potential exaggeration of capabilities by AI labs. The author advocates for focusing on real-world application performance over benchmark scores and highlights the need for robust evaluation before deploying AI in societal contexts.

Foundry: Tackling the Reliability Crisis in Browser Agents

2025-04-06

Current browser agents from leading AI labs fail on over 80% of real-world tasks. Foundry is building the first robust simulator, RL training environment, and evaluation platform designed specifically for browser agents. By creating perfect replicas of websites like DoorDash, Foundry allows for millions of tests without real-world complexities, pinpointing failure points and accelerating improvements. Their mission is to transform unstable research projects into reliable enterprise solutions. They're seeking exceptional full-stack engineers to join their team of ML experts from Scale AI to tackle this massive $20B+ automation market opportunity.

QVQ-Max: An AI Model with Both Vision and Intellect

2025-04-06

QVQ-Max is a novel visual reasoning model that not only 'understands' images and videos but also analyzes and reasons with this information to solve various problems. From math problems to everyday questions, from programming code to artistic creation, QVQ-Max demonstrates impressive capabilities. It excels at detailed observation, deep reasoning, and flexible application in various scenarios, such as assisting with work, learning, and daily life. Future development will focus on improving recognition accuracy, enhancing multi-step task handling, and expanding interaction methods to become a truly practical visual agent.

Model Context Protocol (MCP): The Next Big Thing for LLM Integration—But With a Catch

2025-04-06

Model Context Protocol (MCP) is emerging as the standard for Large Language Model (LLM) integration with tools and data, dubbed the "USB-C for AI agents." It enables agents to connect to tools via standardized APIs, maintain persistent sessions, run commands, and share context across workflows. However, MCP isn't secure by default. Connecting agents to arbitrary servers without careful consideration can create security vulnerabilities, potentially exposing shell access, secrets, or infrastructure via side-channel attacks.
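
Under the hood, MCP frames these interactions as JSON-RPC messages in which a client lists a server's tools and then invokes them. The snippet below sketches roughly what a tool invocation looks like; the tool name and arguments are made-up examples, not a real server's API.

```python
import json

# Rough sketch of an MCP-style tool invocation (JSON-RPC 2.0 framing).
list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "query_database",                        # hypothetical tool
        "arguments": {"sql": "SELECT count(*) FROM users"},
    },
}

print(json.dumps(call_tool_request, indent=2))

# The security concern raised in the article: if the server behind this call is
# untrusted, a tool like "query_database" (or a shell tool) can exfiltrate whatever
# context and credentials the agent has been given.
```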

SeedLM: A Novel LLM Weight Compression Method Using Pseudo-Random Number Generators

2025-04-06

Large Language Models (LLMs) are hindered by high runtime costs, limiting widespread deployment. Meta researchers introduce SeedLM, a novel post-training compression method using seeds from a pseudo-random number generator to encode and compress model weights. During inference, SeedLM uses a Linear Feedback Shift Register (LFSR) to efficiently generate a random matrix, linearly combined with compressed coefficients to reconstruct weight blocks. This reduces memory access and leverages idle compute cycles, speeding up memory-bound tasks by trading compute for fewer memory accesses. Unlike state-of-the-art methods requiring calibration data, SeedLM is data-free and generalizes well across diverse tasks. Experiments on the challenging Llama 3 70B show zero-shot accuracy at 4- and 3-bit compression matching or exceeding state-of-the-art methods, while maintaining performance comparable to FP16 baselines. FPGA tests demonstrate that 4-bit SeedLM approaches a 4x speed-up over an FP16 Llama 2/3 baseline as model size increases.
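
Conceptually, each weight block is stored as just a seed plus a few coefficients and reconstructed at inference time as a linear combination of columns of a pseudo-random matrix expanded from that seed. The sketch below illustrates that reconstruction idea with NumPy's generator standing in for the paper's hardware-friendly LFSR; it omits quantization and is not the authors' implementation.

```python
import numpy as np

def expand_seed(seed: int, block_size: int, rank: int) -> np.ndarray:
    """Expand a seed into a pseudo-random basis matrix U (block_size x rank).
    NumPy's PCG generator stands in here for the paper's LFSR."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((block_size, rank))

def compress_block(w: np.ndarray, seeds: range, rank: int):
    """Pick the seed whose basis best reconstructs the weight block w via
    least squares, keeping only the seed and the coefficients."""
    best = None
    for seed in seeds:
        U = expand_seed(seed, w.size, rank)
        coeffs, *_ = np.linalg.lstsq(U, w, rcond=None)
        err = np.linalg.norm(U @ coeffs - w)
        if best is None or err < best[0]:
            best = (err, seed, coeffs)
    _, seed, coeffs = best
    return seed, coeffs  # in SeedLM these coefficients would also be quantized

def decompress_block(seed: int, coeffs: np.ndarray, block_size: int) -> np.ndarray:
    return expand_seed(seed, block_size, coeffs.size) @ coeffs

w = np.random.randn(8)                         # one tiny weight block
seed, coeffs = compress_block(w, range(256), rank=3)
w_hat = decompress_block(seed, coeffs, w.size)
print("reconstruction error:", np.linalg.norm(w - w_hat))
```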

TripoSG: High-Fidelity 3D Shape Synthesis with Large-Scale Rectified Flow Models

2025-04-06

TripoSG is a cutting-edge foundation model for high-fidelity image-to-3D generation. Leveraging large-scale rectified flow transformers, hybrid supervised training, and a high-quality dataset, it achieves state-of-the-art results. TripoSG generates meshes with sharp features, fine details, and complex structures, accurately reflecting input image semantics. It boasts strong generalization capabilities, handling diverse input styles. A 1.5B parameter model, along with inference code and an interactive demo, is now available.

Model Signing: Securing the Integrity of ML Models

2025-04-05

With the explosive growth of machine learning applications, model security has become a critical concern. This project aims to secure the integrity and provenance of machine learning models through model signing. It utilizes tools like Sigstore to generate model signatures and provides CLI and API interfaces, supporting various signing methods (including Sigstore, public keys, and certificates). Users can independently verify the integrity of their models, preventing tampering after training. The project also integrates with SLSA (Supply chain Levels for Software Artifacts) to further enhance the security of the machine learning model supply chain.
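
At its core, signing a model means hashing every file in the model directory into a manifest and then signing that manifest. The sketch below shows only the hashing-and-comparison half with plain hashlib, as a simplified stand-in for what the library does with Sigstore.

```python
import hashlib
from pathlib import Path

def hash_model_dir(model_dir: str) -> dict:
    """Build a manifest of SHA-256 digests for every file in a model directory.
    In the actual library, a manifest like this is what gets signed (e.g., via Sigstore)."""
    manifest = {}
    for path in sorted(Path(model_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(model_dir))] = digest
    return manifest

def verify_model_dir(model_dir: str, trusted_manifest: dict) -> bool:
    """Recompute digests and compare against a previously signed manifest."""
    return hash_model_dir(model_dir) == trusted_manifest

# manifest = hash_model_dir("./my-model")            # hypothetical path
# ...sign the manifest with Sigstore, a public key, or a certificate...
# assert verify_model_dir("./my-model", manifest)    # detects any post-training tampering
```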

Meta's Llama 4: Powerful Multimodal AI Models Arrive

2025-04-05

Meta has unveiled its Llama 4 family of AI models, offering Llama 4 Scout and Llama 4 Maverick to cater to diverse developer needs. Llama 4 Scout, a leading multimodal model, boasts 17 billion active parameters and 109 billion total parameters, delivering state-of-the-art performance. Llama 4 Maverick, with 17 billion active parameters and 400 billion total parameters, outperforms Llama 3.3 70B at a lower cost, excelling in image and text understanding across 12 languages. Ideal for general assistants and chat applications, it's optimized for high-quality responses and nuanced tone.

Google Releases Stable Model Signing Library to Secure the AI Supply Chain

2025-04-05

The rise of large language models (LLMs) has brought increased focus on AI supply chain security. Model tampering, data poisoning, and other threats are growing concerns. To address this, Google, in partnership with NVIDIA and HiddenLayer, and supported by the Open Source Security Foundation, has released the first stable version of its model signing library. This library uses digital signatures, such as those from Sigstore, to allow users to verify that the model used by an application is identical to the one created by the developers. This ensures model integrity and provenance, protecting against malicious tampering throughout the model's lifecycle, from training to deployment. Future plans include extending this technology to datasets and other ML artifacts, building a more robust AI trust ecosystem.

AI in Healthcare: The Computational Bottleneck

2025-04-05

A researcher highlights the inaccuracy of current clinical tools used for cancer risk prediction. AI has the potential to leverage massive patient data for personalized care, enabling earlier cancer detection, improved diagnostics, and optimized treatment protocols. However, the sheer volume of healthcare data overwhelms traditional computer chips, making computational power a bottleneck for realizing AI's full potential in healthcare. While researchers optimize algorithms, silicon-based chip technology is nearing its performance limits, necessitating a new approach to chip technology for AI to reach its full potential.

LeCun: LLMs Will Be Obsolete in Five Years

2025-04-05

Yann LeCun, Meta's chief AI scientist, predicts that large language models (LLMs) will be largely obsolete within five years. He argues that current LLMs lack understanding of the physical world, operating as specialized tools in a simple, discrete space (language). LeCun and his team are developing an alternative approach called JEPA, which aims to create representations of the physical world from visual input, enabling true reasoning and planning capabilities surpassing LLMs. He envisions AI transforming society by augmenting human intelligence, not replacing it, and refutes claims of AI posing an existential risk.

Revolutionary OCR System: Powering AI Education Datasets

2025-04-05

A groundbreaking OCR system optimized for machine learning extracts structured data from complex educational materials like exam papers. Supporting multilingual text, mathematical formulas, tables, diagrams, and charts, it's ideal for creating high-quality training datasets. The system semantically annotates extracted elements and automatically generates natural language descriptions, such as descriptive text for diagrams. With support for Japanese, Korean, and English and easy customization for additional languages, it outputs AI-ready JSON or Markdown, including human-readable descriptions of mathematical expressions, table summaries, and figure captions. Achieving roughly 90-95% accuracy on real-world academic datasets, it handles complex layouts with dense scientific content and rich visuals.
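
A hypothetical example of the kind of AI-ready JSON record such a system might emit for a single exam item is shown below; every field name is invented for illustration.

```python
# Invented example of an AI-ready output record; field names are illustrative only.
example_record = {
    "id": "2024-physics-q3",
    "language": "ja",
    "question_text": "...",                                # extracted source text
    "formulas": [{"latex": "E = mc^2",
                  "description": "Energy equals mass times the speed of light squared."}],
    "tables": [{"markdown": "| trial | result |\n|---|---|\n| 1 | 9.8 |",
                "summary": "Measured acceleration across trials."}],
    "figures": [{"caption": "Circuit diagram",
                 "description": "A battery in series with two resistors."}],
}
```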

OpenAI's o3 Model Achieves Breakthrough on ARC-AGI, But AGI Definition Remains Contested

2025-04-04

OpenAI's latest model, o3, achieved a stunning 87% score on François Chollet's ARC-AGI test, reaching human-level performance for the first time and sparking a heated debate about whether AGI (Artificial General Intelligence) has been achieved. However, Chollet quickly released the harder ARC-AGI-2 test, where o3's score plummeted, once again challenging the industry's definition and metrics for AGI. This article explores the differing viewpoints and the complex relationship between AGI's definition and commercial interests, prompting deep reflection on the nature of general artificial intelligence.

LLMs Crack a Byzantine Music Notation Cipher

2025-04-04

Researchers discovered that large language models like Claude and GPT-4 can decode a peculiar cipher based on the Byzantine Musical Symbols Unicode block. The cipher resembles a Caesar cipher, but with an offset of 118784. The models can decode it directly, without chain-of-thought, achieving even higher success rates than on regular Caesar ciphers. Researchers hypothesize this is due to a linear relationship between addition in a specific Unicode range and addition in token space, allowing the models to learn a shift cipher based on this relationship. The phenomenon suggests the existence of as-yet-unexplained mechanisms within LLMs.
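
The cipher itself is easy to reproduce: shift each character's code point up by 118784 (0x1D000, the start of the Byzantine Musical Symbols block), as in the short sketch below.

```python
OFFSET = 118784  # 0x1D000, start of the Byzantine Musical Symbols Unicode block

def encode(text: str) -> str:
    # Shifts ASCII text into the Byzantine Musical Symbols block.
    return "".join(chr(ord(ch) + OFFSET) for ch in text)

def decode(cipher: str) -> str:
    return "".join(chr(ord(ch) - OFFSET) for ch in cipher)

msg = "hello world"
enc = encode(msg)
print(enc)            # a string of Byzantine musical symbols
print(decode(enc))    # "hello world"
assert decode(enc) == msg
```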

Google Unveils Sec-Gemini v1: A New Era in AI-Powered Cybersecurity

2025-04-04

Google has announced Sec-Gemini v1, an experimental AI model designed to push the frontiers of cybersecurity AI. Combining Gemini's advanced capabilities with near real-time cybersecurity knowledge and tooling, Sec-Gemini v1 excels in key workflows such as incident root cause analysis, threat analysis, and vulnerability impact understanding. It outperforms other models on key benchmarks, showing at least an 11% improvement on CTI-MCQ and at least a 10.5% improvement on CTI-Root Cause Mapping. Google is making Sec-Gemini v1 freely available to select organizations, institutions, professionals, and NGOs for research purposes to foster collaboration and advance AI in cybersecurity.

DeepMind's Blueprint for Safe AGI Development: Navigating the Risks of 2030

2025-04-04

As AI hype reaches fever pitch, the focus shifts to Artificial General Intelligence (AGI). DeepMind's new 108-page paper tackles the crucial question of safe AGI development, projecting a potential arrival by 2030. The paper outlines four key risk categories: misuse, misalignment, mistakes, and structural risks. To mitigate these, DeepMind proposes rigorous testing, robust post-training safety protocols, and even the possibility of 'unlearning' dangerous capabilities—a significant challenge. This proactive approach aims to prevent the severe harm a human-level AI could potentially inflict.
