Category: AI

Biomni: A Game-Changing Biomedical AI Agent

2025-07-10

Biomni is a game-changing general-purpose biomedical AI agent capable of autonomously conducting a wide array of research tasks across various biomedical subfields. By integrating cutting-edge LLMs, retrieval-augmented planning, and code-based execution, Biomni significantly boosts research productivity and facilitates the generation of testable hypotheses. The open-source project actively solicits community contributions—new tools, datasets, software, benchmarks, and tutorials—to build Biomni-E2, a next-generation environment. Significant contributors will be recognized with co-authorship on publications in top-tier journals or conferences.
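
The three ingredients named in the summary (an LLM, retrieval-augmented planning, code-based execution) fit a compact agent loop. Below is a minimal illustrative sketch of that pattern; every name here (retrieve_tools, run_code, the llm callable) is a hypothetical stand-in, not Biomni's actual API.

```python
# Minimal sketch of a retrieval-augmented plan-and-execute loop in the style
# the Biomni summary describes. All names are hypothetical illustrations.
import subprocess
import tempfile

def retrieve_tools(task: str, tool_index: dict[str, str], k: int = 3) -> list[str]:
    """Naive retrieval: rank tool descriptions by keyword overlap with the task."""
    scored = sorted(
        tool_index.items(),
        key=lambda kv: -len(set(task.lower().split()) & set(kv[1].lower().split())),
    )
    return [name for name, _ in scored[:k]]

def run_code(code: str) -> str:
    """Execute model-generated Python in a subprocess and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
    result = subprocess.run(["python", f.name], capture_output=True, text=True, timeout=60)
    return result.stdout or result.stderr

def solve(task: str, llm, tool_index: dict[str, str]) -> str:
    tools = retrieve_tools(task, tool_index)   # retrieval-augmented planning
    plan = llm(f"Task: {task}\nAvailable tools: {tools}\nWrite Python to solve it.")
    return run_code(plan)                      # code-based execution
```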

rtrvr.ai v12.5: On-the-Fly Tool Generation Redefines AI Agent Tool Integration

2025-07-09

rtrvr.ai v12.5 introduces 'On-the-Fly Tool Generation' (ToolGen), revolutionizing AI agent tool integration. Previously, agents relied on pre-configured tool lists, such as those defined via the MCP protocol, making configuration cumbersome and inflexible. ToolGen allows agents to extract information directly from the browser (e.g., API keys) and generate the necessary tools on demand. For example, it can grab an access token from a HubSpot developer page and generate a tool to upload contacts. This significantly improves efficiency and flexibility, eliminating manual configuration of complex tool lists. To celebrate the release, rtrvr.ai is offering a credit update with free BYOK (Bring Your Own Key), referral bonuses, and free credits for all users.
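
The HubSpot example suggests what on-the-fly tool generation might look like in code: take a token captured from an open page, then synthesize a callable tool at runtime. This is a hypothetical sketch of the idea, not rtrvr.ai's internals; the HubSpot CRM endpoint is real, but the flow around it is assumed.

```python
# Hypothetical sketch of "on-the-fly tool generation": bind a token scraped
# from the browser into a freshly generated tool. Not rtrvr.ai's actual code.
import requests

def generate_contact_tool(access_token: str):
    """Return a new 'upload contact' tool bound to the captured token."""
    def upload_contact(email: str, first_name: str, last_name: str) -> int:
        resp = requests.post(
            "https://api.hubapi.com/crm/v3/objects/contacts",
            headers={"Authorization": f"Bearer {access_token}"},
            json={"properties": {"email": email,
                                 "firstname": first_name,
                                 "lastname": last_name}},
            timeout=30,
        )
        return resp.status_code

    return upload_contact

# token_from_browser stands in for a token the agent read off the page
tool = generate_contact_tool("token_from_browser")
```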

From AI Agents to AI Agencies: A Paradigm Shift in Task Execution

2025-07-09

Two years ago, the transformative potential of AI Agents – autonomous systems capable of breaking down and executing complex tasks – was highlighted. Now, AI Agents autonomously code websites, manage digital workflows, and execute multi-step processes. However, a new architectural pattern, termed 'AI Agencies', is emerging, representing a fundamental leap beyond current AI Agents. Unlike multiple AI Agents collaborating, an AI Agency is a unified system dynamically orchestrating diverse intelligence types to handle different parts of a single task. For example, a high-capability reasoning model plans the task, a fast, efficient model generates boilerplate code, and a debugging-focused model ensures functionality. This shifts AI task execution from monolithic to orchestrated intelligence, improving efficiency, cost-effectiveness, and quality.
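
The planner/drafter/debugger division of labor described above can be sketched as a small dispatch table. This is a conceptual illustration only; the model names and the complete() helper are assumptions, not a real agency framework.

```python
# A minimal sketch of the "AI Agency" pattern: one task, several specialized
# models, each handling the stage it suits best. Names are illustrative.
STAGE_MODELS = {
    "plan": "high-capability-reasoner",   # breaks the task into steps
    "draft": "fast-cheap-coder",          # generates boilerplate quickly
    "debug": "debugging-specialist",      # verifies and repairs the draft
}

def complete(model: str, prompt: str) -> str:
    """Placeholder for a call to whichever provider hosts `model`."""
    raise NotImplementedError

def run_agency(task: str) -> str:
    plan = complete(STAGE_MODELS["plan"], f"Plan the steps for: {task}")
    draft = complete(STAGE_MODELS["draft"], f"Implement this plan:\n{plan}")
    return complete(STAGE_MODELS["debug"], f"Find and fix bugs in:\n{draft}")
```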

The $100B AGI Definition Mess: Microsoft and OpenAI's Rift

2025-07-09

Microsoft and OpenAI are locked in a bitter dispute over the definition of AGI (Artificial General Intelligence), casting a shadow over their $13 billion contract. Some define AGI as an AI system generating $100 billion in profit, a purely arbitrary economic benchmark. The lack of a consensus definition hinders AI development, regulation, and discourse. The author suggests AGI should possess broad generalization capabilities, handling diverse tasks across domains, but the 'human-level' benchmark itself is problematic. This definitional clash highlights the conceptual ambiguity plaguing the AI field.

AI Uncovers Irrationality in Human Decision-Making During Complex Games

2025-07-09

Researchers from Princeton University and Boston University used machine learning to predict human strategic decisions in various games. A deep neural network trained on human decisions accurately predicted players' choices. A hybrid model, combining a classical behavioral model with a neural network, outperformed the neural network alone, particularly in capturing the impact of game complexity. The study reveals that people act more predictably in simpler games but less rationally in complex ones. This research offers new insights into human decision-making processes and lays the groundwork for behavioral science interventions aimed at promoting more rational choices.
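
One plausible shape for such a hybrid is a classical behavioral prior (here, a quantal-response-style softmax over payoffs) plus a neural residual. This sketch is an assumption about the general technique, not the paper's exact architecture.

```python
# Illustrative hybrid: a behavioral model supplies baseline choice logits and
# a small network learns a correction. Not the study's actual model.
import torch
import torch.nn as nn

class HybridChoiceModel(nn.Module):
    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.temperature = nn.Parameter(torch.ones(1))   # behavioral parameter
        self.residual = nn.Sequential(                   # learned correction
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, payoffs: torch.Tensor, features: torch.Tensor) -> torch.Tensor:
        behavioral_logits = payoffs / self.temperature.clamp(min=1e-3)
        return torch.log_softmax(behavioral_logits + self.residual(features), dim=-1)
```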

SmolLM3: A Tiny, Multilingual, Long-Context Reasoner

2025-07-09

SmolLM3 is a fully open-source 3B parameter multilingual language model that strikes a compelling balance between efficiency and performance. Outperforming Llama-3.2-3B and Qwen2.5-3B on various benchmarks, it even competes with larger 4B parameter models. Supporting 6 languages and boasting a context length of up to 128k tokens, SmolLM3 features a unique dual-mode reasoning capability (think/no_think). Beyond the model itself, the researchers are releasing the complete engineering blueprint, including architecture details, data mixtures, and training methodology—a valuable resource for anyone building or studying models at this scale.
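
The dual-mode behavior is reportedly toggled through the chat template. A sketch of what that usage might look like with the transformers library follows; the repo id and the exact /think and /no_think flag syntax are assumptions to verify against the model card.

```python
# Sketch of toggling SmolLM3's reasoning modes via the chat template.
# Treat the flag syntax and repo id as assumptions from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "/no_think"},   # or "/think" for reasoning traces
    {"role": "user", "content": "Summarize the Chinchilla scaling law in one sentence."},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
print(tok.decode(model.generate(inputs, max_new_tokens=128)[0], skip_special_tokens=True))
```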

ChatGPT's New "Study Together" Mode: AI Tutor or Cheating Enabler?

2025-07-08

Some ChatGPT Plus subscribers are reporting a new feature called "Study Together." Instead of directly answering prompts, this mode reportedly asks questions, prompting users to engage actively, much like an AI tutor. Speculation abounds about whether it will evolve into a multi-user study group feature and how effective it will be in deterring academic dishonesty. OpenAI hasn't commented, and ChatGPT itself remains vague about the feature's wider rollout. This new mode highlights ChatGPT's dual role in education: it can aid learning but also facilitate cheating; "Study Together" may be OpenAI's attempt to steer usage towards positive applications.

AI-Powered Generative Models Reshape Anamorphic Images

2025-07-08

Traditional anamorphic images only reveal their true form from a specific viewpoint. This paper uses latent rectified flow models and a novel image warping technique called Laplacian Pyramid Warping to create anamorphic images that retain a valid interpretation even when viewed directly. This work extends Visual Anagrams to latent space models and a wider range of spatial transforms, enabling the creation of novel generative perceptual illusions, opening new possibilities in image generation.
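
The core intuition of Laplacian pyramid warping is to warp each frequency band of an image separately and recombine. The paper operates on latent rectified-flow features; the pixel-space sketch below is only an illustration of the pyramid mechanics, not the paper's method.

```python
# Illustrative pixel-space version of Laplacian pyramid warping: decompose,
# warp each band, recombine. The actual method works in latent space.
import cv2
import numpy as np

def laplacian_pyramid(img: np.ndarray, levels: int) -> list[np.ndarray]:
    img = img.astype(np.float32)         # avoid uint8 wraparound in subtraction
    pyr = []
    for _ in range(levels):
        down = cv2.pyrDown(img)
        pyr.append(img - cv2.pyrUp(down, dstsize=img.shape[1::-1]))
        img = down
    pyr.append(img)                      # coarsest residual
    return pyr

def warp_pyramid(pyr: list[np.ndarray], warp_fn) -> list[np.ndarray]:
    return [warp_fn(level) for level in pyr]   # warp each band independently

def collapse(pyr: list[np.ndarray]) -> np.ndarray:
    img = pyr[-1]
    for band in reversed(pyr[:-1]):
        img = cv2.pyrUp(img, dstsize=band.shape[1::-1]) + band
    return img
```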

Prototyping Indoor Maps with VLMs: From Photos to Positions

2025-07-07

Over a weekend, the author prototyped an indoor localization system using a single photo and cutting-edge Vision-Language Models (VLMs). By annotating a mall map, identifying visible shops in the photo, and leveraging the VLM's image recognition capabilities, the system successfully matched the photo's location to the map. While some ambiguity remains, the results are surprisingly accurate, showcasing the potential of VLMs for indoor localization. This opens exciting avenues for future AR applications and robotics, while also highlighting potential environmental concerns.
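
The matching step the author describes reduces to aligning VLM-reported shop names with map annotations. A minimal sketch under that assumption (the prototype itself is not public, so this is purely illustrative):

```python
# Toy version of the matching step: shops the VLM says are visible, matched
# against a hand-annotated shop -> (x, y) map; position is their centroid.
def estimate_position(visible_shops: list[str],
                      map_annotations: dict[str, tuple[float, float]]):
    matched = [map_annotations[s] for s in visible_shops if s in map_annotations]
    if not matched:
        return None                      # the residual ambiguity the article notes
    xs, ys = zip(*matched)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

position = estimate_position(
    ["Uniqlo", "Starbucks"], {"Uniqlo": (12.0, 4.5), "Starbucks": (14.0, 6.0)}
)
```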

The Exploration Bottleneck in LLMs: The Next Frontier of Experience Collection

2025-07-07

The success of large language models (LLMs) relies on massive pre-training on vast text data, a resource that will eventually be depleted. The future of AI will shift towards an "Era of Experience," where efficient collection of the right kind of experience beneficial to learning will be crucial, rather than simply stacking parameters. This article explores how pre-training implicitly solves part of the exploration problem and how better exploration leads to better generalization. The author proposes that exploration consists of two axes: "world sampling" (choosing learning environments) and "path sampling" (gathering data within environments). Future AI scaling should optimize the information density on these two axes, efficiently allocating computational resources instead of simply pursuing parameter scale or data volume.

My Pocket Data Revealed My Secrets

2025-07-07

Before Pocket's shutdown, the author exported nearly 900 saved articles spanning seven years and analyzed them with OpenAI's o3 model. Surprisingly, o3 accurately inferred the author's age, gender, location, profession, income, family status, and even political leanings, risk tolerance, and learning style. This prompted reflections on data privacy and AI capabilities, and inspired the author to build a personalized content recommendation system.

Anthropic's Claude: Fair Use vs. Piracy in AI Training

2025-07-07

Anthropic, in training its AI chatbot Claude, "destructively scanned" millions of copyrighted books and downloaded millions of pirated ones. A judge ruled that using purchased books for training constituted fair use, but using pirated books was copyright infringement. This case, a landmark ruling on AI training data, highlights the ongoing debate about the ethical sourcing of training data for large language models.

AGI Timelines: 2028 for Tax AI? 2032 for On-the-Job Learning?

2025-07-07

Podcast host Dwarkesh Patel discusses AGI timelines. He argues that while current LLMs are impressive, their lack of continuous learning severely limits their real-world applications. He uses the analogy of learning the saxophone to illustrate how LLMs learn differently from humans: they cannot accumulate experience and improve their skills the way people do. This makes him cautious about AGI breakthroughs in the next few years but optimistic about the coming decades. He predicts 2028 for AI handling taxes as competently as a human manager (including chasing down receipts and invoices) and 2032 for AI capable of on-the-job learning as seamlessly as a human. He believes that once continuous learning is solved, AI capabilities will leap, potentially resulting in something akin to an intelligence explosion.

Apple's AI Safety Model Decrypted: Unveiling its Content Filtering Mechanisms

2025-07-07

This project decrypts Apple's AI safety model filter files, which contain rules for various models. Using LLDB debugging and custom scripts, the encryption key can be obtained and these files decrypted. The decrypted JSON files contain rules for filtering harmful content and ensuring safety compliance, such as exact keyword matching, phrases to remove, and regular expression filtering. The project provides the decrypted rule files and decryption scripts, allowing researchers to analyze Apple's AI model safety mechanisms.
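
The rule types listed (exact keyword matching, phrase removal, regex filtering) are straightforward to apply once decrypted. A sketch of what consuming such a JSON file might look like; the field names are illustrative assumptions, not Apple's actual schema.

```python
# Illustrative application of the reported rule types. Field names assumed.
import json
import re

def apply_safety_rules(text: str, rules: dict) -> str | None:
    """Return filtered text, or None if the input is rejected outright."""
    if any(kw.lower() == text.strip().lower() for kw in rules.get("exact_match", [])):
        return None                                   # exact keyword rejection
    for phrase in rules.get("remove", []):
        text = text.replace(phrase, "")               # phrase removal
    for pattern in rules.get("regex_reject", []):
        if re.search(pattern, text):
            return None                               # regex-based rejection
    return text

rules = json.loads('{"exact_match": ["bad term"], "remove": [], "regex_reject": []}')
print(apply_safety_rules("hello world", rules))
```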

Huawei's Pangu LLM: Whistleblower Exposes Plagiarism Scandal

2025-07-06

A Huawei Noah's Ark Lab employee working on the Pangu large language model has come forward with a shocking exposé of plagiarism within the company. The whistleblower alleges that Wang Yunhe's small model lab repeatedly 're-skinned' models from other companies (like Qwen), presenting them as Huawei's own Pangu models to gain recognition and rewards. The account details intense internal pressure, unfair treatment, and significant talent drain, raising serious questions about Huawei's LLM development management.

Apple's Stealth AI Code Generator: DiffuCoder Leaps Forward

2025-07-06

Apple quietly released DiffuCoder-7B-cpGRPO, a novel AI code-generation model, on Hugging Face. Unlike traditional autoregressive LLMs, DiffuCoder uses a diffusion-model architecture, enabling parallel processing of multiple code chunks for significantly faster generation. Built upon Alibaba's open-source Qwen2.5-7B and enhanced with coupled-GRPO training, it achieves high-quality code generation. While not yet reaching GPT-4 or Gemini Diffusion levels, DiffuCoder shows promising performance on coding benchmarks, showcasing Apple's innovative approach to generative AI.

Fine-tuning GPT-2 for Positive Sentiment Generation using RLHF

2025-07-06

This project provides a reference implementation for fine-tuning a pretrained GPT-2 model to generate sentences expressing positive sentiment using Reinforcement Learning from Human Feedback (RLHF). The process involves three steps: 1. Supervised Fine-Tuning (SFT): Fine-tuning GPT-2 on the stanfordnlp/sst2 dataset; 2. Reward Model Training: Training a GPT-2 model with a reward head to predict sentiment; 3. Reinforcement Learning via Proximal Policy Optimization (PPO): Optimizing the SFT model to generate sentences that the reward model evaluates positively. These three steps are implemented in three Jupyter Notebooks, allowing for a step-by-step approach. A Hugging Face access token is required to download the pretrained GPT-2 model.
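
A condensed sketch of step 3 (PPO) using the trl library's classic PPOTrainer.step interface; newer trl versions restructure this API, so the exact signatures should be treated as version-dependent, and the hard-coded reward below is a stand-in for the trained reward model's score.

```python
# PPO step with trl's classic API (version-dependent; hedge accordingly).
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead
from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")      # would be the SFT model
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")  # frozen KL reference
ppo = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), model, ref_model, tokenizer)

query = tokenizer("The movie was", return_tensors="pt").input_ids[0]
response = ppo.generate(query, max_new_tokens=16, pad_token_id=tokenizer.eos_token_id)[0]
reward = torch.tensor(1.0)              # stand-in for the reward model's sentiment score
ppo.step([query], [response[len(query):]], [reward])
```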

Generative AI Shakes Up Computer Science Education

2025-07-06

The rise of generative AI is forcing a rethink of computer science education. Tools like ChatGPT can now perform some coding tasks, challenging universities to adapt their curricula. Some are de-emphasizing programming languages in favor of computational thinking and AI literacy, focusing on critical thinking and communication skills. The tech job market is tightening, with fewer entry-level positions available due to AI automation. The future of computer science education may involve a greater emphasis on computational thinking, AI literacy, and interdisciplinary approaches to meet the demands of the AI era.

Bytebot: A Revolutionary Approach to Giving AI Agents 'Hands'

2025-07-06

Bytebot eschews traditional API integration, instead giving AI agents control of a keyboard, mouse, and screen so they can operate software like a remote human worker. This approach is simpler, more robust, more generalizable, and more future-proof, sidestepping the problems current AI agents face with complex, API-less software and workflows. Because it interacts with applications the way a person does, Bytebot can adapt to any application and OS without bespoke integrations, saving companies significant time and cost, and it improves automatically as the underlying models improve.
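
The screen-in, action-out loop this describes is generic enough to sketch with pyautogui as a stand-in; Bytebot's own stack is not public here, and the action dictionary format is an assumption.

```python
# Generic perceive-decide-act loop for a computer-use agent (illustrative).
import pyautogui

def act(model) -> None:
    """One step: screenshot in, UI action out. Action format is assumed."""
    screenshot = pyautogui.screenshot()               # perceive the screen
    action = model(screenshot)                        # e.g. {"type": "click", "x": 100, "y": 200}
    if action["type"] == "click":
        pyautogui.click(action["x"], action["y"])
    elif action["type"] == "type":
        pyautogui.typewrite(action["text"], interval=0.02)
```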

Beyond Chained LLM Calls: Differentiable Routing for Efficient LLMs

2025-07-06

Modern large language model (LLM) agent architectures heavily rely on chaining LLM calls, resulting in high costs, latency, and poor scalability. This paper introduces a differentiable router that models tool selection as a trainable function, instead of relying on LLMs. This approach learns tool selection from data via reinforcement learning or supervised fine-tuning, running outside the LLM. It avoids external API calls, improves determinism and composability, and reduces costs. Experiments show that this method significantly reduces costs, improves performance, and clarifies model behavior, marking a step towards LLM systems that look less like prompt chains and more like programs.
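
In its simplest form, a differentiable router is just a small trainable classifier from a query embedding to a tool choice, executed outside the LLM. This sketch illustrates the idea, not the paper's exact architecture.

```python
# Minimal differentiable router: embedding in, tool logits out, no LLM call.
import torch
import torch.nn as nn

class ToolRouter(nn.Module):
    def __init__(self, embed_dim: int, n_tools: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, n_tools)
        )

    def forward(self, query_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(query_embedding)              # logits over tools

router = ToolRouter(embed_dim=384, n_tools=5)
logits = router(torch.randn(1, 384))
tool_id = logits.argmax(dim=-1).item()                # deterministic, cheap routing
# Trained with cross-entropy on (query, correct-tool) pairs, or via RL with
# downstream task success as the reward, as the paper describes.
```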

Can Large Neural Networks Solve Robotics? Insights from CoRL 2023

2025-07-05

At CoRL 2023, a central debate emerged: can training large neural networks on massive datasets solve robotics? Proponents argued that the success of large models in computer vision and NLP suggests this approach is promising, citing initial results from Google DeepMind's RT-X and RT-2 as examples. They believe the ongoing advancements in data and compute power fuel this direction. However, critics pointed out the current scarcity of robotics data, the immense variability across robot embodiments and environments, and the prohibitive cost of collecting large-scale datasets. Furthermore, even achieving high accuracy might not translate to the 99.X% reliability needed for practical deployment. Some suggested combining classical control methods with learning, while others called for entirely new approaches. Ultimately, CoRL 2023 highlighted the opportunities and challenges in robotics, offering valuable insights for future research.

LLM Capabilities Doubling Every Seven Months: A 2030 Prediction

2025-07-05

New research reveals a startling rate of progress in large language models (LLMs). Their ability to complete complex tasks is doubling roughly every seven months, according to a metric called "task-completion time horizon." This metric compares the time an LLM takes to complete a task to the time a human would take. The study projects that by 2030, the most advanced LLMs could complete, with 50% reliability, a software task equivalent to a month's worth of human work (40 hours/week). This raises significant concerns and excitement about the potential benefits and risks of LLMs, while acknowledging that hardware and robotics could potentially limit the pace of progress.
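
The arithmetic behind the projection is simple compounding. The starting horizon below (roughly an hour in mid-2025) is an assumption purely for illustration; only the seven-month doubling time comes from the article.

```python
# Back-of-the-envelope extrapolation of the seven-month doubling claim.
horizon_hours = 1.0                     # assumed mid-2025 task horizon (illustrative)
months = (2030 - 2025.5) * 12           # months until 2030
horizon_2030 = horizon_hours * 2 ** (months / 7)
print(f"{horizon_2030:.0f} hours ≈ {horizon_2030 / 40:.1f} work-weeks")
# With these assumptions the horizon lands near a month of 40-hour weeks,
# matching the article's ballpark.
```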

The Seven Deadly Sins of the AI Industry: False Promises of AGI and the Perils of Attention-Hacking

2025-07-05

This article critically examines the current state of the AI industry, highlighting seven key problems: exaggerating the proximity of AGI, prioritizing engagement over utility, persistent and unresolved hallucinations in LLMs, oscillating between fear-mongering and utopianism regarding AI risks, a lack of a credible path to profitability, quasi-monopolistic tendencies in the AI field, and the overhype of AI agents. The author argues that these issues stem from the industry's pursuit of short-term gains, lack of self-reflection, and a disregard for real-world accountability, ultimately leading to a potential misdirection of AI development and negative societal consequences.

German Firm TNG Unveils DeepSeek-TNG R1T2 Chimera: A Faster, More Efficient Open-Source LLM

2025-07-05

TNG Technology Consulting GmbH, a German firm, has released DeepSeek-TNG R1T2 Chimera, a new large language model (LLM) built upon the open-source DeepSeek-R1-0528. Utilizing their innovative Assembly-of-Experts (AoE) method, R1T2 boasts significant improvements in speed and efficiency, achieving over 200% faster inference than R1-0528 while retaining over 90% of its reasoning capabilities. The model's concise outputs translate to lower compute costs. Released under the permissive MIT license and available on Hugging Face, R1T2 offers a cost-effective and efficient AI solution for enterprises and researchers.
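
Assembly-of-Experts builds a child model from parent checkpoints' weight tensors. TNG's method is more selective than this, but a uniform state-dict interpolation conveys the basic mechanism; treat it as a conceptual illustration only.

```python
# Conceptual illustration of building a model by blending parent checkpoints.
# AoE merges expert tensors selectively; this uniform blend is a toy version.
import torch

def interpolate_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Elementwise blend of two compatible checkpoints: alpha*A + (1-alpha)*B."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}
```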

N-Back Training: A Secret Weapon for Boosting Fluid Intelligence?

2025-07-05

Decades of cognitive neuroscience research have examined the N-Back task. Jaeggi et al. (2008) published influential research in PNAS reporting that dual N-Back training significantly improves fluid intelligence, with 19 days of training leading to improved intelligence test scores. However, a large-scale study by Owen et al. (2010) with over 11,000 participants found that working memory training produced task-specific improvements with little evidence of transfer to untrained cognitive abilities. Klingberg (2010) demonstrated that working memory training, including N-Back exercises, produces measurable changes in brain activity and can be particularly beneficial for individuals with ADHD.

Rent-a-Brain: The World's First Commercial Hybrid of Silicon and Human Brain Cells

2025-07-04

Cortical Labs, an Australian biotech startup, in collaboration with UK company bit.bio, has launched CL1, the world's first commercially available hybrid computer combining silicon circuitry and human brain cells. This groundbreaking system, built from 800,000 neurons grown on a silicon chip, boasts incredibly low energy consumption, significantly outperforming comparable AI in terms of efficiency. CL1 demonstrated superior performance in game-playing tests compared to machine learning algorithms and offers potential applications in drug testing. Units are available for $35,000, or remote access can be rented for $300 per week.

Google AI Product Usage Survey Embedded Multiple Times

2025-07-04

A blog post contains multiple embedded instances of the same Google AI product usage survey. The survey aims to understand how frequently users utilize Google AI tools like Gemini and NotebookLM, and also gathers feedback on article improvements. The survey includes a question about usage frequency (daily, weekly, monthly, hardly ever, unsure) and an open-ended question asking for suggestions on improving the article (make it more concise, add more detail, make it easier to understand, include more images or videos, it's fine as is).

Context Engineering Strategies for Large Language Model Agents

2025-07-04

As large language model (LLM) agents gain traction, context engineering emerges as a crucial aspect of building efficient agents. This post summarizes four key context engineering strategies: writing (saving context outside the context window, such as using scratchpads or memories), selecting (choosing relevant context from external storage), compressing (summarizing or trimming context), and isolating (splitting context across multiple agents or environments). These strategies aim to address the limitations of LLM context windows, improve agent performance, and reduce costs. The post uses examples from companies like Anthropic and Cognition to detail the specific methods and challenges of each strategy, including memory selection, context summarization, and multi-agent coordination.
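
Three of the four strategies (write, select, compress) compose naturally into one small component, sketched below. The summarize() call is a placeholder for an LLM, and the word-overlap selection is a deliberately naive stand-in for real retrieval.

```python
# Toy sketch of write / select / compress for an agent's context.
class ContextManager:
    def __init__(self, max_tokens: int, summarize):
        self.scratchpad: list[str] = []               # "write": memory outside the window
        self.max_tokens = max_tokens
        self.summarize = summarize                    # placeholder for an LLM call

    def write(self, note: str) -> None:
        self.scratchpad.append(note)

    def select(self, query: str, k: int = 3) -> list[str]:
        """Pick the k notes sharing the most words with the query."""
        overlap = lambda n: len(set(n.lower().split()) & set(query.lower().split()))
        return sorted(self.scratchpad, key=overlap, reverse=True)[:k]

    def build_context(self, query: str) -> str:
        ctx = "\n".join(self.select(query))
        if len(ctx.split()) > self.max_tokens:        # "compress" when too large
            ctx = self.summarize(ctx)
        return ctx
```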

Edge AI Inference: A Deep Dive from Software to Hardware Acceleration

2025-07-04

This article delves into the challenges and opportunities of running AI inference on resource-constrained microcontrollers. Starting with the mechanics of TensorFlow Lite Micro, the author uses the addition operator as a case study to analyze both its software implementation and hardware acceleration schemes based on Arm architecture extensions. The article also covers using Arm's Ethos-U NPU for model acceleration, revealing how different hardware architectures affect AI inference performance and how software and hardware optimizations can be combined to improve efficiency.
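
The microcontroller-side interpreter discussed in the article consumes a fully int8-quantized .tflite model. The standard TensorFlow conversion step that produces one is sketched below; the saved-model path and calibration data are placeholders.

```python
# Standard full-integer quantization export for TFLite Micro targets.
import tensorflow as tf
import numpy as np

def representative_data():
    for _ in range(100):                 # calibration samples for quantization
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
open("model_int8.tflite", "wb").write(converter.convert())
```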
