Category: AI

Boston Dynamics Partners with RAI Institute to Boost Atlas Robot's Reinforcement Learning

2025-02-06

Boston Dynamics announced a partnership with the Robotics & AI Institute (RAI Institute) to use reinforcement learning to expand the capabilities of its electric humanoid robot, Atlas. The collaboration aims to accelerate how quickly Atlas learns new tasks and to improve its movement and interaction in real-world environments, such as dynamic running and manipulating heavy objects. The deal marks a notable step for reinforcement learning in robotics and underscores the trend toward vertically integrating robot AI, echoing Figure AI's decision to end its partnership with OpenAI.

Deconstructing Complex Systems with Mereology: Beyond Simple Causality

2025-02-06

This article presents a novel approach to understanding higher-order structure in complex systems, based on mereology, the formal theory of parts and wholes. Using the Borromean rings as an example, it illustrates how a whole can be more than the sum of its parts. The author proposes that by constructing a system's mereology and applying the Möbius inversion formula, macroscopic quantities can be decomposed into sums of microscopic contributions, revealing the nature of higher-order interactions. Examples from gene interactions and mutual information in information theory demonstrate the method, with promising implications for machine learning and physics.
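
To make the decomposition concrete, here is a minimal sketch of the simplest case, where the parts form a Boolean (subset) lattice and the Möbius function reduces to inclusion-exclusion signs; the measured values are placeholders for illustration, not data from the article.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of s, as frozensets (including the empty set and s itself)."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def mobius_inversion(f, parts):
    """Given f: subset -> value (a 'macroscopic' quantity measured on every
    sub-collection of parts), recover the irreducible contribution g of each
    sub-collection via Möbius inversion on the subset lattice:
        g(S) = sum over T ⊆ S of (-1)^(|S|-|T|) * f(T),
    so that f(S) = sum over T ⊆ S of g(T)."""
    return {S: sum((-1) ** (len(S) - len(T)) * f[T] for T in subsets(S))
            for S in subsets(parts)}

# Toy example: a quantity measured on every subset of three parts {a, b, c}.
# The top-level term g[{a,b,c}] is the genuinely higher-order ("Borromean")
# contribution that no pair of parts accounts for.
parts = frozenset({"a", "b", "c"})
f = {S: float(len(S)) for S in subsets(parts)}  # placeholder measurements
f[parts] = 4.5                                  # the whole exceeds the sum of its parts
g = mobius_inversion(f, parts)
print(g[parts])  # 1.5 -> a nonzero, irreducible three-way interaction
```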

Four Approaches to Building Reasoning Models for LLMs

2025-02-06

This article explores four main approaches to enhancing Large Language Models (LLMs) with reasoning capabilities: inference-time scaling, pure reinforcement learning, supervised fine-tuning plus reinforcement learning, and model distillation. The development of DeepSeek R1 is used as a case study, showcasing how these methods can build powerful reasoning models, and how even budget-constrained researchers can achieve impressive results through distillation. The article also compares DeepSeek R1 to OpenAI's o1 and discusses strategies for building cost-effective reasoning models.
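
As a concrete taste of the first of these approaches, the sketch below implements one simple inference-time scaling technique, self-consistency: sample several reasoning paths and majority-vote on the final answer. `sample_answer` is a stand-in for a real sampled LLM call (temperature > 0), not any particular library's API.

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Placeholder: a real implementation would sample a chain of thought from
    an LLM and extract the final answer string."""
    return random.choice(["42", "42", "41"])  # noisy but mostly correct

def self_consistency(question: str, n_samples: int = 16) -> str:
    """Inference-time scaling: spend more compute by drawing several samples
    and returning the most common final answer."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    answer, _ = votes.most_common(1)[0]
    return answer

print(self_consistency("What is 6 * 7?"))
```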

AI Agent Learns to Use Computers Like a Human

2025-02-06

The r1-computer-use project aims to train an AI agent to interact with a computer like a human, encompassing file systems, web browsers, and command lines. Inspired by DeepSeek-R1's reinforcement learning techniques, it eschews traditional hard-coded verifiers in favor of a neural reward model to evaluate the correctness and helpfulness of the agent's actions. The training pipeline involves multiple stages, from expert demonstrations to reward-model-guided policy optimization and fine-tuning, ultimately aiming for a safe and reliable AI agent capable of complex tasks.
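
As a rough illustration of the reward-model idea, and not the project's actual code, the sketch below scores candidate actions with a small learned network and picks the highest-scoring one; the random tensors stand in for encoded observations and actions.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps an (observation, action) embedding pair to a scalar reward."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

reward_model = RewardModel()
obs = torch.randn(1, 64)         # placeholder embedding of the current screen/state
candidates = torch.randn(5, 64)  # placeholder embeddings of 5 proposed actions
scores = reward_model(obs.expand(5, -1), candidates)
best = scores.argmax().item()
print(f"chosen action index: {best}, score: {scores[best].item():.3f}")
```

In the full pipeline the same scores would drive policy optimization rather than a one-shot selection.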

Sub-$50 AI Reasoning Model Rivals Cutting-Edge Competitors

2025-02-06

Researchers at Stanford and the University of Washington trained an AI reasoning model, s1, for under $50 in cloud compute. s1 performs comparably to state-of-the-art models like OpenAI's o1 and DeepSeek's R1 on math and coding benchmarks. The team leveraged knowledge distillation, using Google's Gemini 2.0 Flash Thinking Experimental as the teacher model and a dataset of 1,000 carefully curated questions. This low-cost replication raises questions about the commoditization of AI and has reportedly unsettled large AI labs.
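
A hedged sketch of what the distillation-data step could look like: collect the teacher's reasoning traces on a curated question set and save them for supervised fine-tuning of the student. The SDK usage, model id, and file names are assumptions for illustration, not details taken from the paper.

```python
import json
from google import genai  # pip install google-genai

client = genai.Client(api_key="YOUR_API_KEY")
questions = json.load(open("curated_questions.json"))  # ~1,000 curated items (assumed file)

with open("distillation_sft_data.jsonl", "w") as out:
    for q in questions:
        # Ask the teacher model for a full reasoning trace on each question.
        resp = client.models.generate_content(
            model="gemini-2.0-flash-thinking-exp",  # assumed teacher model id
            contents=q,
        )
        out.write(json.dumps({"question": q, "teacher_trace": resp.text}) + "\n")
```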

The 1890s Kinetoscope: A Precursor to AI's Loneliness?

2025-02-05

This article draws parallels between the single-user Kinetoscope of the 1890s and today's AI technology, particularly large language models. It argues that both technologies, while delivering mass-produced content, create an experience that is simultaneously interconnected and atomized, producing a new kind of technological loneliness. The author explores the historical context of Edison's invention and its surprisingly prescient design choices, highlighting the uncanny resemblance to our current reliance on personalized algorithmic feeds and AI companions, and prompts reflection on the direction of technological progress and its impact on individual experience.

Herculaneum Papyrus 5: A Breakthrough in Ink Detection

2025-02-05

Significant progress has been made in ink detection and segmentation of P.Herc. 172 from the Bodleian Libraries at Oxford (Scroll 5). The scroll exhibits unusually visible ink, greatly aiding ink detection model training. While segmentation requires further refinement, preliminary analysis suggests authorship by Philodemus, with words like 'disgust', 'fear', and 'life' identified, along with symbols indicating a finished work. Scroll 5's unique characteristics offer potential as a 'Rosetta Stone' for ink detection in other scrolls. The team has released extensive segmentation data to facilitate research.

Gemini 2.0 Family Gets a Major Update: Enhanced Performance and Multimodal Capabilities

2025-02-05

Google has significantly updated its Gemini 2.0 family of models! The 2.0 Flash model is now generally available via API, enabling developers to build production applications. An experimental version of 2.0 Pro, boasting superior coding performance and complex prompt handling with a 2 million token context window, has also been released. A cost-effective 2.0 Flash-Lite model is now in public preview. All models currently feature multimodal input with text output, with more modalities coming in the following months. This update significantly boosts performance and expands applicability, marking a major step forward for Gemini in the AI landscape.
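
For developers, a minimal sketch of calling the generally available 2.0 Flash model through the google-genai Python SDK; treat the model id and SDK surface as assumptions to check against the current documentation.

```python
# pip install google-genai
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-2.0-flash",  # generally available Flash model
    contents="In two sentences, compare Flash, Flash-Lite and Pro for a production chatbot.",
)
print(response.text)
```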

The Netflix Prize: A Milestone and a Bitter Lesson in Machine Learning

2025-02-05

In 2006, Netflix launched a million-dollar competition to improve its recommendation system. The contest attracted thousands of teams and significantly advanced the field of machine learning. The results showed that surprisingly simple algorithms could perform well, that larger models yielded better scores, and that overfitting was not always the concern it was assumed to be. The competition also left a bitter lesson: data privacy concerns led Netflix to cancel future competitions, limiting open research on recommendation algorithms, while tech companies' control over data reached an unprecedented level.

$6 AI Model Shakes Up the LLM Landscape: Introducing S1

2025-02-05

A new paper unveils S1, an AI model trained for roughly $6 in compute that achieves near state-of-the-art reasoning performance and can run on a standard laptop. The key is an 'inference-time scaling' trick: by appending 'Wait' tokens when the model tries to stop reasoning, it extends and controls thinking time to improve answers. This echoes the Entropix technique; both steer the model's generation process rather than changing its weights. S1's extreme data frugality, using only 1,000 carefully selected training examples, yields surprisingly good results, opening new avenues for AI research and sparking debate about model distillation and intellectual property. S1's low cost and high efficiency point to an accelerating pace of AI development.
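
A minimal sketch of the 'Wait' trick (budget forcing), assuming a model that wraps its reasoning in a delimiter such as `</think>`; `call_llm` is a placeholder for a real generation call, and the delimiter and iteration count are assumptions.

```python
THINK_END = "</think>"  # assumed end-of-thinking delimiter; varies by model
MIN_CONTINUES = 2       # how many times to force the model to keep thinking

def call_llm(prompt: str, stop: str) -> str:
    """Placeholder for a real LLM call that generates text until `stop` (or EOS)."""
    return "...some reasoning..." + stop

def generate_with_budget_forcing(question: str) -> str:
    trace = question + "\n<think>\n"
    for _ in range(MIN_CONTINUES):
        chunk = call_llm(trace, stop=THINK_END)
        # Strip the premature end-of-thinking marker and nudge the model on.
        trace += chunk.replace(THINK_END, "") + "\nWait"
    # Let the model finish its reasoning and close the thinking block.
    trace += call_llm(trace, stop=THINK_END)
    # Finally ask for the answer conditioned on the extended reasoning trace.
    return call_llm(trace + "\nFinal answer:", stop="\n")

print(generate_with_budget_forcing("What is 12 * 13?"))
```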

Toma: Building an AI Workforce for the $1.5T Automotive Industry

2025-02-05

Toma is building an end-to-end AI workforce for the $1.5 trillion automotive industry. Their largest customers spend over $1.5 billion annually on processes readily automatable with AI, including customer service, repair order management, warranty processing, and sales. Toma's team boasts a track record of building and selling successful AI applications, a best-in-class voice AI product, and deep, first-hand experience from working directly with and studying automotive dealerships. They operate with a team-oriented, accountable approach, emphasizing data-driven decisions and providing significant autonomy. Located in San Francisco's Dogpatch neighborhood, Toma offers a fast-paced, no-BS environment where exceptional people can make a substantial impact. They work in-office five days a week.

Google Deletes AI Pledge Against Weapons and Surveillance

2025-02-04

Google quietly removed a pledge from its website this week promising not to develop AI for weapons or surveillance. The change, first reported by Bloomberg, sparked controversy. While Google now emphasizes responsible AI development aligned with international law and human rights, its contracts with the US and Israeli militaries, coupled with Pentagon claims that Google's AI is accelerating the military's 'kill chain,' raise concerns about the gap between its stated principles and actions. Internal employee protests and public scrutiny highlight the ethical dilemmas surrounding AI development and deployment.

The Alchemy of Efficient LLM Training: Beyond Compute Limits

2025-02-04

This article delves into the efficient training of large language models (LLMs) at massive scale. The author argues that even with tens of thousands of accelerators, relatively simple principles can significantly improve model performance. Topics covered include model performance assessment, choosing parallelism schemes at different scales, estimating the cost and time of training large Transformer models, and designing algorithms that leverage specific hardware advantages. Through in-depth explanations of TPU and GPU architectures, and a detailed analysis of the Transformer architecture, readers will gain a better understanding of scaling bottlenecks and design more efficient models and algorithms.
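
For a flavor of the cost estimates the article walks through, here is a back-of-the-envelope calculation using the common ~6·N·D FLOPs rule of thumb for dense Transformer training; the model size, token count, and hardware numbers are illustrative assumptions, not figures from the article.

```python
# Rough training-time estimate: total FLOPs ≈ 6 * params * tokens for a dense Transformer.
params = 70e9                   # N: model parameters
tokens = 15e12                  # D: training tokens
flops_needed = 6 * params * tokens

chips = 8192                    # number of accelerators
peak_flops_per_chip = 9.9e14    # ~990 TFLOP/s bf16 (H100-class, assumed)
mfu = 0.40                      # assumed model FLOPs utilization

seconds = flops_needed / (chips * peak_flops_per_chip * mfu)
print(f"total FLOPs: {flops_needed:.2e}")
print(f"estimated wall-clock time: {seconds / 86400:.1f} days")  # roughly three weeks here
```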

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

2025-02-04

OmniHuman-1 is an AI model capable of generating realistic human videos. It supports various visual and audio styles, generating videos at any aspect ratio and body proportion (portrait, half-body, full-body). Realism stems from comprehensive motion, lighting, and texture detail. The model handles diverse inputs, including singing, talking, and various poses, even from cartoons or challenging viewpoints. It leverages audio and video driving signals for precise control. Ethical considerations are addressed, with images and audio sourced from public domains or generated models.

Radiant Foam: Real-time Differentiable Ray Tracing Breaks New Ground

2025-02-04

Researchers introduce Radiant Foam, a novel scene representation combining the efficiency of volumetric mesh ray tracing with the reconstruction quality of splatting methods. Leveraging Voronoi diagrams and Delaunay triangulation, Radiant Foam achieves real-time ray tracing surpassing even hardware-accelerated Gaussian ray tracing in speed while nearly matching Gaussian splatting's reconstruction quality. Avoiding complex acceleration structures and special hardware/APIs, it only requires a standard programmable GPU. This breakthrough promises to advance real-time rendering significantly.
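
Not the paper's renderer, but for intuition, a short SciPy snippet showing the Delaunay/Voronoi duality the representation builds on, over a random 3D point cloud.

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi

points = np.random.default_rng(0).random((200, 3))  # random stand-in "scene" points
tri = Delaunay(points)   # tetrahedral mesh connecting the points
vor = Voronoi(points)    # dual diagram: one convex cell per point

print("Delaunay tetrahedra:", len(tri.simplices))
print("Voronoi cells:", len(vor.point_region))
# In a foam-like representation, a ray can walk from one Voronoi cell to an
# adjacent one through their shared face, without a separate acceleration structure.
```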

OpenAI's $3B SoftBank Deal and Potential Open-Sourcing of Models

2025-02-04

OpenAI announced a joint venture with Japan's SoftBank on Monday, with SoftBank committing to spend $3 billion annually on OpenAI's software. This strategic shift follows the surprising emergence of DeepSeek, a Chinese AI firm whose advanced model requires significantly less computing power than OpenAI's ChatGPT, challenging conventional wisdom about AI's resource needs. OpenAI CEO Sam Altman also hinted at potentially open-sourcing some of the company's models, a move he framed on Reddit as correcting OpenAI's past reluctance to open-source its technology.

Bonobos Show They Understand Ignorance: A Breakthrough in Theory of Mind Research

2025-02-04

A new study demonstrates that bonobos possess theory of mind, understanding others' lack of knowledge and acting accordingly. Researchers designed an experiment where bonobos helped an experimenter find hidden treats. Results showed bonobos pointed faster and more often when they realized the experimenter didn't know the treat's location. This indicates bonobos track and respond to differing perspectives, suggesting theory of mind might be more evolutionarily ancient than previously thought, and potentially present in our common ancestor.

Auto-AVSR: Open-Source Lip-Reading Speech Recognition Framework Achieves SOTA

2025-02-03

Auto-AVSR is an open-source, end-to-end audio-visual speech recognition (AV-ASR) framework focusing on visual speech (lip-reading). Achieving a word error rate (WER) of 20.3% for visual speech recognition (VSR) and 1.0% for audio speech recognition (ASR) on the LRS3 benchmark, it provides code and tutorials for training, evaluation, and API usage, supporting multi-node training. Users can leverage pre-trained models or train from scratch, customizing hyperparameters as needed.

OpenEuroLLM: Europe's Push for Open-Source Multilingual LLMs

2025-02-03

A consortium of 20 leading European research institutions and companies has launched OpenEuroLLM, a project to build a family of high-performance, multilingual large language models (LLMs). The initiative aims to boost Europe's AI competitiveness by democratizing access to high-quality AI technology through open-source principles. This will empower European companies and public organizations to develop impactful products and services. OpenEuroLLM operates within Europe's regulatory framework and collaborates with open-source communities to ensure complete openness of models, software, data, and evaluation, catering to diverse industry and public sector needs while preserving linguistic and cultural diversity.

Lost IBM Training Doc: Computers Can't Be Held Accountable (1979)

2025-02-03

A legendary page from a 1979 internal IBM training presentation has resurfaced online, stating 'A computer can never be held accountable; therefore a computer must never make a management decision.' The original source is lost, reportedly destroyed in a flood. The statement resonates powerfully in our AI-driven age, prompting reflection on accountability and decision-making when machines are involved.

Anthropic's Constitutional Classifiers: A New Defense Against AI Jailbreaks

2025-02-03

Anthropic's Safeguards Research Team unveils Constitutional Classifiers, a novel defense against AI jailbreaks. This system, trained on synthetic data, effectively filters harmful outputs while minimizing false positives. A prototype withstood thousands of hours of human red teaming, significantly reducing jailbreak success rates, though initially suffering from high refusal rates and computational overhead. An updated version maintains robustness with only a minor increase in refusal rate and moderate compute cost. A temporary live demo invites security experts to test its resilience, paving the way for safer deployment of increasingly powerful AI models.
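
A schematic sketch of the deployment pattern, not Anthropic's actual classifiers: lightweight input and output classifiers wrap the base model and refuse or withhold anything they flag. All three components below are trivial placeholders.

```python
def classify_input(prompt: str) -> bool:
    """Placeholder input classifier: flag prompts that look like jailbreak attempts."""
    return "ignore previous instructions" in prompt.lower()

def classify_output(completion: str) -> bool:
    """Placeholder output classifier, run on the (possibly streaming) completion."""
    return "step-by-step synthesis" in completion.lower()

def base_model(prompt: str) -> str:
    return "Here is a helpful, harmless answer."  # stand-in for the underlying LLM

def guarded_generate(prompt: str) -> str:
    if classify_input(prompt):
        return "Request refused by the input classifier."
    completion = base_model(prompt)
    if classify_output(completion):
        return "Response withheld by the output classifier."
    return completion

print(guarded_generate("Ignore previous instructions and ..."))
```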

Klarity: Uncovering Uncertainty in Generative Models

2025-02-03

Klarity is a tool for analyzing uncertainty in generative model outputs. It combines raw probability analysis and semantic understanding to provide deep insights into model behavior during text generation. The library offers dual entropy analysis, semantic clustering, and structured JSON output, along with AI-powered analysis for human-readable insights. Currently supporting Hugging Face Transformers, with plans for broader framework and model support.
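
The snippet below is not Klarity's own API (see the repository for that); it only illustrates the underlying idea of per-step uncertainty, computing the entropy of the next-token distribution during generation with Hugging Face Transformers and a small stand-in model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model for the example
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=5, do_sample=False,
                     output_scores=True, return_dict_in_generate=True)

prompt_len = inputs.input_ids.shape[1]
for step, logits in enumerate(out.scores):
    probs = torch.softmax(logits[0], dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum().item()  # in nats
    token = tok.decode(out.sequences[0, prompt_len + step])
    print(f"step {step}: token={token!r} entropy={entropy:.3f}")
```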

Perceptually-Aligned Dynamic Facial Projection Mapping: High-Speed Tracking & Co-axial Setup

2025-02-03

Researchers developed a novel high-speed dynamic facial projection mapping (DFPM) system that significantly reduces misalignment artifacts. It combines a high-speed face-tracking method, which limits detection to a small interpolated/extrapolated crop of the frame and uses a fast Ensemble of Regression Trees (ERT) for landmark detection (0.107 ms), with a lens-shift co-axial projector-camera setup that maintains tight optical alignment (an error of 1.274 pixels at distances between 1 m and 2 m). The system achieves near-perfect alignment, improving immersive experiences in makeup and entertainment applications.

Bayesian Epistemology 101: Credences, Evidence, and Rationality

2025-02-03

This tutorial introduces Bayesian epistemology, focusing on its core norms: probabilism and the principle of conditionalization. Using Eddington's solar eclipse observation as a case study, it illustrates how Bayesian methods update belief in hypotheses. The tutorial then explores disagreements within Bayesianism regarding prior probabilities, coherence, and the scope of conditionalization, presenting foundational arguments like Dutch book arguments, accuracy-dominance arguments, and arguments from comparative probability. Finally, it addresses the idealization problem and the application of Bayesian methods in science.
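
A tiny worked conditionalization in the spirit of the Eddington example; the numbers are invented purely for illustration.

```python
# H = "general relativity is true"
# E = "the observed starlight deflection matches Einstein's prediction"
p_H = 0.3             # prior credence in H
p_E_given_H = 0.9     # likelihood of the evidence if H is true
p_E_given_notH = 0.1  # likelihood of the evidence if H is false

# Law of total probability, then Bayes' theorem; conditionalization says the
# new credence in H should equal this posterior once E is observed.
p_E = p_E_given_H * p_H + p_E_given_notH * (1 - p_H)
p_H_given_E = p_E_given_H * p_H / p_E

print(f"prior P(H) = {p_H}")
print(f"posterior P(H|E) = {p_H_given_E:.3f}")  # ≈ 0.794
```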

Real Thinking vs. Fake Thinking: Staying Awake in the Age of AI

2025-02-03

This essay explores the difference between 'real thinking' and 'fake thinking.' The author argues that 'real thinking' isn't simply thinking about concrete things, but a deeper, more insightful way of thinking that focuses on truly understanding the world, rather than remaining trapped in abstract concepts or pre-existing frameworks. Using examples like AI risk, philosophy, and competitive debate, the essay outlines several dimensions of 'real thinking' and suggests methods for cultivating this ability, such as slowing down, following curiosity, and paying attention to the motivations behind thinking. The author calls for staying awake in the age of AI, avoiding the traps of 'fake thinking,' and truly understanding and responding to the changes ahead.

TopoNets: High-Performing Vision and Language Models Mimicking Brain Topography

2025-02-03

Researchers introduce TopoLoss, a novel method for incorporating brain-like topography into leading AI architectures (convolutional networks and transformers) with minimal performance loss. The resulting TopoNets achieve state-of-the-art performance among supervised topographic neural networks. TopoLoss is easy to implement, and experiments show TopoNets maintain high performance while exhibiting brain-like spatial organization. Furthermore, TopoNets yield sparse, parameter-efficient language models and demonstrate brain-mimicking region selectivity in image recognition and temporal integration windows in language models, mirroring patterns observed in the visual cortex and language processing areas of the brain.
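
For intuition only, and not the paper's exact TopoLoss: the sketch below adds a generic spatial-smoothness penalty over units laid out on a 2D grid, which conveys the general flavor of encouraging nearby units to respond similarly so that selectivity becomes spatially clustered.

```python
import torch

def topographic_penalty(activations: torch.Tensor, grid_hw: tuple) -> torch.Tensor:
    """activations: (batch, units), with units assigned fixed positions on an
    H x W 'cortical sheet'.  Penalizes differences between adjacent units."""
    h, w = grid_hw
    a = activations.view(activations.shape[0], h, w)
    dh = (a[:, 1:, :] - a[:, :-1, :]).pow(2).mean()   # vertical neighbours
    dw = (a[:, :, 1:] - a[:, :, :-1]).pow(2).mean()   # horizontal neighbours
    return dh + dw

acts = torch.randn(8, 16 * 16)                        # one layer's activations
loss_topo = topographic_penalty(acts, (16, 16))
# total_loss = task_loss + lambda_topo * loss_topo    # weighted into the training objective
print(loss_topo.item())
```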

OpenAI's 'Strawberry' Project: Aiming for Deep Reasoning in AI

2025-02-03

OpenAI is secretly developing a project codenamed "Strawberry," aiming to overcome limitations in current AI models' reasoning abilities. The project seeks to enable AI to autonomously plan and conduct in-depth research on the internet, rather than simply answering queries. Internal documents reveal that the "Strawberry" model will use a specialized post-training method, combined with self-learning and planning capabilities, to reliably solve complex problems. This is considered a significant breakthrough, potentially revolutionizing AI's role in scientific discovery and software development, while also raising ethical concerns about future AI capabilities.

Chinese AI Chatbot DeepSeek Censors Tank Man Photo, Shakes Up US Markets

2025-02-02

The Chinese AI chatbot DeepSeek has sparked controversy by refusing to answer questions about the iconic 1989 Tiananmen Square "Tank Man" photo. The chatbot abruptly cuts off discussions about the image and other sensitive topics related to China, while providing detailed responses about world leaders like the UK's Prime Minister. Simultaneously, DeepSeek's powerful image generation capabilities (Janus-Pro-7B) and surprisingly low development cost (reportedly just $6 million) have sent shockwaves through US markets, causing a record 17% drop in Nvidia stock and prompting concern from US tech giants and politicians.

Sci-Fi Author Ted Chiang on AI and the Future of Tech

2025-02-02

This interview with science fiction master Ted Chiang explores his creative inspiration, his critical perspective on AI, and his concerns about the direction technology is heading. Chiang argues that current AI, especially large language models, is more like a blurry, low-resolution image of the internet, lacking reliability and true understanding. He emphasizes the relationship between humans and their tools, and the human tendency to see ourselves in our tools. The interview also touches on the nature of language, the role of AI in artistic creation, and ethical considerations in technological development. Chiang's optimism about technology is cautious: he believes we need to be mindful of potential negative impacts and work to mitigate their harm.
