Category: AI

The Alchemy of Efficient LLM Training: Beyond Compute Limits

2025-02-04

This article delves into the efficient training of large language models (LLMs) at massive scale. The author argues that even with tens of thousands of accelerators, relatively simple principles can significantly improve model performance. Topics covered include model performance assessment, choosing parallelism schemes at different scales, estimating the cost and time of training large Transformer models, and designing algorithms that leverage specific hardware advantages. Through in-depth explanations of TPU and GPU architectures, and a detailed analysis of the Transformer architecture, readers will gain a better understanding of scaling bottlenecks and learn to design more efficient models and algorithms.
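As a sketch of the cost-and-time estimation the article covers, the widely used approximation C ≈ 6·N·D total training FLOPs (N parameters, D training tokens) can be combined with cluster size and model FLOPs utilization (MFU) to give a wall-clock estimate. The chip count, per-chip peak throughput, and MFU below are illustrative assumptions, not figures from the article:

```python
def training_flops(params: float, tokens: float) -> float:
    """Total training compute via the common C ~= 6 * N * D approximation."""
    return 6.0 * params * tokens

def training_days(total_flops: float, num_chips: int,
                  peak_flops_per_chip: float, mfu: float) -> float:
    """Wall-clock days given cluster size, per-chip peak FLOP/s, and MFU."""
    effective_flops_per_sec = num_chips * peak_flops_per_chip * mfu
    return total_flops / effective_flops_per_sec / 86_400  # seconds per day

# Illustrative: a 70B-parameter model on 15T tokens, 16,384 accelerators
# at 989 TFLOP/s peak each, 40% model FLOPs utilization.
c = training_flops(70e9, 15e12)                 # 6.3e24 FLOPs
days = training_days(c, 16_384, 989e12, 0.40)   # roughly 11 days
```

At realistic MFU (often 30-50%), such estimates are usually within a small factor of observed training time, which is why this rule of thumb is a standard first step in planning a run.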

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

2025-02-04

OmniHuman-1 is an AI model capable of generating realistic human videos. It supports various visual and audio styles, generating videos at any aspect ratio and body proportion (portrait, half-body, full-body). Realism stems from comprehensive motion, lighting, and texture detail. The model handles diverse inputs, including singing, talking, and various poses, even from cartoons or challenging viewpoints. It leverages audio and video driving signals for precise control. Ethical considerations are addressed, with images and audio sourced from public domains or generated models.

Radiant Foam: Real-time Differentiable Ray Tracing Breaks New Ground

2025-02-04

Researchers introduce Radiant Foam, a novel scene representation combining the efficiency of volumetric mesh ray tracing with the reconstruction quality of splatting methods. Leveraging Voronoi diagrams and Delaunay triangulation, Radiant Foam achieves real-time ray tracing surpassing even hardware-accelerated Gaussian ray tracing in speed while nearly matching Gaussian splatting's reconstruction quality. Avoiding complex acceleration structures and special hardware/APIs, it only requires a standard programmable GPU. This breakthrough promises to advance real-time rendering significantly.

OpenAI's $3B SoftBank Deal and Potential Open-Sourcing of Models

2025-02-04

OpenAI announced a joint venture with Japan's SoftBank on Monday, involving a $3 billion annual investment from SoftBank to utilize OpenAI's software. This strategic shift follows the surprising emergence of DeepSeek, a Chinese AI firm whose advanced model requires significantly less computing power than OpenAI's ChatGPT, challenging conventional wisdom on AI's resource needs. OpenAI CEO Sam Altman hinted at potentially open-sourcing their models, a move he suggested on Reddit was a correction of OpenAI's past mistake of keeping its source code private.

Bonobos Show They Understand Ignorance: A Breakthrough in Theory of Mind Research

2025-02-04

A new study demonstrates that bonobos possess theory of mind, understanding others' lack of knowledge and acting accordingly. Researchers designed an experiment where bonobos helped an experimenter find hidden treats. Results showed bonobos pointed faster and more often when they realized the experimenter didn't know the treat's location. This indicates bonobos track and respond to differing perspectives, suggesting theory of mind might be more evolutionarily ancient than previously thought, and potentially present in our common ancestor.

Auto-AVSR: Open-Source Lip-Reading Speech Recognition Framework Achieves SOTA

2025-02-03

Auto-AVSR is an open-source, end-to-end audio-visual speech recognition (AV-ASR) framework focusing on visual speech (lip-reading). Achieving a word error rate (WER) of 20.3% for visual speech recognition (VSR) and 1.0% for audio speech recognition (ASR) on the LRS3 benchmark, it provides code and tutorials for training, evaluation, and API usage, supporting multi-node training. Users can leverage pre-trained models or train from scratch, customizing hyperparameters as needed.
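Since the summary quotes word error rate (WER), here is a minimal, self-contained sketch of how that metric is computed: word-level edit distance (insertions, deletions, substitutions) divided by the number of reference words.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# 1 deletion against 6 reference words -> ~0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A 20.3% VSR WER thus means roughly one word in five is wrong relative to the transcript, against 1.0% for the audio-only model.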

OpenEuroLLM: Europe's Push for Open-Source Multilingual LLMs

2025-02-03

A consortium of 20 leading European research institutions and companies has launched OpenEuroLLM, a project to build a family of high-performance, multilingual large language models (LLMs). The initiative aims to boost Europe's AI competitiveness by democratizing access to high-quality AI technology through open-source principles. This will empower European companies and public organizations to develop impactful products and services. OpenEuroLLM operates within Europe's regulatory framework and collaborates with open-source communities to ensure complete openness of models, software, data, and evaluation, catering to diverse industry and public sector needs while preserving linguistic and cultural diversity.

Lost IBM Training Doc: Computers Can't Be Held Accountable (1979)

2025-02-03

A legendary page from a 1979 internal IBM training resurfaced online, stating 'A computer can never be held accountable; therefore a computer must never make a management decision.' The original source is lost, reportedly destroyed in a flood. This statement resonates powerfully in our AI-driven age, prompting reflection on AI responsibility and decision-making.

Anthropic's Constitutional Classifiers: A New Defense Against AI Jailbreaks

2025-02-03

Anthropic's Safeguards Research Team unveils Constitutional Classifiers, a novel defense against AI jailbreaks. This system, trained on synthetic data, effectively filters harmful outputs while minimizing false positives. A prototype withstood thousands of hours of human red teaming, significantly reducing jailbreak success rates, though initially suffering from high refusal rates and computational overhead. An updated version maintains robustness with only a minor increase in refusal rate and moderate compute cost. A temporary live demo invites security experts to test its resilience, paving the way for safer deployment of increasingly powerful AI models.

Klarity: Uncovering Uncertainty in Generative Models

2025-02-03

Klarity is a tool for analyzing uncertainty in generative model outputs. It combines raw probability analysis and semantic understanding to provide deep insights into model behavior during text generation. The library offers dual entropy analysis, semantic clustering, and structured JSON output, along with AI-powered analysis for human-readable insights. Currently supporting Hugging Face Transformers, with plans for broader framework and model support.
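Klarity's own API isn't reproduced here; as a generic illustration of the raw-probability side of such analysis, the Shannon entropy of a model's next-token distribution can be computed directly from logits (pure-Python sketch):

```python
import math

def token_entropy(logits):
    """Shannon entropy (nats) of a next-token distribution given raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

# A peaked distribution (model is confident) vs. a flat one (model is unsure).
print(token_entropy([10.0, 0.0, 0.0]))  # near 0
print(token_entropy([1.0, 1.0, 1.0]))   # ln(3) ~= 1.0986
```

High per-token entropy flags positions where generation is uncertain, which is the kind of signal a tool like Klarity aggregates and clusters.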

Perceptually-Aligned Dynamic Facial Projection Mapping: High-Speed Tracking & Co-axial Setup

2025-02-03

Researchers developed a novel high-speed dynamic facial projection mapping (DFPM) system that significantly reduces misalignment artifacts. High-speed face tracking combines interpolation/extrapolation-based face detection restricted to a cropped area with a fast Ensemble of Regression Trees (ERT) landmark detector (0.107 ms). A lens-shift co-axial projector-camera setup maintains high optical alignment with minimal error (1.274 pixels between 1 m and 2 m). The system achieves near-perfect alignment, improving immersive experiences in makeup and entertainment applications.

Bayesian Epistemology 101: Credences, Evidence, and Rationality

2025-02-03

This tutorial introduces Bayesian epistemology, focusing on its core norms: probabilism and the principle of conditionalization. Using Eddington's solar eclipse observation as a case study, it illustrates how Bayesian methods update belief in hypotheses. The tutorial then explores disagreements within Bayesianism regarding prior probabilities, coherence, and the scope of conditionalization, presenting foundational arguments like Dutch book arguments, accuracy-dominance arguments, and arguments from comparative probability. Finally, it addresses the idealization problem and the application of Bayesian methods in science.

Real Thinking vs. Fake Thinking: Staying Awake in the Age of AI

2025-02-03

This essay explores the difference between 'real thinking' and 'fake thinking.' The author argues that 'real thinking' isn't simply thinking about concrete things, but a deeper, more insightful way of thinking that focuses on truly understanding the world, rather than remaining trapped in abstract concepts or pre-existing frameworks. Using examples like AI risk, philosophy, and competitive debate, the essay outlines several dimensions of 'real thinking' and suggests methods for cultivating this ability, such as slowing down, following curiosity, and paying attention to the motivations behind thinking. The author calls for staying awake in the age of AI, avoiding the traps of 'fake thinking,' and truly understanding and responding to the changes ahead.

TopoNets: High-Performing Vision and Language Models Mimicking Brain Topography

2025-02-03

Researchers introduce TopoLoss, a novel method for incorporating brain-like topography into leading AI architectures (convolutional networks and transformers) with minimal performance loss. The resulting TopoNets achieve state-of-the-art performance among supervised topographic neural networks. TopoLoss is easy to implement, and experiments show TopoNets maintain high performance while exhibiting brain-like spatial organization. Furthermore, TopoNets yield sparse, parameter-efficient language models and demonstrate brain-mimicking region selectivity in image recognition and temporal integration windows in language models, mirroring patterns observed in the visual cortex and language processing areas of the brain.

OpenAI's 'Strawberry' Project: Aiming for Deep Reasoning in AI

2025-02-03

OpenAI is secretly developing a project codenamed "Strawberry," aiming to overcome limitations in current AI models' reasoning abilities. The project seeks to enable AI to autonomously plan and conduct in-depth research on the internet, rather than simply answering queries. Internal documents reveal that the "Strawberry" model will use a specialized post-training method, combined with self-learning and planning capabilities, to reliably solve complex problems. This is considered a significant breakthrough, potentially revolutionizing AI's role in scientific discovery and software development, while also raising ethical concerns about future AI capabilities.

Chinese AI Chatbot DeepSeek Censors Tank Man Photo, Shakes Up US Markets

2025-02-02

The Chinese AI chatbot DeepSeek has sparked controversy by refusing to answer questions about the iconic 1989 Tiananmen Square "Tank Man" photo. The chatbot abruptly cuts off discussions about the image and other sensitive topics related to China, while providing detailed responses about world leaders like the UK's Prime Minister. Simultaneously, DeepSeek's powerful image generation capabilities (Janus-Pro-7B) and surprisingly low development cost (reportedly just $6 million) have sent shockwaves through US markets, causing a record 17% drop in Nvidia stock and prompting concern from US tech giants and politicians.

Sci-Fi Author Ted Chiang on AI and the Future of Tech

2025-02-02

This interview with science fiction master Ted Chiang explores his creative inspiration, his critical perspective on AI, and his concerns about the future direction of technology. Chiang argues that current AI systems, especially large language models, are more like low-resolution images of the internet, lacking reliability and true understanding. He emphasizes the relationship between humans and tools, and the human tendency to see ourselves in our tools. The interview also touches on the nature of language, the role of AI in artistic creation, and ethical considerations in technological development. Chiang's optimism about technology is cautious: he believes we need to be mindful of potential negative impacts and work to mitigate their harm.

OpenAI Uses Reddit's r/ChangeMyView to Benchmark AI Persuasion

2025-02-02

OpenAI leveraged Reddit's r/ChangeMyView subreddit to evaluate the persuasive abilities of its new reasoning model, o3-mini. The subreddit, where users post opinions and engage in debates, provided a unique dataset to assess how well the AI's generated responses could change minds. While o3-mini didn't significantly outperform previous models like o1 or GPT-4o, all demonstrated strong persuasive abilities, ranking in the 80th-90th percentile of human performance. OpenAI emphasizes that the goal isn't to create hyper-persuasive AI, but rather to mitigate the risks associated with excessively persuasive models. The benchmark highlights the ongoing challenge of securing high-quality datasets for AI model development.

DeepSeek-R1: China's AI Surge and the Open-Source Victory

2025-02-02

DeepSeek, a Chinese company, released DeepSeek-R1, a large language model comparable to OpenAI's models, under an open-weight MIT license. This triggered a market selloff in US tech stocks, highlighting several key trends: China is rapidly catching up to the US in generative AI; open-weight models are commoditizing the foundation model layer, creating opportunities for application builders; scaling isn't the only path to AI progress, with algorithmic innovations rapidly lowering training costs. DeepSeek-R1 signifies a shift in the AI landscape, offering new opportunities for AI application development.

LLMs Hit a Wall: Einstein's Riddle Exposes Limits of Transformer-Based AI

2025-02-02

Researchers have discovered fundamental limitations in the ability of current transformer-based large language models (LLMs) to solve compositional reasoning tasks. Experiments involving Einstein's logic puzzle and multi-digit multiplication revealed significant shortcomings, even after extensive fine-tuning. These findings challenge the suitability of the transformer architecture for universal learning and are prompting investigations into alternative approaches, such as improved training data and chain-of-thought prompting, to enhance LLM reasoning capabilities.

OpenAI AMA: Admitting Lag, Embracing Open Source?

2025-02-01

In a wide-ranging Reddit AMA, OpenAI CEO Sam Altman admitted that OpenAI's lead in AI is shrinking, partly due to competitors like DeepSeek. He hinted at a shift towards a more open-source strategy, potentially releasing older models. OpenAI is also navigating pressure from Washington, a massive funding round, and the need to build out substantial data center infrastructure. To compete, the company plans to increase model transparency by revealing the reasoning process behind its outputs. Altman expressed optimism about the potential for rapid AI advancement but acknowledged the risk of misuse, particularly in the development of weapons.

Sparse Interpretable Audio Codec: Towards a More Intuitive Audio Representation

2025-02-01

This paper introduces a proof-of-concept audio encoder that encodes audio as a sparse set of events and their times of occurrence. It leverages rudimentary physics-based assumptions to model the attack and physical resonance of both the instrument and the room, with the aim of encouraging a sparse, parsimonious, easy-to-interpret representation. The model works by iteratively removing energy from the input spectrogram, producing event vectors and one-hot vectors representing time of occurrence. The decoder uses these vectors to reconstruct the audio. Experimental results show the model can decompose audio, though there is room for improvement, such as enhancing reconstruction quality and reducing redundant events.
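The iterative energy-removal loop described above can be sketched, in heavily simplified form, as a greedy decomposition over a spectrogram. This toy version records one scalar magnitude per event plus a one-hot time vector, standing in for the paper's learned event vectors:

```python
import numpy as np

def greedy_events(spectrogram, n_events):
    """Toy iterative energy removal: at each step, take the loudest
    time-frequency bin, record it as an 'event' (magnitude, frequency
    bin, one-hot time vector), and zero it out of the residual."""
    residual = spectrogram.astype(float).copy()
    events = []
    for _ in range(n_events):
        f, t = np.unravel_index(np.argmax(residual), residual.shape)
        one_hot = np.zeros(residual.shape[1])
        one_hot[t] = 1.0
        events.append((residual[f, t], f, one_hot))
        residual[f, t] = 0.0  # remove this event's energy from the residual
    return events, residual

# 3 frequency bins x 4 time steps, with two dominant events.
spec = np.array([[0.1, 0.2, 9.0, 0.1],
                 [0.3, 7.0, 0.2, 0.1],
                 [0.2, 0.1, 0.3, 0.2]])
events, residual = greedy_events(spec, n_events=2)
# First event: magnitude 9.0 at frequency bin 0, time step 2.
```

The actual model learns its event representations end-to-end and removes extended energy patterns, not single bins; this sketch only illustrates the greedy residual-subtraction structure.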

DeepSeek R1 Brings AI to the Edge on Copilot+ PCs

2025-02-01

Microsoft is bringing the power of AI to the edge with DeepSeek R1, now optimized for Copilot+ PCs powered by Qualcomm Snapdragon and Intel Core Ultra processors. Leveraging the Neural Processing Unit (NPU), DeepSeek R1 runs efficiently on-device, enabling faster response times and lower power consumption. Developers can easily integrate the model using the AI Toolkit to build native AI applications. This initial release of DeepSeek R1-Distill-Qwen-1.5B, along with upcoming 7B and 14B variants, showcases the potential of edge AI for efficient inference and continuously running services.

AI's $200 Task Conquest: A Progress Report

2025-02-01

The author recounts commissioning a $200 mascot design in 2013, illustrating the type of tasks now achievable by AI. AI excels at transactional tasks with well-defined outputs, like logo design, transcription, and translation, previously requiring specialized skills. However, more complex tasks demanding nuanced expertise and judgment, such as landscape design, remain beyond AI's current capabilities. While AI's progress is impressive, its economic impact in solving paid tasks is still in its early stages.

OpenAI's o3-mini: A Budget-Friendly LLM Powerhouse

2025-02-01

OpenAI has released o3-mini, a new language model that excels in the Codeforces competitive programming benchmark, significantly outperforming GPT-4o and o1. While not universally superior across all metrics, its low price ($1.10/million input tokens, $4.40/million output tokens) and exceptionally high token output limit (100,000 tokens) make it highly competitive. OpenAI plans to integrate it into ChatGPT for web search and summarization, and support is already available in LLM 0.21, though API access is currently limited to Tier 3 users (at least $100 spent on the API). o3-mini offers developers a powerful and cost-effective LLM option.
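At the quoted prices, per-request cost is simple arithmetic. A quick sketch (prices from the summary; the token counts are illustrative, not from the article):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price: float = 1.10, out_price: float = 4.40) -> float:
    """Cost of one API call at per-million-token prices (o3-mini rates)."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A 4,000-token prompt with a 100,000-token response (the stated output cap):
print(round(request_cost_usd(4_000, 100_000), 4))  # 0.4444
```

Even a maximum-length response costs well under a dollar, which is what makes the high output limit practical for long generations.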

AI Music Generation: Convenience vs. Creativity

2025-01-31

The success of AI music company Suno sparks a reflection on the role of AI in artistic creation. The author, a Stanford professor, questions Suno's claim that AI can easily solve the tedious parts of music creation, arguing that the challenges and difficulties inherent in the creative process constitute the meaning and value of art. Using his own experiences and teaching practices as examples, he illustrates the importance of the creative process and calls for the preservation of human active creation in the age of AI, avoiding a purely consumerist culture.

Tensor Diagrams Simplify Tensor Manipulation: Introducing Tensorgrad

2025-01-31

High-dimensional tensor manipulation can be confusing. A new book, "The Tensor Cookbook," simplifies the process using tensor diagrams. Tensor diagrams are more intuitive than traditional index notation (einsum): they readily reveal patterns and symmetries, avoid the hassle of vectorization and Kronecker products, simplify matrix calculus, and effortlessly represent functions and broadcasting. The accompanying Python library, Tensorgrad, uses tensor diagrams for symbolic tensor manipulation and differentiation, making complex calculations easier to understand.
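For contrast with diagram notation, here is the index (einsum) style the book argues against, shown on a three-tensor chain contraction. NumPy is used for illustration rather than Tensorgrad's own API:

```python
import numpy as np

# The contraction a tensor diagram would draw as three boxes joined by
# shared edges; in index notation it is sum over j, k of A[i,j] B[j,k] C[k,l].
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))

via_einsum = np.einsum("ij,jk,kl->il", A, B, C)
via_matmul = A @ B @ C  # same contraction, written as chained matrix products
assert np.allclose(via_einsum, via_matmul)
```

In a diagram, the repeated indices j and k become edges between boxes and simply disappear from view, which is the bookkeeping advantage the book emphasizes.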

OpenAI Launches Cheaper, Faster Reasoning Model: o3-mini

2025-01-31

OpenAI unveiled o3-mini, a new AI reasoning model in its 'o' family. While comparable in capability to the o1 family, o3-mini boasts faster speeds and lower costs. Fine-tuned for STEM problems, particularly programming, math, and science, it's available in ChatGPT with adjustable 'reasoning effort' settings balancing speed and accuracy. Paid users get unlimited access, while free users have a query limit. Also accessible via OpenAI's API to select developers, o3-mini offers competitive pricing and improved safety, though it doesn't surpass DeepSeek's R1 model in all benchmarks.
