Towards System 2 Reasoning in LLMs: Meta Chain-of-Thought

2025-01-10

Researchers propose Meta Chain-of-Thought (Meta-CoT), a novel framework extending traditional Chain-of-Thought (CoT) by explicitly modeling the reasoning behind a given CoT. Meta-CoT leverages process supervision, synthetic data generation, and search algorithms. The paper outlines a training pipeline incorporating instruction tuning with linearized search traces and reinforcement learning. This work provides a roadmap for enabling Meta-CoT in LLMs, promising more powerful and human-like reasoning in AI.

Challenging the CAP Theorem: A Partial Progress Conjecture Under Asynchrony

2025-01-08

A new paper challenges the well-known CAP theorem. The authors conjecture that partial progress is possible under network partitions, meaning the system can remain responsive to a subset of clients and achieve non-zero throughput during failures. They present the design of their CASSANDRA consensus protocol, allowing partitioned replicas to order client requests, potentially offering a path to systems that are both consistent and available to some degree, even during partitions. This research offers a novel approach to building more robust distributed systems.

A Decade Review: Diving Deep into Time-Series Anomaly Detection

2025-01-06

Advances in data collection and the explosion of streaming data highlight the crucial need for time-series analytics. This paper provides a decade-long review of time-series anomaly detection, encompassing methods from traditional statistical measures to the surge of machine learning algorithms. It presents a process-centric taxonomy to categorize and summarize existing solutions, offering a meta-analysis of the literature and outlining general trends in the field. This comprehensive survey serves as a valuable resource for researchers.

Scientists Crack the Code of the Perfect Cacio e Pepe

2025-01-04

A team of scientists delved into the culinary arts, specifically the classic Italian dish Cacio e Pepe, to uncover the secrets behind its perfect creamy texture. Their research revealed starch concentration as the key factor influencing sauce stability. Starch levels below 1% (relative to cheese mass) lead to clumping, a phenomenon dubbed the "Mozzarella Phase," resulting in a separated and unpleasant sauce. The study also explored the impact of cheese-to-water ratios at a fixed starch level, observing a lower critical solution temperature and developing a minimal effective free-energy model to explain it. Ultimately, they presented a scientifically optimized recipe guaranteeing consistently flawless Cacio e Pepe.

Reproducing OpenAI's o1: A Roadmap from a Reinforcement Learning Perspective

2025-01-03

A new paper explores the path to reproducing OpenAI's enigmatic model, o1, from a reinforcement learning perspective. Researchers argue o1's powerful reasoning isn't due to a single technique, but rather the synergy of four key components: policy initialization, reward design, search, and learning. Policy initialization equips the model with human-like reasoning; reward design provides dense and effective signals guiding search and learning; search generates high-quality solutions during training and testing; learning utilizes data from search to improve the policy, ultimately achieving better performance. This paper offers valuable insights into understanding and reproducing o1, providing new avenues for LLM development.

4.5 Million Fake GitHub Stars: A Shadowy Popularity Contest

2025-01-02

A new study reveals 4.5 million suspected fake stars on GitHub, primarily used to promote short-lived malware repositories disguised as pirated software, game cheats, or cryptocurrency bots. Researchers developed StarScout, a tool to detect anomalous starring behavior. The study shows a rapid surge in fake star activity since 2024. Fake stargazers don't differ significantly from average users in profile characteristics, but their activity patterns are highly abnormal. Fake stars offer short-term promotional benefits but ultimately become a long-term burden for the repositories that buy them. This research has significant implications for platform moderators, open-source practitioners, and supply chain security researchers.

Tech

TinyStories: Can Small Language Models Still Tell Coherent English Stories?

2025-01-02

Researchers introduce TinyStories, a synthetic dataset of short stories, generated by GPT-3.5 and GPT-4, that uses only vocabulary understood by typical 3- to 4-year-olds. They demonstrate that language models trained on TinyStories, even those with fewer than 10 million parameters and simple architectures (a single transformer block), can generate fluent, coherent multi-paragraph stories exhibiting surprisingly good grammar and reasoning. This challenges the notion that coherent text generation requires massive models and complex architectures. The paper also introduces a novel evaluation paradigm that uses GPT-4 to grade generated stories like a human teacher, overcoming limitations of standard benchmarks.

Activation Engineering: Manipulating Personality Traits in LLMs

2024-12-31

A paper on arXiv explores a novel method for identifying and manipulating personality traits in large language models (LLMs) using 'activation engineering'. Inspired by prior research on LLM refusal and steering, the researchers propose a technique to adjust activation directions linked to personality traits, enabling dynamic LLM personality fine-tuning. This work contributes to a better understanding of LLM interpretability while also raising crucial ethical considerations.

Beyond Gradient Averaging in Parallel Optimization: Improved Robustness through Gradient Agreement Filtering

2024-12-30

This paper introduces Gradient Agreement Filtering (GAF), a novel method to improve gradient averaging in distributed deep learning optimization. Traditional methods average micro-batch gradients to compute a macro-batch gradient, but micro-batch gradients are often orthogonal or negatively correlated in later training stages, and averaging them anyway can result in overfitting. GAF reduces gradient variance by computing the cosine distance between micro-gradients and filtering out conflicting updates before averaging. Experiments on image classification benchmarks like CIFAR-100 and CIFAR-100N-Fine show that GAF significantly improves validation accuracy, even with smaller micro-batch sizes, achieving up to an 18.2% improvement over traditional approaches while reducing computational cost.
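
The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the running-mean comparison, and the threshold value of 0.9 are assumptions made here for the example.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-12):
    """Return 1 - cosine similarity of two flattened gradient vectors."""
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def gaf_average(micro_grads, threshold=0.9):
    """Average micro-gradients, skipping those that disagree with the running mean.

    `threshold` is a hypothetical cosine-distance cutoff chosen for illustration.
    Returns the filtered average and the number of micro-gradients kept.
    """
    agg = micro_grads[0].astype(float)
    kept = 1
    for g in micro_grads[1:]:
        if cosine_distance(agg, g) <= threshold:
            # Fold the agreeing gradient into the running mean.
            agg = (agg * kept + g) / (kept + 1)
            kept += 1
    return agg, kept

grads = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-1.0, 0.0])]
avg, kept = gaf_average(grads)
```

In this toy input the third gradient points in the opposite direction to the first two, so it is dropped and only two gradients contribute to the average. A caller could also skip the optimizer step entirely when `kept` is 1, since no pair of micro-gradients agreed.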

Evaluating LLMs' Code Generation Capabilities: Introducing MultiCodeBench

2024-12-30

AI-powered programming assistants based on code Large Language Models (LLMs) have become increasingly prevalent, significantly boosting developer productivity. However, existing code generation benchmarks primarily focus on general-purpose scenarios, leaving the performance of LLMs in specific application domains largely unknown. This paper introduces MultiCodeBench, a new benchmark comprising 2,400 programming tasks across 12 popular software development domains and 15 programming languages. Experiments on eleven mainstream LLMs reveal their code generation performance across different domains, offering practical insights for developers in selecting LLMs and guidance for model developers to enhance domain-specific code generation capabilities.

Development Code Generation

Breakthrough in Evaluating Large Language Models for Unit Test Generation

2024-12-30

Researchers conducted a comprehensive evaluation of the potential of Large Language Models (LLMs) in automating unit test generation. They compared the performance of five open-source LLMs against the closed-source GPT-4 and the traditional tool EvoSuite across 17 Java projects, investigating the impact of different prompting strategies. The study found that open-source LLMs offer advantages in data privacy and perform better on certain tasks, but it also revealed limitations in LLM-based unit test generation. This research provides valuable insights to guide future applications of LLMs in this area.

Development Unit Testing

LLM Identity Confusion: A Crisis of Trust Emerges

2024-12-30

A recent study reveals widespread "identity confusion" in Large Language Models (LLMs). Researchers found that over 25% of LLMs exhibit misrepresentation of their origins or identities, primarily stemming from model hallucinations rather than replication or reuse. This identity confusion significantly erodes user trust, especially in critical tasks like education and professional use, surpassing the negative impact of logical errors. The findings highlight the systemic risks posed by LLM identity confusion and call for greater attention to model reliability and trustworthiness.

Explaining Large Language Model Decisions Using Shapley Values

2024-12-28

Large language models (LLMs) offer exciting possibilities for simulating human behavior, but their decision-making processes lack transparency. This paper introduces a novel approach based on Shapley values to interpret LLM behavior and quantify the contribution of each prompt component to the model's output. Through two applications, the study reveals that LLM decisions are susceptible to "token noise," where the model disproportionately reacts to tokens with minimal informative content. This raises concerns about the robustness and generalizability of insights from LLMs in simulating human behavior, highlighting the need for careful prompt engineering and a nuanced understanding of their limitations when used in research.
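
To make the attribution idea concrete, here is a minimal sketch of exact Shapley values over prompt components. The component names and the toy `toy_value` function are hypothetical stand-ins; in practice the value of a subset would come from querying the model with only those components present.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: weighted average marginal contribution of each player."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that p joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi

def toy_value(subset):
    """Hypothetical model score for a prompt containing only `subset`."""
    score = 0.0
    if "price" in subset:
        score += 0.5
    if "filler" in subset:
        score += 0.3
    if "price" in subset and "filler" in subset:
        score += 0.2  # interaction term, split evenly by the Shapley formula
    return score

phi = shapley_values(["task", "price", "filler"], toy_value)
```

Here the low-information `filler` component still receives a sizeable share of the attribution, which is the kind of pattern the summary calls "token noise". Exact enumeration grows exponentially with the number of components, so longer prompts would need a sampling approximation.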

Invariants: Advances in Computation and Applications

2024-12-27

A tutorial paper published in the proceedings of ISSAC 2023 explores the computation and applications of invariants in mathematics. The paper focuses on the interplay between differential and algebraic invariant theories, presenting an algebraic adaptation of the moving frame method from differential geometry to compute a generating set of rational invariants. It also discusses the role of differential invariant signatures in solving equivalence problems in geometry and algebra, and the challenges in designing algorithms based on this concept.

Adversarial Policies Defeat Superhuman Go AIs

2024-12-24

Researchers achieved a >97% win rate against the state-of-the-art Go AI, KataGo, by training adversarial policies. These adversaries didn't win by playing Go well, but by tricking KataGo into making critical blunders. The attack transferred zero-shot to other superhuman Go AIs and was simple enough for human experts to replicate without algorithmic assistance. The vulnerability persisted even after KataGo was adversarially trained to defend against it, highlighting surprising failure modes in even superhuman AI systems.

Supernovae Data Suggests Foundational Shift in Cosmological Models

2024-12-23

A new study presents a cosmologically model-independent statistical analysis of the Pantheon+ Type Ia supernovae spectroscopic dataset, improving upon the standard methodology used by Lane et al. By employing the Tripp equation for supernova standardization alone, the study avoids potential correlations in stretch and color distributions. The results strongly favor the 'Timescape' cosmology over the standard ΛCDM model in explaining the data, providing evidence for the need to revisit the foundations of theoretical and observational cosmology. Even when restricting the sample to redshifts beyond conventional scales of statistical homogeneity (z > 0.075), Timescape remains preferred over ΛCDM.

Offline Reinforcement Learning Boosts Multi-Step Reasoning in LLMs

2024-12-23

Researchers introduce OREO, an offline reinforcement learning method designed to enhance the multi-step reasoning capabilities of large language models (LLMs). Building upon maximum entropy reinforcement learning, OREO jointly learns a policy model and value function by optimizing the soft Bellman equation. This addresses limitations of Direct Preference Optimization (DPO) in multi-step reasoning, specifically the need for extensive paired preference data and the challenge of effective credit assignment. Experiments demonstrate OREO's superiority over existing offline learning methods on benchmarks involving mathematical reasoning and embodied agent control.
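
For orientation, the soft Bellman condition in KL-regularized maximum-entropy RL is commonly written in the form below. The notation is assumed here and may differ from the paper's: $V$ is the value function, $\pi$ the learned policy, $\pi_{\mathrm{ref}}$ a reference policy, $\beta$ the regularization strength, and transitions are taken to be deterministic (the next state is the current text plus the chosen step).

```latex
% From \pi(a_t \mid s_t) = \pi_{\mathrm{ref}}(a_t \mid s_t)\,
%      \exp\!\big( (r(s_t, a_t) + V(s_{t+1}) - V(s_t)) / \beta \big)
% taking logs and rearranging gives the per-step consistency condition:
V(s_t) - V(s_{t+1}) = r(s_t, a_t) - \beta \log \frac{\pi(a_t \mid s_t)}{\pi_{\mathrm{ref}}(a_t \mid s_t)}
```

Because the condition holds at every step of a single trajectory, it can be fit without paired preference data, and the learned value function spreads a sparse final reward across intermediate steps.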

Tokenization Problem Proven NP-Complete, Doubling Data Compression Challenges

2024-12-22

A paper published on arXiv proves the NP-completeness of two variants of tokenization, defined as the problem of compressing a dataset to at most δ symbols by either finding a vocabulary directly (direct tokenization) or selecting a sequence of merge operations (bottom-up tokenization). This finding has significant implications for data compression and natural language processing, highlighting the immense challenge of efficiently solving the tokenization problem for large-scale datasets.
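
As an illustration of the bottom-up variant, the sketch below applies a greedy, BPE-style sequence of merges and reports the resulting total symbol count, which can then be compared against a budget δ. The greedy rule and the function names are assumptions made for this example; since the problem is NP-complete, a greedy heuristic like this carries no optimality guarantee.

```python
from collections import Counter

def apply_merge(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one merged symbol."""
    a, b = pair
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def greedy_bottom_up(dataset, num_merges):
    """Pick the most frequent adjacent pair at each step; return merges and final symbol count."""
    seqs = [list(text) for text in dataset]
    merges = []
    for _ in range(num_merges):
        counts = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not counts:
            break
        # Break frequency ties by the pair itself so the result is deterministic.
        pair = max(counts, key=lambda p: (counts[p], p))
        merges.append(pair)
        seqs = [apply_merge(s, pair) for s in seqs]
    return merges, sum(len(s) for s in seqs)

merges, total_symbols = greedy_bottom_up(["abab", "abc"], num_merges=1)
delta = 4  # hypothetical compression budget
fits_budget = total_symbols <= delta
```

With one merge, the pair `("a", "b")` is chosen and the seven original characters compress to four symbols, which meets the example budget.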

Groundbreaking Advance: Safely Compiling C to Rust

2024-12-21

Researchers have developed a novel method for safely compiling C code into Rust. The technique uses static analysis and type-directed translation to avoid relying on Rust's `unsafe` blocks, so the output keeps Rust's memory-safety guarantees. The method has been successfully applied to code from the HACL* cryptographic library and the EverParse libraries, yielding an 80,000-line verified cryptographic library written in pure Rust, the first of its kind.

Development C compilation

Lightweight Safety Classification Using Pruned Language Models

2024-12-19

Researchers introduce Layer Enhanced Classification (LEC), a novel lightweight technique for content safety and prompt injection classification in Large Language Models (LLMs). LEC trains a streamlined Penalized Logistic Regression (PLR) classifier on the hidden state of an LLM's optimal intermediate transformer layer. Combining the efficiency of PLR with the sophisticated language understanding of LLMs, LEC outperforms GPT-4o and specialized models. Small general-purpose models like Qwen 2.5 and architectures such as DeBERTa v3 prove robust feature extractors, effectively training with fewer than 100 high-quality examples. Crucially, intermediate transformer layers often outperform the final layer. A single general-purpose LLM can classify content safety, detect prompt injections, and generate output, or smaller LLMs can be pruned to their optimal intermediate layer for feature extraction. Consistent results across architectures suggest robust feature extraction is inherent to many LLMs.
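
The classifier side of this idea can be sketched as a small penalized logistic regression trained on feature vectors. In the sketch below the features are synthetic random vectors standing in for intermediate-layer hidden states, and the L2 penalty, learning rate, and dimensions are assumptions chosen for illustration rather than the paper's settings.

```python
import numpy as np

def train_plr(features, labels, l2=0.1, lr=0.1, steps=300):
    """Fit L2-penalized logistic regression with plain gradient descent."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        w -= lr * (features.T @ (probs - labels) / len(labels) + l2 * w)
        b -= lr * np.mean(probs - labels)
    return w, b

def predict(features, w, b):
    """Return 1 for examples on the positive side of the decision boundary."""
    return (features @ w + b > 0).astype(int)

# Synthetic stand-in for hidden states: 80 examples, 16 dimensions, two shifted clusters.
rng = np.random.default_rng(0)
labels = np.array([0, 1] * 40)
features = rng.normal(size=(80, 16)) + np.where(labels == 1, 1.0, -1.0)[:, None]

w, b = train_plr(features, labels)
accuracy = float(np.mean(predict(features, w, b) == labels))
```

In a real setup, `features` would be replaced by the hidden state taken from a chosen intermediate transformer layer for each input text, and accuracy would be measured on held-out examples rather than the training set.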

Classical Sorting Algorithms Reveal Unexpected Competencies in a Minimal Model of Basal Intelligence

2024-12-19

A new study uses classical sorting algorithms as a model of morphogenesis, challenging conventional wisdom about these algorithms. By breaking assumptions of top-down control and perfectly reliable hardware, researchers discovered that arrays of autonomous elements sort themselves more reliably and robustly than traditional implementations, even in the presence of errors. Surprisingly, the algorithms can temporarily reduce progress in order to navigate around defects, and in chimeric arrays, whose elements follow different algorithms, the elements show unexpected clustering behavior. This discovery provides a novel perspective on diverse intelligence, demonstrating how basal forms of intelligence can emerge in simple systems without explicit encoding in their underlying mechanics.

Cultural Evolution of Cooperation Among LLM Agents

2024-12-18

Researchers investigated whether a 'society' of Large Language Model (LLM) agents can learn mutually beneficial social norms despite incentives to defect. Experiments revealed significant differences in the evolution of cooperation across base models, with Claude 3.5 Sonnet significantly outperforming Gemini 1.5 Flash and GPT-4o. Furthermore, Claude 3.5 Sonnet leveraged a costly punishment mechanism to achieve even higher scores, a feat not replicated by the other models. This study proposes a new benchmark for LLMs focused on the societal implications of LLM agent deployment, offering insights into building more robust and cooperative AI agents.

No More Adam: Learning Rate Scaling at Initialization is All You Need

2024-12-18

Researchers introduce SGD-SaI, a novel optimizer improving stochastic gradient descent. SGD-SaI addresses training imbalances by scaling learning rates at initialization for different parameter groups based on their gradient signal-to-noise ratios. Significantly more memory-efficient than AdamW, SGD-SaI matches or surpasses AdamW's performance across various Transformer-based tasks, including ImageNet classification and LLM pretraining. Its robustness and practicality are demonstrated across diverse applications, making it a compelling alternative.

AI

Best-of-N Jailbreaking: A Novel Attack on AI Systems

2024-12-15

Researchers have developed a new AI attack algorithm called Best-of-N (BoN) Jailbreaking. This black-box algorithm repeatedly modifies prompts (for example, by randomly shuffling or capitalizing text) until it elicits a harmful response from the AI system. BoN achieved high attack success rates (ASRs) on closed-source language models like GPT-4o (89%) and Claude 3.5 Sonnet (78%), circumventing existing defenses. BoN also extends to vision and audio language models, highlighting the vulnerability of even advanced AI systems to seemingly innocuous input variations. This research underscores significant security concerns in the field of AI.

Automated Assembly System Creates Cyborg Insects

2024-12-15

Scientists have developed an automated system for assembling insect-computer hybrid robots. The system uses a vision-guided robotic arm to precisely implant custom-designed bipolar electrodes onto the backs of Madagascar hissing cockroaches. The entire process takes only 68 seconds, and the assembled robots achieve steering and deceleration control comparable to manually assembled systems. A multi-agent system of 4 robots successfully navigated an obstacle course, demonstrating the feasibility of mass production and real-world applications. This research paves the way for scalable production and deployment of insect robots.

CCxTrust: A Confidential Computing Platform Leveraging Collaborative Trust from TEE and TPM

2024-12-12

CCxTrust is a novel confidential computing platform that cleverly combines the strengths of Trusted Execution Environments (TEEs) and Trusted Platform Modules (TPMs) to establish a collaborative trust framework. By leveraging the black-box Root of Trust (RoT) embedded in CPU-TEEs and the flexible white-box RoT of TPMs, CCxTrust achieves end-to-end protection of sensitive data and models, overcoming the limitations of relying on a single hardware RoT. The platform implements independent Roots of Trust for Measurement (RTM) and a collaborative Root of Trust for Report (RTR), further enhanced by a composite attestation protocol for improved security and efficiency. Experimental results demonstrate significant performance advantages.

Breakthrough in Reachability Analysis of the Domain Name System

2024-12-12

Researchers have presented the first decision procedure for verifying the Domain Name System (DNS), establishing its complexity as 2ExpTime. The study formalizes DNS semantics and uses a novel abstraction based on positive prefix-testable languages, reducing the DNS verification problem to the verification problem for pushdown systems. This approach effectively models attack vectors in DNS, such as amplification attacks and rewrite blackholing, providing a new theoretical foundation for ensuring DNS security and reliability.
