arXivLabs: Experimental Projects with Community Collaborators

2025-02-03

arXivLabs is a framework enabling collaborators to develop and share new arXiv features directly on the arXiv website. Individuals and organizations involved with arXivLabs embrace arXiv's values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners who adhere to them. Have an idea for a project that will benefit the arXiv community? Learn more about arXivLabs.

Development

arXivLabs: Experimenting with Community-Driven Features

2025-02-02

arXivLabs is an experimental platform enabling collaborators to develop and share new arXiv features directly on the website. Participants share arXiv's values of openness, community, excellence, and user data privacy. Got an idea to improve the arXiv community? Learn more about arXivLabs.

Development

arXivLabs: Experimenting with Community-Driven Features

2025-02-01

arXivLabs is a framework enabling collaborators to build and share new arXiv features directly on the website. Participants, both individuals and organizations, embrace arXiv's values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only partners with those who share them. Got an idea for a project that will benefit the arXiv community? Learn more about arXivLabs.


arXivLabs: Community Collaboration on arXiv Features

2025-02-01

arXivLabs is an experimental framework enabling collaborators to develop and share new arXiv features directly on the website. Participants must adhere to arXiv's values of openness, community, excellence, and user data privacy. Got an idea to improve the arXiv community? Learn more about arXivLabs.

Development

arXivLabs: Experimenting with Community Collaboration

2025-02-01

arXivLabs is a framework enabling collaborators to develop and share new arXiv features directly on the website. Individuals and organizations involved share arXiv's values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only partners with those who adhere to them. Got an idea for a project that will benefit the arXiv community? Learn more about arXivLabs.

Tech

arXivLabs: Community-Driven Feature Development for arXiv

2025-02-01

arXivLabs is a framework enabling collaborators to develop and share new arXiv features directly on the website. Participants, individuals and organizations alike, embrace arXiv's values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only partners with those who share them. Have an idea to enhance the arXiv community? Learn more about arXivLabs.

Development

arXivLabs: Experimental Projects with Community Collaborators

2025-01-31

arXivLabs is a framework enabling collaborators to develop and share new arXiv features directly on the website. Individuals and organizations involved embrace arXiv's values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only partners with those who share them. Have an idea to enhance the arXiv community? Learn more about arXivLabs.

Development experimental projects

arXivLabs: Experimenting with Community Collaboration

2025-01-31

arXivLabs is a framework for collaborators to develop and share new features directly on the arXiv website. Individuals and organizations involved share arXiv's values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only partners with those who adhere to them. Have an idea to improve the arXiv community? Learn more about arXivLabs.

Development open platform

A Faster Quantum Fourier Transform Algorithm

2025-01-27

Ronit Shah presents an improved algorithm for the Quantum Fourier Transform (QFT). Traditionally, approximate QFT requires Θ(n log n) gates, and exact QFT requires Θ(n²) gates. The new algorithm, leveraging a novel recursive partitioning of qubits, reduces the cost of approximate QFT to Θ(n(log log n)²) gates and exact QFT to Θ(n(log n)²) gates. Both bounds are asymptotic improvements over the traditional gate counts.
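
To make the traditional baselines concrete, here is a minimal sketch that counts gates in the textbook QFT circuit. It assumes one Hadamard per qubit plus one controlled phase rotation per qubit pair, and it ignores the final swaps. The approximate variant assumes each qubit keeps at most `cutoff` controlled rotations; the function names and the `cutoff` parameter are illustrative. This is not the recursive algorithm from the paper.

```python
import math


def exact_qft_gates(n: int) -> int:
    """Gate count of the textbook QFT circuit on n qubits (final swaps ignored)."""
    hadamards = n
    controlled_rotations = n * (n - 1) // 2  # one per pair of qubits
    return hadamards + controlled_rotations


def approx_qft_gates(n: int, cutoff: int) -> int:
    """Gate count when each qubit keeps at most `cutoff` controlled rotations."""
    hadamards = n
    controlled_rotations = sum(min(n - 1 - j, cutoff) for j in range(n))
    return hadamards + controlled_rotations


if __name__ == "__main__":
    for n in (8, 64, 1024):
        cutoff = max(1, math.ceil(math.log2(n)))  # assumed O(log n) cutoff
        print(n, exact_qft_gates(n), approx_qft_gates(n, cutoff))
```

With a cutoff that grows like log n, the approximate count grows like n log n, while the exact count grows like n².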


DeepSeek-R1: Boosting LLM Reasoning with Reinforcement Learning

2025-01-25

DeepSeek-AI unveils its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, trained via large-scale reinforcement learning (RL) without supervised fine-tuning as a preliminary step, demonstrated strong reasoning capabilities but suffered from poor readability and language mixing. DeepSeek-R1 addresses these flaws with multi-stage training and cold-start data before RL, achieving performance comparable to OpenAI's models on reasoning tasks. To foster research, DeepSeek-AI open-sources DeepSeek-R1-Zero, DeepSeek-R1, and six distilled models of varying sizes built upon Qwen and Llama.

AI

Foundations of Large Language Models: A New Book Decoding Core Concepts

2025-01-23

A new book, "Foundations of Large Language Models," has been released. Instead of aiming for comprehensive coverage of cutting-edge technologies, it delves into the core foundational concepts of large language models. Structured into four chapters covering pre-training, generative models, prompting techniques, and alignment methods, the book is geared towards college students, professionals, and practitioners in natural language processing and related fields. It serves as a valuable reference for anyone interested in LLMs.

AI

Lossless Compression of Vector IDs Boosts Approximate Nearest Neighbor Search

2025-01-23

Researchers introduce a lossless compression scheme for vector IDs to address the high storage cost of indexes in approximate nearest neighbor search. Leveraging the fact that the order of IDs is irrelevant in many index structures, and utilizing asymmetric numeral systems or wavelet trees, the method achieves up to 7x compression of vector IDs without impacting accuracy or search runtime. This translates to a 30% reduction in index size for billion-scale datasets. Furthermore, the approach can also losslessly compress quantized vector codes by exploiting sub-optimalities in the original quantization algorithm.
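
The saving from ignoring ID order can be illustrated with a counting argument. The sketch below compares a fixed-width encoding of an ID sequence with the information-theoretic minimum for an unordered set of distinct IDs. It only computes a bound; it is not the asymmetric numeral systems or wavelet tree coder from the paper, and the function names and sizes are illustrative.

```python
import math


def ordered_list_bits(universe: int, count: int) -> int:
    """Bits to store `count` IDs in order, each with a fixed-width code."""
    return count * math.ceil(math.log2(universe))


def unordered_set_bits(universe: int, count: int) -> float:
    """Minimum bits to identify a set of `count` distinct IDs from `universe`."""
    return math.log2(math.comb(universe, count))


if __name__ == "__main__":
    universe, count = 1_000_000, 256  # assumed database size and cluster size
    print(ordered_list_bits(universe, count))
    print(round(unordered_set_bits(universe, count), 1))
```

The gap between the two numbers is roughly log2(count!) bits, which is the information carried by an ordering that the index does not need.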


FLAME: A Lightweight Language Model for Spreadsheet Formulas

2025-01-22

Large language models are expensive to train and deploy for assisting with Excel formula authoring. This paper introduces FLAME, a transformer-based model trained exclusively on Excel formulas. With only 60 million parameters and a fraction of the training data used by larger models, FLAME achieves competitive or even superior performance on formula repair, completion, and retrieval tasks compared to models like Codex and CodeT5. This is attributed to its novel pre-training objectives and Excel-specific tokenizer.

Development formula

Tensor Product Attention: All You Need

2025-01-22

Scaling language models to handle longer input sequences typically requires large key-value (KV) caches, resulting in substantial memory overhead during inference. This paper proposes Tensor Product Attention (TPA), a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, significantly reducing KV cache size during inference. By factorizing these representations into contextual low-rank components (contextual factorization) and seamlessly integrating with RoPE, TPA improves model quality while maintaining memory efficiency. Based on TPA, the authors introduce the Tensor ProducT ATTenTion Transformer (T6), a new model architecture for sequence modeling. Extensive empirical evaluation on language modeling tasks demonstrates that T6 surpasses standard Transformer baselines including MHA, MQA, GQA, and MLA across various metrics, including perplexity and a range of well-known evaluation benchmarks. Notably, TPA's memory efficiency enables the processing of significantly longer sequences under fixed resource constraints, addressing a critical scalability challenge in modern language models. Code is available.
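
A rough sketch of why factorized keys and values shrink the cache: it assumes each token's key (or value) is an average of rank-1 outer products between a per-head factor and a per-feature factor, so only the factors need to be cached. The function names, ranks, and sizes below are illustrative assumptions, not the paper's configuration, and RoPE is left out.

```python
def mha_cache_entries(num_heads: int, head_dim: int) -> int:
    """Numbers cached per token by standard multi-head attention (keys + values)."""
    return 2 * num_heads * head_dim


def factorized_cache_entries(num_heads: int, head_dim: int,
                             rank_k: int, rank_v: int) -> int:
    """Numbers cached per token when only the rank-1 factors are stored."""
    return (rank_k + rank_v) * (num_heads + head_dim)


def reconstruct(head_factors, dim_factors):
    """Average of outer products a_r (x) b_r, giving a num_heads x head_dim matrix."""
    rank = len(head_factors)
    num_heads, head_dim = len(head_factors[0]), len(dim_factors[0])
    out = [[0.0] * head_dim for _ in range(num_heads)]
    for a, b in zip(head_factors, dim_factors):
        for i, a_i in enumerate(a):
            for j, b_j in enumerate(b):
                out[i][j] += a_i * b_j / rank
    return out


if __name__ == "__main__":
    print(mha_cache_entries(32, 128), factorized_cache_entries(32, 128, 2, 2))
```

Under these assumed sizes the factorized cache stores 640 numbers per token instead of 8192.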


ELIZA Reanimated: World's First Chatbot Restored

2025-01-18

Researchers have successfully resurrected ELIZA, widely considered the world's first chatbot, on a restored CTSS, the world's first time-sharing system, running on an emulated IBM 7094. Using original printouts, MAD-SLIP code, and supporting documents found in Prof. Weizenbaum's archives at MIT, they recreated ELIZA and its famous DOCTOR script. The entire project is open-source, allowing anyone with a Unix-like OS to run the groundbreaking chatbot.

AI

Unraveling the Math Behind NYT's Daily Word Game 'Waffle'

2025-01-17

A paper on arXiv explores the mathematics behind the New York Times' daily word game, Waffle. Author S.P. Glasby delves into the combinatorial properties of the game, explaining why some puzzles are easy while others are exceptionally difficult. The research reveals that a perfect solution requires precisely 11 orbits among the 21 squares, with at least one orbit of length 1. This provides a mathematical framework for understanding and potentially improving similar word puzzles.
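
The orbit count connects to swap counts through a standard fact about permutations: sorting a permutation of n items needs at least n minus the number of cycles swaps. The sketch below assumes that "orbits" here means the cycles of the permutation taking the scrambled grid to the solved one; the function names and the sample permutation are illustrative.

```python
def count_cycles(perm):
    """Number of cycles (fixed points included) of a permutation of 0..n-1."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycles += 1
        i = start
        while not seen[i]:  # walk the cycle containing `start`
            seen[i] = True
            i = perm[i]
    return cycles


def min_swaps(perm):
    """Fewest swaps that sort the permutation: n minus the number of cycles."""
    return len(perm) - count_cycles(perm)


if __name__ == "__main__":
    # Hypothetical 21-square scramble: ten 2-cycles and one fixed square.
    perm = [i ^ 1 for i in range(20)] + [20]
    print(count_cycles(perm), min_swaps(perm))
```

Under this reading, 21 squares with 11 orbits correspond to 21 - 11 = 10 swaps.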

Game

Titans: A Novel Neural Architecture for Learning to Memorize at Test Time

2025-01-16

Researchers introduce Titans, a novel neural architecture that combines a neural memory module with an attention mechanism to effectively memorize long-term historical context. Unlike traditional recurrent models and attention mechanisms, Titans demonstrates superior efficiency and accuracy in handling long sequences, particularly excelling in "needle-in-a-haystack" tasks. It outperforms Transformers and recent linear recurrent models across various tasks including language modeling, common-sense reasoning, genomics, and time series, and scales to context windows exceeding 2 million tokens.


Towards System 2 Reasoning in LLMs: Meta Chain-of-Thought

2025-01-10

Researchers propose Meta Chain-of-Thought (Meta-CoT), a novel framework extending traditional Chain-of-Thought (CoT) by explicitly modeling the reasoning behind a given CoT. Meta-CoT leverages process supervision, synthetic data generation, and search algorithms. The paper outlines a training pipeline incorporating instruction tuning with linearized search traces and reinforcement learning. This work provides a roadmap for enabling Meta-CoT in LLMs, promising more powerful and human-like reasoning in AI.


Challenging the CAP Theorem: A Partial Progress Conjecture Under Asynchrony

2025-01-08

A new paper challenges the well-known CAP theorem. The authors conjecture that partial progress is possible under network partitions, meaning the system can remain responsive to a subset of clients and achieve non-zero throughput during failures. They present the design of their CASSANDRA consensus protocol, allowing partitioned replicas to order client requests, potentially offering a path to systems that are both consistent and available to some degree, even during partitions. This research offers a novel approach to building more robust distributed systems.


A Decade Review: Diving Deep into Time-Series Anomaly Detection

2025-01-06

Advances in data collection and the explosion of streaming data highlight the crucial need for time-series analytics. This paper provides a decade-long review of time-series anomaly detection, encompassing methods from traditional statistical measures to the surge of machine learning algorithms. It presents a process-centric taxonomy to categorize and summarize existing solutions, offering a meta-analysis of the literature and outlining general trends in the field. This comprehensive survey serves as a valuable resource for researchers.


Scientists Crack the Code of the Perfect Cacio e Pepe

2025-01-04

A team of scientists delved into the culinary arts, specifically the classic Italian dish Cacio e Pepe, to uncover the secrets behind its perfect creamy texture. Their research revealed starch concentration as the key factor influencing sauce stability. Starch levels below 1% (relative to cheese mass) lead to clumping, a phenomenon dubbed the "Mozzarella Phase," resulting in a separated and unpleasant sauce. The study also explored the impact of cheese-to-water ratios at a fixed starch level, observing a lower critical solution temperature and developing a minimal effective free-energy model to explain it. Ultimately, they presented a scientifically optimized recipe guaranteeing consistently flawless Cacio e Pepe.
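
The 1% threshold is simple arithmetic on masses. The helper below is a minimal sketch that checks a starch-to-cheese ratio against that threshold only; the function names and sample quantities are illustrative, and it does not encode the study's optimized recipe.

```python
CLUMPING_THRESHOLD = 0.01  # 1% starch relative to cheese mass, as stated above


def starch_fraction(starch_g: float, cheese_g: float) -> float:
    """Starch mass as a fraction of cheese mass."""
    if cheese_g <= 0:
        raise ValueError("cheese mass must be positive")
    return starch_g / cheese_g


def risks_clumping(starch_g: float, cheese_g: float) -> bool:
    """True when the starch fraction falls below the 1% threshold."""
    return starch_fraction(starch_g, cheese_g) < CLUMPING_THRESHOLD


if __name__ == "__main__":
    print(risks_clumping(1.0, 200.0))  # 0.5% starch
    print(risks_clumping(5.0, 200.0))  # 2.5% starch
```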


Reproducing OpenAI's o1: A Roadmap from a Reinforcement Learning Perspective

2025-01-03

A new paper explores the path to reproducing OpenAI's enigmatic model, o1, from a reinforcement learning perspective. Researchers argue o1's powerful reasoning isn't due to a single technique, but rather the synergy of four key components: policy initialization, reward design, search, and learning. Policy initialization equips the model with human-like reasoning; reward design provides dense and effective signals guiding search and learning; search generates high-quality solutions during training and testing; learning utilizes data from search to improve the policy, ultimately achieving better performance. This paper offers valuable insights into understanding and reproducing o1, providing new avenues for LLM development.


4.5 Million Fake GitHub Stars: A Shadowy Popularity Contest

2025-01-02

A new study reveals 4.5 million suspected fake stars on GitHub, primarily used to promote short-lived malware repositories disguised as pirated software, game cheats, or cryptocurrency bots. Researchers developed StarScout, a tool to detect anomalous starring behavior, and found a rapid surge in fake star activity since 2024. Fake stargazers don't differ significantly from average users in profile characteristics, but their activity patterns are highly abnormal. Fake stars offer short-term promotional benefits to a repository but become a burden in the long term. The findings have implications for platform moderators, open-source practitioners, and supply chain security researchers.

Tech

TinyStories: Can Small Language Models Still Tell Coherent English Stories?

2025-01-02

Researchers introduce TinyStories, a synthetic dataset of short stories using only vocabulary understood by typical 3-4 year olds, generated by GPT-3.5 and GPT-4. They demonstrate that LMs trained on TinyStories, even those with fewer than 10 million parameters and simple architectures (a single transformer block), can generate fluent, coherent multi-paragraph stories exhibiting surprisingly good grammar and reasoning. This challenges the notion that coherent text generation requires massive models and complex architectures, and introduces a novel evaluation paradigm using GPT-4 to grade generated stories like a human teacher, overcoming limitations of standard benchmarks.


Activation Engineering: Manipulating Personality Traits in LLMs

2024-12-31

A paper on arXiv explores a novel method for identifying and manipulating personality traits in large language models (LLMs) using 'activation engineering'. Inspired by prior research on LLM refusal and steering, the researchers propose a technique to adjust activation directions linked to personality traits, enabling dynamic LLM personality fine-tuning. This work contributes to a better understanding of LLM interpretability while also raising crucial ethical considerations.
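
One common way to steer along an activation direction, sketched here as an assumption rather than as the paper's exact procedure, is to take the difference between mean activations on prompts that show a trait and prompts that do not, then add a scaled copy of that direction to a hidden state. The function names, the toy vectors, and the `strength` parameter are all hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def trait_direction(with_trait, without_trait):
    """Difference of mean activations between the two prompt sets."""
    pos = mean_vector(with_trait)
    neg = mean_vector(without_trait)
    return [p - q for p, q in zip(pos, neg)]


def steer(activation, direction, strength):
    """Shift a hidden state along the trait direction by `strength`."""
    return [h + strength * d for h, d in zip(activation, direction)]


if __name__ == "__main__":
    direction = trait_direction([[1.0, 2.0], [3.0, 2.0]], [[0.0, 0.0], [0.0, 2.0]])
    print(steer([0.0, 0.0], direction, 0.5))
```

A positive `strength` pushes the hidden state toward the trait and a negative one pushes it away.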


Beyond Gradient Averaging in Parallel Optimization: Improved Robustness through Gradient Agreement Filtering

2024-12-30

This paper introduces Gradient Agreement Filtering (GAF), a method that improves on gradient averaging in distributed deep learning optimization. Traditional methods average micro-batch gradients to compute a macro-batch gradient, but in later training stages micro-gradients are often orthogonal or negatively correlated, and averaging them can lead to overfitting. GAF reduces gradient variance by computing the cosine distance between micro-gradients and filtering out conflicting updates before averaging. Experiments on image classification benchmarks such as CIFAR-100 and CIFAR-100N-Fine show that GAF improves validation accuracy, even with smaller micro-batch sizes, by up to 18.2% over traditional approaches while reducing computational cost.
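
The filtering idea can be sketched in a few lines. This simplified version folds micro-gradients into a running average one at a time and skips any whose cosine distance to the running average exceeds a threshold. Always accepting the first micro-gradient, the `max_distance` default, and the function names are assumptions for illustration, not the paper's implementation.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; a zero vector is treated as orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def filtered_average(micro_grads, max_distance=0.97):
    """Average micro-gradients, skipping those that disagree with the running mean."""
    average = list(micro_grads[0])
    kept = 1
    for grad in micro_grads[1:]:
        if cosine_distance(average, grad) > max_distance:
            continue  # conflicting update: leave it out of the average
        kept += 1
        average = [a + (g - a) / kept for a, g in zip(average, grad)]
    return average, kept


if __name__ == "__main__":
    grads = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
    print(filtered_average(grads))
```

In the sample call, the opposing and orthogonal gradients are dropped and only the two agreeing ones contribute to the average.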


Evaluating LLMs' Code Generation Capabilities: Introducing MultiCodeBench

2024-12-30

AI-powered programming assistants based on code Large Language Models (LLMs) have become increasingly prevalent, significantly boosting developer productivity. However, existing code generation benchmarks primarily focus on general-purpose scenarios, leaving the performance of LLMs in specific application domains largely unknown. This paper introduces MultiCodeBench, a new benchmark comprising 2,400 programming tasks across 12 popular software development domains and 15 programming languages. Experiments on eleven mainstream LLMs reveal their code generation performance across different domains, offering practical insights for developers in selecting LLMs and guidance for model developers to enhance domain-specific code generation capabilities.

Development Code Generation

Breakthrough in Evaluating Large Language Models for Unit Test Generation

2024-12-30

Researchers conducted a comprehensive evaluation of the potential of Large Language Models (LLMs) in automating unit test generation. They compared the performance of five open-source LLMs against the closed-source GPT-4 and the traditional tool Evosuite across 17 Java projects, investigating the impact of different prompting strategies. The study found that open-source LLMs offer advantages in data privacy and outperform in certain tasks, but also revealed limitations in LLM-based unit test generation. This research provides valuable insights to guide future applications of LLMs in this area.

Development Unit Testing

LLM Identity Confusion: A Crisis of Trust Emerges

2024-12-30

A recent study reveals widespread "identity confusion" in Large Language Models (LLMs). Researchers found that over 25% of LLMs exhibit misrepresentation of their origins or identities, primarily stemming from model hallucinations rather than replication or reuse. This identity confusion significantly erodes user trust, especially in critical tasks like education and professional use, surpassing the negative impact of logical errors. The findings highlight the systemic risks posed by LLM identity confusion and call for greater attention to model reliability and trustworthiness.
