Radix Sort Beats Hash Tables: A Performance Showdown for Counting Unique Values

2025-09-11

For counting unique values in a large array of mostly-unique uint64s, a well-tuned radix sort is typically faster than a hash table. By using memory bandwidth efficiently and fusing hashing with the sorting passes, radix sort achieves up to a 1.5x speedup over tuned hash tables on datasets larger than 1MB, and up to 4x over Rust's excellent Swiss Table hash tables. Radix sort's performance degrades on non-uniform data distributions, however, so the data is first pre-processed with an invertible hash function to maintain efficiency. The article benchmarks both approaches under varying data sizes and access frequencies, and discusses how to choose between them in real-world applications.
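
A minimal sketch of the idea summarized above, not the article's tuned implementation: keys are passed through an invertible mixer (here the MurmurHash3 64-bit finalizer, chosen as an assumption for illustration), sorted with a plain byte-wise LSD radix sort, and then adjacent distinct keys are counted. The function names are hypothetical, and the hashing is done in a separate pass rather than fused into the sort as the article describes.

```rust
/// Invertible 64-bit mixer (MurmurHash3 finalizer). Each xor-shift and each
/// multiplication by an odd constant is a bijection on u64, so distinct
/// inputs always map to distinct outputs and the unique count is preserved.
fn mix64(mut x: u64) -> u64 {
    x ^= x >> 33;
    x = x.wrapping_mul(0xff51_afd7_ed55_8ccd);
    x ^= x >> 33;
    x = x.wrapping_mul(0xc4ce_b9fe_1a85_ec53);
    x ^= x >> 33;
    x
}

/// Least-significant-digit radix sort, 8 bits per pass, 8 passes for u64.
fn radix_sort(keys: &mut Vec<u64>) {
    let mut scratch = vec![0u64; keys.len()];
    for pass in 0..8 {
        let shift = pass * 8;
        // Histogram of the current byte.
        let mut counts = [0usize; 256];
        for &k in keys.iter() {
            counts[((k >> shift) & 0xff) as usize] += 1;
        }
        // Exclusive prefix sum turns counts into starting offsets.
        let mut offset = 0;
        for c in counts.iter_mut() {
            let n = *c;
            *c = offset;
            offset += n;
        }
        // Stable scatter into the scratch buffer.
        for &k in keys.iter() {
            let bucket = ((k >> shift) & 0xff) as usize;
            scratch[counts[bucket]] = k;
            counts[bucket] += 1;
        }
        std::mem::swap(keys, &mut scratch);
    }
}

/// Counts distinct values: hash, sort, then count boundaries between runs.
fn count_unique(values: &[u64]) -> usize {
    if values.is_empty() {
        return 0;
    }
    let mut keys: Vec<u64> = values.iter().map(|&v| mix64(v)).collect();
    radix_sort(&mut keys);
    1 + keys.windows(2).filter(|w| w[0] != w[1]).count()
}
```

Because the mixer is a bijection, skewed inputs (for example, keys that differ only in their low bits) are spread across all 256 buckets in every pass without changing the number of distinct values.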

Development

Rotten Tomatoes Inflation: A Hollywood Secret?

2025-08-20

The observation that nearly every film on Rotten Tomatoes now seems to be labeled "Certified Fresh" sparked suspicion. Data analysis reveals a significant rise in Rotten Tomatoes' average score over the past decade, coinciding with Fandango's acquisition of the site. The author suggests Rotten Tomatoes may be inflating scores by expanding its reviewer pool to include critics who give more favorable reviews. While this might boost box office numbers in the short term, it is detrimental to the long-term health of the film industry.

LearnLM Team Acknowledgements: The Minds Behind the Model

2025-09-19

The Google Research LearnLM team published an acknowledgement post, expressing gratitude to everyone who contributed to their work. The post lists numerous contributors, ranging from researchers to executive sponsors, highlighting the collaborative nature of the project's success. The progress made on LearnLM is a testament to the collective effort of these individuals.

AI

Marks & Spencer Hit by Cyberattack, Customer Data Breached

2025-05-14

UK retail giant Marks & Spencer confirmed a cyberattack last month resulted in the theft of customer personal information. Stolen data includes names, dates of birth, addresses, email addresses, phone numbers, household information, and online order histories. Marks & Spencer has reset online account passwords, but some stores remain disrupted with empty shelves. The ransomware gang DragonForce reportedly claimed responsibility, and other UK retailers like the Co-op and Harrods were also targeted. The UK's National Cyber Security Centre is investigating.

Tech

Talanoa: A Decade-Long Vision, Finally Realized

2025-04-30

John Martin, a web engineer, conceived the idea for Talanoa, an email application designed like a conversation, back in 2014. Revisiting the idea annually, he finally launched it after realizing no similar product existed in the market. This story highlights the dedication and persistence needed to bring a vision to life and fill a market gap.

Development

The Parnassus Plays: A Hilarious Look at Elizabethan Academia and the Job Market

2025-03-12

The Parnassus Plays, a trilogy of Elizabethan comedies written between 1598 and 1602, offer a satirical look at university life and the struggles of graduates entering the workforce. Following two students, Philomusus and Studioso, the plays use allegory and realistic portrayals to depict their academic journey and subsequent challenges in finding meaningful employment. The plays are rife with allusions to Shakespeare and other contemporary writers, reflecting the intellectual climate of the time and the tensions between university-trained scholars and professional playwrights. Despite the mystery surrounding their authorship, the plays remain a valuable insight into Elizabethan society and the anxieties of ambitious young scholars.

Running GPT-2 on the GPU with WebGL Shaders: A Hacker's Journey

2025-05-27

This Hacker News hit details the author's experience implementing GPT-2 using WebGL and shaders on the GPU. The article explores the origins and evolution of general-purpose GPU programming, comparing traditional graphics APIs (like OpenGL) with compute APIs (CUDA and OpenCL). The author uses textures and framebuffers as a data bus and fragment shaders as compute kernels to perform neural network operations such as matrix multiplication and GELU activation. While acknowledging limitations in shared memory, texture size, and precision, the article shows how graphics processing techniques can be repurposed for general-purpose computation. The code is available on GitHub.

Development

The Rise and Fall (and Rise?) of Literary Criticism

2025-05-29

This essay explores the current state of literary criticism, tracing its lineage back to Henry James's sharp critiques of authors like Dickens. James believed that good criticism stems from a deep understanding and unique perspective on the work, not from superficial praise. The article points out that today's book reviews often lack depth and critical thinking, which not only harms the literary works themselves but also hinders further literary development. The author calls for a return to the Jamesian critical spirit: to examine works with professionalism and a unique perspective, thereby promoting literary prosperity.

Misc novel art

Near 100% GPU Utilization for Embedding Millions of Documents with Daft

2025-08-17

The Daft team achieved near-100% GPU utilization while embedding millions of text documents using the Qwen3-Embedding-0.6B model. This blog post details a three-step data pipeline: text chunking, embedding generation, and distributed processing, providing code examples. They subsequently improved performance by 3x without relying on maximum GPU utilization.

Modern C Updated: Free Edition Now Available with Full C23 Support

2025-03-27

The free version of the updated Modern C is now available! This release focuses on complete support for the new C23 standard. Key improvements include enhancements to integer types (the new _BitInt(N) type, two new standard headers, 128-bit type support), a nullptr constant, attribute annotations, enhanced type-generic programming (auto and typeof type inference), default initialization, and constexpr. New chapters cover compound literals, lambdas, internationalization, and robust error handling. An appendix and a temporary include header are also included to ease the transition to C23.

Development C23 standard

Rust's Ownership System: Preventing Memory Errors at Compile Time

2025-02-15

Rust prevents memory management errors at compile time through its ownership system and RAII (Resource Acquisition Is Initialization). Each value has only one owner; ownership can be moved between variables, but a given object cannot be mutably referenced in more than one place at a time. Example code demonstrates ownership transfer: after the ownership of variable `a` is moved to `_b`, accessing `a` again results in a compile-time error, ensuring memory safety. This contrasts with traditional garbage collection; Rust guarantees memory safety through compile-time checks, resulting in improved performance and reliability.
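
The rules summarized above can be sketched as follows. The variable names `a` and `_b` follow the summary; the helper functions `consume` and `ownership_demo` are hypothetical names introduced only for this illustration, and the lines that the compiler would reject are left as comments so the sketch still builds.

```rust
/// Takes ownership of `s`. The String's heap buffer is freed (RAII) when
/// `s` goes out of scope at the end of this function.
fn consume(s: String) -> usize {
    s.len()
}

/// Walks through a move and an exclusive mutable borrow.
fn ownership_demo() -> (usize, String) {
    let a = String::from("hello");
    let _b = a; // ownership of the String moves from `a` to `_b`
    // println!("{}", a); // rejected at compile time: use of moved value `a`

    let mut c = String::from("hi");
    {
        let r = &mut c; // the only mutable reference to `c` in this scope
        // let r2 = &mut c; // rejected: `c` is already mutably borrowed by `r`
        r.push('!');
    } // `r` ends here, so `c` can be used directly again

    let len = consume(_b); // `_b` is moved into `consume` and dropped there
    (len, c)
}
```

Calling `ownership_demo()` returns the length of the moved string together with the mutated `c`; uncommenting either rejected line turns the mistake into a compile error instead of a runtime memory bug.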

Development Ownership

LLaMA-Factory: A Unified Framework for Efficient Fine-tuning of 100+ LLMs

2025-09-19

LLaMA-Factory is an open-source framework that enables efficient fine-tuning of over 100 large language models (LLMs), including LLaMA, LLaVA, and Mistral. It integrates various fine-tuning methods (like LoRA, QLoRA, and OFT), offers scalable resources and advanced algorithms, and covers a wide range of tasks such as multi-turn dialogue and image understanding. LLaMA-Factory also supports various inference acceleration techniques and provides a user-friendly interface and API. Constantly updated with support for the latest models and techniques, LLaMA-Factory aims to provide developers with a convenient and efficient tool for LLM fine-tuning.

Development Open-source Framework

H5N1 Avian Flu: A Deep Dive into the Pandemic Threat

2025-01-01

This article delves into the potential pandemic threat posed by the H5N1 avian flu virus. The virus has already infected birds, cows, and mink, and has now been detected in pigs. While human cases remain relatively low, the author, drawing on epidemiological models and expert forecasts, assesses the probability of a pandemic in the next year (5%), and the potential mortality rate (ranging from comparable to a normal seasonal flu to resembling the 1918 Spanish flu). The article also discusses strategies for responding to a potential pandemic and highlights the economic impact on agriculture.

Gödel Prize Awarded for Breakthrough in Explicit Two-Source Extractors

2025-06-09

The 2025 Gödel Prize was awarded to Eshan Chattopadhyay and David Zuckerman for their groundbreaking paper, "Explicit two-source extractors and resilient functions," published at STOC 2016 and in the Annals of Mathematics in 2019. This work significantly improves the construction of Ramsey graphs, achieving an exponential bound far exceeding previous methods. The result is lauded for its implications in derandomization and its surprising application to Ramsey theory, sparking debate about its dual significance in pseudorandomness and combinatorics.

Honda's Ohio EV Hub: Flexible Manufacturing for the Future

2025-02-02

Honda is investing over $1 billion to transform its Ohio facilities into a flexible EV production hub, capable of producing EVs, hybrids, and gasoline cars on the same lines. Starting late 2025, the hub will begin production of the Acura RSX EV, followed by Honda 0 Series SUVs and sedans, and the Sony Honda Mobility Afeela 1. This innovative approach allows for efficient manufacturing of both ICE and EV vehicles, enhancing competitiveness and improving overall production efficiency. The flexible model ensures Honda’s future preparedness for evolving market demands.

arXivLabs: Experimenting with Community Collaboration

2025-08-27

arXivLabs is a framework enabling collaborators to develop and share new arXiv features directly on the website. Individuals and organizations involved uphold arXiv's values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only partners with those who share them. Got an idea for a project that would benefit the arXiv community? Learn more about arXivLabs.

Tech

Solar Orbiter Captures Unprecedented Full Sun Image

2025-04-27

The Solar Orbiter mission, a joint effort between ESA and NASA, has achieved a stunning feat. From a distance of 77 million kilometers, its Extreme Ultraviolet Imager (EUI) captured the most detailed and comprehensive image of the Sun ever taken. Composed of 200 individual images, the resulting picture reveals intricate details of the solar corona, including bright coronal loops, darker filaments and prominences, and the complex magnetic field structures within the Sun's atmosphere. This breakthrough provides invaluable data for scientists studying solar activity and space weather.

Massive Dataset CommonPool Leaks Sensitive Personal Information

2025-07-31

A new study reveals that CommonPool, a massive dataset containing 12.8 billion image-text pairs, harbors vast amounts of sensitive personal information. This includes credit cards, driver's licenses, passports, birth certificates, resumes, and even sensitive details like medical history and race. Used to train numerous AI models, including Stable Diffusion and Midjourney, CommonPool's over 2 million downloads mean this private information is likely widely disseminated, posing significant privacy risks. Researchers urge greater attention to data privacy and ethical considerations when building large-scale datasets.

AI dataset

Samuel Pepys' Diary: A Timeless Bestseller

2025-06-11

Samuel Pepys' diary was first published in June 1825 and became an instant success. Newspapers featured reviews quoting memorable passages, such as his descriptions of the Great Fire of London, his new wig, and his first cup of tea. Subsequent editions followed, and by the end of the 19th century, it was celebrated as a classic of British history and literature. Today, Pepys is a star of museum exhibits and historical novels, and excerpts from his diary are used to introduce students to the Restoration period and even to history itself; six-year-olds in England, following the National Curriculum, can recount how Pepys buried his expensive cheese to save it from the fire.

Leningrad's Forbidden Garden: Botanists' Sacrifice During the Siege

2025-02-04

During the brutal 900-day siege of Leningrad in WWII, a group of botanists at the All-Union Institute of Plant Breeding made a harrowing choice: starve rather than consume their invaluable seed bank. Facing unimaginable hunger and death, they prioritized preserving the world's most comprehensive collection of plant specimens, a potential lifeline for future generations. Their story raises profound questions about the ethics of scientific progress versus immediate human needs, the value of preservation, and the complex legacy of sacrifice during wartime. Their actions ultimately contributed to the development of high-yield crops, but their decision to prioritize the future over present survival remains ethically complex and deeply moving.

Deno Fights Oracle's JavaScript Trademark: A Crucial Discovery Phase

2025-09-19

Deno, a JavaScript runtime, is battling Oracle over the "JavaScript" trademark. After filing a cancellation petition following a widely signed open letter, they've reached the crucial discovery phase. Facing expensive litigation, Deno launched a GoFundMe campaign to fund professional surveys, expert witnesses, and legal filings to prove "JavaScript" is a generic term, not an Oracle brand. The outcome will determine if trademarks can be used to claim ownership of generic terms and impact the future of open-source development.

Development

OpenEarable FAQ: Your Questions Answered

2025-05-03

This FAQ covers common questions about OpenEarable, an open-source customizable wireless earbud. It addresses compatibility (Android LE Audio support only), firmware updates (via J-Link debugger), battery charging (45-minute charge time), connection troubleshooting (check device drivers, permissions, and Chrome version), and microSD card requirements (exFAT format, Class 10/A30 recommended). The BLE range is up to 10 meters.

Knowledge Distillation: How Small AI Models Can Challenge the Giants

2025-07-24

DeepSeek's R1 chatbot, released earlier this year, caused a stir by rivaling the performance of leading AI models from major companies, but at a fraction of the cost and computing power. This led to accusations that DeepSeek used knowledge distillation, a technique potentially involving unauthorized access to OpenAI's o1 model. However, knowledge distillation is a well-established AI technique, dating back to a 2015 Google paper. It involves transferring knowledge from a large 'teacher' model to a smaller 'student' model, significantly reducing costs and size with minimal performance loss. This method has become ubiquitous, powering improvements to models like BERT, and continues to show immense potential across various AI applications. The controversy highlights the power and established nature of this technique, not its novelty.
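
As a rough illustration of the teacher-to-student transfer described above, the sketch below computes a temperature-softened distillation loss: the KL divergence between the teacher's and the student's softened output distributions, scaled by the square of the temperature as in the classic soft-target formulation. The function names and the use of plain `f64` slices for logits are assumptions made for this example; it says nothing about how DeepSeek or any specific model was actually trained.

```rust
/// Softmax with temperature `t`. A higher `t` gives a softer distribution.
/// Subtracting the maximum logit first keeps the exponentials stable.
fn softmax_with_temperature(logits: &[f64], t: f64) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&z| ((z - max) / t).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// KL(teacher || student) over the softened distributions, scaled by t^2
/// so the loss magnitude stays comparable across temperatures.
fn distillation_loss(teacher_logits: &[f64], student_logits: &[f64], t: f64) -> f64 {
    let p = softmax_with_temperature(teacher_logits, t);
    let q = softmax_with_temperature(student_logits, t);
    let mut kl = 0.0;
    for (pi, qi) in p.iter().copied().zip(q.iter().copied()) {
        if pi > 0.0 {
            kl += pi * (pi / qi).ln();
        }
    }
    kl * t * t
}
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge; in practice it is usually combined with an ordinary cross-entropy term on the true labels.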

Agent Orange's Lingering Legacy: Vietnam's Struggle for Cleanup Amidst US Aid Cuts

2025-04-28

Decades after the Vietnam War ended, the devastating effects of Agent Orange continue to plague millions of Vietnamese people. While the US began providing funding for cleanup efforts in the mid-2000s, Trump-era cuts to foreign aid have cast a shadow over these crucial projects, leaving millions of victims in a precarious situation. The cleanup faces funding shortages and staff reductions, while the science surrounding the long-term health impacts remains incomplete. The article highlights the plight of individuals like Nguyen Thanh Hai, showcasing the enduring suffering caused by Agent Orange and the profound impact of shifting US policy on the Vietnamese people.

Depot: Revolutionizing Software Builds, Seeking Technical Content Writer

2025-07-23

Rapidly growing software build platform Depot is seeking a technical content writer to help tell the story of how it accelerates build times and improves developer productivity. Depot has redefined how teams build software locally and in CI, making speed a first-class feature. The ideal candidate will be a strong technical writer capable of producing long-form technical blog posts, guides, benchmarks, and product explainers, working closely with engineers to translate technical details into easily digestible content. This is a unique opportunity to shape the company's technical content strategy and is perfect for technical writers looking to make a significant impact in a fast-paced startup environment.

Development software build

Psilocybin Shows Promise in Treating Depression and Anxiety in Cancer Patients

2025-07-18

A double-blind, crossover trial investigated the effects of psilocybin, a classic hallucinogen, on 51 cancer patients experiencing life-threatening diagnoses and symptoms of depression and/or anxiety. High-dose psilocybin significantly reduced clinician- and self-rated depression and anxiety, improving quality of life, life meaning, and optimism while decreasing death anxiety. These positive effects were sustained at the 6-month follow-up, with approximately 80% of participants showing clinically significant improvements. The study highlights the mediating role of mystical-type psilocybin experiences in achieving therapeutic outcomes.

AI Solves Factorio's Belt Balancer Conundrum

2024-12-30

This blog post details the author's journey in automating the design of Factorio's belt balancers, a notoriously complex problem. The author tackled the challenge with Mixed Integer Programming (MIP) and Constraint Programming SAT (CP-SAT) solvers. While the MIP model struggled with numerical instability for larger balancers, the CP-SAT approach, which discretizes flows and incorporates Beneš networks and memory optimization, successfully solved the design for a 16x16 balancer, a feat practically impossible by hand. The process highlights the crucial role of modeling techniques, algorithm selection, and optimization strategies in achieving efficient solutions.

California Takes Aim at Ultra-Processed Foods in School Meals

2025-03-27

California has introduced Assembly Bill 1264, the first US bill to phase out certain ultra-processed foods from school meals by 2032. The bill defines ultra-processed foods and tasks scientists with identifying and removing harmful products. This initiative, supported by both Democrats and Republicans, addresses concerns about the health impacts of these foods, including obesity and ADHD. It follows California's previous bans on certain food dyes and chemicals, and mirrors similar legislation emerging in other states, reflecting a growing national focus on food safety and children's health.

LLM Benchmark: Price vs. Performance Analysis

2025-06-05

This report benchmarks large language models across various domains, including reasoning, science, mathematics, code generation, and multilingual capabilities. Results reveal significant performance variations across tasks, with strong performance in scientific and mathematical reasoning but relatively weaker performance in code generation and long-context processing. The report also analyzes pricing strategies and shows that model performance doesn't correlate linearly with price.

NYC Subway Uses Pixel Phones and AI to Revolutionize Track Inspections

2025-02-28

The MTA is testing TrackInspect, a revolutionary system using Google Pixel phones mounted on subway cars. The phones' microphones and motion sensors collect vibration and sound data, which is then AI-analyzed on Google Cloud to pinpoint track defects. The pilot program yielded 335 million sensor readings, and AI accurately identified 92% of defects confirmed by human inspectors. This innovative approach promises fewer delays, faster repairs, and a more reliable subway system, potentially transforming track inspections across the network.

Tech