DeepSeek Chatbot: Data Security Concerns Spark Alarm

2025-02-06

Security researchers have discovered that the website of DeepSeek, a Chinese AI company whose chatbot became the most downloaded app in the US, contains code that could send user login information to China Mobile, a state-owned telecommunications company banned from operating in the US. The code, found within DeepSeek's web login page, appears to connect to China Mobile's infrastructure and seems integrated into account creation and login processes. While DeepSeek's privacy policy acknowledges data storage in China, this discovery reveals a closer-than-previously-known link to the Chinese state. This raises significant national security concerns and underscores the growing worry about data security and privacy risks posed by Chinese-controlled digital services.

Tech

Rust's Long War for the Linux Kernel

2025-02-09

Rust is making inroads into the Linux kernel, but the transition will be a long and contentious one. While Rust offers significant advantages in memory safety and is backed by companies like Google, its adoption faces strong resistance within the kernel community. Concerns about its steep learning curve and integration challenges with existing C code have sparked heated debates, even described as a “religious war.” However, proponents argue that Rust improves kernel stability and security, attracting more developers. Ultimately, Rust's complete replacement of C depends on technological maturity and community consensus.

Development

OpenAI's o3-mini: A Budget-Friendly LLM Powerhouse

2025-02-01

OpenAI has released o3-mini, a new language model that excels on the Codeforces competitive programming benchmark, significantly outperforming GPT-4o and o1. It is not superior on every metric, but its low price ($1.10/million input tokens, $4.40/million output tokens) and exceptionally high output limit (100,000 tokens) make it highly competitive. OpenAI plans to integrate it into ChatGPT for web search and summarization. Support is already available in LLM 0.21, though API access is currently limited to Tier 3 users (at least $100 spent on the API). o3-mini offers developers a powerful and cost-effective LLM option.
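
To make the pricing concrete, the cost of a single request follows directly from the two quoted rates. The sketch below is a minimal illustration in Python; the function name and the example token counts are assumptions made for this example and do not come from OpenAI.

```python
# Prices quoted in the summary above, in USD per million tokens.
INPUT_USD_PER_MILLION = 1.10
OUTPUT_USD_PER_MILLION = 4.40


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical request: a 10,000-token prompt that uses the full
# 100,000-token output limit.
cost = request_cost_usd(10_000, 100_000)
print(f"${cost:.3f}")  # about $0.45, almost all of it for output tokens
```

Because output tokens cost four times as much as input tokens, long generations dominate the bill even when the prompt is large.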

AI

How I Got 100% Off My Train Travel in the UK

2025-03-19

Frequent UK train delays inspired a clever money-saving scheme. By using strike action, planned engineering works, and bad weather to predict which trains would be delayed, the author consistently received full refunds, in effect getting free long-distance train travel. The 'Train Delay Prediction Paradigm' (TDPP) involves monitoring public information to pick journeys with the highest chance of delay and then claiming refunds. While the scheme is effective, the author advises using the delay time to get work done and preparing for potentially long journeys.

Pinterest Improves Embedding-Based Retrieval for Homefeed Recommendations

2025-02-14

Pinterest's engineering team significantly improved its embedding-based retrieval system for personalized and diverse content recommendations on the Homefeed. They achieved this through advanced feature crossing techniques (MaskNet and DHEN frameworks), pre-trained ID embeddings, and a revamped serving corpus with time-decayed summation. Furthermore, they explored cutting-edge methods like multi-embedding retrieval and conditional retrieval to cater to diverse user intents, resulting in increased user engagement and saves.

Genomics Reveals the Origin of Indo-European Languages: An Ancient Secret from the Lower Volga

2025-02-10

A groundbreaking genomics study has unearthed the surprising origins of the Indo-European language family. Researchers discovered that an ancient population from the Caucasus-Lower Volga region was the ultimate source of Indo-European languages, sharing close connections with the later Yamnaya culture and with Anatolian language speakers. The Yamnaya culture spread Indo-European languages across Europe and into the Indian subcontinent through population expansion, and its distinctive cultural traditions, such as kurgan burials, also stem from the Caucasus-Lower Volga people. This research not only reshapes our understanding of Indo-European origins but also showcases the immense potential of ancient DNA technology in tracing human history and cultural diffusion.

arXivLabs: Experimental Projects with Community Collaboration

2025-03-08

arXivLabs is a framework enabling collaborators to develop and share new arXiv features directly on the website. Individuals and organizations involved uphold arXiv's values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only partners with those who share them. Have an idea to enhance the arXiv community? Learn more about arXivLabs.

Development

Spy Novels and Cryptanalysis: A Literary Look at Sigint

2025-03-10

This article explores the portrayal of cryptanalysis in spy fiction. The author argues that directly describing the cryptanalytic process is difficult to make engaging for readers; successful works focus on characters and plot, not technical details. Using John Buchan and Dorothy L. Sayers as examples, the author analyzes how they cleverly handle cryptanalytic subplots. The article also mentions a few other British novels that touch on intelligence agencies and cryptography, notably recommending Michael Frayn's *The Tin Men* as a satirical take on GCHQ and a pioneering work on AI.

Pocket-Sized AI Inference: Introducing the Coral USB Accelerator

2025-02-20

The Coral USB Accelerator brings high-speed machine learning inference to your desktop. Simply plug this tiny (65mm x 30mm) device into your USB port (supports Windows, macOS, and Debian Linux, including Raspberry Pi) to unleash the power of its 4 TOPS Edge TPU coprocessor. Boasting impressive power efficiency (2 TOPS per watt), it can run state-of-the-art models like MobileNet v2 at nearly 400 FPS. Leveraging TensorFlow Lite, it simplifies model deployment.

Hardware AI accelerator

Mako: Blazing Fast, Zero-Config Bundler Redefines Frontend Development

2025-03-09

Mako is a Rust-based frontend bundler boasting zero configuration, exceptional speed, and production-ready stability. It handles TypeScript, Less, CSS, React, and more out-of-the-box without requiring loaders or plugins. Used extensively at Ant Group and rigorously tested across thousands of projects and npm packages, Mako ensures compatibility. Features include Hot Module Replacement (HMR) with React Fast Refresh, built-in code splitting, and module concatenation for optimized performance and developer experience.

Development frontend bundler

Turning Google Sheets into Handy Web Apps: A Programmer's Tale

2024-12-31

An Ars Technica reporter shares his journey of transforming simple Google Sheets into phone-friendly web apps using Glide. Initially created to streamline takeout ordering, the app manages local restaurant information with efficient search and filtering. He expanded his approach to create apps for recipes and pantry items, improving daily life. The article showcases the power of no-code tools and how simple solutions can solve real-world problems, highlighting ingenuity and a quest for better living.

Development

Frink: A Practical Calculator and Programming Language

2025-03-21

Frink is a powerful calculating tool and programming language designed to simplify physical calculations, ensure accurate answers, and provide a truly useful tool. It tracks units of measure (feet, meters, kilograms, watts, etc.) throughout calculations, allowing transparent mixing of units and verification of results. Frink also boasts a large database of physical constants, supports multiple languages, advanced mathematical functions, unit conversions, date/time math, regular expressions, and graphics, even supporting object-oriented programming and Java code calls. It runs on various operating systems and devices and auto-updates via Java Web Start.
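
The unit-tracking idea can be illustrated outside Frink itself. The sketch below is written in Python, not Frink syntax, and is only a toy model of the concept: every name in it (`Quantity`, `amount`, `in_units_of`, the unit constants) is hypothetical. Each quantity carries its dimensions through arithmetic, so mixed units combine transparently and dimensionally invalid sums are rejected.

```python
class Quantity:
    """A magnitude in SI base units plus the exponent of each base dimension."""

    def __init__(self, value, dims):
        self.value = value
        # Drop zero exponents so that equal dimensions compare equal.
        self.dims = {name: exp for name, exp in dims.items() if exp != 0}

    def _combine(self, other, sign):
        dims = dict(self.dims)
        for name, exp in other.dims.items():
            dims[name] = dims.get(name, 0) + sign * exp
        return dims

    def __add__(self, other):
        if self.dims != other.dims:
            raise TypeError(f"cannot add {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        return Quantity(self.value * other.value, self._combine(other, +1))

    def __truediv__(self, other):
        return Quantity(self.value / other.value, self._combine(other, -1))

    def in_units_of(self, unit):
        """Express this quantity as a plain number of `unit`."""
        if self.dims != unit.dims:
            raise TypeError(f"cannot convert {self.dims} to {unit.dims}")
        return self.value / unit.value


def amount(number, unit):
    """Build a quantity such as amount(3, FOOT)."""
    return Quantity(number * unit.value, unit.dims)


METER = Quantity(1.0, {"length": 1})
FOOT = Quantity(0.3048, {"length": 1})   # one foot is exactly 0.3048 m
SECOND = Quantity(1.0, {"time": 1})

total = amount(3, FOOT) + amount(1, METER)   # mixed units, same dimension
print(total.in_units_of(METER))              # about 1.9144 meters
speed = total / amount(2, SECOND)            # dimensions: length / time
```

Adding `amount(1, METER)` to `amount(1, SECOND)` raises a `TypeError`, which is the kind of result verification the summary describes.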

Development unit tracking

ClickHouse Lock Contention: A Year-Long Performance Bottleneck

2025-03-21

Tinybird experienced a year-long puzzle of extremely low CPU utilization in one of their ClickHouse clusters during peak loads. The root cause was identified as Context lock contention. By adding a `ContextLockWaitMicroseconds` metric to monitor lock wait times and redesigning the Context locking mechanism – replacing a single global mutex with read-write mutexes – performance significantly improved. The article details using Clang's thread safety analysis to debug and resolve concurrency issues, along with benchmark results showing a 3x increase in QPS and substantial CPU utilization gains.
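
The fix described above, replacing one exclusive mutex with a read-write lock and measuring time spent waiting, can be sketched in a few lines. ClickHouse itself is written in C++; the Python below is only an illustration of the pattern, and the `ReadWriteLock` and `Context` classes, their methods, and the `wait_microseconds` counter (loosely modeled on the `ContextLockWaitMicroseconds` metric) are all hypothetical.

```python
import threading
import time


class ReadWriteLock:
    """Allows many concurrent readers or one exclusive writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self.wait_microseconds = 0  # total time callers spent blocked

    def _wait_until(self, ready):
        # Called with self._cond held; records how long the caller was blocked.
        start = time.perf_counter()
        while not ready():
            self._cond.wait()
        self.wait_microseconds += int((time.perf_counter() - start) * 1_000_000)

    def acquire_read(self):
        with self._cond:
            self._wait_until(lambda: not self._writer)
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._wait_until(lambda: not self._writer and self._readers == 0)
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Context:
    """Shared settings: reads proceed in parallel, writes are exclusive."""

    def __init__(self):
        self.lock = ReadWriteLock()
        self._settings = {}

    def get(self, key):
        self.lock.acquire_read()
        try:
            return self._settings.get(key)
        finally:
            self.lock.release_read()

    def set(self, key, value):
        self.lock.acquire_write()
        try:
            self._settings[key] = value
        finally:
            self.lock.release_write()
```

With a single mutex every `get` would serialize behind every other `get`; here only writers exclude other threads. This simple version can starve writers under a constant stream of readers, a trade-off a production lock would need to address.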

Development

SpaceX Starship V2 Test Failure: Design Flaws Cause Delay

2025-03-12

Anonymous sources suggest that parts of SpaceX's Starship will require a major redesign after its break-up shortly after stage separation on its last two test flights. The issues stem from fundamental miscalculations in the design of Starship V2, specifically within the fuel lines, engine wiring, and power unit, requiring urgent rework. The fate of S35 and S36 is unclear, with potential for revision or scrapping. Production of subsequent ships may be paused until design issues are resolved. Leaks suggest the next test flight is delayed until after June. However, the author believes the situation may not be as dire, as the issues seem localized and fixable. Furthermore, the FAA is no longer an obstacle, allowing SpaceX to lead the investigation and implement fixes.

Bocoup Goes Worker-Owned: Focusing on Public Interest Tech

2025-03-03

Software consultancy Bocoup has transitioned to a worker-owned cooperative, with each team member becoming a worker-owner. They're sharpening their focus on developing capture-resistant, privacy-preserving technology for the public good, continuing their commitment to interoperability, accessibility, and robust testing. Bocoup retains its existing corporate entity, meaning existing contracts remain unchanged, and they are committed to serving clients focused on public interest. They champion equal pay, four-day workweeks, and personal growth, aiming to build a more equitable model of prosperity.

Cinder JIT: Efficient Type Representation Using Bitsets and Semilattices

2025-03-11

The Cinder JIT compiler employs a clever type representation, treating types as sets (even lattices) and choosing a compact bitset representation. This article delves into how Cinder leverages bitsets and semilattice structures for efficient type information handling, covering basic type representation, type unions, and specialization. By encoding type information into bitsets, Cinder effectively represents type unions and allows for finer-grained type distinctions. Furthermore, Cinder introduces a specialization mechanism to track the specific value of individual objects, further improving compiler optimization efficiency. The article also discusses the Bottom type and details on generating the type lattice.
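
The core of the bitset idea fits in a few lines. The following Python sketch is not Cinder's code: the type names and bit positions are hypothetical, chosen only to show how set operations on types reduce to bitwise operations on integers.

```python
# Hypothetical type names and bit positions, for illustration only.
INT = 1 << 0
STR = 1 << 1
LIST = 1 << 2
NONE_TYPE = 1 << 3

BOTTOM = 0                            # the empty set: no value has this type
TOP = INT | STR | LIST | NONE_TYPE    # every type in this toy lattice
OPTIONAL_INT = INT | NONE_TYPE        # a union is just more bits set


def union(a: int, b: int) -> int:
    """Least upper bound (join): a value may be either type."""
    return a | b


def intersection(a: int, b: int) -> int:
    """Greatest lower bound (meet): a value must be both types."""
    return a & b


def is_subtype(a: int, b: int) -> bool:
    """True when every bit set in `a` is also set in `b`."""
    return (a & b) == a


print(is_subtype(INT, OPTIONAL_INT))    # True
print(is_subtype(OPTIONAL_INT, INT))    # False
print(intersection(INT, STR) == BOTTOM) # True: no value is both
```

Each check is a single machine-word operation, which is what makes the representation attractive inside a JIT. Specialization, as described in the summary, would attach extra information (such as a known constant value) alongside the bits; that part is omitted here.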

Development bitsets

DC Shocker: Passenger Jet Collides With Black Hawk Helicopter

2025-01-30

An American Airlines passenger jet collided with a Black Hawk helicopter mid-air near Ronald Reagan Washington National Airport in northern Virginia. The incident resulted in a shutdown of flights at the airport, with search and rescue teams currently searching the Potomac River for survivors. Eyewitnesses reported a large explosion and loud noise. Social media posts show footage of the explosion from a Kennedy Center webcam and what appears to be a subsequent search and rescue operation. Casualty information is pending.

Meta Faces Legal Trouble Over AI Training Data Copyright

2025-03-11

Meta is facing a lawsuit alleging it illegally removed copyright management information (CMI) from material used to train its AI models. Authors Richard Kadrey, Sarah Silverman, and Christopher Golden accuse Meta of using their work to train its neural networks without permission and removing CMI to obscure its actions. A judge ruled that Meta must answer to claims of violating the Digital Millennium Copyright Act (DMCA), signaling that the copyright implications of AI model training data are set to face more legal scrutiny. While some claims were dismissed, the case's progression could set a precedent for other similar lawsuits, with the Tremblay lawsuit against OpenAI being amended with new evidence.

Tech

Firefox Terms of Use: A Deep Dive

2025-02-28

Firefox, the free and open-source web browser, operates under a comprehensive set of Terms of Use outlining the agreement between users and Mozilla. These terms cover software licensing, intellectual property rights, user feedback, terms for optional features, updates and termination, user responsibilities, limitations of liability, and disclaimers. Users must adhere to Mozilla's Acceptable Use Policy, refraining from infringing on others' rights or violating applicable laws. Mozilla disclaims liability for losses incurred through Firefox usage but commits to notifying users of service suspensions or terminations. California law governs the agreement.

Development Terms of Use

LLMs Explain Linear Programs: From Side Project to Microsoft Research

2025-02-10

Back in 2020, while working in Google's supply chain, the author developed a side project to help understand linear programs (LPs). When LPs become complex, understanding their results is challenging even for experts. The author's approach involved interactively modifying the model and diffing the results to explain model behavior, finding that adding semantic metadata simplified the process. Recently, Microsoft researchers published a paper using Large Language Models (LLMs) to translate natural language queries into structured queries, achieving a similar outcome. The author believes LLMs are a great fit for translating human ambiguity into structured queries, processed by a robust classical optimization system, with results summarized by the LLM. While the author's early work remained unpublished, he argues that understanding explanations of simpler systems is crucial for explaining more complex AI systems.

Advanced Git Configuration: How Core Devs Configure Git

2025-02-25

This post delves into lesser-known Git configuration settings that can significantly improve the Git experience. The author shares the best configurations discovered by Git core developers during a "Spring Cleaning" experiment, categorized into three groups: settings that demonstrably improve Git (like improved branch sorting, diff algorithms, push and fetch operations), harmless but occasionally helpful settings (like autocorrect prompting, showing diffs on commit, reusing conflict resolutions), and settings based on personal preference (like improved merge conflict handling, rebase defaults, and filesystem monitoring). Each setting's function is explained in detail with corresponding commands, helping readers optimize their Git configurations for increased efficiency.

Development Configuration

Beyond Vector Databases: Efficient Text Embedding Processing with Parquet and Polars

2025-02-24

This article presents a method for efficient text embedding processing without relying on vector databases. The author uses Parquet files to store tabular data containing Magic: The Gathering card embeddings and their metadata, and leverages the Polars library for fast similarity search and data filtering. Polars' zero-copy feature and excellent support for nested data make this approach faster and more efficient than traditional CSV or Pickle methods, maintaining high performance even when filtering the dataset. The author compares other storage methods such as CSV, Pickle, and NumPy, concluding that Parquet combined with Polars is the optimal choice for handling medium-sized text embeddings, with vector databases only becoming necessary for extremely large datasets.

Development text embeddings

Orange Pi RV2: An Octa-Core RISC-V SBC for $30

2025-03-09

Orange Pi has launched its second RISC-V single-board computer, the RV2, featuring an octa-core Ky X1 processor with a 2 TOPS AI accelerator, starting at just $30. This upgrade from their quad-core model boasts enhanced performance, dual Gigabit Ethernet ports, and dual PCIe 2.0 x2 connectors. It also supports WiFi 6, Bluetooth 5.0, and a variety of interfaces, with 2GB, 4GB, and 8GB LPDDR4X memory options. While the Ky X1's single-core performance boost isn't groundbreaking, it offers excellent value for a RISC-V board in this price range.

Ory Hydra: The Open-Source OAuth2 Server Powering ChatGPT

2025-03-20

Ory Hydra began as a Go-based Keycloak alternative and evolved from a less flexible early design into a robust OAuth2 server. By focusing on Ory Fosite, a library for building OpenID Connect-compliant OAuth2 servers, and by simplifying the product through the removal of user management, Ory Hydra now reaches thousands of auth flows per second. The project's success is highlighted by its use in OpenAI's OAuth2 infrastructure, showcasing the importance of choosing clear, scalable technology and continuous optimization. This open-source project demonstrates a compelling journey from a student project to powering web-scale services.

Development

Efficient 2D Modality Fusion into Sparse Voxels for 3D Reconstruction

2025-02-21

This research presents an efficient 3D reconstruction method by fusing data from various 2D modalities (rendered depth, semantic segmentation results, and CLIP features) into pre-trained sparse voxels. The method utilizes a classical volume fusion approach, weighting and averaging 2D views to generate a 3D sparse voxel field containing depth, semantic, and language information. Examples are shown using rendered depth for mesh reconstruction via SDF, Segformer for semantic segmentation, and RADIOv2.5 and LangSplat for vision and language feature extraction. Jupyter Notebook links are provided for reproducibility.
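
The weighted-averaging step of classical volume fusion can be sketched as a running mean per voxel. The Python below is a minimal illustration of that general technique, not the paper's implementation; the class name, the dictionary-based sparse storage, and the example weights are all assumptions made for this sketch.

```python
class SparseVoxelField:
    """Running weighted average of per-voxel features across 2D views."""

    def __init__(self):
        self.features = {}  # voxel index -> fused feature vector
        self.weights = {}   # voxel index -> accumulated weight

    def fuse(self, voxel, feature, weight):
        """Blend one view's observation of `voxel` into the field."""
        if weight <= 0:
            return  # e.g. the voxel is not visible in this view
        old_weight = self.weights.get(voxel, 0.0)
        old_feature = self.features.get(voxel, [0.0] * len(feature))
        new_weight = old_weight + weight
        self.features[voxel] = [
            (old_weight * old + weight * new) / new_weight
            for old, new in zip(old_feature, feature)
        ]
        self.weights[voxel] = new_weight


field = SparseVoxelField()
# Two hypothetical views observe the same voxel with different confidence.
field.fuse((4, 2, 7), [1.0, 0.0], weight=1.0)
field.fuse((4, 2, 7), [3.0, 4.0], weight=3.0)
print(field.features[(4, 2, 7)])  # [2.5, 3.0]
```

The same update applies whether the feature is a depth value, a vector of semantic class scores, or a language embedding, which is why a single fusion routine can serve all three modalities.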

Farallon Islands: A Crucial Wildlife Refuge

2025-03-04

Located nearly 30 miles off the coast of San Francisco, the Farallon Islands National Wildlife Refuge is home to hundreds of thousands of seabirds and thousands of seals and sea lions. Since 1968, Point Blue Conservation Science has partnered with the U.S. Fish and Wildlife Service to conduct research and train the next generation of scientists on the islands, working to conserve and restore this complex ecosystem in the face of climate change and other threats. Due to the sensitive seabird and mammal breeding grounds, the islands are closed to the public, accessible only to a small number of wildlife biologists and resource managers.

Misc ecosystem

Simplicity Wins: The Essence of Great Software Design

2025-03-07

This article argues that great software design isn't about complex language features or architectures, but about eliminating potential failure modes. The author uses personal anecdotes to illustrate how removing redundant components, centralizing state management, and using robust systems minimizes risk and increases reliability. The core message is that good design is simple and reliable, avoiding flashy features and focusing on solving problems. The author cites the Unicorn web server as a prime example of this approach.

Development Failure Modes

Zig: Reflections After Months of Use

2025-02-05

After months of using Zig, the author offers a mature perspective. The article details both strengths and weaknesses. Strengths include arbitrary-sized integers, packed structs, generics as type-level functions, and excellent C interop. Weaknesses center around insufficient error handling, the prohibition of shadowing variables, the uncertainties of compile-time duck typing, the lack of typeclasses/traits, and misconceptions about memory safety. The author concludes that Zig sacrifices memory safety and robustness for simplicity, posing risks in large projects, ultimately leading to the decision to abandon its use.

Development