PgDog: A Clever Postgres Proxy for Sharding

2025-04-15

PgDog is a network proxy that intercepts all communication between Postgres clients and servers and understands SQL well enough to infer query destinations, so no application code changes are required. It parses SQL queries, extracts sharding keys, applies Postgres's built-in partitioning hash function, and routes each query to the correct database shard. The article details how PgDog handles the simple and extended protocols, cross-shard queries, and distributed COPY operations, which together enable Postgres sharding and let data ingestion speed scale linearly.
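The routing idea in this summary can be sketched in a few lines. This is only an illustration under stated assumptions, not PgDog's implementation: the regex stands in for real SQL parsing, SHA-256 stands in for Postgres's partitioning hash function (it does not produce the same values), and `NUM_SHARDS`, `user_id`, and all function names are hypothetical.

```python
import hashlib
import re

NUM_SHARDS = 3  # hypothetical shard count


def shard_for_key(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard index.

    Assumption: SHA-256 is a stand-in. It is NOT the hash function
    Postgres uses for hash partitioning.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def extract_sharding_key(sql: str, column: str):
    """Toy extraction: only recognizes `column = <integer>`.

    A real proxy parses the full SQL grammar instead of using a regex.
    """
    pattern = re.compile(rf"\b{re.escape(column)}\s*=\s*(\d+)", re.IGNORECASE)
    match = pattern.search(sql)
    return match.group(1) if match else None


def route(sql: str, column: str = "user_id", num_shards: int = NUM_SHARDS):
    """Return the list of shard indexes a query should be sent to."""
    key = extract_sharding_key(sql, column)
    if key is None:
        # No sharding key found: treat it as a cross-shard query.
        return list(range(num_shards))
    return [shard_for_key(key, num_shards)]
```

A query such as `SELECT * FROM users WHERE user_id = 42` resolves to a single shard, while `SELECT count(*) FROM users` carries no key and fans out to every shard.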

Development

Vertical Sharding: A Nightmare?

2025-04-13

The author recounts their experience with vertical sharding (functional sharding), highlighting its pitfalls. While it alleviates database load, it fragments the application, forcing the application layer to handle joins and queries that should be handled by the database. This significantly increases code complexity and maintenance overhead, and reduces system availability. Using humor and an uptime formula, the author shows how vertical sharding lowers system stability, ultimately delaying product roadmaps and demoralizing engineers. The article concludes by introducing PgDog, an open-source project aiming to solve Postgres sharding.
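The summary does not spell out the uptime formula. A minimal sketch, assuming it is the standard rule for serial dependencies (a request that needs every database succeeds only if all of them are up, so availabilities multiply), could look like this. The function name and the 99.9% figure are illustrative, not taken from the article.

```python
def combined_availability(availabilities):
    """Availability of a system that needs ALL of its dependencies up.

    Assumption: failures are independent, so the individual
    availabilities (fractions between 0 and 1) multiply.
    """
    result = 1.0
    for availability in availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        result *= availability
    return result


# Hypothetical example: three functional shards at 99.9% each.
three_databases = combined_availability([0.999, 0.999, 0.999])
```

Under this assumption, three databases at 99.9% each give roughly 99.7% combined, which is lower than any single database on its own.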

Development vertical sharding

PgDog: Open-Source Sharding for pgvector

2025-03-26

Scaling pgvector beyond a million embeddings becomes challenging because index builds are slow. This post introduces PgDog, an open-source project that shards the pgvector index. Leveraging IVFFlat's inherent clustering, PgDog distributes vector space partitions across multiple machines. Centroids are calculated using scikit-learn, and each query vector is routed to the appropriate shards based on its proximity to those centroids, which significantly improves search speed and recall. The implementation details cover centroid calculation, a custom sharding function, and SQL parsing using pg_query. Experiments demonstrate PgDog's effectiveness, and the post discusses optimizations such as parallel cross-shard queries and refined centroid allocation. Future work includes supporting more distance algorithms and using SIMD instructions for faster calculations.
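Centroid-based routing can be illustrated with a small sketch. It assumes one centroid per shard, L2 distance, and hard-coded centroids; in the post the centroids come from scikit-learn, and the names `l2_distance`, `nearest_shards`, and `probes` are hypothetical rather than PgDog's API.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_shards(query, centroids, probes=1):
    """Return the indexes of the `probes` shards closest to the query.

    Assumption: centroids[i] is the single centroid owned by shard i.
    """
    ranked = sorted(
        range(len(centroids)),
        key=lambda i: l2_distance(query, centroids[i]),
    )
    return ranked[:probes]


# Hypothetical 2-D centroids, one per shard.
centroids = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
```

With these centroids, the query vector `[9, 9]` is sent to shard 1 only. Raising `probes` sends the query to more shards, trading speed for recall.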

Development sharding

Postgres Sharding: A Thrilling Tale of Scaling to 6x

2025-03-14

A company handling 100,000 users/second ran into the limits of PostgreSQL's write capacity. Instead of migrating to NoSQL, the engineering team chose to shard their database. They split it into 6 instances and synced data with logical replication. This involved writing Ruby and Python code to handle sharding keys, plus custom tools to address sequence issues. The successful 6x expansion led to the creation of PgDog, an open-source project for automated Postgres sharding. The story highlights the ingenuity and determination of the engineers, and the scalability of PostgreSQL.
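The summary does not say how the team solved the sequence issues. One common approach, shown here purely as an assumption, is to interleave IDs so that each shard starts at a different offset and steps by the shard count, which keeps generated keys from colliding across shards. The function name is hypothetical.

```python
from itertools import count, islice


def shard_id_generator(shard_index: int, num_shards: int):
    """Yield IDs that are unique across shards.

    Assumption: shard i starts at i + 1 and steps by num_shards,
    so no two shards ever produce the same value.
    """
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in [0, num_shards)")
    return count(start=shard_index + 1, step=num_shards)


# First three IDs on shard 0 of 6: 1, 7, 13.
first_ids = list(islice(shard_id_generator(0, 6), 3))
```

In Postgres the same pattern is usually expressed per shard with a sequence's `START WITH` and `INCREMENT BY` options.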

Development database sharding