OpenAI's PostgreSQL at Scale: Best Practices and Challenges
At PGConf.dev 2025, OpenAI shared its best practices for running PostgreSQL, offering a look at database operations inside one of the world's leading AI companies. OpenAI runs a single-writer, multi-reader, unsharded architecture that successfully absorbs the massive read load generated by its 500 million active users. Write traffic on the primary, however, became the bottleneck, prompting optimizations in four areas:

- Load control on the primary: offloading writes and deferring non-critical ("lazy") writes.
- Query optimization: avoiding long transactions and tuning complex queries.
- Mitigating the single point of failure: prioritizing high-priority requests.
- Careful schema management: restricting schema changes.

Even with these measures, OpenAI still ran into pain points around index management, observability, and schema change history, and proposed corresponding improvements to PostgreSQL itself. Veteran PostgreSQL expert Lao Feng, drawing on his own operational experience, showed that many of these issues can already be addressed with existing tools and techniques, or with his open-source Pigsty system. Ultimately, OpenAI's PostgreSQL cluster processed over one million QPS, demonstrating the potential of PostgreSQL in large-scale applications.
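One of the write-path optimizations mentioned above is avoiding long transactions on the primary. A minimal sketch of how such guards are commonly configured in PostgreSQL is shown below; the role name app_writer and the timeout values are illustrative assumptions, not OpenAI's actual settings.

-- Illustrative only: guard a PostgreSQL primary against long transactions
-- and runaway statements by setting per-role timeouts.
-- The role "app_writer" is a hypothetical application role.
CREATE ROLE app_writer LOGIN;

-- Cap how long any single statement may run for this role.
ALTER ROLE app_writer SET statement_timeout = '5s';

-- Terminate sessions that sit idle inside an open transaction,
-- which would otherwise block vacuum and bloat the primary.
ALTER ROLE app_writer SET idle_in_transaction_session_timeout = '10s';

-- Reclaim connections that stay idle outside a transaction (PostgreSQL 14+).
ALTER ROLE app_writer SET idle_session_timeout = '60s';

Setting these limits per role rather than server-wide keeps the write path strict while still allowing reporting or maintenance roles to run longer operations against read replicas.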