PostgreSQL Multi-AZ Cluster Transaction Visibility Issue: A Jepsen Report Deep Dive

A recent Jepsen report highlights a long-standing transaction visibility issue in Amazon RDS for PostgreSQL Multi-AZ clusters: the order in which transactions become visible can differ between the primary and the replicas. The issue does not cause data loss or corruption, and it does not affect single-AZ deployments or Aurora databases. It is an instance of the 'Long Fork' anomaly, which violates Snapshot Isolation: two readers can each observe one of two concurrent transactions without the other, so no single commit order explains both views. The post traces the root cause to the ProcArray and the WAL being updated asynchronously at commit, and illustrates the resulting inconsistency with an example in which Alice and Bob observe different rankings of a Hacker News article. Although the anomaly rarely affects application correctness, the post argues that fixing it is crucial for enterprise-grade PostgreSQL clusters. AWS is collaborating with the PostgreSQL community on a fix; in the meantime, the suggested workarounds are to review application assumptions about transaction ordering and to use explicit synchronization.
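
To make the Long Fork anomaly concrete, here is a minimal Python sketch. It is a toy model, not PostgreSQL code. It assumes two concurrent, non-conflicting transactions, hypothetically named T1 and T2, and it assumes that replica visibility follows the order of commit records in the WAL while primary visibility follows the order in which transactions leave the ProcArray. The names `wal_order`, `procarray_order`, `snapshot`, and `is_long_fork` are illustrative only.

```python
def snapshot(visible_order, n):
    """Return the set of transactions visible once the first n have become visible."""
    return frozenset(visible_order[:n])


def is_long_fork(snap_a, snap_b):
    """Two snapshots fork when each contains a transaction the other lacks,
    so neither can be placed before the other in a single commit order."""
    return bool(snap_a - snap_b) and bool(snap_b - snap_a)


# Assumption: T1's commit record reaches the WAL first, but T2 leaves the
# ProcArray first, so the two nodes expose the commits in opposite orders.
wal_order = ["T1", "T2"]        # visibility order on a replica (assumed)
procarray_order = ["T2", "T1"]  # visibility order on the primary (assumed)

alice = snapshot(procarray_order, 1)  # Alice reads on the primary: sees only T2
bob = snapshot(wal_order, 1)          # Bob reads on a replica: sees only T1

print("Alice sees:", sorted(alice))
print("Bob sees:", sorted(bob))
print("Long fork:", is_long_fork(alice, bob))
```

In this model Alice's snapshot contains T2 but not T1, and Bob's contains T1 but not T2. Under Snapshot Isolation one of the two snapshots would have to be a subset of the other, which is why the pair of observations counts as a violation even though each reader's view looks reasonable on its own.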