Elastic Cloud Serverless: Unstable Throughput and Packet Loss on Azure AKS

2025-06-10

Elastic's SRE team observed unstable throughput and packet loss in Elastic Cloud Serverless running on Azure Kubernetes Service (AKS). The investigation traced the problem to two root causes on the SR-IOV interfaces: RX ring buffer overflows and saturation of the kernel input queue. Increasing the RX buffer sizes and adjusting the netdev backlog significantly improved network stability. The takeaway is that even on high-performance hardware, OS-level network parameters still need tuning before the hardware can deliver its full performance.
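To make the kernel input queue symptom concrete, here is a minimal sketch of how one might check a Linux node for backlog drops. It is not taken from Elastic's investigation. It assumes the standard `/proc/net/softnet_stat` layout, where each row holds hexadecimal per-CPU counters and the first three columns are packets processed, packets dropped because the input queue was full, and time-squeeze events. The function names `parse_softnet_stat` and `backlog_drops` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path


def parse_softnet_stat(text):
    """Parse the contents of /proc/net/softnet_stat.

    Each row holds hexadecimal counters for one CPU. Only the first
    three columns are read here: processed, dropped, time_squeeze.
    The row index is reported rather than a CPU id, because the two
    can differ when some CPUs are offline.
    """
    rows = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated rows
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append(
            {
                "row": index,
                "processed": processed,
                "dropped": dropped,
                "time_squeeze": time_squeeze,
            }
        )
    return rows


def backlog_drops(rows):
    """Total packets dropped because the kernel input queue was full."""
    return sum(r["dropped"] for r in rows)


if __name__ == "__main__":
    path = Path("/proc/net/softnet_stat")
    if path.exists():
        stats = parse_softnet_stat(path.read_text())
        print(f"backlog drops across all rows: {backlog_drops(stats)}")
        for r in stats:
            if r["dropped"]:
                print(f"row {r['row']}: dropped={r['dropped']}")
    else:
        print("softnet_stat not available (not a Linux host?)")
```

A drop counter that keeps growing between two readings suggests the input queue is saturating. On a typical Linux host, the knobs the summary refers to are usually inspected with `ethtool -g <iface>` (RX ring size) and `sysctl net.core.netdev_max_backlog` (input queue length). The summary does not state which values Elastic chose, so none are suggested here.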
