Unexpected CPU Performance Boost from Data Structure Optimization

2025-08-26

A program that processes a large dataset ran into both memory and CPU bottlenecks. Its initial design stored the data in a single array, and memory consumption reached as much as 1GB. Applying data-oriented programming, the data was split into multiple arrays, which saved approximately 200MB.

A second optimization replaced an array of field-name strings with an array of byte indices, reducing memory usage again. Surprisingly, this change also cut CPU usage significantly. The cause is garbage collection. A string array holds a reference to every string object, so the GC must traverse all of them. A byte array contains no references, so the GC does not need to scan it at all. Removing those strings from the per-record data drastically reduced GC overhead.