Spicing up a high-load, low-latency REST service
Quiz question. You have a low-latency, high-load service running on 42 virtual machines, each having 2 CPU cores. One day, you migrate your application nodes to five beasts of physical servers, each having 32 CPU cores. Given that each virtual machine had a heap of 2GB, what size should the heap be on each physical server?
So, you must divide 42 * 2 = 84GB of total memory over five machines. That boils down to 84 / 5 = 16.8GB per machine. To take no chances, you round this number up to 25GB. Sounds plausible, right? Well, the correct answer appears to be less than 2GB, because that’s the number we got by calculating the heap size based on the live data size (LDS). Can’t believe it? No worries, we couldn’t believe it either. Therefore, we decided to run an experiment.
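For readers who like to see the arithmetic spelled out, here is a minimal Java sketch of the naive sizing above. The class and method names are made up for illustration; the numbers are the ones from the quiz.

```java
// Naive sizing: spread the combined heap of all VMs evenly over the new servers.
// Class and method names are hypothetical, chosen for this illustration only.
final class NaiveHeapSizing {

    // Returns the heap per physical server in GB, before any rounding up.
    static double heapPerServerGb(int vmCount, double heapPerVmGb, int serverCount) {
        double totalHeapGb = vmCount * heapPerVmGb; // 42 * 2 = 84 GB in the quiz
        return totalHeapGb / serverCount;           // 84 / 5 = 16.8 GB in the quiz
    }

    public static void main(String[] args) {
        double perServerGb = heapPerServerGb(42, 2.0, 5);
        System.out.println("Naive heap per server (GB): " + perServerGb);
    }
}
```

Note that this only reproduces the plausible-sounding guess. It says nothing about how much memory the application actually keeps alive, which is exactly why the guess turns out to be so far off.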
We have five application nodes, so we can run our experiment with five differently sized heaps. We give node one 2GB, node two 4GB, node three 8GB, node four 12GB, and node five 25GB. (Yes, we are not brave enough to run our application with a heap under 2GB.)
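To make the setup concrete, the following Java sketch lists the heap size per node and derives the matching JVM heap flags. The node names are placeholders, and pinning the initial heap (-Xms) to the maximum heap (-Xmx) is an assumption on our side here, not something the experiment depends on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the experiment setup: one heap size per application node.
// Node names are hypothetical placeholders.
final class HeapExperiment {

    // Heap size in GB per node, in the order used in the experiment.
    static Map<String, Integer> heapGbPerNode() {
        Map<String, Integer> heaps = new LinkedHashMap<>();
        heaps.put("node1", 2);
        heaps.put("node2", 4);
        heaps.put("node3", 8);
        heaps.put("node4", 12);
        heaps.put("node5", 25);
        return heaps;
    }

    // Assumption: initial and maximum heap are set to the same value,
    // so the heap does not resize during the test run.
    static String jvmHeapFlags(int heapGb) {
        return "-Xms" + heapGb + "g -Xmx" + heapGb + "g";
    }

    public static void main(String[] args) {
        heapGbPerNode().forEach((node, heapGb) ->
                System.out.println(node + ": " + jvmHeapFlags(heapGb)));
    }
}
```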
As a next step, we fire up our performance tests generating a stable, production-like load of a baffling 56K requests per second. Throughout the whole run of this experiment, we measure the number of requests each node receives to ensure that the load is equally balanced. What is more, we measure this service’s key performance indicator – latency.
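One way to check that the load is equally balanced is to compare each node's request count with the even share. The sketch below is only an illustration of that idea: the class name is invented, the sample counts in main are made up, and what deviation counts as acceptable is left to the reader.

```java
// Sketch: how far does the busiest or quietest node stray from an even share?
// Illustrative only; the request counts in main are invented sample numbers.
final class LoadBalanceCheck {

    // Returns the largest relative deviation from the mean request count,
    // e.g. 0.1 means some node received 10% more or less than the even share.
    static double maxRelativeDeviation(long[] requestsPerNode) {
        if (requestsPerNode.length == 0) {
            throw new IllegalArgumentException("need at least one node");
        }
        double total = 0;
        for (long count : requestsPerNode) {
            total += count;
        }
        double mean = total / requestsPerNode.length;
        if (mean == 0) {
            return 0.0; // no traffic at all, so nothing is unbalanced
        }
        double worst = 0;
        for (long count : requestsPerNode) {
            worst = Math.max(worst, Math.abs(count - mean) / mean);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Made-up per-second counts for five nodes sharing roughly 56K requests.
        long[] sample = {11_150, 11_240, 11_200, 11_190, 11_220};
        System.out.println("Max relative deviation: " + maxRelativeDeviation(sample));
    }
}
```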
Because we got weary of downloading the GC logs after each test, we invested in Grafana dashboards to show us the GC’s pause times, throughput, and heap size after a garbage collection. This way we can easily inspect the GC’s health.
This blog is about GC tuning, so let’s start with that. The following figure shows the GC’s pause times and throughput. Recall that pause times indicate how long the GC freezes the application while sweeping out memory. Throughput, in turn, is the percentage of time the application is not paused by the GC.
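The relation between the two metrics can be sketched in a few lines of Java. This is a simplified illustration, assuming you already have the individual pause durations and the length of the measurement window; the class and method names are made up, and it is not how our dashboards compute the numbers.

```java
// Sketch: derive GC throughput from pause times over a measurement window.
// Throughput = share of the window in which the application was NOT paused.
final class GcThroughput {

    static double throughputPercent(long[] pauseMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        long totalPauseMillis = 0;
        for (long pause : pauseMillis) {
            totalPauseMillis += pause;
        }
        return 100.0 * (elapsedMillis - totalPauseMillis) / elapsedMillis;
    }

    public static void main(String[] args) {
        // Made-up example: three pauses totalling 100 ms within a 10 second window.
        long[] pauses = {20, 30, 50};
        System.out.println("Throughput (%): " + throughputPercent(pauses, 10_000));
    }
}
```

Keep in mind that a high throughput does not rule out painful individual pauses. A single long pause barely moves the percentage, yet it does show up in latency, which is why we look at both.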