The second part in a series of articles looking at the microbenchmarking of big data solutions running on the JVM. In this part the performance model is further refined over a number of configuration steps, each step building on the previous steps with the purpose of deriving a smaller, simpler and more relevant model of the microbenchmark to facilitate more targeted code inspection as well as a better understanding of the nature of execution flow across method, class and package boundaries.
The challenge in measuring not just the microbenchmark code but the underlying method invocations within the software under test (performance analysis) is ensuring that the required instrumentation, measurement and collection does not perturb the system to such an extent that the system is an entirely different system (measurement is not relevant) and the data collected is neither accurate or actionable. This challenge can be overcome with adaptive profiling of the codebase, both online and offline. Here this advanced application performance monitoring approach is applied to Apache Spark.