Spot Checking Apache Cassandra 3.0 Performance Hotspots

This article is a follow-up to a previous article, Advance Adaptive Profiling of Apache Cassandra 3.0. It attempts to verify the accuracy of the hotspot analysis captured after many benchmark iterations in which the bytecode instrumentation was adapted, both online and offline, based on what was learnt from the performance model exported after each stress test. The performance model was generated with hotspot thresholds of 2 microseconds and 1 microsecond for inclusive and exclusive (inherent) timings respectively. These set points are extremely aggressive and far below the defaults of 10 microseconds and 2 microseconds, which are themselves already low for any sort of production usage, especially when compared with the typical high measurement overhead that nearly all other profilers exhibit.

As stated in the previous article, lowering the threshold for both hotspot set points incurs a greater impact on response times and throughput. But when the goal is to identify potential optimization target methods more precisely, this can be an acceptable trade-off outside of a production environment.

For verification, the instrumentation coverage set will remain as defined by the very last snapshot (performance model) exported by the Satoris agent. What will change in this new round of benchmark testing is the degree of measurement that takes place when the instrumentation fires before and after a hotspot method is invoked. Below is the last performance model published in the previous article. It will act as the baseline for comparison. It is important to take note of the metering counts, especially the repeated 3 million frequency, as these will change across probes as we change the frequency of measurement.

Click on the images below to get the full technicolor table display.

adaptive.benchmarking.cassandra.3.t2ti1.satoris.2.6.5

The first approach to verifying the accuracy of the above measurements is to perform a spot check on every Nth execution of a particular method without metering (measuring) any of the methods it calls directly or indirectly. This is exactly what the Satoris spotcheck metering extension provides. With the optional spotcheck extension enabled alongside the hotspot metering extension, which is enabled by default, we get the following performance model after executing a single stress test.

apache.cassandra.3.0.verification.spotcheck.table

The frequency of many probes (methods) has dropped from over 3 million to over 75K. Because spotcheck randomly selects a method execution and does not meter any methods within the scope of the selected execution, the total clock time is now equal to the inherent (self) clock time. The ordering of probes is not important here due to the randomness introduced into the measurement. What needs to be compared with the above performance model are the average inclusive wall clock timings.
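
To make the spot check idea concrete, here is a minimal Java sketch. It is not the Satoris implementation: the class name, the fixed interval and the shared thread-local flag are all assumptions made for illustration, and a deterministic counter stands in for whatever selection strategy the extension actually uses. It shows why a selected execution reports a total time equal to its inherent time: nothing nested inside it is ever metered.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of spot-check metering; not the Satoris implementation.
final class SpotCheckSketch {
    // Shared by all probes: true while this thread is inside a spot-checked call.
    private static final ThreadLocal<Boolean> IN_SPOT_CHECK =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    private final long interval;                        // meter every Nth eligible call
    private final AtomicLong eligibleCalls = new AtomicLong();
    private final AtomicLong meteredCalls = new AtomicLong();
    private final AtomicLong meteredNanos = new AtomicLong();

    SpotCheckSketch(long interval) {
        if (interval < 1) throw new IllegalArgumentException("interval must be >= 1");
        this.interval = interval;
    }

    void call(Runnable body) {
        // Calls nested inside a spot check are never metered, and only every
        // Nth remaining call is selected.
        if (IN_SPOT_CHECK.get() || eligibleCalls.incrementAndGet() % interval != 0) {
            body.run();
            return;
        }
        IN_SPOT_CHECK.set(Boolean.TRUE);
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            // No callee was metered, so this total is also the inherent time.
            meteredNanos.addAndGet(System.nanoTime() - start);
            meteredCalls.incrementAndGet();
            IN_SPOT_CHECK.set(Boolean.FALSE);
        }
    }

    long meteredCalls() { return meteredCalls.get(); }
    long meteredNanos() { return meteredNanos.get(); }
}
```

With one instance per instrumented method, wrapping an outer method with an interval of 4 and an inner method with an interval of 1 meters the outer method on every fourth call and the inner method only on the calls where the outer one was not selected.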

Below is a table comparing a selection of the probes with the most significant differences in the average inclusive clock timings and the average exclusive (self/inherent) clock timings. These probes are the most likely to be impacted by the decision to meter direct and indirect callee method executions within the scope of a metered execution.

apache.cassandra.3.0.verification.spotcheck.compare

Another way of selectively metering (measuring) instrumented hotspot methods is offered by the Satoris tracepermit metering extension. The tracepermit extension limits the degree of concurrent metering performed by the runtime. The decision to meter a particular call trace (path) is made at the topmost traced probe (entry) point, and the result determines whether all direct and indirect nested firing probes are metered or not. The underlying mechanism uses a pool of permits, which are requested at the entry point to the trace and returned to the pool on completion of the trace. If no permit is available in the pool when the topmost trace probe fires, no metering is performed until the thread completes the entire execution and re-enters the same or another trace probe.
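
The permit mechanism described above can be sketched in a few lines of Java. This is an illustration only, not the Satoris implementation: the class and method names are hypothetical, and a standard java.util.concurrent.Semaphore is assumed as the permit pool. The decision is taken once, at the trace entry point, and applies to every nested firing on that thread until the trace completes.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of permit-based trace metering; not the Satoris implementation.
final class TracePermitSketch {
    // Per-thread trace state: nesting depth and whether a permit is held.
    private static final class Trace { int depth; boolean permitted; }

    private final Semaphore permits;
    private final ThreadLocal<Trace> trace = ThreadLocal.withInitial(Trace::new);
    private final AtomicLong meteredFirings = new AtomicLong();
    private final AtomicLong meteredNanos = new AtomicLong();

    TracePermitSketch(int poolSize) {
        this.permits = new Semaphore(poolSize);
    }

    void call(Runnable body) {
        Trace t = trace.get();
        if (t.depth++ == 0) {
            // Entry point of the trace: request a permit without blocking.
            t.permitted = permits.tryAcquire();
        }
        boolean metered = t.permitted;
        long start = metered ? System.nanoTime() : 0L;
        try {
            body.run();
        } finally {
            if (metered) {
                meteredNanos.addAndGet(System.nanoTime() - start);
                meteredFirings.incrementAndGet();
            }
            if (--t.depth == 0 && t.permitted) {
                // Trace complete: hand the permit back to the pool.
                t.permitted = false;
                permits.release();
            }
        }
    }

    long meteredFirings() { return meteredFirings.get(); }
    int availablePermits() { return permits.availablePermits(); }
}
```

With a pool size of 1, a second thread that enters a trace while the first still holds the permit runs unmetered for that entire trace, which is why the metering counts fall as concurrency rises.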

apache.cassandra.3.0.verification.tracepermit.table

With a default permit pool size of 1, the net effect is that only one thread, or trace, is ever metered at any one moment. Any drop in the metering count from the initial performance model above will reflect the degree of concurrency. A benefit of the tracepermit extension over the spotcheck extension is the accurate restoration of the inherent timings, as nested method invocations are now metered. This can be seen in the revised performance model above and the comparison table below.

apache.cassandra.3.0.verification.tracepermit.compare

An alternative to the tracepermit metering extension is the tracepoint extension, which enables or disables the metering of all firing probes, including nested ones, based on a specified sampling frequency (default = 100).

apache.cassandra.3.0.verification.tracepoint.table

The tracepoint metering extension is far more aggressive than the spotcheck extension in reducing measurement, because the sampling decision is made at the trace entry point and extends downwards to all direct and indirect callees.
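
A rough Java sketch of trace-level sampling follows. It is not the Satoris implementation: the names are hypothetical, and it assumes that a sampling frequency of 100 means one trace in every 100 is metered. The point it illustrates is that an unselected trace contributes no measurements at all, while a selected trace is metered in full, nested firings included.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of trace-point sampling; not the Satoris implementation.
final class TracePointSketch {
    // Per-thread trace state: nesting depth and the decision made at entry.
    private static final class Trace { int depth; boolean metered; }

    private final long frequency;                       // assumed: meter 1 trace in every N
    private final ThreadLocal<Trace> trace = ThreadLocal.withInitial(Trace::new);
    private final AtomicLong traceEntries = new AtomicLong();
    private final AtomicLong meteredTraces = new AtomicLong();
    private final AtomicLong meteredFirings = new AtomicLong();

    TracePointSketch(long frequency) {
        if (frequency < 1) throw new IllegalArgumentException("frequency must be >= 1");
        this.frequency = frequency;
    }

    void call(Runnable body) {
        Trace t = trace.get();
        if (t.depth++ == 0) {
            // The sampling decision is made once, at the trace entry point.
            t.metered = traceEntries.incrementAndGet() % frequency == 0;
            if (t.metered) meteredTraces.incrementAndGet();
        }
        boolean metered = t.metered;                    // inherited by nested firings
        try {
            body.run();
        } finally {
            if (metered) meteredFirings.incrementAndGet();
            if (--t.depth == 0) t.metered = false;
        }
    }

    long meteredTraces() { return meteredTraces.get(); }
    long meteredFirings() { return meteredFirings.get(); }
}
```

Running 1,000 traces of three firings each through an instance built with a frequency of 100 meters 10 traces and 30 firings out of 3,000.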

apache.cassandra.3.0.verification.tracepoint.compare

An issue with introducing some form of sampling into the metering of instrumented methods is the impact on the hotspot analysis, which now takes longer to reach the unmanaged state at runtime. To address this, the hotspot upper credit limit can be reduced from 100K to 2K. This should not affect the counts so much as any possible measurement overhead, because unmanaged is the most efficient measurement state for a probe, save for disabled, which eliminates metering of a probe altogether.

apache.cassandra.3.0.verification.tracepoint.hotspot.u2k.table

And finally, the comparison table.

apache.cassandra.3.0.verification.tracepoint.v2.compare