This article outlines the reason behind, and usage of, the
gap metering extension included in Satoris which unlike possibly all other application performance monitoring solutions focuses on what transpires between consecutive service request processing by a particular thread for the same entry point method. Measuring, or metering, the gap between consecutive root method invocations is important in trading environments as it offers an insight into possible performance bottlenecks further upstream (before) in the order submission. If throughput for each thread within a JVM process is tracked then observing and analyzing the gap between events, that constitute the actual throughput work, is incredibly important. The larger the gap the less throughput achieved. In some low latency environments that gap can be far greater than the service time but yet it is not readily observed or analyzed.
But how can an activity based metering engine track and present what is effectively a blind spot outside of any measured activity? It’s a strange sort of performance blind spot because nothing is actually occurring in terms of what is observed – call invocations. The
gap metering extension addresses this by assigning the time, or metering, differences between the end of the last activity and the beginning of a new activity, by the same thread, to the subsequent activity. For every meter enabled within the metering runtime the
gap metering extension creates an additional meter that is prefixed with
"gap". The changes in the underlying meter measure are effectively carried forward into the new activity and assigned to it. This only happens if both the previous and next (current) activities are the same probe name.
Here is a simple snippet of code that will be used to demonstrate the
gap metering extension in action. The
outer() method, called repeatedly in a
while loop, will initially represent the actual service entry point. The call
spin(50_000L) will be the gap that occurs between each consecutive
outer() call. The
outer method takes approximately 125 microseconds to execute with 100 microseconds spent within
outer and 25 microsecond within
inner, called by
spin method is not instrumented.
Below is a sample metering snapshot of the above code executed within the Satoris agent loaded. The
gap metering has not been enabled yet so we only have the basic
clock.time meter included in the model.
Here is the software execution behavior visualized by the
blink metering extension. The white bar represents the
outer method and the green the
jxinsight.override.config file enables the
gap metering extension and excludes main method from being augmented by the
gap metering extension.
j.s.p.gap.enabled=true j.s.p.gap.filter.enabled=true j.s.p.gap.filter.exclude.name.groups=samples.Gaps.main
Below is a sample snapshot of the resulting performance metering model. There are now two metering rows per probe name –
gap.clock.time. The only probe to have an actual value assigned to the
gap.clock.time total column is
samples.Gap.outer. The average
gap.clock.time for outer is 51 microseconds. Close to the explicit delay placed in the code.
In the following revised configuration the
gap metering extension augmentation is restricted to the
j.s.p.gap.enabled=true j.s.p.gap.filter.enabled=true j.s.p.gap.filter.include.name.groups=samples.Gaps.inner
The following table is sample of a metering model generated with the above configuration applied. Note that both
gap.clock.time metering assignments. That is because
inner method is enclosed, executed, within the scope of
outer method so any metering changes applied to
inner method (probe) will propagated upwards the caller (probe) chain. Only the
inner method has the inherent (self)
gap.clock.time attributed to it. The average has also changed from 51 microseconds to 152 microseconds because the gap now relates to time interval between consecutive calls of
inner and not