Logging, Metrics & Distributed Tracing – These are Problems, not Solutions!

To achieve a real understanding of software for the purpose of managing and changing it, we need a deeper level of observation, one that is as close as possible to what the software actually does. Software does not log. Developers write logging calls to create an echo chamber.

Iterative Performance Benchmarking of Apache Kafka – Part 2

The second article in a series demonstrating how iterative performance benchmarking analysis can be applied to the latest release of Apache Kafka – a highly optimized publish-subscribe messaging technology based around the concept of a distributed commit log.

Sub-microsecond Code Profiling of Google Cloud DataFlow

An initial investigation into the performance model (profile) of Google Cloud DataFlow which provides a simple, powerful programming model for building both batch and streaming parallel data processing pipelines.

Iterative Performance Benchmarking of Apache Kafka – Part 1

An article demonstrating how iterative performance benchmarking analysis can be applied to the latest release of Apache Kafka – a highly optimized publish-subscribe messaging technology based around the concept of a distributed commit log.

Beyond Big Data – Mirrored Algorithmic Simulation

With the proliferation of IoT devices embedded in households under the banner of “smart home” initiatives, there is good cause for concern regarding the privacy afforded to consumers, especially when such devices share data, intentionally or not, with other devices or services.

Performance Measurement Budgeting of Apache Cassandra 3.0

In this post, performance measurement budgetary control is applied alongside trace sampling to further reduce the impact of nested call (method) measurements on the accuracy of the measured caller chain.
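As a rough illustration of the budgeting idea (this is not the Satoris API; every name below is hypothetical), a probe can disable itself once the method it measures proves cheaper than the configured budget, so that cheap, frequently invoked nested calls stop distorting the measured caller chain:

```java
// Hypothetical sketch of measurement budgeting (not the Satoris API):
// a probe that switches itself off once the call it measures costs
// less than the configured per-measurement budget.
final class BudgetedProbe {
  private final long budgetNanos;
  private volatile boolean enabled = true;

  BudgetedProbe(long budgetNanos) {
    this.budgetNanos = budgetNanos;
  }

  long begin() {
    return enabled ? System.nanoTime() : 0L;
  }

  void end(long beginNanos) {
    if (!enabled) return;
    long cost = System.nanoTime() - beginNanos;
    if (cost < budgetNanos) {
      enabled = false; // below budget: stop measuring this nested call
    }
  }
}
```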

Spot Checking Apache Cassandra 3.0 Performance Hotspots

This article attempts to verify the accuracy of a previous Apache Cassandra hotspot analysis, captured after many benchmark iterations in which the bytecode instrumentation is adapted, both online and offline, based on what is learned from the performance model exported after each stress test.

Advanced Adaptive Profiling of Apache Cassandra 3.0

With Apache Cassandra 3.0 just released, it is time to check whether the previous performance hotspot analysis of the codebase using Satoris still stands with this new major release.

Policing Software Service Processing – Part 2

Part 2 in a series of articles on effective ways of controlling concurrent code execution using techniques such as quality of service (QoS), adaptive control valves, circuit breakers and back-pressure as promoted by proponents of reactive programming.

Policing Software Service Processing – Part 1

Part 1 in a series of articles on effective ways of controlling concurrent code execution using techniques such as quality of service (QoS), adaptive control valves, circuit breakers and back-pressure as promoted by proponents of reactive programming.
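To make one of these techniques concrete, here is a minimal sketch (illustrative only; the class and method names are hypothetical, not taken from the articles) of a fair, semaphore-based control valve that applies brief back-pressure and then sheds load to a fallback, circuit-breaker style:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative only: a fixed-capacity control valve. Callers wait briefly
// for a permit (back-pressure) and are then shed to a fallback (circuit-
// breaker style) instead of queuing indefinitely.
final class ControlValve {
  private final Semaphore permits;
  private final long maxWaitMs;

  ControlValve(int capacity, long maxWaitMs) {
    this.permits = new Semaphore(capacity, true); // fair: FIFO admission (QoS)
    this.maxWaitMs = maxWaitMs;
  }

  <T> T through(Supplier<T> call, Supplier<T> fallback) throws InterruptedException {
    if (!permits.tryAcquire(maxWaitMs, TimeUnit.MILLISECONDS)) {
      return fallback.get(); // shed load rather than stall
    }
    try {
      return call.get();
    } finally {
      permits.release();
    }
  }
}
```

The bounded wait is the essential design choice: it converts overload into a fast, explicit failure rather than an unbounded queue of stalled threads.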

A Gap in Time – Measuring What Came Before

This article outlines the reasoning behind, and usage of, the gap metering extension included in Satoris, which, unlike possibly all other application performance monitoring solutions, focuses on what transpires between consecutive service requests processed by a particular thread through the same entry point method.
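The idea is easy to sketch, assuming a simplified single entry point per thread (illustrative only; this is not the Satoris extension itself): record the instant a request completes on a thread and, when the next request begins on that thread, report the gap in between.

```java
// Illustrative sketch of gap metering (not the Satoris extension): per
// thread, record when a service request ends and, when the next begins,
// report the gap. Simplified here to a single entry point per thread.
final class GapMeter {
  private static final ThreadLocal<Long> lastExit = new ThreadLocal<>();

  static void begin() {
    Long exited = lastExit.get();
    if (exited != null) {
      // the gap is everything that transpired outside the entry point:
      // framework dispatch, network polling, scheduling delay, ...
      record(System.nanoTime() - exited);
    }
  }

  static void end() {
    lastExit.set(System.nanoTime());
  }

  private static void record(long gapNanos) {
    System.out.printf("[%s] gap=%d ns%n", Thread.currentThread().getName(), gapNanos);
  }
}
```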

Microbenchmarking Big Data Solutions on the JVM – Part 2

The second part in a series of articles looking at the microbenchmarking of big data solutions running on the JVM. In this part the performance model is further refined over a number of configuration steps, each step building on the previous, with the purpose of deriving a smaller, simpler and more relevant model of …

Microbenchmarking Big Data Solutions on the JVM – Part 1

The challenge in measuring not just the microbenchmark code but also the underlying method invocations within the software under test (performance analysis) is ensuring that the required instrumentation, measurement and collection do not perturb the system to such an extent that it becomes an entirely different system (the measurement is not relevant) and the data collected …

Software Tracing – Static, Dynamic, Adaptive and Simulated

There are a few interpretations of “static tracing” but at this point I can only assume that the above tweet is referring to the more common case of traces (probes) being explicitly coded or compiled into software at build time. The reason for the different possible interpretations of static, dynamic and adaptive is that in the …

The “New Possible” in Application Monitoring and Management

This week Simz 2.3 broke all previous benchmark records in simulating 270 million metered network streamed calls a second on a Google Cloud n1-highcpu-32 machine type instance. That is 540 million call events a second – 32 billion events a minute. The software execution calls originated in 28 client JVMs also running on a n1-highcpu-32 …

The Power of Now in the Performance Analysis of JVM Applications

I’ve long been fascinated by how best to perceive the behavior of software machines that for the most part appear as black boxes; consuming input we feed them and producing output we consume, directly or indirectly. I cannot help feeling there is a lost beauty in the motion of action that needs to be rediscovered in …

Beyond Metrics and Logging with Metered Software Memories

A proposal for a different approach to application performance monitoring that is far more efficient, effective, extensible and eventual than traditional legacy approaches based on metrics and event logging. Instead of seeing logging and metrics as primary data sources for monitoring solutions, we should see them as a form of human inquiry over some software …

Software Performance Optimization Heuristics: Fast, Frugal and Factual

The following is a graphic I’ve used in the past to frame various software performance optimization techniques. It is not a comprehensive inventory of all software performance optimization techniques (or concerns) but I’ve found it serves a purpose in managing the amount of effort that, in general, should be spent on each technique outside of …

Software Regulators! Mirror Outwards, Simulate Inwards.

The Good Regulator Theorem states “every good regulator of a system must be a model of that system”. But what exactly would such a model look like? What elements should the model contain and how might they be related and reasoned about? The theorem itself does not address this so in this article I present …



Clients are offered expertise in the performance tuning, monitoring and management of JVM runtimes executing applications developed in Java, Scala, Clojure, JavaScript (Nashorn/Rhino) and Ruby (JRuby), with particular experience in scaling and optimizing high frequency, low latency request processing systems.


Using self-adaptive instrumentation and measurement tooling, identification of performance and scalability problems is all but guaranteed. Within minutes of measuring a representative workload, potential bottlenecks and optimization call sites will be accurately identified.


Efficient data collection coupled with unique software execution visualizations ensures that all parties involved in a performance investigation will gain an unprecedented insight into the execution nature and resource consumption patterns of applications and, more importantly, a high degree of confidence in report findings.


Through distributed software recording and simulated playback, the time spent measuring the performance of an application under observation and analysis is greatly reduced. This allows much of the investigative work to be moved outside of business critical operating windows.



The software execution model is focused on the algorithmic and resource consumption behavior of a particular processing pattern such as a transaction, service request or workflow.

Imagine you need to drive across town from location A to B. You use a navigation system to plan the steps and to estimate the expected time of arrival.

Different navigation systems (algorithms) and options (context) will likely result in a different route plan.


The navigation system creates a basic route plan and calculates the time for each leg based on the distance and allowed speed. But it naively assumes that no other driver is on the road sharing the same time and space. It is also unaware of possible roadworks and accidents that will result in a delay or detour.

The system execution model looks at the impact of sharing resources across concurrent and competing processing call flows. What are the utilization levels? How is resource consumption policed? What additional costs and penalties are incurred in the co-ordination of sharing? Is the policy fair? What variations in performance are introduced by contention? Is prolonged resource starvation possible?
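On the JVM, one readily available starting point for such questions is the standard thread management interface, which can report how long threads spend blocked on monitors, a rough proxy for the coordination cost of sharing. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: sample how long each thread has spent blocked on monitors or
// waiting, using the JVM's standard management interface.
public final class ContentionSample {
  public static void main(String[] args) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    if (mx.isThreadContentionMonitoringSupported()) {
      mx.setThreadContentionMonitoringEnabled(true); // times accumulate from here
    }
    for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
      // blocked time approximates the coordination penalty of sharing
      System.out.printf("%-30s blocked=%dms waited=%dms%n",
          info.getThreadName(), info.getBlockedTime(), info.getWaitedTime());
    }
  }
}
```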


Naturally, when we drive and encounter obstacles or delays, we change course. More importantly, this information is retained and used to train future behaviors and predictions.

The software adaptation model looks at the degree of self and situational awareness that software holds, creates and manages. Can changes in the environment and variations in performance be sensed? Can the software reason about what is sensed and alter its behavior accordingly? Can the software reason about the effectiveness of its reactions and reinforce good behavioral patterns and adaptations?
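A toy sense-reason-act loop might look like the following (illustrative only; the names, smoothing factor and thresholds are all hypothetical): latency is sensed and smoothed, reasoned about against a target, and acted upon by narrowing or widening the admitted concurrency.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy sense-reason-act loop (illustrative only): a valve that narrows the
// admitted concurrency when observed latency drifts above a target and
// widens it again once latency recovers. Single observer thread assumed.
final class AdaptiveValve {
  private final AtomicInteger limit = new AtomicInteger(32); // act: admitted concurrency
  private final long targetNanos;                            // reason: the goal
  private volatile long ewmaNanos;                           // sense: smoothed latency

  AdaptiveValve(long targetNanos) {
    this.targetNanos = targetNanos;
  }

  void observe(long latencyNanos) {
    // sense: exponentially weighted moving average (weight 1/8)
    ewmaNanos = (ewmaNanos * 7 + latencyNanos) / 8;
    // reason + act, with hysteresis: widen only once latency is well below target
    if (ewmaNanos > targetNanos) {
      limit.updateAndGet(l -> Math.max(1, l - 1));   // back off
    } else if (ewmaNanos < targetNanos / 2) {
      limit.updateAndGet(l -> Math.min(256, l + 1)); // reinforce recovery
    }
  }

  int admittedConcurrency() {
    return limit.get();
  }
}
```

The gap between the two thresholds is a simple form of hysteresis: recovery only begins once latency falls well below the target, which keeps the loop from oscillating.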


Traffic and transportation management is crucially important for a city, especially as many suffer from traffic congestion. In our drive through the city we encounter many methods of controlling traffic flow in order to globally minimize delay or prioritize particular traffic types. These include signaled junctions, roundabouts, bus and carpool lanes, ramp metering, traffic flow signage, etc.

The system dynamics model looks at the nature of processing (flows) and how adaptive policies around resources (stocks) can be used to influence the processing behavior in order to optimize throughput or response, increase resilience (to surges) as well as to improve stability in performance.
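The ramp-metering analogy maps naturally onto a token bucket, in which the bucket of permits is the stock and the request stream the flow. A minimal sketch, with illustrative names and units:

```java
// Minimal token-bucket sketch (illustrative): the bucket is the "stock",
// requests are the "flow"; the refill rate caps sustained throughput
// while the bucket depth absorbs short surges.
final class TokenBucket {
  private final double capacity;     // surge absorption (stock limit)
  private final double refillPerSec; // sustained rate (flow limit)
  private double tokens;
  private long lastRefill = System.nanoTime();

  TokenBucket(double capacity, double refillPerSec) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
  }

  synchronized boolean tryAcquire() {
    long now = System.nanoTime();
    tokens = Math.min(capacity, tokens + (now - lastRefill) / 1e9 * refillPerSec);
    lastRefill = now;
    if (tokens >= 1.0) {
      tokens -= 1.0;
      return true;
    }
    return false; // meter the ramp: the caller must wait or shed
  }
}
```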



In projecting software execution behavior and contextual state across space and time, software engineers have the capability to develop new and augmented systems that bridge the past, present and future, allowing software machines to transcend structures formed in the early stages of design and over the course of extemporaneous reactive change.


Your software has memory but no memories. What if software could recall past memories for the purpose of learning? What if we could observe machine memories to more effectively reason about complex software execution behavior? Post-execution simulated playback is done without access to the application bytecode (or source code).


To fight current levels of complexity in IT systems we must look to imbue software with the ability to sense, perceive, reason and act locally with immediacy. Software must adapt not simply react. Feedback signals need to flow freely across machine boundaries as well as man-and-machine interfaces.


“They [autoletics] are more autonomous and independent because they cannot be as easily manipulated with threats or rewards from the outside. At the same time, they are more involved with everything around them because they are fully immersed in the current of life.”