At the weekend I posted a video on Vimeo titled “A Time Machine for Augmenting Past Software Execution in the Cloud”. In the video I demonstrate how an Aspect-Oriented Programming (AOP)-like programming interface to the Satoris metering engine, which is embedded within Simz and Stenos, lets you intercept software execution behavior across space (machine and process) as well as time (online and offline).
Imagine writing a method invocation interceptor class that is called within a process when a method is invoked by a thread. This has been possible for some time using various frameworks, such as Spring and JEE/CDI, and AOP technologies, such as AspectJ. Now imagine that same interceptor class is able to intercept method invocations across multiple Java runtimes and threads without being present within any of those runtimes and without a single line of code change, because it runs in a mirrored runtime that replicates each thread and call stack as well as some environment state. Now let’s go even further and imagine the very same interceptor class receiving the same callbacks within the same mirrored threads from a past recording that can be played back repeatedly. Again there are no code changes, though the space and time aspects of the entire environment have changed. That is exactly what plays out in the video demonstration. The power of this is not just in the distribution of the interception across multiple machines but in the ability to go back in time in the execution of software and augment it with future capabilities.
In the video I walk through four main distributed application monitoring scenarios. In the first part of the video I instrument a DataStax Cassandra server with Satoris, configured to send the metering data measured by the instrumentation applied at runtime to a remote Simz server. A monitoring console is connected to the Simz server, completely oblivious to the fact that the runtime it is monitoring is merely a mirror of another runtime. I then configure the Simz runtime to install a custom interceptor implementation I had written to output the call profile history for any request exceeding a specified clock time threshold. I could just as easily have installed the interceptor in the actual Cassandra server runtime, but then I would have had to worry about possible slowdowns caused by the interception and its writing to the System.out stream, as the interception occurs within the thread of execution that is being metered (measured).
In the second part of the video I enable the Stenos extension within the Simz runtime to create a binary metering recording, similar to what is transmitted over the wire, that can be played back when the Cassandra server is not actually running. Finally, using Stenos, I recreate the entire simulated environment and software execution behavior, but this time with the interceptor extension enabled. The interceptor, now in an offline mode, behaves exactly the same as when online within the Simz runtime. Importantly, this is achieved without having to query a database. Instead the interceptor experiences the software execution behavior as if it were present within the runtime of the application when it originally occurred.
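The record-then-replay idea can be illustrated with a minimal sketch. Everything in it is hypothetical: `CallListener`, `CallEvent`, `Recorder` and `TraceLog` are stand-ins invented for this example and are not the Probes Open API, Simz or Stenos. It only shows that a listener fed from a recording observes the same callbacks as one fed live.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an interceptor callback; NOT the Probes Open API.
interface CallListener {
    void begin(String thread, String name, long clockMicros);
    void end(String thread, String name, long clockMicros);
}

// One recorded event: entry or exit, the thread, the probe name and the clock reading.
final class CallEvent {
    final boolean entry;
    final String thread;
    final String name;
    final long clockMicros;

    CallEvent(boolean entry, String thread, String name, long clockMicros) {
        this.entry = entry;
        this.thread = thread;
        this.name = name;
        this.clockMicros = clockMicros;
    }
}

// Forwards live events to a delegate while keeping a copy for later playback.
final class Recorder implements CallListener {
    private final List<CallEvent> tape = new ArrayList<>();
    private final CallListener delegate;

    Recorder(CallListener delegate) { this.delegate = delegate; }

    @Override public void begin(String thread, String name, long clockMicros) {
        tape.add(new CallEvent(true, thread, name, clockMicros));
        delegate.begin(thread, name, clockMicros);
    }

    @Override public void end(String thread, String name, long clockMicros) {
        tape.add(new CallEvent(false, thread, name, clockMicros));
        delegate.end(thread, name, clockMicros);
    }

    List<CallEvent> tape() { return new ArrayList<>(tape); }

    // Plays a recording into any listener, in the original order.
    static void replay(List<CallEvent> tape, CallListener listener) {
        for (CallEvent e : tape) {
            if (e.entry) listener.begin(e.thread, e.name, e.clockMicros);
            else listener.end(e.thread, e.name, e.clockMicros);
        }
    }
}

// A trivial listener that writes down what it sees.
final class TraceLog implements CallListener {
    final List<String> lines = new ArrayList<>();

    @Override public void begin(String thread, String name, long clockMicros) {
        lines.add(thread + " > " + name + " @" + clockMicros);
    }

    @Override public void end(String thread, String name, long clockMicros) {
        lines.add(thread + " < " + name + " @" + clockMicros);
    }
}

final class ReplayDemo {
    public static void main(String[] args) {
        TraceLog live = new TraceLog();
        Recorder recorder = new Recorder(live);
        recorder.begin("t1", "db.read", 0L);
        recorder.end("t1", "db.read", 1500L);

        TraceLog offline = new TraceLog();
        Recorder.replay(recorder.tape(), offline);
        System.out.println(live.lines.equals(offline.lines));
    }
}
```

The listener has no way of telling which source drove it, which is the property the offline interceptor in the video relies on.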
Here are the two interfaces within the Probes Open API that must be implemented by any interceptor wishing to become a time lord of the Java and JVM universe.
Below is the actual source code of the extension I used in the above video demonstration. Most of the code is related to environment configuration and reporting. Basically the InterceptorFactory creates an instance of the Interceptor class for each thread that is metered (and possibly simulated) within the runtime (real or not, online or offline). The Interceptor class then keeps track of whether a call trace has been started or ended for an associated thread. When it enters into a trace it creates a SavePoint and when it exits from the trace it generates a ChangeSet using the previously stored SavePoint for the same thread Context. It then checks whether the delta for the clock time Reading exceeds a threshold and if so it dumps the Change instances within the
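As a rough illustration of the logic just described, here is a minimal sketch under stated assumptions. It is not the extension from the video and it does not use the Probes Open API: `ThresholdInterceptor` and `ThresholdInterceptorFactory` are hypothetical classes, the save point is reduced to a single clock reading, and the change set is reduced to a delta plus the list of call names seen during the trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model; the names echo the post but this is NOT the Probes Open API.
final class ThresholdInterceptor {
    private final long thresholdMicros;
    private final List<String> report = new ArrayList<>();
    private final List<String> calls = new ArrayList<>(); // call profile of the current trace
    private int depth;      // 0 means no trace is in progress on this thread
    private long savePoint; // clock reading captured when the trace started

    ThresholdInterceptor(long thresholdMicros) { this.thresholdMicros = thresholdMicros; }

    void begin(String name, long clockMicros) {
        if (depth++ == 0) {          // entering a trace: take the save point
            savePoint = clockMicros;
            calls.clear();
        }
        calls.add(name);
    }

    void end(String name, long clockMicros) {
        if (depth == 0) return;      // ignore an unbalanced exit
        if (--depth == 0) {          // leaving the trace: compute the delta
            long delta = clockMicros - savePoint;
            if (delta > thresholdMicros) {
                report.add(name + " took " + delta + "us via " + calls);
            }
        }
    }

    List<String> report() { return report; }
}

// One interceptor per metered thread, created on first use.
final class ThresholdInterceptorFactory {
    private final Map<String, ThresholdInterceptor> byThread = new ConcurrentHashMap<>();
    private final long thresholdMicros;

    ThresholdInterceptorFactory(long thresholdMicros) { this.thresholdMicros = thresholdMicros; }

    ThresholdInterceptor forThread(String threadName) {
        return byThread.computeIfAbsent(threadName, t -> new ThresholdInterceptor(thresholdMicros));
    }
}
```

Keeping one instance per thread means the depth counter and save point need no locking, because each is only ever touched by the (real or mirrored) thread that owns it.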
Here is a sample of the output produced by the above custom interceptor during the startup of the DataStax Cassandra server.
Below is a table I’ve found useful in explaining the underlying model of the metering engine and the mapping from concrete to abstract that allows Simz and Stenos to simulate any type of software execution behavior, and not just within the Java runtime. It is not meant to be a comprehensive mapping, and the naming within the generic model is debatable, but hopefully it helps in making the above code appear less foreign and complex when it really is very simple and natural.
Can’t a time lord travel, or see, into the future? Using the Probes Open API I can simulate the future execution of method invocations and the timing of such invocations before the code has been written. This is great for testing interceptors that look to detect faults that have yet to occur in production.
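To make the idea of simulating future execution concrete, here is a minimal sketch that drives an interceptor with scripted calls on a virtual clock, so no real application code needs to exist. All names (`Probe`, `Simulation`, `BudgetWatch`, and the probe names `checkout` and `payment.authorize`) are assumptions made up for this example and are not part of the Probes Open API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical callback interface; NOT the Probes Open API.
interface Probe {
    void begin(String name, long clockMicros);
    void end(String name, long clockMicros);
}

// Emits scripted begin/end calls on a virtual clock; no real code is executed.
final class Simulation {
    private final Probe probe;
    private long clock; // virtual time in microseconds

    Simulation(Probe probe) { this.probe = probe; }

    // Simulates a call that spends selfMicros of its own time after running nested calls.
    Simulation call(String name, long selfMicros, Runnable nested) {
        probe.begin(name, clock);
        nested.run();
        clock += selfMicros;
        probe.end(name, clock);
        return this;
    }

    Simulation call(String name, long selfMicros) {
        return call(name, selfMicros, () -> { });
    }
}

// Interceptor under test: flags any call whose elapsed time exceeds a budget.
final class BudgetWatch implements Probe {
    final List<String> faults = new ArrayList<>();
    private final long budgetMicros;
    private final Deque<Long> starts = new ArrayDeque<>();

    BudgetWatch(long budgetMicros) { this.budgetMicros = budgetMicros; }

    @Override public void begin(String name, long clockMicros) {
        starts.push(clockMicros);
    }

    @Override public void end(String name, long clockMicros) {
        long elapsed = clockMicros - starts.pop();
        if (elapsed > budgetMicros) faults.add(name + ":" + elapsed);
    }
}

final class SimulationDemo {
    public static void main(String[] args) {
        BudgetWatch watch = new BudgetWatch(1000L);
        Simulation sim = new Simulation(watch);
        // A "checkout" call that wraps a slow, not-yet-written "payment.authorize" call.
        sim.call("checkout", 200L, () -> sim.call("payment.authorize", 5000L));
        System.out.println(watch.faults);
    }
}
```

Because the clock is virtual, the fault condition is reproduced deterministically and instantly, which is what makes this style convenient for testing detection logic ahead of production.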
– Dynamic runtime adaptation of software execution behavior should be done by way of behavioral signals and signal aware software libraries. Probe interceptors and metering extensions should only influence software execution behavior through the raising and dampening of “adaptive” signals, not through direct changes in data structure and call argument states.
– Developers and Operations should be able to decide the degree of isolation for any dynamic tracing or runtime augmentation. This evolved into the possibility to distribute probe interceptors and metering extensions across process and machine boundaries.
– To support distribution and isolation there needed to be no direct class or method references within the interception or metering extension code. This should all be done through a reflective namespace and abstract runtime model, which the Probes Open API had already provided in Satoris. In those days it was called OpenCore.
– It should be possible to simulate and test interceptors and metering extensions in a metering sandbox. This turned into the ability to record calls into the Probes Open API and then play them back in a separate process, offline or online, with the entire application runtime mirrored including thread and call stack creation as well as meter readings.
– The performance impact on the runtime should be negligible, whether in-process or out-of-process. This is the most crucial success factor and the one that other tools have in the past, and still to this day, failed to achieve. With the Satoris hotspot metering extension I was able to instrument a significant amount of any codebase and then at runtime whittle this down using self-aware and self-adaptive capabilities of both the instrumentation and measurement code.
– The probes interceptor or metering extension should not be concerned with the actual machine or process boundaries of the intercepted thread execution. The software execution behavior is paramount. This led me to envisage a kind of Matrix for the (Java Virtual) Machine in which the real-time behavior of multiple connected JVMs streamed into a single observation and control plane, where the stream was in turn simulated as some kind of machine consciousness. I later came across research work into mirror neurons in the human mind.
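The principle of avoiding direct class or method references can be sketched as follows. This is an assumption-laden illustration and not the reflective namespace of the Probes Open API: `NameFilter` is a hypothetical class that selects probes purely by dotted name, so the code can run in a process where the monitored classes are not loaded at all.

```java
import java.util.List;

// Hypothetical: probes are identified by dotted names only, never by Class or Method objects.
final class NameFilter {
    private final List<String> prefixes;

    NameFilter(String... prefixes) { this.prefixes = List.of(prefixes); }

    // True when the probe name equals a prefix or sits beneath it in the namespace.
    boolean matches(String probeName) {
        for (String prefix : prefixes) {
            if (probeName.equals(prefix) || probeName.startsWith(prefix + ".")) {
                return true;
            }
        }
        return false;
    }
}

final class NameFilterDemo {
    public static void main(String[] args) {
        NameFilter filter = new NameFilter("org.apache.cassandra.service");
        // The names below are plain strings used for illustration; nothing is class-loaded.
        System.out.println(filter.matches("org.apache.cassandra.service.StorageProxy.read"));
        System.out.println(filter.matches("org.apache.cassandra.servicex.Other.call"));
    }
}
```

Matching on `prefix + "."` rather than on the bare prefix avoids accidentally selecting sibling namespaces that merely share leading characters.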
When it came to the deployment of the technologies into a production environment with hundreds or thousands of machines and nodes, a few technical challenges arose, including:
– How to compress the streaming protocol between client and mirrored runtimes such that a single method entry or exit event could be described in as little as 4 bytes.
– The efficient and safe publication of some minimal amount of heap state scoped to a thread, stack or call frame that could be transparently accessed by interceptors or metering extensions. As customers started to consider the implications of changing the space and time dimensions of enterprise integrations using the simulated runtime, the need for some contextual data capture became more apparent. Fortunately the Probes Open API had already offered a suitable interface for data exchange between application and interception –