Policing Software Service Processing – Part 1

Part 1 in a series of articles on effective ways of controlling concurrent code execution using techniques such as quality of service (QoS), adaptive control valves, circuit breakers and back-pressure as promoted by proponents of reactive programming.

In this post, I look at what it means to police, that is, to control or influence, the execution of software code as performed by multiple threads within a managed runtime. Here a managed runtime encompasses both the Java Virtual Machine (JVM) and a measurement and management layer above the JVM that imposes various resource management policies and techniques. So what exactly does it mean to control software without actually altering software state or code paths? In operations today, “control”, or the illusion of control, is limited to stopping and (re)starting the execution of a process. The techniques of software control described in this series delve deep inside the black box of a software machine and look at ways influence can be exerted on the system without altering in any way the expected outcomes (state changes or response outputs) of the computation, other than what can transpire from a different scheduling of concurrent processing.

At the lowest level the type of influence is one of delay or no delay; at much higher levels it is prioritization, congestion control, and consumption management. Prioritization means giving some processing a better chance of completing execution when there is competition for shared resources such as CPU and IO. Congestion control is also concerned with competition for shared resources, but with a focus on reducing the overhead of coordinating excessive concurrent resource demands. We can’t always limit the resource usage of a service request or a code path without actually altering the code, but we can stagger the rate of consumption over a particular time interval. This is where consumption management comes in.
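To make the idea of staggering consumption concrete, here is a minimal sketch in Java. It is not taken from any particular product; the class name `ConsumptionPacer` and its `reserve` method are hypothetical, and it assumes consumption can be expressed in abstract units with a fixed allowance per interval. Each request to consume is told how long it should be delayed so that usage is spread evenly across the interval rather than taken all at once.

```java
/**
 * Hypothetical sketch: paces consumption of a metered resource so that
 * roughly unitsPerInterval units are consumed per interval.
 */
final class ConsumptionPacer {
    private final long nanosPerUnit;
    // Earliest time at which the next reservation may begin.
    private long nextFreeNanos = Long.MIN_VALUE;

    ConsumptionPacer(long unitsPerInterval, long intervalNanos) {
        if (unitsPerInterval <= 0 || intervalNanos <= 0) {
            throw new IllegalArgumentException("allowance and interval must be positive");
        }
        this.nanosPerUnit = intervalNanos / unitsPerInterval;
    }

    /**
     * Reserves the given number of units at time nowNanos and returns the
     * delay (in nanoseconds) the caller should wait before beginning.
     * A return value of 0 means "no delay".
     */
    synchronized long reserve(long units, long nowNanos) {
        long start = Math.max(nextFreeNanos, nowNanos);
        nextFreeNanos = start + units * nanosPerUnit;
        return start - nowNanos;
    }
}
```

With an allowance of 10 units per 1,000 ns, a first request for 5 units at time 0 proceeds immediately, while a second request for 5 units at the same instant is asked to wait 500 ns. The code path itself is untouched; only the moment at which it may begin changes. The time is passed in explicitly here to keep the sketch deterministic; a caller would typically supply `System.nanoTime()`.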

When we want to exert some influence on a system we first need to identify the actors (or flows), then the activities they perform, and finally the resources consumed and costs incurred in the course of such actions. In a managed runtime an actor is typically mapped to a thread, an activity to a method, a resource to some accounting counter, and an action to the invocation of a method (activity) by a thread (actor). We can’t directly manage costs and consumption; instead we manage the causes of consumption: the ability of an actor to begin an action at some point in time.
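The mapping described above can be sketched as a small accounting ledger. This is an illustrative assumption, not the data model of any real runtime: the class `ActivityLedger`, its methods, and the resource names `"count"` and `"clock.nanos"` are all hypothetical. The actor is the current thread, the activity is a name standing in for a method, each resource is a counter, and an action is one call to `perform`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical ledger: one counter per (actor, activity, resource). */
final class ActivityLedger {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Performs one action: the current thread (actor) invoking an activity. */
    void perform(String activity, Runnable body) {
        String actor = Thread.currentThread().getName();
        long begin = System.nanoTime();
        try {
            body.run();
        } finally {
            // Charge the cost of the action to the accounting counters.
            charge(actor, activity, "count", 1L);
            charge(actor, activity, "clock.nanos", System.nanoTime() - begin);
        }
    }

    /** Total charged so far for the given actor, activity and resource. */
    long total(String actor, String activity, String resource) {
        LongAdder counter = counters.get(key(actor, activity, resource));
        return counter == null ? 0L : counter.sum();
    }

    private void charge(String actor, String activity, String resource, long amount) {
        counters.computeIfAbsent(key(actor, activity, resource), k -> new LongAdder()).add(amount);
    }

    private static String key(String actor, String activity, String resource) {
        return actor + '|' + activity + '|' + resource;
    }
}
```

Note that the ledger only observes cost after the fact. Any control has to act earlier, at the moment `perform` is entered, which is why the thing being managed is the actor's ability to begin an action.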

[Figure: software.regulators.mirror.up.metering]

To control software we must be able to exert some influence at the point where a thread enters or exits a method or code block. Before performance can be effectively managed, instrumentation must be applied to intercept and augment service processing as if the augmentation were present in the source code. Monitoring then follows, to verify that influence has indeed occurred in the expected manner, positively or negatively. The Probes Open API, which can be applied to bytecode dynamically using an instrumentation agent or statically to source code by a developer, provides the ideal interface for demarcating the begin and end points of an action (a method) performed by an actor (a thread). Irrespective of how the Open API is employed, it is important to note that the API does not deal directly with control itself. Control, whether exerted or not, is never surfaced in the API. The purpose of the API is simply to define the possible points of performance monitoring and management. Software control, or performance management, is only employed at such points when the underlying managed runtime decides that the action or actor requires some degree of supervision and intervention, the intervention being a delay in proceeding with the execution of a call or code block.
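The separation between demarcation and control can be illustrated with a short sketch. To be clear, this is not the Probes Open API; `ActionProbe`, `Valve` and `OrderService` are hypothetical names invented for illustration. The calling code only marks where an action begins and ends. Whether a delay is applied at the begin point is decided by whatever valve the runtime has installed, and nothing about that decision is visible to the caller.

```java
/** Hypothetical control hook: how long to delay an activity before it begins. */
interface Valve {
    long delayMillis(String activity);
}

/** Hypothetical probe that demarcates the begin and end of an action. */
final class ActionProbe implements AutoCloseable {
    // Default valve exerts no control at all.
    private static volatile Valve valve = activity -> 0L;

    private final long beginNanos = System.nanoTime();
    private long elapsedNanos = -1L;

    private ActionProbe() {}

    /** Called by the managing layer, never by application code. */
    static void install(Valve v) {
        valve = v;
    }

    /** Marks the begin point; any intervention (a delay) happens here. */
    static ActionProbe begin(String activity) {
        long delay = valve.delayMillis(activity);
        if (delay > 0) {
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
        return new ActionProbe();
    }

    /** Marks the end point and records the elapsed time of the action. */
    @Override
    public void close() {
        elapsedNanos = System.nanoTime() - beginNanos;
    }

    long elapsedNanos() {
        return elapsedNanos;
    }
}

/** Example application code: the result is the same with or without control. */
final class OrderService {
    static int handle(int quantity) {
        try (ActionProbe probe = ActionProbe.begin("OrderService.handle")) {
            return quantity * 2;
        }
    }
}
```

Installing a valve such as `ActionProbe.install(a -> a.equals("OrderService.handle") ? 20L : 0L)` makes each call to `OrderService.handle` wait about 20 ms before proceeding, yet the value it returns is unchanged. In practice the begin and end calls would more likely be woven in by an instrumentation agent than written by hand, as the article describes.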

[Figure: pssp-probes-open-api-stack]

The next article in the series will look at how activity classification can be used to abstract a control policy that remains manageable when the number of activity types and actors is extremely large.