Thursday 28 October 2010

Collecting and Displaying Runtime Statistics With Byteman

The latest 1.4.0 Byteman release includes a couple of example rule scripts in the sample directory which show off just how powerful a tool Byteman is. Both scripts use Byteman rules to count statistically important events. One of them monitors events in the JVM like Thread creates, and starts, FileStream opens etc. The other counts the number of times your application makes calls to any of the javax Transaction API methods. Well, that's not exactly very impressive is it? I mean, it's fairly easy to inject code which increments a counter, no?

The nice part is that the rules in the script automatically turn those counters into a variety of statistics then format and present the statistics in a dynamically generated MBean displayed in the JMX console. Even nicer the rules in the script control what gets counted and how it gets presented. So, if you can write a rule to count events you are interested in, either in JVM code or in your application code, then you can automatically present this data as statistics. The MBean gets installed when you load your Byteman rules. Unload the rules and the MBean gets uninstalled. Tweak the rules and reload them and you can measure different data. Ad hoc monitoring of your JVM or your app is now fully dynamic and takes just seconds.

In the rest of this blog I'm going to take a walk through how the script operates. For those who want to see the end result here is a quick look ahead at the MBean display for the JVM stats script. After that I talk about how the scripts are written. Hopefully, by the end of it you should know enough to be able to write your own counting rules and generate a statistics display for your own app. Here's the screenshot

JVMMBeanStats.txt is the script which was used to create and populate that MBean. First off let's see how it counts raw events. As one example, consider the following rule
# this rule counts Thread creates
RULE count thread creates
CLASS java.lang.Thread
METHOD <init>
AT EXIT
IF TRUE
DO debug("create thread " + $0.getName());
incrementCounter("thread creates")
ENDRULE

The rule gets injected into any constructor method of class java.lang.Thread and, since its condition is TRUE, it always fires. It uses the built-in method incrementCounter to add one to the counter identified by the String object "thread creates". Now, every thread create, whether for a java.lang.Thread or for a subclass of java.lang.Thread will end up executing one of the constructor methods and triggering this rule. So, effectively the counter tracks the number of threads created in the JVM.

Note that the rule is injected AT EXIT -- i.e. it triggers just before the method returns. This ensures that the Thread instance has been initialised before the debug statement in the rule makes a call to Thread.getName().

It is obviously relatively simple to inject Byteman rules into any JVM or application method and have them increment or decrement counters in order to track what the runtime/application is doing. Note that method incrementCounter() can be provided with an extra integer argument defining the value to be added to the counter. Method readCounter() allows the counter to be reset to zero while retrieving the current count atomically. This makes it is possible to count in many different ways e.g. by adding in the value of a method parameter or field value each time the rule is triggered.

Sometimes the location and condition of a rule need to be a little more sophisticated in order to ensure that only relevant events are counted. For example, the following rule only counts calls to run methods of Thread classes when they are entered from the VM.
# this rule counts Thread runs
RULE count thread run
CLASS ^java.lang.Thread
METHOD run()
# only count run when called from JVM i.e. when there is no caller frame
IF !callerMatches(".*")
DO debug("run thread " + $0.getName());
incrementCounter("thread runs")
ENDRULE

This rule uses the ^ notation in the CLASS clause. This means that the rule is injected down the class hierarchy i.e. not just into Thread.run() but also into any overriding implementation of run() defined by a subclass of Thread.

The condition in this rule uses the built-in operation callerMatches() to ensure that the call is made from the JVM itself (i.e. in response to a call to Thread.start()) rather than from Java code. Method callerMatches() is implemented with a variety of argument patterns; in this one the argument is a regular expression (regexp) to be matched against the name of the method which directly called the trigger method. callerMatches() returns true if the caller method name matches the regexp; if not it returns false. Here's an example of a call which would not fire the rule. In this case the caller method name would be runFooThread.
class FooThreadExecutor {
public runFooThread(FooThread fooThread) {
fooThread.run();
}
}
Now, the pattern ".*" in the Byteman rule will match any method name. So, in the example above the call returns true making the condition false. The only way callerMatches can return false when called with ".*" as an argument is if there is no caller frame in the stack when the run() method gets entered. This is precisely the situation where run() is called from the JVM as a result of some other thread calling Thread.start(). So, this rule does not count pseudo-run operations like the example above.

That's all very well as far as doing the raw counting is concerned but how do the statistics get sampled and displayed to the user? Well, there's a give away in the name of the example scripts, both of which contain the word MBean. The rules in both scripts employ a special helper class called JMXHelper which collects the raw counter data and then presents it in an MBean.

At the top of each script is a line like this
 HELPER org.jboss.byteman.sample.helper.JMXHelper
This means that all the rules in the script will use this class unless they specify their own helper by including a HELPER clause in the rule body. A rule's helper class is used to resolve built-in calls like the ones in the example rules above, incrementCounter() or debug(). Built-in calls have no target object and look like static method calls but without any package qualifier.

Helper classes do not need to implement any special interface or be annotated in any specific way. JMXHelper is just a POJO class. It can be found in the byteman-sample jar included with the Byteman release. JMXHelper inherits from Byteman's default helper class which is why the example rules are able to call the standard built-in methods. So, for example, debug(String) is a public instance method of the default helper and the call to debug("run thread " + $0.getName()) in the example rules gets executed by calling this method. Any public instance method provided by the helper can be called in the rule body as a built-in.

Actually, JMXHelper does not add any extra public methods to the set provided by the default helper class. However, what it does do is implement some public static lifecycle methods. Helper lifecycle methods get called when rules which employ the helper are installed and/or deinstalled by Byteman. Here is one of the methods on JMXHelper
public static void activated()
{
if (theHelper == null) {
theHelper = new JMXHelper(null);
theHelper.start();
}
}

This method gets run when the first rule which employs the helper class is installed into the JVM by Byteman. So, if you load one of these scripts on the java command line or if you upload the rules at runtime using the bmsubmit.sh script then this method will get called once only.

The lifecycle method assigns an instance of JMXHelper to static field theHelper and then runs the instance's start() method. The latter method starts a background thread which wakes up at regular intervals and samples the counters incremented by the rules in the script. The background thread also creates and installs a DynamicMBean into the JVM's MBean server to display the counter values.

Of course, it is possible to use the bmsubmit.sh script provided with the release to deinstall the rules which employ JMXHelper from the runtime. Once all these rules have been removed it is only appropriate to unregister the MBean and shutdown the background thread. Naturally, there is a corresponding lifecycle method which gets called when the last rule employing JMXHelper gets removed from the runtime
public static void deactivated()
{
if (theHelper != null) {
theHelper.shutdown();
theHelper = null;
}
}

Ok, so that explains how the sampling and display operates (well modulo the details of exactly how the thread and MBean get created and started/stopped but that's just bread and butter coding). The more important question is how does the background thread know which counters to sample, how often to sample them and how to display them in the MBean. It would not be much use if these values were hard-wired into class JMXHelper since that would mean that every time you wanted to collect different statistics you would need to rewrite and redeploy the class.

In fact it is perfectly possible to use Byteman rules to parameterise the behaviour of the helper and the MBean. Class JMXHelper provides three methods which are called by the background thread when it is started. A script can inject code into these methods which supplies all the information needed to configure the behaviour of the background thread and the MBean.

So, the first rule we will look at is the one which defines the sample period i.e. how long the background thread waits before waking up and checking the value of the counters used in the script. Here is the method implemented by JMXHelper
public final static long DEFAULT_PERIOD = 10000L;
private long samplePeriod()
{
return DEFAULT_PERIOD;
}

So, by default the background thread wakes up every 10,000 milliseconds i.e. every 10 seconds. The scripts override this period using the following rule which resets the sample period to 5 seconds by injecting code which returns value 5000 when the method is called.
# this rule is triggered when the periodic helper thread starts
# it returns a sample period in milliseconds for which the
# periodic helper thread sits and waits before sampling each
# of the counters and updating the MBean
RULE set period
CLASS JMXHelper
METHOD samplePeriod()
AT ENTRY
IF TRUE
DO RETURN 5000
ENDRULE

So, how about configuring the counters and the MBean display? Well, the background thread calls the following method during startup
private KeyInfo keyInfo()
{
// if the rule set does not generate a keyinfo value then we return this one to indicate
// that something is missing.
String[] keyNames = new String[1];
keyNames[0] = "No counters defined";
return new KeyInfo("Byteman Periodic Statistics", keyNames);
}

The method returns a KeyInfo object. Essentially this contains 3 lists of data. The first list identifies the names of the counters to be sampled. The second list identifies the format in which each corresponding counter value should be displayed and the third provides a descriptive label for each counter. The default implementation above creates a KeyInfo object with only one counter which, presumably, never gets incremented. The format and description take the default values.

This method is overridden by the JVMMBeanStats script as follows
# this rule is triggered when the periodic helper thread starts
# it returns a KeyInfo object identifying the stats counters
# updated by rules in this rule set
RULE return key info
CLASS JMXHelper
METHOD keyInfo()
BIND keyInfo : KeyInfo = new KeyInfo("JVM Statistics in a Dynamic MBean")
IF TRUE
DO keyInfo.addKey("thread creates", KeyInfo.KEY_TYPE_CUMULATIVE, "Thread() total calls");
keyInfo.addKey("thread runs", KeyInfo.KEY_TYPE_CUMULATIVE, "Thread.run() total calls");
...
keyInfo.addKey("thread creates", KeyInfo.KEY_TYPE_RATE, "Thread() calls per second");
keyInfo.addKey("thread runs", KeyInfo.KEY_TYPE_RATE, "Thread.run() calls per second");
...
keyInfo.addKey("class loads", KeyInfo.KEY_TYPE_MEAN, "ClassLoader.defineClass() mean calls per sample");
RETURN keyInfo
ENDRULE

The rule creates a KeyInfo object in the BINDS clause binding it to variable keyInfo. The rule action calls the addkey method repeatedly to add the name, format type and description for each of the counters updated in the rule set. Finally it returns the object identified by keyInfo to the background thread. The thread uses this object to construct a dynamic MBean populated with fields each of whose name, format, description and value is derived from each key entry and associated counter. Every time the thread wakes up it samples the value of the counters. When the MBean is refreshed the latest sampled values are used to recompute the displayed statistic.

Note that the same counter can be used more than once with a different format. So, for example, the first counter, "thread creates", is added with format CUMULATIVE and then added again later with format RATE. Format CUMULATIVE indicates that the MBean should display the last sampled value for the counter. With this format the counter is simply being presented as raw data.

By contrast format RATE means that the MBean should compute the difference between successive sample values and divide it by the sample period to compute the rate at which the counter is changing. So, if the latest sampled value for "thread creates" is 5090 and the previous sampled value was 4940 then the statistic displayed in the MBean would be (5090 - 4944) / 5.0 i.e. 29.2 Thread() calls per second.

In fact the sampling is slightly more sophisticated in order to smooth out variations which occur because of wakeup delays and overlap of application phase changes with the sample intervals. The RATE algorithm can be configured to employ up to 11 prior samples (i.e up to 10 differences) and it divides the difference by the actual time interval between samples rather than the nominal wakeup period.

The MEAN statistic provides another view of the counter data. It computes the change in the counter value across a sample period i.e. it represents how many things were counted during the period. Once again it can be based on up to 11 sample counter readings and the value is corrected to allow for the number of samples included and the variation in the sample intervals.

How is the sample set count defined? Well, that's the third method of class JMXHelper which can be overridden by a rule. It's called sampleSetSize() and it returns 5 as the default set size. It can be overridden using a rule as follows
# this rule is triggered when the periodic helper thread starts
# it returns a count for the number of samples which will be
# combined when computing counter rates or counter sample period
# means
RULE set sample set size
CLASS JMXHelper
METHOD sampleSetSize()
IF TRUE
DO RETURN 3
ENDRULE

Ok, so it's clear that JMXHelper provides a general purpose display mechanism which can be used to present any statistics which you can gather by injecting calls to incrementCounter(). How do you actually install it into a running JVM and what does the final display look like?
Well, the scripts themselves contain detailed instructions but for now I'll show you how to install the JVMMBeanStats script into a JBoss Application Server instance. In the process I'll show off another neat feature of the latest Byteman release

Normally, in order to be able to load and unload byteman rules into a JVM you need to install the Byteman agent when you start up the JVM. There is a useful script in the installed bin directory called bmjava.sh which simplifies this job in the case where you are starting your program using the java command. In most cases you simply call bmjava.sh instead of java. Unfortunately, JBoss AS wraps up the call to java inside its own startup script. This means that you have to set environment variable JAVA_OPTS to pass the -javaagent argument on the java command line. Also, the javaagent argument needs to have several options appended to it to identify the location of the byteman jar and either point Byteman at a rule script or switch on the agent listener so you can upload rules using bmsubmit.sh after the program is started.

With the 1.4.0 release you can now start up JBoss and then upload the agent once the JVM is running. This is possible on any JVM which implements the server side of the com.sun.tools.attach.VirtualMachine API. Ok, you don't actually need to know what that means but you do need to know that it has been found to work on Sun's JDK6 (naturally), OpenJDK6 and JRockit JDK6. I have not yet found an IBM JDK6 on which it does work (although the other Byteman features all appear to work fine) but I have not made a comprehensive survey of all their releases so you'll have to verify this for yourselves.

So, first off you need to start your JBoss instance by calling run.sh
 [adinn@localhost adinn]$ $JBOSS_HOME/bin/run.sh
=========================================================================
JBoss Bootstrap Environment
JBOSS_HOME: /home/adinn/jboss/jbossas/trunk/build/target/jboss-6.0.0-SNAPSHOT
JAVA: /usr/java/jdk1.6.0_21/bin/java
...
Now you need to identify the process id of the JBoss AS process
 [adinn@localhost adinn]$ jps -l
23812 org.jboss.Main
23892 sun.tools.jps.Jps
Now you run the bminstall.sh script to upload the agent into the JBoss JVM
 [adinn@localhost adinn]$ ${BYTEMAN_HOME}/bin/bminstall.sh -b \
-Dorg.jboss.byteman.transform.all=true 23812
I have added a couple of extra arguments to the install command. The -b argument ensures that the agent jar is installed into the JVM bootstrap classpath and the -D argument sets a system property checked by the agent. Why are these needed? Well, the JVMMBeanStats script needs to be able to inject code into JVM classes in the java.* packages. This feature is normally disabled to stop you shooting yourself in the foot. Setting this system property when the agent is loaded allows injection into any class including those in the java.* package. Even with this enabled injection into java.* classes will not 'just work'. The problem is that the injected code must be able to reference exception types declared in the byteman jar. Since the java.* classes live in the bootstrap classpath the agent jar must also be added to that path.

Ok, so just to check that the agent is running let's see if we can talk to it using the bmsubmit.sh script.
 [adinn@localhost adinn]$ ${BYTEMAN_HOME}/bin/bmsubmit.sh -l
no rules installed
Good, the agent is running and responding. Now, with JBoss also up and running we can take a look at the JMX console.


Nothing there at present that mentions Byteman. That's because we haven't yet loaded any rules which use JMXHelper. So, let's upload the rules in the JVMMBeanStats rule set and then run some code to exercise them. Before loading the rule set we also need to install the sample jar into the boot classpath so that our injected rule code can reference classes JMXHelper and KeyInfo. We use bmsubmit.sh with the -b flag to load a jar into the bootstrap classpath and then again with the -l flag to load a rule script.
[adinn@localhost adinn]$ ${BYTEMAN_HOME}/bin/bmsubmit.sh -b ${BYTEMAN_HOME}/sample/lib/byteman-sample.jar
append boot jar /home/adinn/jboss/byteman/trunk/install/sample/lib/byteman-sample.jar
[adinn@localhost adinn]$ ${BYTEMAN_HOME}/bin/bmsubmit.sh ${BYTEMAN_HOME}/sample/scripts/JVMMBeanStats.txt
install rule return key info
install rule set period
install rule set sample set size
install rule count thread create
install rule count thread start
install rule count thread run
install rule count thread exit
install rule count file open read File
install rule count file open read File Descriptor
install rule count file open write File
install rule count file open write File Descriptor
install rule count file input stream close
install rule count file output stream close
install rule count class loads
Right, the agent should now have injected the rules into the relevant methods of classes Thread, FileOutputStream etc and then activated the JMXHelper class. So lets take another look at the JMX console. We can see that a new bean has popped up in the left hand side menu. Clicking on it reveals it to be the bean added by Byteman.
A click on the PeriodicStats element in the display shows us the actual stats.

The period (5) and sample set size (5) are displayed at the top followed by the statistics for each entry defined in the key info. We can see that the total number of threads created since the rule set was loaded is 34. The number created in the last five sample intervals must be 5 since the creation rate appears to be exactly 0.2 per second. Only 2 threads have exited which, presumably, indicates that the allocated threads are being used to fill up a thread pool.

Note that the sample period is a writeable property resettable from the MXBean. The MXBean also provides an operation (off screen in this snapshot) which allows all counters to be reset to zero.

Finally, we can use the bmsubmit.sh script to deinstall all the rules. This causes JMXHelper to be deactivated and it removes the MBean from the display. The agent removes all the injected rule code, restoring the affected JVM methods back exactly as they were before the rule script was uploaded.
[adinn@localhost adinn]$ ${BYTEMAN_HOME}/bin/bmsubmit.sh -u /home/adinn/jboss/byteman/trunk/install/sample/scripts/JVMMBeanStats.txt
uninstall RULE return key info
uninstall RULE set period
uninstall RULE set sample set size
uninstall RULE count thread create
uninstall RULE count thread start
uninstall RULE count thread run
uninstall RULE count thread exit
uninstall RULE count file open read File
uninstall RULE count file open read File Descriptor
uninstall RULE count file open write File
uninstall RULE count file open write File Descriptor
uninstall RULE count file input stream close
uninstall RULE count file output stream close
uninstall RULE count class loads

Here's the JMX console again, with the Byteman stats MBean removed

If we want to vary the information displayed or the way it is computed we can just tweak the rule script to record different events or use different counting rules for the same events and then reload the script by calling bmsubmit.sh. The MBean is automatically recreated and reloaded into the JMX console.

If you want to try this out you can download the latest release from the Byteman downloads page. The binary and full source releases both include the sample jar and scripts plus the latest Byteman programmers guide. I'll also be happy to answer any questions you have or advise on developing rules to track your application behaviour on the Byteman user forum. Enjoy!

Monday 11 October 2010

Allowing full access to objects in rules

Some months back one of the developers at JBoss was trying to use Byteman to test some code and he was rather puzzled that his rule was not compiling correctly. It turned out that his rule action included an assignment to a private field. Up until now Byteman has only allowed rule actions to change public state or call public methods. Rule conditions and actions have no special privileges as far as encapsulation is concerned.

As an example, given this class
class Regulator
{
private float throttle;
public void adjustThrottle(float amount);
}
the following rule fails
CLASS SteamEngine
METHOD adjustPressure(float)
AT CALL Regulator.adjustThrottle
IF ($0.getRegulator().throttle + $1 > 0.95)
DO traceln("Illegal setting for throttle"),
traceStack(),
THROW new BoilerRupture("too much pressure")
ENDRULE
Class Regulator allows clients to increment or decrement the throttle setting but it does not enable access to the current value, presumably because it handles any illegal settings itself. However, when testing or validating the application it is useful to be able to identify erroneous situations such as the one above where a client is making an invalid request. Regulator may handle the error but it is still important to know where errors arise.

Unfortunately, the rule does not typecheck correctly. Byteman complains that the condition is trying to access an unknown field called throttle. The same issue applies with method invocations. If a method is private then any attempt to call it inside a rule condition or action leads to an unknown method error during type checking. Well, the answer is obvious isn't it. Why not just let rules access whatever state they want?

I have modified the trunk release of Byteman to do just this. It is working and there don't appear to be any problems with it. Of course, enabling this feature raises a whole load more opportunities for you to shoot yourself in the foot when using Byteman. In particular, if you use Byteman to modify JVM static and instance data or call private JVM methods you can easily break your Java runtime into shiny little pieces. But then again, this feature makes it possible to do a whole load more tracing and verification of your application's behaviour and to engineeer a lot more unexpected situations during testing. Caveat emptor, as the Romans used to say.

Assuming no issues arise during testing over the next few weeks this feature should be included in the upcoming 1.3.1 Byteman release. If you want an early peek then check out and build the trunk code and give it a try.