2 Replies Latest reply on Jun 29, 2010 10:47 AM by mazz

    support wildcard notation in rules

    mazz

      I submitted this yesterday: https://jira.jboss.org/browse/BYTEMAN-114

       

      Suppose I wanted to do some "across the board" instrumentation - how hard would it be to implement something like  wildcard notation in rules? I don't think it supports this today.

       

      Is there some other way one can ask byteman to do something like that - specifically "instrument all methods that match pattern X from all classes that match pattern Y". Or even "instrument ALL methods of ALL classes in a given package".

        • 1. Re: support wildcard notation in rules
          adinn

          Hi John,

           

          John Mazzitelli wrote:

           

          I submitted this yesterday: https://jira.jboss.org/browse/BYTEMAN-114

           

          Suppose I wanted to do some "across the board" instrumentation - how hard would it be to implement something like  wildcard notation in rules? I don't think it supports this today.

           

          Is there some other way one can ask byteman to do something like that - specifically "instrument all methods that match pattern X from all classes that match pattern Y". Or even "instrument ALL methods of ALL classes in a given package".

           

          Yes, I saw the JIRA. It makes sense to provide this sort of functionality -- although there are a few wrinkles to consider regarding well-definedness. And it's not really hard to implement it. It is hard to implement it efficiently, though, i.e. so that it does not severely impact the performance of the JVM. That's one of the reasons I have not yet provided wildcard capability. If you (or anyone else) are interested, the reasons are set out below.

           

          Oh, and as regards well-definedness, consider this rule

           

          RULE beware of too much rope

          CLASS org.my.*

          METHOD <init>

          AT CALL append

          BIND foo : Foo = $1

          IF foo == $@[0]

          DO traceln("found the answer" + foo)

          ENDRULE

           

          So, this rule applies to all constructors of all classes in package org.my (or does it also apply to subpackages?). But it only applies if the constructor calls append. If not, then injection will not be performed -- no error message, it's just a failed match. But if append is called then the first argument had better be of type Foo or else a type error will occur. The condition checks whether the target of the append call is the same as the argument to the constructor. However, append may actually be a static method in some cases, in which case $@[0] will be null. The ambiguity which is useful in normal rules can make it difficult to foresee the range of applicability, validity or utility of wildcard rules.

           

          Now what about implementing it efficiently? The problem is that the agent is called every time a class is loaded by the JVM. Its first job is to decide whether it needs to transform the bytecode it has been presented with or just let the load continue. It needs to make that decision quickly and cheaply most of the time or else the VM will start to slow down quite drastically -- well, at least it will do so during bootstrap of a large app like the AS, and that matters a lot. A more complex algorithm for locating rules which match loaded classes means more time spent getting to the point where you say no. I emphasise the negative case because that is, and probably always will have to be, the 99% case (even with pattern rules).
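
          A minimal sketch of that "say no quickly" shape, using the standard java.lang.instrument.ClassFileTransformer interface. This is not Byteman's actual transformer: the class name FastRejectTransformer, the targetClasses set and the rewrite placeholder are all hypothetical, chosen only to show where the cheap negative check sits.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.Set;

// Hypothetical illustration only; not Byteman's real transformer.
final class FastRejectTransformer implements ClassFileTransformer {
    // Dotted class names that some rule mentions (assumed to be precomputed).
    private final Set<String> targetClasses;

    FastRejectTransformer(Set<String> targetClasses) {
        this.targetClasses = targetClasses;
    }

    @Override
    public byte[] transform(ClassLoader loader, String internalName,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // The JVM passes names in internal form ("org/my/X"); the name may be null.
        if (internalName == null) {
            return null;
        }
        String name = internalName.replace('/', '.');
        if (!targetClasses.contains(name)) {
            // The common case: returning null means "no transformation performed".
            return null;
        }
        return rewrite(classfileBuffer);
    }

    // Placeholder for the expensive part (real bytecode injection would go here).
    private byte[] rewrite(byte[] original) {
        return original.clone();
    }
}
```

          Everything before the final return is the path taken for nearly every loaded class, so that is the part whose cost matters.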

           

          "Quickly and cheaply" is a relative term -- after all, the VM has already had to load the file containing the bytecode and will, at the very least, have to verify the bytecode and convert it into an in-memory representation of the class. With the current implementation the cost is not too great during bootstrap -- more details below. But it is easy to get this wrong and add enormous overheads. In particular, it is preferable if the matching algorithm only incurs costs when there are complex matches to be made and, contrariwise, is cheap when the rules use precise specifications.

           

          The latter consideration has been the most important one for me up to now. For normal rules with explicit class names I index the rules using a hash map based on the value in the CLASS clause. This means that when class org.my.X is loaded I need to hash on the strings "X" and "org.my.X" to decide whether I need to look further for a method match. That's very cheap. Using a boot time rule script containing arbitrary numbers of rules of this sort which never actually match a loaded class adds a negligible amount to AS startup -- not even 1%.
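
          The two-lookup scheme described above can be sketched as follows. The class RuleIndex and its method names are hypothetical and rules are reduced to their names; the real index may well be organised differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index keyed on the text of the CLASS clause.
final class RuleIndex {
    private final Map<String, List<String>> rulesByClassClause = new HashMap<>();

    // Register a rule under whatever its CLASS clause says ("X" or "org.my.X").
    void add(String classClause, String ruleName) {
        rulesByClassClause.computeIfAbsent(classClause, k -> new ArrayList<>()).add(ruleName);
    }

    // Two hash lookups per loaded class: the simple name and the qualified name.
    List<String> candidatesFor(String qualifiedName) {
        List<String> found = new ArrayList<>();
        String simpleName = qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1);
        collect(found, simpleName);
        if (!simpleName.equals(qualifiedName)) {
            collect(found, qualifiedName);
        }
        return found;
    }

    private void collect(List<String> out, String key) {
        List<String> rules = rulesByClassClause.get(key);
        if (rules != null) {
            out.addAll(rules);
        }
    }
}
```

          An empty result is the cheap "no" answer; only a non-empty one leads on to the method match.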

           

          However, I recently added support for overriding injection down class hierarchies, i.e. where you specify CLASS ^Foo and the rule applies to any matching methods of Foo, Bar extends Foo, Baz extends Bar etc. The problem with this is that it adds a lot more work. If I am presented with class org.my.X extends org.my.Y extends java.lang.Object I have to check for any rules which mention "X" and "org.my.X" and for overriding rules which mention "Y", "org.my.Y", "Object" and "java.lang.Object". If there is an overriding rule for "Y", "org.my.Y", "Object" or "java.lang.Object" then I have to inject it into class X. So we get 6 hashes instead of 2 and, more importantly, I have to compute the supers list of org.my.X.
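
          To make the count of six concrete, here is a small sketch that lists the keys to be hashed for the example hierarchy. The class LookupKeys and its methods are hypothetical, and the super chain is assumed to have been computed already.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that enumerates the hash keys for one class load.
final class LookupKeys {
    // Keys checked against ordinary rules: simple and qualified name of the class itself.
    static List<String> directKeys(String qualifiedName) {
        return bothForms(qualifiedName);
    }

    // Keys checked against overriding (CLASS ^Name) rules: both forms of every superclass.
    static List<String> overridingKeys(List<String> superChain) {
        List<String> keys = new ArrayList<>();
        for (String superName : superChain) {
            keys.addAll(bothForms(superName));
        }
        return keys;
    }

    private static List<String> bothForms(String qualifiedName) {
        List<String> keys = new ArrayList<>();
        keys.add(qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1));
        keys.add(qualifiedName);
        return keys;
    }
}
```

          For org.my.X with supers org.my.Y and java.lang.Object this gives 2 direct keys plus 4 overriding keys, and the number grows with the depth of the hierarchy.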

           

          I cannot use reflection to do the first super lookup as class org.my.X does not yet exist. So, I have to do a (partial) scan of the bytecode to determine that the name of the superclass is Y. Actually, I sometimes face the same problem with class Y and its supers -- X may get loaded first and the VM will call the transformer before loading the rest of the class hierarchy because the transformer is allowed to change the super linkage. So, if I have even one overriding rule for class Foo I cannot decide whether it applies to class X without checking all the supers of X and I cannot precompute, index or cache anything in order to make this decision.
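
          For illustration, a partial scan of this kind can be written against the class file layout in the JVM specification: skip the constant pool, then read the super_class index. This is only a sketch under that assumption; SuperNameReader is a hypothetical name and the agent's own scanner may work quite differently.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical reader: extracts the superclass name without defining the class.
final class SuperNameReader {
    // Returns the internal name (e.g. "org/my/Y"), or null when there is no superclass.
    static String superInternalName(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort();
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                    // Utf8
                    case 7: classNameIndex[i] = in.readUnsignedShort(); break; // Class
                    case 8: case 16: case 19: case 20: skip(in, 2); break;
                    case 15: skip(in, 3); break;
                    case 3: case 4: case 9: case 10: case 11: case 12:
                    case 17: case 18: skip(in, 4); break;
                    case 5: case 6: skip(in, 8); i++; break; // long/double use two slots
                    default: throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            in.readUnsignedShort(); // access_flags
            in.readUnsignedShort(); // this_class
            int superIndex = in.readUnsignedShort();
            return superIndex == 0 ? null : utf8[classNameIndex[superIndex]];
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated class file", e);
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }
}
```

          Note that this yields only the immediate superclass. Walking further up the chain means locating and scanning the bytes of each super in turn, which is where the extra cost comes from.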

           

          As a consequence, using a single overriding rule, even one which never matches a loaded class, still adds somewhere between 5% and 10% to the AS startup time. That's the best I could do even after I tuned the algorithm.

           

          So, what happens when we allow patterns in rules? What extra overheads would occur? Well, ignoring overriding, that would mean doing the simple hash lookup against "X" and "org.my.X" and then iterating over each rule which uses a CLASS wildcard, doing a pattern match. If overriding is added to the mix (after all, it makes sense to request injection into ^org.my.socket.utils.*) then on top of the hash lookup against "Y", "org.my.Y", "Object" and "java.lang.Object" we would also have to pattern match "org.my.Y" and "java.lang.Object" against any overriding rules which use CLASS wildcards.

           

          Now, this is not too bad because the cost is proportional to the number of wildcard rules. If you don't use them then there is essentially no cost. Also, if you don't combine overriding with wildcards then there is no extra work added to the supers check.
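
          A sketch of how the two kinds of lookup might sit side by side, ignoring overriding. WildcardRuleIndex is hypothetical, and wildcard rules are represented here as precompiled java.util.regex patterns purely for the sake of the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical index: hash lookup for exact names, linear scan for wildcard rules.
final class WildcardRuleIndex {
    private static final class WildcardRule {
        final Pattern pattern;
        final String name;
        WildcardRule(Pattern pattern, String name) {
            this.pattern = pattern;
            this.name = name;
        }
    }

    private final Map<String, List<String>> exactRules = new HashMap<>();
    private final List<WildcardRule> wildcardRules = new ArrayList<>();

    void addExact(String classClause, String ruleName) {
        exactRules.computeIfAbsent(classClause, k -> new ArrayList<>()).add(ruleName);
    }

    void addWildcard(String regex, String ruleName) {
        wildcardRules.add(new WildcardRule(Pattern.compile(regex), ruleName));
    }

    List<String> match(String qualifiedName) {
        List<String> found = new ArrayList<>();
        String simpleName = qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1);
        // Constant cost: the same two hash lookups as before.
        found.addAll(exactRules.getOrDefault(simpleName, new ArrayList<>()));
        if (!simpleName.equals(qualifiedName)) {
            found.addAll(exactRules.getOrDefault(qualifiedName, new ArrayList<>()));
        }
        // Cost proportional to the number of wildcard rules; zero iterations if none exist.
        for (WildcardRule rule : wildcardRules) {
            if (rule.pattern.matcher(qualifiedName).matches()) {
                found.add(rule.name);
            }
        }
        return found;
    }
}
```

          With no wildcard rules registered the loop body never runs, which is the "essentially no cost" case.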

          • 2. Re: support wildcard notation in rules
            mazz

            > So, this rule applies to all constructors of all classes in package org.my (or does it also apply to subpackages?)

             

            Depends on the wildcard semantics. You could use the "Ant" way (where * is one level and ** denotes all sub-levels), or you could just rely on having the rule author use Java's Pattern regex rules (i.e. "org\.my\..*"). Quite honestly, if you just supported "?" and "*" (where ? matched one character and * matched one or more) that would probably capture 80% to 90% of all use-cases.
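
            As a rough sketch of the Ant-style option, a wildcard could be translated to a java.util.regex.Pattern like this. ClassGlob is a hypothetical name, and the exact semantics are assumptions made for the example: here * and ? never cross a "." boundary, ** does, and * is allowed to match zero characters.

```java
import java.util.regex.Pattern;

// Hypothetical translator from an Ant-style class-name wildcard to a regex.
final class ClassGlob {
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                if (i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
                    regex.append(".*");    // "**": any characters, including dots
                    i++;
                } else {
                    regex.append("[^.]*"); // "*": stays within one package level
                }
            } else if (c == '?') {
                regex.append("[^.]");      // "?": exactly one non-dot character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal, dots included
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String glob, String qualifiedName) {
        return compile(glob).matcher(qualifiedName).matches();
    }
}
```

            Under these assumptions "org.my.*" covers org.my.X but not org.my.sub.X, while "org.my.**" covers both, which answers the subpackage question explicitly in the rule text.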

             

            > If you don't use them  then there is essentially no cost.

             

            This is good, since this new enhancement would not affect performance for those that aren't using it. Those that want this capability should be told that it may affect performance; that's nothing a little documentation can't solve.