Problems with proxy classes and a proposed resolution
adinn Mar 16, 2010 11:16 AM

This discussion is prompted in part by Mark Stuberg's note and in part by looking at JASSIST-42, which identifies various problems with the current use of writeReplace to support proxy serialization. The comments on JASSIST-42 contain a prescription, provided by David Lloyd, for how to solve the serialization problems. I implemented this solution and, unfortunately, although it almost works it still suffers from a couple of serious problems. Although it might be possible to get round these, the solution requires javassist clients to do a lot of extra work and is likely to be error-prone.
I have identified an alternative solution which should also ensure that cached proxy classes can be reused much more often, both when performing serialization and in normal use of proxies. The only drawback is a very small change to the proxy factory/class API which will require existing client code to be modified very slightly. The proposed fix can be used with the writeReplace style serialization solution to ensure that deserialized proxies can share the same cached class and do not fail to find the correct class. It can also be used to support serialization/deserialization using David Lloyd's proposal, which provides much better quality serialization (e.g. superclass fields are not lost) but is a tiny bit harder for clients to use. I have nearly finished a prototype which enables both alternatives to be made available so, modulo the small factory API change, we can retain compatibility for code which currently uses writeReplace based serialization.
There are two related problems with the current proxy cache. The first is to do with identity, the second with naming. The identity of a cache entry (a proxy class) associated with some given classloader is determined from four sets of data:
- the super class s
- the interface set i1,...,ik
- the proxy filter f
- the proxy handler h
Following the fix for JASSIST-104 the per-classloader cache is a hashmap keyed by a string derived from the names of the classes in the first two data sets, i.e. of the form sn$i1n$...$ikn$, where sn is the (String) name of s, i1n is the name of i1, etc. All proxy classes with a given super and interface set are stored in a list under this key. The list contains entries, each of which stores a filter f1, handler h1 and a weak reference to the associated class pc1. So, when creating a class with the specified data set s, i1,...,ik, f, h the key is constructed and the relevant list located and traversed looking for an entry where f1 == f and h1 == h.
This already wastes an opportunity for reuse of proxy classes. If f1 and f2 are two filters which behave exactly the same then a class created with filter f1 will not be reused when creating a class with filter f2, even if all the other elements of the identifying set are the same.
The problem with serialization is more severe. The current scheme based on use of method writeReplace serializes a proxy object by substituting an instance of SerializedProxy containing the identifying details sn, i1n,...,ikn, f and h. At the deserializing end this results in objects sn', i1n',...,ikn', f' and h'. The deserialized instance of SerializedProxy uses these details to look up or create a proxy class pc' and create an instance p' which it then initialises with the correct handler. Unfortunately, this also loses all data located in instance fields inherited from superclass s.
Even if the deserialize is done in the same JVM as the serialize operation, f' and h' will not equal f and h, so a cache lookup will fail to find the originating class pc from which the proxy was serialized. If it is done in a different JVM the same problem arises. The instances will be new every time they are read from the stream so they can never match cache entries. So, each time that, say, a remoting call is made, serializing an instance of proxy class pc down the wire will cause a new proxy class pc' to be created and the cache soon blows up.
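To make the discussion concrete, here is a minimal sketch of the shape of that cache and its lookup; the names CacheEntry and ProxyCacheSketch are mine, not the actual identifiers used in ProxyFactory:

import java.lang.ref.WeakReference;
import java.util.*;
import javassist.util.proxy.MethodFilter;
import javassist.util.proxy.MethodHandler;

// one cache per classloader: the key is "sn$i1n$...$ikn$", the value is a list of entries
class CacheEntry {
    MethodFilter filter;                 // f1
    MethodHandler handler;               // h1
    WeakReference<Class<?>> proxyClass;  // pc1

    CacheEntry(MethodFilter f, MethodHandler h, Class<?> pc) {
        filter = f;
        handler = h;
        proxyClass = new WeakReference<Class<?>>(pc);
    }
}

class ProxyCacheSketch {
    Map<String, List<CacheEntry>> proxyCache = new HashMap<String, List<CacheEntry>>();

    // a lookup only succeeds when the *same* filter and handler instances are supplied
    Class<?> lookup(String key, MethodFilter f, MethodHandler h) {
        List<CacheEntry> entries = proxyCache.get(key);
        if (entries == null) return null;
        for (CacheEntry e : entries) {
            if (e.filter == f && e.handler == h) {
                return e.proxyClass.get();
            }
        }
        return null;
    }
}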
The same problem affects the scheme suggested by David Lloyd which avoids using writeReplace. Instead the serialize and deserialize operations must be done using subclasses of ObjectOutputStream and ObjectInputStream defined by the javassist package. When a ProxyObjectOutputStream detects that it is serializing a proxy instance p, the stream identifies the proxy's class pc and writes sn, i1n,... etc. into the stream before using the normal serialization code to serialize the field values stored in p. At the other end the ProxyObjectInputStream recognises that details of a proxy class have been included. It reads sn', i1n', etc. and tries to look up and return a cached proxy class pc' identified by these values. The parent code in ObjectInputStream creates an instance p' from this class and initialises its fields using the data serialized from p. Once again this can never reuse a cached class and will always end up creating a new class pc', since f' and h' are new objects and so will never identify a cache entry.
The second problem is to do with how proxy classes are named. Clearly, there can be more than one class with super s and interface list i1,...,ik. So, the current code creates a name of the form javassist.proxy$sn$i1n$...$ikn_37 whenever it creates a proxy class. The number at the end is derived by incrementing a static counter in class ProxyFactory. So, when an instance p of class pc is serialized and then deserialized, the new class created at the other end may have a name like javassist.proxy$sn$i1n$...$ikn_26, i.e. with a different numeric tag at the end. It is very unlikely that class names will actually match up. This change of class name is not fatal for the writeReplace based serialization scheme. However, it kills David Lloyd's alternative. The class pc' returned by the ProxyObjectInputStream must have the same name as the original class pc. If the names don't match, the deserialize throws an exception.
One way to fix this would be to require filter and handler classes to implement equality and naming. If the classes fc and hc of filters f and h implemented method equals() then the cache lookup could succeed if f'.equals(f) and h'.equals(h). The client would have to ensure that, for any two filters f and f', f.equals(f') if and only if any method selected by f would be selected by f' and any method rejected by f would also be rejected by f'. This is feasible but probably somewhat error-prone. Handler classes would also need to implement a coherent equals method.
It would also be necessary to implement a matching getName() method for each filter or handler class which returned a String to be included in the proxy class name. So, if f.getName() ==> fn and h.getName() ==> hn then the proxy class would have name javassist.proxy$sn$i1n$...$ikn$fn$hn. These methods would need to ensure that where f1.equals(f2) then f1.getName().equals(f2.getName()), and similarly for h1, h2. This would ensure that both serialization schemes worked and were able to reuse cache entries, albeit at some cost in code and brainpower to clients.
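For illustration, a client filter under that scheme would have to look something like the following; note that getName() is a hypothetical addition, not part of the current MethodFilter interface:

import java.lang.reflect.Method;
import javassist.util.proxy.MethodFilter;

// A filter that redirects only no-argument getters. Under the equals/getName scheme
// the client must guarantee that two filters compare equal exactly when they select
// the same methods, and that equal filters produce the same name.
class GetterFilter implements MethodFilter {
    public boolean isHandled(Method m) {
        return m.getName().startsWith("get") && m.getParameterTypes().length == 0;
    }

    public boolean equals(Object o) {
        // any two GetterFilter instances select exactly the same methods
        return o instanceof GetterFilter;
    }

    public int hashCode() {
        return GetterFilter.class.hashCode();
    }

    // hypothetical method: a stable name to embed in the proxy class name
    public String getName() {
        return "GetterFilter";
    }
}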
Rather than impose this burden on the programmer I suggest a simpler solution. First, I'll make a couple of important observations.
For a given pair of proxy instances p1 and p2 derived from classes pc1 and pc2 which differ in their handlers h1, h2 but share the same super s, interfaces i1,...,ik and filter f, the classes pc1 and pc2 are almost identical in behaviour. Since they share the same filter, pc1 and pc2 will call the instance's handler in exactly the same cases and will both rely on the methods inherited from s in all other cases. The only point where the classes differ is that the static handler field in pc1 and pc2 may differ, and this affects the initial value of the handler assigned to their instances. So, if p1.handler and p2.handler are explicitly set by the client, p2 could be derived from pc1 and we could dispense with pc2. If we remove this static default handler field and require the client to always call setHandler on a new instance then we can remove any dependence of the proxy class on the handler value. This means we can remove h from the set of identifying data used to identify a proxy class.
So, to achieve this we need to decommission ProxyFactory.setHandler(), remove the static handler field from the generated proxy class and change the generator for the proxy class constructors so that the instance's handler is initialised using the global default in ProxyFactory rather than looking for the proxy class default.
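In client terms the change amounts to something like this sketch of the intended usage, where the handler is always attached to the instance via setHandler rather than installed as a per-class default:

import java.lang.reflect.Method;
import java.util.ArrayList;
import javassist.util.proxy.MethodHandler;
import javassist.util.proxy.ProxyFactory;
import javassist.util.proxy.ProxyObject;

public class PerInstanceHandlerExample {
    public static void main(String[] args) throws Exception {
        ProxyFactory factory = new ProxyFactory();
        factory.setSuperclass(ArrayList.class);   // the super class s
        // note: no factory.setHandler(...) call -- the handler no longer
        // contributes to the identity of the generated class

        Class<?> proxyClass = factory.createClass();

        MethodHandler h1 = new MethodHandler() {
            public Object invoke(Object self, Method thisMethod, Method proceed,
                                 Object[] args) throws Throwable {
                System.out.println("h1 intercepted " + thisMethod.getName());
                return proceed.invoke(self, args);
            }
        };

        // the handler is attached to the instance, not to the class, so another
        // instance with a different handler can reuse the same cached class
        ProxyObject p1 = (ProxyObject) proxyClass.newInstance();
        p1.setHandler(h1);
    }
}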
The second observation is that the identity of two filter instances f1 and f2 can be ascertained automatically and can also be encoded automatically into a unique name. The list of methods m1,...,mn implemented by s, i1,...,ik is determinate and of fixed length. So, we can encode a bit vector of length n using a 1 at position i if f1 redirects calls to mi and a 0 if f1 relies on the implementation inherited from s. If f2 encodes the same bit vector it will generate essentially the same class. We can also translate this bit vector to a filter signature key f1n, a hex String which will look something like, say, fa08. Any equivalent filter f2 will also have the same signature key.
This solves our identity and naming problem. If we store the bit vector in the class as an int[] and construct a name for the proxy class which includes this filter name -- something of the form javassist.proxy$sn$i1n$...$ikn$fn -- then we can guarantee to be able to identify and reuse equivalent proxy classes, both under a call to createClass and also during deserialization.
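A minimal sketch of that encoding might look as follows; the names FilterSignature, computeBitVector and signatureKey are mine, not necessarily those used in the prototype:

import java.lang.reflect.Method;
import javassist.util.proxy.MethodFilter;

class FilterSignature {
    // Given the determinate, ordered list of candidate methods m1,...,mn and a
    // filter f, build a bit vector with bit i set when f redirects calls to mi.
    static int[] computeBitVector(Method[] methods, MethodFilter filter) {
        int[] bits = new int[(methods.length + 31) >> 5];
        for (int i = 0; i < methods.length; i++) {
            if (filter == null || filter.isHandled(methods[i])) {
                bits[i >> 5] |= 1 << (i & 0x1f);
            }
        }
        return bits;
    }

    // Translate the bit vector into a hex signature key such as "fa08". Two
    // filters that select exactly the same methods produce the same key.
    static String signatureKey(int[] bits) {
        StringBuilder sb = new StringBuilder();
        for (int word : bits) {
            sb.append(String.format("%08x", word));
        }
        return sb.toString();
    }
}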
I would prefer if we were to drop method
ProxyFactory.setFilter()
and instead require clients to call
ProxyFactory.createClass(MethodFilter).
If instead the client calls
ProxyFactory.createClass()
then we generate a class which redirects all methods via the handler. Alternatively, we can retain the setFilter method on the factory and use it to filter when createClass is called.
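Under that proposal client code would look roughly like this; the one-argument createClass(MethodFilter) overload is the proposed addition, not the current API, and SomeBean is just a placeholder superclass:

import java.lang.reflect.Method;
import javassist.util.proxy.MethodFilter;
import javassist.util.proxy.ProxyFactory;

public class ProposedFactoryUsage {
    static Class<?> createFilteredProxy() {
        ProxyFactory factory = new ProxyFactory();
        factory.setSuperclass(SomeBean.class);

        MethodFilter skipFinalize = new MethodFilter() {
            public boolean isHandled(Method m) {
                return !m.getName().equals("finalize");
            }
        };

        // proposed: supply the filter at creation time rather than via setFilter()
        return factory.createClass(skipFinalize);
    }

    static Class<?> createUnfilteredProxy() {
        ProxyFactory factory = new ProxyFactory();
        factory.setSuperclass(SomeBean.class);
        // proposed: no filter argument means every method is redirected via the handler
        return factory.createClass();
    }

    static class SomeBean { /* placeholder superclass for the example */ }
}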
So, if we swallow these small API changes then we can fix the current proxy model and we can also implement David Lloyd's alternative serialization model. Note that we don't want to store the filter instance in the class, as this may cause classloader issues when we reuse a class whose filter belongs to a classloader which is not known at the deserializing end. However, storing an int[] or the String filter signature will cause no such problems.
For the serialization fix I believe we can retain compatibility by providing a proxy factory setting. By default a factory will create classes which implement writeReplace and can be serialized via normal ObjectOutput/InputStreams. We can provide a method
ProxyFactory.setUseWriteReplace(boolean) which can switch off generation of this method when creating new proxy classes. Instances of these classes must be serialized using a ProxyObjectOutputStream and deserialized using a ProxyObjectInputStream. However, this will enable them to be fully serialized and reconstituted, avoiding the problems David mentions in JASSIST-42.
I am very close to having a prototype implementation of this behaviour (I got most of the way there while identifying the two critical failings). I'll complete this and post a patch. However, I'm interested to hear feedback from anyone who may be affected by the API changes here.
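Usage would then be along these lines, a sketch assuming a proxy class generated by a factory on which setUseWriteReplace(false) was called and assuming the proposed ProxyObjectOutputStream/ProxyObjectInputStream classes:

import java.io.*;
import javassist.util.proxy.ProxyObjectInputStream;
import javassist.util.proxy.ProxyObjectOutputStream;

public class ProxyStreamExample {
    static byte[] serialize(Object proxy) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // ProxyObjectOutputStream writes the proxy's identifying details (super class,
        // interfaces, filter signature) ahead of the normal serialized field data
        ObjectOutputStream out = new ProxyObjectOutputStream(bytes);
        out.writeObject(proxy);
        out.close();
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        // ProxyObjectInputStream looks up (or regenerates) the matching proxy class and
        // then lets the normal deserialization code restore all fields, including those
        // inherited from the super class
        ObjectInputStream in = new ProxyObjectInputStream(new ByteArrayInputStream(data));
        return in.readObject();
    }
}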
regards,
Andrew Dinn