I looked at this in a bit more detail and i think this can be optimized, although i don't have the exact changes that we need to do. Here's a more detailed version of the flow:
1) The BaseClassLoader is invoked to load a class
2) The BaseClassloader delegates this to the appropriate BaseClassLoaderDomain
3) The domain has knowledge of the EXPORTs and also the package capabilities (i.e. which package is available in which jar) of those EXPORTs.
4) The domain then selects a collection of such EXPORTs to identify the correct classloader out of those.
5) In order to identify the correct loader, it asks each of those, through a getResource(String classResourceName) call whether it contains that resource. Actually, this is more of a isResourceAvailable type of boolean call where it doesn't care of the returned URL.
6) If #5 indicates that the loader contains the necessary resource, then this loader is selected and scheduled for loading the class.
The problem starts in #5 when the VFSClassLoaderPolicy is used. Specifically when the VFSClassLoaderPolicy.getResource(String classResourceName) is called, it starts looking in various "roots" to see if that resource is available within any of its known roots. While doing this, internally the each of the file handlers are invoked which ultimately lead to AbstractVirtualFileHandler.structureFindChild(String classResourceName) first splits the resource path name into multiple tokens (through PathTokenizer). This in itself is expensive because of the various String operations involved:
PathTokenizer.getTokens():
public static List<String> getTokens(String path)
{
if (path == null)
throw new IllegalArgumentException("Null path");
int start = -1, length = path.length(), state = STATE_INITIAL;
char ch;
List<String> list = new ArrayList<String>();
for (int index = 0; index < length; index ++) {
ch = path.charAt(index);
switch (ch) {
case '/': {
switch (state) {
case STATE_INITIAL: {
// skip extra leading /
continue;
}
case STATE_MAYBE_CURRENT_PATH: {
// it's '.'
list.add(CURRENT_PATH);
state = STATE_INITIAL;
continue;
}
case STATE_MAYBE_REVERSE_PATH: {
// it's '..'
list.add(REVERSE_PATH);
state = STATE_INITIAL;
continue;
}
case STATE_NORMAL: {
// it's just a normal path segment
list.add(path.substring(start, index));
state = STATE_INITIAL;
continue;
}
}
continue;
}
case '.': {
switch (state) {
case STATE_INITIAL: {
// . is the first char; might be a special token
state = STATE_MAYBE_CURRENT_PATH;
start = index;
continue;
}
case STATE_MAYBE_CURRENT_PATH: {
// the second . in a row...
state = STATE_MAYBE_REVERSE_PATH;
continue;
}
case STATE_MAYBE_REVERSE_PATH: {
// the third . in a row, guess it's just a weird path name
if (errorOnSuspiciousTokens) {
throw new IllegalArgumentException("Illegal suspicious token in path: " + path);
}
state = STATE_NORMAL;
continue;
}
}
continue;
}
default: {
switch (state) {
case STATE_INITIAL: {
state = STATE_NORMAL;
start = index;
continue;
}
case STATE_MAYBE_CURRENT_PATH:
case STATE_MAYBE_REVERSE_PATH: {
if (errorOnSuspiciousTokens) {
throw new IllegalArgumentException("Illegal suspicious token in path: " + path);
}
state = STATE_NORMAL;
}
}
}
}
}
// handle the last token
switch (state) {
case STATE_INITIAL: {
// trailing /
break;
}
case STATE_MAYBE_CURRENT_PATH: {
list.add(CURRENT_PATH);
break;
}
case STATE_MAYBE_REVERSE_PATH: {
list.add(REVERSE_PATH);
break;
}
case STATE_NORMAL: {
list.add(path.substring(start));
break;
}
}
return list;
}
For each of this token, the control is passed to the FileHandlers. One such handler is the ZipEntryHandler. And from what i see, the main reason why the loading of a class shows up as expensive is a result of our ZipEntryHandler. Here are some of the things it does almost for all operations. But in the current use case let's just consider the getChild(String path) method:
i) ZipEntryHandler.getChild -> First gets the correct ZipEntryContext and then calls the getChild on the context. Any operations on the ZipEntryContext go through a isModified check which involves a call to System.currentTimeInMillis to get the current time and then comparing it with the earlier time. Do we need to do this modification check? This check happens almost on all operations on the context:
VirtualFileHandler getChild(ZipEntryHandler parent, String name)
{
if (parent == null)
throw new IllegalArgumentException("Null parent");
checkIfModified();
String pathName = parent.getLocalPathName();
if("".equals(pathName))
pathName = name;
else
pathName = pathName + "/" + name;
EntryInfo ei = entries.get(pathName);
if(ei != null)
return ei.handler;
return null;
}
private synchronized void checkIfModified()
{
// TODO: if zipSource represents a nested archive we should maybe delegate lastModified to its parent
if (initStatus == InitializationStatus.NOT_INITIALIZED)
{
ensureEntries();
}
else if (initStatus == InitializationStatus.INITIALIZED && getZipSource().hasBeenModified())
{
EntryInfo rootInfo = entries.get("");
entries.clear();
entries.put("", rootInfo);
if (getZipSource().exists())
{
try
{
initEntries();
}
catch(Exception ignored)
{
log.warn("IGNORING: Failed to reinitialize context: " + getRootURI(), ignored);
}
}
}
}
ii) Within this flow, there's even an invocation to ZipEntryHandler.isLeaf(). Similar to the isModified call, this internally checks whether the zip context is closed. The isLeaf() implementation itself is a bit expensive since it's not a simple boolean check and instead does a .equals() on AbstractVirtualFileHandler:
ZipEntryContext.isLeaf:
boolean isLeaf(ZipEntryHandler handler)
{
if (handler == null)
throw new IllegalArgumentException("Null handler");
if (getRoot().equals(handler) == false)
checkIfModified();
EntryInfo ei = entries.get(handler.getLocalPathName());
if (ei == null || ei.entry == null)
return false;
return ei.entry.isDirectory() == false;
}
AbstractVirtualFileHandler.equals()
public boolean equals(Object obj)
{
if (this == obj)
return true;
if (obj == null || obj instanceof VirtualFileHandler == false)
return false;
VirtualFileHandler other = (VirtualFileHandler) obj;
if (getVFSContext().equals(other.getVFSContext()) == false)
return false;
if (getPathName().equals(other.getPathName()) == false)
return false;
return true;
}
So overall, in order to find whether a zip (jar) file contains a given class resource (remember we already know it contains the package, we are just checking whether it contains the specific class) we do a lot of expensive operations. I haven't completely looked at the ZipEntryHandler/Context, but would it be possible to have some sort of a mapping which has the entries contained in that jar? I see that we are doing something similar in the initEntries() but not sure whether it can be used.
So if we wanted to check if a given resource is available in the ZipEntryContext, maybe we could just do a entries.containsKey(resourcePath)? Would it be possible to simplify the zip entry handling?