Monday, December 15, 2008

Writing a custom ClassLoader for jBPM

Within our jBPM process engine we're dealing with dependencies on libraries that are, well..., not completely stable. The code in the node handlers calls our SOA layer through generated API classes, which automagically take care of several boilerplate tasks (such as security) and are deployed as jars along with our process engine. The SOA layer evolves, so over time a number of versions have come to exist.

We encountered a problem because we're running several process definitions within one engine deployment. This includes both entirely different processes and new versions of already running process definitions. Our base of automated business processes has grown over time, with older process implementations relying on the early SOA API and newer process implementations taking advantage of the later API additions. So there are dependencies on different versions of the API - and while strict backwards compatibility might have solved this issue for us, in practice that proved not quite feasible.

So what were the issues we were trying to solve?
  • There are different versions of the generated API classes corresponding to different versions of the SOA services. One deployment of jBPM must be able to run processes that rely on different versions next to each other.
  • We wanted to be able to configure the dependency per process definition, but also for versions of a definition, so that a new incarnation of a process may take advantage of a new (and hopefully improved) version of a web service.
  • Not only do the correct versions of the API classes need to be used, the corresponding web service endpoints also have to be available to the code running a process instance.
The configuration had to be external to the process archive, so it can be adjusted at deploy time. We've settled on a simple XML format, which allows all required information to be present with minimum complexity - something along these lines:

<process_configuration jar_directory="/opt/jbpm/api-jars">
   <process name="process1" max_version="3">
      <service name="CustomerService" jar="soa-api-1.0.jar"
         endpoint="http://soa.internal/v1/CustomerService"/>
      <service name="OrderService" jar="soa-api-1.0.jar"
         endpoint="http://soa.internal/v1/OrderService"/>
   </process>
   <process name="process1" min_version="4">
      <service name="CustomerService" jar="soa-api-2.1.jar"
         endpoint="http://soa.internal/v2/CustomerService"/>
      <service name="OrderService" jar="soa-api-2.1.jar"
         endpoint="http://soa.internal/v2/OrderService"/>
   </process>
   <process name="process2">
      <service name="CustomerService" jar="soa-api-2.1.jar"
         endpoint="http://soa.internal/v2/CustomerService"/>
      <service name="BillingService" jar="soa-api-2.1.jar"
         endpoint="http://soa.internal/v2/BillingService"/>
   </process>
   <process name="process3" min_version="1" max_version="2">
      <service name="OrderService" jar="soa-api-1.0.jar"
         endpoint="http://soa.internal/v1/OrderService"/>
      <service name="BillingService" jar="soa-api-1.0.jar"
         endpoint="http://soa.internal/v1/BillingService"/>
   </process>
</process_configuration>

This custom configuration consists of the following:
  • One line indicating the directory in which all the API jars are deployed. Take care that this directory and the jars in it are not on the standard classpath, because then you'll be stuck with just one version, which is not compatible with all of the calling code.
  • At least one entry for each process definition. A single entry can cover every deployed version of the definition (as for process2), or separate entries can cover different version ranges (indicated using the min_version and/or max_version attributes).
  • For each process definition (version range) the jar file and endpoint for each required web service is added. The name is used for querying by the client code.
The way to include the correct jars in the classpath of a given process instance (running a certain version of a process definition) is through a custom class loader - a mechanism made available in jBPM version 3.3.0.GA. Just set the 'jbpm.classloader' property in jbpm.cfg.xml to 'custom' and indicate the custom class loader by setting its name in the 'jbpm.classloader.classname' property. The custom class loader itself is almost too simple to mention: it extends java.net.URLClassLoader, and in its constructor it determines the name and version of the process definition before reading the applicable jar file names (as URLs) from the custom configuration file.
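In outline it looks something like this - a minimal sketch, in which I assume the process definition's name and version are handed to the constructor (how jBPM actually constructs the class loader depends on your setup), and ConfigurationUtil is the utility class described further below:
package com.example.jbpm;

import java.io.IOException;
import java.net.URLClassLoader;

public class ProcessApiClassLoader extends URLClassLoader {

   public ProcessApiClassLoader(String processName, int version, ClassLoader parent)
         throws IOException {
      // All the real work is in resolving the right jars for this process
      // definition (version); the class loading itself is plain URLClassLoader.
      super(ConfigurationUtil.getJarsForProcessDefinition(processName, version), parent);
   }
}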

We've put the actual reading of the XML file in a utility class. For reading XML we could have gone completely overboard and set up a schema and compiled Java classes from it with JAXB; instead we simply used the dom4j library and a couple of simple XPath expressions to accomplish the same.

Our utility class has the following interface:
public final class ConfigurationUtil {
   public static URL[] getJarsForProcessDefinition(String processId, int version) throws IOException {...}
   public static String getEndpointForProcessDefinition(
      String processId, int version, String serviceName) throws IOException {...}
}
The first method delivers everything needed by the custom class loader's superclass constructor. The second method reuses the XML parsing facility and allows the last requirement mentioned in the issues above to be satisfied efficiently.
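For illustration, the first method could be implemented along these lines - a sketch using dom4j, assuming the element and attribute names from the example configuration above and a fixed location for the configuration file:
package com.example.jbpm;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

public final class ConfigurationUtil {
   private static final String CONFIG_FILE = "process-configuration.xml";

   public static URL[] getJarsForProcessDefinition(String processId, int version)
         throws IOException {
      Document config = readConfiguration();
      String jarDirectory = config.valueOf("/process_configuration/@jar_directory");

      // Select the <service> elements of the entry that matches this process
      // definition and version; entries without min/max match any version.
      List<?> services = config.selectNodes(
            "/process_configuration/process[@name='" + processId + "']"
            + "[not(@min_version) or @min_version <= " + version + "]"
            + "[not(@max_version) or @max_version >= " + version + "]/service");

      List<URL> jars = new ArrayList<URL>();
      for (Object service : services) {
         String jarName = ((Node) service).valueOf("@jar");
         jars.add(new File(jarDirectory, jarName).toURI().toURL());
      }
      return jars.toArray(new URL[jars.size()]);
   }

   // getEndpointForProcessDefinition would select the matching <service>
   // element by its name attribute and return its endpoint attribute.

   private static Document readConfiguration() throws IOException {
      try {
         return new SAXReader().read(new File(CONFIG_FILE));
      } catch (DocumentException e) {
         throw new IOException("Cannot parse " + CONFIG_FILE + ": " + e.getMessage());
      }
   }
}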

All in all, writing a custom ClassLoader was not much of a task once we had figured out what kind of custom configuration was applicable to our situation...

Wednesday, November 26, 2008

"Too many certificates in chain"? It just may be a corrupt keystore!

Recently we ran into some trouble with the keyword expansion functionality that CVS offers (far too many false positives in the comparisons), and we simply decided to turn it off for our sources. That meant changing the ASCII/Binary property of all files from "ASCII -kkv" to "ASCII -kk" (keyword compression). Problem solved, albeit in a rather crude way.

Well, that sure bit us in the proverbial back end. The point is that when you change this property, you should be careful not to change it for files that were designated "binary". But we did.

One side effect of this was that a keystore file, stored in CVS, now became ASCII as well, effectively corrupting the file for further use. When trying to read a key from it, I got the following stack trace:
Caused by: java.io.IOException: Too many certificates in chain
at sun.security.provider.JavaKeyStore.engineLoad(Unknown Source)
at java.security.KeyStore.load(Unknown Source)
In an attempt to locate the source for this, I stumbled upon this:

http://docjar.com/docs/api/sun/security/provider/JavaKeyStore.html#engineLoad(InputStream,%20char)

(just scroll down a bit, the top bar covers the important part!)
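In outline, the relevant part of engineLoad does something like this (paraphrased from the source linked above; variable names approximate):
// While reading a private key entry, the number of certificates in its
// chain is read straight from the - in our case corrupt - file...
int numOfCerts = dis.readInt();
Certificate[] certs;
try {
   // ...so a garbage count makes this allocation blow up:
   certs = new Certificate[numOfCerts];
} catch (OutOfMemoryError err) {
   throw new IOException("Too many certificates in chain");
}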

So from that code I gather that the occurring problem is actually an OutOfMemoryError (which I feel is kind of creepy), caused by the keystore implementation trying to allocate a certificate array whose size was read straight from the corrupted keystore file. It's unlikely that any practical keystore contains a certificate chain long enough to really make you run out of memory, so next time I see this error message I will definitely think 'keystore file corruption'!

Friday, October 31, 2008

An inconsistency between explicit and implicit hashing when signing in Java security?

For the connection to a certain other system within our network, the program I'm working on needs to verify that it indeed is what it claims to be: an authorized client. A common way to accomplish this is through PKI: it signs the message it sends using a private key, and the other system can verify this signature using the corresponding public key. See e.g. this article for an explanation of how this works.

In our case, there are three steps in signing a message:
  • calculation of the message digest through a hashing algorithm,
  • calculation of the digital signature using the private key, and
  • coding the result to base64.
The last step is not part of the normal signing process, but we need to send the result as a string inside an XML message. Using the 'raw' signature would result in weird characters in the XML, very likely choking up the parser.

As I was coding away, I was lulled into performing each of these steps separately, so I started off with implementing the hashing using the java.security.MessageDigest class. I instantiated it with the "SHA-1" algorithm and simply called the digest method with the message to obtain its hash. Pretty straightforward stuff.

Then I turned to the java.security.Signature class to supply the subsequent signing functionality. It occurred to me that there are algorithm choices that include hashing algorithm names, so I quickly found out that it is possible to let the Signature class take care of both the first and second step of my signing process. While that struck me as quite convenient, I decided to stick to the original plan and not use the hashing possibility here. I chose the "NONEwithRSA" algorithm, and after feeding it the message and the private key, the sign method provided me with an answer.

Then I encoded it in base64 (using the Apache Commons Codec library, which I also could have used for the hashing functionality) and presto! So I thought, at least...

But then...

The first test we performed immediately indicated something was wrong. And after checking everything else (like making sure code page encodings were correct and what not) we came to the conclusion that the signature itself had to be the culprit.

So I decided to put 'my' way of creating a signature side by side with the signing method that uses the implicit hashing, to see whether there might be a difference in the outcome:
public void test(byte[] data, PrivateKey privateKey) throws Exception {
   // Explicit hash and separate signing:
   byte[] hashedData = MessageDigest.getInstance("SHA-1").digest(data);
   byte[] signedData = signData(hashedData, privateKey, "NONEwithRSA");

   // Signing with implicit hashing:
   byte[] signedHashedData = signData(data, privateKey, "SHA1withRSA");

   System.out.println("Encoded data (explicit hashing) = "
           + new String(Base64.encodeBase64(signedData)));
   System.out.println("Encoded data (implicit hashing) = "
           + new String(Base64.encodeBase64(signedHashedData)));
}

private static byte[] signData(byte[] data, PrivateKey privateKey, String algorithm) throws Exception {
   Signature signature = Signature.getInstance(algorithm);
   signature.initSign(privateKey);
   signature.update(data);
   return signature.sign();
}
And even though:
  • from the code it seems that both paths should lead to the same result: no configuration other than the algorithm names is given, so everything else should be default, and
  • the Java security documentation states that "NONEwithRSA" 'does not use a digesting algorithm', so it should act as "SHA1withRSA" minus the SHA-1 hashing,
there definitely is a difference between the outcomes!

In our situation, the implicit hashing turned out to deliver the correct result (at least, with regards to what the other system expected), so a minimal code change (getting rid of the explicit hashing step) did the trick. We use an external configuration file to set the signing algorithm through a system property, so changing that was easy.
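In code that boils down to something like this (the property name is made up for the example):
// The algorithm name comes from our external configuration via a system property:
String algorithm = System.getProperty("signing.algorithm", "SHA1withRSA");
byte[] signedData = signData(data, privateKey, algorithm);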

What's causing this?

Now why is there a difference between the two approaches? I tried to find out using Google, but that quest didn't turn up any answers.
So I did what any self-respecting developer would do: step through the implementation in a debugger. Unfortunately my toolkit didn't allow me to see everything I wanted to; I could see, however, that the input of the signing step was identical in both cases and that the same implementation is used for the signing under the hood (sun.security.rsa.RSACore). What I cannot see is what happens with respect to intermediate padding of the byte arrays, so I'm guessing that the 'defaults' of the two approaches - driven by different SignatureSpi implementations - differ in this respect. PKCS#1 signing normally wraps the digest in an ASN.1 DigestInfo structure (which names the hash algorithm) before padding, and my guess is that "NONEwithRSA" leaves adding that structure to the caller.
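If that guess is right, prepending the fixed DER header of a SHA-1 DigestInfo to the explicitly computed hash should make "NONEwithRSA" produce the same bytes as "SHA1withRSA". A sketch of that experiment (the header below is the standard DigestInfo prefix for SHA-1):
// DER encoding of DigestInfo { algorithm = SHA-1, digest = <20 bytes> },
// minus the 20 digest bytes themselves:
private static final byte[] SHA1_DIGEST_INFO_PREFIX = {
   0x30, 0x21, 0x30, 0x09, 0x06, 0x05, 0x2b, 0x0e,
   0x03, 0x02, 0x1a, 0x05, 0x00, 0x04, 0x14
};

private static byte[] wrapInDigestInfo(byte[] sha1Hash) {
   byte[] digestInfo = new byte[SHA1_DIGEST_INFO_PREFIX.length + sha1Hash.length];
   System.arraycopy(SHA1_DIGEST_INFO_PREFIX, 0, digestInfo, 0,
         SHA1_DIGEST_INFO_PREFIX.length);
   System.arraycopy(sha1Hash, 0, digestInfo,
         SHA1_DIGEST_INFO_PREFIX.length, sha1Hash.length);
   return digestInfo;
}

// In the test method above, this should then match the implicit-hashing outcome:
byte[] signedData = signData(wrapInDigestInfo(hashedData), privateKey, "NONEwithRSA");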

If anyone could confirm or correct this explanation, that would be greatly appreciated. For now I'll have to be content with knowing that the two approaches do return different results and that picking one at random may lead to problems...

Saturday, August 16, 2008

Blind spot in the Java API: dynamic proxies

Reflecting on the good ol' days

Back in the days when the reflection API was added to Java, it offered a new way to do much of the same, yet in a very generic and nifty manner. That of course came at a price: it slowed things down considerably - which was significant, as the performance of Java as a whole (language + platform) was still quite an issue at the time.

In those days I was still working in the telecom industry, where Java was just beginning to gain a foothold. Most of the applications running on the network nodes were still written in C++ (or even plain C) and were simply faster. The powers-that-be decided against the use of reflection (at least in production code) for exactly those performance-related reasons, and so my attention - after a short first impression - shifted away from that API.

Back to the present

Last week at work I came across some code a co-worker had written, which I was about to use for 'inspiration'. One class implemented an interface named InvocationHandler which I'd never seen before and which (obviously) used reflection. I looked it up in the Sun Javadocs and found to my surprise that this interface (along with its partner in crime, the class Proxy) has been around since Java 1.3! That means over eight years - ouch!

Ever since I read Scott Meyers' books on C++ (yes, this one in particular) I know there is great advantage in being aware of the available libraries. And even though Scott's remark only seemed to apply to the C++ standard library, I'm sure he meant it in a much wider context: any engineer should know his tools.

Knowing all of Java's APIs is another thing altogether, though. The list of classes and interfaces available in the Standard Edition alone seems endless, and adding the Enterprise Edition and Micro Edition to the mix makes for a well-nigh impossible task. I can't imagine there's a soul alive who actually knows all of them - and knows how to use them in a practical manner, of course.

It has never been my goal to learn all of the Java APIs, just the parts I need to write efficient code. But this is one I definitely missed...

So what's it all about then?

First off, it's called 'proxy' because it allows you to implement the Proxy design pattern (actually it's not hard to pull off a Decorator implementation either), and it's called 'dynamic' because it implements a given interface at runtime - and that's where reflection comes into play.

This is the 'classic' Proxy pattern:

[class diagram: Proxy pattern]

A dynamic proxy doesn't have the compile-time relationship with the interface. As said, this relationship is forged at runtime. The code for such a class might look something like this:
package server;

import java.lang.reflect.*;

public class DynamicProxy implements InvocationHandler {
    private Object realSubject;

    public DynamicProxy(Object realSubject) {
        this.realSubject = realSubject;
    }

    public Object invoke(Object proxy, Method method, Object[] args)
        throws Throwable {
        System.out.println("Before the call...");
        Object result = method.invoke(realSubject, args);
        System.out.println("...after the call.");
        return result;
    }
}
Of course you may want to insert something more useful before and after the method invocation in a practical case...

Now all one needs is the means to let the client get a reference to a Subject in a way that doesn't reveal the dynamic proxy. A factory, which normally may just hide the implementation class and/or static proxy from the client, could be 'upgraded' to do just that:
package server;

import java.lang.reflect.Proxy;

public class SubjectFactory {
    public static Subject getSubjectInstance() {
        return (Subject) Proxy.newProxyInstance(
            RealSubject.class.getClassLoader(),
            new Class[] { Subject.class },
            new DynamicProxy(new RealSubject()));
    }
}
Notice how:
  • the class loader of the implementation class is given as the first argument,
  • the Subject interface is passed as the second argument; in fact all (distinct) interfaces implemented by the implementation class may be given here, so RealSubject.class.getInterfaces() would be fine too, and
  • the DynamicProxy is instantiated with a RealSubject instance; this is the instance on which the actual invocation is performed.
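Since the Subject and RealSubject definitions weren't shown above, here is a minimal pair plus a small driver to make the example self-contained (hypothetical names; in a real project each public type would get its own source file):
package server;

interface Subject {
    String request();
}

class RealSubject implements Subject {
    public String request() {
        return "real work done";
    }
}

class Demo {
    public static void main(String[] args) {
        Subject subject = SubjectFactory.getSubjectInstance();
        String result = subject.request();
        // Prints, in order:
        //   Before the call...
        //   ...after the call.
        //   real work done
        System.out.println(result);
    }
}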

And this is what the class diagram has become:
[class diagram: Dynamic Proxy with Factory]
The definitions of the Client and RealSubject classes and the Subject interface haven't changed at all if a factory was used in the first place. So this little mechanism can be used to insert any pre- and post-processing you need at any time, e.g. to surround the call with session or transaction management.

Now do I hear someone say AOP? This construct actually predates AspectJ, and I guess that if you don't need all of the versatility of such a framework, it's good to know this tool is in your standard toolkit!

If this short intro has grabbed your interest, you may want to go on reading this old Javaworld article about dynamic proxies to get your feet wet a little more...