Friday, March 16, 2012

Converting Forms in Restlet to POJOs with Jackson and Guava

Punchline first. This article presents code that you can use to convert HTML forms into POJOs, mapping property names in the natural way, with support for compound objects and arrays/lists. A long-winded set-up for that punchline follows.

Many of my Restlet resources are designed to support both JSON and HTML representations with resource interfaces that look like this:

public interface PersonResource {
    @Get Person getPerson();
}

where Person is a POJO that might look like this:

public class Person {
    public String getName() { ... }
    public List<Address> getAddresses() { ... }
    ... other bean stuff: setters, fields ...
}

and the implementation of the resource looks like this:

public class PersonServerResource 
        extends ServerResource implements PersonResource {
    @Override public Person getPerson() {
        return ... construct Person from domain data ...
    }
}

I register custom ConverterHelpers with the Restlet Engine. One of them knows how to use Jackson to convert Person to JSON; another knows how to obtain a Freemarker HTML template associated with the Person type and render it with the Person as its data model.
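Registering a converter is a one-liner against the engine. Here's a minimal sketch, with PersonJacksonConverter standing in as a hypothetical name for one of those ConverterHelper subclasses:

    // At application startup, before any requests are served.
    // PersonJacksonConverter is a hypothetical stand-in for one of the
    // custom ConverterHelper subclasses described above.
    Engine.getInstance().getRegisteredConverters().add(new PersonJacksonConverter());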

It works like magic. When I use a browser to point to the resource, it returns the HTML generated with Freemarker. When I ask for it with, say, jQuery, requesting JSON, I get ... JSON. When I get a Restlet ClientResource proxy for PersonResource at that URI, I get a Person object back when I call getPerson() on the client side; the value returned by getPerson() on the server is serialized to JSON, sent over the wire, and deserialized back into a Person object.

The only problem I had was for PUT and POST method handlers, because there was no automatic way to convert form data to my domain types. I had to create an additional method to handle forms as parameters:

public interface PeopleResource {
    @Post("json:json") Person addPerson(Person person);
    @Post("form:html") Person addPersonForm(Form person);
}

public class PeopleServerResource 
        extends ServerResource implements PeopleResource {
    @Override public Person addPerson(Person person) {
        ...
    }
    @Override public Person addPersonForm(Form personForm) {
        Person person = new Person();
        // Extract values from personForm and set
        // them to person.
        return addPerson(person);
    }
}

I suffered this for a long time, and then I began to think that the application/x-www-form-urlencoded media type was, at least for simple types, not far from JSON. Could I translate form data to JSON and then use Jackson to deserialize that? Of course I could. In fact, it wasn't too hard to define some conventions whereby sequences (lists/arrays) and nested structures could be modeled. An encoded form for Person as produced by an HTML page might look like this, in part:

name=Tim&address.1.street=Main&address.1.city=Anytown...
     ...&address.2.street=Elm&address.2.city=Smallton...

and the corresponding JSON would be:

{
    "name":"Tim",
    "address":[{
        "street":"Main",
        "city":"Anytown",
        ...
    },{
        "street":"Elm",
        "city":"Smallton",
        ...
    },
    ...
    ]
}
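Conceptually the conversion just builds a tree of maps and lists from the dotted names and hands the tree to Jackson. Here's a minimal sketch of the idea, not the real code: it handles nested names only, ignoring the numeric-index and repeated-name rules described below.

// Sketch: build nested maps from dotted names, then let Jackson's
// ObjectMapper.convertValue turn the tree into the target type.
Map<String, Object> root = Maps.newLinkedHashMap();
for (Parameter p : form) {  // org.restlet.data.Form iterates name=value pairs
    Map<String, Object> node = root;
    String[] parts = p.getName().split("\\.");
    for (int i = 0; i < parts.length - 1; i++) {
        Map<String, Object> child = (Map<String, Object>) node.get(parts[i]);
        if (child == null) {
            child = Maps.newLinkedHashMap();
            node.put(parts[i], child);
        }
        node = child;
    }
    node.put(parts[parts.length - 1], p.getValue());
}
Person person = objectMapper.convertValue(root, Person.class);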

I wrote FormDeserializer to do this. It takes a Jackson ObjectMapper to do the heavy lifting. Here's the Javadoc for the deserialization method:


public <T> T deserialize(Form source, Class<T> targetType)
Converts a Form to a Java object of the target type, using each name=value pair to set the corresponding property on the target object.

Compound names, using period (.) as the delimiter, are treated as pseudo-dereferences (a la JavaScript or Groovy) to set properties of sub-objects, e.g., a.b=c for bean targets is treated like a call to target.getA().setB(c).

Numeric components of compound names are treated as indices into a sequence named by the preceding components, e.g., a.1=c is treated as target.getA()[1] = c (or target.getA().set(1, c), if the "a" property of the target is a list rather than an array). Unless an element with index 0 is set, the indices are origin 1.

Sequences are also created by names with multiple values, e.g., a=x&a=y is equivalent to a.1=x&a.2=y, with the value of the "a" property in the target being a sequence of two values.

When a name appears both indexed and non-indexed, the last assignment wins: a=x&a.1=a1&a.2=a2 will set the "a" property to a sequence of values [a1, a2], but a.1=a1&a.2=a2&a=x will set the "a" property to x.

If the top level consists only of integer indices, the subobjects will be interpreted as elements of a sequence (and the target type must be List or a subtype of List).

The values are deserialized using the Jackson ObjectMapper that was used to construct this FormDeserializer. Any type that can be deserialized from a string can be used. While it is possible for Jackson to deserialize object graphs that have internal references, it is not possible for the form values to refer outside of themselves.

Runtime exceptions from Jackson conversion are propagated without exception translation. This could be considered a bug.
Parameters:
source - the Restlet Form object to be deserialized
targetType - the type of the target object into which the form is to be deserialized.
Returns:
the deserialized object of the target type


The converter class that uses this machinery is FormConverter. It uses a marker annotation, FormDeserializable, that signals when a class is suitable for deserialization from a form.
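Standalone use of FormDeserializer takes just a couple of lines. For example (the constructor taking an ObjectMapper is described above; the form string echoes the earlier example):

    FormDeserializer deserializer = new FormDeserializer(new ObjectMapper());
    Form form = new Form("name=Tim&address.1.street=Main&address.1.city=Anytown");
    Person person = deserializer.deserialize(form, Person.class);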

The upshot is that I can now write:

public interface PeopleResource {
    @Post Person addPerson(Person person);
}
// And no need for additional method in implementation!


Big reduction in the amount of boilerplate I have to write, and the code is easier to read.

Here are the links to the code in one place:

FormDeserializer.java
FormConverter.java
FormDeserializable.java
I have @Inject tags on some constructors, but you can ignore them unless you want to use them for dependency injection.

The implementation makes heavy use of Guava. If you don't or can't use Guava, then you'll have to roll your own machinery. (Good luck!)

Thursday, March 15, 2012

Restlet Guice extension considered ... unnecessary

I wrote some code back in 2008 to help inject my Restlet resources using Guice and then blogged about it. In 2009, the Restlet team added it to their "incubator" as a potential Restlet extension. (Calling it a Guice extension is a bit of a misnomer, since part of it is independent of Guice and applies to any JSR-330-compliant DI framework.) They've talked about promoting it out of the incubator as early as release 2.2. Update: It's now part of Restlet 2.2 and I've updated code references to point to the 2.2 branch.

More recently, I argued (and Restlet's Jérôme Louvel seemed to accept the argument) that Restlet needs a uniform way to create a non-standard Finder for all the Restlet classes that have methods accepting a Class<? extends ServerResource> in place of a Restlet. This would make the use of the Guice extension invisible when wiring up the routing structure: currently you have to convert the resource type to a Restlet explicitly,
    router.attach("/path/to/resource", finderFactory.finder(MyResource.class));

and the extra support would let you write:
    router.attach("/path/to/resource", MyResource.class);

It's all very gratifying, except I suddenly realized this week that for most people who just want to inject their ServerResource subclasses it's completely unnecessary. It's much simpler to define doInit() to inject the resource's members, and then you can just use the second, simpler form above. I wrote an abstract base class extending ServerResource that does just that:


SelfInjectingServerResource.java
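The heart of the class fits in a few lines. Here's a simplified sketch of its shape; the linked file is the real thing, and the member names here are illustrative:

public abstract class SelfInjectingServerResource extends ServerResource {

    /** Implemented by the DI framework's glue code (see the module below). */
    public interface MembersInjector {
        void injectMembers(Object object);
    }

    // Populated by static injection from SelfInjectingServerResourceModule.
    @Inject private static volatile MembersInjector theMembersInjector;  // javax.inject.Inject

    // Guards against injecting the same resource instance twice.
    private final AtomicBoolean injected = new AtomicBoolean(false);

    @Override protected void doInit() {
        if (theMembersInjector != null && injected.compareAndSet(false, true)) {
            theMembersInjector.injectMembers(this);
        }
    }
}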

Notice that SelfInjectingServerResource uses only the standard @Inject annotation and nothing specific to Guice. In order to make it actually work with Guice, you need to tell Guice that you want self-injecting server resources. The following class is a Guice Module that accomplishes this by requesting static injection of the SelfInjectingServerResource class and providing an implementation of its nested MembersInjector interface to perform the injection.

SelfInjectingServerResourceModule.java
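Simplified again, the module looks roughly like this; the @Provides method is one straightforward way to hand Guice's Injector to the MembersInjector:

public class SelfInjectingServerResourceModule extends AbstractModule {

    @Override protected void configure() {
        // Fills in the static MembersInjector field of SelfInjectingServerResource.
        requestStaticInjection(SelfInjectingServerResource.class);
    }

    @Provides SelfInjectingServerResource.MembersInjector membersInjector(final Injector injector) {
        return new SelfInjectingServerResource.MembersInjector() {
            @Override public void injectMembers(Object object) {
                injector.injectMembers(object);
            }
        };
    }
}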

Install this module when creating the top-level injector, make sure that the server resources you want injected inherit from SelfInjectingServerResource, and it just works. Here's a JUnit test class that shows a trivial example of the idea in practice:

SelfInjectingServerResourceModuleTest.java

We'd like this technique to co-exist with other techniques that manage resource injection differently (e.g., the current Guice extension), without worrying about whether it's OK to extend SelfInjectingServerResource, so this code uses an atomic boolean to prevent multiple injections. (AtomicBoolean is probably overkill, but it can't hurt to be safe.)

[Deleted obsolete discussion of how to use prior to adoption as part of Restlet.]

I said that the old Guice extension was unnecessary for most users, but there are a few things the old code does that the new code doesn't do:
  1. It does constructor injection, letting you have final fields with injected values.
  2. It lets you create a Finder for a resource interface, decoupling the target of the routing from the resource implementation.
  3. It lets you use qualifiers when looking up resource types, further decoupling the target of an attachment from the implementation.
  4. It doesn't require inheritance from a common abstract base class.
  5. You don't have to remember to call super.doInit() when you override doInit().
As it happens, I'm taking advantage of #2 in my current work. (Note that there's no way to avoid the explicit finder call in this case, even with added Restlet support, because the Restlet signatures expect a ServerResource subclass.) So I'm stuck with my mistake for the foreseeable future. But no one else need be, now.

Update (2013 July 9)

I got rid of my dependency on #2 above; I am now using the new approach exclusively, and I strongly recommend that others do the same.

Further update (2014 Sep 1)

The Restlet Guice extension package docs discuss how to use each of three approaches, so you aren't forced to use the approach that I prefer.

Monday, March 12, 2012

Further distributed CacheLoader developments

Updated 2012-Mar-13 and 2012-Mar-20; see notes at end.

In a discussion on the Guava mailing list after my previous post, Charles Fry and Louis Wasserman had some great ideas that spurred me to rewrite the distributed CacheLoader functionality from scratch.

The first part is static factory methods to turn an AsyncFunction into a CacheLoader:

CacheLoaders.java

This is independent of any mention of Hazelcast; it works with any AsyncFunction. The Iterable-ness of the keys passed to loadAll is preserved all the way through, so asynchronous processing can start before the keys have been completely iterated. This could be handy if, for example, the keys themselves are being generated asynchronously.
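In spirit, the factory is something like the following sketch. The linked CacheLoaders.java is authoritative; this version applies the timeout per entry rather than as an overall deadline, and it skips the race-avoidance discussed in the update below.

public static <K, V> CacheLoader<K, V> fromAsyncFunction(
        final AsyncFunction<K, V> function, final long timeout, final TimeUnit unit) {
    return new CacheLoader<K, V>() {
        @Override public V load(K key) throws Exception {
            return function.apply(key).get(timeout, unit);
        }
        @Override public Map<K, V> loadAll(Iterable<? extends K> keys) throws Exception {
            // Start every lookup before waiting on any of them; submission
            // is interleaved with iteration of the (possibly lazy) keys.
            Map<K, ListenableFuture<V>> futures = Maps.newLinkedHashMap();
            for (K key : keys) {
                futures.put(key, function.apply(key));
            }
            ImmutableMap.Builder<K, V> result = ImmutableMap.builder();
            for (Map.Entry<K, ListenableFuture<V>> entry : futures.entrySet()) {
                result.put(entry.getKey(), entry.getValue().get(timeout, unit));
            }
            return result.build();
        }
    };
}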

The other piece is a class with methods to create an AsyncFunction out of a Hazelcast-based Executor:

HazelcastAsyncFunction.java

Note that the public API of this class does not use any Hazelcast types.

The whole implementation is much cleaner and takes full advantage of existing Guava machinery. Here's what it looks like in use in my code:

    AsyncFunction asyncLookup =
        HazelcastAsyncFunction
            .from(syncLookup())
            .onExecutor(hazelcastExecutor())
            .withTaskKeyFunction(mapAccountToServer());

    LoadingCache cache = CacheBuilder.newBuilder().build(
        fromAsyncFunction(asyncLookup, 30L, TimeUnit.SECONDS));
    ...
    // Later on, this call causes the account lookups to be
    // magically distributed across the cluster:
    Map accountNameToInfo = cache.getAll(accountNames);

Update 2012-Mar-13: Louis Wasserman pointed out (in comments to this post) a race in the implementation of CacheLoaders. I've updated the code to remove the race. It doesn't use as many cool Guava tricks, but it's quite a bit simpler now.

Update 2012-Mar-20: Unpacked the example to make it less frightening.

Saturday, March 10, 2012

An invokeAll for Hazelcast ExecutorServices and a distributed CacheLoader

[A more recent posting describes a much cleaner implementation of the functionality described in this article.]

The ExecutorService returned by Hazelcast[Instance].getExecutorService() does not support invokeAll, but I needed this functionality for work I'm doing, so I rolled my own restricted version:

ExecUtil.java

ExecUtil has several variants of invokeAll. The generic variants take these values:

  • an ExecutorService, which must have been returned by a Hazelcast getExecutorService call;
  • an Iterable of DistributedTask<T> instances, where T is the common result type; and
  • a FutureCallback<T>, which is a Guava interface for specifying what to do with each returned result (and what to do, if anything, with exceptions thrown during task execution).
The non-generic variants use the Void result type. Generic and non-generic have two variants each, one that waits indefinitely and one that waits a given amount of time for all the results to finish.

The use of Iterable<DistributedTask<T>> allows lazy provision of the tasks. (Supporting this feature prevented a simpler implementation with just a CountDownLatch initialized to the number of tasks, because we don't know the number of tasks in advance.)
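A simplified rendition of the core loop, ignoring the timed variants (the linked ExecUtil.java is the real implementation):

public static <T> void invokeAll(ExecutorService executor,
        Iterable<DistributedTask<T>> tasks,
        FutureCallback<T> callback) throws InterruptedException {
    // Submit everything first; the Iterable may produce tasks lazily.
    List<DistributedTask<T>> submitted = Lists.newArrayList();
    for (DistributedTask<T> task : tasks) {
        executor.execute(task);  // DistributedTask is a FutureTask
        submitted.add(task);
    }
    // Then harvest the results, reporting each one to the callback.
    for (DistributedTask<T> task : submitted) {
        try {
            callback.onSuccess(task.get());
        } catch (ExecutionException e) {
            callback.onFailure(e.getCause());
        }
    }
}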

The main awkwardness is that both the Callable<T> used to create the DistributedTask and T itself must be Serializable, but this is, of course, a requirement imposed by DistributedTask.

Update (2012-Mar-11): I added a static method to ExecUtil that wraps an ExecutorService, implementing invokeAll in terms of ExecUtil.invokeAll if the argument was created by Hazelcast. Also added a ConcurrentFunction that wraps an existing Function for concurrent or distributed application.

A concurrent Guava CacheLoader

I then realized that this machinery could be used to provide a nice implementation of Guava's CacheLoader, used to build LoadingCache instances with CacheBuilder. LoadingCache has a getAll(Iterable<? extends K> keys) method that returns a Map<K, V> from keys to cached values. It calls CacheLoader.loadAll(Iterable<? extends K> keys) -- if it's implemented -- to load the values in order to cache them. If loadAll isn't implemented, it just loads the keys sequentially.

I wrote a CacheLoader implementation that can use ExecUtil.invokeAll to load the values using DistributedTasks. While I was at it, I made it so that if you don't have a Hazelcast-based ExecutorService, you can use a normal ExecutorService to load the values concurrently (in the same JVM).

The result is ConcurrentCacheLoader:

ConcurrentCacheLoader.java

It has a nested Builder class that allows you to specify:

  • the ExecutorService used (including shortcuts for default HazelcastInstance ExecutorServices),
  • a time limit for loadAll,
  • a function to map a cache key to a key object for the DistributedTask, and
  • the actual function that maps cache keys to values.
There's a minimal set of tests here:

ConcurrentCacheLoaderTest.java

Here's how I'm using it:

Sample usage

That's a ConcurrentCacheLoader.Builder used to build an argument to a CacheBuilder method.
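In outline, building one looks something like this sketch; the builder method names are illustrative, following the bullet list above, and the linked sample is authoritative:

    // All names below (AccountInfo, accountNameToServerKey, lookupAccountInfo)
    // are hypothetical placeholders.
    LoadingCache<String, AccountInfo> cache = CacheBuilder.newBuilder().build(
        ConcurrentCacheLoader.<String, AccountInfo>builder()
            .executorService(hazelcastInstance.getExecutorService())
            .timeLimit(30L, TimeUnit.SECONDS)
            .taskKeyFunction(accountNameToServerKey)
            .loadingFunction(lookupAccountInfo)
            .build());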

The upshot is that my call to getAll runs on all the nodes in my Hazelcast cluster.

Addendum:

The motivation for all this was the desire to make dozens of calls to JClouds' BlobStore.getBlob() in the space of a single HTTP request. The standard for-loop approach was taking too long, and I suddenly realized I could be asking the Hazelcast cluster to do the work.

Monday, January 23, 2012

Deploying Restlet components in Elastic Beanstalk

I spent some time finding a way to deploy a Restlet component in Elastic Beanstalk without giving up the option of deploying it as a standard Java executable for local development and testing. People have expressed interest in seeing the approach, but I don't have the time to create a full-fledged framework, so I've extracted the following template instead (links to code are embedded in the prose). Just copy it to your own environment, changing package and class names as you see fit, add your Guice modules, define your Restlet applications and resources, and you should be able to run the resulting component both standalone and (by bundling everything in a WAR) via Elastic Beanstalk. I make no claims that this code is correct, that it is safe to use, or even that it will compile. You have to read it, compile it, and judge for yourself.

The code depends on Restlet, Guice, Rocoto, Guava, and (minimally) SLF4J. I've included commented-out code that uses the Restlet-Guice extension, which is in incubator status in the Restlet codebase.

The Main class is a GuiceServletContextListener and it has a main method. When you run com.example.server.Main from the command line, the GuiceServletContextListener methods are ignored. When you deploy a WAR that contains the associated web.xml file, the main routine is never called. You can inject the current DeploymentMode into your classes if you want to change behavior depending on whether you're running standalone or in Elastic Beanstalk (i.e., as a servlet).
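In outline, Main has this shape (a sketch of what's just been described, not the template code; details are in the linked files):

public class Main extends GuiceServletContextListener {

    private final DeploymentMode mode;

    public Main() { this(DeploymentMode.SERVLET); }  // invoked via web.xml

    private Main(DeploymentMode mode) { this.mode = mode; }

    // Standalone entry point; never called in servlet deployments.
    public static void main(String[] args) {
        Main main = new Main(DeploymentMode.STANDALONE);
        main.getInjector().getInstance(MainService.class).start();
    }

    // In servlet mode, called from contextInitialized.
    @Override protected Injector getInjector() {
        return Guice.createInjector(/* your modules, plus a binding for mode */);
    }
}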

The MainComponent class is where your Restlet component logic goes. Its lifecycle is managed by the MainService class, which is where you can manage other services with lifecycles. (If you use JClouds BlobStores, for example, you can ensure that a singleton BlobStoreContext is closed when the MainService is stopped. Another example: a LifecycleService for the Hazelcast data grid.) There is some trickiness to handling shutdown in various cases; make sure you nest try-finally clauses so that every one of your services gets a chance to shut down even if a previous shutdown attempt throws an exception or error.
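The nesting pattern is simple but easy to get wrong. A sketch of a stop sequence for two such services plus the component itself (the service fields are hypothetical examples matching the ones mentioned above):

    // Each finally block runs even if the preceding shutdown throws,
    // so every service gets its chance to stop.
    try {
        blobStoreContext.close();                  // JClouds BlobStoreContext
    } finally {
        try {
            hazelcastLifecycleService.shutdown();  // Hazelcast LifecycleService
        } finally {
            mainComponent.stop();
        }
    }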

The MainServletModule and MainServlet classes deal with the details of embedding the component in a servlet (which is necessary in order to use Elastic Beanstalk).

AwsCredentialsModule is an example of how you can inject AWS credentials from either of two sources: system properties defined on the command line (or via Ant invocation) and system properties defined by an Elastic Beanstalk configuration.
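One compact way to get that effect (a sketch, not the template code; Names.bindProperties is standard Guice, and the property keys are whatever your command line or Beanstalk configuration defines):

public class AwsCredentialsModule extends AbstractModule {
    @Override protected void configure() {
        // In both deployment modes the credentials arrive as system
        // properties, so binding all system properties covers both sources.
        Names.bindProperties(binder(), System.getProperties());
    }
}

A constructor parameter annotated with, say, @Named("aws.accessKeyId") then receives the corresponding property value (the key name here is hypothetical).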

The flow of control when running standalone is that the main routine creates an instance of Main with the STANDALONE deployment mode, creates the injector, gets the singleton MainService, and starts it, which causes (in a different thread) the MainComponent to be started.

The flow of control when running as a servlet is that a default instance of Main is created (due to the web.xml directive) and gets the SERVLET deployment mode. Its contextInitialized call creates the injector, which injects the MainService and starts it, causing (in a different thread) the MainComponent to be started.

Files:
  • web.xml - web application descriptor
  • Main - has main() routine for standalone; is context listener for servlet mode
  • MainComponent - the Restlet component we want to be able to deploy in both modes
  • MainService - manages lifecycle of MainComponent and (optionally) other services
  • MainServletModule - included in module list in Main; configures Restlet-Servlet bridge
  • MainServlet - injects MainComponent and returns it as the component it wraps
  • DeploymentMode - enum with two values: STANDALONE and SERVLET
  • AwsCredentialsModule - example of passing AWS credentials for binding with @Named