August 19, 2010

Google Web Toolkit (GWT) + OpenJPA + Tomcat + MySQL

IMOs as Introduction


Google Web Toolkit is going to change the way web clients are developed, at least for those developers who love their Java. In its early days, Java grabbed developers' mindshare with applets (remember them?), but soon it was JavaScript that reigned over the browser runtime, while Java spread its wings into the unexpected territory of server-side infrastructure, middleware and services. Naturally, a large segment of multi-tier web applications adopted a structural layering where the client (web browser) and the server (the application running in a container) spoke two different languages: JavaScript and Java respectively. A bunch of smart techniques grew organically to bridge the two languages. This collective knowledge of the community was waiting for a moniker, and thanks to Jesse James Garrett it got one: AJAX (Asynchronous JavaScript and XML). It was a powerful moniker that could pack a room at any JavaOne 2005 session by the sheer virtue of appearing somewhere in the presentation title. The term AJAX came to represent a programming model and a set of (often asynchronous) communication techniques between a JavaScript-based client running in a browser and a remote server written mostly in Java or C#. Google Web Toolkit adopted this core paradigm of AJAX, but introduced a radical and perhaps far-reaching element: the client is no longer written in JavaScript but in plain old Java (albeit a restricted subset of it). Though the client is written in Java, the code runs as JavaScript in the browser. The unique contribution of GWT is a cross-compiler that compiles Java source code into JavaScript. The cross-compilation process not only expanded the server-side Java developers' comfort zone but also solved one of the hardest problems of web client development, namely cross-browser compatibility.

As part of our ongoing exercise to bring OpenJPA to a wider audience and to highlight its feature-rich, highly customizable aspects, we are always looking for interesting techno-ecosystems where OpenJPA can operate. From that standpoint, we decided to build a multi-tier web application using GWT on the client and OpenJPA on the server, running inside a Tomcat servlet container. In this blog, I will retrace the steps of building this sample application. But let me be clear upfront: I had hardly ever programmed a web client. So this is not an expert's commentary but more of a tyro's journal, at least as far as GWT is concerned.

Though I had hardly ever programmed a web client, I remember writing (purely by chance, way back in 2005) a JavaScript-based AJAX application to demonstrate the freshly minted JPA 1.0 version of Kodo running in BEA WebLogic Server. For that exercise, Direct Web Remoting (DWR), which I found clean and simple, converted server-side Java objects to client-side JavaScript and vice versa. I must admit my discomfort with JavaScript during that brief encounter. It was a powerful language, but its loose typing and cryptic syntax made me, a lover of the strong typing and verbosity of Java, uncomfortable: it felt like a sharp sword in the hands of an unsure novice who may just cut himself.

Five years later, as I revisited the current state of client-side web technologies looking for an ecosystem suitable for OpenJPA, Google Web Toolkit attracted my attention. The website came with a clear, concise description of the framework and its usage that got me started quickly. But the main point of attraction was something else. I am a strong believer in unified representation. I believe the business domain objects should be expressed once and used uniformly across the application tiers. That was one of the original promises of Object-Oriented Analysis and Design to a Fortran-4 programmer. You can imagine my horror when I saw that promise broken, with a proliferation of "patterns" such as Data Access Object and Data Transfer Object expressing the same notion at multiple tiers, and the endless, useless complexity of copying from one representation to another. Frameworks such as DWR did ease the pain of fractured representation, but they still maintained at least a pair of representations for the same notion: one in Java and the other in JavaScript. Looking at GWT, I hoped that it would restore that sanity: I could define a plain old Java type such as Customer and use the same representation across tiers, in the server, in the browser, in the communication channel and towards the persistent storage. Of course, a set of frameworks will help to interpret, serialize, marshal or map the state of a Customer object between these different operating environments, and as a developer I do rely on and expect them to provide such conversion without my having to resort to multiple representations to express the same notion of a Customer.


Start Cooking


Enough generalities. Let us get our hands dirty. These are the few ingredients you will need to start cooking (but you do not need all of them at once):

  1. 1 tablespoon of GWT SDK, available from the download site. I used the latest version, 2.0.4. Around the next aisle, you will also find an Eclipse plugin for GWT. I deliberately avoided the temptation of using the plugin for this recipe. I wanted to understand how the different pieces of GWT hold together, and I was sure that a plugin would smooth out many of those steps so that I could happily click this button or that in my Eclipse IDE without having any clue of what is happening beneath the surface. Simply put, a plugin often makes a cooking experience too sugary.
  2. 1 packet of OpenJPA, available from the download site. I used the cutting-edge (SNAPSHOT) version of 2.1.0. OpenJPA comes in a ready-to-cook package with all its dependencies (such as the Java Persistence API definition, Apache Commons utilities etc.). Because of the experimental nature of this exercise, I was not too concerned about using a nightly snapshot over a released version such as 2.0. Actually, OpenJPA nightly snapshots built from the trunk are as stable as its released versions, because every change committed to the OpenJPA trunk is closely monitored both manually and automatically (thanks to JetBrains and TeamCity and some of the excellent people who manage the OpenJPA integration test infrastructure).
  3. 1 can of tried and trusted Tomcat. Tomcat 7.0 is available, but I decided to stick to the earlier version 6.0.29.
On the marinade side,
  • Java SDK version 1.6.
  • Ant build environment. Again, I prefer Ant for such demonstration examples because it makes each step of compilation, packaging and deployment explicit.



Architecture on a napkin

It always makes sense to begin with interfaces. Let us say ExampleService is the interface to the service. GWT requires ExampleService to extend RemoteService, an interface defined by the GWT framework in the com.google.gwt.user.client.rpc package. Moreover, GWT requires that an asynchronous counterpart of the ExampleService interface be defined, and this asynchronous interface must be named ExampleServiceAsync. The naming is important because, at the Java language level, ExampleService and ExampleServiceAsync do not know of each other. The GWT framework ensures that the pair of interfaces exists (by their naming convention) and also confirms that their signatures are compatible. Compatibility means that every method of ExampleService has an asynchronous version on the ExampleServiceAsync interface. The asynchronous version of an original method always returns void, and the return type T of the original method becomes the type argument of an extra AsyncCallback<T> parameter appended as the last argument of the asynchronous method. For example, the following method in ExampleService

import com.google.gwt.user.client.rpc.RemoteService;
public interface ExampleService extends RemoteService {
Stock findStock(String symbol);
}


will appear in the asynchronous interface as

 
import com.google.gwt.user.client.rpc.AsyncCallback;
/**
* Asynchronous counterpart to the original service interface.
*/
public interface ExampleServiceAsync {
// Returns void; the Stock result is delivered later to callback.onSuccess().
void findStock(String symbol, AsyncCallback<Stock> callback);
}
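The pairing rules described above (naming, matching parameters plus a trailing callback, void return, callback typed by the original return type) can be made concrete with a small reflection-based check. This is a sketch under stated assumptions. RemoteService and AsyncCallback below are minimal stand-ins so that it compiles without the GWT SDK; they are not the GWT classes. AsyncPairChecker is a hypothetical helper and is not part of GWT.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-ins (assumed shapes) for GWT's RemoteService and AsyncCallback.
interface RemoteService {}

interface AsyncCallback<T> {
    void onSuccess(T result);
    void onFailure(Throwable caught);
}

class Stock {
    final String symbol;
    Stock(String symbol) { this.symbol = symbol; }
}

interface ExampleService extends RemoteService {
    Stock findStock(String symbol);
}

interface ExampleServiceAsync {
    void findStock(String symbol, AsyncCallback<Stock> callback);
}

/** Hypothetical checker mirroring the pairing rules described in the text. */
final class AsyncPairChecker {
    private AsyncPairChecker() {}

    static List<String> findMismatches(Class<?> service, Class<?> async) {
        List<String> problems = new ArrayList<>();
        // Rule 1: the async interface is named <ServiceName>Async.
        if (!async.getSimpleName().equals(service.getSimpleName() + "Async")) {
            problems.add("expected an interface named " + service.getSimpleName() + "Async");
        }
        for (Method m : service.getMethods()) {
            // Rule 2: same name and parameters, plus a trailing AsyncCallback.
            Class<?>[] params = m.getParameterTypes();
            Class<?>[] asyncParams = Arrays.copyOf(params, params.length + 1);
            asyncParams[params.length] = AsyncCallback.class;
            Method a;
            try {
                a = async.getMethod(m.getName(), asyncParams);
            } catch (NoSuchMethodException e) {
                problems.add(m.getName() + ": no async counterpart");
                continue;
            }
            // Rule 3: the async version returns void.
            if (a.getReturnType() != void.class) {
                problems.add(m.getName() + ": async version must return void");
            }
            // Rule 4: the callback is typed by the original return type T.
            // (Boxing of primitive or void return types is not handled in this sketch.)
            Type callback = a.getGenericParameterTypes()[params.length];
            if (callback instanceof ParameterizedType) {
                Type t = ((ParameterizedType) callback).getActualTypeArguments()[0];
                if (!t.equals(m.getReturnType())) {
                    problems.add(m.getName() + ": expected AsyncCallback<"
                            + m.getReturnType().getSimpleName() + ">");
                }
            }
        }
        return problems;
    }
}
```

For the well-formed pair, findMismatches(ExampleService.class, ExampleServiceAsync.class) returns an empty list. In a real application this helper is unnecessary, because, as noted above, GWT itself confirms that the pair is compatible.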


The GWT client will actually call the asynchronous interface. The diagram below puts all the above together:



June 23, 2010

NOSQL, JPA and Persistence of Generic Graph (Part II)

In the earlier post of this discussion on the growing interest in NOSQL and its relation to JPA, I outlined two main questions for our inquiry. In this post, I will discuss the first of these questions:

how can JPA work with a more flexible, dynamic data model than strongly-typed POJO on a relational database?

To explore the question, let us consider an object graph, a predominant and powerful construct for any object domain model. An object graph captures the relations among a set of elements or nodes. Since Paul Erdős and Alfréd Rényi charted out the formal mathematical properties of random graphs (where every pair of nodes is equally likely to be connected), graphs have remained an important subject of study. New studies on graphs, their topologies and properties are generating fascinating insights into our increasingly connected world.
The elements of a graph are referred to as nodes or vertices, and their relations as links, edges or arcs. The nature of the edges essentially characterizes a graph: for example, whether an edge distinguishes its pair of terminal vertices as source and target (if it does, the graph is called a directed graph, or digraph for short), whether an edge can originate and terminate at the same vertex (a self-loop), or whether multiple edges can connect the same pair of vertices (i.e. a multigraph).

Graph is about Relation

To decide on a suitable representation of a graph, we notice that, unlike other important data structures such as List, Set or Map, popularized by the excellent interfaces in the java.util package, there is no strong consensus on a graph representation in the Java library. So let us define a graph interface for the scope of this exercise.

The key choices we make about such an interface are:

  • a graph G is a specialized java.util.Set. The specialization is the ability to link a pair of elements of the Set.
  • a graph G can contain any type of element. In other words, an element does not need to implement some interface or inherit from some abstract class to be a member of a graph. This is in alignment with the similarly 'untyped' nature of membership in a typical java.util.Set. However, a graph is generically typed by the type of elements it can contain.
    For example, while a Graph<Object> can contain all sorts of Objects, a Graph<People> can only contain People or instances of its subtypes.
  • membership of an element e in a graph G implies association and not composition, i.e. the lifetime of an element e is independent of that of the graph G, and the same element can be a member of zero, one or more graphs at the same time.
  • the edges between elements are represented explicitly as a Relation type, and each edge is directed, i.e. the two terminal elements of an edge are marked as source and target.
  • the edges are attributed. If more than one relation exists between a pair of elements, then instead of having multiple edges connecting the same pair of elements, we qualify a single edge with different attributes.

Given these choices, here is the Graph interface (for the detailed JavaDoc, see the FishEye source code repository):
import java.util.Set;
public interface Graph<E> extends Set<E> {
   <V1 extends E, V2 extends E> Relation<V1, V2> link(V1 source, V2 target);
   <V1 extends E, V2 extends E> Relation<V1, V2> delink(V1 source, V2 target);
   <V1 extends E, V2 extends E> Relation<V1, V2> getRelation(V1 source, V2 target);
   Set<E> getTargets(E source);
   Set<E> getSources(E target);
   <V extends E> Set<Relation<V,E>> getRelationsFrom(V source);
   <V extends E> Set<Relation<E,V>> getRelationsTo(V target);
}

As you can see, we have also defined Relation as a type, and the Graph interface is expressed in terms of it.
The essential aspects of a relation are that it is generic, directed and attributed.
public interface Relation<V1,V2> {
   V1 getSource();
   V2 getTarget();
   boolean hasAttribute(String key);
   Object getAttribute(String key);
   Relation<V1,V2> addAttribute(String key, Object value);
   Relation<V1,V2> removeAttribute(String key);
}
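A minimal, non-persistent implementation of this interface may help to show what "generic, directed and attributed" means in practice. SimpleRelation is a hypothetical name, and holding the attributes in a HashMap is an assumption of this sketch. The persistent PersistentRelation discussed later maps its attributes differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

interface Relation<V1, V2> {
    V1 getSource();
    V2 getTarget();
    boolean hasAttribute(String key);
    Object getAttribute(String key);
    Relation<V1, V2> addAttribute(String key, Object value);
    Relation<V1, V2> removeAttribute(String key);
}

/** Hypothetical in-memory relation: directed (source -> target) and attributed. */
class SimpleRelation<V1, V2> implements Relation<V1, V2> {
    private final V1 source;
    private final V2 target;
    private final Map<String, Object> attrs = new HashMap<>();

    SimpleRelation(V1 source, V2 target) {
        this.source = Objects.requireNonNull(source, "source");
        this.target = Objects.requireNonNull(target, "target");
    }

    @Override public V1 getSource() { return source; }
    @Override public V2 getTarget() { return target; }
    @Override public boolean hasAttribute(String key) { return attrs.containsKey(key); }
    @Override public Object getAttribute(String key) { return attrs.get(key); }

    // Returning 'this' allows chaining: r.addAttribute("a", 1).addAttribute("b", 2)
    @Override public Relation<V1, V2> addAttribute(String key, Object value) {
        attrs.put(key, value);
        return this;
    }

    @Override public Relation<V1, V2> removeAttribute(String key) {
        attrs.remove(key);
        return this;
    }

    @Override public String toString() { return source + " -> " + target + " " + attrs; }
}
```

Because V1 and V2 are independent type parameters, a Relation<String, Integer> is legal here. This matches the observation that the generic types of a Relation do not share a common root.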

In this model, the elements have no intrinsic knowledge of the other elements they are linked with. But a graph knows the relationships between its elements and hence can find all the nodes directly reachable from a given node, or all the nodes that are incident on a given target node. Linking a pair of nodes implicitly adds the nodes to the graph, if necessary. Removing a node removes all relations that include the removed node as a source or target. Relation is also generically typed by the types of its terminal nodes. A careful reading will reveal that, unlike in the Graph definition, the generic types of a Relation do not inherit from a common root.
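The behaviour just described (implicit node insertion on link, cascading removal of relations, reachability queries in both directions) can be sketched in memory without any persistence. This is a simplified sketch under several assumptions. SketchGraph and its nested Edge class are hypothetical. The bounded generics of the Graph interface are collapsed to a single type parameter E. Only a subset of the Graph methods is shown.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory sketch of the Graph semantics (not the persistent RelationGraph). */
class SketchGraph<E> extends AbstractSet<E> {

    /** One directed, attributed edge per ordered (source, target) pair. */
    static final class Edge<N> {
        final N source;
        final N target;
        final Map<String, Object> attributes = new HashMap<>();
        Edge(N source, N target) { this.source = source; this.target = target; }
    }

    private final Set<E> nodes = new LinkedHashSet<>();
    private final Map<E, Map<E, Edge<E>>> outgoing = new HashMap<>(); // source -> target -> edge

    @Override public boolean add(E e) { return nodes.add(e); }
    @Override public boolean contains(Object o) { return nodes.contains(o); }
    @Override public int size() { return nodes.size(); }

    @Override public boolean remove(Object o) {
        if (!nodes.remove(o)) return false;
        dropEdgesOf(o); // removing a node removes every relation touching it
        return true;
    }

    @Override public Iterator<E> iterator() {
        final Iterator<E> it = nodes.iterator();
        return new Iterator<E>() {
            private E last;
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public E next() { last = it.next(); return last; }
            @Override public void remove() { it.remove(); dropEdgesOf(last); }
        };
    }

    /** Links source to target, adding both nodes if needed; an existing edge is reused. */
    Edge<E> link(E source, E target) {
        nodes.add(source);
        nodes.add(target);
        return outgoing.computeIfAbsent(source, s -> new LinkedHashMap<>())
                       .computeIfAbsent(target, t -> new Edge<>(source, target));
    }

    Edge<E> delink(E source, E target) {
        Map<E, Edge<E>> out = outgoing.get(source);
        return out == null ? null : out.remove(target);
    }

    Edge<E> getRelation(E source, E target) {
        Map<E, Edge<E>> out = outgoing.get(source);
        return out == null ? null : out.get(target);
    }

    /** Nodes directly reachable from source. */
    Set<E> getTargets(E source) {
        Set<E> result = new LinkedHashSet<>();
        Map<E, Edge<E>> out = outgoing.get(source);
        if (out != null) result.addAll(out.keySet());
        return result;
    }

    /** Nodes that have an edge incident on target. */
    Set<E> getSources(E target) {
        Set<E> result = new LinkedHashSet<>();
        for (Map.Entry<E, Map<E, Edge<E>>> entry : outgoing.entrySet()) {
            if (entry.getValue().containsKey(target)) result.add(entry.getKey());
        }
        return result;
    }

    private void dropEdgesOf(Object node) {
        outgoing.remove(node);                   // edges where node is the source
        for (Map<E, Edge<E>> out : outgoing.values()) {
            out.remove(node);                    // edges where node is the target
        }
    }
}
```

Keeping at most one Edge per ordered pair, and putting any extra meaning into its attributes, follows the design choice above of attributed edges instead of a multigraph.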

 

JPA Mapping a Graph to Relational Database

The interfaces for Graph and Relation say nothing about persistence. The types that implement these interfaces will be persistence-capable. These persistence-capable types, and how they are mapped to a relational database, are the main focus of this part of our discussion. PersistentGraph is a persistent (but abstract) version of Graph.

 

First-class Objects

What is unusual about making Graph a first-class persistent entity is that container types (i.e. List, Set or Map) are normally mapped as second-class objects. In object persistence nomenclature, a first-class entity carries a persistent identity while a second-class object does not, though both of them are managed persistent objects. For example, an instance L of java.util.List does not have an independent persistent identity, but the list L is still managed by the JPA runtime in the sense that if an element e is added to L in a transaction, the JPA provider will notice that change, mark the owner of the list dirty and issue the appropriate insert/update to the database when the transaction commits. A direct consequence of having no persistent identity is the inability to find or query second-class objects. Hence, there is no way to query for all List or Set instances. But making a graph a first-class entity, as proposed here, allows us to query for all graphs, or to find a graph with a particular identifier or containing a particular element.
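The way a provider notices changes to an identity-less collection can be pictured with a toy example. In this model, the collection reports every mutation to its owner, which is the object that does have an identity. Owner and TrackingList are hypothetical classes written for illustration. JPA providers typically substitute change-tracking proxy collections rather than relying on a hand-written list like this one.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical first-class owner: it has an identity and a dirty flag. */
class Owner {
    final long id;
    boolean dirty;
    final List<String> tags;

    Owner(long id) {
        this.id = id;
        this.tags = new TrackingList<>(this);
    }
}

/**
 * Hypothetical second-class collection: no identity of its own, but every
 * mutation marks its owner dirty so the change can be flushed with the owner.
 */
class TrackingList<T> extends AbstractList<T> {
    private final List<T> delegate = new ArrayList<>();
    private final Owner owner;

    TrackingList(Owner owner) { this.owner = owner; }

    @Override public T get(int index) { return delegate.get(index); }
    @Override public int size() { return delegate.size(); }

    @Override public void add(int index, T element) {
        delegate.add(index, element);
        owner.dirty = true;
    }

    @Override public T set(int index, T element) {
        T previous = delegate.set(index, element);
        owner.dirty = true;
        return previous;
    }

    @Override public T remove(int index) {
        T removed = delegate.remove(index);
        owner.dirty = true;
        return removed;
    }
}
```

Nothing in this sketch can look up a TrackingList by itself. It is reachable only through its Owner, which is exactly why second-class objects cannot be found or queried independently.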

@MappedSuperclass
public abstract class PersistentGraph<E> extends AbstractGraph<E> {

    @Id @GeneratedValue private long id;

}


The actual state used to represent a Graph can vary. A graph can be represented as a set of nodes and a set of relations, or as an adjacency or incidence matrix. To begin with, let us consider a representation that stores a Graph as the set of its edges, or relations.
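To compare the edge-set representation chosen here with the matrix alternative, the sketch below derives a directed adjacency matrix from a list of (source, target) pairs. AdjacencyMatrix is a hypothetical helper and is not part of the sample code. Note that a node without any edge cannot be recovered from an edge list alone. That is one reason a graph may also keep a separate set of nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: a directed adjacency matrix derived from an edge list. */
final class AdjacencyMatrix<E> {
    final List<E> nodes;          // row/column order
    final boolean[][] adjacent;   // adjacent[i][j] is true iff there is an edge nodes[i] -> nodes[j]

    private AdjacencyMatrix(List<E> nodes, boolean[][] adjacent) {
        this.nodes = nodes;
        this.adjacent = adjacent;
    }

    static <E> AdjacencyMatrix<E> fromEdges(List<? extends Map.Entry<E, E>> edges) {
        // Assign each distinct node an index in order of first appearance.
        Map<E, Integer> index = new LinkedHashMap<>();
        for (Map.Entry<E, E> edge : edges) {
            index.putIfAbsent(edge.getKey(), index.size());
            index.putIfAbsent(edge.getValue(), index.size());
        }
        int n = index.size();
        boolean[][] m = new boolean[n][n];
        for (Map.Entry<E, E> edge : edges) {
            m[index.get(edge.getKey())][index.get(edge.getValue())] = true;
        }
        return new AdjacencyMatrix<>(new ArrayList<>(index.keySet()), m);
    }

    boolean isLinked(E source, E target) {
        int i = nodes.indexOf(source);
        int j = nodes.indexOf(target);
        return i >= 0 && j >= 0 && adjacent[i][j];
    }
}
```

The matrix needs n-by-n cells for n nodes regardless of how many relations exist, while the edge set grows only with the number of relations. The matrix also has no natural place for per-edge attributes, which this document's Relation requires.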

@Entity
public class RelationGraph<E> extends PersistentGraph<E> {
    @OneToMany private Set<PersistentRelation<E,E>> relations;
}

Relation as a persistent entity

Our chosen representation for Graph uses the Relation interface, which is generic, directed and attributed. The concrete persistent realization of Relation is PersistentRelation, defined as follows:

@Entity
public class PersistentRelation<V1,V2> implements Relation<V1,V2> {
  @OneToOne private V1 source;

  @OneToOne private V2 target;

  @ManyToMany private Properties attrs;
}

The complete object model for a generic graph and its relation is depicted diagrammatically below:

[Figure: object model]

 

NOSQL, JPA and Persistence of Generic Graph

NOSQL revives an old debate

The rapidly growing NOSQL movement has revived an old technical debate on the scope and applicability of relational databases for a class of applications. Stemming mainly from the demands of web-scale applications such as eBay, LinkedIn, Facebook or Google, the proponents of this movement are exploring a next generation of non-relational databases and alternatives to the ACID transaction model that has remained the mainstay of relational databases for several decades. A host of alternative persistent stores, such as Dynamo, BigTable, Cassandra, HBase and many others, are already operational in highly demanding production environments. These non-relational storage systems are designed to be horizontally scalable with distributed partitions, replicated for high availability, and often organized as schema-free key-value stores. Of course, nothing comes free. The desirable characteristics of non-relational databases often come at the cost of weaker consistency guarantees (the C in ACID, redefined as BASE in the process), or of queries limited to keys rather than the attributes and relations of the data, which are basic capabilities of any relational database.

 

JPA is no SQL too

The Java Persistence API (JPA) is also a rapidly growing technology, though it may not be as popular as NOSQL. Like NOSQL, JPA also grew out of a limitation of relational databases, albeit of a different kind. This limitation, commonly known as the object-relational impedance mismatch, refers to the problem of representing an object-oriented model in a relational database schema. The common concepts of object models in the Java language, such as how multi-cardinality relations are expressed as collections or how separate references must be used for a bi-directional relationship, differ significantly from the equivalent metaphors that express relations in a relational database. JPA's solution to this problem was to map the object model to a relational schema. JPA promotes a programming model where the application developer expresses the business logic entirely on a purely object-oriented model, while the JPA provider automatically maps these operations to the SQL required to insert, update or query the relational database. Hence, an application developer using JPA does not have to write a single line of SQL either. This different kind of "no SQL" feature of JPA, however, often generates a certain degree of unease among developers about losing control of their familiar and beloved SQL. In fact, JPA does provide the ability to use SQL directly, along with ancillary facilities for intercepting the generated SQL. The major strength of JPA is that it retains the full power of the ACID transaction model and the query capability offered by the underlying relational database systems. But unlike the schema-free NOSQL usage scenario, where arbitrary data structures can be persisted, JPA requires that a strongly-typed Java object model, defined a priori, be mapped to a relational database schema. On the other side of the coin, JPA only works with a single relational database.
JPA defines no methodology for working with distributed partitions or with a persistent storage system that does not speak SQL.

 

JPA Perspective on NOSQL

Given this partial overlap of purpose between NOSQL and JPA, and their respective differences, the questions I asked are the following:

  • how can JPA work with a more flexible, dynamic data model than strongly-typed POJO on a relational database?
  • how can JPA work with the new generation of NOSQL databases?

In future installments, I will elaborate on these questions based on some concrete examples.

October 22, 2009

L2 Cache in JPA 2.0 and an OpenJPA plug-in for named partitions of Coherence

JPA 2.0 standardizes L2 cache contract

Everyone loves cache. So does the Java Persistence API (JPA). A typical JPA-based application uses two tiers of cache. The top-tier cache, called the L1 cache or object cache, is an integral part of the JPA specification. Each L1 cache

  • stores instances managed by a persistence context
  • lives in memory
  • has the same lifetime as its persistence context (i.e. an EntityManager)

When you call EntityManager.contains(Object pc), the return value indicates whether the input instance exists in the L1 cache or not.
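The L1 cache properties listed above can be pictured as an identity map scoped to one persistence context. This is a sketch under stated assumptions, not how any particular provider implements it. ToyPersistenceContext is hypothetical. The loader function stands in for a database read. contains() compares by object identity, in the spirit of EntityManager.contains().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical toy persistence context: an identity map holding at most one
 * managed instance per (entity class, primary key) for its lifetime.
 */
class ToyPersistenceContext {
    private final Map<Class<?>, Map<Object, Object>> managed = new HashMap<>();

    /** Returns the managed instance, calling loadFromDatabase only on a miss. */
    <T> T find(Class<T> type, Object primaryKey, Function<Object, T> loadFromDatabase) {
        Map<Object, Object> byKey = managed.computeIfAbsent(type, k -> new HashMap<>());
        Object instance = byKey.get(primaryKey);
        if (instance == null) {
            instance = loadFromDatabase.apply(primaryKey);
            if (instance != null) byKey.put(primaryKey, instance);
        }
        return type.cast(instance);
    }

    /** True only for the very instances this context manages (identity, not equals). */
    boolean contains(Object entity) {
        for (Map<Object, Object> byKey : managed.values()) {
            for (Object candidate : byKey.values()) {
                if (candidate == entity) return true;
            }
        }
        return false;
    }

    /** Ends the context; like the L1 cache, its contents do not outlive it. */
    void close() { managed.clear(); }
}
```

Two find() calls with the same key return the same object and hit the "database" once. An equal but distinct object is not reported as contained.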

The second tier of cache, called the L2 cache or data cache, is often leveraged by JPA providers, but JPA 1.0 did not specify it. The JPA Expert Group did acknowledge the widespread use of these second-tier caches in practical applications and, in JPA 2.0, specified a contract between a JPA provider and a distributed cache provider. This contract does not mandate that a JPA provider such as OpenJPA, Hibernate or EclipseLink come up with its own distributed cache technology. But the contract standardizes how a JPA application can access and control certain aspects of an L2 cache. The contract is simple and straightforward. A JPA application

  • can get a handle to the L2 cache through EntityManagerFactory.getCache()
  • can use the javax.persistence.Cache interface, which is specified as

public interface Cache {

    public boolean contains(Class cls, Object primaryKey);

    public void evict(Class cls, Object primaryKey);

    public void evict(Class cls);

    public void evictAll();

}

  • can control whether instances are read from the L2 cache before, or written to it after, the corresponding database operations, via the enumerated properties CacheRetrieveMode and CacheStoreMode.

If the contract appears partially complete (for example, there is no pin()/unpin() functionality), that is a conscious choice by the JPA 2.0 expert group. A partial contract in the JPA specification is both a pragmatic and a wise decision. Because distributed caches are now a mainstream technology in their own right, with many prominent products [1], it would not have been prudent to wrap all their functionality via JPA. Instead, the application can get a handle to the L2 cache and unwrap it to a specific cache provider to access its more specialized features.

Though the contract is now standardized by the JPA 2.0 specification, every JPA vendor had already supported integration with L2 caches through proprietary mechanisms under JPA 1.0.

The basic semantics of these L2 caches, as seen from a JPA provider's perspective, differ from those of the L1 cache in several ways. An L2 cache

  • stores instances as data, not as the objects that the application sees. This is because the lifetime of L2-cached data is often much longer than the lifetime of an application object used by a particular persistence context. A well-behaved JPA provider is not supposed to hold a reference to application objects after they have been dereferenced by the application itself.
  • stores data from all the persistence contexts created by a persistence unit, or may even store data from multiple persistence units
  • does not necessarily hold data in local memory. In fact, for improved scalability, the actual data is often spread across the memory of many processes.
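The first point above, caching data rather than objects, can be sketched as follows. The toy cache copies field values out of an entity and materializes a fresh instance on every load, so it never retains a reference to an application object. Customer and ToyDataCache are hypothetical. A real L2 cache would also be shared across persistence contexts and often across processes, which this single-process sketch ignores.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical entity used only for this sketch. */
class Customer {
    long id;
    String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}

/**
 * Hypothetical toy L2 cache: it keeps copies of field values (data), never the
 * application's Customer objects.
 */
class ToyDataCache {
    private final Map<Long, Object[]> rows = new HashMap<>();

    /** Copies the entity's state into the cache. */
    void put(Customer c) {
        rows.put(c.id, new Object[] { c.id, c.name });
    }

    /** Materializes a fresh Customer from cached data, or returns null on a miss. */
    Customer load(long id) {
        Object[] row = rows.get(id);
        return row == null ? null : new Customer((Long) row[0], (String) row[1]);
    }

    void evict(long id) { rows.remove(id); }
}
```

Because only copies are kept, the application's objects can be garbage-collected as soon as it drops them, which is the well-behaved provider behaviour the text describes.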

 

OpenJPA-Coherence Plugin

OpenJPA has always supported integration with L2 cache providers through its plug-in architecture. OpenJPA also provides a native, in-memory, self-healing L2 cache that can synchronize with other in-memory L2 caches in remote processes. With a few lines of code, you can plug your favorite cache provider into the OpenJPA runtime. The default implementation of OpenJPA, however, has not fully taken into account that L2 cache providers often use partitions rather than a gigantic singular memory space. For example, Coherence, a popular distributed cache that is sometimes known as the most expensive HashMap, supports the notion of named partitions. Here is a brief example [2] of an OpenJPA plug-in that allows an OpenJPA-based application to use Coherence named partitions.

It is simple to use. In the JPA application configuration, i.e. META-INF/persistence.xml, specify the following property:

<property name="openjpa.DataCacheManager" value="coherence"/>

Annotate your entities with the OpenJPA-specific @DataCache annotation as follows:

@Entity
@DataCache(name="Coherence-Cache#1")
public class PObject {...}

@Entity
@DataCache(name="Coherence-Cache#2")
public class QObject {...}

With this simple configuration, instances of PObject and QObject will now be cached in the named partitions Coherence-Cache#1 and Coherence-Cache#2 respectively.
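The type-based routing implied by this configuration can be mimicked in plain Java with a runtime-retained annotation. The @Partition annotation and TypeBasedRouter below are hypothetical stand-ins, not the OpenJPA @DataCache annotation or the plug-in's code. Marking the annotation @Inherited makes subclasses land in their superclass's partition.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical stand-in for a partition-naming annotation such as @DataCache(name=...). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@Inherited
@interface Partition {
    String name();
}

@Partition(name = "Coherence-Cache#1")
class PObject {}

@Partition(name = "Coherence-Cache#2")
class QObject {}

/** Subtypes share their superclass's partition because @Partition is @Inherited. */
class SpecialPObject extends PObject {}

/** Hypothetical router: the partition depends only on the class, fixed at compile time. */
final class TypeBasedRouter {
    static final String DEFAULT_PARTITION = "default";

    private TypeBasedRouter() {}

    static String partitionFor(Object instance) {
        Partition p = instance.getClass().getAnnotation(Partition.class);
        return p == null ? DEFAULT_PARTITION : p.name();
    }
}
```

Note that partitionFor() never looks at the instance's state. Every PObject goes to the same partition, and changing that requires editing the annotation and recompiling.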

The complete source code for the plugin is available for download [2]. It is offered merely as a demonstrative example of how easy it is to use the OpenJPA plug-in architecture to add functionality for a useful purpose. This plug-in also demonstrates how the named partitioning feature of an L2 cache provider can be leveraged by OpenJPA (which, by default, it is not).

However, the careful reader will recognize two shortcomings in how OpenJPA interacts with a partitioned L2 cache.

  • Firstly, the distribution of instances to cache partitions is based on entity type. All instances of the same type (and its subtypes) are stored in the same partition. In real life, an application may prefer a more fine-grained, attribute-driven, instance-based partition strategy, where a CustomerDetails instance is cached in one of the partitions based on its zip code or first name or some such attribute.
  • Secondly, the policy is static: the application cannot change the assignment of instances to a particular partition at runtime. You have to make the decision in a source code annotation, and you have no way to change your mind at runtime.

The good news is that these shortcomings are going to disappear. It is quite probable that OpenJPA 2.0 will support

  • the partitioning feature out-of-the-box, if provided by the cache provider
  • assignment of managed instances to partitions at the instance level, based on their attributes
  • a distribution policy that can be configured at runtime
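An instance-level, runtime-configurable policy of the kind described in the two lists above could look roughly like the sketch below. The partition is computed from an attribute of each instance, and the rule can be replaced while the application runs. AttributePartitionPolicy and CustomerDetails are hypothetical names. This is not the OpenJPA 2.0 API.

```java
import java.util.function.Function;

/** Hypothetical entity for this sketch. */
class CustomerDetails {
    final String firstName;
    final String zipCode;

    CustomerDetails(String firstName, String zipCode) {
        this.firstName = firstName;
        this.zipCode = zipCode;
    }
}

/**
 * Hypothetical instance-level partition policy: the partition is computed from
 * each instance's attributes, and the rule can be replaced at runtime.
 */
class AttributePartitionPolicy<T> {
    // volatile so that a rule swapped by one thread is seen by others
    private volatile Function<? super T, String> rule;

    AttributePartitionPolicy(Function<? super T, String> rule) { this.rule = rule; }

    String partitionFor(T instance) { return rule.apply(instance); }

    /** Affects future placements only; data already cached is not redistributed here. */
    void setRule(Function<? super T, String> newRule) { this.rule = newRule; }
}
```

For example, a rule such as c -> c.zipCode.startsWith("9") ? "west" : "east" routes customers by zip code, and a later setRule call can switch to routing by first name without recompiling. Moving entries that were cached under the old rule is a separate problem that this sketch leaves open.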

 

[1] Prominent L2 cache providers: ObjectGrid and DynaCache from IBM, Coherence from Oracle, Terracotta, Gemstone, Ehcache, memcached, Velocity (being planned by Microsoft) among many others.

[2] Download OpenJPA Plug-in Source code

[3] The original implementation of this plug-in was posted on the OpenJPA mailing list.