June 23, 2010

NOSQL, JPA and Persistence of Generic Graph (Part II)

In the earlier post on this discussion on growing interest on NOSQL and its relation to JPA, I had outlined two main approaches of our inquiry. In this blog, I will discuss the first one of these questions:

how can JPA work with a more flexible, dynamic data model than strongly-typed POJO on a relational database?

To explore the question, let us consider an object graph -- a predominant and powerful construct for any object domain model. An object graph captures relation between a set of elements or nodes. Since Paul Erdos and Alfred Renyi charted out the formal mathematical properties of random graphs (where any pair of nodes being connected is equally likely), graph has remained an important subject of study. New studies on graphs, their topologies and properties are generating fascinating insights on our an increasingly connected world.
The elements of a graph are referred as node or vertex and their relation as link, edge or arc. The nature of the edge essentially characterizes a graph -- for example, whether an edge distinguishes its pair of terminal vertexes as source and target (if they do, it is called a directed graph or digraph, for short), whether an edge can originate and terminate at the same vertex (self-loop), or whether multiple edges can connect the same pair of vertexes (i.e. a multigraph).

Graph is about Relation

To decide on a suitable representation of graph, we notice that unlike other important data structures such as List, Set or Map popularized by excellent interfaces in java.util.* package, there is no strong consensus on a graph representation

in Java library. So let us define a graph interface for the scope of this exercise.

The key choices we make about such an interface are

  • a graph G is a specialized java.util.Set. The specialization is the ability to link a pair of elements of the Set.
  • a graph G can contain any type of element. In other words an element does not need to implement some interface or inherit some abstract class to be a member of a graph. This is in alignment with similar 'untyped' nature of membership in a typical java.util.Set. However, a graph is generically typed by the type of elements it can contain.
    For example, while a Graph<Object> contains all sorts of Objects, a Graph<People> can only contain People or its sub-types.
  • membership of an element e in a graph G will imply association and not composition i.e. lifetime of an element e is independent of that of the graph G and the same element can be a member of zero, one or more graphs at the same time.
  • the edges between elements are represented explicitly as a Relation type and the edge is directed i.e the two terminal elements of an edge is marked as source and target.
  • the edges are attributed. If more than one relation exists between a pair of elements, then instead of having multiple edges connecting the same pair of elements, we will qualify a single edge with different attributes. 

Given these choices, here is the Graph interface: (for the detailed JavaDoc see the FishEye source code repository).
import java.util.Set;
public interface Graph<E> extends Set<E> {
   <V1 extends E, V2 extends E> Relation<V1, V2> link(V1 source, V2 target);
   <V1 extends E, V2 extends E> Relation<V1, V2> delink(V1 source, V2 target);
   <V1 extends E, V2 extends E> Relation<V1, V2> getRelation(V1 source, V2 target);
   Set<E> getTargets(E source);
   Set<E> getSources(E target);
   <V extends E> Set<Relation<V,E>> getRelationsFrom(V source);
   <V extends E> Set<Relation<E,V>> getRelationsTo(V target);

As you can see, we have also defined Relation as a type and Graph interface is expressed in terms of Relation as a type.
The essential aspect of a relation is it is generic, directed and attributed.
public interface Relation<V1,V2> {
   V1 getSource();
   V2 getTarget();
   boolean hasAttribute(String key);
   Object getAttribute(String key);
   Relation<V1,V2> addAttribute(String key, Object value);
   Relation<V1,V2> removeAttribute(String key);
}

In this model, the elements have no intrinsic knowledge of other elements it is linked with. But a graph knows the relationship between its elements and hence can find out all the nodes directly reachable from a given node or all the nodes that are incident on a given target node. Linking a pair of nodes implicitly adds the nodes to the graph, if necessary. Removing a node will remove all relations that include the removed node as a source or target. Relation is also generically typed by the type of its terminal nodes. The careful reading will reveal that, unlike Graph definition, the generic types of a Relation do not inherit from a common root.

 

JPA Mapping a Graph to Relational Database

The interfaces for Graph and Relation say nothing about persistence. The types that will implement these interfaces will be persistence-capable.
These persistence-capable types and how they are mapped in a relational database are the main focus of this part of
our discussion. A Persistent Graph is a persistent (but abstract) version of Graph.

 

First-class Objects

The unusual thing about making Graph a first-class persistent entity is normally the container types (i.e. List or Set or Map) are mapped as second-class
objects. In object persistence nomenclature, a first-class entity carries a persistent identity while a second-class
object does not, though both of them are managed persistent objects. For example, though an instance L of java.util.List does not
have an independent persistent identity but the list L is still managed by the JPA runtime in the sense that if an
element e is added to the list L in a transaction, then a JPA provider will notice that change, mark the owner of the
list dirty and issue appropriate insert/update to the database when the transaction commits. A direct consequence of
having no persistent identity is inability to find or query a second-class objects. Hence, there is no way to query
for all List or Set instances. But making a graph a first class entity as proposed here will allow us to query for
all graph or find a graph with a particular identifier or containing a particular element.

@MappedSuperclass
public abstract class
PersistentGraph<E> extends AbstractGraph<E> {

    @Id @GeneratedValue private long id;

}


The actual states to represent a Graph can vary. A graph can be represented as a set of nodes and set of relations, or as an adjacency or incidence matrix. To begin with, let us consider a representation that stores a Graph as set of its edges or relations.

@Entity
public class
RelationGraph<E> extends PersistentGraph<E> {
    @OneToMany private Set<
PersistentRelation<E,E>> relations;

Relation as a persistent entity

Our chosen representation for Graph uses Relation interface that is generic, directed and attributed. The concrete persistent realization of Relation is PersistentRelation and it is defined as follows

@Entity
public class
PersistentRelation<V1,V2> implements Relation<V1,V2> {
  @OneToOne private V1 source;

  @OneToOne private V2 target;
  @ManyToMany private
Properties attrs;

The complete object model for a generic graph and its relation is depicted diagramatically below

omodel

 

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home