JSON and object support in NoSQL database

This blog describes one of many features of Oracle NoSQL Database -- how its JDBC driver can store and query plain, old Java objects as JSON.

Oracle NoSQL Database is one of the most high-performing, robust NoSQL databases. It is also the most well-kept secret in NoSQL world.

JSON has become lingua franca of data representation. Accordingly, Oracle NoSQL Database, originally a key-value store, natively supports JSON as a basic database type among other composite types such as RECORD, MAP or ARRAY.

Definition for for database

The core pre-requisite to represent objects in database is to define a table with a column of JSON type. The column would store JSON representation of object.

A database table, OBJECT_TABLE, to store object is defined as:


CREATE TABLE OBJECT_TABLE (
 id LONG,
 object JSON,
 PRIMARY KEY (id)
)

The noteworthy aspects of database definition are

  • a primary key is defined separately.
  • a field of JSON type is declared to store object

Definition required for persistent object

The Java objects to be stored are POJOs -- Plain, old Java objects. No framework to extend, no interface to implement, no annotation.

However, a object must declare a property by bean style gettter/setter method i.e. a getX() and/or setX(X x) method would implictly declare a persistent property x.

Steps to store and query objects

The following code examples show steps to store an object and query to retrieve it using basic JDBC interface.

Getting a connection

To use the driver with a database, use a connection string such as jdbc:nosql://localhost:5000/nosqlstore where localhost:5000 is the host and listening port of a database server and nosqlstore is the name of the store.


   import java.sql.*

   String url = "jdbc:nosql://localhost:5000/nosqlstore";
   Connection con = DriverManager.getConnection(url);

Inserting an object

Once a connection has been obtained, an object is inserted via ResultSet.insertRow() method.
At first, create a result set by a simple query

   Connection con = ...;// the connection obtained in previous step
   String sql = "SELECT * FROM OBJECT_TABLE";
   
   ResultSet rs = con.createStatement().executeQuery(sql);


Next, use the result set to insert an object

   rs.moveToInsertRow();
   
   // create a POJO and set its properties
   SimplePOJO pojo = new SimplePOJO();
   pojo.setName("A blog example");
   pojo.setWordCount(5678);

   // the id must be set separately
   rs.updateLong("id", 1234);
   // update the result set with the object
   rs.updateObject("object", pojo);
   
   // insert the row to the database
   rs.insertRow();
   

The entire object state is stored in JSON field of a record in OBJECT_TABLE. However, the identifier of the record is declared and set separately from the object.

Query an object

Once the object has been stored, next step would be to retrieve the object by query.

Query by identifier

The straightforward way is to query the object by record identifier which is separate from object state.


  String sql = "SELECT * FROM OBJECT_TABLE WHERE id=1234"; // 1234 is the record identifer
  ResultSet rs = getConnection().createStatement().executeQuery(sql);
  rs.next();  
  SimplePojo pojo = rs.getObject("object",SimplePojo.class);

The object can be retrieved from result set column.

Runtime Type preservation

Notice the absence of any casting when the object is retrieved from result set column. No casting is necessary because Oracle NoSQL database stores the object with its runtime type. encoded in JSON representation.

Query by object property

The object is stored in its JSON representation. But it is important to note the object can be queried on its properties i.e. the database is aware of JSON structure and does not store it as a mere string or opaque blob.

The query on object property uses a navigational syntax using dot as separator. As the example shows wordCount is a property of the stored object (though it has not been declared as a database field). So the query on the property succeeds with usual comparison predicate.


  String sql = "SELECT * FROM OBJECT_TABLE t WHERE t.object.wordCount >= 5000"; 
  ResultSet rs = getConnection().createStatement().executeQuery(sql);
  rs.next();  
  SimplePojo pojo = rs.getObject("object",SimplePojo.class);

Object Graph Storage

The object used in the above example SimplePojo is simple, may be too simple. But the driver can store a more complex object graph. The next blog explores how to store and retrieve a complex object graph in Oracle NoSQL database using JDBC driver.

JDBC Driver for Oracle NoSQL Database

JDBC hardly needs any introduction. JDBC spec was released 30 years ago, in 1977 with Java version 1.1. Since then, JDBC has been widely adopted. Mostly used with RDBMS databases. Every RDBMS provides a JDBC driver.

But a JDBC driver for Oracle NoSQL database?
Wait ! There are two questions:
    What is Oracle NoSQL database?
    JDBC is for SQL. Why JDBC driver for NoSQL database?

Oracle NoSQL Database  is the best well-kept secret of NoSQL world. It is one of the best NoSQL database technical capability and performance wise. It is also least known. Though some serious, large-scale, global application runs on Oracle NoSQL database, it is hardly known widely as google or stackoverflow would show.

Oracle NoSQL is a distributed database. It is scalable, elastic. At its core lies robust and time-tested kernel of BerkeleyDB. Oracle NoSQL database overlays a network fabric over a set of BerkleyDB kernels to deliver a high-performance, low-latency distributed database that even provides transactional warranty in limited sense. 

Oracle NoSQL database, though a key-value database at its core, already provides a tabular view. And supports a query language that is quite similar to SQL. So, it is a natural and logical to have a JDBC driver for  Oracle NoSQL database. This blog introduces such a JDBC driver.

As I mentioned earlier, JDBC has been around and does not need to be repeated. Oracle JDBC driver is compliant to JDBC Spec version 4.2.

I would mention only one aspect of new JDBC driver. Its' powerful capacity to store, retrieve and query user-defined objects.

One of the major attractiveness of NoSQL  database is flexible schema. Oracle database natively supports JSON data type to store tree like nested data structure. JDBC driver builds on top of that native JSON facility  to support object support with polymorphic type preservation.

For a developer, it means that database remembers (databases typically have excellent memory) the runtime type of an object being stored. And later retrieve the object with the same type.

Let us assume an interface: Human. The interface is implemented by two concrete classes Male and Female.

Now, when an object is stored, its runtime type is not Human which is an interface but an instance of concrete type: Male or Female. Later the object can be realized in memory as a Male or Female type, not only as a Human.

Moreover, polymorphic relationship is also supported. Let us say, two humans are related by partner relationship. The relationship is declared at  Human level. For example,
     Human getPartner()

But when an object is stored, the partner object is either a Male or Female. Again, the object graph is retrieved with type information preserved. So if a Male who has a Female partner is stored, when retrieved, the partner would be an instance of a Female object.

C# Driver for Oracle NoSQL database

C# Driver for Oracle NoSQL database

Applications written in JavaScript, C, Python can use Oracle NoSQL database. Thanks to the drivers for respective languages.

C# driver is a new addition. C# driver also supports asynchronous   programming model.

Your application issue a request to search and return immediately. The search result, which can potentially be large in number, is returned via callback to an observer  supplied with original search request.

The release candidate of the driver is available from NuGet gallery and can be integrated to any C# application easily from Visual Studio.

An accompanying assembly includes a set of samples with source code to demonstrate how to use the driver.

Using locks in NoSQL database

Locks in NoSQL database

In this article, we explore Read-Modify-Write (RMW) lock which is equivalent to SELECT FOR UPDATE lock often found in RDBMS.

Commonly, NoSQL databases often do not use locking as in traditional RDBMS. But Oracle NoSQL database offers a set of advanced features that can be used to implement locking semantics.

Read-Modify-Write Lock

An RMW lock uses following simple API
  
    RMWLock.update(KVStore store, PrimaryKey pk, Updater updater);
where
  • store is a database connection
  • pk is the primary key of row being locked for update
  • updater is a function that updates the row values pointed by pk
The Updater is a functional interface where the application can implement custom update behavior.
  
    public interface Updater {
       void update(Row row);
    }

It is noteworthy that RMWLock may attempt to update the row more than once. This can happen if another thread is updating the row at the same time. In such cases, RMWLock would invoke the updater function on every retry with the latest state of the updated row.


An use case for RMWLock

Let us consider a typical use case for an RMW lock. Let us create a named, persistent sequence that generates monotonic integral values. A sequence is represented by a name and integer value in a database table.
A client, say A, locks the database row and increments the value exclusively by an amount L. If another thread B simultaneously attempts to do the same, the other thread B will block.
Once client A had updated the value from x to x+L, then only thread B will see the updated value x+L and then B can update the value again to a different value, say x+L+L.

A typical usage of RMW lock in this simple case would be:
        
          new RMWClient().update(store, pk, new Updater() {
              @Override
              public void update(Row row) {
                 int lastUpdatedValue = row.get("NEXT_VALUE").asInt().get();
                 row.put("NEXT_VALUE", lastUpdatedValue&nbsp+ 20);
              }
           };
Here we use the updater function to increment NEXT_VALUE by 20. The RMWLock ensures that NEXT_VALUE is updated atomically and without any other thread updating at the same time.

Resources

The code for RMWLock is available here. This code can directly be used in an application.
A more complete example of RMWLock to realize a persistent sequence is described in a separate blog.

TimeSeries Analysis with Apache Spark and Oracle NoSQL database

Hello What is a Timeseries?

Timeseries  is a sequence of Events, ordered by timestamp of the events. How to store and analyze  Timeseries data has always been interesting for financial and other business domains, but is attracting  more attention recently with IoT applications.

This article describes modelling and analysis of Timeseries data with Apache Spark and Oracle NoSQL database. 

A Timeseries is formally described as a sequence of events {e1, e2, … eN}
such that t(i) ≤ t(j) ∀ i < j where t(i)is the timestamp for i-th event ei

Timeseries model in NoSQL 

Timeseries data characteristics are different from RDBMS record

TimeSeries data is different in characteristics than conventional RDBMS record. The information of a timeseries  is in aggregate properties of its events. 

Timeseries data tends to be immutable, large in volume, ordered by time, and is primarily aggregated for access. 


why NoSQL?


A timeseries is modeled as a table in NoSQL database. The name of the timeseries is same as the table and hence usual restrictions on table name applies. A row in timeseries table represents a Time Slot.

A timeslot is block of events with some metadata attribute such as slot index, start and end time of the slot etc. A time slot is atomic unit of a timeseries  and an entire timeseries is a row of time slots.

A timeslot, as we would see later, provides the basic partition of timeseries and forms the basis for integration with Apache Spark.


Apache Spark

Apache Spark is a powerful cluster computing framework. Apache Spark performs a task on a large dataset by computing over partitions of the dataset where each computation may execute on different hosts of a cluster. Spark framework controls how partial tasks are distributed across the cluster hosts  and how to combine partial results.
Moreover, Spark models a computational task as a graph of actions where actions are lazily evaluated only when a result producing action needs to be computed.

Unique, monotonic sequence in NoSQL using SELECT FOR UPDATE style lock

Persistent Sequence in NoSQL database

What is a Database Sequence?

A database sequence is a monotonic series of unique numbers. A sequence is widely used in database for primary keys and other uniquely identifiable values. 

Database Sequence in NoSQL 

Because NoSQL database does not use locks, and data is maintained in multiple copies distributed across different storage medium, generating a series of numbers that are unique and monotonic across multiple threads (and applications) becomes a non-trivial problem.
To generate a sequence, we need a mechanism that can acquire a lock on a record, update its value and ensures that no other thread/process can update the same row in the meantime. This is typical  SELECT FOR UPDATE  behavior found in in RDBMS. It is also know as Read-Modify-Write pattern in database programming.
In this article, we show how advanced features of Oracle NoSQL database can achieve similar SELECT FOR UPDATE behavior.
We take a concrete example of persistence Sequence to   demonstrate this behavior. The persistennt sequence would generate a  series of values that are unique and monotonic>

From usage point of view, an user application builds a sequence with a SequenceBuilder  .  A  SequenceBuilder employs typical  builder pattern to create a Sequence:
 
     Sequence seq = new SequenceBuider()
                  .withStore(store)
                  .withName("mysequence")
                  .withIncrement(100)
                  .build();

A sequence is created with a database connection and an unique name. Additional configuration such increment i.e. as how many values are cached in a sequence and hence can be served without a network call to database can be specified.

Once a sequence is obtained, an application can get the series of values by:

   long val = seq.next()

For simplicity, without loss of generality, we assume that seq.next() method returns a long value.

In the code example above, the sequence is created to fetch 100 contiguous values from database at a time, and these values are cached in memory. But if seq.next() is called more than 100 times, next batch of values will be fetched from database transparently to user.

A sequence is monotonic across multiple threads, multiple application 

An application may create sequence from different threads, even different applications  may grab a sequence, but a sequence of same name created from same database always generate monotonic, unique sequence of numbers.

NoSQL advanced features to support Database Sequence

The key to the solution lies in special features of NoSQL database, namely 
  • Row Versioning: every database record carries a version 
  • Conditional put operation: a put() operation can be executed conditionally on record version.
  • Access to Previous State: previous state of a record is available

Row Versioning

NoSQL database uses Row as the basic abstraction for record. Every row carries a version.
Row.getVersion() returns a version which is an opaque array of bytes.  The version is changed on every modification of a row.

Conditional put() Operation

NoSQL provides a conditional put semantics where a row can only be put to the database if the row version in the database  match the version being put. The API method is the following: 


This conditional put facility is one of the main features used to implement Read-Modify-Write pattern.

Access to previous state

The ReturnRow argument on the about signature is one of the unique features in Oracle NoSQL database. A return row holds the previous state of a row. A return row can be supplied as an argument to a mutating operation such as putIfVersion(...).
The server would populate the return row with the state of the row before the operation. You can also specify which aspect of previous state be returned: only the previous version or value or both.

READ-MODIFY-WRITE pattern for Database Sequence

A unique, monotonic sequence is modelled with a name (string) and a value (long). We can add more attributes to a sequence, but these two fields are sufficient for current discussion.

To ensure the a sequence generates set of monotonic, unique numbers, the algorithm is to apply READ-MODIFY-WRITE pattern: 

  • READ  a record  by  name with ABSOLUTE read consistency option.
    Because NoSQL maintains multiple copies of data for robustness, a piece of data can be read from any of the copies (termed as Replication Node). Reading with ABSOLUTE consistency ensures that database has confirmed the data to be consistent across all the copies.
    Oracle NoSQL provides different consistency options to read data:  NONE_REQUIRED and ABSOLUTE. You can know about them in more details in the NoSQL documentation.
  • MODIFY update record value by incrementing its current value X by L. This value X denotes the next value of the sequence that has can be used, or in other words, values before X has already been reserved for use.
  • WRITE the value (X+L) back with putIfVersion(), supplying a return row. Again, as in the case of read, write the data COMMIT_SYNC durability warranty that would ensure that data is not only written to durable storage but was synchronized across copies to avoid any other update.

RMW pattern with version-based update

If putIfVersion() succeeds on thread T, it implies that no other thread had updated the record and the range (X, X+L] can be safely reserved for thread T. If another thread had indeed modified the record in the meantime, then the operation will fail (because version would not match), and return row will provide the last updated value X' and version V' of the record. So, thread T can call putIfVersion() again, but this time with X'+L and V'

This simple yet powerful RMW technique using features of Version, ReturnRow and putIfVersion(), it is possible to generate a monotonic sequence that is not only unique to multiple threads but even across different processes.

In code lies the truth...


Now is the time for some code. To summarize the discussion, i have packaged the concepts of  SequenceBuilder, Sequence and RMW lock and a JUnit test for verification in an Eclipse Project.  You can download (~8KB) from   here

Conclusion


This brief article highlights advanced features of NoSQL database to demonstrate how monotonic sequence of values can be generated   that are unique across threads and processes. 

New trillion