Unique, monotonic sequence in NoSQL using SELECT FOR UPDATE style lock
Persistent Sequence in NoSQL database
What is a Database Sequence?
A database sequence is a monotonic series of unique numbers. A sequence is widely used in database for primary keys and other uniquely identifiable values.
Database Sequence in NoSQL
Because NoSQL database does not use locks, and data is maintained in multiple copies distributed across different storage medium, generating a series of numbers that are unique and monotonic across multiple threads (and applications) becomes a non-trivial problem.
To generate a sequence, we need a mechanism that can acquire a lock on a record, update its value and ensures that no other thread/process can update the same row in the meantime. This is typical SELECT FOR UPDATE behavior found in in RDBMS. It is also know as Read-Modify-Write pattern in database programming.
To generate a sequence, we need a mechanism that can acquire a lock on a record, update its value and ensures that no other thread/process can update the same row in the meantime. This is typical SELECT FOR UPDATE behavior found in in RDBMS. It is also know as Read-Modify-Write pattern in database programming.
In this article, we show how advanced features of Oracle NoSQL database can achieve similar SELECT FOR UPDATE behavior.
We take a concrete example of persistence Sequence to demonstrate this behavior. The persistennt sequence would generate a series of values that are unique and monotonic>
We take a concrete example of persistence Sequence to demonstrate this behavior. The persistennt sequence would generate a series of values that are unique and monotonic>
From usage point of view, an user application builds a sequence with a SequenceBuilder . A SequenceBuilder employs typical builder pattern to create a Sequence:
Sequence seq = new SequenceBuider() .withStore(store) .withName("mysequence") .withIncrement(100) .build();
A sequence is created with a database connection and an unique name. Additional configuration such increment i.e. as how many values are cached in a sequence and hence can be served without a network call to database can be specified.
Once a sequence is obtained, an application can get the series of values by:
Once a sequence is obtained, an application can get the series of values by:
long val = seq.next()
For simplicity, without loss of generality, we assume that seq.next() method returns a long value.
In the code example above, the sequence is created to fetch 100 contiguous values from database at a time, and these values are cached in memory. But if seq.next() is called more than 100 times, next batch of values will be fetched from database transparently to user.
In the code example above, the sequence is created to fetch 100 contiguous values from database at a time, and these values are cached in memory. But if seq.next() is called more than 100 times, next batch of values will be fetched from database transparently to user.
A sequence is monotonic across multiple threads, multiple application
An application may create sequence from different threads, even different applications may grab a sequence, but a sequence of same name created from same database always generate monotonic, unique sequence of numbers.
NoSQL advanced features to support Database Sequence
The key to the solution lies in special features of NoSQL database, namely
- Row Versioning: every database record carries a version
- Conditional put operation: a put() operation can be executed conditionally on record version.
- Access to Previous State: previous state of a record is available
Row Versioning
NoSQL database uses Row as the basic abstraction for record. Every row carries a version.
Row.getVersion() returns a version which is an opaque array of bytes. The version is changed on every modification of a row.
Conditional put() Operation
NoSQL provides a conditional put semantics where a row can only be put to the database if the row version in the database match the version being put. The API method is the following:
This conditional put facility is one of the main features used to implement Read-Modify-Write pattern.
Access to previous state
The ReturnRow argument on the about signature is one of the unique features in Oracle NoSQL database. A return row holds the previous state of a row. A return row can be supplied as an argument to a mutating operation such as putIfVersion(...).
The server would populate the return row with the state of the row before the operation. You can also specify which aspect of previous state be returned: only the previous version or value or both.
The server would populate the return row with the state of the row before the operation. You can also specify which aspect of previous state be returned: only the previous version or value or both.
READ-MODIFY-WRITE pattern for Database Sequence
A unique, monotonic sequence is modelled with a name (string) and a value (long). We can add more attributes to a sequence, but these two fields are sufficient for current discussion.
To ensure the a sequence generates set of monotonic, unique numbers, the algorithm is to apply READ-MODIFY-WRITE pattern:
- READ a record by name with ABSOLUTE read consistency option.
Because NoSQL maintains multiple copies of data for robustness, a piece of data can be read from any of the copies (termed as Replication Node). Reading with ABSOLUTE consistency ensures that database has confirmed the data to be consistent across all the copies.
Oracle NoSQL provides different consistency options to read data: NONE_REQUIRED and ABSOLUTE. You can know about them in more details in the NoSQL documentation. - MODIFY update record value by incrementing its current value X by L. This value X denotes the next value of the sequence that has can be used, or in other words, values before X has already been reserved for use.
- WRITE the value (X+L) back with putIfVersion(), supplying a return row. Again, as in the case of read, write the data COMMIT_SYNC durability warranty that would ensure that data is not only written to durable storage but was synchronized across copies to avoid any other update.
RMW pattern with version-based update
If putIfVersion() succeeds on thread T, it implies that no other thread had updated the record and the range (X, X+L] can be safely reserved for thread T. If another thread had indeed modified the record in the meantime, then the operation will fail (because version would not match), and return row will provide the last updated value X' and version V' of the record. So, thread T can call putIfVersion() again, but this time with X'+L and V'.
This simple yet powerful RMW technique using features of Version, ReturnRow and putIfVersion(), it is possible to generate a monotonic sequence that is not only unique to multiple threads but even across different processes.
In code lies the truth...
Now is the time for some code. To summarize the discussion, i have packaged the concepts of SequenceBuilder, Sequence and RMW lock and a JUnit test for verification in an Eclipse Project. You can download (~8KB) from here
Conclusion
This brief article highlights advanced features of NoSQL database to demonstrate how monotonic sequence of values can be generated that are unique across threads and processes.
Comments
Post a Comment