Timeseries Analysis with Apache Spark and Oracle NoSQL Database

What is a Timeseries?

A timeseries is a sequence of events, ordered by the timestamps of the events. How to store and analyze timeseries data has long been of interest in finance and other business domains, and it is attracting more attention recently with IoT applications.

This article describes the modelling and analysis of timeseries data with Apache Spark and Oracle NoSQL Database.

A timeseries is formally described as a sequence of events {e1, e2, …, eN}
such that t(i) ≤ t(j) ∀ i < j, where t(i) is the timestamp of the i-th event ei.
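
The ordering condition can be checked directly. The following is a minimal Python sketch. The `Event` type and the `is_timeseries` helper are illustrative names chosen here, not part of Apache Spark or Oracle NoSQL Database.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical event: a timestamp plus a measured value."""
    timestamp: int   # assumed to be an integer, e.g. epoch milliseconds
    value: float


def is_timeseries(events: Sequence[Event]) -> bool:
    """Return True if t(i) <= t(j) for every i < j.

    Comparing adjacent pairs is sufficient because <= is transitive.
    """
    return all(a.timestamp <= b.timestamp for a, b in zip(events, events[1:]))
```

Note that the definition uses ≤ rather than <, so two events may share a timestamp and the sequence still counts as a timeseries.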

Timeseries model in NoSQL 

Timeseries data characteristics are different from RDBMS records

Timeseries data differs in character from a conventional RDBMS record. The information in a timeseries lies in the aggregate properties of its events.

Timeseries data tends to be immutable, large in volume, ordered by time, and accessed primarily through aggregation.


Why NoSQL?


A timeseries is modeled as a table in a NoSQL database. The timeseries has the same name as its table, so the usual restrictions on table names apply. A row in a timeseries table represents a time slot.

A time slot is a block of events with some metadata attributes, such as the slot index and the start and end times of the slot. A time slot is the atomic unit of a timeseries, and an entire timeseries is a sequence of time slots.

A time slot, as we will see later, provides the basic partition of a timeseries and forms the basis for integration with Apache Spark.
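
The time slot model can be sketched in plain Python. This is an illustration under stated assumptions, not the Oracle NoSQL table definition. The field names (`slot_index`, `start_time`, `end_time`), the fixed `slot_width`, and the `(timestamp, value)` event shape are all assumed here for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, float]  # (timestamp, value); an assumed event shape


@dataclass
class TimeSlot:
    """One row of the timeseries table (field names are assumptions)."""
    slot_index: int
    start_time: int          # inclusive
    end_time: int            # exclusive
    events: List[Event] = field(default_factory=list)


def to_time_slots(events: Iterable[Event], slot_width: int) -> List[TimeSlot]:
    """Group events into fixed-width time slots, ordered by slot index."""
    slots: Dict[int, TimeSlot] = {}
    for ts, value in sorted(events):      # sort so each slot stays time-ordered
        index = ts // slot_width          # which slot this timestamp falls into
        slot = slots.get(index)
        if slot is None:
            slot = TimeSlot(index, index * slot_width, (index + 1) * slot_width)
            slots[index] = slot
        slot.events.append((ts, value))
    return [slots[i] for i in sorted(slots)]
```

Each `TimeSlot` corresponds to one row, and the list returned by `to_time_slots` corresponds to the whole timeseries. Because every slot covers a disjoint time range, each slot can be processed independently of the others.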


Apache Spark

Apache Spark is a powerful cluster computing framework. Spark performs a task on a large dataset by computing over partitions of the dataset, where each computation may execute on a different host of the cluster. The Spark framework controls how partial tasks are distributed across the cluster hosts and how partial results are combined.
Moreover, Spark models a computational task as a graph of operations: transformations are evaluated lazily, only when a result-producing action needs to be computed.
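
The two ideas above, partial results per partition and lazy evaluation, can be imitated in plain Python. The sketch below does not use the Spark API. It runs in a single process, and the function names are hypothetical; it only shows the shape of the computation for a mean over partitioned values.

```python
from functools import reduce
from typing import Iterable, Iterator, List, Tuple

Partial = Tuple[float, int]  # (sum, count) for one partition


def partial_mean(partition: Iterable[float]) -> Partial:
    """Compute a partial result over a single partition."""
    total, count = 0.0, 0
    for v in partition:
        total += v
        count += 1
    return total, count


def combine(a: Partial, b: Partial) -> Partial:
    """Merge two partial results by adding sums and counts."""
    return a[0] + b[0], a[1] + b[1]


def lazy_partials(partitions: List[List[float]]) -> Iterator[Partial]:
    """Describe the per-partition work without running it (a generator)."""
    return (partial_mean(p) for p in partitions)


def mean(partitions: List[List[float]]) -> float:
    """Result-producing step: partitions are only read when this runs."""
    total, count = reduce(combine, lazy_partials(partitions), (0.0, 0))
    if count == 0:
        raise ValueError("mean of an empty dataset is undefined")
    return total / count
```

Calling `lazy_partials` builds a generator and touches no data; the work happens only when `mean` consumes it. In a real cluster the partial results would be computed on different hosts, which this single-process sketch does not attempt to model.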
