TimeSeries Analysis with Apache Spark and Oracle NoSQL database
What is a Timeseries?
A Timeseries is a sequence of events ordered by their timestamps. Storing and analyzing Timeseries data has long been of interest in finance and other business domains, and it has recently attracted more attention with the rise of IoT applications.
This article describes modelling and analysis of Timeseries data with Apache Spark and Oracle NoSQL database.
A Timeseries is formally described as a sequence of events {e1, e2, …, eN} such that t(i) ≤ t(j) ∀ i < j, where t(i) is the timestamp of the i-th event ei.
Timeseries model in NoSQL
Timeseries data characteristics are different from RDBMS records
Timeseries data differs in character from conventional RDBMS records: the information in a timeseries lies in the aggregate properties of its events.
Timeseries data tends to be immutable, large in volume, ordered by time, and accessed primarily in aggregate.
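These characteristics can be sketched in a few lines of Python. The `Event` class, its fields, and the helper functions below are hypothetical names chosen for illustration; the timestamp unit is also an assumption. The sketch models events as immutable values, checks the ordering condition t(i) ≤ t(j) for i < j, and computes one aggregate property.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)  # frozen: an event cannot be modified once recorded
class Event:
    timestamp: int   # assumed unit: milliseconds since epoch
    value: float


def is_timeseries(events: Sequence[Event]) -> bool:
    """True if t(i) <= t(j) for all i < j, i.e. timestamps never decrease."""
    return all(a.timestamp <= b.timestamp for a, b in zip(events, events[1:]))


def mean_value(events: Sequence[Event]) -> float:
    """One aggregate property of a timeseries: the mean of its event values."""
    return sum(e.value for e in events) / len(events)


# Two events may share a timestamp; the ordering is non-strict.
readings = [Event(1000, 20.5), Event(1000, 21.0), Event(2000, 19.5)]
```

Calling `is_timeseries(readings)` accepts the list above, while the same events in reverse order are rejected because the timestamps decrease.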
Why NoSQL?
A timeseries is modeled as a table in the NoSQL database. The name of the timeseries is the same as the name of the table, so the usual restrictions on table names apply. A row in a timeseries table represents a time slot.
A time slot is a block of events with some metadata attributes, such as the slot index and the start and end times of the slot. A time slot is the atomic unit of a timeseries, and an entire timeseries is a sequence of time slots.
A time slot, as we will see later, provides the basic partition of a timeseries and forms the basis for integration with Apache Spark.
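The time slot model can be illustrated with a minimal in-memory sketch. It does not use the Oracle NoSQL API; `TimeSlot`, `to_slots`, and the fixed slot width are assumptions made for this example, and the article does not say how slot boundaries are actually chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, float]  # (timestamp, value): a simplified event


@dataclass
class TimeSlot:
    index: int    # slot index
    start: int    # start time of the slot (inclusive)
    end: int      # end time of the slot (exclusive)
    events: List[Event] = field(default_factory=list)


def to_slots(events: Iterable[Event], width: int) -> List[TimeSlot]:
    """Group events into fixed-width slots, returned in slot-index order."""
    slots: Dict[int, TimeSlot] = {}
    for ts, value in events:
        idx = ts // width  # assumption: slots have a fixed width
        slot = slots.setdefault(idx, TimeSlot(idx, idx * width, (idx + 1) * width))
        slot.events.append((ts, value))
    return [slots[i] for i in sorted(slots)]


# Each element of `rows` corresponds to one row of the timeseries table.
rows = to_slots([(5, 1.0), (12, 2.0), (17, 3.0), (31, 4.0)], width=10)
```

In this sketch only slots that contain events become rows, so the slot covering times 20 to 30 is absent from `rows`.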
Apache Spark
Apache Spark is a powerful cluster computing framework. It performs a task on a large dataset by computing over partitions of the dataset, where each computation may execute on a different host of the cluster. The Spark framework controls how partial tasks are distributed across the cluster hosts and how partial results are combined. Moreover, Spark models a computational task as a graph of operations in which transformations are evaluated lazily, only when a result-producing action needs to be computed.
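The two ideas in this paragraph, per-partition computation with combined partial results and lazy evaluation, can be imitated in plain Python. This is an analogy only and does not use the Spark API: the function names are hypothetical, everything runs in one process, and a generator stands in for Spark's deferred execution.

```python
from functools import reduce
from typing import Iterable, Iterator, List, Tuple

Partial = Tuple[float, int]  # (sum, count) computed for one partition

calls: List[int] = []  # records when a partition is actually processed


def partial_sum_count(partition: List[float]) -> Partial:
    """Per-partition task; in a cluster this could run on any host."""
    calls.append(len(partition))
    return (sum(partition), len(partition))


def combine(a: Partial, b: Partial) -> Partial:
    """Merge two partial results into one."""
    return (a[0] + b[0], a[1] + b[1])


def lazy_partials(partitions: Iterable[List[float]]) -> Iterator[Partial]:
    """Describe the per-partition work without running it yet."""
    return (partial_sum_count(p) for p in partitions)


partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
plan = lazy_partials(partitions)       # no partition has been processed yet
computed_before_action = len(calls)    # still zero at this point
total, count = reduce(combine, plan)   # consuming the plan triggers the work
mean = total / count
```

The mean is computed from (sum, count) pairs rather than from per-partition means, because partitions of different sizes would otherwise be weighted incorrectly when combined.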