
Overview of Apache Spark as a Processing Engine and Caching Service in Zoomdata

Zoomdata leverages Apache Spark to serve as both a processing engine and a caching service. Specifically, Zoomdata uses Spark's capabilities to:

  • Cache result sets in data frames
  • Perform calculations, totals and pivots on results
  • Execute Fusion joins (a Zoomdata feature that joins disparate connected data sources into a new data source)
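The three roles listed above can be illustrated with a minimal, hypothetical sketch in plain Python. It is not Zoomdata or Spark code. The names `ResultCache`, `totals`, `pivot`, and `fuse`, and the sample rows, are illustrative assumptions. In Zoomdata these operations run on Spark data frames. The sketch shows a result set being cached by query, totals and a pivot computed over the cached rows, and rows from two separate sources joined on a shared key, which is the basic idea behind a Fusion-style join.

```python
from collections import defaultdict


class ResultCache:
    """Hypothetical in-memory cache of result sets, keyed by query text."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    def get_or_compute(self, query, compute):
        # Run the (possibly expensive) query only on a cache miss;
        # later calls with the same query reuse the stored rows.
        if query not in self._store:
            self.misses += 1
            self._store[query] = compute()
        return self._store[query]


def totals(rows, group_key, value_key):
    """Sum value_key for each distinct value of group_key."""
    out = defaultdict(float)
    for row in rows:
        out[row[group_key]] += row[value_key]
    return dict(out)


def pivot(rows, row_key, col_key, value_key):
    """Turn distinct col_key values into columns, summing value_key per cell."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    return {r: dict(cols) for r, cols in table.items()}


def fuse(left, right, key):
    """Inner-join rows from two different sources on a shared key column."""
    index = defaultdict(list)
    for right_row in right:
        index[right_row[key]].append(right_row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in index.get(left_row[key], [])]


# Illustrative rows standing in for two separate data sources.
SALES = [
    {"region": "East", "quarter": "Q1", "amount": 100.0},
    {"region": "East", "quarter": "Q2", "amount": 150.0},
    {"region": "West", "quarter": "Q1", "amount": 80.0},
]
REGIONS = [
    {"region": "East", "manager": "Ana"},
    {"region": "West", "manager": "Raj"},
]

cache = ResultCache()
rows = cache.get_or_compute("SELECT * FROM sales", lambda: SALES)
rows = cache.get_or_compute("SELECT * FROM sales", lambda: SALES)  # cache hit
print(totals(rows, "region", "amount"))  # {'East': 250.0, 'West': 80.0}
print(pivot(rows, "region", "quarter", "amount"))
print(fuse(rows, REGIONS, "region")[0]["manager"])  # Ana
```

Keeping the cached rows in memory means the totals, the pivot, and the join all reuse one fetch from the source. That is the motivation for caching result sets before further processing.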

Zoomdata can supplement a data source's capabilities when the underlying source does not directly support a given analytic function. In those situations, Zoomdata uses Spark to filter and aggregate the data. For example, some NoSQL or search-based sources do not natively support aggregation, so Zoomdata performs those aggregations in Spark. Figure 1 illustrates this data flow.

Figure 1
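The fallback behavior described above can be sketched as a small query planner. This is a hypothetical illustration in plain Python, not Zoomdata's actual planner. `Source`, `can_filter`, `can_aggregate`, and `run_query` are assumed names, and the "search index" and "SQL database" capabilities are made up for the example. Each step is assigned to the source when the source supports it and to the processing engine (the role Spark plays) otherwise. Both paths execute locally here; the sketch only records where each step would run.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Source:
    """Hypothetical capability description of a data source (not a Zoomdata API)."""
    name: str
    rows: List[dict]
    can_filter: bool = False
    can_aggregate: bool = False


def _filter(rows, where):
    return [r for r in rows if where(r)]


def _sum_by(rows, group_key, value_key):
    out = defaultdict(float)
    for r in rows:
        out[r[group_key]] += r[value_key]
    return dict(out)


def run_query(source: Source, where: Callable[[dict], bool],
              group_key: str, value_key: str) -> Tuple[Dict, List[str]]:
    """Plan a filtered SUM ... GROUP BY, pushing each step to the source when it
    can run it and assigning the rest to the processing engine."""
    plan = []
    plan.append(f"filter@{source.name}" if source.can_filter else "filter@engine")
    rows = _filter(source.rows, where)
    # Aggregation can stay at the source only if the filter ran there too;
    # otherwise the engine must aggregate the rows it filtered itself.
    if source.can_filter and source.can_aggregate:
        plan.append(f"aggregate@{source.name}")
    else:
        plan.append("aggregate@engine")
    return _sum_by(rows, group_key, value_key), plan


EVENTS = [
    {"status": "error", "host": "a", "count": 2},
    {"status": "ok", "host": "a", "count": 5},
    {"status": "error", "host": "b", "count": 1},
    {"status": "error", "host": "a", "count": 3},
]

search = Source("search_index", EVENTS, can_filter=True, can_aggregate=False)
result, plan = run_query(search, lambda r: r["status"] == "error", "host", "count")
print(result, plan)  # {'a': 5.0, 'b': 1.0} ['filter@search_index', 'aggregate@engine']
```

The result is the same whichever side runs a step. Only the placement changes, which is why an engine such as Spark can fill in for a source that lacks aggregation without changing what the user sees.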

Some file-based data sources require the SparkIt functionality to ingest their data. An article covering SparkIt is under development. In the meantime, if you have questions about SparkIt, contact Zoomdata Technical Support.

Notes

  • The embedded Spark instance that ships with Zoomdata is based on Spark v1.5.1.
  • The default configuration of the embedded Spark server is intended for demo or testing purposes. You can, however, reconfigure this local Spark instance; refer to the article Changing the Default Configuration for an Embedded Spark Server for guidance.

Deploying Spark in a Highly Available Environment

For assistance, contact Zoomdata Technical Support.