Overview of Apache Spark as a Processing Engine and Caching Service in Zoomdata
Zoomdata uses Apache Spark as both a processing engine and a caching service. Specifically, Zoomdata uses Spark to:
- Cache result sets in data frames
- Perform calculations, totals, and pivots on result sets
- Execute Fusion joins (a feature unique to Zoomdata that joins disparate connected data sources into a single new data source)
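The caching, totals, and pivot steps in the list above can be illustrated with a small plain-Python sketch. It is only a conceptual analogue, not Zoomdata's implementation and not the Spark API: in Zoomdata this work happens in Spark data frames. `ResultSetCache`, `total`, `pivot`, and `sample_source` are hypothetical names, and the rows are made-up sample data. The point is that a result set is fetched from the source once, kept, and then reused for further calculations without re-querying the source.

```python
from collections import defaultdict


class ResultSetCache:
    """Hypothetical cache that keeps each query's result set after the first fetch."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: query text -> iterable of row dicts
        self._store = {}
        self.misses = 0      # how many times the underlying source was actually queried

    def get(self, query):
        if query not in self._store:
            self.misses += 1
            self._store[query] = list(self._fetch(query))
        return self._store[query]


def total(rows, measure):
    """Grand total of one measure across the cached rows."""
    return sum(row[measure] for row in rows)


def pivot(rows, index, column, measure):
    """Sum `measure` into {index value: {column value: subtotal}}."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[column]] += row[measure]
    return {key: dict(cells) for key, cells in table.items()}


def sample_source(query):
    # Hypothetical source: returns illustrative rows; a real source would execute `query`.
    return [
        {"region": "East", "year": 2015, "sales": 10},
        {"region": "East", "year": 2016, "sales": 15},
        {"region": "West", "year": 2015, "sales": 7},
        {"region": "West", "year": 2016, "sales": 3},
        {"region": "East", "year": 2015, "sales": 5},
    ]


cache = ResultSetCache(sample_source)
query = "SELECT region, year, sales FROM orders"
rows = cache.get(query)
rows_again = cache.get(query)  # served from the cache; the source is not queried again

print(cache.misses)                            # 1
print(total(rows, "sales"))                    # 40
print(pivot(rows, "region", "year", "sales"))  # {'East': {2015: 15, 2016: 15}, 'West': {2015: 7, 2016: 3}}
```

Keeping the materialized rows in memory is what lets totals and pivots be recomputed cheaply as a user changes the view, at the cost of memory for each cached result set.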
Zoomdata can also supplement a data source's capabilities when the underlying source does not directly support a given analytic function. In those situations, Zoomdata can use Spark to filter and aggregate the data. For example, some NoSQL or search sources do not natively support aggregation, so Zoomdata uses Spark to perform those aggregations instead. Figure 1 illustrates the data flow for data sources.
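The fallback described above can be sketched as a capability check: if a source can aggregate natively, the work is pushed down to it; otherwise raw rows are fetched and filtered and aggregated in the processing engine (Spark's role in Zoomdata). This is a minimal stdlib-only sketch under stated assumptions, not Zoomdata's code: `Source`, `can_aggregate`, `native_sum`, `scan`, and `sum_by` are hypothetical names, and a plain Python loop stands in for Spark.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def sum_rows(rows, key, measure):
    """SUM(measure) GROUP BY key over an iterable of row dicts."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


@dataclass
class Source:
    """Hypothetical connected source; `can_aggregate` stands in for a capability check."""
    rows: list
    can_aggregate: bool
    log: list = field(default_factory=list)  # records which operations the source ran

    def scan(self):
        # Every source can at least return its raw rows.
        self.log.append("scan")
        return list(self.rows)

    def native_sum(self, key, measure, where):
        # Only sources that aggregate natively should be asked to do this.
        if not self.can_aggregate:
            raise NotImplementedError("source cannot aggregate")
        self.log.append("native_sum")
        return sum_rows((r for r in self.rows if where(r)), key, measure)


def sum_by(source, key, measure, where=lambda row: True):
    """Push filtering and aggregation down when possible; otherwise do both locally."""
    if source.can_aggregate:
        return source.native_sum(key, measure, where)
    # Fallback path: fetch raw rows, then filter and aggregate in the processing engine.
    return sum_rows((r for r in source.scan() if where(r)), key, measure)


rows = [
    {"status": "open", "team": "a", "count": 2},
    {"status": "closed", "team": "a", "count": 5},
    {"status": "open", "team": "b", "count": 4},
    {"status": "open", "team": "a", "count": 1},
]


def is_open(row):
    return row["status"] == "open"


search_like = Source(rows, can_aggregate=False)
sql_like = Source(rows, can_aggregate=True)

print(sum_by(search_like, "team", "count", is_open), search_like.log)  # {'a': 3, 'b': 4} ['scan']
print(sum_by(sql_like, "team", "count", is_open), sql_like.log)        # {'a': 3, 'b': 4} ['native_sum']
```

Both paths return the same answer; the difference is where the work runs. Pushing down avoids moving raw rows out of the source, while the fallback trades extra data transfer for functionality the source lacks.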
- The embedded Spark instance that ships with Zoomdata is based on Spark v1.5.1.
- The default configuration of the embedded Spark server is intended for demo or testing purposes. You can, however, configure this local Spark instance; refer to the article Changing the Default Configuration for an Embedded Spark Server for guidance.
Deploying Spark in a Highly Available Environment
For assistance deploying Spark in a highly available environment, contact Zoomdata Technical Support.