
How Zoomdata Caches Data

Zoomdata uses different caching methods for the different types of data that it works with:

  • For raw data, Zoomdata offers optional ingestion into its SparkIt cache.
  • For aggregated result sets, Zoomdata leverages Spark to cache the results.
  • For streaming data, the information is cached and persisted in MongoDB.

RAW DATA CACHING WITH SPARKIT

Zoomdata leverages the capabilities of Apache Spark to provide a unique 'SparkIt' cache that ingests raw data, so that aggregations, calculations, and other operations are possible when you explore the dataset. For this reason, Zoomdata recommends enabling SparkIt for data stores that do not support analytical queries (for example, S3, HDFS, and SaaS sources).

When you connect these types of data sources to Zoomdata, you can enable SparkIt during connection setup by using the 'SparkIt' toggle switch on the 'Tables' page (as shown in Figure 1).


Figure 1

If you create a Custom SQL query with a GROUP BY clause while SparkIt is enabled, SparkIt holds the aggregated data that the query produces, not just raw data.
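To illustrate the difference, the following minimal Python sketch uses the standard-library sqlite3 module as a stand-in for both the data source and SparkIt. This is an assumption for illustration only (SparkIt is built on Spark, not SQLite), and the table, column, and function names are hypothetical. Ingesting a plain table copies every raw row, whereas ingesting a Custom SQL query with GROUP BY stores only the aggregated rows.

```python
import sqlite3

# Hypothetical source table; sqlite3 stands in for the real data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.0)],
)


def ingest(query):
    """Copy the rows returned by `query` into a fresh in-memory store.

    The returned connection plays the role of SparkIt (hypothetical).
    """
    rows = source.execute(query).fetchall()
    store = sqlite3.connect(":memory:")
    store.execute("CREATE TABLE ingested (c0, c1)")
    store.executemany("INSERT INTO ingested VALUES (?, ?)", rows)
    return store


# Plain table selection: every raw row is ingested.
raw = ingest("SELECT region, amount FROM sales")
# Custom SQL with GROUP BY: only the aggregated rows are ingested.
agg = ingest("SELECT region, SUM(amount) FROM sales GROUP BY region")

raw_count = raw.execute("SELECT COUNT(*) FROM ingested").fetchone()[0]
agg_rows = agg.execute("SELECT c0, c1 FROM ingested ORDER BY c0").fetchall()
print(raw_count)  # 3
print(agg_rows)   # [('east', 15.0), ('west', 7.0)]
```

Any later question that needs row-level detail (for example, individual sale amounts) can no longer be answered from the aggregated store, which is the trade-off of ingesting a GROUP BY query.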

Data Flow with Both SparkIt and Zoomdata Resultset Caching Enabled

  1. After connecting to a data source, the data starts being loaded into SparkIt.
  2. Once a chart is created, a request is sent to Zoomdata Cache.
  3. If the requested data is not found in Zoomdata Cache, the request is sent to SparkIt.
  4. The retrieved data is sent to Zoomdata Cache and stored there.
  5. The chart displays the requested data.


Figure 2
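The numbered flow above behaves like a read-through cache placed in front of SparkIt. The following minimal Python sketch models it under stated assumptions: it is not Zoomdata's implementation, and the classes and methods (SparkItStore, ResultsetCache, aggregate, get) are hypothetical. SparkIt is modeled as an in-memory list of raw rows that computes a grouped sum on demand.

```python
class SparkItStore:
    """Hypothetical stand-in for SparkIt: holds raw rows loaded at connect time."""

    def __init__(self, raw_rows):
        self.rows = list(raw_rows)  # step 1: data is loaded after connecting
        self.queries = 0            # counts how often SparkIt is asked

    def aggregate(self, group_by, measure):
        """Compute SUM(measure) grouped by `group_by` over the ingested rows."""
        self.queries += 1
        totals = {}
        for row in self.rows:
            key = row[group_by]
            totals[key] = totals.get(key, 0) + row[measure]
        return totals


class ResultsetCache:
    """Hypothetical stand-in for Zoomdata Cache, keyed by the chart request."""

    def __init__(self, backend):
        self.backend = backend
        self.entries = {}

    def get(self, group_by, measure):
        key = (group_by, measure)  # step 2: the request goes to the cache first
        if key not in self.entries:
            # steps 3-4: on a miss, ask SparkIt and store the result
            self.entries[key] = self.backend.aggregate(group_by, measure)
        return self.entries[key]  # step 5: the chart displays the result


sparkit = SparkItStore([
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 5},
    {"region": "west", "amount": 7},
])
cache = ResultsetCache(sparkit)
first = cache.get("region", "amount")   # miss: computed by SparkIt
second = cache.get("region", "amount")  # hit: served from the cache
print(first, sparkit.queries)  # {'east': 15, 'west': 7} 1
```

The second request for the same chart never reaches SparkIt, which is why the two layers complement each other: SparkIt makes aggregation possible over raw data, and the resultset cache avoids repeating it.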

Data Flow with SparkIt Disabled and Zoomdata Resultset Caching Enabled

Zoomdata Cache stores the results of all aggregated requests sent to your data source (Figure 3). In this scenario, when a chart is created, the request is first sent to Zoomdata Cache (1). If the required results are found there, they are visualized on your chart (2).


Figure 3

Otherwise, the data flow is as follows (Figure 4):

  1. The request is sent to Zoomdata Cache.
  2. If the required results are not found in Zoomdata Cache, the request is sent to your data source.
  3. The data source returns the results to Zoomdata Cache, where they are stored.
  4. The chart displays the requested data.


Figure 4
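With SparkIt disabled, the data source itself must answer aggregated queries. The following minimal Python sketch models the hit and miss paths described above, assuming a data source that supports analytical SQL (sqlite3 stands in for it here). The function run_chart_query, the counter source_calls, and the dictionary used as the resultset cache are hypothetical illustrations, not Zoomdata APIs.

```python
import sqlite3

# Hypothetical data source that supports analytical queries natively,
# so SparkIt is not needed; sqlite3 stands in for it.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7)],
)

source_calls = 0  # how many requests actually reached the data source
cache = {}        # hypothetical resultset cache: query text -> result rows


def run_chart_query(sql):
    global source_calls
    if sql in cache:
        # Found in the cache: the results are visualized directly.
        return cache[sql]
    # Not found: send the request to the data source.
    source_calls += 1
    rows = source.execute(sql).fetchall()
    # Store the results in the cache, then return them to the chart.
    cache[sql] = rows
    return rows


q = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
print(run_chart_query(q))  # [('east', 15), ('west', 7)]
print(run_chart_query(q))  # same rows, served from the cache
print(source_calls)        # 1
```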

For additional information about how Zoomdata implements Spark, refer to the article 'How Zoomdata Uses Apache Spark'.

DISABLING ZOOMDATA RESULTSET CACHING

The Zoomdata resultset cache temporarily stores aggregated data from your data sources. By default, caching is enabled for all data sources. However, you can disable it if your data source is constantly being updated, if you do not want to allocate the RAM that the cache requires, or if your data source performs well enough that storing the results of aggregated queries is unnecessary.
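The first reason for disabling the cache, a constantly updated source, can be shown with a small model. The following minimal Python sketch assumes a per-source cache toggle; the classes DataSource and Connection and the parameter cache_enabled are hypothetical and do not reflect Zoomdata's actual configuration. With caching enabled, a result computed before the source changes keeps being served; with caching disabled, every request goes to the source and sees the latest value.

```python
class DataSource:
    """Hypothetical source whose data keeps changing."""

    def __init__(self):
        self.total = 100

    def query_total(self):
        return self.total


class Connection:
    """Hypothetical per-source connection with a resultset-cache toggle."""

    def __init__(self, source, cache_enabled=True):
        self.source = source
        self.cache_enabled = cache_enabled
        self._cached = None

    def total(self):
        if not self.cache_enabled:
            # Always ask the source: results are fresh and no cache RAM is used.
            return self.source.query_total()
        if self._cached is None:
            self._cached = self.source.query_total()
        # May be stale once the source has changed.
        return self._cached


src = DataSource()
cached = Connection(src, cache_enabled=True)
uncached = Connection(src, cache_enabled=False)
cached.total()
uncached.total()
src.total = 120  # the source is updated
print(cached.total(), uncached.total())  # 100 120
```

The cost of the fresher results is that every chart request now reaches the backend, which is only acceptable when that backend is fast enough on its own.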

Data Flow with SparkIt Enabled and Zoomdata Resultset Caching Disabled

  1. After connecting to a data source, the data starts being loaded into SparkIt.
  2. Once a chart is created, a request is sent to SparkIt.
  3. The chart displays the requested data.


Figure 5

Data Flow with Both SparkIt and Zoomdata Resultset Caching Disabled

If you choose not to use either caching option, requests from your charts are sent directly to the data source:


Figure 6
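The four configurations described in this article can be summarized as a routing rule. The following minimal Python sketch is a hypothetical model, not Zoomdata code: the function request_path and its parameters are illustrative names. It returns the components a chart request passes through for each combination of SparkIt and resultset caching, and stops at the cache when the results are already stored there.

```python
def request_path(sparkit_enabled: bool, cache_enabled: bool, cache_hit: bool = False):
    """Return the components a chart request passes through (hypothetical model)."""
    path = []
    if cache_enabled:
        path.append("Zoomdata Cache")
        if cache_hit:
            # Results found in the cache are visualized directly.
            return path
    # On a cache miss, or with caching disabled, the request continues onward.
    path.append("SparkIt" if sparkit_enabled else "data source")
    return path


for sparkit in (True, False):
    for cache in (True, False):
        route = " -> ".join(request_path(sparkit, cache))
        print(f"SparkIt={sparkit}, cache={cache}: {route}")
```

For example, request_path(False, False) returns ['data source'], matching the direct path shown in Figure 6.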