Visualization Cache

Overview

Zoomdata lets users create dashboards and share them with other users across their enterprise. Some of these dashboards are viewed by large numbers of users. In addition, multiple dashboards often share common visualizations, so the same visualization can be served to many users through several dashboards.

The Visualization Cache improves performance and shortens response times when a large number of concurrent users view the same visualizations. It stores frequently used subsets of query result data in memory and, in many cases, avoids the need to re-query the data source for a visualization.

Cached data is shared between users only when they have the same data access permissions and security context.

User Experience

The Visualization Cache is transparent to end users, who cannot tell whether data was served from the cache or directly from the underlying data source. An end user can, however, force Zoomdata to bypass the Visualization Cache and query the underlying data source by selecting the "Force Refresh" action on a Zoomdata dashboard.

Datasource administrators can clear the data caches for a specific datasource by using the "Clear Cache" action on the "My Sources" page. They can also clear the data caches for a datasource on a recurring basis by using the scheduled refresh option.

In addition to the options above for clearing or bypassing the cache, the cache entries associated with a datasource are cleared whenever a datasource administrator modifies and saves the definition of that datasource.

The current implementation of the Visualization Cache is an in-memory cache. As a result, the cache is purged whenever the Query Engine is restarted.

Configuration

Enabling the Visualization Cache

A system administrator can turn the Visualization Cache on or off by setting the vis-data-cache-enabled toggle to true or false on the Supervisor Advanced settings page.

Switching the setting on and off takes effect immediately and does not require the Query Engine to be restarted.

When the setting is turned off, the Query Engine ignores the Visualization Cache during query processing. Turning the setting back on makes the data stored in the cache "active" again.

Previously cached data is not automatically deleted when the setting is toggled off. It is deleted from memory only when the Query Engine service is restarted.

Additional Parameters

A set of additional parameters controls the behavior of the Visualization Cache.

It is strongly recommended to consult with Zoomdata customer support before changing the default settings.

Parameter and description

topology.cache.timeout

Specifies the time to live (TTL) for a cache entry. The TTL value is set in minutes. The default value is 60 minutes.

topology.cache.max.size

Specifies the maximum number of elements stored in the cache. When this limit is exceeded, cache entries are evicted based on usage frequency or recency, as described in the Eviction Policy section. To disable the setting, set topology.cache.max.size to -1. The default value of topology.cache.max.size is 1000.

topology.cache.max.weight

Defines the maximum total weight of all entries in the cache. Weight is not an exact value but an estimate based on the number of records in the visualization data response. When the topology.cache.max.weight value is exceeded, cache entries are evicted based on usage frequency or recency, as described in the Eviction Policy section. To disable the setting, set topology.cache.max.weight to -1. The value for the parameter should be a positive number greater than 100. If a value less than 100 is specified, 100 is used.

Weight-based and size-based limits are mutually exclusive: only one of them can be in effect at any given time. The weight-based limit has higher priority than the size-based limit. When it is specified, the size-based limit is ignored. Use the weight setting only when cache entries of drastically different sizes need to be stored.

Setting topology.cache.max.size too high can result in sporadic high CPU consumption due to Java garbage collection. The actual memory required by the Visualization Cache cannot be calculated precisely, but it generally depends linearly on the size setting. If the memory required to store the cache entries exceeds the memory allocated to the process, an out-of-memory exception is raised.
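The precedence between the two limits can be sketched as a small function. This is only an illustration of the rules described above, not Zoomdata code: the function name effective_limit is hypothetical, and because this article does not state a default for topology.cache.max.weight, the sketch assumes that the weight limit is disabled (-1) unless it is set explicitly.

```python
from typing import Optional, Tuple

DISABLED = -1      # value the article gives for switching a limit off
MIN_WEIGHT = 100   # weights below this are raised to 100, per the article


def effective_limit(max_size: int = 1000,
                    max_weight: int = DISABLED) -> Tuple[str, Optional[int]]:
    """Return (mode, limit), where mode is "weight", "size", or "unbounded".

    Assumption: max_weight defaults to disabled; the article gives no default.
    """
    if max_weight != DISABLED:
        # A specified weight limit takes priority; the size limit is ignored.
        return "weight", max(max_weight, MIN_WEIGHT)
    if max_size != DISABLED:
        return "size", max_size
    # Both limits disabled: nothing bounds the cache in this sketch.
    return "unbounded", None
```

For example, effective_limit(max_size=500, max_weight=5000) selects the weight limit and ignores the size of 500, while effective_limit(max_weight=50) raises the weight to 100.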

DEEPER DIVE

Cache Entries

A cache entry contains the data required to display a visualization. The cache entry is populated only once all events for a visualization have been received.

The Visualization Cache keeps data for successful requests only. Queries that result in errors or produce no data (an empty result set) are not cached.

The Visualization Cache stores the websocket messages that provide the data needed for a visualization. These include the data and metadata events, such as VIEWPORT, TIMELINE, START_VIS_DONE, DIRTY_DATA, and NOT_DIRTY_DATA.

Redundant websocket messages (such as DIRTY_DATA progress messages or sharpening DATA messages) are not saved to the cache.

Each cache entry is associated with a unique cache key.
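The rules for when an entry is stored can be sketched as follows. The PendingEntry structure, its field names, the "records" message field, and the store_if_eligible function are all hypothetical and exist only to illustrate the conditions described in this section; they do not reflect the Query Engine's internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PendingEntry:
    """Collects the messages for one visualization request (hypothetical)."""
    messages: List[dict] = field(default_factory=list)
    complete: bool = False   # set once all events have been received
    failed: bool = False     # set if the query resulted in an error

    def record_count(self) -> int:
        # "records" is an assumed field name for the data carried by a message.
        return sum(len(m.get("records", [])) for m in self.messages)


def store_if_eligible(cache: Dict[str, List[dict]], key: str,
                      entry: PendingEntry) -> bool:
    """Store the entry only if it is complete, successful, and non-empty."""
    if not entry.complete or entry.failed or entry.record_count() == 0:
        return False
    cache[key] = list(entry.messages)
    return True
```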

Cache Key

A cache key is generated when a new query is issued to the Zoomdata Query Engine. The Query Engine checks for a matching cache key and cache entry and, if a match exists, uses it to build the response to the query. The Visualization Cache tracks two types of cache keys: raw data keys and aggregated data keys. A generated cache key takes into account the user's security context and other attributes that affect the potential results of a query.

The following context is used to calculate cache keys:

  • Source id
  • Selected raw filters
  • Timebar filter
  • User attributes
  • Forced Filters for current user and source
  • Hash code of all used derived fields
  • Hash code of all used keysets
  • Hash code of all used fused attributes

In addition to the items listed above, keys for aggregated visualizations also include:

  • Selected metrics
  • Dimensions (selected groups, sorts, offset and limit)
  • Selected group filters
  • Hash code of all selected formulas

Keys for raw visualizations, in addition to the common items, include:

  • Selected fields
  • Selected sort by settings, offset and limit
  • Distinct flag
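One way to picture how such a key could be derived is to serialize the context deterministically and hash it. The sketch below is an assumption made for illustration: this article does not describe the actual key format or hashing scheme, and the cache_key function and the dictionary field names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(kind: str, common: Dict[str, Any], specific: Dict[str, Any]) -> str:
    """Build a deterministic key from the query context.

    kind     -- "aggregated" or "raw"
    common   -- shared context (source id, raw filters, timebar filter,
                user attributes, forced filters, hash codes, ...)
    specific -- per-kind context (metrics and dimensions for aggregated
                keys; selected fields, sort, distinct flag for raw keys)
    """
    if kind not in ("aggregated", "raw"):
        raise ValueError(f"unknown key kind: {kind}")
    payload = {"kind": kind, "common": common, "specific": specific}
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because user attributes are part of the hashed context, two users with different attribute values produce different keys and therefore never share an entry, which matches the rule that cached data is shared only between users with the same security context.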

Excluded from Cache

The following queries are never cached or served from Visualization Cache:

  • Queries for Play mode
  • Data export requests and queries used for keyset creation
  • Queries with time-based raw filters using the NOW() time preset specified on the timebar or via regular filters
  • Pivot visualization data with more than 200 columns
  • Force refresh queries
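The exclusion list above amounts to a single predicate that is evaluated before the cache is consulted. The following sketch expresses it that way. The QueryInfo structure and its field names are hypothetical and serve only to make the rules concrete.

```python
from dataclasses import dataclass

MAX_PIVOT_COLUMNS = 200  # threshold stated in the list above


@dataclass
class QueryInfo:
    """Hypothetical description of an incoming query."""
    play_mode: bool = False
    export_request: bool = False
    keyset_creation: bool = False
    uses_now_preset: bool = False  # NOW() in a timebar or regular time filter
    pivot_columns: int = 0         # 0 for non-pivot visualizations
    force_refresh: bool = False


def is_cacheable(q: QueryInfo) -> bool:
    """Return False for any query the list above excludes from the cache."""
    return not (
        q.play_mode
        or q.export_request
        or q.keyset_creation
        or q.uses_now_preset
        or q.pivot_columns > MAX_PIVOT_COLUMNS
        or q.force_refresh
    )
```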

Eviction Policy

Over time, the cache may grow to the point where no additional entries can be stored. The eviction policy is the algorithm that determines which existing cache entries Zoomdata removes to make room for new ones. The Zoomdata Visualization Cache uses the Window TinyLFU (W-TinyLFU) algorithm, which takes both the recency and the frequency of use into account.
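The general idea behind Window TinyLFU can be shown with a heavily simplified sketch. New entries first land in a small recency-ordered window. When an entry falls out of the window, it is admitted to the main region only if it has been used more often than the entry it would replace. This is not Zoomdata's implementation: the class name is hypothetical, an exact counter stands in for the compact frequency sketch with aging that real implementations use, and the main region is a plain LRU rather than a segmented one.

```python
from collections import Counter, OrderedDict
from typing import Any, Hashable, Optional


class SimplifiedWTinyLfu:
    """Toy admission-window cache; sizes are counted in entries."""

    def __init__(self, window_size: int, main_size: int) -> None:
        if window_size < 1 or main_size < 1:
            raise ValueError("both regions need room for at least one entry")
        self.window_size = window_size
        self.main_size = main_size
        self.window: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.main: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.freq: Counter = Counter()  # stand-in for a frequency sketch

    def __contains__(self, key: Hashable) -> bool:
        return key in self.window or key in self.main

    def get(self, key: Hashable) -> Optional[Any]:
        self.freq[key] += 1
        for region in (self.window, self.main):
            if key in region:
                region.move_to_end(key)  # mark as most recently used
                return region[key]
        return None

    def put(self, key: Hashable, value: Any) -> None:
        self.freq[key] += 1
        for region in (self.window, self.main):
            if key in region:
                region[key] = value
                region.move_to_end(key)
                return
        self.window[key] = value
        if len(self.window) > self.window_size:
            # The least recently used window entry becomes the candidate.
            candidate, cand_value = self.window.popitem(last=False)
            self._admit(candidate, cand_value)

    def _admit(self, candidate: Hashable, value: Any) -> None:
        if len(self.main) < self.main_size:
            self.main[candidate] = value
            return
        victim = next(iter(self.main))  # least recently used main entry
        if self.freq[candidate] > self.freq[victim]:
            del self.main[victim]
            self.main[candidate] = value
        # Otherwise the candidate is dropped and the victim stays.
```

The effect is that a burst of one-off queries passes through the window without displacing visualizations that are requested repeatedly.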

Viewing Cache Statistics

The Zoomdata Query Engine service exposes visualization cache statistics through the following REST endpoint:

http://{query-engine-host}:{query-engine-port}/service/metrics_sorted

The default Query Engine port is 5580.

The following metrics are exposed:

  • cache.visdata.evictions
  • cache.visdata.hit.ratio
  • cache.visdata.requests
  • cache.visdata.size
  • cache.visdata.totalHits
  • cache.visdata.totalMiss
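A minimal sketch for reaching the endpoint from a script is shown below. The function names are hypothetical. Because the response format is not described in this article, the sketch returns the raw response text without parsing it. The hit_ratio helper assumes that the reported ratio corresponds to totalHits divided by the sum of totalHits and totalMiss, which is the conventional definition but is not confirmed here.

```python
from urllib.request import urlopen

DEFAULT_PORT = 5580  # default Query Engine port given above


def metrics_url(host: str, port: int = DEFAULT_PORT) -> str:
    """Build the statistics URL for a Query Engine host."""
    return f"http://{host}:{port}/service/metrics_sorted"


def fetch_metrics_text(host: str, port: int = DEFAULT_PORT,
                       timeout: float = 5.0) -> str:
    """Return the raw response body; its format is not parsed here."""
    with urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def hit_ratio(total_hits: int, total_miss: int) -> float:
    """Assumed definition: hits / (hits + misses); 0.0 when there is no traffic."""
    requests = total_hits + total_miss
    return total_hits / requests if requests else 0.0
```

A hit ratio that stays low under steady load may indicate that entries expire or are evicted before they are reused, which is a useful data point to bring to Zoomdata customer support before changing any defaults.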

Logging Cache Activities

Visualization Cache activities performed by the Query Engine are logged using the Zoomdata activity logging mechanism. See the Activity Logging article for more information.

 
