Big Data Scalability & Query Pushdown

To accommodate large data volumes, any system needs to avoid moving data around unnecessarily. This is especially true for visual analytics. Unfortunately, many analytic systems require moving data from its original source into a cube, a lens, or another proprietary, intermediate data store before the user can query it.

One of the definitions of “big data” is it’s too big to move. To that end, one of the hallmarks of Zoomdata's approach is leaving the data where it is, and pushing queries down to the underlying data store where it’s possible.

Alex Woodie, Datanami*

Moving data into an intermediate store presents several problems. First and foremost, it limits big data scalability: intermediate data stores cannot handle the full volume and velocity supported by sources like Hadoop, MPP databases, and streaming infrastructures. Second, physically moving data imposes latency, creating an unacceptable delay between the time data becomes available and the time users can ask a question of it. Business users cannot afford to wait for insight. Finally, these data stores limit data access. They typically employ proprietary technologies that take custody of data and only make it available to specific access tools.

[Figure: query pushdown, bad example]

Zoomdata’s Approach

Zoomdata pushes query processing to the source, taking the query to the data. When the user interacts with a visualization, Zoomdata generates queries and sends them to the source. Zoomdata generates SQL for SQL-on-Hadoop technologies like Impala and Hive. It generates search queries for data in Elasticsearch and Solr. Zoomdata generates native calls to NoSQL stores like MongoDB, pushing aggregation and filtering to the source as much as possible. This “query pushdown” approach delivers big data scalability, which is another reason Zoomdata is so well suited to the modern data landscape.
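To make the idea concrete, here is a minimal Python sketch of how a single chart request might be translated into three source-native forms. It is an illustration only, not Zoomdata's actual implementation: the ChartRequest class and the to_sql, to_elasticsearch, and to_mongo_pipeline functions are hypothetical names introduced for this example. The Elasticsearch body and MongoDB pipeline use the standard query DSL and aggregation-stage syntax of those products.

```python
from dataclasses import dataclass, field


@dataclass
class ChartRequest:
    """Hypothetical description of what one visualization needs."""
    table: str
    group_by: str
    metric: str
    filters: dict = field(default_factory=dict)


def to_sql(req):
    """Build (sql, params) for a SQL engine such as Impala or Hive.

    Column and table names are assumed to come from trusted metadata;
    filter values are passed as parameters. The "?" placeholder style
    is an assumption and depends on the driver in use.
    """
    sql = f"SELECT {req.group_by}, SUM({req.metric}) AS total FROM {req.table}"
    params = []
    if req.filters:
        clauses = []
        for column, value in req.filters.items():
            clauses.append(f"{column} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" GROUP BY {req.group_by}"
    return sql, params


def to_elasticsearch(req):
    """Build a search body that filters and aggregates inside the index."""
    return {
        "size": 0,  # return only aggregation buckets, no raw documents
        "query": {
            "bool": {
                "filter": [{"term": {c: v}} for c, v in req.filters.items()]
            }
        },
        "aggs": {
            "groups": {
                "terms": {"field": req.group_by},
                "aggs": {"total": {"sum": {"field": req.metric}}},
            }
        },
    }


def to_mongo_pipeline(req):
    """Build an aggregation pipeline that runs inside MongoDB."""
    return [
        {"$match": dict(req.filters)},
        {"$group": {"_id": f"${req.group_by}",
                    "total": {"$sum": f"${req.metric}"}}},
    ]
```

In each case only the small, aggregated result travels back to the visualization layer; the raw rows never leave the source.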

[Figure: query pushdown]

Smart Connectors

Zoomdata pushes query processing through Smart Connectors to the original sources. These Smart Connectors leverage the native APIs of the underlying sources to access functionality not available through lowest-common-denominator interfaces like JDBC and ODBC. For example, Zoomdata leverages faceting and incremental result retrieval when querying search index sources. Zoomdata’s micro-query optimization leverages partitioning information from Impala and other distributed technologies.
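One way to picture partition-aware micro-queries is sketched below in Python. This is an assumption about the general technique, not a description of Zoomdata's internals: the function names micro_queries and merge_incrementally are hypothetical, and the sketch assumes the metric is a sum, which can be merged by simple addition. Each partition gets its own small query, and the partial results are folded into a running total so a chart could refine itself as each partition reports back.

```python
from collections import Counter


def micro_queries(table, partition_column, partition_values, group_by, metric):
    """Yield one small (sql, params) pair per partition value.

    Restricting each query to a single partition lets a distributed
    engine read only that partition's data. Identifiers are assumed to
    come from trusted metadata; the "?" placeholder is driver-dependent.
    """
    for value in partition_values:
        sql = (
            f"SELECT {group_by}, SUM({metric}) AS total FROM {table} "
            f"WHERE {partition_column} = ? GROUP BY {group_by}"
        )
        yield sql, [value]


def merge_incrementally(partial_results):
    """Fold per-partition sums into a running total.

    Yields a snapshot after every partition so a consumer can redraw
    with progressively more complete data. This works for sums and
    counts; an average would need its sum and count merged separately.
    """
    running = Counter()
    for partial in partial_results:
        running.update(partial)  # Counter.update adds values key by key
        yield dict(running)
```

For example, merging the partial results {"east": 10} and then {"east": 5, "west": 7} yields the snapshots {"east": 10} and {"east": 15, "west": 7}.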

As Zoomdata’s query planner seeks to push processing to the original sources, Smart Connectors also understand any limitations of the underlying source. For example, some sources support filtering operations but not aggregation, or vice-versa. In these cases, Zoomdata can compensate for functionality missing from the source by using the analytical capabilities of Apache Spark.
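The capability check described above can be sketched as a small planning function. The Python below is a simplified illustration under stated assumptions, not Zoomdata's query planner: SourceCapabilities and plan_query are hypothetical names, and the sketch makes the conservative choice that once one step cannot be pushed down, every later step also runs in the compensating engine (Apache Spark, in the text above), because later steps depend on the output of earlier ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCapabilities:
    """Hypothetical record of which operations a source can execute."""
    name: str
    supported: frozenset


def plan_query(steps, caps):
    """Split ordered steps into (pushed_down, compensated).

    Steps are pushed to the source for as long as the source supports
    them. The first unsupported step, and everything after it, is left
    for the compensating engine so that step order is preserved.
    """
    pushed_down, compensated = [], []
    for step in steps:
        if not compensated and step in caps.supported:
            pushed_down.append(step)
        else:
            compensated.append(step)
    return pushed_down, compensated


# A source that can filter but not aggregate: the filter runs at the
# source and only the surviving rows are aggregated elsewhere.
filter_only = SourceCapabilities("filter_only_store", frozenset({"filter"}))
print(plan_query(["filter", "aggregate"], filter_only))
# -> (['filter'], ['aggregate'])
```

A less conservative planner could reorder steps when that is provably safe, for instance when a filter applies only to the grouping column, but that analysis is outside the scope of this sketch.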



*Datanami, August 11, 2015:
