Nearly synonymous with big data and invaluable for big data analytics, Apache Hadoop has proven itself across hundreds of organizations as a scalable, fault-tolerant, distributed platform for the storage and analysis of very large data sets. The core components of the platform are the Hadoop Distributed File System (HDFS), a redundant storage system that spreads very large files across the cluster, and a distributed processing framework based on Hadoop MapReduce or Apache Tez. Many enterprise data lakes are built on HDFS, and it works well alongside other technologies like NoSQL databases and event streaming systems such as MapR.
The Hadoop ecosystem also includes several SQL-on-Hadoop software interfaces, including Cloudera Impala, Apache Hive, Drill and Kudu, Spark SQL, and Presto, which provide convenient ways for the analyst to use BI tools on Hadoop. However, Hadoop is built on a batch processing framework that was not designed for interactive workloads, so the analytic performance of traditional BI tools on these interfaces tends to be slow and frustrating for the enterprise analyst and business user. Zoomdata solves this performance problem via innovative patented technologies such as micro-query-based Data Sharpening, which enables response times in seconds versus minutes or hours.
Usually, Zoomdata accesses big data in Hadoop via one of the SQL-on-Hadoop technologies described above. These analytic engines run on the Hadoop cluster and make data files look like tables that can be queried via SQL. Zoomdata connects to these engines with performance-optimized Smart Connectors, which take advantage of the differences among the engines to generate SQL queries that use the full power of the Hadoop cluster and provide access to big data analytics.
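To make the idea of engine-aware SQL generation concrete, here is a minimal sketch in Python. It is not Zoomdata's Smart Connector code, which this page does not show. The function `median_query` and the table and column names are hypothetical; the sketch only relies on two commonly documented differences among engines: the name of the approximate-median function and the identifier quote character.

```python
# Illustrative only: one logical request ("median of a value per group")
# rendered differently for each SQL-on-Hadoop engine.

# Approximate-median expression per engine.
APPROX_MEDIAN = {
    "impala": "APPX_MEDIAN({col})",
    "hive": "percentile_approx({col}, 0.5)",
    "spark": "percentile_approx({col}, 0.5)",
    "presto": "approx_percentile({col}, 0.5)",
}

# Identifier quote character per engine.
QUOTE = {"impala": "`", "hive": "`", "spark": "`", "presto": '"'}


def median_query(engine: str, table: str, group_col: str, value_col: str) -> str:
    """Build a per-group approximate-median query for the given engine."""
    if engine not in APPROX_MEDIAN:
        raise ValueError(f"unsupported engine: {engine}")
    q = QUOTE[engine]

    def quoted(name: str) -> str:
        return f"{q}{name}{q}"

    median_expr = APPROX_MEDIAN[engine].format(col=quoted(value_col))
    return (
        f"SELECT {quoted(group_col)}, {median_expr} AS median_value "
        f"FROM {quoted(table)} GROUP BY {quoted(group_col)}"
    )


if __name__ == "__main__":
    # Hypothetical table and columns, used only for demonstration.
    for engine in ("impala", "hive", "presto"):
        print(engine, "->", median_query(engine, "orders", "region", "sales"))
```

The same request yields a different statement for each engine, which is the kind of difference a connector can exploit so that the heavy lifting stays on the cluster.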
Zoomdata excels as a Hadoop analytics and data visualization tool and can connect to SQL-on-Hadoop technologies as well as directly to HDFS. Beyond that, as users interact with data visualizations, Zoomdata takes the query to the data using patented Data Sharpening with micro-queries. This is critical to achieving speed-of-thought performance on Hadoop. Unlike other BI tools, we don't build cubes or move data to another data store inside or outside Hadoop.
Of course, a modern data architecture can include traditional sources as well. With Zoomdata Fusion, you can enrich data from Hadoop with reference data from traditional sources such as relational databases or flat files. Zoomdata Fusion combines and integrates data from multiple sources, making it appear as a single source. You can also correlate real-time data with historical data, and with data from cloud sources such as Amazon Redshift or Google BigQuery.
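As a rough illustration of what "appearing as a single source" means, the sketch below enriches fact rows (standing in for Hadoop data) with reference rows (standing in for a relational table or flat file) on a shared key. The function `fuse`, the field names, and the in-memory join are assumptions made for this example; they do not describe how Zoomdata Fusion is implemented.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def fuse(fact_rows: Iterable[Row], reference_rows: Iterable[Row], key: str) -> List[Row]:
    """Left-join reference attributes onto fact rows by a shared key.

    Fact rows with no matching reference row are kept unchanged, and
    fact values win if both sides define the same field.
    """
    lookup = {ref[key]: ref for ref in reference_rows}
    fused = []
    for row in fact_rows:
        merged = dict(row)
        for field, value in lookup.get(row[key], {}).items():
            merged.setdefault(field, value)
        fused.append(merged)
    return fused


if __name__ == "__main__":
    # Hypothetical data: sales events from Hadoop, store details from a database.
    sales = [{"store_id": 1, "sales": 100}, {"store_id": 2, "sales": 50}]
    stores = [{"store_id": 1, "region": "East"}]
    for row in fuse(sales, stores, "store_id"):
        print(row)
```

A consumer of the fused rows sees one record per sale with the region attached, without needing to know that the two fields came from different systems.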
At the core of Zoomdata's self-service, big data visualization capabilities is a streaming architecture that lets users access their data at the speed of thought, on demand. Data streams from Hadoop to the user through our stream processing engine and a WebSockets connection. Streaming delivers the fastest user experience for real-time and historical data, as can be seen through Data Sharpening. As soon as a user creates a visualization, Zoomdata instantly streams an initial result set. The visualization “sharpens” with data updates as each micro-query completes and more data becomes available. Users can act -- drill, filter, zoom -- without waiting for the fully sharpened results.
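The sharpening behavior can be pictured with a small Python sketch. This is a simplified model under stated assumptions, not the patented Data Sharpening implementation: the data is an in-memory list, each "micro-query" is just a slice of it, and the names `micro_queries` and `sharpen_average` are hypothetical. The point is only that a usable estimate is emitted after every micro-query, and that it converges on the exact answer once all of them complete.

```python
from typing import Iterator, Sequence, Tuple


def micro_queries(rows: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Split one large scan into small, independently answerable slices."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def sharpen_average(rows: Sequence[float], chunk_size: int) -> Iterator[Tuple[float, float]]:
    """Yield (running_average, fraction_scanned) after each micro-query.

    Early values are estimates based on partial data; the final value
    is the exact average over all rows.
    """
    total = 0.0
    count = 0
    for chunk in micro_queries(rows, chunk_size):
        total += sum(chunk)
        count += len(chunk)
        yield total / count, count / len(rows)


if __name__ == "__main__":
    # Hypothetical measurements; a real source would be a table on the cluster.
    values = [10, 20, 30, 40]
    for estimate, progress in sharpen_average(values, chunk_size=2):
        print(f"{progress:.0%} scanned, average so far: {estimate}")
```

Because each intermediate result is already a valid (if approximate) answer, a front end can draw it immediately and let the user drill or filter before the last slice arrives.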
As you look under the hood of Zoomdata, you'll see how we leverage Spark, Impala, Kudu, and other technologies for hyperscale performance.
It’s also possible for Zoomdata to connect directly to the Hadoop Distributed File System (HDFS). This option works much like connecting Zoomdata to flat files on a local file system. Zoomdata reads the files into Spark, where they become fast, interactive, queryable data sets available to the full interactive visualization capabilities of Zoomdata.
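For readers who want to see what "reading files into Spark" looks like in general, here is a minimal PySpark sketch. It shows the standard Spark pattern, not Zoomdata's internal code. The NameNode host, port, file path, view name, and the assumption that the file is a CSV with a header row are all hypothetical, and `load_as_view` requires PySpark and a reachable cluster to run.

```python
def hdfs_uri(namenode: str, port: int, path: str) -> str:
    """Build an hdfs:// URI from a NameNode address and a file path."""
    return f"hdfs://{namenode}:{port}/{path.lstrip('/')}"


def load_as_view(uri: str, view_name: str):
    """Read a CSV file from HDFS into Spark and expose it as a SQL view.

    Assumes the file has a header row. PySpark is imported lazily so the
    URI helper above can be used without Spark installed.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-direct-sketch").getOrCreate()
    df = spark.read.csv(uri, header=True, inferSchema=True)
    df.createOrReplaceTempView(view_name)  # now queryable with spark.sql(...)
    return spark


if __name__ == "__main__":
    # Hypothetical cluster address and file; adjust to your environment.
    uri = hdfs_uri("namenode.example.com", 8020, "/data/sales.csv")
    print(uri)
    # With PySpark and a cluster available, the file could then be queried:
    # spark = load_as_view(uri, "sales")
    # spark.sql("SELECT COUNT(*) FROM sales").show()
```

Once the file is registered as a view, it can be queried interactively with SQL just like a table served by one of the SQL-on-Hadoop engines.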
Zoomdata is designed for visualizing big data and excels as a Hadoop visualization tool. See for yourself!