Impala, the SQL analytic engine shipped with Cloudera Enterprise, is a fully integrated analytic database built specifically to take advantage of the flexibility and scalability of Apache Hadoop. A Hadoop cluster may hold many types of information and content, including clickstream data, web and call center logs, and ID scans. Although most closely associated with Cloudera, Impala also ships with other Hadoop distributions, including those from MapR, Oracle, and Amazon.
The Impala platform brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries against big data stored in HDFS and Apache HBase without requiring data movement or transformation.
With Impala came the Parquet columnar data storage format, which stores data in HDFS more efficiently than row-based formats. Writing Parquet files means you need to determine the schema (tables, columns) in advance and write the data in a specific way. The upside is much faster analysis, because a query reads only the columns it needs rather than every field of every row.
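To make the row-versus-column trade-off concrete, here is a minimal Python sketch. It is not Parquet's actual on-disk format or API. The `SCHEMA` tuple, the `to_columns` helper, and the sample sales records are all hypothetical, chosen only to show why a schema is fixed up front and why a single-column aggregate touches less data in a columnar layout.

```python
# Hypothetical sample data in a row-based layout: one dict per record.
rows = [
    {"order_id": 1, "region": "east", "amount": 100.0},
    {"order_id": 2, "region": "west", "amount": 50.0},
    {"order_id": 3, "region": "east", "amount": 25.0},
]

# A columnar layout needs the schema up front (assumed column names).
SCHEMA = ("order_id", "region", "amount")


def to_columns(records, schema):
    """Pivot row records into one list per column, enforcing the schema."""
    columns = {name: [] for name in schema}
    for record in records:
        if set(record) != set(schema):
            raise ValueError(f"record does not match schema: {record}")
        for name in schema:
            columns[name].append(record[name])
    return columns


columns = to_columns(rows, SCHEMA)

# Row layout: every record is visited in full to total one field.
row_total = sum(record["amount"] for record in rows)

# Column layout: only the 'amount' values are read.
column_total = sum(columns["amount"])

print(row_total, column_total)
```

Both totals are the same. The difference is how much data has to be scanned to compute them, which is where a columnar format gains its advantage for analytic queries.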
Impala enables analysts and data scientists to perform real-time, interactive analytics on data stored in Hadoop via SQL or business intelligence tools.
Zoomdata was one of the first big data analytics and visualization tools to be certified on Impala, and the results of this collaboration have been dramatic. While legacy BI tools use JDBC or ODBC to query Impala as if it were a relational database, Zoomdata connects to Impala via native APIs and understands the Parquet partitioning scheme.
Zoomdata uses this partitioning information to break up a single logical query into multiple micro-queries. Micro-queries submitted to Impala return at different points in time. Zoomdata displays a preliminary visualization as soon as the first micro-query returns and then sharpens the visualization as additional micro-queries complete. The result: much faster response time, analysis, and insights.
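The sharpening idea can be sketched in a few lines of Python. This is an illustrative simulation only, not Zoomdata's implementation and not an Impala client. The `PARTITIONS` table, `run_micro_query`, and `sharpen` are hypothetical names, and the micro-queries run one after another here, whereas in practice they would be submitted concurrently and return at different times.

```python
from collections import defaultdict

# Hypothetical stand-in for a sales table partitioned by month.
PARTITIONS = {
    "2015-01": [("east", 100.0), ("west", 50.0)],
    "2015-02": [("east", 25.0), ("west", 75.0)],
    "2015-03": [("east", 10.0)],
}


def run_micro_query(partition_key):
    """Aggregate a single partition, roughly like
    SELECT region, SUM(amount) ... WHERE month = <key> GROUP BY region."""
    totals = defaultdict(float)
    for region, amount in PARTITIONS[partition_key]:
        totals[region] += amount
    return dict(totals)


def sharpen(partition_keys):
    """Yield (fraction_complete, running_totals) after each micro-query,
    so a chart can be redrawn with progressively more complete data."""
    running = defaultdict(float)
    for done, key in enumerate(partition_keys, start=1):
        for region, subtotal in run_micro_query(key).items():
            running[region] += subtotal
        yield done / len(partition_keys), dict(running)


if __name__ == "__main__":
    for fraction, totals in sharpen(sorted(PARTITIONS)):
        print(f"{fraction:.0%} complete: {totals}")
```

The sketch relies on the aggregate being decomposable: per-partition sums can simply be added together. An average would need each micro-query to return both a sum and a count so the partial results can be merged correctly.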
The video below shows Zoomdata applying our micro-query sharpening approach on Impala to analyze and visualize a billion rows of sales transaction data from Parquet files almost instantly. Micro-queries and data sharpening optimize the user experience.