Three years in the making, Apache Kudu is an open source complement to HDFS and HBase. It is designed to complete the Hadoop ecosystem storage layer, enabling fast analytics on fast data. A Kudu cluster stores tables that look just like tables from relational (SQL) databases. This simple data model makes it easy to port legacy applications or build new ones. Tables are self-describing, so you can use standard tools like SQL engines or Apache Spark to analyze your data. Kudu's random access APIs can also be used alongside batch access for machine learning and analytics use cases.
Working with a combination of streaming and historical big data at scale is challenging for data-driven enterprises doing analytics on top of Hadoop, regardless of the distribution they're using -- including Cloudera and Hortonworks. Organizations often resort to parallel infrastructures. For example, they persist real-time or most recent data -- such as IoT or time series data -- in HDFS using Avro, and periodically (for example, daily) convert that data into Parquet format for analytic queries. Real-time analysis runs against the Avro storage, but queries over historical data need to run against Parquet to be fast.
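The cost of that two-store pattern can be illustrated with a toy model. The sketch below is plain Python and uses no Avro, Parquet, or Hadoop APIs; the class and method names (`DualPathStore`, `ingest`, `convert_batch`, and so on) are hypothetical and exist only for illustration. It shows that a query against the columnar side alone misses whatever arrived since the last conversion, so a complete answer has to combine both stores.

```python
from dataclasses import dataclass, field


@dataclass
class DualPathStore:
    """Toy model of the two-store pattern: a row-oriented 'recent' area
    (standing in for Avro files) and a columnar 'historical' area
    (standing in for Parquet files). Names are illustrative only."""

    recent_rows: list = field(default_factory=list)  # (timestamp, value) rows
    historical_columns: dict = field(
        default_factory=lambda: {"ts": [], "value": []}
    )

    def ingest(self, ts, value):
        # New events only ever land in the row-oriented area.
        self.recent_rows.append((ts, value))

    def convert_batch(self):
        # The periodic (for example, daily) job: pivot rows into columns,
        # then clear the recent area.
        for ts, value in self.recent_rows:
            self.historical_columns["ts"].append(ts)
            self.historical_columns["value"].append(value)
        self.recent_rows.clear()

    def historical_sum(self):
        # Analytic query that reads the columnar side only.
        return sum(self.historical_columns["value"])

    def full_sum(self):
        # A complete answer has to combine both stores.
        return self.historical_sum() + sum(v for _, v in self.recent_rows)


store = DualPathStore()
store.ingest(1, 10)
store.ingest(2, 20)
store.convert_batch()  # the batch job has run once
store.ingest(3, 5)     # this event arrives after the conversion

stale = store.historical_sum()  # misses the event with ts=3
complete = store.full_sum()     # has to query both stores
```

Here `stale` is 30 while `complete` is 35: the analytic path lags until the next conversion runs, and any query that needs the full picture has to know about both storage areas.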
Using Kudu eliminates the need for parallel analytic infrastructures. Instead, a single purpose-built storage subsystem handles both fast inserts and columnar scans for analytics. Kudu integrates with Impala, and Zoomdata connects to Impala tables backed by Kudu storage, so Zoomdata can monitor fast inserts of data in real time. When users see something unusual happen in the data stream, they can use Zoomdata’s Data DVR to pause, rewind, and replay the stream to see what happened.
What’s more, Zoomdata can point to the same table and run queries over the full data set. Zoomdata runs analytic queries on the fly, so as soon as data is inserted into Kudu it’s available to users for visual data analytics. There is no added latency from batch jobs that convert between storage formats. Zoomdata’s Data Sharpening technology presents initial results to the user in seconds, even when querying huge Kudu tables with hundreds of millions to billions of rows, and sharpens the image as the query completes.
This mix of real-time and analytic queries is made possible through the Zoomdata user experience and the simplified, underlying storage architecture provided by the open source Kudu data store.