Big Data Blending & Fusion

Blended Data Analysis

Interacting with billions of rows of data in seconds from a single source is exciting, but big data exploration gets really interesting when you join data from multiple sources -- without needing to move it into a data warehouse or mart. Maybe you have application usage data in Hadoop, but you’d like to enrich that data with some customer demographic reference information or transactional data stored in an Oracle database. Or suppose you have a stream of sensor readings and you want to perform calculations across that real-time stream and include historical metrics. Or you might have product reviews indexed in Elasticsearch that you’d like to correlate with product purchase history in an enterprise data warehouse.
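As a minimal sketch of the first scenario, the snippet below enriches usage events with customer reference data by joining on a shared key. The record layouts, field names, and the `enrich` helper are hypothetical stand-ins. In practice the two inputs would come from separate systems, such as Hadoop and a relational database, rather than from in-memory lists.

```python
# Hypothetical usage events, as they might be read from a log store.
usage_events = [
    {"customer_id": 1, "feature": "export", "seconds": 42},
    {"customer_id": 2, "feature": "search", "seconds": 7},
    {"customer_id": 1, "feature": "search", "seconds": 15},
    {"customer_id": 3, "feature": "export", "seconds": 30},
]

# Hypothetical reference data, as it might be read from a relational table.
customers = [
    {"customer_id": 1, "segment": "enterprise", "region": "EMEA"},
    {"customer_id": 2, "segment": "smb", "region": "APAC"},
]


def enrich(events, reference, key):
    """Left-join each event with its matching reference row, if any."""
    # Index the (small) reference side by key for constant-time lookups.
    lookup = {row[key]: row for row in reference}
    for event in events:
        match = lookup.get(event[key], {})
        # Events without a match are kept, just without the extra fields.
        yield {**match, **event}


blended = list(enrich(usage_events, customers, "customer_id"))
for row in blended:
    print(row)
```

Events for customers that are missing from the reference data (customer 3 here) pass through unenriched, which mirrors a left join.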

Do It Yourself: Fuse Data From Multiple Sources

All of these scenarios require data blending across multiple sources to get the best insights and to get them fast, which increases the intelligence of your big data analytics efforts and improves the user experience.

Not Your Father’s Data Blending

Extract-Transform-Load

In the past, blending and analyzing data from multiple sources required two steps. First, you would create data warehouses or data marts; then you would physically copy and aggregate data into them from the original systems. This two-step process has several limitations:

  • Data freshness -- warehouses and marts are usually updated on a batch schedule, nightly or at best a few times per day.
  • Summary data only -- it's too expensive and slow to copy all of the data, so only aggregate summaries of data are moved into the warehouse. But what happens when you want to use your dashboard to drill through to details such as underlying transactions, events and observations?
  • Limits analytics -- by aggregating data into a warehouse, you decide in advance what data gets moved and how, inherently constraining the range of questions that business users can ask of the data. New questions may force a time-consuming and expensive re-engineering of schema and batch integration processes.
  • Expensive -- complex data integration (ETL) projects are expensive in terms of time and human resources. A virtualized data blending approach can help determine whether a specific ETL project is worth undertaking.
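The "summary data only" limitation can be illustrated with a small sketch. The transaction records and both helper functions below are hypothetical. The point is that once only daily totals are copied into a warehouse, a drill-through to individual orders can be answered only where the original rows are still reachable.

```python
from collections import defaultdict

# Hypothetical transaction-level records in the original system.
transactions = [
    {"day": "2024-01-01", "order_id": "A1", "amount": 120.0},
    {"day": "2024-01-01", "order_id": "A2", "amount": 80.0},
    {"day": "2024-01-02", "order_id": "B1", "amount": 50.0},
]


def summarize_by_day(rows):
    """Batch step: keep only one total per day, as a summary table would."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["day"]] += row["amount"]
    return dict(totals)


def drill_through(rows, day):
    """Detail query: needs the original rows, not the summary."""
    return [row for row in rows if row["day"] == day]


daily_totals = summarize_by_day(transactions)
details = drill_through(transactions, "2024-01-01")
print(daily_totals)
print([row["order_id"] for row in details])
```

The summary answers "how much per day" but carries no order identifiers, so the second question cannot be served from `daily_totals` alone.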

In addition, many mobile applications need to blend data from multiple sources to perform at their best.

Zoomdata Fusion: Join Multiple Data Sources without Moving Data

The availability of modern, highly scalable in-memory data storage and processing frameworks such as Apache Spark has made it possible to take a radically fresh approach to data blending. Zoomdata Fusion makes multiple sources, including relational data sources, appear as one data set without physically moving the data to a common data store. Best of all, everyday users and business analysts can join sources without having to wait for a data architect to set it up.

Once defined, a fused source can be used like any other source in Zoomdata. All the interactive visualizations are available for a blended source. Zoomdata Fusion automatically determines when and how to access the different sources to derive a given visualization without the end user having to worry about how it works. Fused sources can also be used within dashboards and embedded applications, providing users with seamless access to disparate data.
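The general idea of answering one question from several sources without copying them into a common store can be sketched as follows. This is not Zoomdata's implementation: the `FusedSource` class, its method, and the table layouts are hypothetical, and two in-memory SQLite databases stand in for two independent systems. Each source is asked only for what the query needs, and the small results are combined in memory.

```python
import sqlite3

# Two independent in-memory databases stand in for two separate systems.
events_db = sqlite3.connect(":memory:")
events_db.execute("CREATE TABLE events (customer_id INTEGER, seconds INTEGER)")
events_db.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 42), (2, 7), (1, 15), (3, 30)],
)

reference_db = sqlite3.connect(":memory:")
reference_db.execute("CREATE TABLE customers (customer_id INTEGER, segment TEXT)")
reference_db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "enterprise"), (2, "smb"), (3, "smb")],
)


class FusedSource:
    """Hypothetical wrapper that answers one query from two sources."""

    def __init__(self, events, reference):
        self.events = events
        self.reference = reference

    def seconds_by_segment(self):
        # Step 1: push the aggregation down to the event source.
        per_customer = self.events.execute(
            "SELECT customer_id, SUM(seconds) FROM events GROUP BY customer_id"
        ).fetchall()
        # Step 2: fetch only the reference attribute needed for grouping.
        segments = dict(
            self.reference.execute(
                "SELECT customer_id, segment FROM customers"
            ).fetchall()
        )
        # Step 3: combine the two small result sets in memory.
        totals = {}
        for customer_id, seconds in per_customer:
            segment = segments.get(customer_id, "unknown")
            totals[segment] = totals.get(segment, 0) + seconds
        return totals


fused = FusedSource(events_db, reference_db)
result = fused.seconds_by_segment()
print(result)
```

The caller sees a single `seconds_by_segment()` call and does not need to know which source holds which column. Neither database is copied into the other.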

Zoomdata comes bundled with Apache Spark, pre-configured out of the box. If you already have a Spark cluster, Zoomdata can be configured to use it instead.

Featured Resources

Big Data Blending & Fusion

With Zoomdata Fusion, you can quickly combine, blend, and interact with multiple big data sources without moving data.

Contact

Sales:
+1-571-279-6166

General Inquiries:
+1-571-279-6000

[email protected]