In the new enterprise data architecture, not all data will be stored in a single framework; disparate data sources are a fact of life. The real value of big data lies in combining multiple sources, but the traditional approach to combining them will not work at big data scale. Businesses can no longer wait to build a data warehouse, extract data from the original sources, and move it into a centralized data store. Even with a data lake or virtual data warehouse, if a data architect or some other specialist has to set it up before business users can ask questions, it’s a non-starter. The time to insight is simply too long.
And even if you had a single place to put all your data, you still couldn’t afford to physically move the data from its original big data store to an intermediate store. The data transfer time alone imposes too much latency for the business, especially as data volumes grow to big data proportions.
Zoomdata Fusion makes multiple data sources appear as a single source without physically moving the data to a common data store. Zoomdata Fusion can combine data from relational and nonrelational sources, from real-time and historical sources, and from structured and unstructured sources.
There are two general approaches to data federation (sometimes also called "data virtualization"). The first is a database-centric approach, which grew out of the database technology offered by RDBMS vendors. Zoomdata can leverage these technologies just like any other data source, so if you’re already using one of them, you can continue to do so. But this approach requires a data architect or specialist to define a semantic layer across the sources, which is not within the expertise of most business users and analysts.
The second approach is query-tool centered, sometimes called “data blending” or “data mashup.” These technologies allow end users to mash up multiple sources, but desktop tools have severe scaling limitations. They also fail to cover several types of big data sources, such as search, NoSQL, and streaming sources.
Zoomdata Fusion provides the end-user experience of the query-tool approach, but with an underlying architecture that scales to big data proportions. Queries are pushed down to the original sources to minimize data movement. When intermediate datasets need to be joined, Zoomdata uses the speed and scale of Spark to perform cross-platform joins.
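To make the pushdown idea concrete, here is a minimal sketch of the general pattern, not Zoomdata's actual implementation or API. It assumes two hypothetical sources, modeled as separate in-memory SQLite databases, with invented table and column names. Each source runs its own query, so only small intermediate results leave it, and those results are then joined outside the sources. A plain Python join stands in for the Spark join described above.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an isolated in-memory database that stands in for one data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical source 1: transactional order data.
orders_source = make_source(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 30.0), (2, 15.0), (3, 40.0)],
)

# Hypothetical source 2: customer reference data held in a different store.
customers_source = make_source(
    "CREATE TABLE customers (customer_id INTEGER, region TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "East"), (2, "West"), (3, "East")],
)


def revenue_by_region(orders_conn, customers_conn):
    """Push aggregation to each source, then join the small intermediate results."""
    # Pushdown: the orders source aggregates its own rows, so one row per
    # customer leaves the source instead of every order.
    totals = orders_conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()

    # The second source returns only the two columns the join needs.
    region_of = dict(
        customers_conn.execute(
            "SELECT customer_id, region FROM customers"
        ).fetchall()
    )

    # Cross-source join on customer_id. This in-memory hash join is a
    # stand-in (an assumption of this sketch) for a distributed join engine.
    result = {}
    for customer_id, total in totals:
        region = region_of.get(customer_id)
        if region is None:
            continue  # inner-join semantics: skip customers with no match
        result[region] = result.get(region, 0.0) + total
    return result


revenue = revenue_by_region(orders_source, customers_source)
print(revenue)
```

The design point is that the raw order rows never leave their source; only the per-customer totals do. At big data volumes the intermediate results can still be too large for a single process, which is where a distributed engine such as Spark takes over the join step.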