Data Fusion Processing
With data fusion, Zoomdata can perform Group By operations using fields that are available across tables. A variety of table structures residing in data repositories can be fused, including lookup tables, fact tables, and star and snowflake schema structures. See Data Fusion Table Structures.
For example, suppose a table in one data store contains the IDs and address information of sellers, and a table in a second data store contains IDs, events, and sales information for the same sellers. These disparate fields can be fused into one sellers table in which the fields from both tables are joined on the seller ID and accessible together.
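The sellers example can be sketched in a few lines of Python. This is only an illustration of the idea, not how Zoomdata implements fusion: the field names (seller_id, city, event, sales), the sample rows, and the helper functions fuse and group_sum are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from the first data store: seller IDs and addresses.
seller_addresses = [
    {"seller_id": 1, "city": "Austin"},
    {"seller_id": 2, "city": "Boston"},
    {"seller_id": 3, "city": "Austin"},
]

# Hypothetical rows from the second data store: seller IDs, events, and sales.
seller_sales = [
    {"seller_id": 1, "event": "Concert", "sales": 120.0},
    {"seller_id": 2, "event": "Concert", "sales": 80.0},
    {"seller_id": 3, "event": "Opera", "sales": 50.0},
    {"seller_id": 1, "event": "Opera", "sales": 30.0},
]


def fuse(addresses, sales, key="seller_id"):
    """Join each sales row with the address row that shares its key."""
    by_key = {row[key]: row for row in addresses}
    return [{**by_key[row[key]], **row} for row in sales if row[key] in by_key]


def group_sum(rows, group_field, value_field):
    """Group the fused rows by one field and sum another."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


fused = fuse(seller_addresses, seller_sales)

# Group By a field from the first store, aggregating a field from the second.
sales_by_city = group_sum(fused, "city", "sales")
print(sales_by_city)
```

The point of the sketch is the last step: once the rows are fused on the seller ID, a Group By on a field from one store (city) can aggregate a field that exists only in the other (sales).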
Using data fusion, you can join disparate data sources that are connected to Zoomdata. Multiple data sources (three or more) can be fused into a single Fusion data source.
To fuse these disparate sources, you must join matching fields from the different data sources on the Editor tab of the Data Fusion data source configuration. This is the key step in data fusion, and the joins must adhere to specific rules. See Data Fusion Join Rules and Creating a Fusion Data Source.
Joins are usually performed in memory. However, if a data connector supports push-down joins and the data to be joined comes through the same data source connection, Zoomdata pushes the join operation to the underlying data store and lets that store join the data instead. In addition, when data from different data sources is joined and the data connectors support push-down joins, the Zoomdata engine pushes aggregate queries to the underlying data engines, which reduces the amount of data that Zoomdata needs to process. This capability is currently supported only for Impala and Hive data stores.
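The benefit of pushing aggregate queries down can be illustrated with a small Python sketch. It is a conceptual model only, under stated assumptions: aggregate_at_source stands in for an aggregate query that the underlying data store would run, join_in_memory stands in for the in-memory join, and the field names and sample data are hypothetical. None of these names are Zoomdata APIs.

```python
from collections import defaultdict


def aggregate_at_source(rows, key, value_field):
    """Stand-in for an aggregate query executed by the underlying data store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value_field]
    return [{key: k, value_field: v} for k, v in totals.items()]


def join_in_memory(left, right, key):
    """Stand-in for the in-memory join of rows returned by two sources."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]}
            for row in left if row[key] in right_by_key]


# Hypothetical raw sales rows held by one data store (nine rows, three sellers).
raw_sales = [{"seller_id": i % 3, "sales": 10.0} for i in range(9)]

# Hypothetical seller rows held by a different data store.
sellers = [
    {"seller_id": 0, "city": "Austin"},
    {"seller_id": 1, "city": "Boston"},
    {"seller_id": 2, "city": "Austin"},
]

# Aggregating first means only three rows, not nine, reach the in-memory join.
pre_aggregated = aggregate_at_source(raw_sales, "seller_id", "sales")
joined = join_in_memory(pre_aggregated, sellers, "seller_id")
print(len(raw_sales), len(pre_aggregated), joined)
```

In this sketch the join sees one row per seller instead of one row per sale, which is the reduction in processed data that the paragraph above describes.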
Because most joins are performed in-memory, a configurable limit has been placed on the number of records that can be processed from each joined source. This limit is initially set at 1,000,000 records per joined data source and can be configured by your Zoomdata administrator or supervisor using the qe.zengine.edc.rows.limit property in the query-engine.properties file. See Managing the Zoomdata Query Engine. When this threshold is exceeded, no data is shown on the charts containing the fused data and a message appears indicating that the threshold (maximum row number) is exceeded. If you find you are hitting this limit, use filters on the chart or dashboard to reduce the number of records processed and shown.
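The per-source limit can be pictured with the following Python sketch. It assumes only what the text states: a limit of 1,000,000 records applied to each joined source, with no data shown and a message displayed when the limit is exceeded. The function check_row_limit, the source names, and the message wording are hypothetical and are not taken from Zoomdata.

```python
# Default stated in the text for the qe.zengine.edc.rows.limit property.
ROWS_LIMIT = 1_000_000


def check_row_limit(source_row_counts, limit=ROWS_LIMIT):
    """Return (ok, message); ok is False if any joined source exceeds the limit."""
    over = {name: n for name, n in source_row_counts.items() if n > limit}
    if over:
        names = ", ".join(sorted(over))
        # Hypothetical message text; the real wording may differ.
        return False, f"Maximum row number ({limit}) exceeded for: {names}"
    return True, ""


# One source is within the limit, the other is one record over it.
ok, message = check_row_limit({"addresses": 10, "sales": 1_000_001})
print(ok, message)
```

Note that the check is applied per joined source rather than to the combined total, so filtering the one source that is over the limit is enough to bring the chart back.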
Once you have joined the necessary fields from the data sources and saved your Fusion data source, you can visualize and explore the fused data in charts and dashboards.