Source Connection Workflow
Connecting Zoomdata to a data source follows a standard workflow that applies to most supported data sources. This article walks through that general workflow to help you connect Zoomdata to your data source(s) successfully.
Prior to connecting your data sources, review the following resources to understand how Zoomdata works with your source:
- Data Sources Quick Reference Sheet: Capabilities and Limitations
- Date and Time Formats Supported By Zoomdata
- How Zoomdata Caches the Data
- Overview of the Data Sources Page
The number of steps that you have to complete varies slightly depending on the type of data source being connected. Figure 1 illustrates the general steps:
Connecting a New Data Source
Select the Settings button to open the Data Sources page.
To connect a new source to Zoomdata, select the appropriate icon from the list of supported data sources shown in Figure 2.
You are taken to the first screen in the connection process: the General page. Review each page described below to understand the configurations and settings it requires.
First, create a unique name to identify this data source and, optionally, add a description. The Source Name field is required (see Figure 3).
- Search Function: enable or disable; if enabled, the search box is displayed on every chart for this data source (as shown in Figure 4)
- Request Handler: a Solr-specific plugin that defines the logic used when executing a search request (available in both Cloudera Search and Solr sources)
In this step, specify the connection details and, if applicable, the authentication credentials needed to connect Zoomdata to the data source (see Figure 5). The fields vary depending on the selected data source. Once you have supplied all the information, select the Validate button to verify the connection. If the connection is successful, you can continue to the next page in the connection process. Save your settings.
Optionally, if you have access to already-validated connections, you can use one of them (see Figure 6).
PAGE: Tables (Indices, Collections)
The settings on this page differ depending on the data source you want to add. The table below lists the options available for each connector.
|Data Source|Schema|Object type|Custom SQL Support|Additional Filters|
|Hive on EMR|✔|Table|✔|-|
|Hive on Tez|✔|Table|✔|-|
In this step, select the table you want to use from those available in the data source:
- Schema —This list displays all the available schemas for the data source. When you select a schema, the tables that belong to it are displayed. If there are many results, use search to quickly locate the desired table (see Figure 7).
- Table (index or collection) —Select the desired table (see Figure 8).
- Custom SQL —Available for certain data sources; if you want to use specific fields from the table, you can run an SQL query to list them (see Figure 9).
Zoomdata will wrap your SQL query into a SELECT statement. If specific statements inside the wrapped query are not supported by your data source, the query will not be executed.
- Additional filters —For specific data sources (for example, SendGrid or Google Analytics), you can specify the time period for which you want to use the data (see Figure 10).
- Fields —When you select a table, the list of fields and their types (as defined in the database) is displayed. By default, all of them are selected. You can clear the checkboxes near the specific fields to exclude them (see Figure 11).
- Records to Sample —If the field types cannot be determined from the data source, you can define them via sampling. Select a table and specify the number of records from the data source to be analyzed. The type of data in each field is then determined from the sampled records. By default, 100 records are sampled. Change this value as needed.
- Preview —This section contains the preview of your dataset (first 10 records) (see Figure 12).
- SparkIt —If the SparkIt feature is configured for your data source, you can enable it. When enabled, raw data is copied into memory.
- You can enable or disable caching of aggregated query results for your data source. If this option is enabled, the results are stored in Spark.
- You can enable caching of distinct values. If you enable this option, Zoomdata looks up the distinct values in the metadata store (MongoDB), which speeds up retrieving the values for filtering. You can configure a scheduled job to refresh the distinct values in the metadata store. If the option is disabled, Zoomdata queries the data source directly each time.
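The custom SQL behavior described on this page (the query is wrapped into a SELECT statement, and it is not executed if the data source rejects something inside the wrapper) can be illustrated with a small sketch. The exact wrapper Zoomdata generates is not documented here, so the `SELECT * FROM (...) AS alias` form, the function names `wrap_custom_sql` and `run_wrapped`, and the use of an in-memory SQLite database as a stand-in data source are all assumptions made for illustration.

```python
import sqlite3


def wrap_custom_sql(user_sql: str, alias: str = "custom_query") -> str:
    """Wrap a user-supplied query in an outer SELECT (hypothetical wrapper form)."""
    # Drop surrounding whitespace and trailing semicolons so the text fits in a subquery.
    inner = user_sql.strip().rstrip(";").strip()
    return f"SELECT * FROM ({inner}) AS {alias}"


def run_wrapped(conn: sqlite3.Connection, user_sql: str):
    """Run the wrapped query; return (rows, None) on success or (None, message) on failure."""
    try:
        return conn.execute(wrap_custom_sql(user_sql)).fetchall(), None
    except sqlite3.Error as exc:
        # The source rejected a statement inside the wrapper, so nothing was executed.
        return None, str(exc)


# Stand-in data source: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 10.0), ("west", 25.5)])

# A plain SELECT still works once wrapped.
rows, error = run_wrapped(conn, "SELECT region, amount FROM sales WHERE amount > 20;")

# An UPDATE is not valid inside a subquery, so the wrapped query fails and changes nothing.
bad_rows, bad_error = run_wrapped(conn, "UPDATE sales SET amount = 0")
```

In this sketch the first call returns the matching row, while the second returns an error message and leaves the table untouched. Testing a custom query inside such a wrapper on your own data source is one way to check in advance whether it is likely to be accepted.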
On this page, you can configure the settings for the fields from the tables (indices or collections) that you have selected in the previous step. These fields will be used as attributes and metrics on your chart.
In this step, Zoomdata samples the dataset (1,000 records) in order to define the fields' metadata (such as min and max values, cardinality, and more).
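As a rough illustration of what sampling for field metadata involves, the sketch below computes the minimum, maximum, and cardinality of each field from a bounded sample of records. It is not Zoomdata's implementation: the function name `sample_field_metadata`, the choice to take the first records rather than a random sample, and the dictionary-based record format are assumptions.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def sample_field_metadata(
    records: Iterable[Dict[str, Any]], sample_size: int = 1000
) -> Dict[str, Dict[str, Any]]:
    """Compute min, max, and cardinality per field from the first sample_size records."""
    values_by_field: Dict[str, List[Any]] = {}
    for record in islice(records, sample_size):
        for field, value in record.items():
            if value is not None:  # skip missing values
                values_by_field.setdefault(field, []).append(value)

    metadata: Dict[str, Dict[str, Any]] = {}
    for field, values in values_by_field.items():
        metadata[field] = {
            "min": min(values),
            "max": max(values),
            "cardinality": len(set(values)),  # distinct values seen in the sample
        }
    return metadata


records = [{"city": c, "price": p} for c, p in [("Oslo", 12), ("Rome", 7), ("Oslo", 30)]]
meta = sample_field_metadata(records)
```

Because the metadata comes from a sample, values outside the sampled records are not reflected in it. In this sketch, passing a smaller `sample_size` changes the reported maximum for `price`.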
You can configure the following:
- Visible —By default, all the fields are visible. This means that you can visualize the data from these fields on your charts. If you do not want to use specific fields, clear the corresponding checkboxes.
- Label —By default, the names of the fields from your data source are used as labels. You can change them in the Label column.
- Type —When you select tables (indices or collections), each field type is defined by Zoomdata. The default data types are displayed in the 'Type' column. You can change them, if required.
- For metrics, you can define the default aggregation function: SUM, AVG, MAX, or MIN.
For the TIME data type, you can set date and time patterns. If required, you can define a custom time and date pattern by selecting the corresponding item from the list and specifying the pattern in the field provided. Otherwise, you can proceed with the default pattern.
You can also set the default time granularity for the selected field. To do so, select the required item (Second, Minute, Hour, Day, Week, Month, or Year) from the corresponding list.
- Distinct Count — Enable or disable distinct count for the fields.
- You can set the range of data that will be available in the filter for all data types.
- To set the range of values that will be available in the filter for the INTEGER, MONEY, and NUMBER types, click Custom Range and specify 'Min' and 'Max' values (see Figure 14).
- Info —Click the icon to view additional info about the data element. The Info pane will be displayed listing the settings (see Figure 16).
The Info pane contains a Refresh button. Click it to refresh the metadata of the selected field (see Figure 17).
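To make the TIME settings above more concrete, the sketch below shows what a date and time pattern and a default time granularity do to a raw value: the pattern tells the parser how to read the string, and the granularity decides which bucket the timestamp falls into. This is an illustration only. The function name `truncate` is hypothetical, Python `strptime` directives stand in for whatever pattern syntax Zoomdata accepts (see the supported date and time formats resource), and the Monday week start is an assumption.

```python
from datetime import datetime, timedelta


def truncate(ts: datetime, granularity: str) -> datetime:
    """Round a timestamp down to the start of its granularity bucket."""
    if granularity == "Second":
        return ts.replace(microsecond=0)
    if granularity == "Minute":
        return ts.replace(second=0, microsecond=0)
    if granularity == "Hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "Day":
        return day
    if granularity == "Week":
        return day - timedelta(days=day.weekday())  # assumes weeks start on Monday
    if granularity == "Month":
        return day.replace(day=1)
    if granularity == "Year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unknown granularity: {granularity}")


# The pattern describes how the raw string is laid out (Python directives used here).
parsed = datetime.strptime("2016-03-17 14:25:59", "%Y-%m-%d %H:%M:%S")

by_hour = truncate(parsed, "Hour")
by_week = truncate(parsed, "Week")
by_month = truncate(parsed, "Month")
```

A coarser granularity groups more records into each bucket, so the default chosen here affects how dense a time-based chart looks when it first opens.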
In this step, you can configure the data refresh settings for your data source.
As the last step of your data source configuration, define the default settings for each chart type. When you visualize the data from a data source, the default settings are applied to a chart.
Saving Your Connection
Once you have completed the process, click Save. You will receive a confirmation that the connection was saved successfully and be returned to the main Data Sources page. Your new connection should be listed in the My Data Sources section of the page.