Source Connection Workflow
The steps to connect Zoomdata to your data source follow a fairly standard workflow that is applicable across most supported data sources. This article walks through this general workflow to help prepare you to connect Zoomdata to your data source(s) successfully. Before starting, Zoomdata recommends you review the capabilities and limitations quick reference sheet to understand the features and options available to you based on the data source you are connecting.
Prior to connecting your data sources, review the following resources to ensure that Zoomdata is able to connect to your source:
- Data Sources Quick Reference Sheet: Capabilities and Limitations
- Date and Time Formats Supported By Zoomdata
- How Zoomdata Caches the Data
- Overview of the Data Sources Page
The number of steps that you have to complete varies slightly depending on the type of data source being connected. Figure 1 illustrates the general steps:
Connecting a New Data Source
Select the Settings button ( ) to open the Data Sources page.
To connect a new source to Zoomdata, select the appropriate icon from the list of supported data sources (see Figure 2 ).
Navigate to the first screen in the connection process: the General page. Review each page listed below to help you understand the necessary configurations and settings needed.
First, create a unique name to identify this data source name and, optionally, add a description. The Source Name field is required (see Figure 3 ).
- Search Function: enable or disable; if enabled, the search box will be displayed on every chart for this data source (as shown in Figure 4 )
- Request Handler: a Solr-specific plug-in that defines a logic when executing a search request (available in both Cloudera Search and Solr sources)
In this step, specify the connection details and, if applicable, authentication credentials in order to connect Zoomdata to the data source. The fields will vary depending on the selected data source. Once all the information has been supplied, select the Validate button to substantiate the connection (see Figure 5 ). If the connection is successful, you will be able to continue to the next page in the connection process. Save your settings.
Optionally, if you have access to already-validated connections, you can use one of those (see Figure 6 ).
PAGE: Tables (Indices, Collections)
The settings on this page differ depending on the data source you want to add. The table below lists the options available for each connector.
|Data Source||Schema||Object type||Custom SQL Support||Additional Filters|
|Hive on EMR||✔||Table||✔||-|
|Hive on Tez||✔||Table||✔||-|
In this step, you have to select the table from the available collection in the data source:
—in this list, all the available schemas for the data source are displayed. When you select a schema, all the corresponding tables that fit these criteria will display. If there are many results, use search to quickly locate the desired table (see
—available for certain data sources; if you want to use specific fields from the table, you can run SQL query to get them listed.
Zoomdata will wrap your SQL query into a SELECT statement. If specific statements inside the wrapped query are not supported by your data source, the query will not be executed (see Figure 9 ).
—for specific data sources (for example, SendGrid or Google Analytics), you can specify the time period for which you want to use the data (see
—when you select a table, the list of fields and their types (as defined in the database) is displayed. By default, all of them are selected. You can clear the checkboxes near the specific fields to exclude them (see
Records to Sample —if it is not possible to define the fields types from the data source, you can define them via sampling. Select a table and specify the number of records from the data source to be analyzed. The type of data in each field will be defined. By default, 100 records are sampled. Change this value as needed.
—if SparkIt feature is configured for your data source, you can enable it.
B y default, SparkIt is disabled. A ccess the article How Zoomdata Uses Apache Spark for more information about this option .
—you can enable or disable caching of aggregated query results for your data source.
If this option is enabled, the results will be stored in Spark. Access the article How Zoomdata Caches the Data for more information about this option.
—if you enable this option, Zoomdata will search the distinct values in the metadata store (PostgreSQL). This will speed up retrieving the values for filtering.
You can configure a scheduled job to refresh the distinct values in the metadata store.
Otherwise, each time Zoomdata will query the data source directly.
On this page, you can configure the settings for the fields from the tables (indices or collections) that you have selected in the previous step. These fields will be used as attributes and metrics on your chart.
In this step, Zoomdata samples the dataset (1,000 records) in order to define the fields' metadata (such as min and max values, cardinality, and more).
You can configure the following:
- Visible —by default, all the fields are visible. This means that you can visualize the data from these fields on your charts. If you do not want to use specific fields, clear the corresponding checkboxes.
- Label —by default, the names of the fields from your data source are used as labels. You change them in the Label column.
- Type —when you select tables (indices or collections), each field type is defined by Zoomdata. The default data types are displayed in the Type column. You can change them, if required.
zoomdata.detect.type.attribute.max.lengthproperty in the
zoomdata.propertiesfile as required. For example:
- For metrics, you can define the default value: SUM, AVG, MAX, MIN
For the TIME data type, you can set date and time patterns. If required, you can define the time and date pattern by selecting the
item from the list and specifying it in the corresponding field. Otherwise, you can proceed with the
You can also set the default time granularity for the selected field. To do this, select the required item (Second, Minute, Hour, Day, Week, Month, and Year) from the corresponding list.
- Distinct Count — enable or disable distinct count for the fields.
—you can set the range of data that will be
available in the filter
for all data types.
For fields that are of type ATTRIBUTE, you can select the
Only Allow Custom Values
If this checkbox is selected, you will have to enter the names of the attributes manually while working with the Filters pane , since the list of attributes will not be displayed (see Figure 13 ).
- For fields that are of type ATTRIBUTE, you can select the Only Allow Custom Values checkbox.
To set the range of values that will be available in the filter for the INTEGER, MONEY, and NUMBER types, click
and specify 'Min' and 'Max' values (see
To set the range of values that will be available in the filter of the TIME type, click
and specify 'From' and 'To' time range (see
—this column will be available for all data sources except Amazon Kinesis, Twitter, Upload API, and flat files when you edit the data source details.
It contains the Refresh button. You can refresh the metadata of the selected field by clicking this button (see Figure 16 ).
While editing the data source settings, click
refresh the list of fields available from the data source (that is, add new fields). Keep in mind that the metadata of the fields won't be refreshed (see
icon to view additional info about the data element. The Info pane will be displayed listing the settings (see
In this step, you can configure the data refresh settings for your data source.
As the last step of your data source configuration, define the default settings for each chart type . When you visualize the data from a data source, the default settings are applied to a chart.
Saving Your Connection
Once you have completed the process, click Save . You will receive a confirmation that the connection was successfully saved and be taken back out to the main Data Sources page. You should see your new connection listed in the My Data Sources section of the page.