Zoomdata Version

Source Connection Workflow

The steps to connect Zoomdata to your data source follow a fairly standard workflow that is applicable across most supported data sources. This article walks through this general workflow to help prepare you to connect Zoomdata to your data source(s) successfully. Before starting, Zoomdata recommends you review the capabilities and limitations quick reference sheet to understand the features and options available to you based on the data source you are connecting.

This article provides a high-level overview of the common configuration options. For more specific information, refer to the specific data source article available in this Administrator Guide.

Preparation

Prior to connecting your data sources, review the following resources to ensure that Zoomdata is able to connect to your source:

Workflow Overview

The number of steps that you have to complete varies slightly depending on the type of data source being connected. Figure 1 illustrates the general steps:


Figure 1

Connecting a New Data Source

Select the button to open the Data Sources page.

To connect a new source to Zoomdata, select the appropriate icon from the list of supported data sources (see Figure 2).


Figure 2

Navigate to the first screen in the connection process: the General page. Review each page listed below to help you understand the necessary configurations and settings needed.

General Page

First, create a unique name to identify this data source name and, optionally, add a description. The Source Name field is required (see Figure 3).


Figure 3

Connection Page

In this step, specify the connection details and, if applicable, authentication credentials in order to connect Zoomdata to the data source. The fields will vary depending on the selected data source. Once all the information has been supplied, select the Validate button to substantiate the connection (see Figure 4). If the connection is successful, you will be able to continue to the next page in the connection process. Save your settings.

The connection information is also stored in a connections library so that admins can manage all the connections in one convenient location. Review the Managing Connections section in the article Overview of the Data Sources Page .


Figure 4

Optionally, if you have access to already-validated connections, you can use one of those (see Figure 5).


Figure 5

Tables (Indices, Collections) Page

The settings on this page differ depending on the data source you want to add. The table below lists the options available for each connector.

Data Source Schema Object type Custom SQL Support Additional Filters
Flat Files - - - -
Amazon S3 - - - -
Amazon Redshift Table -
Aurora Table -
Apache Phoenix Table -
SparkSQL Table -
HDFS - - - -
Cloudera Impala Table -
Cloudera Search - Collection - -
Elasticsearch Index - -
Google Analytics - Category -
Hive on Tez Table -
Marketo - Collection - -
MemSQL Table -
Oracle Table -
PostgreSQL Table -
SalesForce - Object - -
SendGrid - Collection -
Solr - Table - -
SQL Server Table -
Zendesk - View - -
Apache Drill Table -
Vertica Table -
Presto for Cassandra Table -
Teradata Table -
BigQuery Table - -

In this step, you have to select the table from the available collection in the data source:

  1. Schema —in this list, all the available schemas for the data source are displayed. When you select a schema, all the corresponding tables that fit these criteria will display. If there are many results, use search to quickly locate the desired table (see Figure 6).

    Figure 6
  2. Table (index or collection) —select the desired table (see Figure 7).

    Figure 7
  3. Custom SQL —available for certain data sources; if you want to use specific fields from the table, you can run SQL query to get them listed.
    Zoomdata wraps your SQL query into a SELECT statement. If specific statements inside the wrapped query are not supported by your data source, the query will not be executed (see Figure 8).


Figure 8

  1. Additional filters —for specific data sources (for example, SendGrid or Google Analytics), you can specify the time period for which you want to use the data (see Figure 9).

    Figure 9
  2. Fields —when you select a table, the list of fields and their types (as defined in the database) is displayed. By default, all of them are selected. You can clear the checkboxes near the specific fields to exclude them (see Figure 10).

    Figure 10

Records to Sample —if it is not possible to define the fields types from the data source, you can define them via sampling. Select a table and specify the number of records from the data source to be analyzed. The type of data in each field will be defined. By default, 100 records are sampled. Change this value as needed.

  1. Preview —this section contains the preview of your data set (first 10 records ) (see Figure 11).

    Figure 11
  2. For search-based data sources (for example, Solr), there is additional option to configure a Request Handler : a Solr-specific plug-in that defines a logic when executing a search request (available in both Cloudera Search and Solr sources).
  3. Spark It —if Spark It feature is configured for your data source, you can enable it. By default, SparkIt is disabled. Access the article How Zoomdata Uses Apache Spark (in the Admin Guide) for more information about this option .
Spark It capability is planned for deprecation in a future release.
  1. Caching —you can enable or disable caching of aggregated query results for your data source.
    If this option is enabled, the results will be stored in Spark. A ccess the article How Zoomdata Caches the Data (in the Admin Guide) for more information about this option.
  2. Lookup values —if you enable this option, Zoomdata will search the distinct values in the metadata store (PostgreSQL). This will speed up retrieving the values for filtering.
    You can configure a scheduled job to refresh the distinct values in the metadata store.
    Otherwise, each time Zoomdata will query the data source directly.
This option is available for the following data sources: Solr , Elasticsearch , and Cloudera Search .

Fields Page

On this page, you can configure the settings for the fields from the tables (indices or collections) that you have selected in the previous step. These fields will be used as attributes and metrics on your chart.

In this step, Zoomdata samples the dataset (1,000 records) in order to define the fields' metadata (such as min and max values, cardinality, and more).

If the data type is not defined in the database, it is defined in this step.

You can configure the following:

  • Visible —by default, all the fields are visible. This means that you can visualize the data from these fields on your charts. If you do not want to use specific fields, clear the corresponding checkboxes.
  • Label —by default, the names of the fields from your data source are used as labels. You change them in the Label column.
  • Type —when you select tables (indices or collections), each field type is defined by Zoomdata. The default data types are displayed in the Type column. You can change them, if required.
You can set the maximum character length of attribute fields. By default, it is limited to 200 characters and the field is recognized as Attribute. If this limit is exceeded, the field will be recognized as Text field.
Modify the FieldsTypeDetector​ property in the application.properties file as required.
  • The Partitioned column shows if the field is partitioned. The following connectors support partitioning: Cloudera Impala, Hive on Tez, Spark SQL, Apache Drill, and BigQuery
  • Configure
    • For metrics, you can define the default value: SUM, AVG, MAX, MIN
    • Set the time pattern and granularity for the fields of the Time type. If required, you can define the time and date pattern by selecting the Pattern item from the list and specifying it in the corresponding field. Otherwise, you can proceed with the Default pattern.
      You can also set the default time granularity for the selected field. To do this, select the required item (Second, Minute, Hour, Day, Week, Month, and Year) from the corresponding list.


Figure 12

  • 'Time' fields containing granularities down to the hours, minutes and seconds may also apply a time zone label. To apply a time zone: The time zone label will be displayed in charts and dashboards for this data source.
    1. Select the current label (most likely, UTC (UTC+00:00)) to access the Time Zone pane.
    2. Then you can either (a) manually input the desired time zone in the text field or (2) select the drop down to choose one from the provided list.


Figure 13

  1. Tick the checkbox if this label should be applied to all time fields for the data source.
  2. Select the Apply button.
  • Distinct Count enable or disable distinct count for the fields.
  • Filter Display —you can set the range of data that will be available in the filter for all data types.
    • For fields that are of type ATTRIBUTE, you can select the Only Allow Custom Values checkbox.
      If this checkbox is selected, you will have to enter the names of the attributes manually while working with the Filters pane , since the list of attributes will not be displayed (see Figure 14).

      Figure 14
  • To set the range of values that will be available in the filter for the INTEGER, MONEY, and NUMBER types, click Custom Range and specify 'Min' and 'Max' values (see Figure 15).

    Figure 15
  • To set the range of values that will be available in the filter of the TIME type, click Custom Range and specify 'From' and 'To' time range (see Figure 16).

    Figure 16
Time is displayed in UTC.
  • Statistics —this column will be available for all data sources except created via upload API and flat files when you edit the data source details.
    It contains the Refresh button. You can refresh the metadata of the selected field by clicking this button (see Figure 17).

    Figure 17

While editing the data source settings, click Refresh Fields to refresh the list of fields available from the data source (that is, add new fields). Keep in mind that the metadata of the fields won't be refreshed (see Figure 18).

Figure 18

  • Info —click the icon to view additional info about the data element. The Info pane will be displayed listing the settings (see Figure 19).

    Figure 19

Refresh Page

In this step, you can configure the data refresh settings for your data source.

Charts Page

As the last step of your data source configuration, define the default settings for each chart type . When you visualize the data from a data source, the default settings are applied to a chart.

Saving Your Connection

Once you have completed the process, click Save . You will receive a confirmation that the connection was successfully saved and be taken back out to the main Data Sources page. You should see your new connection listed in the My Data Sources section of the page.