Connecting to Data Sources
You can connect to a variety of data sources within Zoomdata. If you are new to Zoomdata, review the following topics to better understand how the data is cached and what time formats are supported by Zoomdata.
By default, your environment comes pre-configured with certain connectors enabled. These connectors are displayed on the Sources page. However, additional connectors are available. For more information about what connectors are available in version 2.6, as well as general information about each connector, see the Data Sources Reference Sheet.
Certain connectors require a JDBC driver, which is obtained through a separate download. This allows you to select and add a driver that meets your operational needs or policies. The following connectors need a JDBC driver to be installed before configuring and connecting:
Note: Zoomdata supports only underscores and dashes in data store field names. No other special characters or white space are supported. If your data store uses special characters other than underscores and dashes in field names, remove them before attempting to create a data source configuration.
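The field-name restriction above can be checked before you create a data source configuration. The following is a minimal sketch, not part of Zoomdata: the helper names `invalid_field_names` and `sanitize_field_name` are hypothetical, and the sketch assumes that letters and digits are accepted alongside underscores and dashes.

```python
import re

# Assumption: letters and digits are allowed in addition to the
# underscores and dashes that the documentation names explicitly.
_VALID_FIELD_NAME = re.compile(r"[A-Za-z0-9_-]+")


def invalid_field_names(names):
    """Return the field names that contain unsupported characters."""
    return [name for name in names if not _VALID_FIELD_NAME.fullmatch(name)]


def sanitize_field_name(name):
    """Replace each run of unsupported characters with one underscore."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip())


if __name__ == "__main__":
    fields = ["user_id", "order-total", "unit price", "cost($)"]
    for bad in invalid_field_names(fields):
        print(bad, "->", sanitize_field_name(bad))
```

Renaming is only a suggestion here; how you rename fields in the underlying data store depends on that store.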
Connecting a New Data Source
At a high level, the steps for setting up a new data source are as follows:
- General configuration
- Connection configuration
- Table configuration
- Fields configuration
- Refresh configuration
- Select Settings and then select Sources.
- From the Sources page, select the data source you want to connect to. If the data source you want is not available, its connector may not be enabled. For more information, see Managing Connectors.
- On the General page for the connector, enter a unique name in the Source Name field. You can also add a description, if desired.
- Select Next.
- On the Connection page, you can enter information for a new connection or select a validated connection.
- To start a new connection, do the following:
- Select Input New Credentials.
- Specify the connection details and, if applicable, authentication credentials in order to connect to the data source. The fields vary depending on the selected data source.
- Select Validate to test the connection. If the connection is successful, you can continue with the connection process. Save your settings.
- To use an existing connection, do the following:
- Select Use Validated Credentials.
- From the drop-down list, select the connection you want to use.
- Select Next.
- On the Tables page, select the table you want to use from the list of available options. Each data source and connector has different options.
- For more information about what is available for the data source you have selected, see the topic for that data source.
Select the table from the available collection in the data source:
- Table (index or collection)
- Custom SQL: available for certain data sources. If you want to use specific fields from the table, you can run a SQL query to list them.
- You can enable or disable caching of aggregated query results for your data source. Select On or Off. For more information about how the data is cached, see How Zoomdata Caches the Data.
- If you are using Solr, ElasticSearch, or Cloudera Search as your data source, you can enable Lookup Values for your connection to reduce the retrieval time for filtering values. You can configure a scheduled job to refresh the distinct values in the metadata store.
Otherwise, the data source is queried directly each time.
- Once you have selected the table you want to use, select the fields and types you want displayed. By default, all of them are selected. You can see a sample of the first 10 records of your data in the Preview window.
- For search-based data sources, there is an additional option to configure a Request Handler: a plug-in that defines the logic used when executing a search request (available in both Cloudera Search and Solr sources).
- Select Next.
- On the Fields page, you configure the settings for the fields from the tables for your data source. These fields are then used as attributes and metrics. To define the field metadata, 1,000 records of data are sampled.
Configure the settings as follows:
- Visible: by default, all the fields are visible. This means that you can visualize the data from these fields on your charts. If you do not want to use specific fields, clear the corresponding checkboxes.
- Field ID: the name of the field in the data from the data store. Note that Zoomdata supports only underscores and dashes in data store field names. No other special characters or white space are supported. If your data store uses special characters other than underscores and dashes in field names, remove them and then try to create this data source configuration again.
- Label: by default, the names of the fields from your data source are used as labels.
- Type: when you select tables, each field type is defined by Zoomdata. The default data types are displayed in the Type column. You can change them by selecting another option from the drop-down list.
You can set the maximum character length of attribute fields. By default, it is limited to 200 characters and the field is recognized as Attribute. If this limit is exceeded, the field is recognized as a Text field. Add or modify the FieldsTypeDetector property in the zoomdata.properties file as required.
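The Attribute versus Text rule can be pictured with a short sketch. This is an illustration only, not Zoomdata's detector: the function name `classify_string_field` is hypothetical, and the sketch assumes the limit is compared against the longest value found in the sampled records.

```python
# Default limit stated in the documentation.
DEFAULT_MAX_ATTRIBUTE_LENGTH = 200


def classify_string_field(sample_values, max_length=DEFAULT_MAX_ATTRIBUTE_LENGTH):
    """Classify a string field as "Attribute" or "Text".

    Assumption: the field becomes "Text" as soon as any sampled value
    is longer than max_length; missing (None) values are ignored.
    """
    longest = max((len(v) for v in sample_values if v is not None), default=0)
    return "Text" if longest > max_length else "Attribute"


if __name__ == "__main__":
    print(classify_string_field(["red", "green", "blue"]))
    print(classify_string_field(["x" * 250]))
```

Raising the limit in the sketch (for example, `max_length=500`) keeps longer values classified as Attribute, which mirrors the effect the configurable limit is described as having.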
- If you are using Cloudera Impala, Hive on Tez, Spark SQL, or Apache Drill as your data source, the Partitioned column shows if fields within your source are partitioned.
- For metrics, you can define the default value: SUM, AVG, MAX, or MIN.
Set the time pattern and granularity for the fields of the TIME
type. If required, you can define the time and date pattern by selecting the
item from the list and specifying it in the corresponding field. Otherwise, you can proceed with the
- You can also set the default time granularity for a particular field by selecting the required item (Second, Minute, Hour, Day, Week, Month, and Year) from the corresponding list.
- You can apply time zone labels for the time fields within your data source.
- Select the current label to access the Time Zone pane.
- Then either manually input the desired time zone in the text field or select from the drop-down list.
- Check the box if this label should be applied to all time fields for the data source.
- Apply your changes.
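To illustrate what the time granularity options listed earlier (Second, Minute, Hour, Day, Week, Month, and Year) mean for a timestamp, the following sketch truncates a value to the start of its bucket. It is a generic illustration, not Zoomdata's implementation: the function name `truncate_to_granularity` is hypothetical, and the sketch assumes weeks start on Monday.

```python
from datetime import datetime, timedelta


def truncate_to_granularity(ts, granularity):
    """Return the start of the time bucket that contains ts."""
    g = granularity.lower()
    if g == "second":
        return ts.replace(microsecond=0)
    if g == "minute":
        return ts.replace(second=0, microsecond=0)
    if g == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if g == "day":
        return day
    if g == "week":
        # Assumption: weeks start on Monday (weekday() == 0).
        return day - timedelta(days=day.weekday())
    if g == "month":
        return day.replace(day=1)
    if g == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"Unsupported granularity: {granularity}")


if __name__ == "__main__":
    ts = datetime(2017, 3, 15, 13, 45, 30)
    for name in ("Minute", "Hour", "Day", "Week", "Month", "Year"):
        print(name, truncate_to_granularity(ts, name))
```

Timestamps that truncate to the same value fall into the same group at that granularity, so a coarser default produces fewer, wider buckets.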
- Choose to enable or disable distinct count for the fields.
For Filter Display, you can set the range of data that is available in the filter for all data types.
For fields that are of type ATTRIBUTE, you can select the Only Allow Custom Values checkbox.
If this checkbox is selected, you have to enter the names of the attributes manually while working with the Filters pane, since the list of attributes is not displayed.
- To set the range of values available in the filter for the INTEGER, MONEY, and NUMBER types, select Custom Range and specify the 'Min' and 'Max' values. You can also set the TIME type range by specifying the 'From' and 'To' values.
- In the Configure column, you can define the type of number format you want to use for that particular field. For information and steps, see Number Formatting for Data Sources.
- To refresh your metadata, go to the Statistics column, which is available for all sources except Upload API and Flat Files, and select Refresh.
- Once you are finished setting up the refresh for certain fields, select Next.
- On the Refresh page, you can set up and configure the refresh rates for your data. For detailed steps and more information about the scheduler, see Using the Zoomdata Scheduler.
- Select Next.
- As the last step of your data source configuration, define the default settings for each chart type. When you visualize the data from a data source, the default settings are applied to a chart.
- Select Save. Once your data source is saved, you are brought back to the main data source page. Your new connection is listed under My Data Sources.