Managing HDFS Connectors

Zoomdata can connect to CDH (Cloudera’s Distribution Including Apache Hadoop), Cloudera’s open source Apache Hadoop platform. CDH provides unified batch processing, interactive SQL, interactive search, and role-based access controls, along with enterprise-grade continuous availability. Specifically, Zoomdata connects to CDH’s fault-tolerant storage system, the Hadoop Distributed File System (HDFS).

By default, the HDFS connector is not included with Zoomdata. You or your administrator must download and enable it before you can configure a connection.

The table below lists the Zoomdata features that the HDFS connector for CDH supports:

Feature                            Supported
Supports Distinct Count?           Yes
Supports Live Mode/Playback?       No
Supports Group-by Time?            Yes
Supports Multi Group-by Charts?    Yes
Supports Histogram?                Yes
Supports Box Plot?                 No
Custom SQL Capable?                No
Supports Last Value?               No
Supports Partition?                No

Configuring Your HDFS Connections

Before Zoomdata can access the data source, you must define the connection source. Perform the steps below.

  1. From the Remote File Settings list, select the number of entries to be displayed in the file preview.
  2. Specify the path to the remote file that you want to upload into Zoomdata.
  3. Select the Read Headers checkbox to use the first row of your data source as the column names.
  4. In the corresponding field, specify the value separator used in your data source. Standard separators include commas (,) and semicolons (;).
  5. Select Preview. From the Entries list, select the number of entries to be displayed in the preview.
  6. In the Preview section, configure the field properties as needed, and then click Next.
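
Before entering these settings, it can help to confirm which separator and header choice match your file. The sketch below is not Zoomdata code; it only imitates the Read Headers, value separator, and Entries settings using Python's standard csv module. The function name preview_rows, the column_N placeholder names, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from itertools import chain, islice


def preview_rows(text, separator=",", read_headers=True, entries=5):
    """Return (column_names, rows) for the first `entries` data rows.

    Hypothetical helper: mirrors the Read Headers, separator, and
    Entries options described above; it is not part of Zoomdata.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    first = next(reader, None)
    if first is None:
        # Empty input: no columns and no rows.
        return [], []
    if read_headers:
        # The first row supplies the column names.
        columns = first
    else:
        # No header row: generate placeholder names and treat the
        # first row as ordinary data.
        columns = [f"column_{i + 1}" for i in range(len(first))]
        reader = chain([first], reader)
    rows = list(islice(reader, entries))
    return columns, rows


# Invented sample data that uses a semicolon separator.
sample = "city;sales\nBoston;120\nDenver;95\nAustin;143\n"
columns, rows = preview_rows(sample, separator=";", entries=2)
print(columns)  # ['city', 'sales']
print(rows)     # [['Boston', '120'], ['Denver', '95']]
```

If the wrong separator is supplied (for example, a comma for this sample), each line comes back as a single column, which is a quick sign that the separator setting needs to change.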

Fields Page

The Fields page lets you configure attribute options, create custom labels for the fields in your data source, manage the Volume metric, and work with Calculations.

  1. Determine whether the field should be visible or not to the user.
  2. Create unique label names, as needed, for each Label field.

Refresh Settings for HDFS

The Refresh page lets you schedule asynchronous jobs to update the source metadata. 

For version 2.6, scheduled reloads of newly added data are not supported for the HDFS connector. To add new data, do the following:

  1. Navigate to the Fields page of your data source.
  2. Enable the Refresh Fields option. This forces the connector to reload any new data discovered at the level of the data source.
  3. Save your changes.

Charts Page

On the Charts page, you can:

  1. Edit Global Default Setting
  2. Select the Standard and, if available, Custom chart styles to be used with the data source
  3. Set default parameters (group, sub-group, colors, sorting, and so on) for each chart style

Learn more about how to customize a chart.

Select Finish to save your changes. Once your data connection has been established, it will be listed under the My Data Sources section of the page.
