
Connecting to CDH-Cloudera

Zoomdata can connect to CDH (Cloudera's Distribution Including Apache Hadoop), Cloudera's open source Apache Hadoop platform. CDH provides unified batch processing, interactive SQL, interactive search, and role-based access controls. In addition, it offers enterprise-grade continuous availability. Specifically, Zoomdata connects to CDH's fault-tolerant storage system, the Hadoop Distributed File System (HDFS). Keep in mind that the connection to CDH requires Apache Spark, which is automatically enabled in the Zoomdata environment.

The HDFS connector is compatible with CDH versions 4 and 5.

CONFIGURING THE HDFS CONNECTOR

To configure the connector, perform the following steps:

  1. Log in to Zoomdata.
  2. Click the Sources menu item.

Figure 1

  3. Click the HDFS connector icon.
  4. Specify the name of your source and, if desired, add a description.

Figure 2

  5. Click Next.
  6. On the File Path page, specify the path to the remote file that you want to upload into Zoomdata.
    To use the first row of your data source as the column names, select the Read Headers checkbox.
    In the corresponding field, specify the value separator used in your data source. Standard separators include commas (,) and semicolons (;).
    Click Preview. From the Entries list, select the number of entries to display in the preview.

Figure 3
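To illustrate how the Read Headers checkbox and the value separator affect the preview, here is a minimal Python sketch. It is not Zoomdata's implementation; the function `preview_rows`, its parameters, and the placeholder column names are hypothetical and only mirror the options described above.

```python
import csv
import io


def preview_rows(text, separator=",", read_headers=True, entries=10):
    """Parse delimited text and return (column_names, first `entries` rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    if not rows:
        return [], []
    if read_headers:
        # The first row supplies the column names (Read Headers selected).
        columns, data = rows[0], rows[1:]
    else:
        # No header row: generate placeholder names (hypothetical naming).
        columns = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return columns, data[:entries]


# Hypothetical sample data using a semicolon as the value separator.
sample = "id;city;amount\n1;Oslo;10.5\n2;Lima;7.25\n"
columns, rows = preview_rows(sample, separator=";", read_headers=True, entries=1)
print(columns)
print(rows)
```

If the separator does not match the file (for example, a comma for this semicolon-separated sample), each line is read as a single column, which is usually the first sign in a preview that the separator setting needs to change.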

  7. In the Preview section, configure field properties as needed, and then click Next.
    After you click Next, the dataset may take some time to load into memory, depending on its size (from several minutes to possibly over half an hour).
  8. On the Fields page, create unique label names, as needed, for each Label field. If necessary, change the Type and Default options and select the checkboxes in the Distinct Count column. If you do not want to use specific fields from the data source, clear their checkboxes in the Visible column. Configure Filter Display settings for the required fields.
    You can also add calculations in the Calculations section.
    Click Next.

Figure 4
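When deciding which Type to assign and which fields merit a Distinct Count, it can help to inspect a sample of the data beforehand. The following Python sketch is one way to do that outside Zoomdata; the function `profile_columns` and the type labels "number" and "text" are hypothetical and do not correspond to Zoomdata's own type names.

```python
def infer_type(values):
    """Return "number" if every value parses as a float, otherwise "text"."""
    if not values:
        return "text"
    try:
        for value in values:
            float(value)
    except ValueError:
        return "text"
    return "number"


def profile_columns(columns, rows):
    """Return {column: {"type": ..., "distinct": ...}} for string-valued rows."""
    profile = {}
    for index, name in enumerate(columns):
        values = [row[index] for row in rows]
        profile[name] = {
            "type": infer_type(values),
            "distinct": len(set(values)),  # number of distinct values
        }
    return profile


# Hypothetical sample rows, as they might appear in the preview.
sample_columns = ["id", "city", "amount"]
sample_rows = [["1", "Oslo", "10.5"], ["2", "Lima", "7.25"], ["3", "Oslo", "10.5"]]
profile = profile_columns(sample_columns, sample_rows)
print(profile)
```

A column whose distinct count equals the number of rows (such as `id` here) behaves like an identifier, while a column with few distinct values (such as `city`) is a more natural candidate for grouping and filtering.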

  9. On the Refresh page, you can schedule asynchronous jobs to refresh fields in your data source. Refer to the Using the Zoomdata Scheduler article for more information.
  10. On the Charts page, enable the charts that will be available for the data source and edit their settings.
    For example, you can select the styles that will be available for the data source and change the global default settings.
    Learn more about how to customize a chart.
    Click Finish to save your changes.

Figure 5