
Connecting to Impala

The Cloudera Impala™ connector allows users to visualize huge volumes of data stored in their Hadoop/HDFS cluster in real time and with no ETL.

Connection Instructions

The Zoomdata Server supports Impala v1.2.4 and later versions. Perform the following steps to configure the connector:

  1. Log into Zoomdata.
  2. Click the Sources menu item.

Figure 1

  1. Click the Cloudera Impala connector icon.
  2. Specify the name of your source and add a description (if desired). Click Next.

Figure 2

  1. On the Connection page, define your connection source. You can select an existing connection, if available, or create a new one. To create a new connection, select the Input new credentials option button and fill in the required parameters: Connection Name and JDBC URL.
    In the current Zoomdata version, you can connect to your Impala data source using either simple user credentials authentication or Kerberos authentication with optional SSL encryption. This article describes how to connect to Impala using simple authentication. For the other configurations, refer to the Connecting to Impala on Kerberized CDH and Connecting to Impala with TLS (SSL) articles.
    Zoomdata enables you to connect either to a single Impala node or to multiple nodes within a cluster.
    To connect to a single Impala node, specify a JDBC URL in the following format:
    jdbc:hive2://<impala_host>:<port>/;auth=noSasl


    Figure 3

    If Impala authentication is enabled, specify the username and password.

    To connect to multiple Impala nodes, specify more than one JDBC URL, separated by commas, in the corresponding field. The URLs are used in a round-robin fashion. However, if Zoomdata fails to reach at least one node from the specified list, the connection will not be validated.

    If Impala authentication is enabled, specify the username and password.

  2. Click Validate. After successful validation, the values are saved. Click Next.

  3. On the Tables page, you can select the schema and the data collection or create a custom SQL query to get the required data that will be visualized on your charts.
    Select the schema and the collection in the Collections section. The selected fields will be displayed in the Preview section. If you don't want to use specific fields, clear the checkboxes in the Fields section.

Figure 4
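
Returning to the Connection page for a moment: the single-node and multi-node JDBC URL formats described there can be sketched in a few lines of Python. This is only an illustration for preparing the values you paste into the JDBC URL field. The host names are hypothetical, the port assumes Impala's default HiveServer2 port (21050), and the rotation at the end is a generic picture of round-robin selection, not a description of how Zoomdata implements it.

```python
from itertools import cycle

# Hypothetical host names; 21050 is assumed to be the HiveServer2 port in use.
NODES = [
    ("impala-node1.example.com", 21050),
    ("impala-node2.example.com", 21050),
]


def impala_jdbc_url(host: str, port: int) -> str:
    """Build a single-node URL in the format shown in this article."""
    return f"jdbc:hive2://{host}:{port}/;auth=noSasl"


def multi_node_value(nodes) -> str:
    """Join one URL per node with commas, as entered in the JDBC URL field."""
    return ",".join(impala_jdbc_url(host, port) for host, port in nodes)


single = impala_jdbc_url(*NODES[0])
multi = multi_node_value(NODES)

# Illustration only: round-robin selection hands out the URLs in turn and
# wraps around after the last one.
rotation = cycle(multi.split(","))
first_three = [next(rotation) for _ in range(3)]

print(single)
print(multi)
print(first_three[2] == first_three[0])  # True: the third pick wraps to the first node
```

With two nodes in the list, the third pick returns to the first URL, which is the wrap-around behavior that "round-robin" refers to.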

To add a custom SQL query and retrieve the data, click Custom SQL and add your query. Click Preview. The results of your query will be displayed in the Preview section.
Click Next.

Zoomdata wraps your SQL query in a SELECT statement. If specific statements inside the wrapped query are not supported by your data source, the query will not be executed.
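
To get a feel for what wrapping means in practice, the sketch below builds an outer SELECT around a custom query. The exact statement Zoomdata generates is not documented in this article, so the `wrap_query` helper, the `SELECT * FROM (...)` shape, and the `custom_query` alias are all assumptions made for illustration. The takeaway is that your query would have to be valid as a subquery on your Impala version.

```python
def wrap_query(custom_sql: str, alias: str = "custom_query") -> str:
    """Hypothetical wrapper: place the custom query inside an outer SELECT.

    A trailing semicolon is stripped because it would not be valid
    inside the parentheses of a subquery.
    """
    inner = custom_sql.strip().rstrip(";").strip()
    return f"SELECT * FROM ({inner}) AS {alias}"


# Hypothetical table and column names.
wrapped = wrap_query("SELECT city, SUM(sales) AS total FROM orders GROUP BY city;")
print(wrapped)
```

Running a wrapped form of your query directly against Impala (for example, in impala-shell) before pasting the original into Custom SQL is one way to check that it works as a subquery.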
  1. By default, SparkIt is disabled. You can enable it if required. Click Next.

  2. On the Fields page, you can create unique label names for the available fields in your data source. These labels will be displayed in the charts.

    Specify unique label names, as needed, for each Label field.

Figure 5

  1. If necessary, change the Type, Partition, and Default options, and select the Distinct Count checkbox to enable this option. For the Filter Display column, you may configure a Custom Range, if available. Click Next.

  2. On the Refresh page, you can schedule asynchronous jobs to refresh fields in your data source. Refer to the Using the Zoomdata Scheduler article for more information.

Figure 6

  1. On the Charts page, you can enable the charts that will be available for the data source and edit their settings. For example, you can select the styles that will be available for the data source and change the global default settings. Learn more about how to customize the chart settings.

  2. Click Finish to save your changes.

Figure 7