Managing the Impala Connector

The Zoomdata Cloudera Impala™ connector allows you to visualize huge volumes of data stored in your Hadoop cluster in real time and with no ETL. Zoomdata supports Impala versions 2.5.0 through 2.12.0.

Before you can establish a connection from Zoomdata to Cloudera Impala storage, a connector server needs to be installed and configured. See Managing Connectors for general instructions and Connecting to Impala for details specific to the Cloudera Impala connector.

After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Managing Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and charts from your data. See Creating Dashboards and Creating Charts.

Zoomdata Feature Support

The Cloudera Impala connector supports all Zoomdata features, including progress reporting. With progress reporting, the connector reports the progress of a running query. In the UI, this appears as Reading nn% in the upper left corner of a chart.

Note, however, that the Cloudera Impala connector accepts only a single distinct count field per query.

Impala Authentication

Support is provided for passing along the credentials of users who have access privileges to the Impala source. Delegation allows Impala queries to be issued with the privileges of a specified user. This option is available on the Connection page as the Do As User field. See Enabling User Delegation and Applying User Delegation to a Connection.

Connecting to Impala

When setting up an Impala connection, complete the following steps:

  1. Specify the JDBC URL. You can connect to your Impala data source using either simple user credentials authentication or Kerberos authentication with optional SSL encryption. Refer to Connecting to Impala on Kerberized CDH or Connecting to Impala with TLS (SSL) for more details on the configuration.

    Zoomdata enables you to connect either to a single Impala node or to multiple nodes within a cluster. To connect to a single Impala node, specify a JDBC URL in the following format:

    jdbc:hive2://<impala_host>:<port>/;auth=noSasl

    To connect to multiple Impala nodes, specify the required JDBC URLs, separated by commas, in the corresponding field. The URLs are used in a round-robin fashion. Such a connection remains valid as long as at least one node is available. If none of the nodes can be reached, the connection fails validation.

  2. If Impala authentication has been set up, provide a user name and password.
  3. To allow for Impala user delegation, select the appropriate custom user attribute from the Do As User drop-down list (set up by the Zoomdata supervisor or administrator). This field allows Zoomdata to pass along the credentials of the specified user, who must have access rights to Impala. See Enabling User Delegation and Applying User Delegation to a Connection.
  4. Select Validate. If successfully validated, the connection is saved.
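
As a rough illustration of the JDBC URL formats described in step 1, the following Python sketch builds a single-node URL, joins several of them with commas for the JDBC URL field, and rotates through them in round-robin order. The host names are hypothetical, port 21050 is assumed to be the Impala HiveServer2 port of your cluster, and the helper functions are invented for this example; the rotation only mimics the described behavior and is not Zoomdata's actual implementation.

```python
from itertools import cycle


def impala_jdbc_url(host: str, port: int) -> str:
    """Build a single-node JDBC URL in the format shown in step 1."""
    return f"jdbc:hive2://{host}:{port}/;auth=noSasl"


def jdbc_url_field(nodes) -> str:
    """Join per-node URLs with commas, as entered in the JDBC URL field."""
    return ",".join(impala_jdbc_url(host, port) for host, port in nodes)


def round_robin(url_field: str):
    """Return an endless iterator over the URLs in round-robin order."""
    urls = [u.strip() for u in url_field.split(",") if u.strip()]
    return cycle(urls)


# Hypothetical hosts; replace with the nodes of your own cluster.
nodes = [("impala-1.example.com", 21050), ("impala-2.example.com", 21050)]
field = jdbc_url_field(nodes)

picker = round_robin(field)
first_three = [next(picker) for _ in range(3)]  # node 1, node 2, node 1 again
```

The value of `field` is what would be pasted into the JDBC URL field for a multi-node connection; for a single node, the output of `impala_jdbc_url` alone is sufficient.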

Impala Table Settings

Time-based fields can be configured for partitioning in an Impala data source configuration using the Partition column on the Fields tab of the data source configuration wizard. The following options are available:

  • No - no partitioning is to be done.

  • Date - This option is available for the Time field type. If you select this option, the list of partitioned columns is displayed in the Configure column.

  • Function - If you select this option, the list of partitioned columns and the supported MURMUR3_HASH function are displayed in the Configure column.

Numeric and time-based fields can be edited using the Configure column of the Fields tab:

  • Numeric types including Money, Number and Integer - ability to select a default aggregation function
  • Time fields - ability to define the default time pattern and granularity; if the time field provides granularities of hour, minute and second, then a time zone label may be applied.

Select the checkbox in the Distinct Count column for each field that requires a distinct count. For more information, see Working with Distinct Counts on Cloudera Impala.