
Connecting to Impala

The Cloudera Impala™ connector allows users to visualize huge volumes of data stored in their Hadoop/HDFS cluster in real time and with no ETL.

Zoomdata supports Impala versions 2.0.0 through 2.6.0.

PREREQUISITES

Before you can connect Zoomdata to Impala, the Zoomdata Account Administrator must install, configure, and enable an Impala connector server.

The table below lists the features that are supported for Cloudera Impala:

Feature                     Supported?
Distinct Count              Yes
Live Mode/Playback          Yes
SparkIt Capable             Yes
Group-by Time               Yes
Multi Group-by Charts       Yes
Histogram                   Yes
Box Plot                    Yes
Custom SQL Capable          Yes
Last Value                  Yes
Partition                   Yes

Impala Authentication

Starting with Zoomdata v2.5, Zoomdata can pass along credentials for users with access privileges to Impala sources (referred to as 'delegation'). The delegation feature allows Impala queries to be issued with the privileges of the specified user. This feature is available on the Connection page as the 'Do As User' field.

For information about delegation, refer to Cloudera's article Configuring Impala Delegation.

CONFIGURING THE CONNECTION

For details about what is provided on each page of the connection process, review the article Source Connection Workflow. Depending on your needs, you can either follow the steps in order from start to finish or jump to a specific section in the connection process:

Start

  1. Log into Zoomdata.
  2. Click the Sources menu item.

    Figure 1
  3. Click the Cloudera Impala connector icon.

General Page

  1. Specify the name of your source and add a description (if desired).


Figure 2

  2. Click Next to continue to the next setup page.

Connection Page

Enter the connection details on this page to enable Zoomdata to access your Impala source. For a first-time connection, the Connection page displays a blank form for entering new credentials. If a validated connection already exists, you have the option to either use it or input new credentials.

When establishing a new connection, complete the following fields:

  1. Enter a unique name for the connection (to help distinguish it from other connections in this Zoomdata account).
  2. Specify the JDBC URL. In the current Zoomdata version, you can connect to your Impala data source using either simple user-credentials authentication or Kerberos authentication, with optional SSL encryption. This article describes how to connect to Impala using simple authentication. For the other configurations, refer to the articles Connecting to Impala on Kerberized CDH and Connecting to Impala with TLS (SSL).
    Zoomdata enables you to connect either to a single Impala node or to multiple nodes within a cluster.
    To connect to a single Impala node, specify a JDBC URL in the following format:
    jdbc:hive2://<impala_host>:<port>/;auth=noSasl
    To connect to multiple Impala nodes, enter the JDBC URL for each node in the same field, separated by commas. The URLs are used in a round-robin fashion. Such a connection remains valid as long as at least one node is available. If none of the nodes can be reached, the connection will not be validated.
  3. If Impala authentication has been set up, provide the 'User Name' and 'Password'.
  4. To allow Impala delegation, select a user from the 'Do As User' drop-down list (which would have been set up by the Zoomdata Administrator).
    This field allows Zoomdata to pass along credentials for the specified user with access rights to Impala.
  5. Click Validate.
    If successfully validated, the connection is saved.
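
The JDBC URL rules in step 2 can be illustrated with a short Python sketch. It is not Zoomdata code: the function names and host names are hypothetical, and the default port 21050 (Impala's usual JDBC/HiveServer2 port) is an assumption that you should replace with your cluster's actual port. The reachability check is passed in as a function so that the sketch does not depend on any particular network library.

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator, List


def impala_jdbc_url(host: str, port: int = 21050) -> str:
    """Build a single-node URL in the simple-authentication format shown in step 2."""
    # 21050 is assumed here as the default port; adjust for your cluster.
    return f"jdbc:hive2://{host}:{port}/;auth=noSasl"


def multi_node_field(hosts: Iterable[str], port: int = 21050) -> str:
    """Join one URL per node with commas, as entered in the JDBC URL field."""
    return ",".join(impala_jdbc_url(h, port) for h in hosts)


def split_urls(field_value: str) -> List[str]:
    """Split the comma-separated field back into individual URLs."""
    return [u.strip() for u in field_value.split(",") if u.strip()]


def round_robin(field_value: str) -> Iterator[str]:
    """Yield the configured URLs in turn, wrapping around indefinitely."""
    return cycle(split_urls(field_value))


def can_validate(field_value: str, is_reachable: Callable[[str], bool]) -> bool:
    """True if at least one node is reachable; False if every node is down."""
    return any(is_reachable(u) for u in split_urls(field_value))


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    field = multi_node_field(["impala-1.example.com", "impala-2.example.com"])
    print(field)
    urls = round_robin(field)
    print(next(urls), next(urls), next(urls))
```

The sketch only mirrors the behavior described above (comma-separated URLs, round-robin use, validation succeeding while at least one node is available); how Zoomdata implements this internally is not covered in this article.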


Figure 3

Tables Page

The Tables page lets you select the schema and table to connect with and provides a preview of the selected table. You can also set caching options and toggle the availability of the fields on this page.

  1. In the Collections section, select the schema and then select the desired collection to connect to Zoomdata.


Figure 4

  2. Create a Custom SQL query, if needed.
    Zoomdata wraps your SQL query into a SELECT statement. If specific statements inside the wrapped query are not supported by your data source, the query will not be executed.
  3. Toggle the caching options (SparkIt and Caching), as needed.
    The SparkIt capability is planned for deprecation in a future release.
  4. Toggle the availability of the fields, as needed.
  5. Click Next to continue.
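
The note about Custom SQL being wrapped into a SELECT statement can be pictured with the following Python sketch. The exact SQL that Zoomdata generates is not documented in this article, so the outer statement, the alias, and the function name below are assumptions chosen only to show the general idea of a query being embedded as a derived table.

```python
def wrap_custom_sql(user_query: str, alias: str = "custom_sql") -> str:
    """Embed a user query as a derived table inside an outer SELECT.

    Illustrative assumption only; this is not Zoomdata's actual generated SQL.
    """
    # A trailing semicolon would be invalid inside parentheses, so drop it.
    inner = user_query.strip().rstrip(";").strip()
    if not inner:
        raise ValueError("query is empty")
    return f"SELECT * FROM ({inner}) AS {alias}"


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(wrap_custom_sql("SELECT id, amount FROM sales WHERE amount > 0;"))
```

Under this assumption, anything that is not valid inside a subquery on your data source (for example, a DDL statement or several statements separated by semicolons) would make the wrapped query fail, which is consistent with the warning in the Custom SQL step.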

Fields Page

The Fields page lets you (1) configure attribute options, (2) create custom labels for the fields in your data source (that will be displayed in the charts), (3) manage the Volume metric, and (4) work with Calculations.

  1. For the Visible column, uncheck the box to hide that particular field from users.
  2. For the Label column, create custom names, as desired, that will be displayed in charts and dashboards (otherwise, the Field ID will be used).
    When you create a data source, a specific number of distinct values for each attribute field is saved in Zoomdata, based on the data sample from your data set. You can filter the data on your chart by these values. While editing a data source, if you want to use all distinct values in the filter (that is, from the whole data source), click the Refresh button in the Statistics column.
  3. For the Type column, you have the option to edit the field type (although usually you won't need to do this).
  4. For the Partition column, time-based fields may be configured for partitioning. The following options are available:
    • No - no partitioning is done.
    • Date - this option is available for the Time field type. If you select this option, the list of the partitioned columns will be displayed in the Configure column.
    • Function - if you select this option, the list of the partitioned columns and the supported MURMUR3_HASH function will be displayed in the Configure column.
  5. For the Configure column, numeric and time-based fields may be edited:
    • Numeric types, including Money, Number, and Integer - ability to select a default aggregation function
    • Time fields - ability to define the default time pattern and granularity; if the time field provides granularities of hour, minute, and second, then a time zone label may be applied
  6. For the Distinct Count column, tick the checkbox for each field for which Distinct Count is desired.
    A chart cannot be created using two different metrics that have distinct counts enabled. Access the article Enabling Distinct Counts on Cloudera Impala for additional information.
  7. The Statistics column lets you manually refresh Zoomdata's connection to that particular field.
  8. The Filter Display column lets you configure Filter Display settings for fields.
    The next section on this page covers the Volume Metric.
  9. In the Volume Metric section, (a) determine whether this standard metric should be visible to users and (b) provide a custom label, if desired.
    The last section on this page gives you the ability to create custom Calculations (visible only after the initial Save of this connection).
  10. In the Calculations section, create custom formulas (which become new metrics that users can access in the chart canvas).
    If you are setting up a new connection, the Calculations section will not be available until after the connection is saved.
  11. Click Next to continue with the connection process.
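
The Distinct Count restriction above (a chart cannot use two different metrics that have distinct counts enabled) can be expressed as a small check. The Python sketch below is purely illustrative: the function name, the field names, and the dictionary representation of the Distinct Count column are hypothetical and do not reflect Zoomdata's internal data model.

```python
from typing import Dict, Iterable, List


def distinct_count_conflicts(
    chart_metrics: Iterable[str], distinct_enabled: Dict[str, bool]
) -> List[str]:
    """Return the chart's distinct-count metrics if there are two or more.

    An empty list means the chart respects the restriction.
    """
    # Collect the different metrics on the chart that have Distinct Count ticked.
    flagged = sorted({m for m in chart_metrics if distinct_enabled.get(m, False)})
    return flagged if len(flagged) > 1 else []


if __name__ == "__main__":
    # Hypothetical fields: True means the Distinct Count box is ticked.
    fields = {"customer_id": True, "order_id": True, "amount": False}
    print(distinct_count_conflicts(["customer_id", "amount"], fields))
    print(distinct_count_conflicts(["customer_id", "order_id"], fields))
```

A check like this could help when planning which fields to enable for Distinct Count, since enabling it on several fields limits which of them can appear together as metrics in one chart.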


Figure 5

Refresh Page

The Refresh page lets you schedule asynchronous jobs to update the source metadata. For guidance on setting up a refresh schedule, refer to the article Using the Zoomdata Scheduler.

Figure 6

Charts Page

On the Charts page, you can:

  1. Edit Global Default Setting
  2. Select the Standard and, if available, Custom chart styles to be used with the data source
  3. Set default parameters (group, sub-group, colors, sorting, and so on) for each chart style


Figure 7

Learn more about how to customize a chart.

Click Finish to save your changes. Once your data connection has been established, it will be listed under the My Data Sources section of the page.