Connecting to Impala
The Cloudera Impala™ connector allows users to visualize huge volumes of data stored in their Hadoop/HDFS cluster in real time and with no ETL.
Zoomdata supports Impala v2.0.0 - 2.6.0.
Before you can establish a connection from Zoomdata to Impala, the Zoomdata Account Administrator must install, configure, and enable an Impala connector server.
The table below lists information on the features that are supported by Cloudera Impala:
|Feature|Supported|
|---|---|
|Supports Distinct Count?|Yes|
|Supports Live Mode/Playback?|Yes|
|Supports Group-by Time?|Yes|
|Supports Multi Group-by Charts?|Yes|
|Supports Box Plot?|Yes|
|Custom SQL Capable?|Yes|
|Supports Last Value?|Yes|
Starting with Zoomdata v2.5, Zoomdata can pass along credentials for users with access privileges to Impala sources (referred to as 'delegation'). The delegation feature allows Impala queries to be issued with the privileges of the specified user. This feature is available on the Connection page, where it appears as the 'Do As User' field.
CONFIGURING THE CONNECTION
For details about what is provided on each page of the connection process, review the article Source Connection Workflow. Depending on your needs, you can either follow the steps in order from start to finish or jump to a specific section in the connection process:
- General Page
- Connection Page
- Tables Page
- Fields Page
- Refresh Page
- Charts Page
When the connector server is set up by the Zoomdata Administrator, certain parameters on the Connectors page may be customized. The instructions below assume that the default configuration parameters were kept. If this is not the case, your Connection page may differ slightly from the screen captures and references provided.
Log in to Zoomdata.
Administrators and users with appropriate access privileges can connect data sources in Zoomdata.
- Specify the name of your source and add a description (if desired).
- Click Next to continue to the next setup page.
Enter the connection details on this page to enable Zoomdata to access your Impala source. For first-time connections, the Connection page displays a blank form in which to enter new credentials. If a validated connection already exists, you can either use it or enter new credentials.
When establishing a new connection, complete the following fields:
- Enter a unique name for the connection (to help distinguish between other connections in this Zoomdata account).
- Specify the JDBC URL. In the current Zoomdata version, you can connect to your Impala data source using either simple user-credential authentication or Kerberos authentication, with optional SSL encryption. This article describes how to connect to Impala using simple authentication. For details on the other configurations, refer to the articles Connecting to Impala on Kerberized CDH and Connecting to Impala with TLS (SSL).
Zoomdata enables you to connect either to a single Impala node or to multiple nodes within a cluster.
To connect to a single Impala node, specify a JDBC URL in the following format:
To connect to multiple Impala nodes, specify the required JDBC URLs, separated by commas, in the corresponding field. The URLs are used in a round-robin fashion. Such a connection remains valid as long as at least one node is available. If none of the nodes can be reached, the connection will not be validated.
- If Impala authentication has been set up, provide the 'User Name' and 'Password'.
- If Impala delegation is to be used, select a user from the 'Do As User' drop-down list (which would have been set up by the Zoomdata Administrator). This field allows Zoomdata to pass along credentials for the specified user, who has access rights to Impala.
If successfully validated, the connection is saved.
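The multi-node behavior described above (comma-separated URLs, round-robin use, and validation that succeeds when at least one node is reachable) can be illustrated with a minimal sketch. This is not Zoomdata's implementation: the function names are hypothetical, each URL is treated as an opaque string, and the reachability check is supplied by the caller rather than performed against a real Impala node.

```python
from itertools import cycle
from typing import Callable, Iterator, List


def split_urls(field_value: str) -> List[str]:
    """Split the comma-separated field value into URLs, dropping blank entries."""
    return [url.strip() for url in field_value.split(",") if url.strip()]


def connection_is_valid(urls: List[str], is_reachable: Callable[[str], bool]) -> bool:
    """Assumed rule: the connection validates if at least one node is reachable."""
    return any(is_reachable(url) for url in urls)


def round_robin(urls: List[str]) -> Iterator[str]:
    """Yield the URLs in turn, wrapping back to the first after the last."""
    return cycle(urls)


# Placeholder values stand in for real JDBC URLs.
urls = split_urls("url-a, url-b, url-c")
picker = round_robin(urls)
first_four = [next(picker) for _ in range(4)]
```

Here `first_four` takes the three placeholder URLs in order and then wraps around to the first one again, which is the round-robin pattern the text refers to.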
The Tables page lets you select the schema and table to connect with and provides a preview of the selected table. In addition, you can set caching options and toggle the availability of the fields on this page.
- In the Collections section, select the schema and then select the desired collection to connect to Zoomdata.
- Create a Custom SQL query, if needed.
- Toggle the caching options (SparkIt and Caching), as needed.
- Toggle the availability of the fields, as needed.
- Click Next to continue.
The Fields page lets you (1) configure attribute options, (2) create custom labels for the fields in your data source (that will be displayed in the charts), (3) manage the Volume metric, and (4) work with Calculations.
- For the Visible column, uncheck the box to hide that particular field from users.
- For the Label column, create custom names, as desired, that will be displayed in charts and dashboards (otherwise, the Field ID will be used).
- For the Type column, you have the option to edit the field type (although usually you won't need to do this).
- For the Partition column, time-based fields may be configured for partitioning. The following options are available:
- No - no partitioning is to be done.
- Date - this option is available for the Time field type. If you select this option, the list of the partitioned columns will be displayed in the Configure column.
- Function - if you select this option, the list of the partitioned columns and the supported MURMUR3_HASH function will be displayed in the Configure column.
- For the Configure column, numeric and time-based fields may be edited:
- Numeric types including Money, Number and Integer - ability to select a default aggregation function
- Time fields - ability to define the default time pattern and granularity; if the time field provides granularities of hour, minute and second, then a time zone label may be applied
- For the Distinct Count column, tick the checkbox for each field for which Distinct Count is desired.
A chart cannot be created using two different metrics that have distinct counts enabled. Access the article Enabling Distinct Counts on Cloudera Impala for additional information.
- The Statistics column lets you manually refresh Zoomdata's connection to that particular field.
- The Filter Display column lets you configure settings for fields.
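The Distinct Count restriction noted above (a chart cannot use two different metrics that both have distinct counts enabled) can be expressed as a simple check. The sketch below is illustrative only; the function name and the field names in the usage lines are hypothetical and are not part of Zoomdata.

```python
from typing import Iterable, Set


def chart_metrics_allowed(chart_metrics: Iterable[str], distinct_count_fields: Set[str]) -> bool:
    """Return True when at most one of the chart's metrics has Distinct Count enabled."""
    enabled = {metric for metric in chart_metrics if metric in distinct_count_fields}
    return len(enabled) <= 1


# Hypothetical fields: Distinct Count is enabled on user_id and session_id.
distinct_enabled = {"user_id", "session_id"}
ok_chart = chart_metrics_allowed(["user_id", "price"], distinct_enabled)
bad_chart = chart_metrics_allowed(["user_id", "session_id"], distinct_enabled)
```

Under this assumed rule, `ok_chart` is `True` because only one metric has Distinct Count enabled, while `bad_chart` is `False` because two different metrics do.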
The next section in this page covers the Volume Metric.
- In the Volume Metric section, (a) determine whether this standard metric should be visible to users and (b) provide a custom label, if desired.
The last section in this page gives you the ability to create custom Calculations (visible only after the initial Save of this connection).
- In the Calculations section, create custom formulas (which become new metrics that users can access in the chart canvas).
If you are setting up a new connection, the Calculations section will not be available until after the connection is saved.
- Click Next to continue with the connection process.
The Refresh page lets you schedule asynchronous jobs to update the source metadata. For guidance on setting up a refresh schedule, refer to the article Using the Zoomdata Scheduler.
On the Charts page, you can:
- Edit Global Default Setting
- Select the Standard and, if available, Custom chart styles to be used with the data source
- Set default parameters (group, sub-group, colors, sorting, and so on) for each chart style
Click Finish to save your changes. Once your data connection has been established, it will be listed under the My Data Sources section of the page.