Managing Impala Connectors
The Cloudera Impala™ connector allows users to visualize huge volumes of data stored in their Hadoop/HDFS cluster in real time and with no ETL.
Zoomdata supports Impala versions 2.5.0 - 2.11.0.
What does Impala support?
The table below lists the Zoomdata features that are supported for Cloudera Impala.
|Feature|Supported|
|Supports Distinct Count?|Yes|
|Supports Group-by Time?|Yes|
|Supports Multi Group-by Charts?|Yes|
|Supports Box Plot?|Yes|
|Supports Derived Fields?|Yes|
|Custom SQL Capable?|Yes|
|Live Mode & Playback|Yes|
|Supports Last Value?|Yes|
- Impala versions prior to 2.5.0 have a known issue with the unix_timestamp() function. The issue affects the Zoomdata TEXT_TO_TIME row-level function and can cause it to return incorrect results. Cloudera fixed the issue in Impala 2.5.0 and later. To learn more, see the issue details on the Impala project website.
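The version limits above can be checked before a connection is configured. The following is a minimal Python sketch, not part of Zoomdata; the function names are hypothetical, and it assumes the Impala version is already known as a dotted string such as "2.5.0". It reads the supported range literally as 2.5.0 through 2.11.0.

```python
# Illustrative helper only: names are hypothetical and not a Zoomdata API.
SUPPORTED_MIN = (2, 5, 0)   # first version with the unix_timestamp() fix
SUPPORTED_MAX = (2, 11, 0)  # upper bound of the supported range stated above


def parse_version(text):
    """Turn a dotted version string such as '2.5.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_supported(version):
    """True if the version lies inside the supported range, bounds included."""
    return SUPPORTED_MIN <= parse_version(version) <= SUPPORTED_MAX


def text_to_time_is_reliable(version):
    """True if the version is not affected by the unix_timestamp() issue."""
    return parse_version(version) >= SUPPORTED_MIN
```

Comparing integer tuples rather than raw strings keeps "2.10.0" ordered after "2.9.0", which a plain string comparison would get wrong.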
Zoomdata can pass along credentials for users with access privileges to the Impala source. Delegation allows Impala queries to be issued with the privileges of a specified user. It is configured in the Do As User field on the Connection page. See Enabling Impala User Delegation.
Managing Your Impala Connectors
When setting up an Impala connection, you need to provide the following.
Specify the JDBC URL. You can connect to your Impala data source using either simple user credentials authentication or Kerberos authentication with optional SSL encryption. See Connecting to a Kerberized CDH Cluster and Connecting to Impala with TLS (SSL) Enabled for additional details on the configuration.
Zoomdata enables you to connect either to a single Impala node or to multiple nodes within a cluster. To connect to a single Impala node, specify a JDBC URL in the following format:
To connect to multiple Impala nodes, specify the required JDBC URLs separated by commas in the corresponding field. The URLs are used in round-robin fashion. The connection remains valid as long as at least one node is available. If none of the nodes can be reached, validation of the connection fails.
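The multi-node behavior described above can be pictured with a short Python sketch. It is an illustration of round-robin selection over a comma-separated list, not Zoomdata's actual implementation. The function names and the reachability check are hypothetical, and the URLs in the usage lines are placeholders with a made-up scheme rather than the real Impala JDBC URL format.

```python
from itertools import cycle


def parse_jdbc_urls(field_value):
    """Split the comma-separated field into a list of trimmed, non-empty URLs."""
    urls = [url.strip() for url in field_value.split(",") if url.strip()]
    if not urls:
        raise ValueError("at least one JDBC URL is required")
    return urls


def pick_available(rotation, count, is_reachable):
    """Try each URL at most once, continuing from the current rotation point.

    Returns the first reachable URL, or None when every node is down,
    which corresponds to the connection failing validation.
    """
    for _ in range(count):
        url = next(rotation)
        if is_reachable(url):
            return url
    return None


# Placeholder URLs; the scheme and host names are assumptions for illustration.
urls = parse_jdbc_urls("jdbc:example://node1:21050, jdbc:example://node2:21050")
rotation = cycle(urls)

# Hypothetical reachability check that pretends only node2 is up.
only_node2_up = lambda url: "node2" in url
chosen = pick_available(rotation, len(urls), only_node2_up)
```

Because the rotation is shared between calls, successive requests move on to the next node in the list instead of always starting from the first one.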
- If Impala authentication has been set up, provide a user name and password.
- To allow for Impala user delegation, select the appropriate custom user attribute from the Do As User drop-down list (set up by the Zoomdata supervisor). This field allows Zoomdata to pass along credentials for the specified user with access rights to Impala. See Enabling Impala User Delegation.
- Select Validate. If successfully validated, the connection is saved.
Time-based fields can be configured for partitioning using the Partition column. The following options are available:
- No - no partitioning is done.
- Date - this option is available for the Time field type. If you select this option, the list of partitioned columns is displayed in the Configure column.
- Function - if you select this option, the list of partitioned columns and the supported MURMUR3_HASH function are displayed in the Configure column.
Numeric and time-based fields can be edited using the Configure column:
- Numeric types including Money, Number and Integer - ability to select a default aggregation function
- Time fields - ability to define the default time pattern and granularity; if the time field provides granularities of hour, minute and second, then a time zone label may be applied
Select the checkbox in the Distinct Count column for each field that needs a distinct count. For more information, see Working with Distinct Counts on Cloudera Impala.
On the Charts page, you can:
- Edit Global Default Setting
- Select the Standard and, if available, Custom chart styles to be used with the data source
- Set default parameters (group, sub-group, colors, sorting, and so on) for each chart style
Select Finish to save your changes. Once your data connection has been established, it is listed under My Data Sources.