Managing the Elasticsearch Connector

The Zoomdata Elasticsearch connector lets you visualize and explore data stored in Elasticsearch using the Zoomdata client. The connector supports the following Elasticsearch versions:

  • Elasticsearch 5.0 - 5.6
  • Elasticsearch 6.0 - 6.1

Before you can establish a connection from Zoomdata to Elasticsearch storage, a connector server needs to be installed and configured. See Managing Connectors for general instructions and Connecting to Elasticsearch for details specific to the Elasticsearch connector.

After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Managing Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and charts from your data. See Creating Dashboards and Creating Charts.

Zoomdata Feature Support

The Elasticsearch connector supports all Zoomdata features except the following:

  • Admin-defined functions
  • Group by UNIX time
  • Kerberos authentication
  • Partitions
  • User delegation
  • Wildcard filters, case-insensitive mode
  • Wildcard filters, case-sensitive mode

Connecting to Elasticsearch

When establishing a connection to Elasticsearch:

  1. Specify the Connection String. You can connect to your data source over HTTP/HTTPS or over Transport (TCP)/Transports.

  2. For HTTP/HTTPS, specify the base URL. For Transport/Transports, specify the list of nodes. All nodes must belong to the same cluster.

  3. Provide the connection string in the corresponding format:

    Protocol              Connection String Format                                   Example
    HTTP/HTTPS            <schema>://<host1>:<port1>,...,<hostN>:<portN>/<prefix>    http://ip-10-2-2-241.ec2.internal:80/es
    Transport/Transports  <schema>://<node>,<node>,<node>                            transports://10.2.2.2:9010,10.2.2.3:9010

    where:

    • <schema> - the protocol that you want to use: HTTP or HTTPS (HTTPS adds SSL support), or Transport or Transports (Transports adds SSL support).
    • <node> - the address of a node within the cluster, in the following format: host:port

  4. If required, specify your Elasticsearch User Name and Password.

  5. Select Validate to confirm your connection.
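
The connection string formats above can be assembled programmatically. The following is a minimal sketch, not part of Zoomdata: the helper name build_connection_string and its argument layout are hypothetical, and it assumes that the prefix segment may be omitted for HTTP/HTTPS and does not apply to Transport/Transports.

```python
HTTP_SCHEMAS = {"http", "https"}
TRANSPORT_SCHEMAS = {"transport", "transports"}


def build_connection_string(schema, nodes, prefix=""):
    """Build a connection string in the documented format.

    schema: "http", "https", "transport", or "transports".
    nodes:  list of (host, port) pairs; all nodes must belong to one cluster.
    prefix: optional URL prefix (assumed to apply to HTTP/HTTPS only).
    """
    schema = schema.lower()
    if schema not in HTTP_SCHEMAS | TRANSPORT_SCHEMAS:
        raise ValueError(f"unsupported schema: {schema}")
    if not nodes:
        raise ValueError("at least one node is required")

    # Each node is rendered as host:port, separated by commas.
    hosts = ",".join(f"{host}:{port}" for host, port in nodes)

    if schema in TRANSPORT_SCHEMAS:
        if prefix:
            raise ValueError("a prefix applies only to HTTP/HTTPS")
        return f"{schema}://{hosts}"

    prefix = prefix.strip("/")
    return f"{schema}://{hosts}/{prefix}" if prefix else f"{schema}://{hosts}"


print(build_connection_string("http", [("ip-10-2-2-241.ec2.internal", 80)], "es"))
print(build_connection_string("transports", [("10.2.2.2", 9010), ("10.2.2.3", 9010)]))
```

The two calls reproduce the example strings from the table above.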

Data Source Configuration Notes

When setting up an Elasticsearch data source configuration, consider the following notes for the Indices tab.

You select the indices and types to be queried, and select the fields to be handled. You can do this in three steps:

  1. Select the indices and aliases to be queried. You can select indices Manually or Automatically.

    • To get data only from specific indices, select the Manually option and choose the corresponding indices from the list below.

    • The Automatically option is more flexible. It lets you set a pattern by which indices are selected automatically.

      For this option, select one of the pattern types. Note that if no indices match the pattern at query time, your charts are returned empty.

      • Native - specify the pattern for index names. Use an asterisk (*) to replace one character or a set of characters.

        For example, to get all the indices whose names start with log and end with 16, specify the following pattern:

        log*16

      • Time-Based - set the time pattern to get the matching indices. See the supported date and time patterns.

        For example, the time pattern YYYY-MM returns all indices whose names match that pattern, such as 2016-01. If the index name includes literal text alongside the date and time pattern, enclose the text portion in brackets [ ].

        Examples:

        Index name           Pattern
        2016-01              YYYY-MM
        2016-3               YYYY-Q
        10:23:11             HH:MM:SS
        logstash-2016-06-14  [logstash-]YYYY-MM-DD

  2. Keep in mind that fields are not refreshed automatically. New fields added to your data source appear in Zoomdata only after you click the Refresh Fields button on the Fields tab of the data source configuration. Changes to existing fields (for example, a removed field) are not applied.

  3. Optionally, configure filtering by type. To filter by type, select the Enable Filter By Type checkbox. The type by which filtering will occur is shown. Click Edit to change the filter type by selecting one from the list of types available for the selected index.

    If the Enable Filter By Type checkbox is cleared, all the types that refer to the selected indices are selected.

    If a field has different data types across the selected types, you cannot use it for grouping, filters, and so on. However, the field is still available for raw export.

When you connect to your Elasticsearch data source, the additional service field _type is added. The _type field contains all the selected Elasticsearch types you can visualize as attributes on your charts.
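
To check which index names a Native or Time-Based pattern would select, the matching rules described above can be approximated as follows. This is an illustrative sketch only: the function names and the TOKEN_REGEX table are hypothetical, the table covers only the tokens that appear in the examples, and Zoomdata's actual matching logic may differ.

```python
import re

# Hypothetical token table: covers only the tokens used in the examples above.
TOKEN_REGEX = {
    "YYYY": r"\d{4}",
    "MM": r"\d{2}",
    "DD": r"\d{2}",
    "HH": r"\d{2}",
    "SS": r"\d{2}",
    "Q": r"[1-4]",
}


def matches_native_pattern(index_name, pattern):
    """Native pattern: '*' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index_name) is not None


def time_pattern_to_regex(pattern):
    """Translate a time pattern into a regex; text in [ ] is literal."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)  # raises ValueError if unbalanced
            parts.append(re.escape(pattern[i + 1:end]))
            i = end + 1
            continue
        # Try the longest tokens first so YYYY is not read as something shorter.
        for token in sorted(TOKEN_REGEX, key=len, reverse=True):
            if pattern.startswith(token, i):
                parts.append(TOKEN_REGEX[token])
                i += len(token)
                break
        else:
            # Any other character (such as '-' or ':') is a literal separator.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches_time_pattern(index_name, pattern):
    return time_pattern_to_regex(pattern).fullmatch(index_name) is not None


print(matches_native_pattern("logstash-2016", "log*16"))
print(matches_time_pattern("logstash-2016-06-14", "[logstash-]YYYY-MM-DD"))
```

Both calls print True. An index name such as logstash-2016-06 does not match [logstash-]YYYY-MM-DD, because the day portion is missing.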

Working with Elasticsearch

Distinct Counts and Percentiles

Distinct count and percentile metrics return approximate values in Elasticsearch. The precision of the result returned by the distinct count metric depends on the precision threshold setting (the default value is 1000).

You can change the precision threshold by setting the elasticsearch.query.cardinality.precision.threshold property in the zoomdata.properties file.
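
For context, Elasticsearch exposes approximate distinct counts through its cardinality aggregation, which accepts a precision_threshold option. The sketch below builds such a request body. It is an assumption that the Zoomdata property maps directly onto this option; the field name user_id and the aggregation name distinct_count are hypothetical.

```python
import json

MAX_PRECISION_THRESHOLD = 40000  # maximum value supported by Elasticsearch


def cardinality_aggregation(field, precision_threshold=1000):
    """Build a search body with a cardinality (distinct count) aggregation."""
    if not 0 <= precision_threshold <= MAX_PRECISION_THRESHOLD:
        raise ValueError("precision_threshold must be between 0 and 40000")
    return {
        "size": 0,  # return only the aggregation result, no documents
        "aggs": {
            "distinct_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


print(json.dumps(cardinality_aggregation("user_id"), indent=2))
```

Higher thresholds trade memory for accuracy, which is why very large values can affect performance.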

See Elasticsearch's documentation on the cardinality and percentiles aggregations for more information.

The table below lists all available properties that you can modify to work with Elasticsearch.

Property                                             Default  Use
elasticsearch.query.cardinality.precision.threshold  1000     Controls the level of accuracy of distinct counts.
elasticsearch.query.limit.nongrouped                 10000    Sets the limit on the number of non-grouped records (per shard) that a query executes on.
elasticsearch.query.limit.grouped                    10000    Sets the limit on the number of grouped records (per shard) that a query executes on.

Notes for elasticsearch.query.cardinality.precision.threshold: The maximum supported value is 40000. However, Zoomdata does not recommend setting a value this high, because it may cause performance issues and the data source itself may return errors. For more information, refer to the Precision Control section of the Elasticsearch documentation.

To change the default settings, add the corresponding properties (listed above) to the zoomdata.properties file and assign the required values. For more details about working with the zoomdata.properties file, refer to the topic Managing Configurations in Zoomdata.
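
As an illustration, the following sketch validates a set of overrides and renders the lines to add to zoomdata.properties. The property names, defaults, and the 40000 maximum come from the table above. The helper render_overrides is hypothetical, and the key=value line format is an assumption based on the usual Java properties syntax.

```python
DEFAULTS = {
    "elasticsearch.query.cardinality.precision.threshold": 1000,
    "elasticsearch.query.limit.nongrouped": 10000,
    "elasticsearch.query.limit.grouped": 10000,
}
PRECISION_PROPERTY = "elasticsearch.query.cardinality.precision.threshold"
MAX_PRECISION_THRESHOLD = 40000


def render_overrides(overrides):
    """Return key=value lines for every property that differs from its default."""
    lines = []
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown property: {name}")
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
        if name == PRECISION_PROPERTY and value > MAX_PRECISION_THRESHOLD:
            raise ValueError(f"{name} cannot exceed {MAX_PRECISION_THRESHOLD}")
        if value != DEFAULTS[name]:  # defaults need no entry in the file
            lines.append(f"{name}={value}")
    return "\n".join(lines)


print(render_overrides({
    "elasticsearch.query.cardinality.precision.threshold": 3000,
    "elasticsearch.query.limit.grouped": 20000,
}))
```

The values 3000 and 20000 are arbitrary examples, not recommendations.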

Tokenization

Keep in mind that Elasticsearch, by default, tokenizes (analyzes) fields of type string (attributes in Zoomdata). As a result, strings consisting of two or more words may be split into separate values when connected to Zoomdata (for example, city names like Las Vegas). To disable this process and ensure that a string field is not tokenized, add the following setting to the mapping for that field:

    "index": "not_analyzed"

Example:

    "City": {
      "type": "string",
      "index": "not_analyzed"
    }
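
The mapping above uses the string type from Elasticsearch versions before 5.0. Elasticsearch 5.0 split string into text (analyzed) and keyword (not analyzed), so on newer versions the equivalent of a non-tokenized string is typically a keyword field. The sketch below picks the field definition by major version; the helper name is hypothetical, and you should check the mapping documentation for your Elasticsearch version before applying it.

```python
import json


def untokenized_string_mapping(es_major_version):
    """Return a field definition for a string that must not be tokenized."""
    if es_major_version >= 5:
        # Since 5.0, keyword fields are indexed as a single, unanalyzed term.
        return {"type": "keyword"}
    # Before 5.0, string fields are analyzed unless index is not_analyzed.
    return {"type": "string", "index": "not_analyzed"}


# Field definitions for the City example on a pre-5.0 and a 5.x/6.x cluster.
print(json.dumps({"City": untokenized_string_mapping(2)}, indent=2))
print(json.dumps({"City": untokenized_string_mapping(6)}, indent=2))
```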

IP Addresses

The IP Address data type is supported for Elasticsearch data connectors. Fields of this type are treated as ATTRIBUTEs and can be used in:

  • An Elasticsearch text search box. When searching via the text search, Zoomdata also supports the CIDR notation for IP addresses as described in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/ip.html).
  • The Group By selection box.
  • Filters, although Zoomdata does not support CIDR notation in filters for an IP address field. An exact match is required.
  • Row-level expressions. In row-level expressions, Zoomdata treats IP addresses as strings and expects an exact match.
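
The difference between CIDR matching (text search) and exact matching (filters and row-level expressions) can be illustrated with Python's standard ipaddress module. This sketch only models the behavior described above; the function names are hypothetical and it does not call Zoomdata or Elasticsearch.

```python
import ipaddress


def text_search_matches(value, query):
    """Model of text search: the query may be a single address or a CIDR range."""
    address = ipaddress.ip_address(value)
    if "/" in query:
        # CIDR notation: check whether the address falls inside the network.
        return address in ipaddress.ip_network(query, strict=False)
    return address == ipaddress.ip_address(query)


def filter_matches(value, filter_value):
    """Model of filters and row-level expressions: plain string equality."""
    return value == filter_value


print(text_search_matches("10.2.2.3", "10.2.2.0/24"))  # address is inside the range
print(filter_matches("10.2.2.3", "10.2.2.0/24"))       # CIDR is not interpreted
print(filter_matches("10.2.2.3", "10.2.2.3"))          # exact match
```

The first and third calls print True, and the second prints False, because a filter compares the values as strings and does not expand the CIDR range.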
