Connecting to Elasticsearch

Starting with Zoomdata v2.4, three connectors are available, each supporting a different range of Elasticsearch versions. To connect your data source, use the connector that matches your Elasticsearch version:

  • Elasticsearch 1.4.1 - 1.7.5
  • Elasticsearch 2.0.0 - 2.4.3
  • Elasticsearch 5.0 - 5.4

The table below lists the features that Zoomdata supports for Elasticsearch data sources:

Feature                     Supported?
Distinct Count              Yes
Live Mode/Playback          Yes
SparkIt                     No
Group-by Time               Yes
Multi Group-by Charts       Yes
Histogram                   Yes
Box Plot                    Yes
Custom SQL                  No
Last Value                  No
Partition                   No

CONFIGURING THE CONNECTION

For details about what is provided on each page of the connection process, review the article Source Connection Workflow. Depending on your needs, you can either follow the steps in order from start to finish or jump to a specific section in the connection process:

Start

  1. Log in to Zoomdata.
  2. Click the Sources menu item.

    Figure 1
  3. Click the Elasticsearch icon.

General Page

  1. Specify the name of your source and add a description (if desired).


Figure 2

  2. Click Next to continue to the next setup page.

Connection Page

This page defines the connection that Zoomdata uses to access the data source. If this is the first time you are setting up a connection, you need to enter the necessary credentials. If a validated connection already exists, you can choose to reuse it.

  1. To create a new connection, select the Input New Credentials option.
  2. Enter a unique name for the connection, to distinguish it from other connections in this Zoomdata account.
  3. Specify the Connection String. You can connect over the HTTP/HTTPS or the Transport (TCP)/Transports protocols. For HTTP/HTTPS, specify the base URL; for Transport/Transports, specify the list of nodes. All the nodes you list must belong to the same cluster.
    Provide the connection string in the corresponding format:

    Protocol               Connection String Format                                   Example
    http/https             <schema>://<host1>:<port1>,...,<hostN>:<portN>/<prefix>    http://ip-10-2-2-241.ec2.internal:80/es
    transport/transports   <schema>://<node>,<node>,<node>                            transports://10.2.2.2:9010,10.2.2.3:9010

    <schema> - the protocol that you want to use:

      • http or https (https adds SSL support)
      • transport or transports (transports adds SSL support)

    <node> - the address of a node within the cluster, in the format host:port

  4. If required, specify your Elasticsearch User Name and Password.
  5. Click Validate.
    If the connection is validated successfully, it is saved.
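The two connection string formats can be sketched with a small Python helper that assembles a string from its parts. The function name and its validation rules are hypothetical and only mirror the format table; Zoomdata itself simply takes whatever you type into the Connection String field.

```python
from typing import List, Optional

HTTP_SCHEMAS = {"http", "https"}
TRANSPORT_SCHEMAS = {"transport", "transports"}


def build_connection_string(schema: str, nodes: List[str],
                            prefix: Optional[str] = None) -> str:
    """Hypothetical helper: join host:port entries into '<schema>://<node>,...[/<prefix>]'."""
    schema = schema.lower()
    if schema not in HTTP_SCHEMAS | TRANSPORT_SCHEMAS:
        raise ValueError(f"unsupported schema: {schema}")
    if not nodes:
        raise ValueError("at least one host:port entry is required")
    for node in nodes:
        # Every entry must look like host:port, with a numeric port.
        host, sep, port = node.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {node!r}")
    connection = f"{schema}://{','.join(nodes)}"
    if prefix:
        if schema in TRANSPORT_SCHEMAS:
            # Assumption based on the format table: transport strings list nodes only.
            raise ValueError("a prefix applies to http/https only")
        connection += "/" + prefix.strip("/")
    return connection


print(build_connection_string("http", ["ip-10-2-2-241.ec2.internal:80"], "es"))
# -> http://ip-10-2-2-241.ec2.internal:80/es
print(build_connection_string("transports", ["10.2.2.2:9010", "10.2.2.3:9010"]))
# -> transports://10.2.2.2:9010,10.2.2.3:9010
```

Both printed values reproduce the examples from the format table.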


Figure 3

Indices Page

The Indices page lets you select the indices and types to query and the fields to include. You can do this in three steps:

  1. Select indices and aliases to be queried.


Figure 4

    You can select indices Manually or Automatically.
    If you want to get data only from specific indices, select the Manually option and choose the corresponding indices from the list below.
    The Automatically option is more flexible: it lets you set a pattern by which indices are selected automatically. If a new index is added to your data set and it matches the pattern, Zoomdata queries that index as well.
Keep in mind that in this case the field list is not refreshed automatically. New fields added to your data source appear in Zoomdata only after you click the Refresh Fields button on the Fields page. Changes to existing fields (for example, a removed field) are not applied.
If no indices match the pattern at query time, the chart is empty.

For this option, select one of the following pattern types:

  • Native - specify a pattern for index names. Use an asterisk (*) as a wildcard for one character or a sequence of characters.
    For example, to get all indices whose names start with log and end with 16, specify the pattern log*16.
  • Time-Based - set a time pattern to get the matching indices. Check the supported date and time patterns.
    For example, the time pattern YYYY-MM returns all indices whose names match this pattern (as shown in the Figure 5 example). If the index name includes text in addition to the date and time pattern, enclose the text portion in square brackets [ ]:


Figure 5

Examples:

Index Name            Pattern
2016-01               YYYY-MM
2016-3                YYYY-Q
10:23:11              HH:MM:SS
logstash-2016-06-14   [logstash-]YYYY-MM-DD

  2. Configure filtering by type. This step is optional. To filter by type, select Enable Filter By Type and click Filter. When you click Edit, the list of types available in the selected indices is displayed. If a type has different mappings in different indices, all fields from those mappings are listed.
    If this checkbox is cleared, all types in the selected indices are selected.
    If a field has different data types in different types, you cannot use it for grouping, filtering, and similar operations. However, it is still available for raw export.


    Figure 6

  3. Configure the field settings, if needed. If your data set contains multi-field types, they are recognized and listed in the Select Fields section.
Due to the specifics of Elasticsearch v1.7, fields of the multi_field type are detected as raw data only. Raw data is displayed only in the Details dialog box for a specific chart element and in exported files.
Their sub-fields are detected according to the mapping. Fields of the token_count type cannot be used in raw export and are not shown in the Details dialog box or in text-search results.
Finally, enable or disable caching and lookup values for your data source, and click Next.
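The two automatic pattern types can be illustrated with a short Python sketch. It is a simplified model, not Zoomdata's implementation. The helper names are hypothetical. A native `*` is translated to the regular expression `.*`, so the sketch also accepts an empty run of characters. The time tokens cover only those in the examples table. Because every time token becomes a digit class, the sketch checks the shape of an index name and does not validate real dates.

```python
import re

# Time tokens from the examples table only (an assumption, not the full list).
# Dict order matters: "YYYY" must be tried before shorter tokens.
TIME_TOKENS = {
    "YYYY": r"\d{4}",
    "MM": r"\d{2}",
    "DD": r"\d{2}",
    "HH": r"\d{2}",
    "SS": r"\d{2}",
    "Q": r"[1-4]",
}


def native_pattern_to_regex(pattern):
    # '*' stands for any run of characters; everything else is literal.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def time_pattern_to_regex(pattern):
    # Text in [brackets] is literal; time tokens become digit classes.
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)  # raises ValueError if "]" is missing
            out.append(re.escape(pattern[i + 1:end]))
            i = end + 1
            continue
        for token, regex in TIME_TOKENS.items():
            if pattern.startswith(token, i):
                out.append(regex)
                i += len(token)
                break
        else:
            # Not a token: match this character literally (e.g. "-" or ":").
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def select_indices(index_names, pattern, pattern_type="native"):
    # An empty result corresponds to the empty chart you get when nothing matches.
    if pattern_type == "native":
        regex = native_pattern_to_regex(pattern)
    else:
        regex = time_pattern_to_regex(pattern)
    return [name for name in index_names if regex.fullmatch(name)]


print(select_indices(["logs-2016", "log-2017", "applog16"], "log*16"))
# -> ['logs-2016']
print(select_indices(["logstash-2016-06-14", "metrics-2016-06-14"],
                     "[logstash-]YYYY-MM-DD", "time"))
# -> ['logstash-2016-06-14']
```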

Fields Page

The Fields page lets you (1) configure attribute options, (2) create custom labels for the fields in your data source (that will be displayed in the charts), (3) manage the Volume metric, and (4) work with Calculations.

  1. Determine whether the field should be visible to the user.
  2. Create unique label names, as needed, for each Label field.
When you create a data source, Zoomdata saves a set number of distinct values for each attribute field, based on a sample of your data set. You can filter the data on your chart by these values. While editing a data source, if you want the filter to use all distinct values from the whole data source, click the Refresh button in the Statistics column.
  3. For the Type column, you can edit the field type (although you usually won't need to).
  4. For the Configure column, you can edit numeric and time-based fields:
    • Numeric types, including Money, Number, and Integer - select a default aggregation function
    • Time fields - define the default time pattern and granularity; if the time field provides hour, minute, and second granularities, you can apply a time zone label
  5. Select fields for Distinct Counts as needed.
  6. Refresh the connection to a particular field, as desired.
  7. Configure Filter Display settings for fields.
  8. Edit the Volume Metric settings, as needed.
  9. To search for a word or phrase on your chart, select the checkbox in the Faceted filter column for the corresponding field.
  10. Work with Calculations, if available and as needed.
    If you are setting up a new connection, the Calculations section is not available until after the connection is saved.
  11. Click Next to continue.


Figure 7

Refresh Page

The Refresh page lets you schedule asynchronous jobs to update the source metadata. For guidance on setting up a refresh schedule, refer to the article Using the Zoomdata Scheduler.

Charts Page

On the Charts page, you can:

  1. Edit the Global Default Settings.
  2. Select the Standard and, if available, Custom chart styles to be used with the data source.
  3. Set default parameters (group, sub-group, colors, sorting, and so on) for each chart style.

Learn more about how to customize a chart.

Click Finish to save your changes. Once your data connection has been established, it is listed under the My Data Sources section of the page.

SERVICE COLUMNS

When you connect to an Elasticsearch data source, Zoomdata adds the service column _type.

The _type column contains all selected Elasticsearch types, which you can visualize as attributes on your charts.
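The following Python sketch shows how _type behaves like any other attribute. It flattens Elasticsearch search hits into rows and counts the rows by _type, much like a chart grouped by that column. The `_type` and `_source` keys are standard metadata on Elasticsearch search hits in the versions listed above. The sample hits and the `to_rows` helper are hypothetical, and the sketch does not show how Zoomdata builds the column internally.

```python
from collections import Counter

# Hypothetical search hits, shaped like entries of hits.hits in a search response.
hits = [
    {"_index": "logs", "_type": "access", "_source": {"status": 200}},
    {"_index": "logs", "_type": "error", "_source": {"status": 500}},
    {"_index": "logs", "_type": "access", "_source": {"status": 404}},
]


def to_rows(hits, selected_types):
    # Keep only the selected types and copy _type into each row as a column.
    rows = []
    for hit in hits:
        if hit["_type"] in selected_types:
            row = dict(hit["_source"])
            row["_type"] = hit["_type"]
            rows.append(row)
    return rows


rows = to_rows(hits, {"access", "error"})
print(Counter(row["_type"] for row in rows))
# -> Counter({'access': 2, 'error': 1})
```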

WORKING WITH ELASTICSEARCH

Distinct Counts and Percentiles

Distinct count and percentile metrics return approximate values in Elasticsearch. The precision of a distinct count depends on the precision threshold setting (the default value is 1000).

You can change the value of precision threshold by setting the elasticsearch.query.cardinality.precision.threshold property in the zoomdata.properties file.
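The precision threshold corresponds to the precision_threshold parameter of the Elasticsearch cardinality aggregation. The Python sketch below builds a request body of that kind. The helper name and the field name are hypothetical, and the sketch assumes, without confirming, that Zoomdata issues a query of this shape. Elasticsearch treats any threshold above 40000 as 40000, and the sketch mirrors that behavior by clamping.

```python
import json

MAX_PRECISION_THRESHOLD = 40000  # Elasticsearch's maximum supported value


def distinct_count_request(field, precision_threshold=1000):
    # Clamp to the maximum, since higher values have the same effect as 40000.
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {
        "size": 0,  # only the aggregation result is needed, not the documents
        "aggs": {
            "distinct_" + field: {
                "cardinality": {"field": field, "precision_threshold": threshold}
            }
        },
    }


print(json.dumps(distinct_count_request("user_id"), indent=2))
```

Counts below the threshold are expected to be close to exact. Above it, accuracy drops in exchange for bounded memory use.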

For more information, see the Elasticsearch documentation on the cardinality and percentiles aggregations.

The following properties can be modified to control how Zoomdata works with Elasticsearch:

elasticsearch.query.cardinality.precision.threshold
  Default value: 1000
  Use the property to: control the accuracy of distinct counts.
  Notes: The maximum supported value is 40000. However, Zoomdata does not recommend setting such a high value, because it may cause performance issues and the data source itself may return errors. For more information, refer to the Precision Control section of the Elasticsearch documentation.

elasticsearch.query.limit.nongrouped
  Default value: 10000
  Use the property to: set the limit on the number of non-grouped records (per shard) that a query is executed on.

elasticsearch.query.limit.grouped
  Default value: 10000
  Use the property to: set the limit on the number of grouped records (per shard) that a query is executed on.

If you need to change the default settings, you can add the corresponding properties (listed above) to the zoomdata.properties file and assign the required values. For more details about working with the zoomdata.properties file, refer to the article Managing Configurations in Zoomdata .
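Entries in zoomdata.properties are plain key=value lines, for example elasticsearch.query.cardinality.precision.threshold=3000. The Python sketch below reads lines like these, falls back to the defaults documented above, and rejects a threshold above 40000. The parser is deliberately simplified: it does not handle line continuations or ':' separators. The helper name is hypothetical, and this is not how Zoomdata itself loads the file.

```python
# Documented defaults for the Elasticsearch properties.
DEFAULTS = {
    "elasticsearch.query.cardinality.precision.threshold": 1000,
    "elasticsearch.query.limit.nongrouped": 10000,
    "elasticsearch.query.limit.grouped": 10000,
}


def load_es_settings(text):
    # Simplified key=value parsing; '#' and '!' lines are comments.
    settings = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in settings:
            settings[key] = int(value.strip())
    threshold = settings["elasticsearch.query.cardinality.precision.threshold"]
    if threshold > 40000:
        raise ValueError("precision threshold above the supported maximum of 40000")
    return settings


example = """
# raise distinct-count accuracy
elasticsearch.query.cardinality.precision.threshold=3000
"""
print(load_es_settings(example))
```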

Tokenization

Keep in mind that, by default, Elasticsearch tokenizes (analyzes) fields of type string (attributes). As a result, strings of two or more words, such as city names like Las Vegas, may be split into separate values when connected to Zoomdata. To disable this process and ensure that a string field is not tokenized, set the following option in the mapping for that field:

index: "not_analyzed"

Example:

"City": {
  "type": "string",
  "index": "not_analyzed"
}
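The Python sketch below covers both sides of this: why the default analyzer splits Las Vegas into two values, and which mapping keeps the field whole. The analyzer function is only a rough approximation: it lowercases the text and splits it on non-word characters. The helper names are hypothetical. The version switch reflects a change in Elasticsearch itself. Starting with 5.0, the string type is deprecated in favor of text and keyword, and keyword is the non-analyzed option. The string/not_analyzed mapping shown above therefore applies to the 1.x and 2.x connectors.

```python
import json
import re


def approximate_standard_analyzer(text):
    # Crude stand-in for the default analyzer: lowercase, then split on
    # non-word characters. The real analyzer follows Unicode segmentation rules.
    return [token for token in re.split(r"\W+", text.lower()) if token]


def untokenized_string_mapping(es_major_version):
    # 1.x/2.x: "string" with "index": "not_analyzed"; 5.x: the "keyword" type.
    if es_major_version >= 5:
        return {"type": "keyword"}
    return {"type": "string", "index": "not_analyzed"}


def mapping_properties(fields, es_major_version):
    # Build the "properties" section of a mapping for the given field names.
    return {"properties": {name: untokenized_string_mapping(es_major_version)
                           for name in fields}}


print(approximate_standard_analyzer("Las Vegas"))
# -> ['las', 'vegas']
print(json.dumps(mapping_properties(["City"], 2)))
# -> {"properties": {"City": {"type": "string", "index": "not_analyzed"}}}
```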