Support of Nested Data Structures in Zoomdata

Zoomdata supports aggregations on nested (hierarchical) data structures for the following data stores: Elasticsearch, MongoDB, and Apache Solr.

There are two ways to store a nested structure:

  1. Store the whole hierarchy as a single document, for example in JSON format (nested documents)
  2. Store the hierarchy items as separate documents, with the information about the hierarchical links kept internally by the data store (block join)

The following data sources support nested data structures:

Approach            Supported data stores
Nested documents    Elasticsearch, MongoDB (all versions)
Block join          Apache Solr

Nested Documents

A hierarchical structure can be represented in JSON format. MongoDB and Elasticsearch both support storing such structures.

Consider the following example. We need to store a hierarchy of divisions by country, with two divisions in the country. We also need to store some general country information, for example, the foundation year.

In this case, the following JSON document is sent for indexing:

{
  "country":"Germany",
  "foundation year":2008,
  "divisions":[
    {
      "city":"Berlin",
      "sales":200,
      "manager":{
        "first name":"Robert",
        "last name":"Simmons",
        "years in company":4
      }
    },
    {
      "city":"Munich",
      "sales":200,
      "manager":{
        "first name":"Robert",
        "last name":"Simmons",
        "years in company":4
      }
    }
  ]
}

In MongoDB, you can store such documents and then query them as is, without any restrictions. However, queries may be slow if the documents contain many arrays.
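
As an illustration of how an aggregation over such a nested document works, the sketch below sums sales per country. The `pipeline` list uses MongoDB's `$unwind` and `$group` stages and could be passed to a driver's aggregate call; the helper `total_sales_by_country` is a hypothetical pure-Python equivalent added only so the example runs without a database. The sample document is trimmed to the fields used, and none of this is meant to represent the queries Zoomdata itself generates.

```python
from collections import defaultdict

# Trimmed copy of the example document (manager details omitted).
country_doc = {
    "country": "Germany",
    "foundation year": 2008,
    "divisions": [
        {"city": "Berlin", "sales": 200},
        {"city": "Munich", "sales": 200},
    ],
}

# MongoDB aggregation pipeline: $unwind emits one document per element of
# the "divisions" array, then $group sums the sales for each country.
pipeline = [
    {"$unwind": "$divisions"},
    {"$group": {"_id": "$country", "totalSales": {"$sum": "$divisions.sales"}}},
]


def total_sales_by_country(docs):
    """Hypothetical pure-Python equivalent of the pipeline above."""
    totals = defaultdict(int)
    for doc in docs:
        for division in doc.get("divisions", []):
            totals[doc["country"]] += division["sales"]
    return dict(totals)


print(total_sales_by_country([country_doc]))  # {'Germany': 400}
```

Every array that has to be unwound multiplies the number of intermediate documents, which is one reason array-heavy documents can be slow to aggregate.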

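In Elasticsearch, we recommend mapping complex objects, such as the "divisions" array, with the "nested" type before any document is indexed. By default, Elasticsearch flattens arrays of objects, so the association between fields of the same array element (for example, a city and its sales) is lost. The "nested" type indexes each element as a separate hidden document and preserves that association.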
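
A minimal sketch of such a mapping is shown below as a Python dictionary. It assumes the typeless mapping format used by Elasticsearch 7 and later (older versions wrap the properties in a mapping type name), and the field types chosen here are assumptions rather than Zoomdata requirements. Fields not listed are left to dynamic mapping. The resulting JSON body would be sent when the index is created.

```python
import json

# Assumed mapping body: only the "divisions" array is declared explicitly,
# so that each division is indexed as its own nested object.
mapping = {
    "mappings": {
        "properties": {
            "divisions": {
                "type": "nested",
                "properties": {
                    "city": {"type": "keyword"},
                    "sales": {"type": "integer"},
                },
            }
        }
    }
}

# Print the JSON request body for the index-creation call.
print(json.dumps(mapping, indent=2))
```

The mapping has to exist before the first document arrives, because the type of an already mapped field cannot be changed to "nested" without reindexing.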

Block Join Support

There is another way to store hierarchical structures. All hierarchy items are stored as separate elements, with information about the hierarchical links stored internally. Apache Solr supports this approach.

Consider the following example. We need to store a hierarchy of divisions by country, with two divisions in the country. We also need to store some general country information, for example, the foundation year.

In this case, the following JSON document is sent for indexing:

{
  "country":"Germany",
  "foundationYear":2008,
  "_childDocuments_":[
    {
      "city":"Berlin",
      "sales":200,
      "managerFirstName":"Robert",
      "managerLastName":"Simmons",
      "managerYearsInCompany":4
    },
    {
      "city":"Munich",
      "sales":200,
      "managerFirstName":"Robert",
      "managerLastName":"Simmons",
      "managerYearsInCompany":4
    }
  ]
}

As a result, the index contains three documents: one parent (the country) and two children (the divisions). Solr stores the information about the hierarchical links between these documents internally.

{
  "country":"Germany",
  "foundationYear":2008
},
{
  "city":"Berlin",
  "sales":200,
  "managerFirstName":"Robert",
  "managerLastName":"Simmons",
  "managerYearsInCompany":4
},
{
  "city":"Munich",
  "sales":200,
  "managerFirstName":"Robert",
  "managerLastName":"Simmons",
  "managerYearsInCompany":4
}
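
The split into separate documents can be pictured with the short sketch below. The function `flatten_block` is a hypothetical helper, not part of Solr or Zoomdata: it only mimics the conceptual result (the parent without its children, followed by each child) and does not model how Solr orders the documents or records the links between them internally.

```python
def flatten_block(doc, child_key="_childDocuments_"):
    """Return the parent document (without children) followed by each child."""
    parent = {key: value for key, value in doc.items() if key != child_key}
    flat = [parent]
    for child in doc.get(child_key, []):
        # Children may themselves carry child documents, so recurse.
        flat.extend(flatten_block(child, child_key))
    return flat


block = {
    "country": "Germany",
    "foundationYear": 2008,
    "_childDocuments_": [
        {"city": "Berlin", "sales": 200},
        {"city": "Munich", "sales": 200},
    ],
}

for document in flatten_block(block):
    print(document)
```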
  

You must specify which fields belong to parent documents. To do this, select the checkbox in the Parent Field column on the Fields tab when creating or modifying the data source configuration.
