Support of Nested Data Structures in Zoomdata

Zoomdata supports aggregations on nested (or hierarchical) data structures for the following data sources: Elastic Search, MongoDB, and Solr.

There are two ways to store a nested structure:

  1. Store the entire hierarchy as a single document, for example, in JSON format (nested documents)
  2. Store hierarchy items as separate documents, with information about the hierarchical links kept internally (block join)

The following data sources support the corresponding approaches:

Approach          | Solr (except v.5.2) | Elastic Search (all versions) | MongoDB (all versions)
Nested documents  |                     | Supported                     | Supported
Block join        | Supported           |                               |

NESTED DOCUMENTS

A hierarchical structure can be represented in JSON format. MongoDB and Elastic Search both support storing such structures.

Consider the following example. We need to store a hierarchy of divisions by country, where the country has two divisions. We also need to store some general country information, for example, the foundation year.

In this case, the following JSON is sent to be indexed as a document:

{
  "country":"Germany",
  "foundation year":2008,
  "divisions":[
    {
      "city":"Berlin",
      "sales":200,
      "manager":{
        "first name":"Robert",
        "last name":"Simmons",
        "years in company":4
      }
    },
    {
      "city":"Munich",
      "sales":200,
      "manager":{
        "first name":"Robert",
        "last name":"Simmons",
        "years in company":4
      }
    }
  ]
}

In MongoDB, you can store such documents and then query them as is, without any restrictions. However, performance may suffer if the document contains many arrays.
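
To make aggregation over a nested array concrete, the sketch below sums sales per country in plain Python by unwinding the divisions array, which is roughly what a MongoDB $unwind stage followed by a $group stage does. The function name total_sales_by_country is hypothetical and is not part of Zoomdata or MongoDB; the code only illustrates the idea and is not how Zoomdata builds its queries.

```python
from collections import defaultdict

# Sample document from the example above (nested documents approach).
manager = {"first name": "Robert", "last name": "Simmons", "years in company": 4}
country_doc = {
    "country": "Germany",
    "foundation year": 2008,
    "divisions": [
        {"city": "Berlin", "sales": 200, "manager": dict(manager)},
        {"city": "Munich", "sales": 200, "manager": dict(manager)},
    ],
}


def total_sales_by_country(docs):
    """Sum the sales of every nested division, grouped by country (hypothetical helper)."""
    totals = defaultdict(int)
    for doc in docs:
        # "Unwind" the divisions array: visit each nested division in turn.
        for division in doc.get("divisions", []):
            totals[doc["country"]] += division["sales"]
    return dict(totals)


print(total_sales_by_country([country_doc]))  # {'Germany': 400}
```

Every division in every document is visited once, which is one way to see why documents with many large arrays can be slow to aggregate.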

In Elastic Search, it is recommended to map complex objects with the "nested" type before the document is indexed.
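
As an illustration, a mapping that declares divisions as a nested field might look like the following sketch. It builds the mapping body as a Python dictionary and prints it as JSON. The field types chosen here (keyword, integer) are assumptions, and the exact mapping layout depends on the Elastic Search version in use, so treat this as a starting point and not as the mapping Zoomdata requires.

```python
import json

# Hypothetical mapping body for the example document above.
# The field types are assumptions; adjust them to your data and version.
nested_mapping = {
    "mappings": {
        "properties": {
            "country": {"type": "keyword"},
            "foundation year": {"type": "integer"},
            "divisions": {
                # "nested" keeps each division as its own hidden document,
                # so fields of one division are not mixed with another's.
                "type": "nested",
                "properties": {
                    "city": {"type": "keyword"},
                    "sales": {"type": "integer"},
                    "manager": {
                        "properties": {
                            "first name": {"type": "keyword"},
                            "last name": {"type": "keyword"},
                            "years in company": {"type": "integer"},
                        }
                    },
                },
            },
        }
    }
}

# Print the body that would be sent when creating the index.
print(json.dumps(nested_mapping, indent=2))
```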

BLOCK JOIN SUPPORT

There is another way of storing a hierarchical structure: all hierarchy items are stored as separate documents, and information about the hierarchical links is stored internally. Solr supports this approach.

Consider the following example. We need to store a hierarchy of divisions by country, where the country has two divisions. We also need to store some general country information, for example, the foundation year.

In this case, the following JSON is sent to be indexed:

{
  "country":"Germany",
  "foundationYear":2008,
  "_childDocuments_":[
    {
      "city":"Berlin",
      "sales":200,
      "managerFirstName":"Robert",
      "managerLastName":"Simmons",
      "managerYearsInCompany":4
    },
    {
      "city":"Munich",
      "sales":200,
      "managerFirstName":"Robert",
      "managerLastName":"Simmons",
      "managerYearsInCompany":4
    }
  ]
}

As a result, there are three documents in the index. Information about the hierarchical links between these documents is stored internally in Solr.

{
  "country":"Germany",
  "foundationYear":2008
},
{
  "city":"Berlin",
  "sales":200,
  "managerFirstName":"Robert",
  "managerLastName":"Simmons",
  "managerYearsInCompany":4
},
{
  "city":"Munich",
  "sales":200,
  "managerFirstName":"Robert",
  "managerLastName":"Simmons",
  "managerYearsInCompany":4
}
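
The relationship between the indexing request and the three resulting documents can be sketched in Python. The helper flatten_block below is hypothetical; it only mimics the split into one parent and two child documents and does not reproduce how Solr records the parent-child link internally.

```python
def flatten_block(block):
    """Split a block-join document into its parent and child documents.

    Hypothetical helper for illustration only; Solr does this internally.
    """
    # The parent keeps every field except the list of children.
    parent = {k: v for k, v in block.items() if k != "_childDocuments_"}
    children = list(block.get("_childDocuments_", []))
    return [parent] + children


# The block-join request from the example above.
block = {
    "country": "Germany",
    "foundationYear": 2008,
    "_childDocuments_": [
        {"city": "Berlin", "sales": 200, "managerFirstName": "Robert",
         "managerLastName": "Simmons", "managerYearsInCompany": 4},
        {"city": "Munich", "sales": 200, "managerFirstName": "Robert",
         "managerLastName": "Simmons", "managerYearsInCompany": 4},
    ],
}

docs = flatten_block(block)
print(len(docs))  # 3
print(docs[0])    # {'country': 'Germany', 'foundationYear': 2008}
```

Note that the parent document carries only country and foundationYear, which is why the parent fields have to be identified separately from the child fields.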

You must specify which fields belong to parent documents. To do this, select the checkbox in the Parent Field column on the Fields tab while creating or editing the data source.
