Make Sure Your Analytics Tool Can Query Any Data Type
In this video, you’ll see how analytic tools can handle SQL, NoSQL, and search-based data sources.
Most data-rich applications combine structured and unstructured data. A lot of valuable multi-structured data lives in non-relational data stores and search engines. In fact, by some estimates, as much as 80 percent of an organization’s data may reside outside of relational databases. So, when you're considering analytic tools to support your big data efforts, make sure they can connect to and analyze all types of data. There are a variety of methods analytic tools can use to query modern and legacy data sources.
Hi, everyone. I want to talk to you about another important characteristic of a big data analytics tool, and that is the ability to query all types of data. You see, in the past, most analytics tools only queried relational data using SQL. And that was great for the last 20 or 30 years because most of our operational data was stored in relational databases. Well, that’s no longer true. A lot of our data, in fact some people say 80 percent of our data, is stored in non-relational systems. So, the first place to start is the ability to support NoSQL systems like MongoDB, Hadoop, and Cassandra.
Not All Data Is Stored On Premises
We also have to realize that a lot of our data is no longer on premises in our own data centers. A lot of data is moving into the cloud. So, we need analytic tools that also query cloud data or exist in the cloud and can query data there or data on premises, kind of a hybrid analytics environment as people are saying.
Also, we have to realize that a lot of data is being stored in search engines like Elasticsearch and Solr. So, good analytic tools need to support and query those types of datasets as well.
Accessing Modern Data Sources with SQL-based Tools
So, traditional SQL-based analytics tools are doing a number of things to access this non-traditional, unstructured, and semi-structured data stored in search engines and NoSQL systems. One, traditionally, they’re using the BLOB feature, the binary large objects in relational databases that store text and other unstructured data. They pull those objects out and then either process them, perhaps with text mining tools, to put them in a relational form they can query, or just present that unstructured data as is.
The second thing they’re doing is supporting NoSQL APIs, so they can access JSON data in MongoDB, bring it down, parse it out, and query it natively, which is a great new extension to relational databases. And, finally, they’re also supporting search-based APIs, so they can actually query search engines and pull data out of them.
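To make the "bring it down, parse it out and query it" step concrete, here is a minimal sketch in Python. It is not how any particular analytics tool does it. The JSON documents, the `order_items` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the relational side.

```python
import json
import sqlite3

# Hypothetical JSON documents, shaped like what a document store might return.
RAW_DOCS = """
[
  {"_id": 1, "customer": {"name": "Ada", "region": "EU"},
   "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
  {"_id": 2, "customer": {"name": "Lin", "region": "US"},
   "items": [{"sku": "A1", "qty": 5}]}
]
"""


def flatten_orders(docs):
    """Yield one flat relational row per nested line item."""
    for doc in docs:
        for item in doc["items"]:
            yield (
                doc["_id"],
                doc["customer"]["name"],
                doc["customer"]["region"],
                item["sku"],
                item["qty"],
            )


def load_and_query(raw_json):
    """Parse JSON, load it into a table, and answer a question with plain SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_items "
        "(order_id INTEGER, customer TEXT, region TEXT, sku TEXT, qty INTEGER)"
    )
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?, ?, ?)",
        flatten_orders(json.loads(raw_json)),
    )
    # Once the data is flat, ordinary SQL aggregation works on it.
    rows = conn.execute(
        "SELECT sku, SUM(qty) FROM order_items GROUP BY sku ORDER BY sku"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(load_and_query(RAW_DOCS))
```

The point of the sketch is the flattening step: nested documents become rows, and from there the SQL the analyst already knows applies.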
SQL: the Lingua Franca of Analytics
Conversely, NoSQL databases and NoSQL-based analytics tools are adopting SQL-based strategies because, as I said, that’s the lingua franca of analytics. So, SQL on Hadoop was introduced a number of years ago by Cloudera when it introduced Impala. Now, we’ve got 20 or so SQL-on-Hadoop type analytics tools. These are not necessarily SQL-92 types of SQL environments, but they’re getting close. And they’re also providing support for search-based APIs.
Finally, the thing that a great big data analytics tool does is support query virtualization. What do I mean by that? That is the ability to go access any data anywhere without the user having to know how it’s formatted or where it’s stored. The tool takes care of all that. It queries the various data sources on the fly, pulls the data back, joins it, and presents it to the user.
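As a rough illustration of that query-on-the-fly-and-join idea, here is a small Python sketch. Everything in it is an assumption made for the example: an in-memory SQLite table plays the relational source, a list of dictionaries plays a NoSQL or search source, and `events_per_customer` is a hypothetical function, not a real product API.

```python
import sqlite3


def make_relational_source():
    """Hypothetical relational source: a small customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")]
    )
    return conn


# Hypothetical non-relational source: event documents keyed by customer id.
DOCUMENT_SOURCE = [
    {"customer_id": 1, "event": "login"},
    {"customer_id": 1, "event": "purchase"},
    {"customer_id": 2, "event": "login"},
]


def events_per_customer(conn, documents):
    """Query both sources at request time and join the results in memory.

    The caller only sees (name, event_count) pairs and never needs to know
    which backend held which field.
    """
    # Pull the lookup side from the relational source with SQL.
    names = dict(conn.execute("SELECT id, name FROM customers"))
    # Aggregate the document side in plain Python.
    counts = {}
    for doc in documents:
        cid = doc["customer_id"]
        counts[cid] = counts.get(cid, 0) + 1
    # Join on customer id and return a stable, sorted result.
    return sorted((names[cid], n) for cid, n in counts.items() if cid in names)


if __name__ == "__main__":
    print(events_per_customer(make_relational_source(), DOCUMENT_SOURCE))
```

A real virtualization layer would also push filters down to each source and handle far larger result sets. The sketch only shows the shape of the idea: one call, several backends, one joined answer.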
Voila, perfect. So, that’s all for big data analytics tools today.