State of the Market

What Are The Big Data Access Methods for Spark and Hive?

Check out this video for insight into big data access methods.

Spark SQL, Hive, and Hive QL offer different approaches to accessing data stored in Hadoop. Hive and Hive QL are preferred by those accustomed to more traditional query languages. Amazon Redshift and Google BigQuery also have adherents in this corner of the big data space. Organizations' data access preferences vary by size and industry. High tech and financial services often adopt Spark SQL, but organization size also plays into this dynamic. Smaller organizations lean more toward Spark SQL, while larger organizations lean toward Hive and Hive QL, even though they also use Spark SQL.

Transcript

Another question we asked respondents was which data access methods they prefer to use when accessing big data.

Figure 30 – Big data – data access
Source: Dresner Advisory Services Big Data Analytics Market Study;  Copyright 2017 -- Dresner Advisory Services

At the top of the list is Spark SQL, which is no surprise given the growing dominance of Spark. It is followed by Hive and Hive QL, a much more traditional approach to accessing Hadoop structures and much more comfortable for people accustomed to typical query languages. But it's also worth noting that Amazon Redshift and Google BigQuery have grown in importance as part of that ecosystem.

Preferred Access Methods

One of the questions we asked our respondents is how they prefer to access data stored within big data systems. In a year-over-year comparison, the most notable change is that Spark SQL has moved into the number one position, followed by Hive and Hive QL. It's also notable that Redshift has improved its position.

Figure 31 – Big data – data access 2015 to 2016
Source: Dresner Advisory Services Big Data Analytics Market Study;  Copyright 2017 -- Dresner Advisory Services

It's also the first year that we covered Google BigQuery, and it came in at number five. My guess is that it will improve its ranking when we track this again in 2017.

Figure 33 – Big data – data access by vertical industry
Source: Dresner Advisory Services Big Data Analytics Market Study;  Copyright 2017 -- Dresner Advisory Services


Spark SQL

Looking at vertical industries, it is not surprising that high tech, as an early adopter of technology, has embraced Spark SQL. Financial services has also embraced Spark SQL, because of its higher capacity, its processing speed, and its support for capabilities such as machine learning. Healthcare, by contrast, is more traditional and a late adopter, and it still prioritizes Hive and Hive QL over Spark.

Organization Size Influences Access

Organization size definitely has an impact on the data access methods that organizations employ. Small organizations, also early adopters, are embracing Spark SQL, and interestingly they are the heaviest users of Redshift of any organization size. In larger organizations, Spark SQL is certainly prominent, but not nearly as prominent as Hive and Hive QL.

Organization size has an impact on data access. Small organizations, being early adopters looking for a competitive advantage, are embracing Spark SQL. They are also more likely than organizations of other sizes to embrace Redshift, because Redshift is cloud-ready and a low-cost alternative to other big data solutions. Larger organizations are more likely to use traditional approaches: you certainly see some Spark SQL, but you are much more likely to find Hive, Hive QL, and direct access to HDFS.

Contact

Sales: +1-571-279-6166

General Inquiries: +1-571-279-6000

sales@zoomdata.com