State of the Market

Big Data Infrastructure: Spark, MapReduce and Hadoop

Big Data Infrastructure: Spark, MapReduce and Hadoop with Howard Dresner

Watch this video to see who the major players are in big data infrastructure and why.

Spark and MapReduce are battling it out for the top spot in Hadoop infrastructure. Both have increased their footprint, but Spark is the clear leader. Spark’s in-memory data processing engine is gaining in popularity because, in big data analytics, speed is king. Among early-adopter industries such as technology, finance, and healthcare, it is the clear choice. Yet MapReduce is holding its own thanks to its use in legacy applications.

Transcript

From an infrastructure perspective, there's been an interesting dynamic over the last couple of years. Spark emerged and has actually now eclipsed MapReduce as the dominant approach for Hadoop.

Figure 24 – Big data infrastructure
Source: Dresner Advisory Services Big Data Analytics Market Study; Copyright 2017, Dresner Advisory Services

Figure 25 – Big data infrastructure 2015 to 2016
Source: Dresner Advisory Services Big Data Analytics Market Study; Copyright 2017, Dresner Advisory Services

Spark Overtaking MapReduce

When we look at this from a two-year perspective, the increase in Spark is rather striking. Although MapReduce has also increased, Spark has moved into the number one position as the highest infrastructure priority within organizations, followed by MapReduce.

From a year-over-year perspective, we can see on the chart that Spark has eclipsed MapReduce. It's become the number one priority, and that's because Spark, being an in-memory engine, is so much faster. It also lends itself to different applications, such as machine learning. Of course, MapReduce is still there. It's number two, with plenty of legacy applications and early adopters still using it, followed by YARN, which is a prominent part of the Hadoop 2 ecosystem.

Spark Popular in Tech, Finance, and Healthcare

When we look at vertical industries, it's no surprise to see that Spark is at the top of the list for technology organizations, once again being early adopters.

Figure 27 – Big data infrastructure by vertical industry
Source: Dresner Advisory Services Big Data Analytics Market Study;  Copyright 2017 -- Dresner Advisory Services

Spark is also at the top of the list for financial services and healthcare, followed closely by MapReduce, once again because of those legacy applications.

So, the big takeaway from an infrastructure perspective is the emergence of the Hadoop 2 ecosystem, which includes things like Spark, YARN, Tez, and Mesos. Most notably, Spark, being faster and higher capacity and enabling more applications, has eclipsed MapReduce to become the dominant approach to Hadoop.
