Data Platform Variety: NoSQL or No Schema
This video explains the importance of variety among modern data platforms.
When comparing modern data platforms with legacy relational databases, it’s useful to think in terms of schema. Traditional analytic databases, which are schema-on-write, are great for answering the questions the data was modeled to answer: the questions you knew you wanted to ask. But when the data, queries, or applications change, newer data platforms like Hadoop and NoSQL offer more flexibility. In fact, Hadoop and NoSQL, which are schema-on-read, work very well together. Their combination of high-volume storage and high performance makes them ideal for interactive applications.
So, if we think about modern data platforms, we really have to think about them in the context of polyglot persistence. Martin Fowler, the Chief Scientist with ThoughtWorks, described this really well when he said that if you're starting a new enterprise application today, you can no longer assume that the persistence will be relational, which has been the case for perhaps the last 30 or 40 years.
NoSQL or No Schema
In terms of the reasons for that and the benefits of that, there's a lot of focus on SQL versus NoSQL, but really, it's about schema. And we've talked about the fact that NoSQL perhaps ought to have been called No Schema.
And particularly in the case of Hadoop versus analytic databases, you can see the benefits of schema-on-read in Hadoop versus schema-on-write in an analytic database. A traditional relational analytic database is fantastic at answering the questions you knew you wanted to ask when you modeled the data, but not so good at allowing change in that data, new applications, or new queries. Hadoop, on the other hand, is fantastic at enabling that flexibility: new queries can be brought to the existing data, and that data can be updated as it changes.
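To make the schema-on-write versus schema-on-read distinction concrete, here is a minimal sketch in Python. It is only an illustration, not how Hadoop or any particular analytic database works internally: an in-memory SQLite table stands in for the schema-on-write database, a list of raw JSON lines stands in for data kept in its original form, and the event fields (`customer`, `amount`, `coupon`, `device`) are invented for the example.

```python
import json
import sqlite3

# Hypothetical raw events; the field names are invented for illustration.
RAW_EVENTS = [
    '{"customer": "ann", "amount": 12.5}',
    '{"customer": "bob", "amount": 7.0, "coupon": "SPRING"}',
    '{"customer": "ann", "amount": 3.0, "device": "mobile"}',
]


def schema_on_write(raw_events):
    """Load events into a table whose columns were fixed up front."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
    for line in raw_events:
        record = json.loads(line)
        # Fields outside the modeled schema (coupon, device) are dropped here.
        conn.execute(
            "INSERT INTO purchases VALUES (?, ?)",
            (record["customer"], record["amount"]),
        )
    return conn


def total_by_customer(conn):
    """The question the table was modeled to answer."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM purchases "
        "GROUP BY customer ORDER BY customer"
    )
    return dict(rows.fetchall())


def schema_on_read(raw_events, field):
    """Keep raw records as-is and apply structure only when a query runs."""
    values = []
    for line in raw_events:
        record = json.loads(line)
        if field in record:
            values.append(record[field])
    return values


def can_query(conn, column):
    """Report whether the fixed schema can answer a question about `column`."""
    try:
        conn.execute(f"SELECT {column} FROM purchases")
        return True
    except sqlite3.OperationalError:
        return False
```

In this sketch, the table answers the modeled question (totals per customer) directly, but a new question about `coupon` cannot be asked until the table is remodeled and reloaded. The raw records can answer it immediately, at the cost of parsing and interpreting the data every time a query runs.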
Better Together: NoSQL and Hadoop
Now, it's important to think about what NoSQL and Hadoop are each good for, and indeed how they can be used together. NoSQL is primarily about random reads and writes, real-time interactive applications, and low, predictable latency.
In comparison, Hadoop is mostly for read-heavy environments. Initially, it was all about batch. Obviously, that's changed with the emergence of things like Spark, but batch is still in its roots, and it's really optimized for analytics.
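The two access patterns, and one way they might fit together, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `KeyValueStore` class, the `batch_totals` and `publish` functions, and the event tuples are all hypothetical stand-ins, with a plain dictionary playing the role of a NoSQL store and a full scan of a list playing the role of a batch job.

```python
from collections import defaultdict


class KeyValueStore:
    """Toy stand-in for a NoSQL store: point reads and writes by key."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key, default=None):
        return self._rows.get(key, default)


def batch_totals(event_log):
    """Toy stand-in for a batch job: scan every record, then aggregate."""
    totals = defaultdict(float)
    for customer, amount in event_log:
        totals[customer] += amount
    return dict(totals)


def publish(totals, store):
    """Push batch results into the store so an application can look them up."""
    for customer, total in totals.items():
        store.put(customer, total)
```

The batch function touches every record to produce an analytic result, while the store serves individual lookups by key. Publishing the batch output into the store is one simple way the two styles can complement each other in an interactive application.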
High Volume Storage and High Performance
Now, it's that combination of high-volume storage and high-performance interactive data processing that makes Hadoop and NoSQL together really useful for the emerging new breed of interactive applications. However, those interactive applications don't exist standalone. Larry Feinsmith from JPMorgan Chase implored the Hadoop community at Hadoop World in 2011 to ensure that these new data platforms actually worked with the existing tools and products that companies already have. And this is what we've seen as the next stage in the evolution of both Hadoop and NoSQL.