Picking the Right Data Store: Generic or Specialized?
In this video, you’ll find out how we got from the invention of magnetic storage and fast computers to today’s complex world of modern data platforms and streaming analytics.
There were a lot of steps along the way, starting with mainframes and hierarchical databases. Progress has occurred in every area: database design, query languages, scalable systems, and distributed architectures. Now open source projects like Hadoop and AI technologies like machine learning continue to advance the way organizations consume, store, and analyze data.
Now that we've moved into a world where we've decided to collect everything that happens, we have a lot of data. We no longer live in a world where all that data is well organized according to a schema and all of it is consistent, so that the sales of pencils aren't mixed up with people's favorite colors. In the past, it was all tightly organized. In the future, it's big chaos.
Types of Data
So, there are lots of different types of data. There's structured data, there's unstructured data, there's data that feels like documents, there's data that feels like tables of numbers, there's data that is video, there's data that is logs, there's data about relationships, and so on.
So, when it comes to picking the right data store to answer your analytic questions, one of the big questions is: are you going for something generic, are you going for something very specific, or is it in fact a mix of many different specific systems tied together to create the ultimate question-answering machine?
Specialized Data Stores
In terms of specialized data stores, there are many. There are specialized data stores for key-value data such as Cassandra, documents such as MongoDB, logs such as Splunk, and graphs such as Neo4j, as well as schemaless stores, relational stores, modern relational stores, NoSQL stores, and NewSQL stores. So, the question is: how do you pick?
Well, one thing you base the pick on is your data and the questions you want to ask. But the other is the complexity you're gonna be introducing by bringing too many different systems with too much flexibility to the table.
So, the answer really is you're gonna have to pick the right store for the right data, but you have to think about cost.
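As a rough illustration of that trade-off, here is a minimal Python sketch. The pairings of data kind to store simply reuse the examples named above. The function names, the "relational" fallback, and the idea of counting systems as a stand-in for complexity are assumptions made for this sketch, not a recommended selection method.

```python
# Illustrative mapping only: the pairings reuse the examples named in the talk.
STORE_FOR_KIND = {
    "key_value": "Cassandra",
    "document": "MongoDB",
    "log": "Splunk",
    "graph": "Neo4j",
}


def pick_stores(data_kinds, generic_store="relational"):
    """Return the set of stores needed to cover the given kinds of data.

    Kinds without a specialized entry fall back to one generic store
    (a hypothetical default chosen for this sketch).
    """
    return {STORE_FOR_KIND.get(kind, generic_store) for kind in data_kinds}


def operational_complexity(stores):
    """Crude proxy: every extra system is one more thing to run and integrate."""
    return len(stores)


stores = pick_stores(["document", "log", "graph", "table"])
print(sorted(stores), operational_complexity(stores))
```

The more kinds of data you route to their own specialized store, the larger the set becomes, and that growing count is the complexity and cost side of the decision.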
Cost Versus Capability
So for me, the crux of my story is it's all about cost versus capability. You could potentially design and build the most amazing data store that was perfect for your data types and the questions that you want to answer now. You could design the perfect library. Little do you know that somebody in 100 years' time is gonna invent carbon dating, and that's gonna be a lot of data stored in a different way, and it's gonna totally change the way you think about all of these bone samples you're examining.
So, with cost versus capability, there are many questions you need to consider. The first is: what is the length of your commitment? Do you want to buy some licenses from someone and run the same system for 10 years, or are you thinking, "I'm gonna do this for a month, and then I'm gonna try something else"? Am I in a business that's fairly stable, or am I in a business that's evolving very quickly?
Secondly, how quickly do I need this system to be up and running? Maybe I can build it very cheaply if I buy a bunch of old servers and run them under my desk. But that's probably gonna take a long time, whereas I could just go to a vendor who runs everything in the cloud, click a few menus, and have the data up and running.
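Those two questions, length of commitment and time to get running, can be put into a small back-of-the-envelope calculation. The Python sketch below is only an illustration: the `Option` class, the `cheaper` helper, and every number in it are made up for the example and do not come from any real vendor or price list.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    setup_cost: float        # one-off cost to get started (made-up figure)
    monthly_cost: float      # recurring cost per month (made-up figure)
    months_to_launch: float  # how long before the system is usable (made-up)

    def total_cost(self, commitment_months):
        """Total spend over the whole commitment."""
        return self.setup_cost + self.monthly_cost * commitment_months


# Hypothetical numbers, chosen only to show the shape of the trade-off.
under_desk = Option("old servers under my desk", 20_000, 500, 3)
cloud = Option("cloud vendor", 0, 2_000, 0)


def cheaper(a, b, commitment_months):
    """Return whichever option costs less over the given commitment."""
    return min((a, b), key=lambda option: option.total_cost(commitment_months))


for months in (1, 120):
    best = cheaper(under_desk, cloud, months)
    print(f"{months} months: {best.name}, ready after {best.months_to_launch} months")
```

With these invented figures, the cloud option wins for a one-month experiment and the self-hosted option wins over 10 years, but the self-hosted option also takes longer before it answers any questions at all.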