Legacy Storage No More: Navigating Data in the Digital World
Finding information today isn’t the simple endeavor it was when we kept all of our knowledge in libraries. Nowadays, we keep information on digital media: hard drives, the cloud, slow drives, and memory. Most of us are aiming to answer analytical questions, but first we need to decide how to store the data that’ll get us our answers. If this is you, be prepared to consider cost, data types, and the questions to be asked.
Cost Considerations and Defining Your Questions
Start by asking yourself the cost of the storage you need. Storage is relatively cheap today, but it can add up depending on how much data you have and the type of storage you use.
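To see how quickly storage costs add up, here is a minimal sketch that estimates a monthly bill by storage type. The tier names mirror the media mentioned above, and every price is a made-up placeholder (an assumption, not a vendor quote), so swap in real numbers before drawing conclusions.

```python
# Hypothetical prices in US dollars per GB per month.
# These are illustrative assumptions only; replace them with real quotes.
PRICE_PER_GB_MONTH = {
    "memory": 3.00,
    "hard_drive": 0.10,
    "cloud": 0.02,
    "slow_drive": 0.004,
}


def monthly_storage_cost(gigabytes: float, tier: str) -> float:
    """Return the estimated monthly cost of keeping `gigabytes` on `tier`."""
    return round(gigabytes * PRICE_PER_GB_MONTH[tier], 2)


if __name__ == "__main__":
    # Compare the same 5 TB (5,000 GB) data set across every tier.
    for tier in PRICE_PER_GB_MONTH:
        print(f"{tier:>10}: ${monthly_storage_cost(5_000, tier):,.2f} per month")
```

Running the same data volume through each tier makes the trade-off visible: the faster the storage, the more each gigabyte costs.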
You’re going to have to choose analytical software too, but not before you figure out the kinds of questions being asked and who’ll be asking them.
- Is it a scientific or technical researcher who has data science libraries and advanced software, and who uses machine learning to answer difficult questions?
- Is it ordinary business users who have simple questions they want simple answers to in the form of graphics?
- Is it computers? Is it algorithms that are mining your data, searching for answers and outliers, and helping you solve problems?
Query Engines vs. Preconfigured Systems
The latest generations of query engines are designed for distributed systems, data stored in the cloud, and different data types. They're not just built for relational data. Impala, Presto, and Drill are all examples, and they're all very powerful, yet very complicated. Setting them up, understanding them, and configuring them requires a lot of expertise. If you don’t have that expertise available to you, you may want to consider a preconfigured system.
If that’s the case, you can have someone configure a system and run workloads for you, whether it's Amazon Redshift, Amazon Athena, Snowflake, or BigQuery, where most of the heavy lifting is done. You simply upload your data, pose your questions, and set a few configuration parameters.
At the end of the day, your BI software decision comes down to a few questions:
- DIY, or outsource?
- Open source, or no?
- Prefer a cloud solution?
- Would you rather pay to store the data or pay per transaction?
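The last question, paying to store versus paying per transaction, comes down to simple arithmetic once you know your workload. The sketch below compares a flat monthly fee with a per-transaction price based on the amount of data each query scans. The function names and all prices are hypothetical assumptions for illustration, not any vendor's actual pricing model.

```python
def pay_per_transaction_cost(queries: int, tb_scanned_per_query: float,
                             price_per_tb: float) -> float:
    """Total monthly cost when every query is billed by the data it scans."""
    return queries * tb_scanned_per_query * price_per_tb


def break_even_queries(tb_scanned_per_query: float, price_per_tb: float,
                       flat_monthly_fee: float) -> float:
    """Number of monthly queries at which both pricing models cost the same."""
    return flat_monthly_fee / (tb_scanned_per_query * price_per_tb)


def cheaper_option(queries: int, tb_scanned_per_query: float,
                   price_per_tb: float, flat_monthly_fee: float) -> str:
    """Name the cheaper model for the given workload (ties go to flat rate)."""
    per_transaction_total = pay_per_transaction_cost(
        queries, tb_scanned_per_query, price_per_tb
    )
    if per_transaction_total < flat_monthly_fee:
        return "pay per transaction"
    return "flat rate"


if __name__ == "__main__":
    # Hypothetical workload: each query scans 0.5 TB at $4.00 per TB,
    # compared against a $1,000 flat monthly fee.
    print(break_even_queries(0.5, 4.0, 1000.0))
    print(cheaper_option(100, 0.5, 4.0, 1000.0))
    print(cheaper_option(1000, 0.5, 4.0, 1000.0))
```

Light or occasional workloads tend to favor paying per transaction, while heavy, steady querying tends to favor a flat fee. The break-even point tells you roughly where your own usage flips from one to the other.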
Remember Scalability and Security
Whatever you choose, remember that what works for you now should still work in the future; otherwise you may have to totally upend what you’re doing as business needs change. Choose a solution that will scale with you down the line. Security’s key, too. It’s a complex endeavor that depends on what you’re storing and who should have access, so evaluate carefully.
Need Some Help Figuring out Your Data Store and Big Data Platform?
We get it, there’s a lot to this, and we’re here for you if you need help figuring it out.
First off, we created an eBook for you that goes into way more detail. It’s kind of a fairy tale and it has cool pictures, so you should check it out for sure: Darwin Goes to the Library – Selecting the Right Data Store.