Modern Analytics Trends: Move the Compute Power Not the Data
This video explains why the trend in modern analytics is to move computing power to the data and not the other way around.
In the past, we had to move data from the source where it originated to the place where it would be processed and analyzed. For example, LambdaRail was a high-speed network designed for moving large data files over long distances. GridFTP was developed for the same purpose. Sometimes moving data was as simple as getting in your car with a hard drive. Not anymore. Now we bring the compute power to where the data resides.
My name is Mike McCarty. I'm a senior software engineer focused on big data applications and visualizations.
LambdaRail -- Moving Data From the Source
In the past, we've had to move data from the sources where it's produced to the computing facilities. We used to have a number of different methods for doing this across organizations, even spanning the continent. So, for example, LambdaRail was an old high-speed network, and you could get time on LambdaRail to transfer your large data files from one coast to the other. And you'd get all the bandwidth during that time to transfer massive data sets.
Also, GridFTP, for example, was developed for transferring these data sets. If the session timed out for whatever reason, you could pick it back up later and continue the transfer. We even resorted to shipping hard drives. I've literally driven hard drives in my car to transfer data from the source where it was produced to the computing facility where it was going to be processed.
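To make the "pick it back up later" idea concrete, here is a minimal sketch of a resumable copy in Python. It is not GridFTP's actual protocol or API. It only illustrates the general technique of continuing from the byte offset the destination already holds. The function name `resume_copy` and the `max_bytes` parameter (used here to simulate a session that times out) are hypothetical names made up for this illustration.

```python
import os
import tempfile


def resume_copy(src_path, dst_path, chunk_size=1024 * 1024, max_bytes=None):
    """Copy src to dst, continuing from the bytes dst already holds.

    max_bytes is a hypothetical knob that stops the copy early,
    standing in for a session that times out mid-transfer.
    Returns the total number of bytes now present in dst.
    """
    # The size of the partial destination file is the resume offset.
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    sent = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)  # skip what was already transferred
        while max_bytes is None or sent < max_bytes:
            want = chunk_size if max_bytes is None else min(chunk_size, max_bytes - sent)
            chunk = src.read(want)
            if not chunk:  # reached the end of the source
                break
            dst.write(chunk)
            sent += len(chunk)
    return offset + sent


# Demo: an interrupted transfer followed by a resumed one.
with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "source.bin")
    dst_path = os.path.join(workdir, "dest.bin")
    payload = bytes(range(256)) * 40  # 10,240 bytes of sample data
    with open(src_path, "wb") as f:
        f.write(payload)

    first = resume_copy(src_path, dst_path, chunk_size=1024, max_bytes=4096)
    second = resume_copy(src_path, dst_path, chunk_size=1024)
    with open(dst_path, "rb") as f:
        copied = f.read()
```

The second call does not start over. It appends to the partial file from the point where the first call stopped, which is the property that matters when a transfer of a massive data set is interrupted.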
Bring Computing to the Data in Spark and Hadoop
So, today, it's a bit different. We're now bringing the compute power to where the data lives. That's one of the great things about having your data in the cloud: you can use platforms like Hadoop and Spark to process the data where it lives, so we don't have to move it. Traditional BI tools aren't really going to scale to the data volumes that we're talking about here. So, in my opinion, organizations need to focus on building the next-generation BI tools and deploying them on top of these new computing environments in order to scale.
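The difference between the two approaches can be sketched in a few lines of plain Python. This is a toy simulation under stated assumptions, not Spark or Hadoop code: the `Node` class, the node names, and the sample records are all hypothetical. It only shows the pattern those platforms rely on, which is to send a small function to each partition and bring back only a small result.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Node:
    """A hypothetical storage node holding one partition of the data set."""
    name: str
    records: List[int]

    def run_local(self, task: Callable[[List[int]], Tuple[int, int]]) -> Tuple[int, int]:
        # The task runs where the records live; only its small result leaves the node.
        return task(self.records)


def partial_sum_and_count(records: List[int]) -> Tuple[int, int]:
    """Per-partition work: reduce many records to two numbers."""
    return sum(records), len(records)


def mean_by_moving_compute(nodes: List[Node]) -> float:
    # New approach: ship the function to each node, collect tiny partial results.
    partials = [node.run_local(partial_sum_and_count) for node in nodes]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


def mean_by_moving_data(nodes: List[Node]) -> float:
    # Old approach: copy every record to one central place, then compute.
    central = [record for node in nodes for record in node.records]
    return sum(central) / len(central)


# Hypothetical partitions spread across three locations.
nodes = [
    Node("east", [1, 2, 3, 4]),
    Node("west", [10, 20]),
    Node("central", [5]),
]

mean_compute = mean_by_moving_compute(nodes)
mean_data = mean_by_moving_data(nodes)
```

Both functions return the same mean. The difference is what crosses the network: the first moves one `(sum, count)` pair per node regardless of partition size, while the second moves every record.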