Accessing Data Using Custom Connectors
This video explores how applications connect to data sources and what that means for an embedded application.
Database access standards like the Structured Query Language (SQL) and Open Database Connectivity (ODBC) have been around a long time -- SQL since the early 1970s and ODBC since the early 1990s. And for as long as people have been querying data, reducing the time it takes to get answers back -- query latency -- has been a problem. As databases have changed and new types have emerged, solving the problem has become even more complex. Custom data source connectors are a solution.
I'm Ryan Haber. I work at Zoomdata with our APIs. I help document them, market and sell them, support customers with them, and build samples with them. I like to think of myself as our API ambassador.
In my previous videos, I talked about how you deploy your data using white labeling or iframes, or using an SDK to build your own application based on your data. In this video, I'm going to talk about where you get your data from.
Some Background on Data Access Languages
A little background will help. SQL, the Structured Query Language, was developed in the early 1970s, and ODBC, the Open Database Connectivity standard, was released in the early 1990s. The traditional process looks like this: you get your data from all your different sources, and you use a process called ETL -- extract, transform, and load -- to put it into a single database. At that point, your data visualization or data reporting software inserts itself into the process.
You would ask it, "What's the average age of all my male employees?" It would translate that into a SQL request -- something like "get the age of all employees where employee gender is male." The database would build a table and send all that data, with all the ages and maybe some other ancillary data too, back to the visualization software, which crunches away for five minutes or five hours and finally spits out a number: 32. The average age of your male employees is 32.
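The contrast between those two styles of query can be sketched in a few lines. This uses Python's built-in sqlite3 module and a made-up employees table, purely for illustration -- the table, names, and values are assumptions, not anything from a real deployment:

```python
import sqlite3

# Hypothetical employees table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "F", 41), ("Bob", "M", 30), ("Carl", "M", 34), ("Dan", "M", 32)],
)

# Traditional model: pull every matching row back to the client,
# then let the visualization tool do the averaging itself.
rows = conn.execute("SELECT age FROM employees WHERE gender = 'M'").fetchall()
ages = [age for (age,) in rows]
avg_client_side = sum(ages) / len(ages)

# Pushdown model: ask the database for the one number you want.
(avg_in_db,) = conn.execute(
    "SELECT AVG(age) FROM employees WHERE gender = 'M'"
).fetchone()

print(avg_client_side, avg_in_db)  # both 32.0
```

Either way the answer is 32, but in the first case every row travels back to the client; in the second, only a single number does.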
The Choke Point: Query Latency
So, there's a lag there, and the lag is connectivity. It's a choke point, and it always has been. We've been on a quest to widen that choke point -- to get more data through and make the whole thing move faster -- because nobody wants to wait five hours for their data. Lag also limits the utility of your data: instantaneous data is often, although not always, much more useful than data that's lagged six or 12 or 24 hours.
Software Architecture from the 1970s
Now, here's the thing: a lot of data visualization software still uses that same basic process, the same meta-architecture from the 1970s. Most of the people actually using the data aren't even that old. And yet they're still working with that traditional model.
Connecting to Databases
There are other problems, too. Databases are always coming into and going out of fashion, and the tried-and-true ones that stick around for 30 years stick around because they're evolving -- they keep up with new developments in the industry. Yet pretty much all the traditional data visualization applications I know of take a one-size-fits-all approach to connecting to databases: they request the data using SQL over an ODBC connection, they get the data, and then they do all the crunching themselves.
There's a security issue, too. If I'm asking for ages, that's not such a big deal. But say I'm asking for the number of distinct Social Security numbers, and the only way I can get it is by pulling a list of all the Social Security numbers -- and that list goes over the internet every time I make the request. I think we can see the problem. A lot of databases have evolved to mitigate this, but old-style data visualization software hasn't kept up with them. It's still making the old-style request: "Just give me all the Social Security numbers; I'll do the rest, don't worry about it."
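The Social Security number case makes the security difference concrete. In this sketch (again sqlite3 with an invented table and fake SSNs), the old style ships every sensitive value to the client, while the pushdown style keeps them inside the database and returns only a count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "111-11-1111"), ("Bob", "222-22-2222"),
     ("Carl", "222-22-2222")],  # duplicate SSN on purpose
)

# Old style: ship every SSN across the network, count client-side.
ssns = [s for (s,) in conn.execute("SELECT ssn FROM employees")]
distinct_client_side = len(set(ssns))

# Pushdown: the sensitive values never leave the database;
# only the count crosses the wire.
(distinct_in_db,) = conn.execute(
    "SELECT COUNT(DISTINCT ssn) FROM employees"
).fetchone()

print(distinct_client_side, distinct_in_db)  # 2 2
```

Both approaches answer "two distinct SSNs," but only the first one exposes the SSNs themselves in transit.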
Modern Databases Routinely Do Calculations
But the database might already be sitting there knowing how many unique Social Security numbers there are, because a lot of newer databases do this calculation and number crunching in the background 24 hours a day, as long as they're plugged in and running. When you add a new employee, the database takes the employee's age and folds it into the existing average. When you add a new Social Security number, it just increases the count of distinct Social Security numbers by one. So if you ask for the number of Social Security numbers among your employees, it simply tells you the number. It already knows it -- if your data visualization software is able to make that kind of query.
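The incremental bookkeeping described above can be sketched in a few lines. The class and method names here are illustrative -- this is not any particular database's internal API -- but the arithmetic is the real technique: fold each new record into a running average in constant time, and let the distinct count grow as values arrive:

```python
class RunningStats:
    """Toy model of a store that maintains aggregates as data arrives."""

    def __init__(self):
        self.count = 0
        self.avg_age = 0.0
        self.ssns = set()  # distinct SSNs seen so far

    def add_employee(self, age, ssn):
        # Fold the new age into the existing average in O(1):
        # new_avg = old_avg + (x - old_avg) / n
        self.count += 1
        self.avg_age += (age - self.avg_age) / self.count
        # The distinct count grows by one (or not) per new value.
        self.ssns.add(ssn)

    def distinct_ssn_count(self):
        return len(self.ssns)

stats = RunningStats()
stats.add_employee(30, "111-11-1111")
stats.add_employee(34, "222-22-2222")
stats.add_employee(32, "333-33-3333")
print(stats.avg_age, stats.distinct_ssn_count())  # 32.0 3
```

A query for the average or the distinct count now reads a stored number instead of rescanning every row -- which is why such a database can answer instantly.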
Many Databases. Many Operating Methods.
The difficulty, though, is that every one of these databases does things differently. We're living in a time of 10,000 different databases, and each one has its own way of doing things, so you can see why traditional data visualization software would stick with the tried-and-true model of just getting the raw data and doing its own crunching.
But there's another approach. What if you could take away that one-size-fits-all connection and replace it with a connection that's custom and smart for each of these different databases? If you're using an older version of, say, MySQL, you could use its older functions. If you're using a newer version, you could plug in a different connector and take advantage of the newer functions as well. And if you're using a NoSQL database, you don't have to worry about translating all of your SQL requests into some other protocol. You can keep doing what you've always done on the data visualization end and use the custom connector to bridge the gap. However the database wants to present the data -- whether it says, "I don't know the average age, but let me give you all the ages and you calculate," or, "Oh hey, I know the average age, I'll just tell you that, because that's really what you want" -- you can work with any of those scenarios, because you just plug in a different connector. And in the second case, the only thing you have to send over the internet is one number that's useless to any hacker or pirate in the world.
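The custom-connector idea amounts to one interface with per-database implementations behind it. This is a minimal sketch, not Zoomdata's actual connector SDK -- the class names and the dict-shaped "backends" are hypothetical stand-ins for real data sources:

```python
from abc import ABC, abstractmethod

class Connector(ABC):
    """The one query interface the visualization layer always speaks."""

    @abstractmethod
    def average_age(self):
        ...

class PushdownConnector(Connector):
    """For a backend that maintains the aggregate itself."""

    def __init__(self, backend):
        self.backend = backend

    def average_age(self):
        # Only one number ever crosses the wire.
        return self.backend["precomputed_avg_age"]

class FetchAllConnector(Connector):
    """For an older backend that can only hand back raw rows."""

    def __init__(self, backend):
        self.backend = backend

    def average_age(self):
        ages = self.backend["ages"]  # every row crosses the wire
        return sum(ages) / len(ages)

# The application code is identical either way; only the
# plugged-in connector changes.
modern = PushdownConnector({"precomputed_avg_age": 32.0})
legacy = FetchAllConnector({"ages": [30, 34, 32]})
print(modern.average_age(), legacy.average_age())  # 32.0 32.0
```

The visualization layer never cares which connector it's talking to; swapping databases means swapping one class, not rewriting the application.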
A Platform for Developing Data-Driven Software
A really good data visualization application -- one that goes the whole distance, not just visualizing data but really serving as a platform for developing your own data-driven software -- should have a connector development kit as well, so that either you or the vendor can develop a connector that's just right for your particular database deployment. This is especially important for very large organizations and agencies that have their own homegrown database -- one that doesn't follow the traditional models and isn't just a customized version of one of the newer, fancier ones. It still has specifications, and if you can build a connector customized to it, you can take full advantage of all its special abilities and magic tricks, and open up that choke point so you get what you need very quickly. I've seen it with our customers reduce lag time from -- and this isn't an exaggeration -- three or four hours down to seven or eight seconds.