High Volume Analytics

Securing Data in a Distributed Data Ecosystem

Watch this video to learn why the more aggressively organizations pursue data-driven business strategies, the more individuals need access to data. And hand in hand with access goes security. In a large organization with lots of data and lots of data sources, controlling who has access to what presents quite a challenge.

Allowing access to be controlled at the data source can cause problems with caching and memory management that degrade performance, especially in systems with many concurrent users. Bringing access authorization up to the BI layer can minimize performance issues while avoiding inadvertent security failures caused by users mixing data from multiple sources.


Access to data is one of the key areas under innovation and evolution today. Right now, there’s a lot of database tooling for handling data authorization: which tables you have access to, which columns within a table, which rows, and which column-row combinations. All of that can be a nightmare for anyone trying to administer those authorizations. In a typical organization of a thousand users or more, working out who has access to what in a table with N records and X attributes can be quite daunting.
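The sketch below shows one way to model table, column, row, and column-row grants in Python. The `Grant` class and `authorize` function are hypothetical illustrations and do not come from any particular database's API. A row is visible if at least one grant on its table admits it. The columns shown for that row are the union of the columns from every grant that admits it.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]

@dataclass(frozen=True)
class Grant:
    """Hypothetical grant: one table, a set of columns, and a row predicate."""
    table: str
    columns: frozenset[str]
    row_filter: Callable[[Row], bool] = lambda row: True  # default: every row

def authorize(grants: list[Grant], table: str, rows: list[Row]) -> list[Row]:
    """Return only the rows and columns of `table` that at least one grant allows."""
    visible: list[Row] = []
    for row in rows:
        # Column-row combinations: a grant contributes its columns only for rows it admits.
        allowed: set[str] = set()
        for grant in grants:
            if grant.table == table and grant.row_filter(row):
                allowed |= grant.columns
        if allowed:
            visible.append({k: v for k, v in row.items() if k in allowed})
    return visible

sales = [
    {"region": "EU", "rep": "ana", "amount": 120, "margin": 0.30},
    {"region": "US", "rep": "bo", "amount": 80, "margin": 0.20},
]
# An EU analyst sees region and amount on EU rows, plus margin on rows where rep == "ana".
eu_analyst = [
    Grant("sales", frozenset({"region", "amount"}), lambda r: r["region"] == "EU"),
    Grant("sales", frozenset({"margin"}), lambda r: r["rep"] == "ana"),
]
print(authorize(eu_analyst, "sales", sales))
# [{'region': 'EU', 'amount': 120, 'margin': 0.3}]
```

Even this toy model shows how quickly the combinations multiply. Two grants on one table already yield per-row column sets, and a thousand users each holding several grants across many tables is exactly the administrative burden described above.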

Simplifying Secure Data Access

So, there’s a lot of tooling now aimed at making that simple. You centralize authorization in the data store, and the data store provides the tooling to define and administer it. There’s a lot of great work being done on that. The reality, though, is that most enterprises don’t have all their data in one system, and the mechanisms for administering data authorization aren’t uniform from one system to the next.

So, if your organization doesn’t have all its data in one place, you’re left with the challenge of administering data authorizations across multiple different systems.

Start with Your BI Architecture

So, you need to take a different approach. Look at your end-to-end architecture, especially from a BI perspective, and decide at what level you’re going to tackle data authorization. You still want to handle identity appropriately, and you still want that identity to percolate all the way down to the data source. You may also be in a regulated environment. In that case you need to know who had access to what data, when they saw it, and what permissions they had when they saw it. You want that rich audit trail available. What you don’t want to do is hinder your ability to leverage economies of scale by putting data authorizations in the wrong place in your architecture.
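The Python sketch below shows one way to keep identity and an audit trail intact when authorization lives above the data source. It makes two assumptions. First, the BI layer knows the end user and tags each outgoing query with that identity in a SQL comment. This is a common attribution convention, not a requirement of any particular database. Second, the BI layer records who saw which dataset, when, and under which permissions. `AuditEvent`, `AuditLog`, and `tag_query` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEvent:
    user: str                     # end-user identity, not a shared service account
    source: str                   # data source that served the query
    dataset: str
    permissions: tuple[str, ...]  # policy entries in force when the data was viewed
    row_count: int
    viewed_at: str                # ISO-8601 UTC timestamp

def tag_query(sql: str, user: str) -> str:
    """Hypothetical convention: carry the end-user identity to the source as a SQL comment."""
    safe_user = user.replace("*/", "")  # stop the identity from closing the comment early
    return f"/* bi_user={safe_user} */ {sql}"

class AuditLog:
    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, user: str, source: str, dataset: str,
               permissions: list[str], row_count: int) -> AuditEvent:
        event = AuditEvent(user, source, dataset, tuple(sorted(permissions)),
                           row_count, datetime.now(timezone.utc).isoformat())
        self._events.append(event)
        return event

    def who_saw(self, dataset: str) -> list[str]:
        return sorted({e.user for e in self._events if e.dataset == dataset})

    def to_json_lines(self) -> str:
        # One JSON object per line, a common shape for shipping audit records.
        return "\n".join(json.dumps(asdict(e)) for e in self._events)

log = AuditLog()
print(tag_query("SELECT region, amount FROM sales", "ana"))
# /* bi_user=ana */ SELECT region, amount FROM sales
log.record("ana", "warehouse", "sales", ["sales:region=EU"], row_count=1)
log.record("bo", "crm", "sales", ["sales:region=US"], row_count=1)
print(log.who_saw("sales"))  # ['ana', 'bo']
```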

Data Authorization Schemes Affect Performance

So, what’s the problem with putting your data authorizations at the data source? From a performance perspective, you hinder the applications that access data on behalf of users. They lose the ability to cache effectively and to reuse cached results. Think about a thousand users accessing a system. If the system has no knowledge of how the data source is restricting data, it can only cache that data on a per-user basis. In a highly concurrent system, the in-memory footprint of whatever data you’re caching then grows user by user. That limits how many users you can support with a given set of hardware. It also inflates your IT costs, perhaps beyond the administrative expense of managing your data authorizations at an appropriate layer.
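A small Python simulation illustrates the per-user caching problem. It rests on illustrative assumptions: a thousand hypothetical users, each allowed to see one region; four rows of underlying data; and every user running the same query. `SourceSecuredCache` models a BI tool that cannot see the source's restrictions and therefore keys its cache by user. `BILayerSecuredCache` models a BI layer that fetches once under a service identity and applies the policy itself when serving each user. That connection model is an assumption too.

```python
from typing import Callable

Row = dict[str, object]

class SourceSecuredCache:
    """The source applies row security the BI tool cannot see, so entries are keyed per user."""
    def __init__(self, fetch: Callable[[str, str], list[Row]]) -> None:
        self.fetch = fetch  # fetch(user, query) -> rows the source lets this user see
        self.entries: dict[tuple[str, str], list[Row]] = {}
        self.source_calls = 0

    def get(self, user: str, query: str) -> list[Row]:
        key = (user, query)
        if key not in self.entries:
            self.source_calls += 1
            self.entries[key] = self.fetch(user, query)
        return self.entries[key]

class BILayerSecuredCache:
    """The BI layer owns the policy, so one cached copy per query serves every user."""
    def __init__(self, fetch: Callable[[str], list[Row]],
                 policy: Callable[[str, Row], bool]) -> None:
        self.fetch = fetch    # fetch(query) -> all rows, read under a service identity
        self.policy = policy  # policy(user, row) -> may this user see the row?
        self.entries: dict[str, list[Row]] = {}
        self.source_calls = 0

    def get(self, user: str, query: str) -> list[Row]:
        if query not in self.entries:
            self.source_calls += 1
            self.entries[query] = self.fetch(query)
        # Filter at read time; the cached copy itself is shared by all users.
        return [row for row in self.entries[query] if self.policy(user, row)]

DATA: list[Row] = [{"region": r, "amount": i} for i, r in enumerate(["EU", "US", "EU", "APAC"])]
USER_REGION = {f"user{i}": ["EU", "US", "APAC"][i % 3] for i in range(1000)}

def policy(user: str, row: Row) -> bool:
    return row["region"] == USER_REGION[user]

per_user = SourceSecuredCache(lambda user, q: [r for r in DATA if policy(user, r)])
shared = BILayerSecuredCache(lambda q: list(DATA), policy)

for user in USER_REGION:
    # Every user gets the same answer from either design.
    assert per_user.get(user, "q1") == shared.get(user, "q1")

def cached_rows(cache) -> int:
    return sum(len(rows) for rows in cache.entries.values())

print(per_user.source_calls, cached_rows(per_user))  # 1000 1334
print(shared.source_calls, cached_rows(shared))      # 1 4
```

Both designs return identical rows to every user. The per-user cache, however, makes 1,000 trips to the source and holds 1,334 cached rows, while the shared cache makes one trip and holds 4. The trade-off is that the shared design depends entirely on the BI layer's policy being correct, which is why identity and auditing still matter.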

Access Authorization at the BI Layer

So, on performance alone, it would be beneficial to bring data authorizations up into the BI layer, even if you’re looking at a single system. But take the case of multiple systems or data sources, each with its own way of defining data authorizations. Now you’ve lost centralized administration of data authorization, and you have to administer it across multiple different systems. That paradigm can even expose inadvertent security holes, because an authorization in one system can expose data that is restricted in another. If you have a platform that lets users mix, match, and mash up data into a uniform view, you could inadvertently give a user access to information they shouldn’t have.
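The mash-up hole can be reproduced with two toy sources, as in the Python sketch below. The HR and payroll data, the `mash_up` join, and the `unified_policy` rule are all hypothetical. Each source's own authorization is correct in isolation. Joining them on `emp_id`, however, reveals a salary the HR system deliberately withheld. Evaluating one policy on the combined view closes the gap.

```python
Row = dict[str, object]

# Source A (HR system): enforces "analysts never see executive salaries" by never returning salary.
HR: list[Row] = [
    {"emp_id": 1, "name": "Ana", "role": "engineer"},
    {"emp_id": 2, "name": "Bo", "role": "executive"},
]
# Source B (payroll system): has no notion of roles, so it grants the analyst every row.
PAYROLL: list[Row] = [
    {"emp_id": 1, "salary": 100_000},
    {"emp_id": 2, "salary": 400_000},
]

def mash_up(hr: list[Row], payroll: list[Row]) -> list[Row]:
    """Join the two sources on emp_id, as a self-service BI tool might."""
    by_id = {p["emp_id"]: p for p in payroll}
    return [{**h, **by_id[h["emp_id"]]} for h in hr if h["emp_id"] in by_id]

def unified_policy(user_roles: set[str], row: Row) -> Row:
    """Hypothetical BI-layer rule, evaluated on the combined row rather than per source."""
    if row.get("role") == "executive" and "hr_admin" not in user_roles:
        return {k: v for k, v in row.items() if k != "salary"}
    return row

combined = mash_up(HR, PAYROLL)
print(combined[1])
# {'emp_id': 2, 'name': 'Bo', 'role': 'executive', 'salary': 400000}
safe = [unified_policy({"analyst"}, row) for row in combined]
print(safe[1])
# {'emp_id': 2, 'name': 'Bo', 'role': 'executive'}
```

Neither source's administrator did anything wrong. The leak only exists in the joined view, so only a layer that sees the joined view can enforce the rule.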

Economies of Scale in User Concurrency

So, the data authorization policy really belongs in the tool that’s responsible for mixing and matching, the one that lets users explore and hop from dataset to dataset. Putting it there gives you economies of scale in managing user concurrency on a dataset, along with efficient caching mechanisms and strategies. It also spares you from thinking through every way you’d have to restrict data at each source so that no security vulnerability exists. In addition, it unifies the way you define those restrictions, because every DBMS platform or tool that restricts data at the source level has its own way of defining them.

So, when you bring authorizations up into a unified layer, you have a single way to define data authorizations across those different data sources. You can easily see and identify any vulnerabilities you may be exposing through mash-ups of the data or otherwise. You can rely on the BI layer to expose data appropriately to each user while still gaining economies of scale. It also reduces the load on your underlying systems and shrinks the hardware footprint needed for caching and the other performance improvements these systems provide.
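A single, source-independent policy might look like the Python sketch below. It rests on several assumptions. Columns from every source are classified with shared tags, here a hypothetical `pii` tag. Rules are written once against roles and tags rather than against each DBMS's native mechanism. Untagged columns are readable by default, which is a simplification a real deployment may not want. Because the policy lives in one place, a report of every sensitive column a role can reach across all sources takes only a few lines. `Rule`, `can_read`, and `exposure_report` are hypothetical names.

```python
from dataclasses import dataclass

Column = tuple[str, str, str]  # (source, table, column)

# Hypothetical shared classification covering columns from two different sources.
COLUMN_TAGS: dict[Column, str] = {
    ("warehouse", "customers", "email"): "pii",
    ("crm_export", "contacts", "email"): "pii",
    ("crm_export", "contacts", "phone"): "pii",
}

@dataclass(frozen=True)
class Rule:
    role: str
    tag: str
    allow: bool

# One policy, written once, regardless of which DBMS holds the column.
POLICY = [Rule("analyst", "pii", False), Rule("support", "pii", True)]

def can_read(role: str, column: Column) -> bool:
    tag = COLUMN_TAGS.get(column)
    if tag is None:
        return True  # assumption: untagged columns are open to every role
    matching = [r for r in POLICY if r.role == role and r.tag == tag]
    # Tagged columns are denied by default, and an explicit deny overrides any allow.
    return bool(matching) and all(r.allow for r in matching)

def exposure_report(role: str, catalog: list[Column]) -> list[Column]:
    """Every tagged (sensitive) column the role can read, across all sources."""
    return sorted(c for c in catalog if c in COLUMN_TAGS and can_read(role, c))

CATALOG: list[Column] = [
    ("warehouse", "customers", "id"),
    ("warehouse", "customers", "email"),
    ("crm_export", "contacts", "email"),
    ("crm_export", "contacts", "phone"),
]
print(exposure_report("analyst", CATALOG))  # []
print(exposure_report("support", CATALOG))
```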
