Watch this video to learn why self-service and data governance have to go together, and how it’s possible to balance the two.
Users love self-service data discovery and analytics tools. And the term “self-service analytics” is thrown around as if it represents a simple concept that’s easy to implement with no downside. Unfortunately, for IT it’s not that simple. Self-service tools remove one burden from IT but replace it with another that’s just as complex: controlling and cleaning up the chaos that self-service tools can create. Likewise, tightly governed tools need to expand their self-service capabilities.
Hi, everyone. I want to talk to you about another important characteristic of a big data analytics tool: its ability to support governed self-service. See, traditional BI and analytics tools have all been focused on governance, really controlling and restricting what users can access based on controls managed by the IT department. And, in fact, a semantic layer is a great advancement for governed analytics, because in most companies users are completely dependent on IT for custom reports and datasets. So, a semantic layer that provides a business view of back-end resources is a real enhancement. But for many users, especially power users, even a semantic layer is kind of confining. They bump up against the sides of that semantic layer. They want more data. They want a different granularity. They want new dimensions. They want to create their own analytics and metrics, and the tools don’t support that.
Visual Data Discovery Tools
So, in recent years we’ve seen the advent of visual discovery tools, or visual analysis tools, that allow users to get what they want, how they want it, and when they want it, without having to rely on IT. Of course, the downside of those tools is that when users can do whatever they want willy-nilly, it sometimes creates chaos in an organization: a lot of spreadmarts (spreadsheets on steroids), everyone talking, everyone publishing data, but no one communicating, because there’s no consistency in the information and the analytics.
Combining Data Governance with Self-Service BI
So, what we really need in a big data analytics environment is a combination of the two: governance plus self-service. If we look at self-service tools first, a lot of them have been trying to add more governance without undermining the speed and agility that users crave from these tools.
So, one move is to go to a completely thin-client environment. With a thin client, users are less apt to get in trouble by creating reports and dashboards on their desktops that only they can access and that they can publish without any kind of oversight from an administrator. Another is to create a server environment that allows users to publish content, with authorization, to a number of other users, who can then reuse those workbooks or shared views as a starting point for creating other analytics.
And, finally, a third is to create a more granular permissioning structure that controls which BI, analytical, and publishing functionality users are authorized to use, as well as which data they can access on a row and column basis. Now, on the governance side, there are a lot of things traditional tools are doing to improve the self-service capabilities of their environments.
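The granular permissioning structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor’s implementation: the `Permission` class, its fields, the feature names `explore` and `publish`, and the sample sales data are all hypothetical. Each grant combines a feature allow-list (which functionality a user may use), a row filter (row-level security), and a column allow-list (column-level security).

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Permission:
    """Hypothetical per-user grant covering features, rows, and columns."""
    features: frozenset[str]           # e.g. {"explore", "publish"}
    row_filter: Callable[[Row], bool]  # row-level security predicate
    columns: frozenset[str]            # column-level security allow-list


def can_use(perm: Permission, feature: str) -> bool:
    """Feature-level check: may the user run this BI or publishing function?"""
    return feature in perm.features


def secure_view(rows: list[Row], perm: Permission) -> list[Row]:
    """Return only the rows and columns the user is authorized to see."""
    return [
        {col: val for col, val in row.items() if col in perm.columns}
        for row in rows
        if perm.row_filter(row)
    ]


sales = [
    {"region": "East", "rep": "Ana", "revenue": 120, "margin": 0.31},
    {"region": "West", "rep": "Raj", "revenue": 95, "margin": 0.27},
    {"region": "East", "rep": "Li", "revenue": 80, "margin": 0.22},
]

# Hypothetical analyst: may explore but not publish, sees only East rows,
# and never sees the margin column.
east_analyst = Permission(
    features=frozenset({"explore"}),
    row_filter=lambda r: r["region"] == "East",
    columns=frozenset({"region", "rep", "revenue"}),
)

print(secure_view(sales, east_analyst))
# [{'region': 'East', 'rep': 'Ana', 'revenue': 120}, {'region': 'East', 'rep': 'Li', 'revenue': 80}]
print(can_use(east_analyst, "publish"))  # False
```

Keeping the three checks separate mirrors the distinction in the talk: what users can do with the tool, which rows they can see, and which columns they can see are governed independently.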
One is that they’re adopting visual discovery tools as an add-on to their environments, and sometimes as the flagship of their environments. Second, they’re giving users analytical sandboxes, kind of carving out a space inside their environment where users can access data, combine it with what’s been corporately approved and designed, and do their analysis in that little sandbox. And if they have the authority and permission, they can then share and publish that work more broadly.
And some are also creating more flexible semantic layers that allow a number of users, particularly power users, to go in and adjust the semantic layer and then resolve any differences or conflicts through a repository such as GitHub.
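Resolving power users’ edits to a shared semantic layer through a repository works much like a Git-style three-way merge. The sketch below assumes, hypothetically, that a semantic layer is just a mapping from metric names to SQL expressions; real semantic layers are richer, and real repositories merge files rather than dictionaries. A change made by only one user is accepted automatically, while a metric that both users changed in different ways is flagged for review.

```python
Definitions = dict[str, str]  # hypothetical: metric name -> SQL expression


def three_way_merge(
    base: Definitions, ours: Definitions, theirs: Definitions
) -> tuple[Definitions, list[str]]:
    """Merge two users' edits to a shared base; return (merged, conflicts)."""
    merged: Definitions = {}
    conflicts: list[str] = []
    for name in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree (or both deleted the metric)
            result = o
        elif o == b:      # only "theirs" changed this metric
            result = t
        elif t == b:      # only "ours" changed this metric
            result = o
        else:             # both changed it differently: a reviewer must decide
            conflicts.append(name)
            result = b    # keep the base definition, if any, until resolved
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"revenue": "SUM(amount)", "orders": "COUNT(order_id)"}
alice = {
    "revenue": "SUM(amount) - SUM(refunds)",
    "orders": "COUNT(order_id)",
    "avg_order_value": "SUM(amount) / COUNT(order_id)",
}
bob = {"revenue": "SUM(net_amount)", "orders": "COUNT(DISTINCT order_id)"}

merged, conflicts = three_way_merge(base, alice, bob)
print(conflicts)  # ['revenue']
```

Here Alice’s new `avg_order_value` metric and Bob’s change to `orders` merge cleanly, while `revenue` stays at its base definition and is flagged, which is where a reviewer would step in to resolve the conflict.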
Governed, Self-Service Workflows
Now, the ideal is to have self-service workflows that operate in a governed manner. So, let me give you a picture of what that looks like. This is from a report I recently wrote called A Reference Architecture For Self-Service Analytics. As you can see here, we’ve got a left-to-right workflow that runs from curate, to create, to consume, and then a right-to-left workflow to propose, prototype, and promote artifacts for broad-based consumption. In the left-to-right workflow, which ideally runs on a single platform, it used to be that IT did all the curation and all the creation, and all business users did was consume. But you can see here that power users are now taking a more central role in this workflow. They not only consume, but they also do a large portion of the creation, as well as some of the curation, using a triumvirate of tools: visual discovery tools, data prep tools, and data cataloguing tools.
Now, to take what they’ve created, we need a mechanism, a kind of backflow or right-to-left workflow, that promotes their work so that a team of people, not necessarily IT, can review and approve it and then add it to the curated environment that’s safe for everyone to consume. A lot of companies actually add watermarks to the artifacts produced by the curated environment to distinguish them from things produced in an ad hoc manner. Both are important and both are necessary, but it’s equally important for users to understand what kind of data they’re looking at. Has it been curated? Is it using the standard metrics and dimensions that the company has authorized, or is it some new insight based on new metrics and new views that have yet to be validated but may someday be?
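The right-to-left promotion flow and the watermarking practice can be modeled as a small state machine. This is a hypothetical sketch: the stage names, the `Artifact` class, and the rule that an owner cannot approve their own artifact are assumptions for illustration, not details taken from the reference architecture. Anything not yet certified carries an ad hoc watermark, so users can tell curated content from unvalidated insight.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    PROTOTYPE = "prototype"  # built ad hoc by a power user in a sandbox
    PROPOSED = "proposed"    # submitted to the review team
    CERTIFIED = "certified"  # approved and promoted to the curated environment


@dataclass
class Artifact:
    name: str
    owner: str
    stage: Stage = Stage.PROTOTYPE

    def propose(self) -> None:
        """Submit a prototype for review (the start of the right-to-left flow)."""
        if self.stage is not Stage.PROTOTYPE:
            raise ValueError(f"{self.name} is already {self.stage.value}")
        self.stage = Stage.PROPOSED

    def review(self, reviewer: str, approved: bool) -> None:
        """Approve (promote) or reject (send back) a proposed artifact."""
        if self.stage is not Stage.PROPOSED:
            raise ValueError(f"{self.name} has not been proposed")
        if reviewer == self.owner:  # assumed separation-of-duties rule
            raise PermissionError("owners cannot approve their own work")
        self.stage = Stage.CERTIFIED if approved else Stage.PROTOTYPE

    @property
    def watermark(self) -> str:
        """Label shown to consumers so curated and ad hoc content differ."""
        if self.stage is Stage.CERTIFIED:
            return "CERTIFIED"
        return "AD HOC: not yet validated"


dashboard = Artifact("churn_dashboard", owner="maria")
print(dashboard.watermark)  # AD HOC: not yet validated
dashboard.propose()
dashboard.review(reviewer="data_governance_team", approved=True)
print(dashboard.watermark)  # CERTIFIED
```

A rejected artifact drops back to the prototype stage rather than disappearing, which reflects the idea that ad hoc work is still legitimate; it simply stays labeled as unvalidated.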
So, I hope this provides some insight into how you can provide both self-service and governance. It’s a real challenge for many organizations, and getting it right is difficult. It involves good technology, good organization, and good process.