Watch this video to learn the importance of data structures and their relationships to how we use data.
From a data standpoint, for example, business intelligence is about mapping things to tables. And when you think of a relational database, you think in terms of sets. Data is either in the set or it's not in the set. But there is a structural form that is natural to any kind of data. Form describes a collection of data. And not everything can be described in tables and sets. You may have the same data in three different forms, but each form requires a different processing engine because what you want to do with the data is different.
Watch this video to find out how the differences between streaming data and database data change how data is managed.
For a long time, enterprise data and database data were synonymous. For example, an application writes transactions into a database. When we want to use the data, we read it out of the database. Or we extract it and put it in another database for business intelligence purposes. The model is very different for streaming data. For one thing, there’s no place to retrieve your data from. It’s a push model rather than a pull model. And that creates a lot of changes to the architecture you use to collect and persist that data.
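The pull-versus-push difference can be sketched in a few lines of Python. This is a minimal illustration, not any product's API: the `Database` and `Stream` classes, their methods, and the sample orders are all hypothetical names invented for the example.

```python
from typing import Callable, Dict, List


class Database:
    """Pull model: data sits in a store until a reader asks for it."""

    def __init__(self) -> None:
        self._rows: List[Dict] = []

    def write(self, row: Dict) -> None:
        self._rows.append(row)

    def query(self, predicate: Callable[[Dict], bool]) -> List[Dict]:
        # The reader decides when to fetch and what to fetch.
        return [row for row in self._rows if predicate(row)]


class Stream:
    """Push model: each event is handed to subscribers as it arrives."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Dict], None]] = []

    def subscribe(self, handler: Callable[[Dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Dict) -> None:
        # Nothing is kept here; a subscriber must persist the event itself.
        for handler in self._subscribers:
            handler(event)


# Pull: write first, read later.
db = Database()
db.write({"order": 1, "amount": 40})
db.write({"order": 2, "amount": 75})
large_orders = db.query(lambda row: row["amount"] > 50)

# Push: the consumer registers up front and receives events as they happen.
stream = Stream()
persisted: List[Dict] = []
stream.subscribe(persisted.append)
stream.publish({"order": 3, "amount": 90})
```

In the push case, the subscriber has to be in place before the event is published, which is one reason collection and persistence need a different architecture.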
In this video, you’ll see how many data platforms of the past have returned to fit new applications, and why it helps to understand the historical progression of data platforms.
When we say database today, we typically mean an SQL relational database. But there were databases before the relational database. Quite a few, in fact. When the relational model emerged, it was adopted in large part because it had the optimal set of tradeoffs for the use cases of the time (and every data model has tradeoffs). Many of the pre-relational data models have returned in reinvented forms like NoSQL to serve different applications and application models.
Watch this video to find out the tradeoffs between speed and scalability that all analytical tools must balance.
For big data analytics, some tools emphasize speed. Others place a priority on scalability. Tools that focus on speed typically use an "extract and query"
Watch this video to learn why providing ad hoc data access for business users is a great idea -- in theory. But providing it on a budget of limited resources is a balancing act that can be hard to pull off.
While adopting horizontally scalable data storage with a BI front end gives IT the infrastructure to handle increasing user loads, it doesn’t necessarily solve other issues.
For example, in high-variety, big data environments, users need a self-service way to explore data without knowing the precise query they want to answer. But when many concurrent users explore data in this way, it can degrade response time.
In this video, you’ll learn that the root cause of data analytics performance issues can often be traced to differences between traditional business intelligence/data warehousing (BIDW) environments and today’s distributed computing environments.
In terms of data modeling, what worked for BIDW won’t fly in the distributed compute world. For one thing, network throughput wasn’t much of an issue with BIDW. But it’s very important when compute resources and data stores reside on different machines strung together via a network.
In a distributed scenario, it makes sense to denormalize data because every join operation generates heavy network traffic, and it also pays to focus on query optimization.
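A small Python sketch can illustrate the idea. It assumes two hypothetical tables, `orders` and `customers`, held in memory. In a real cluster the lookup marked as the join could cross the network for every row, which this toy example does not simulate.

```python
# Hypothetical normalized tables: orders reference customers by id.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 40},
    {"order_id": 2, "customer_id": 11, "amount": 75},
    {"order_id": 3, "customer_id": 10, "amount": 10},
]
customers = {
    10: {"name": "Acme", "region": "East"},
    11: {"name": "Globex", "region": "West"},
}


def revenue_by_region_normalized(orders, customers):
    """Join at query time: every order needs a customer lookup."""
    totals = {}
    for order in orders:
        region = customers[order["customer_id"]]["region"]  # the join
        totals[region] = totals.get(region, 0) + order["amount"]
    return totals


# Denormalize once, at load time: copy customer attributes into each order.
wide_orders = [{**order, **customers[order["customer_id"]]} for order in orders]


def revenue_by_region_denormalized(wide_orders):
    """No join at query time: each row already carries its region."""
    totals = {}
    for row in wide_orders:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals
```

Both functions return the same totals. The denormalized version pays for the join once and stores redundant customer data in exchange.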
It’s great to be able to work with large scale data in distributed systems. But this video explains why the more data you have, the more important it is to optimize data stores for queries at scale. Appropriate partitioning schemes can help although every scheme has inherent limitations.
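As a rough illustration of both the benefit and the limitation, the Python sketch below partitions a handful of made-up events by day. The field names, dates, and helper functions are hypothetical. The second return value counts the rows each query had to scan.

```python
from collections import defaultdict

# Hypothetical click events.
events = [
    {"day": "2017-01-01", "user": "a", "clicks": 3},
    {"day": "2017-01-01", "user": "b", "clicks": 5},
    {"day": "2017-01-02", "user": "a", "clicks": 2},
    {"day": "2017-01-03", "user": "c", "clicks": 7},
]

# Partition by day: each key stands in for a separate directory or file.
partitions = defaultdict(list)
for event in events:
    partitions[event["day"]].append(event)


def clicks_on(day):
    """Filter on the partition key: only one partition is read."""
    rows = partitions.get(day, [])
    return sum(row["clicks"] for row in rows), len(rows)


def clicks_by(user):
    """Filter on another field: every partition must be scanned."""
    total = 0
    scanned = 0
    for rows in partitions.values():
        for row in rows:
            scanned += 1
            if row["user"] == user:
                total += row["clicks"]
    return total, scanned
```

A query by day touches two of the four rows, while a query by user touches all four. The scheme only helps queries that filter on the key it was built around.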
Likewise, data layout can substantially reduce seek times for your analytic systems. Hadoop and SQL-on-Hadoop will accommodate a variety of layout options beyond the traditional row-oriented layout. Columnar formats like Parquet are more suitable for denormalized data than the row-oriented formats used in relational databases. Parquet functions effectively regardless of the data processing framework, data model, or programming language.
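The difference between the two layouts can be shown with plain Python lists. This is a conceptual sketch only. It does not reproduce Parquet's actual file format, and the sample records are invented.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"user": "a", "country": "US", "clicks": 3},
    {"user": "b", "country": "DE", "clicks": 5},
    {"user": "c", "country": "US", "clicks": 2},
]

# Column-oriented: each field's values are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one field walks every record in the row layout...
total_from_rows = sum(row["clicks"] for row in rows)

# ...but reads a single list in the column layout.
total_from_columns = sum(columns["clicks"])
```

Both totals are the same. The column layout simply lets an analytic query skip the fields it does not need.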
Watch this video to learn why the more aggressively organizations pursue data-driven business strategies, the more individuals need access to data. And, hand-in-hand with access goes security. In a large organization with lots of data and lots of data sources, controlling who has access to what presents quite a challenge.
Allowing access to be controlled at the data source can cause problems with caching and memory management that degrade performance, especially in systems with many concurrent users. Bringing access authorization up to the BI layer can minimize performance issues while avoiding inadvertent security failures caused by users mixing data from multiple sources.
Watch this video to find out how cloud managed services can be used to query large amounts of data at low cost.
A good example is the use of Amazon EMR, S3, and Hive with Facebook’s Presto. Presto was designed for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook. Facebook uses Presto for interactive queries against several internal data stores, including a 300PB data warehouse. Airbnb and Dropbox also use Presto.
In this video, you’ll learn that there are several benefits to using cloud managed services. The Amazon EMR, S3, and Hive configuration with Presto enables clusters of varying sizes to be deployed rapidly with very little data duplication.
Watch this video to get a quick overview of how to get the most out of cloud managed services architecture with Presto. You’ll want to consider the granularity of your queries. Using partitions and WHERE clauses, you can manage the number of nodes in your cluster.
This video will briefly explain the factors that may determine your method of compression. Of course, compression is a common way to store more data at lower cost. But the right compression method will depend on your use case and how your data is structured. There are several considerations for using GZIP and Snappy with S3 and Presto.
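One such consideration, the tradeoff between CPU time and file size, can be tried out with Python's standard `gzip` module. The sample data below is hypothetical. Snappy is left out because it is not part of the Python standard library.

```python
import gzip

# Hypothetical sample: repetitive, log-style rows of the kind that compress well.
raw = b"2017-01-01,US,click\n" * 1000

# Lower levels spend less CPU time; higher levels usually yield smaller output.
fast = gzip.compress(raw, compresslevel=1)
small = gzip.compress(raw, compresslevel=9)

# Compression is lossless: decompressing returns the original bytes.
restored = gzip.decompress(small)

print(len(raw), len(fast), len(small))
```

How much you save depends heavily on how repetitive the data is, so it is worth measuring on a sample of your own files.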
Watch this video to learn the five reasons why a cloud architecture with Presto, S3 and Hive is the most cost efficient across industry use cases regardless of your data definition language.
In this video, you’ll see the steps required to set up a Presto/S3-based cloud architecture. This walkthrough shows how easy it is to configure and stand up a Presto cluster and start querying data.
Watch this video to learn when the cloud might not be the best solution for your company.
Many companies have personally identifiable information (PII) that they need to protect, especially in tightly regulated industries like financial services and healthcare. Companies may decide to house that information behind the corporate firewall. On the other hand, putting operational data in the cloud makes a lot of sense. The cloud can scale to support huge data volumes more quickly and cost effectively than any on-premise solution. With the cloud, it’s not all or nothing. It’s what makes the most sense for your business.
In this video, you’ll learn how early misgivings about cloud data security have gradually subsided.
With any new technology, there are risks. The cloud is no exception. Early on, people were concerned about the security of their data in the cloud. Were cloud providers seriously investing in data security? Because when you put data in the cloud, you’re giving up some control. But when the C2S contract was awarded to Amazon, it went a long way toward putting a stamp of approval on cloud security. After all, if cloud security is good enough for the CIA, it should be good enough for businesses in the private sector.
In this video, we explore how the software procurement model has changed thanks to the cloud.
It used to be that software procurement was a long and involved process. When you were done, you had a software license. And you were stuck with it for three to five years. The cloud has blown that model out of the water. Now someone from IT can test software for a couple of hours or days. They can “buy by the drink.” It’s a consumption-based model, not a procurement-based one. That means, among other things, no long support contracts -- and the option to shift costs from CapEx to OpEx. An important question to consider when navigating a consumption model is who owns the data.
Watch this video to learn how the cloud has increased the number and variety of applications designed to solve business problems.
One of the things that many of these applications have in common is their dependence on the real-time availability of data via the cloud. In the past, when data resided in data warehouses, it was only available on a periodic basis. But with the scalability of the cloud, users can get data on demand. So real-time data analysis is a practical use case for big data analytics in areas such as the IoT, help desks, and mobile ad tracking.
Watch this video to learn how the evolution of the cloud is similar to other large-scale technology innovations of the past.
Generational or seismic shifts in technology happen periodically. For example, decades ago it was the switch from mainframes to client-server computing. We’re in the midst of such a change now. The cloud is a response to large problems that require huge amounts of data, too much data for organizations to house on premise. The on-premise model is simply too expensive to maintain. The cloud offers dramatic economies of scale, which allows businesses to shift many of their workloads to the cloud.
In this video, you’ll get an informative overview on the three major cloud providers: Amazon, Microsoft Azure, and Google Cloud Platform.
Amazon invented the cloud market. It has a huge head start in terms of growing a customer base, and it has exploited that by developing private label offerings like Redshift and EMR. Microsoft Azure has taken a different approach. It’s concentrating its efforts to gain cloud market share by focusing on the Fortune 500. Google Cloud Platform has put a big bet on Apache projects and moving them into the marketplace where companies of all sizes can use them.
If you’re in the market for a BI or analytics tool, know that you’ll have to make a choice between speed and scalability. As you review your prospective solutions, do your research on how well each can balance scalability and performance.
We’ll help you get started. Your decision may depend on what you’re trying to accomplish with your solution. If fast queries are your priority, then speed is the way to go. If going big is more important than going fast, scalability is what matters. There are also ways to achieve a healthy balance.
How, you ask? Find out now by reading our eBook. It gives you the lowdown on how to compensate for lack of speed or scalability and what to do if you want the best of both worlds.
In this video, you’ll find out how Zoomdata’s vision evolved as the cloud gained momentum.
The company started with a vision of changing the way people accessed, interacted with, and consumed data. And it’s always looked to the latest technologies to achieve that vision. That’s why, for example, it invested a lot in technologies like Impala, Spark, and Redshift. The company didn’t want to work inside a bubble but instead to continually work the latest technologies into its ecosystem. When the cloud really started to take off, it was a great opportunity for Zoomdata to deliver its capabilities to customers via the cloud.