In 1998, Merrill Lynch cited a rule of thumb that somewhere around 80-90 percent of all potentially usable business information may originate in unstructured form.
Traditional relational data stores are not efficient at storing and enabling access to unstructured and semi-structured data. Yet, there's tremendous business and economic value buried in the mountains of this type of data that organizations generate. In fact, no big data initiative can ignore unstructured data.
The trend towards including unstructured data and its analysis in modern business intelligence (BI) has fueled the growth of enterprise search engines such as Elasticsearch, Apache Solr and Cloudera Search.
Everyone is familiar with search. You use search to find, review and buy things on websites. Search lets us filter through large data sets of hotel rooms, airline flights or product reviews to get the best product and deal. The better the search algorithm, the faster and more accurate the search. And search engines use text analytics and machine learning to consistently improve results.
Of course, those consumer experiences create expectations in the workplace. Business users and data analysts now expect to search their enterprise data for quantitative and qualitative insights they can use to make data-driven decisions that improve operations and customer engagement.
“Zoomdata's magic is that it is built for the unstructured Big Data world. Rather than stripping away rich data models in Hadoop, MongoDB or Cassandra, Zoomdata embraces them.”
Matt Asay, ReadWrite
Using a search index technology like Apache Solr, Elasticsearch, or Cloudera Search is a great way to access unstructured and semi-structured data, as well as structured data in the enterprise. Search provides fast queries of data without having to anticipate specific access patterns and build traditional table schemas and indexes ahead of time.
Zoomdata lets you connect to and visualize data in a search index as easily as any other data source. But more than that, Zoomdata’s Smart Connectors use the native APIs of search providers like Apache Solr, Elasticsearch, and Cloudera Search to take advantage of the special functionality that they provide. Users enter full-text search terms and results come back in milliseconds. Zoomdata leverages the built-in facets capability of search indexes for the fast filtering experience users expect.
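To make the combination of full-text search and facets concrete, here is a toy sketch in Python. It is not how Solr, Elasticsearch, or Zoomdata's Smart Connectors are implemented; the sample documents, field names, and helper functions are all hypothetical and exist only to illustrate the idea of an inverted index with facet counts.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; field names are illustrative only.
DOCS = [
    {"id": 1, "text": "Great hotel near the airport", "category": "hotel"},
    {"id": 2, "text": "Cheap flight to Boston", "category": "flight"},
    {"id": 3, "text": "Quiet hotel with a great view", "category": "hotel"},
]


def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for token in doc["text"].lower().split():
            index[token].add(doc["id"])
    return index


def search(index, docs, query):
    """Return the documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    ids = set.intersection(*(index.get(term, set()) for term in terms))
    return [doc for doc in docs if doc["id"] in ids]


def facet_counts(hits, field):
    """Count hits per value of `field`, as a facet panel would display them."""
    return Counter(doc[field] for doc in hits)


INDEX = build_index(DOCS)
hits = search(INDEX, DOCS, "great hotel")
print([doc["id"] for doc in hits])
print(dict(facet_counts(hits, "category")))
```

The index is built once, so each query is a set intersection rather than a scan of every document, and no table schema has to be designed around a particular access pattern. Production search engines add tokenization, relevance scoring, and distribution on top of this basic idea.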
Elasticsearch is a distributed, scalable, real-time search and analytics engine. The core of its intelligent search capabilities is based on Apache Lucene. For example, Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, search-as-you-type, and did-you-mean suggestions. The Guardian newspaper uses Elasticsearch to combine visitor logs with social-network data to provide real-time feedback to its editors about the public’s response to new articles.
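As a rough illustration of what a full-text query with highlighted snippets and facet-style counts can look like, the Python sketch below builds a request body in the Elasticsearch query DSL. It only constructs and prints the JSON; nothing is sent to a cluster. The function name, the field names `body` and `section`, and the aggregation name `by_facet` are assumptions made for this example and do not describe Wikipedia's or The Guardian's actual setup.

```python
import json


def build_search_body(text, text_field="body", facet_field="section", size=10):
    """Build an Elasticsearch-style search request body.

    The field names are hypothetical defaults; adjust them to your own mapping.
    """
    return {
        "size": size,
        # Full-text match against an analyzed text field.
        "query": {"match": {text_field: text}},
        # Ask for highlighted snippets from the same field.
        "highlight": {"fields": {text_field: {}}},
        # A terms aggregation returns per-value counts, similar to facets.
        "aggs": {"by_facet": {"terms": {"field": facet_field}}},
    }


body = build_search_body("election results")
print(json.dumps(body, indent=2))
```

A body like this would typically be sent as a POST to an index's `_search` endpoint. The terms aggregation assumes the facet field is indexed as a keyword-type field.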
Apache Solr is a standalone search server with a REST-like API. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g., Word, PDF) handling. Powered by Apache Lucene, Solr enables powerful matching capabilities including phrases, wildcards, joins, grouping and much more across any data type. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more.
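To show what the REST-like API looks like in practice, the following Python sketch assembles a URL for Solr's `select` handler using common query parameters for faceting (`facet`, `facet.field`) and hit highlighting (`hl`, `hl.fl`). The host, the collection name `articles`, and the field names are hypothetical, and the code only builds the URL without contacting a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_solr_select_url(base_url, collection, query, facet_field,
                          highlight_field, rows=10):
    """Build a Solr select URL with faceting and highlighting enabled.

    All argument values used below are illustrative assumptions.
    """
    params = [
        ("q", query),                 # the search query itself
        ("rows", rows),               # number of results to return
        ("wt", "json"),               # response format
        ("facet", "true"),            # turn on faceting
        ("facet.field", facet_field), # field to compute facet counts on
        ("hl", "true"),               # turn on hit highlighting
        ("hl.fl", highlight_field),   # field to take snippets from
    ]
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


url = build_solr_select_url(
    "http://localhost:8983/solr",  # hypothetical local Solr instance
    "articles",                    # hypothetical collection name
    'title:"big data"',            # phrase query on a hypothetical field
    facet_field="category",
    highlight_field="title",
)
print(url)
```

Because the whole request is a plain HTTP GET, any client that can fetch a URL can query the index, and `urlencode` takes care of escaping the quotes and spaces in the phrase query.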
An integrated and supported part of Cloudera Enterprise, Cloudera Search (powered by Apache Solr) makes Apache Hadoop accessible to everyone via integrated full-text search.
Cloudera Search brings full-text, interactive search and scalable, flexible indexing to CDH and your enterprise data hub. Powered by Apache Hadoop and Apache Solr, Cloudera Search brings scale and reliability for a new generation of integrated, multi-workload search. Through its unique integration with CDH, Cloudera Search gains the same fault tolerance, scale, visibility, security, and flexibility provided to other data hub workloads.
Cloudera Search lets your entire business explore and analyze data quickly and easily for a variety of critical big data use cases all within a single platform, including: