Everyone is familiar with the search paradigm and search applications. You use search to locate information via search engines. You use search to buy things on websites. Search lets us filter through large data sets of hotel rooms, airline flights, or product reviews to get the best deal. And, of course, those consumer experiences create expectations in the workplace. Now business users and data analysts expect to search their enterprise data too, particularly big data.
Using a search index technology like Apache Solr, Elasticsearch, or Cloudera Search is a great way to enable access to unstructured data in the enterprise, much of which lives in modern data stores like Hadoop. A search index can deliver fast access to unstructured or semi-structured information: text documents such as blog posts, comments, and customer product reviews, as well as machine logs and JSON snippets. But a search index can also be very effective with structured data. Search provides fast queries of big data without having to anticipate specific access patterns and build traditional indexes ahead of time.
Apache Solr is an open source enterprise search platform written in Java. It operates as a standalone enterprise search server with a REST-like API. Powered by Lucene™, Solr enables powerful matching capabilities including phrases, wildcards, joins, grouping, and much more across any data type. Solr is highly reliable, scalable, and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, dynamic clustering, and more. Solr is widely used for enterprise search, business intelligence, and big data analytics use cases, and it has an active development community and regular releases. Solr's external configuration allows it to be tailored to many types of applications without Java coding, and it has a plugin architecture to support advanced customization.
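To make the REST-like API concrete, here is a minimal sketch of how a phrase query and a wildcard query could be expressed as URLs against Solr's standard `/select` handler. The host, the `reviews` collection, and the `text` and `hotel_name` fields are assumptions chosen for illustration, not names from any real deployment.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, collection, query, rows=10):
    """Build a URL for Solr's /select handler with a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"

# Hypothetical Solr location and collection name.
BASE = "http://localhost:8983/solr"

# Phrase query: double quotes ask for the words to appear together, in order.
phrase_url = solr_select_url(BASE, "reviews", 'text:"food poisoning"')

# Wildcard query: a trailing * matches any term starting with "Grand".
wildcard_url = solr_select_url(BASE, "reviews", "hotel_name:Grand*")

print(phrase_url)
print(wildcard_url)
```

Fetching either URL with any HTTP client would return matching documents as JSON, provided a collection with those fields exists.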
Elasticsearch is a distributed, scalable, near-real-time search and analytics engine that supports multi-tenancy. The core of its intelligent search capabilities comes from Lucene.
An Elasticsearch index is divided into one or more primary shards, and each primary shard can have zero or more replicas. Related data is often stored in the same index. Each node hosts one or more shards and acts as a coordinator, delegating operations to the correct shard(s). Rebalancing and routing are done automatically. Once an index has been created, the number of primary shards cannot be changed.
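One way to see why the primary shard count is fixed: documents are routed to a shard by hashing a routing value (by default the document ID) and taking the result modulo the number of primary shards. The sketch below is a simplified illustration of that idea. It uses CRC32 as a stand-in hash, which is an assumption and not the hash function Elasticsearch actually uses, and the document IDs are made up.

```python
import zlib

def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number with hash-modulo routing.

    CRC32 is a stand-in for illustration only.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards

# Hypothetical document IDs.
doc_ids = [f"doc-{i}" for i in range(1000)]

# Shard assignment with 5 primary shards, then with 6.
before = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
after = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}

# Count how many documents would be looked up on a different shard.
moved = sum(1 for doc_id in doc_ids if before[doc_id] != after[doc_id])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because changing the divisor changes where most documents are expected to live, altering the primary shard count in place would make existing documents unfindable without reindexing. Replicas do not take part in this calculation, which is why their number can vary.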
Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, search-as-you-type, and did-you-mean suggestions. The Guardian uses Elasticsearch to combine visitor logs with social-network data to provide real-time feedback to its editors about the public’s response to new articles. Elasticsearch is built on Lucene and tries to make all of Lucene's features available through its JSON and Java APIs.
Cloudera Search brings full-text, interactive search and scalable, flexible indexing to CDH (Cloudera's distribution including Hadoop) and your enterprise data hub. Powered by Apache Hadoop and Apache Solr, the enterprise standard for open-source search, Cloudera Search brings scale and reliability to a new generation of integrated, multi-workload search. Through its unique integrations with CDH, Cloudera Search gains the same fault tolerance, scale, visibility, security, and flexibility provided to other enterprise data hub workloads. Cloudera Search provides:
Interactive full-text search and faceted navigation let users query, explore, and analyze data in real time to find what’s relevant and gain new insights.
Solr supports batch, on-demand, and real-time indexing (and reindexing) of data of any type so more users can get faster value from data.
A proven, rich API opens up big data to users for the fastest time-to-insight, and an active, mature community means constant innovation with the enterprise in mind.
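As an illustration of the faceted navigation mentioned above, a Solr query can request facet counts by adding `facet=true` and `facet.field=<field>` to the request. In Solr's default JSON output, the counts for a field come back as a flat list that alternates values and counts. The sketch below turns such a list into a dictionary; the `category` field and the sample numbers are made up for the example.

```python
def parse_flat_facets(flat):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    return dict(zip(flat[0::2], flat[1::2]))

# Made-up fragment shaped like the facet section of a Solr JSON response
# for a request that included facet=true&facet.field=category.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "category": ["hotel", 42, "restaurant", 17, "spa", 3],
        }
    }
}

counts = parse_flat_facets(
    sample_response["facet_counts"]["facet_fields"]["category"]
)

# Each entry can drive one clickable filter in a navigation sidebar.
for value, count in counts.items():
    print(f"{value} ({count})")
```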
Out of the box, traditional BI tools generally do not offer any support for enterprise search engines. Zoomdata lets you connect to and visualize data in a search index as easily as any other data source. But more than that, Zoomdata’s Smart Connectors use the native APIs of search providers like Apache Solr, Elasticsearch, and Cloudera Search to take advantage of the special analytic functionality search provides. When users enter a full-text search query, results come back in milliseconds.
Zoomdata leverages the built-in facets capability of search indices for the fast filtering experience users expect. As new search index partitions are created, Zoomdata will automatically recognize and query them. For example, people partition their log data across indices by date/time and Zoomdata can automatically recognize and query across them.
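To show what querying across date-partitioned indices with facets can look like at the search-engine level, here is a rough sketch for Elasticsearch. It is not Zoomdata's implementation. The `logs-YYYY.MM.DD` naming pattern, the `message` and `host` fields, and the helper names are all assumptions for illustration. The request body uses a `match` query and a `terms` aggregation, and the search path lists several indices separated by commas.

```python
import json
from datetime import date, timedelta

def daily_index_names(prefix, start, end):
    """List one index name per day, e.g. logs-2016.01.30 (assumed pattern)."""
    days = (end - start).days
    return [
        f"{prefix}-{(start + timedelta(days=i)).strftime('%Y.%m.%d')}"
        for i in range(days + 1)
    ]

def facet_search_body(text, facet_field, size=10):
    """Full-text match plus a terms aggregation that yields facet counts."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "query": {"match": {"message": text}},
        "aggs": {f"by_{facet_field}": {"terms": {"field": facet_field, "size": size}}},
    }

indices = daily_index_names("logs", date(2016, 1, 30), date(2016, 2, 1))
search_path = "/" + ",".join(indices) + "/_search"
body = facet_search_body("timeout", "host")

print(search_path)
print(json.dumps(body, indent=2))
```

Sending `body` as a POST to `search_path` on a cluster that holds those indices would return one bucket per `host` value with its document count. A wildcard such as `logs-*` in place of the explicit list would also pick up newly created daily indices.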
"Zoomdata's magic is that it is built for the unstructured Big Data world. Rather than stripping away rich data models in Hadoop, MongoDB or Cassandra, Zoomdata embraces them." Matt Asay, ReadWrite*
Zoomdata makes it easy for users to ask open questions, such as which hotels have the highest incidence of food poisoning based on hotel reviews, or what characterizes the people who have the most positive sentiment about a particular product.
Use Zoomdata to visualize search index data from Apache Solr, Elasticsearch, or Cloudera Search. Learn more today!