Implementing Data Source Creation
When a user requests to create a data source using a connection to a data store, the connector must be able to complete the following tasks.
- Validate the data store
- Describe data store and connector features
- Describe schemas, collections, and fields
When a user creates a data source using a connection to a data store, Zoomdata sends a ValidateSourceRequest to the connector server to ensure that the connection is valid before allowing the user to proceed. The request contains RequestInfo, which contains the parameters that the user has provided for connecting to the data store. The connector server uses these parameters to connect to the data store and validate the connection.
For our example MyDB data store, which has a driver and simple security requiring only a username and password, the steps might be as follows.
- Create a connection to the MyDB data store using the connection string, username, and password provided.
- Run a simple query or command to validate the connection. This test should not be run against a specific schema or collection. For example, a simple validation SQL query might be select count(1). Note that there is no specific collection or schema information in the query. The query should work on any valid data store for that connector.
- If a particular permission is required to perform some function for Zoomdata, such as performing aggregations, the validation query or command should also test that permission. For example, many HDFS stores may allow a user to connect but not to run MapReduce or Tez jobs, which are required to perform aggregations for Zoomdata. In a Zoomdata Hive on Tez connector, the validation query must also trigger a Tez job to ensure that the connecting user has the appropriate permissions.
- Return a success message if the query or command ran without problems.
In response to the ValidateSourceRequest, the connector server sends a ValidateSourceResponse, which should be successful only if the user has permission to execute queries in the provided data store.
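The validation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's API: the `ValidateSourceResponse` dataclass is a hypothetical stand-in for the Thrift response type, `validate_source`, `sqlite_connect`, and the `connection_string` parameter name are invented for the example, and Python's built-in `sqlite3` module stands in for the MyDB driver.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass
class ValidateSourceResponse:
    """Hypothetical stand-in for the Thrift response type."""
    success: bool
    message: str = ""


def validate_source(
    params: Mapping[str, str],
    connect: Callable[[Mapping[str, str]], sqlite3.Connection],
) -> ValidateSourceResponse:
    """Connect with the user-supplied parameters, then run a schema-independent query."""
    try:
        conn = connect(params)
    except Exception as exc:  # bad credentials, unreachable host, and so on
        return ValidateSourceResponse(False, f"connection failed: {exc}")
    try:
        # No schema or collection is named, so this runs on any valid store.
        conn.execute("select count(1)").fetchone()
    except Exception as exc:
        return ValidateSourceResponse(False, f"validation query failed: {exc}")
    finally:
        conn.close()
    return ValidateSourceResponse(True, "connection validated")


def sqlite_connect(params: Mapping[str, str]) -> sqlite3.Connection:
    # Assumption: sqlite3 stands in for MyDB; username and password are ignored.
    return sqlite3.connect(params["connection_string"])
```

A connector whose data store needs extra permissions (such as the Tez example above) would replace the query with one that exercises those permissions.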
After validating a source, the next preliminary step a connector must undertake is to describe itself and its data store to Zoomdata. The description is provided by responding to a ServerInfoRequest. The response, ServerInfoResponse, is a series of string key/value pairs that indicate how your connector server communicates, the features that it supports, and any specific limitations it may have. The connector server should construct this list of keys based on the list of connector info keys included with this guide. In many cases the key values are static and predetermined, so creating the list may not require any communication with the data store. Some keys are required. The connector provided with the SDK includes such a hard-coded set of keys.
Our hypothetical MyDB connector is a brand new connector to a simple data store, so it only supports a few capabilities and might return a list of the following features.
- REQUEST.SEND_METADATA (required by Zoomdata)
- REQUEST.TYPE (required by Zoomdata)
In future versions of our hypothetical MyDB connector, we may implement more of the features of the MyDB data store.
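A hard-coded key set of this kind might look like the sketch below. Only the key names come from this guide. The function name `build_server_info`, the placeholder values, and the `"true"`/`"false"` string encoding are assumptions made for illustration; the real values should be taken from the list of connector info keys.

```python
from typing import Dict

# Keys this guide marks as required by Zoomdata.
REQUIRED_KEYS = ("REQUEST.SEND_METADATA", "REQUEST.TYPE")


def build_server_info(supports_schema: bool) -> Dict[str, str]:
    """Build static string key/value pairs for a hypothetical ServerInfoResponse."""
    info = {
        # Placeholder values: look up the real ones in the connector info keys list.
        "REQUEST.SEND_METADATA": "<see connector info keys>",
        "REQUEST.TYPE": "<see connector info keys>",
        # Assumption: booleans are sent as the strings "true" and "false".
        "FEATURE.SUPPORT_SCHEMA": "true" if supports_schema else "false",
    }
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return info
```

Because the values are static, nothing here needs a round trip to the data store.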
After the connector describes its features and limitations to Zoomdata, Zoomdata expects the connector to identify the schemas, collections, and fields included in its metadata. Zoomdata requests this information using a series of calls: MetaSchemasRequest, MetaCollectionsRequest, and MetaDescribeRequest. Your connector must respond with the corresponding responses.
MetaSchemasResponse details the schemas or schema-like objects that the data store uses to group collections, if it does so.
- If the data store does not use schemas or schema-like objects such as catalogs or namespaces, FEATURE.SUPPORT_SCHEMA should be set to false in the ServerInfoResponse that the connector sends to Zoomdata. In this case, the connector should return an error to any MetaSchemasRequest that it receives.
- The connector is responsible for excluding from its response any schemas to which the querying user should not have access, including any system schemas such as system metadata not normally intended for users.
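The schema rules above can be sketched as a small filter. The function `list_schemas`, its parameters, and the `SchemasNotSupportedError` exception are hypothetical names chosen for this example; how a real connector signals the error depends on the SDK's Thrift types.

```python
from typing import Iterable, List, Set


class SchemasNotSupportedError(Exception):
    """Hypothetical error for stores where FEATURE.SUPPORT_SCHEMA is false."""


def list_schemas(
    supports_schema: bool,
    all_schemas: Iterable[str],
    accessible: Set[str],
    system_schemas: Set[str],
) -> List[str]:
    """Return only the schemas the querying user may see, minus system schemas."""
    if not supports_schema:
        raise SchemasNotSupportedError("this data store does not use schemas")
    return [
        schema
        for schema in all_schemas
        if schema in accessible and schema not in system_schemas
    ]
```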
MetaCollectionsResponse details the collections or collection-like objects that the data store uses to group data.
- Collections should be returned as a list of strings representing the collection names.
- The connector should remove any schema prefix attached to collection names before sending the list of collection names to Zoomdata.
- This response provides a list of collections to Zoomdata for use during source creation. If the connector supports schemas, the MetaCollectionsRequest will include the name of the schema to be queried. Only collections found within that schema should be returned in the response.
- The connector is responsible for excluding from its response any collections to which the querying user should not have access.
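As an illustration of the collection rules above, the sketch below filters by access, restricts results to the requested schema, and strips the schema prefix. It assumes the data store reports qualified names in the form `schema.collection`; that format, the function name `list_collections`, and its parameters are assumptions for this example only.

```python
from typing import Iterable, List, Optional, Set


def list_collections(
    qualified_names: Iterable[str],
    accessible: Set[str],
    schema: Optional[str] = None,
) -> List[str]:
    """Return bare collection names, limited to one schema when one is given."""
    result = []
    for qualified in qualified_names:
        if qualified not in accessible:
            continue  # the querying user should not see this collection
        # Assumption: a dot separates the schema prefix from the collection name.
        prefix, _, name = qualified.rpartition(".")
        if schema is not None and prefix != schema:
            continue  # belongs to a different schema than the one requested
        result.append(name)
    return result
```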
MetaDescribeResponse details a list of fields with their associated metadata for a given collection. If the data store uses schemas or schema-like objects, the schema is provided in the request as well.
- The connector is responsible for retrieving the list of fields, mapping them to Zoomdata Thrift field types [XYZ: link Zoomdata types], and setting their metadata with any additional applicable information. See the full metadata reference [XYZ: write and link] for more details.
- It is the responsibility of the connector to assess how fields map to Zoomdata Thrift types and what additional flags may be added to their parameters.
| Field name | MyDB field type | Maps to Zoomdata type | Notes |
| --- | --- | --- | --- |
| key_field | long | integer | Indexed |
| salary | float | double | |
| last_name | varchar(200) | string | |
| date_hired_indexed | timestamp | date | Considered indexed, should be marked PLAYABLE |
| date_hired_unix_time | bigint | integer | |
| complex_object | mydb_object | unknown | Unknown types are designated as unknown and treated as RAW_DATA_ONLY. |

Note the following about the example:
- Most of the types map directly to Zoomdata's Thrift types.
- mydb_object is a user-defined type that Zoomdata cannot use, so it is mapped as unknown. Fields of unknown type are listed by Zoomdata as RAW_DATA_ONLY, meaning they can't be queried, filtered, or grouped.
- There are two indexed fields. One is a MyDB primary key. The other is an indexed timestamp. Since MyDB indexes are considered fast and can be quickly filtered, the timestamp field should have the PLAYABLE flag added to its parameters to indicate that the field can be used to enable playback.
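The mapping in the example can be sketched as a lookup table plus a little flag logic. The type pairs and flag names come from the example above. The function `describe_field`, the dictionary it returns, and the choice to attach RAW_DATA_ONLY on the connector side are assumptions for illustration, not the SDK's Thrift structures.

```python
import re
from typing import Dict, List

# MyDB type -> Zoomdata type, as listed in the example.
TYPE_MAP = {
    "long": "integer",
    "bigint": "integer",
    "float": "double",
    "varchar": "string",
    "timestamp": "date",
}


def describe_field(name: str, mydb_type: str, indexed: bool = False) -> Dict[str, object]:
    """Map one MyDB field to a Zoomdata type and attach any applicable flags."""
    # Drop a trailing length or precision, for example varchar(200) -> varchar.
    base_type = re.sub(r"\(.*\)$", "", mydb_type.strip().lower())
    zoomdata_type = TYPE_MAP.get(base_type, "unknown")
    flags: List[str] = []
    if zoomdata_type == "unknown":
        flags.append("RAW_DATA_ONLY")  # cannot be queried, filtered, or grouped
    elif zoomdata_type == "date" and indexed:
        flags.append("PLAYABLE")  # indexed timestamps can drive playback
    return {"name": name, "type": zoomdata_type, "flags": flags}
```

Keeping the type pairs in one table makes it straightforward to extend the mapping as later versions of the connector support more MyDB types.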
After you have implemented functionality for adding the connector and creating a source on it, you must implement functionality to respond to requests for data from the data store. For more information about responding to requests for data, see Responding to Requests for Data.