Connector Properties

Starting with version 2.3, Zoomdata's architecture was enhanced so that Zoomdata's data connectors can be deployed as standalone components, each running in its own process space. Each connector runs on its own dedicated connector server and has a corresponding property file, as listed below:

  • edc-aurora.properties
  • edc-bigquery.properties
  • edc-cloudera-search.properties
  • edc-drill.properties
  • edc-elasticsearch-1.7.properties
  • edc-elasticsearch-2.0.properties
  • edc-emr.properties
  • edc-impala.properties
  • edc-memsql.properties
  • edc-mssql.properties
  • edc-mysql.properties
  • edc-oracle.properties
  • edc-phoenix-4.4.properties
  • edc-phoenix-4.5.properties
  • edc-postgresql.properties
  • edc-presto-0.105.properties
  • edc-presto-0.132.properties
  • edc-redshift.properties
  • edc-rts.properties
  • edc-solr.properties
  • edc-sparksql.properties
  • edc-sqldb.properties
  • edc-teradata.properties
  • edc-tez.properties
  • edc-vertica.properties

These property files can be found in the following locations:

  • /etc/zoomdata - for example, /etc/zoomdata/edc-elasticsearch-1.7.properties
  • <your_install_directory>/conf - for example, <your_install_directory>/conf/edc-oracle.properties
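A tool or script that reads a connector's settings has to check both locations. The following Java sketch assumes that /etc/zoomdata is checked before the install directory's conf subdirectory. This documentation does not specify a search order. It also assumes that files follow the edc-<connector>.properties naming pattern shown in the list above. The class name ConnectorConfigLocator and the sample install directory /opt/zoomdata are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper: finds and loads edc-<connector>.properties.
class ConnectorConfigLocator {

    // "oracle" -> "edc-oracle.properties"
    static String fileName(String connector) {
        return "edc-" + connector + ".properties";
    }

    // Candidate paths in an ASSUMED search order: /etc/zoomdata first,
    // then <installDir>/conf.
    static List<Path> candidates(String connector, Path installDir) {
        return List.of(
                Paths.get("/etc/zoomdata", fileName(connector)),
                installDir.resolve("conf").resolve(fileName(connector)));
    }

    // Returns the first candidate that exists as a regular file, if any.
    static Optional<Path> locate(String connector, Path installDir) {
        return candidates(connector, installDir).stream()
                .filter(Files::isRegularFile)
                .findFirst();
    }

    // Reads key=value pairs in the standard java.util.Properties format.
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // /opt/zoomdata is a placeholder install directory.
        Path installDir = Paths.get(args.length > 0 ? args[0] : "/opt/zoomdata");
        Optional<Path> file = locate("oracle", installDir);
        if (file.isPresent()) {
            Properties props = load(file.get());
            System.out.println(file.get() + ": app.name=" + props.getProperty("app.name"));
        } else {
            System.out.println("No edc-oracle.properties found");
        }
    }
}
```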

The available properties are described in the following tables:

  • Table 1: Common properties shared by all connector property files
  • Table 2: Properties unique to Apache Drill
  • Table 3: Properties unique to BigQuery
  • Table 4: Properties unique to Cloudera Impala
  • Table 5: Properties unique to Elasticsearch v1.7 and v2.0

Table 1: Common Connector.properties Options

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| app.name | unique to each connector | alphanumeric string | yes | Defines the name of the connector in the Zoomdata environment. Example: app.name=Drill |
| app.connection.type.name | unique to each connector | alphanumeric string | yes | Defines the name of the connection type, which is displayed on the Data Sources page. |
| server.port | unique to each connector | a positive integer | yes | Defines the port on which the connector server starts. The port must be available. |
| discovery.enabled | true | true, false | no | Defines whether the connector service uses Consul (a service registry application) to register with Zoomdata. |

Logging

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| log.file.base.name | unique to each connector | any valid file name | yes | Sets the name of the log file for the connector. |
| log.zoomdata.level | INFO | TRACE, DEBUG, INFO, WARN, ERROR | no | Sets the logging level for Zoomdata classes. Example: log.zoomdata.level=INFO |
| logs.dir | <your_install_directory>/logs | any valid directory path | no | Identifies the directory that contains the log file. |
| sample.log.limit | 10 | a positive integer | no | Specifies the number of records to log. |

Syslog

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| syslog.host | localhost | any valid host | no | Identifies the syslog server host. |
| syslog.port | 514 | any valid port | no | Identifies the syslog server port. |
| syslog.log.level | OFF | INFO (enabled), OFF (disabled) | no | Sets the syslog logging level. |
| syslog.suffix | EDC | alphanumeric string | no | Distinguishes the connector service from other types of services in the Zoomdata environment. |

Properties for Connectors That Use a JDBC URL

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| datasource.min.idle | 0 | an integer | | Sets the minimum number of idle connections in the pool. When the idle connection evictor runs, the pool tries to keep at least this many idle connections available. This property has no effect unless datasource.eviction.time.sec has a positive value. |
| datasource.max.idle | 5 | an integer | | Sets the maximum number of connections that can remain idle in the pool. Excess idle connections are closed when they are returned to the pool. |
| datasource.max.active | 100 | an integer | | Sets the maximum total number of connections, idle and borrowed, that can be active at the same time. A negative value means there is no limit. |
| datasource.max.idle.time.sec | 5 | an integer | | Sets the minimum amount of time (in seconds) that a connection may sit idle in the pool before the idle connection evictor, if one runs, can evict it. A negative value means no connections are evicted because of idle time alone. |
| datasource.max.wait.time.sec | 20 | an integer | | Sets the maximum amount of time (in seconds) that the borrowObject() method blocks before throwing an exception when the pool is exhausted and getBlockWhenExhausted() is true. A negative value means borrowObject() may block indefinitely. |
| datasource.eviction.time.sec | 1 | an integer | | Sets the number of seconds to sleep between runs of the idle connection evictor thread. A negative value means no idle connection evictor thread runs. |
| jdbc.connection.timeout.sec | 60 | an integer | | Sets the maximum time (in seconds) that a driver waits while attempting to connect to a database after the driver is identified. A value of 0 means there is no limit. Not all JDBC drivers support this property. |
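The defaults and conditional rules in Table 1 can be checked before a connector starts. The following Java sketch loads property text with java.util.Properties and places the documented defaults underneath it. It then reports missing mandatory properties and applies two pool rules from the table. The first rule is that datasource.min.idle has no effect unless datasource.eviction.time.sec is positive. The second is that a negative datasource.max.active means no limit. The class and method names are hypothetical, and this is not a description of how Zoomdata itself reads the file. logs.dir is omitted because its default depends on the installation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical validator for the common connector properties in Table 1.
class CommonConnectorConfig {

    // Mandatory keys; their values are connector-specific, so no defaults exist.
    static final List<String> MANDATORY = List.of(
            "app.name", "app.connection.type.name", "server.port", "log.file.base.name");

    // Documented defaults for the optional properties.
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("discovery.enabled", "true");
        d.setProperty("log.zoomdata.level", "INFO");
        d.setProperty("sample.log.limit", "10");
        d.setProperty("syslog.host", "localhost");
        d.setProperty("syslog.port", "514");
        d.setProperty("syslog.log.level", "OFF");
        d.setProperty("syslog.suffix", "EDC");
        d.setProperty("datasource.min.idle", "0");
        d.setProperty("datasource.max.idle", "5");
        d.setProperty("datasource.max.active", "100");
        d.setProperty("datasource.max.idle.time.sec", "5");
        d.setProperty("datasource.max.wait.time.sec", "20");
        d.setProperty("datasource.eviction.time.sec", "1");
        d.setProperty("jdbc.connection.timeout.sec", "60");
        return d;
    }

    // Parses properties text; lookups fall back to defaults() for unset keys.
    static Properties parse(String text) {
        Properties p = new Properties(defaults());
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    static List<String> missingMandatory(Properties p) {
        List<String> missing = new ArrayList<>();
        for (String key : MANDATORY) {
            String value = p.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    // datasource.min.idle matters only when the evictor runs (eviction.time.sec > 0).
    static boolean minIdleEffective(Properties p) {
        return Integer.parseInt(p.getProperty("datasource.eviction.time.sec").trim()) > 0;
    }

    // A negative datasource.max.active means "no limit".
    static boolean maxActiveUnlimited(Properties p) {
        return Integer.parseInt(p.getProperty("datasource.max.active").trim()) < 0;
    }

    public static void main(String[] args) {
        // Sample values are placeholders, not real connector settings.
        Properties p = parse(String.join("\n",
                "app.name=Drill",
                "app.connection.type.name=Drill",
                "server.port=8080",
                "datasource.eviction.time.sec=-1"));
        System.out.println("missing: " + missingMandatory(p));
        System.out.println("min.idle effective: " + minIdleEffective(p));
        System.out.println("max.active unlimited: " + maxActiveUnlimited(p));
    }
}
```

Layering the defaults with the Properties(Properties defaults) constructor keeps the file's own values separate from the fallbacks. Calling stringPropertyNames() on the result still lists both sets.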

Table 2: Connector.properties Options for Apache Drill

Schema Scanner Configuration

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| drill.hidden.workspaces | cp.default, dfs.root, dfs.tmp, dfs.default | any string representing a Drill schema | no | Comma-separated list of schemas to filter out of the Schema drop-down list. Example: drill.hidden.workspaces=cp.default,dfs.root,dfs.tmp,dfs.default |
| drill.supported.file.extensions | .parquet,.csv,.json,.tsv,.tbl,.avro,.seq,.csvh | any valid file extension supported by Drill | no | Comma-separated list of supported file extensions. Example: drill.supported.file.extensions=.parquet,.csv,.json,.tsv,.tbl,.avro,.seq,.csvh |
| drill.max.items.in.collections.list | 100 | a positive integer | no | Maximum number of collections (for example, a JSON file or a directory that can be queried) to display for a schema. |
| drill.max.dfs.scan.depth | 3 | a positive integer | no | Maximum depth of directories to scan for queryable items. |
| drill.dfs.scanner.threads | number of processors | a positive integer | no | Number of Java threads to use while scanning the Drill file system for queryable items. Defaults to the number of physical processors if not specified. |
| drill.dfs.scanner.include.pattern | ^.* | any valid regexp | no | Regular expression (regexp) that an item must match to be considered queryable. |
| drill.dfs.scanner.exclude.pattern | ^\..* | any valid regexp | no | Regular expression (regexp) that a queryable item must not match. The default, ^\..*, excludes files whose names start with a dot (.). |

Partition Detector Configuration

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| drill.partitions.detector.threads | number of processors | a positive integer | no | Number of Java threads that the partition detector uses. Defaults to the number of physical processors if not specified. |
| drill.physical.plan.parser.type | json | json, text | no | Defines the format of Drill's execution plan that is used to detect partitioned fields. json: the main format Drill uses to represent execution plans. text: used only as a fallback when the JSON analyzer cannot parse physical plans from a newer version of Drill. |
| drill.physical.plan.cost.field.path | a JsonPath expression | a valid JsonPath expression | no | Path to the field in Drill's JSON execution plan that represents the cost of executing a query. Valid only when drill.physical.plan.parser.type=json. |

Session-Level Settings for the Apache Drill Server

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| drill.session.config.<option_name>=<value> | | a valid Drill system option name and value pair | no | Planning and execution options for the Apache Drill server. They are set after a new JDBC connection opens by running the SQL query ALTER SESSION SET option_name=value. option_name: the option name as it appears in Drill's sys.options table. value: a value of the type listed in the sys.options table (number, string, boolean, or float). Use the appropriate value type for each option that you set. |
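The scanner filters and session settings in Table 2 can be illustrated with a short Java sketch. It makes three assumptions, and the real scanner's rules may differ. First, the include and exclude patterns are matched against a bare file name. Second, a file must also end with one of the supported extensions. Third, directories are handled elsewhere. The sketch also turns drill.session.config.* entries into ALTER SESSION statements. It quotes option names with backticks and single-quotes values that are not booleans or numbers, following Drill's SQL syntax. The class name DrillConnectorSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical sketch of the Drill scanner filters and session settings.
class DrillConnectorSketch {

    // Documented defaults for drill.dfs.scanner.include/exclude.pattern
    // and drill.supported.file.extensions.
    static final Pattern INCLUDE = Pattern.compile("^.*");
    static final Pattern EXCLUDE = Pattern.compile("^\\..*");
    static final List<String> EXTENSIONS =
            Arrays.asList(".parquet,.csv,.json,.tsv,.tbl,.avro,.seq,.csvh".split(","));

    // ASSUMPTION: patterns are matched against the bare file name, and a file
    // must also carry a supported extension. Directories are not handled here.
    static boolean isQueryableFile(String fileName) {
        return INCLUDE.matcher(fileName).matches()
                && !EXCLUDE.matcher(fileName).matches()
                && EXTENSIONS.stream().anyMatch(fileName::endsWith);
    }

    static final String SESSION_PREFIX = "drill.session.config.";

    // Builds one ALTER SESSION statement per drill.session.config.<option_name>
    // entry, sorted by key so the output order is deterministic.
    static List<String> sessionStatements(Map<String, String> props) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(props).entrySet()) {
            if (!e.getKey().startsWith(SESSION_PREFIX)) {
                continue;
            }
            String option = e.getKey().substring(SESSION_PREFIX.length());
            statements.add("ALTER SESSION SET `" + option + "` = " + literal(e.getValue().trim()));
        }
        return statements;
    }

    // Booleans and numbers are emitted as-is; anything else becomes a quoted string.
    static String literal(String value) {
        boolean isBoolean = value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false");
        boolean isNumber = value.matches("-?\\d+(\\.\\d+)?");
        return (isBoolean || isNumber) ? value : "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        for (String name : List.of("sales.parquet", ".hidden.csv", "notes.txt")) {
            System.out.println(name + " -> " + isQueryableFile(name));
        }
        Map<String, String> props = Map.of(
                "drill.session.config.planner.enable_hashjoin", "false",
                "drill.session.config.store.format", "json",
                "app.name", "Drill");
        sessionStatements(props).forEach(System.out::println);
    }
}
```

With the defaults, sales.parquet is accepted. The sketch rejects .hidden.csv because of its leading dot and notes.txt because its extension is unsupported. app.name is ignored when building session statements because it lacks the drill.session.config. prefix.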

Table 3: Connector.properties Options for BigQuery

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| bigquery.public.project.ids | Google public project IDs | valid Google project IDs | no | Sets a comma-separated list of IDs of Google public projects. |

Table 4: Connector.properties Options for Cloudera Impala

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| impala.connection.retry.timeout.min | 30 | a positive integer | no | Sets the amount of time (in minutes) to wait before retrying an Impala node that failed to respond on the previous attempt. Applies only when several JDBC URLs are specified in a comma-separated list. |

Kerberos Service Account Authentication Properties

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| kerberos.krb5.conf.location | /etc/krb5.conf | any valid path | no | Identifies the full path to the krb5.conf file. |
| kerberos.service.account.authentication | false | true, false | no | Enables Kerberos authentication when the connector should access an Impala server on behalf of a particular service account. |
| kerberos.service.account.principal | a string | a string | yes, if kerberos.service.account.authentication=true | Identifies the Kerberos service principal for this Impala connector. |
| kerberos.service.account.keytab.location | a path | any valid path | yes, if kerberos.service.account.authentication=true | Sets the full path to the keytab for the specified service principal. |
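The retry timeout and the conditional Kerberos requirements in Table 4 can be sketched in Java. NodeSelector is a hypothetical illustration of one plausible reading of impala.connection.retry.timeout.min: a node that fails is skipped until the timeout has elapsed, and the connector uses the first eligible URL in the list. The host names are placeholders. The Kerberos check only validates the properties and does not perform a Kerberos login.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical sketch of Table 4's retry timeout and Kerberos requirements.
class ImpalaConnectorSketch {

    // Skips a failed node until impala.connection.retry.timeout.min has elapsed.
    static final class NodeSelector {
        private final List<String> urls;
        private final Duration retryTimeout;
        private final Map<String, Instant> failedAt = new HashMap<>();

        NodeSelector(String commaSeparatedUrls, long retryTimeoutMin) {
            this.urls = Arrays.stream(commaSeparatedUrls.split(","))
                    .map(String::trim)
                    .filter(s -> !s.isEmpty())
                    .collect(Collectors.toList());
            this.retryTimeout = Duration.ofMinutes(retryTimeoutMin);
        }

        void markFailed(String url, Instant when) {
            failedAt.put(url, when);
        }

        // First URL that never failed, or whose last failure is at least
        // retryTimeout in the past.
        Optional<String> pick(Instant now) {
            for (String url : urls) {
                Instant failed = failedAt.get(url);
                if (failed == null || !now.isBefore(failed.plus(retryTimeout))) {
                    return Optional.of(url);
                }
            }
            return Optional.empty();
        }
    }

    // Principal and keytab are mandatory only when
    // kerberos.service.account.authentication=true.
    static List<String> missingKerberosSettings(Properties p) {
        List<String> missing = new ArrayList<>();
        String enabled = p.getProperty("kerberos.service.account.authentication", "false");
        if (!Boolean.parseBoolean(enabled.trim())) {
            return missing;
        }
        for (String key : List.of("kerberos.service.account.principal",
                                  "kerberos.service.account.keytab.location")) {
            String value = p.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder hosts; 30 is the documented default timeout in minutes.
        NodeSelector nodes = new NodeSelector(
                "jdbc:impala://impala-1:21050/default, jdbc:impala://impala-2:21050/default", 30);
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        nodes.markFailed("jdbc:impala://impala-1:21050/default", t0);
        System.out.println(nodes.pick(t0.plus(Duration.ofMinutes(10))).orElse("none"));
        System.out.println(nodes.pick(t0.plus(Duration.ofMinutes(30))).orElse("none"));
    }
}
```

In the example run, impala-2 is chosen 10 minutes after impala-1 fails. impala-1 becomes eligible again once 30 minutes have passed.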

Table 5: Connector.properties Options for Elasticsearch v1.7, v2.0

| Property Name | Default Value | Possible Value(s) | Mandatory? | Description |
|---|---|---|---|---|
| elasticsearch.query.cardinality.precision.threshold | 1000 | a number from 0 to 40000 | no | Defines the precision threshold for distinct count queries. Example: elasticsearch.query.cardinality.precision.threshold=1000 |
| elasticsearch.query.limit.nongrouped | 10000 | a positive integer | no | Sets the number of search hits to return for non-grouped queries. |
| elasticsearch.query.limit.grouped | 100000 | a positive integer | no | Sets the number of buckets to return for a terms aggregation. |
| elasticsearch.transport.client.settings.threadpool.index.type | cached | cached, fixed | no | Defines the type of thread pool used by an Elasticsearch node. cached: an unbounded thread pool that spawns a thread when there are pending requests. fixed: a thread pool with a fixed number of threads to handle requests. It also has a queue (optionally bounded) for pending requests that have no threads to service them. |
| elasticsearch.transport.client.settings.client.transport.ping_timeout | 10s | a positive time value, for example 10s | no | Sets the time to wait for a ping response from a node. |
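To show how the query limits and the precision threshold in Table 5 relate to Elasticsearch requests, the following Java sketch builds two request bodies. The first is a match_all search sized by elasticsearch.query.limit.nongrouped. The second is a terms aggregation sized by elasticsearch.query.limit.grouped, with a cardinality sub-aggregation that uses the precision threshold. The request shapes, the aggregation names groups and distinct, and the field names region and user_id are assumptions for illustration. This documentation does not describe the connector's actual queries. Field names are inserted without JSON escaping, so the sketch assumes simple names.

```java
import java.util.Properties;

// Hypothetical sketch: Table 5 settings applied to Elasticsearch request bodies.
class EsQueryBodySketch {

    static int intProp(Properties p, String key, int defaultValue) {
        String value = p.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    // Enforces the documented range of 0 to 40000 (default 1000).
    static int precisionThreshold(Properties p) {
        int t = intProp(p, "elasticsearch.query.cardinality.precision.threshold", 1000);
        if (t < 0 || t > 40000) {
            throw new IllegalArgumentException("precision threshold must be 0..40000: " + t);
        }
        return t;
    }

    // Non-grouped query: return up to elasticsearch.query.limit.nongrouped hits.
    static String nonGroupedBody(Properties p) {
        int size = intProp(p, "elasticsearch.query.limit.nongrouped", 10000);
        return "{\"size\":" + size + ",\"query\":{\"match_all\":{}}}";
    }

    // Grouped distinct count: a terms aggregation capped at
    // elasticsearch.query.limit.grouped buckets, with a cardinality sub-aggregation.
    // "size":0 suppresses individual hits so only aggregation results return.
    static String groupedDistinctCountBody(Properties p, String groupField, String countField) {
        int buckets = intProp(p, "elasticsearch.query.limit.grouped", 100000);
        return "{\"size\":0,\"aggs\":{\"groups\":{"
                + "\"terms\":{\"field\":\"" + groupField + "\",\"size\":" + buckets + "},"
                + "\"aggs\":{\"distinct\":{\"cardinality\":{\"field\":\"" + countField
                + "\",\"precision_threshold\":" + precisionThreshold(p) + "}}}}}}";
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("elasticsearch.query.limit.grouped", "500");
        System.out.println(nonGroupedBody(p));
        System.out.println(groupedDistinctCountBody(p, "region", "user_id"));
    }
}
```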