
Testing with the Zoomdata Connector Shell

Overview

Zoomdata's connector shell is a read-eval-print loop (REPL) tool that you can use to test your connector. It is delivered as a bash script that runs an executable JAR file. Testing a connector is a multi-step process; this topic provides the information and steps you need to test your connector with the connector shell.

To test that a connector is correctly performing calculations and providing accurate results to Zoomdata, the testing process sends structured requests to the connector server and compares the connector’s results to those pre-calculated and known to be correct.

A working connector should pass both the meta and structured tests. Passing these tests indicates that the connector responds to Thrift requests with correct results.

Prerequisites

To test with the connector shell you must:

  • Ensure that Java 8 is available on the path.
    Oracle runtime version 1.8.0_77 has been verified. Other runtimes, such as OpenJDK, may work but have not been verified.
  • Download the connector shell. It is distributed as a zip file that you can unzip using your favorite utility.
  • Download the Connector Reference Testing Dataset (CRTD).

High-level Steps

Once you have fulfilled the prerequisites, you can test your connector with the connector shell by following these high-level steps:

  1. Load the Connector Reference Testing Dataset into your data store.
  2. Create a metadata collection.
  3. Start the connector and connector shell.
  4. Create a data source using the shell.
  5. Run the tests.
  6. Analyze your test results.

Loading the Connector Reference Testing Dataset Into Your Data Store

The structured test requires you to load the Connector Reference Testing Dataset (CRTD) into your data store. It is available in both CSV and JSON formats. Some data stores can import directly from a CSV or JSON file. Others require additional scripting to parse the contents and load the data using an alternative mechanism. Since every data store is different, refer to your data store's documentation for the steps to load the CRTD into a collection. When loading the CRTD, you need to select or verify the data type of each field. The structure of the CRTD is shown in the table below. Note that the CRTD file has a header row.

You can learn to use the connector shell with the sample CrateDB connector provided with the SDK. To help with this, Zoomdata provides a Docker container with a CrateDB instance preloaded with the Connector Reference Testing Dataset. You can download it at https://github.com/Zoomdata/edc-cratedb/tree/master/test-server.

Carefully note the following:

  • Once loaded, the CRTD should not be modified. Ideally, it should reside in its own schema, namespace, or schema-like structure where it will not be accessed by users not involved in testing.
  • The testing tool assumes that all date- and time-related data are stored as UTC. Failure to store and serve dates as UTC may result in time-related tests failing.
  • Although the CRTD contains array data types, the connector shell does not currently support the complex types array, struct, or object. At present, these fields can be ignored and do not need to be loaded.
  • If your data store does not support one of the following field types, the connector shell will be unable to verify it.

Field name                    Data type           Notes
date_array                    Array (timestamp)   Do not use
date_empty                    Timestamp
date_full                     Timestamp           Full date and time with time zone, in the format Wed Jan 01 01:59:59 EET 2014
date_milliseconds             Long / BigInteger   Date in epoch/Unix time, in milliseconds. Requires at least 14 digits of precision.
date_milliseconds_array       Array (long)        Do not use
date_milliseconds_with_null   Long / BigInteger   Date in epoch/Unix time, in milliseconds. Requires at least 14 digits of precision.
date_seconds                  Long / BigInteger   Date as seconds in epoch/Unix time. Requires at least 12 digits of precision.
date_with_null                Timestamp           Full date and time with time zone, in the format Wed Jan 01 01:59:59 EET 2014
date_year                     Integer             Year stored as 4 digits
date_year_with_null           Integer             Year stored as 4 digits; some entries are NULL.
double_array                  Array (double)      Do not use
double_empty                  Double              A double or floating-point value supporting at least 10 significant digits
double_full                   Double              A double or floating-point value supporting at least 10 significant digits
double_with_null              Double              A double or floating-point value supporting at least 10 significant digits
id                            Integer             Not used by test calculations. Can be ignored, or loaded and used to uniquely identify records.
integer_array                 Array (integer)     Do not use
integer_empty                 Integer
integer_full                  Integer
integer_with_null             Integer
nested_array                  Array               Do not use
string_array                  Array (object)      Do not use
string_empty                  String / Varchar    If the data store requires a length, use 30
string_full                   String / Varchar    If the data store requires a length, use 30
string_with_null              String / Varchar    If the data store requires a length, use 30
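Because the testing tool assumes UTC, and date_full and date_with_null arrive as strings such as Wed Jan 01 01:59:59 EET 2014, you may need to normalize values before loading them. The Python sketch below shows one way to do it. It assumes a hand-maintained table of zone abbreviations (Python's strptime cannot resolve abbreviations such as EET by itself) and English day and month names; extend ZONE_OFFSETS to cover the abbreviations that actually appear in your copy of the CRTD. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical abbreviation table: extend it to cover every zone abbreviation
# that appears in your copy of the CRTD.
ZONE_OFFSETS = {
    "EET": timedelta(hours=2),   # Eastern European Time
    "EEST": timedelta(hours=3),  # Eastern European Summer Time
    "UTC": timedelta(0),
    "GMT": timedelta(0),
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def crtd_date_to_utc(value: str) -> datetime:
    """Parse a value like 'Wed Jan 01 01:59:59 EET 2014' and return it in UTC."""
    parts = value.split()
    if len(parts) != 6:
        raise ValueError(f"unexpected date format: {value!r}")
    zone = parts[4]
    if zone not in ZONE_OFFSETS:
        raise ValueError(f"unknown time zone abbreviation: {zone}")
    # Parse everything except the zone abbreviation, then attach its fixed offset.
    naive = datetime.strptime(" ".join(parts[:4] + parts[5:]), "%a %b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=timezone(ZONE_OFFSETS[zone])).astimezone(timezone.utc)


def epoch_millis_to_utc(millis: int) -> datetime:
    """Interpret a date_milliseconds value as an exact UTC timestamp."""
    return EPOCH + timedelta(milliseconds=millis)


print(crtd_date_to_utc("Wed Jan 01 01:59:59 EET 2014").isoformat())  # 2013-12-31T23:59:59+00:00
```

Using timedelta arithmetic for the epoch values avoids the floating-point rounding that dividing milliseconds by 1000 can introduce.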


Creating a Metadata Collection

The meta suite tests a collection in the data store to ensure that a connector is able to describe all its fields. The composition of this collection will vary from data store to data store. The recommended best practice is to create a collection containing at least one field of each data type available in the data store. The meta test suite verifies that it can describe all of the fields in the collection and map each of them to one of the Zoomdata connector data types:

  • DATE
  • INTEGER
  • DOUBLE
  • STRING
  • UNKNOWN

Native types that do not correspond to any of the other four types are mapped to UNKNOWN.
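Conceptually, the mapping that the meta suite checks is a lookup from native type names to the five Zoomdata types, with UNKNOWN as the fallback. The Python sketch below is illustrative only: the native type names in NATIVE_TO_ZOOMDATA are hypothetical examples for a SQL-style data store, and your connector's real mapping lives in its own code.

```python
from enum import Enum


class ZoomdataType(Enum):
    DATE = "DATE"
    INTEGER = "INTEGER"
    DOUBLE = "DOUBLE"
    STRING = "STRING"
    UNKNOWN = "UNKNOWN"


# Hypothetical native type names; replace them with the types your data store reports.
NATIVE_TO_ZOOMDATA = {
    "timestamp": ZoomdataType.DATE,
    "integer": ZoomdataType.INTEGER,
    "long": ZoomdataType.INTEGER,
    "double": ZoomdataType.DOUBLE,
    "float": ZoomdataType.DOUBLE,
    "string": ZoomdataType.STRING,
    "varchar": ZoomdataType.STRING,
}


def map_native_type(native_type: str) -> ZoomdataType:
    """Map a native type name to a Zoomdata type, falling back to UNKNOWN."""
    # Drop parameters such as the length in varchar(30) before the lookup.
    base = native_type.split("(", 1)[0].strip().lower()
    return NATIVE_TO_ZOOMDATA.get(base, ZoomdataType.UNKNOWN)
```

Including a deliberately unsupported native type in the metadata collection is a quick way to confirm that the fallback to UNKNOWN works.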

The collection should also contain any special cases specific to the data store that Zoomdata can reasonably expect to encounter when querying a source. Examples of special cases include indexes, partitions, hashes, and other data store field characteristics.

Meta tests ascertain that Zoomdata can describe a collection, but not that the collection can also be queried. You should manually insert at least one row of data into the collection, populating every field type, and test that Zoomdata can query it without error.


Starting the Connector and the Connector Shell

Before you can create a data source and run tests, you must start the connector server and the connector shell, and connect the shell to the server.

Starting the Connector Server

How you start your connector server depends entirely upon the language/technology that you used to build it. If you did not develop the connector that you are testing, consult the connector’s developers for more information.

Starting the Connector Shell and Connecting it to the Connector Server

For example, to connect the shell to the sample CrateDB connector server running on localhost at port 7337, use:

./connector-shell.sh -h localhost -p 7337

in which:

  • -h specifies the hostname of the connector server. Replace localhost with the hostname of your connector server (which may itself be localhost).
  • -p specifies the port. Replace 7337 with the port that your connector server uses.

If the connector server is running on the connector shell defaults (localhost:8090), you do not need to specify these arguments.
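Before starting the shell, it can help to confirm that something is listening on the connector server's host and port. The Python sketch below only checks TCP reachability; it does not speak Thrift, so a successful check does not prove that the connector itself is healthy. It also assembles the shell arguments, omitting -h and -p when both match the defaults stated above. The function names are hypothetical.

```python
import socket
from typing import List

DEFAULT_HOST = "localhost"
DEFAULT_PORT = 8090  # connector shell default described above


def server_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def shell_command(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT) -> List[str]:
    """Build the connector-shell.sh argument list; -h/-p are omitted for the defaults."""
    cmd = ["./connector-shell.sh"]
    if (host, port) != (DEFAULT_HOST, DEFAULT_PORT):
        cmd += ["-h", host, "-p", str(port)]
    return cmd
```

For the CrateDB example, you would check server_reachable("localhost", 7337) and, if it returns True, run shell_command("localhost", 7337) with subprocess.run.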

If the shell connects to the server successfully, you should see output similar to the following:

[email protected] connected.
Connector Client API Version: 2.3.0
Connector Server API Version: 2.1.7
WARNING: the version of the connected server is older than the client. Some features may not work or behave unpredictably.
connector-shell>

This shows that the connector shell is ready to interact with the connector server.

You can get help from the connector shell's command line by typing help.

The warning indicates that the Thrift client API in the connector shell is newer than the Thrift server API of the connector. Depending on how outdated the server API is, some features may not be supported, and some connector shell functionality may behave unexpectedly, for example by producing unintuitive errors.

In general, a newer client works with an older server. However, the connector shell is in an alpha stage, and Zoomdata does not guarantee backward compatibility.
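If you script around the shell, you can detect the same mismatch from the version lines in the startup output shown above. The Python sketch below parses the two API version lines and reports whether the server is older than the client. It assumes purely numeric dotted versions such as 2.3.0 and is not part of the connector shell; the function names are hypothetical.

```python
import re
from typing import Dict, Tuple

VERSION_LINE = re.compile(r"Connector (Client|Server) API Version: (\d+(?:\.\d+)*)")


def parse_versions(banner: str) -> Dict[str, Tuple[int, ...]]:
    """Extract the client and server API versions from the shell's startup output."""
    return {
        role.lower(): tuple(int(part) for part in version.split("."))
        for role, version in VERSION_LINE.findall(banner)
    }


def server_is_older(banner: str) -> bool:
    versions = parse_versions(banner)
    # Tuples compare element by element, so (2, 1, 7) < (2, 3, 0).
    return versions["server"] < versions["client"]
```

Comparing integer tuples rather than strings avoids mistakes such as treating 2.10.0 as older than 2.9.0.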

After you have connected the connector shell to the running connector, you can continue to create a data source.


Creating a Data Source

Once you have connected the shell to the connector, you need at least one data store to connect to for testing. The connector shell uses a data source, of the kind used in Zoomdata, to manage its connection to your data store.

Every data source also requires a CONNECTOR_TYPE parameter, which specifies the type of connector that the shell should reach through the connector server. The connector type is defined by the developer of the connector and should be unique within the set of connectors used by your Zoomdata instance.

The required connection parameters vary by data store type. Using the CrateDB connector included with the Connector Development Kit as an example, the only required parameter in this case is a JDBC URL. Assuming an instance of CrateDB running in a local Docker container on port 4300, we create an appropriate data source with:

datasource add -n cratedb_test_source JDBC_URL jdbc:crate://localhost:4300 CONNECTOR_TYPE CRATEDB

in which

  • -n (optional) provides a name for the data source. If unspecified, the connector shell generates a name.
  • JDBC_URL (required for CrateDB data stores) specifies the JDBC URL for the data store.
  • CONNECTOR_TYPE (required) specifies the type of connector.

Successful creation of the data source should yield:

Datasource 0123456789 added.

in which 0123456789 is replaced with a unique ID for the data source.

Note that the connector shell does NOT test whether the source is valid and accessible when you create it. Verify independently that the connection information is correct, or use the shell command:

validate -ds 0123456789

in which you replace 0123456789 with the unique ID of your test data source.

The response should indicate success if the connector server can reach the source.
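When scripting a session, you can build the datasource add command from key/value connection parameters, capture the generated ID from the confirmation message, and reuse it in later commands such as validate. A minimal Python sketch, assuming the confirmation has exactly the form Datasource 0123456789 added. shown above; the helper names are hypothetical.

```python
import re

ADDED = re.compile(r"^Datasource (\S+) added\.$", re.MULTILINE)


def add_command(name: str, connector_type: str, **params: str) -> str:
    """Build a 'datasource add' command from key/value connection parameters."""
    parts = ["datasource", "add", "-n", name]
    for key, value in params.items():
        parts += [key, value]  # values containing spaces would need quoting
    parts += ["CONNECTOR_TYPE", connector_type]
    return " ".join(parts)


def datasource_id(shell_output: str) -> str:
    """Return the data source ID from a 'Datasource <id> added.' message."""
    match = ADDED.search(shell_output)
    if match is None:
        raise ValueError("no 'Datasource ... added.' line found")
    return match.group(1)


def validate_command(ds_id: str) -> str:
    return f"validate -ds {ds_id}"
```

Called as add_command("cratedb_test_source", "CRATEDB", JDBC_URL="jdbc:crate://localhost:4300"), the helper reproduces the CrateDB command shown earlier.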


Running the Tests

The Zoomdata connector shell includes the following test suites that provide different looks into the correctness of your connector.

  • smoke - brief test that validates syntax and makes sure that data read responses execute without error. It does not test the accuracy of results, but it is a good first test because it is fast and easy.
  • meta - brief test that ensures that the connector can describe its data store, its schemas, collections, and fields, as well as its own abilities.
  • structured - long test that sends structured data queries for all features advertised for the connector and validates that the returned results match expected values. You can also test only particular features, each of which has its own set of individual tests. Finally, you can run or exclude an individual test.

The smoke and meta suites are easiest and catch the most common errors, so the best practice is to run them first. When your connector passes those tests, you should subject it to the complete structured test suite.

For information about running a test suite, see Running a Single Suite.

Additionally, you can run custom test configurations, test specific features, and run individual tests. The connector shell also allows you to treat nulls as zeroes and to manage test output.

After you have run a test, you may want guidance in analyzing test results.


Running a Single Suite

Use the test command to execute a test suite or a list of test suites. It needs at least the following pieces of information:

  • A data source to use for connection info
  • The test suite to run
  • The collection and, if applicable, the schema to run against.

You can provide this information as command line parameters or in a JSON file. To execute a list of test suites, you must use a JSON file to provide parameters.

Once a test command is executed, it cannot be stopped. Pressing Control+C stops the test command, as it does most other commands in the shell, but the underlying test suite continues executing and printing output.

Using the Command Line Parameters to Execute a Suite

For example, to run a smoke test on the sample CrateDB connector included with the Connector Development Kit, we might use the command:

test -ds 0123456789 -u smoke -c test_collection -s test_schema

in which:

  • 0123456789 is your test data source's unique ID
  • smoke is the type of test that you want to run
  • test_collection is the name of the collection containing the test dataset
  • test_schema is the name of the schema containing the test collection (if applicable)

The example command above tells the connector shell to run the smoke suite using the data source 0123456789, against the collection test_collection in the schema test_schema.
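If you run the same tests repeatedly, a small helper that assembles the test command line can reduce typing errors. The Python sketch below covers only the -ds, -u, -c, and -s parameters described above; the function name is hypothetical.

```python
from typing import Optional

SUITES = ("smoke", "meta", "structured")


def build_test_command(ds_id: str, suite: str, collection: str,
                       schema: Optional[str] = None) -> str:
    """Build a connector shell 'test' command that runs one suite."""
    if suite not in SUITES:
        raise ValueError(f"unknown suite {suite!r}; expected one of {SUITES}")
    parts = ["test", "-ds", ds_id, "-u", suite, "-c", collection]
    if schema is not None:  # -s applies only if the data store uses schemas
        parts += ["-s", schema]
    return " ".join(parts)
```

Rejecting unknown suite names up front catches typos before the shell starts a run that, as noted above, cannot be stopped.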

The connector shell includes smoke, meta, and structured suites. For more information about each suite, see Running the Tests.

Using a JSON File as Parameters to Execute a Suite

You can specify a JSON file from which the shell should take parameters by using the -f parameter of the test command.

test -ds 0123456789 -f /Users/home/test.json

in which:

  • 0123456789 is the ID of the data source containing the test dataset
  • The file /Users/home/test.json contains test parameters in the format outlined below. The path to the JSON file should be absolute and fully qualified.

The single-test parameters can be expressed as a single JSON object, included in a .json file, and passed to the test command using the -f parameter.

{
  "tests": [{
    "suite": "structured",
    "schema": "test_schema",
    "collection": "test_collection"
  }]
}

in which:

  • tests is an array of one (or more) test objects.
  • suite is the type of suite to run: smoke, meta, or structured.
  • schema is the schema that contains the collection, if applicable.
  • collection is the collection that contains the data for the test.

Using the format above, you can run a test without having to manually enter the same parameters repeatedly.
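You can also generate the parameter file rather than writing it by hand. The following Python sketch builds a single-test object in the format above and writes it to disk, returning the absolute path because -f expects an absolute, fully qualified path. The function names are illustrative.

```python
import json
import os
from typing import Optional


def single_test_plan(suite: str, collection: str, schema: Optional[str] = None) -> dict:
    """Return a parameter object with one entry in the tests array."""
    test = {"suite": suite}
    if schema is not None:  # omit the schema for data stores without schemas
        test["schema"] = schema
    test["collection"] = collection
    return {"tests": [test]}


def write_plan(plan: dict, path: str) -> str:
    """Write the plan as JSON and return the absolute path to pass to -f."""
    abs_path = os.path.abspath(path)
    with open(abs_path, "w", encoding="utf-8") as fh:
        json.dump(plan, fh, indent=2)
    return abs_path
```

For example, write_plan(single_test_plan("structured", "test_collection", "test_schema"), "test.json") returns the path to use in test -ds 0123456789 -f followed by that path.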


Running Custom Test Configurations

When running the connector shell's test command, you can use the -f parameter to specify a JSON file from which to load test parameters. Using these test parameters, you can run custom test configurations. Custom test configurations may include running multiple test suites, using a structured suite to test only particular features, or even running individual tests.

To run multiple test suites:

You can run multiple test suites together by passing a file with the test parameters expressed in JSON, as described above. To run additional suites, add more test objects to the tests array.

{
  "tests": [{
    "suite": "structured",
    "schema": "test_schema",
    "collection": "structured_test_collection"
  }, {
    "suite": "meta",
    "schema": "test_schema",
    "collection": "meta_test_collection"
  }]
}
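Because the shell reads these files as written, a quick check before running can catch malformed JSON or misspelled suite names. The Python sketch below checks only the members described in this topic (tests, suite, collection). It is not the shell's own validator, and the shell may accept or reject files on other grounds; the function name is hypothetical.

```python
import json
from typing import List

SUITES = {"smoke", "meta", "structured"}


def validate_plan(text: str) -> List[str]:
    """Return a list of problems found in a JSON test plan (empty if none)."""
    try:
        plan = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(plan, dict):
        return ["top level must be a JSON object"]
    tests = plan.get("tests")
    if not isinstance(tests, list) or not tests:
        return ["'tests' must be a non-empty array"]
    problems = []
    for index, test in enumerate(tests):
        if not isinstance(test, dict):
            problems.append(f"tests[{index}] must be an object")
            continue
        if test.get("suite") not in SUITES:
            problems.append(f"tests[{index}].suite must be one of {sorted(SUITES)}")
        if not test.get("collection"):
            problems.append(f"tests[{index}].collection is required")
    return problems
```

An empty list means the file passed these basic checks; each string otherwise points at the offending entry in the tests array.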


Testing Specific Features

Many of the tests run by the test tool are based on which features a connector server advertises that it supports. However, when developing and testing a specific feature, it may be useful to execute only the tests for that feature by using the -t option with a comma-delimited list of features. For example:

test -ds 0123456789 -u structured -s integration_tests -c connector_test -t DISTINCT_COUNT,GROUP_BY_TIME

This runs the structured suite, but only the tests that verify the DISTINCT_COUNT and GROUP_BY_TIME features.

At present, you cannot test a specific feature while passing parameters using a JSON file.


Running Individual Tests

When you specify tests for inclusion, all other tests are excluded. The list of all individual tests executed against your connector varies based on the connector's registered features. When you run the structured test suite without including or excluding any individual tests, its output will identify all tests as they execute. For more information about supported features, see Connector Info Keys.

To run individual tests, include or exclude particular tests at the command line using the -i or -x switches, as shown below.

test -ds 0123456789 -u structured -s test_schema -c test_collection -i RawRequestSortByStringFieldDesc,DoubleTermsAggEmptyStringAndStringWithNull

If you use a JSON file to pass parameters to the test command, you can include particular tests by adding an includeTests array to the JSON file's object.

{
  "includeTests": [
    "RawRequestSortByStringFieldDesc",
    "DoubleTermsAggEmptyStringAndStringWithNull"
  ],
  "tests": [{
    "suite": "structured",
    "schema": "test_schema",
    "collection": "test_collection"
  }]
}

You can also exclude individual tests using the excludeTests member.

Tests specified by includeTests and excludeTests apply only to the structured test suite. They do not affect the smoke and meta suites.
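A generated plan avoids hand-editing mistakes such as a missing quote or comma. The Python sketch below builds a structured-suite plan with optional includeTests and excludeTests arrays. Refusing to build a plan that both includes and excludes the same test is a design choice of this hypothetical helper, not a documented rule of the shell.

```python
import json
from typing import Iterable, Optional


def structured_plan(collection: str, schema: Optional[str] = None,
                    include: Iterable[str] = (), exclude: Iterable[str] = ()) -> dict:
    """Build a structured-suite plan with optional includeTests/excludeTests arrays."""
    test = {"suite": "structured", "collection": collection}
    if schema is not None:
        test["schema"] = schema
    include, exclude = list(include), list(exclude)
    overlap = set(include) & set(exclude)
    if overlap:
        raise ValueError(f"tests both included and excluded: {sorted(overlap)}")
    plan = {"tests": [test]}
    if include:
        plan["includeTests"] = include
    if exclude:
        plan["excludeTests"] = exclude
    return plan


print(json.dumps(structured_plan(
    "test_collection", "test_schema",
    include=["RawRequestSortByStringFieldDesc", "DoubleTermsAggEmptyStringAndStringWithNull"],
), indent=2))
```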


Treating Nulls as Zeroes

You can set the test to treat null values as zeroes.

To treat null values as zeroes using the command line, add the -p switch with the parameter isNullEqualToZero=true, as shown below.

test -ds 0123456789 -u structured -s test_schema -c test_collection -p isNullEqualToZero=true

If you use a JSON file to pass parameters to the test command, you can treat nulls as zeroes by adding an options object as shown below.

{
  "tests": [{
    "suite": "structured",
    "schema": "test_schema",
    "collection": "test_collection"
  }],
  "options": { "isNullEqualToZero": "true" }
}


Managing Output

By default, the Zoomdata connector shell outputs test progress and final results to the console.

You can reduce console output to a brief form using the -b switch as shown below.

test -ds 0123456789 -u smoke -c connector_test -s integration_tests -b

You can also output test progress and results to XML and HTML files using the -o switch as shown below.

test -ds 0123456789 -u smoke -c connector_test -s integration_tests -o /tmp/connector-tests

In the example above, /tmp/connector-tests is the destination directory; a relative path is resolved against the directory from which the connector shell is run. You can use your own directory for output. The output files conform to the JUnit specification.

Each test run overwrites existing results files unless you specify a new path.
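Because the files follow JUnit XML conventions, standard tooling can read them. As a rough sketch, the Python code below totals testcase elements by outcome, treating a failure or error child as failed and a skipped child as skipped. The file names the shell writes inside the directory are not documented here, so it reads every .xml file; adjust the pattern if other XML files are present. The function names are hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET
from collections import Counter


def count_cases(root: ET.Element) -> Counter:
    """Count passed, failed, and skipped <testcase> elements under an XML root."""
    counts = Counter()
    # iter() includes the root itself, so this works for <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            counts["failed"] += 1
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            counts["passed"] += 1
    return counts


def summarize_junit(directory: str) -> Counter:
    """Total the outcomes across every .xml file in the output directory."""
    total = Counter()
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        total.update(count_cases(ET.parse(path).getroot()))
    return total
```

For the example above, summarize_junit("/tmp/connector-tests") returns the combined counts for the run.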

After you have run your tests, you can analyze your test results.


Analyzing Test Results

Assuming that you have a valid connector server running with a valid data source from a running data store, a test run proceeds as follows.

  1. Perform preliminary checks
  2. Execute test suites
  3. Report the results

The preliminary checks verify that the connector server and data source meet the minimum viable requirements for running tests. This verification includes activities such as validating the data source, listing advertised features, and describing the fields in the specified collections. If these preliminary checks fail, the test suite cannot be executed and testing stops.

Designating Test Output Destination

By default, test results are written to the console in verbose form. You can request brief form instead, and you can also specify a file location for output. Results written to files follow the JUnit specification and are intended mainly for continuous integration testing rather than for human reading. For information about these forms, see Managing Output.

Outputted Results

First Line

The first line establishes the versions associated with each component of the testing framework. For example:

Testing datasource cratedb with Connector Shell Client version 2.3.0, Test Suite version 0.1

In the example above, we see which data source is being tested, the version of the client, and the version of the test suite. The version of the client may be incremented without changing anything about the test suite. This information can help identify testing discrepancies.

Preliminary Check - Feature Validation

Except in brief mode (-b), the connector shell reports the results of the preliminary checks. The preliminary checks verify that the connector serves a list of features and field metadata, printing this information along the way. The output will look something like the following.

(OK) Features Validation complete, 1 check(s) remaining
Detected 10 supported features for cratedb:
FEATURE.DISTINCT_COUNT,
FEATURE.FAST_DISTINCT_VALUES,
FEATURE.GROUP_BY_TIME,
...

You should review the list of features reported by the testing tool to ensure that they correspond to the features registered by the connector server.

Preliminary Check - Fields Validation

The next preliminary check validates the metadata for each collection used in testing.

(OK) Source Fields Validation complete, 0 check(s) remaining
Retrieved 20 fields for cratedb from CollectionInfo(collection:test_collection, schema:schema_test): [string_with_null: STRING, date_year: INTEGER, date_full: DATE, double_with_null: DOUBLE, date_empty: DATE, date_with_null: DATE, date_seconds_empty: INTEGER, string_empty: STRING, date_milliseconds_with_null: INTEGER, string_full: STRING, integer_with_null: INTEGER, double_full: DOUBLE, date_year_with_null: INTEGER, integer_full: INTEGER, date_seconds_with_null: INTEGER, double_empty: DOUBLE, date_milliseconds: INTEGER, integer_empty: INTEGER, date_seconds: INTEGER, date_milliseconds_empty: INTEGER]

You should review the outputted list of fields from the collection to ensure that they correspond to the actual fields in the collection.
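Rather than comparing the field list by eye, you can parse it. The Python sketch below assumes the bracketed name: TYPE format shown in the sample output and compares the result with a dictionary of expected types that you supply; both helpers are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

FIELD_LIST = re.compile(r"\[([^\[\]]*)\]\s*$")


def parse_fields(line: str) -> Dict[str, str]:
    """Parse '... [name: TYPE, name: TYPE]' into a {name: TYPE} dictionary."""
    match = FIELD_LIST.search(line.strip())
    if match is None:
        raise ValueError("no bracketed field list found")
    if not match.group(1).strip():
        return {}
    fields = {}
    for item in match.group(1).split(","):
        name, _, ztype = item.partition(":")
        fields[name.strip()] = ztype.strip()
    return fields


def field_mismatches(reported: Dict[str, str],
                     expected: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {field: (expected_type, reported_type)} for every disagreement."""
    names = sorted(set(reported) | set(expected))
    return {n: (expected.get(n), reported.get(n)) for n in names
            if expected.get(n) != reported.get(n)}
```

A field that appears only in the collection or only in the report shows up with None on the missing side.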

Test Suite Execution

After completing preliminary tests, the connector shell will begin to run the requested test suite and output results. In the default verbose mode, results look something like the following:

2016-10-07 12:51:21 INFO - Successfully completed test: {name=testServerInfo, method=SmokeTest.testServerInfo, arguments=[cratedb:integration_tests.connector_test]} in 2 milliseconds
2016-10-07 12:51:21 INFO - Starting test: {name=testValidateCollection, method=SmokeTest.testValidateCollection, arguments=[cratedb:integration_tests.connector_test]}

Final Summary

After executing the test suites requested, the connector shell provides summary results. Some results are not reported in brief (-b) mode.

After completing the run, the output will summarize the results in a few ways. First, it will specify the total number of tests run.

Completed run of 20 tests

Next, the connector shell prints each feature tested along with the percentage complete and the result of each individual test for that feature. For example:

Featured-specific test results:
(50.0%) DISTINCT_COUNT:
- testDistinctCount (OK)
- testDistinctCountNull (FAIL)

These results help the developer analyze the implementation of individual features. The connector is considered to support a feature if it has achieved a score of 100% without excluding any of the associated tests.

Each test produces one of the following results.

  • OK means that the connector server passed the test.
  • FAIL means that the tested feature returned an error or incorrect result.
  • SKIP means that the test was not run. Tests are skipped if:
    • the connector does not meet the requirements to run the test
    • the connector's info keys do not advertise the feature as supported
    • the test plan submitted by the user excludes them explicitly.

    Skipped tests do not count against the percentage of tests completed for a given feature.
    The best practice is to exclude tests (so that they are skipped) only if the feature involved is not supported by the connector. It is an unsound practice to skip a test in order to get a passing score for the feature.
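The percentage reported for each feature can be reproduced from the individual results: skipped tests are left out of both the numerator and the denominator. A minimal Python sketch of that arithmetic (the function name is hypothetical), using the DISTINCT_COUNT example above:

```python
from typing import Iterable, Optional


def feature_score(results: Iterable[str]) -> Optional[float]:
    """Percentage of OK results among non-skipped tests, or None if all were skipped."""
    counted = [r for r in results if r != "SKIP"]
    if not counted:
        return None
    return 100.0 * sum(r == "OK" for r in counted) / len(counted)


print(feature_score(["OK", "FAIL"]))  # 50.0
print(feature_score(["OK", "SKIP"]))  # 100.0
```

The second call shows why excluding a failing test to reach 100% is unsound: the skipped test simply disappears from the score.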

The shell lists any features that are advertised by the connector's info keys but are not tested. A list of untested features looks something like the following.

Connector features advertised but not tested by this plan:
- FAST_DISTINCT_VALUES
- LIVE_SOURCE
- REFRESHABLE
...

Typically, advertised features are left untested because either:

  • There are no tests available for the feature. This is often true when the feature info key merely informs Zoomdata to present a UI option but does not involve the connector executing any requests.
  • They were excluded by the specified test plan using the -t switch at the command line.

After the preceding feedback, a final count is presented.

Test Failures

Test failures do not interrupt execution of a test suite because generally each feature test is independent of the others. Tests fail for one of the following reasons.

  • There is an error during the execution of a test.
  • The request executes but returns incorrect results.

Errors

When a test fails due to an error, the connector server reports a server error instead of success. The underlying cause can vary but is most commonly that the data store produced an exception when performing the requested query. Often this result indicates a syntax error in the query that the connector generated.

The best practice for troubleshooting errors is to enable logging on the data store and analyze the error that the data store returned.

Incorrect Results

Incorrect results only appear in the results of the structured test suite. The structured test suite both validates successful responses and also compares the results calculated by the data store to the results expected to be sent to Zoomdata.

A failure showing an incorrect result might look like the following.

java.lang.AssertionError: { module: unknown, source: SICK_CRATEDB(schema: integration_tests, collection: connector_test, params: null), error-message: " expected [[[null, New Mexico, -793.04533, 0, null], [null, Arkansas, 324.02415, 0, null], [null, Colorado, 245.66199, 0, null], [null, Florida, -251.98766, 0, null], [null, Louisiana, -899.64323, 0, null], [null, Maine, -759.52106, 0, null], [null, Maryland, -87.18804, 0, null], [null, Montana, -419.83826, 0, null], [null, Nebraska, 752.68503, 0, null], [null, North Carolina, 639.38712, 0, null], [null, North Dakota, -522.47887, 0, null], [null, South Carolina, 715.34538, 0, null], [null, Utah, 397.75294, 0, null], [null, Vermont, 342.87147, 0, null], [null, Virginia, 144.03915, 0, null], [null, Washington, 393.20948, 0, null], [null, Wyoming, -843.37912, 0, null], [null, null, -814.37425, 0, null]]] but found [[[null, North Carolina, 639.38712, 0, null], [null, null, -577.52925, 0, null], [null, Nebraska, 752.68503, 0, null], [null, Florida, -251.98766, 0, null], [null, Vermont, -670.9809, 0, null]]]" }

The error shows that Zoomdata expected an array containing certain values but found an array containing different values. You can use this information to diagnose the exact cause of inaccuracies and to adjust the query that the connector passes down to the data store.
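With long result sets, it can be hard to spot the differences by eye. One approach, sketched below in Python, is to compare the rows as multisets. This assumes that row order is not significant for the request being tested, which holds for many aggregations but not for sorted raw requests. The sample rows are copied by hand from the message above; parsing the assertion text automatically is not shown, and the function name is hypothetical.

```python
from collections import Counter


def diff_rows(expected, found):
    """Return (missing, unexpected) rows, ignoring row order but not duplicates."""
    exp = Counter(map(tuple, expected))
    got = Counter(map(tuple, found))
    missing = list((exp - got).elements())     # expected but not returned
    unexpected = list((got - exp).elements())  # returned but not expected
    return missing, unexpected


expected = [[None, "Nebraska", 752.68503, 0, None], [None, "Vermont", 342.87147, 0, None]]
found = [[None, "Nebraska", 752.68503, 0, None], [None, "Vermont", -670.9809, 0, None]]
missing, unexpected = diff_rows(expected, found)
print("missing:", missing)
print("unexpected:", unexpected)
```

Here the Nebraska row matches, while the Vermont row differs in its aggregated value, which points at the calculation rather than the grouping.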

The best practice is to verify that the reference dataset has been properly loaded into the data store. The data store should contain exactly 1500 records and should be consistent with the CRTD files provided, whether you loaded the CSV or the JSON file.
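To confirm that you are starting from a complete copy of the dataset, you can count the records in the CSV file and compare the count with the 1500 records expected, then run an equivalent count query against your data store. The Python sketch below uses the csv module so that quoted fields containing line breaks are counted as one record; the file name crtd.csv is a placeholder.

```python
import csv
from typing import Iterable

EXPECTED_RECORDS = 1500


def count_records(lines: Iterable[str]) -> int:
    """Count data rows in CSV text, excluding the header row and blank lines."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


# Usage (crtd.csv is a placeholder for your local copy of the CRTD CSV file):
# with open("crtd.csv", newline="", encoding="utf-8") as fh:
#     print(count_records(fh) == EXPECTED_RECORDS)
```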
