
Connecting to Amazon S3

Amazon Simple Storage Service (S3) provides a “web service interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web” [1]. Zoomdata connects to S3 sources using the Apache Spark processing framework.
[1]: Excerpted from the AWS documentation, “What is Amazon S3?”

As a result, the Amazon S3 source uses Zoomdata's embedded Spark server. For more information, see the article Configuring an Embedded Spark Server in Zoomdata.

To learn more about Spark functionality and how it is used and enabled in Zoomdata, refer to the article How Zoomdata Uses Apache Spark.

Prerequisites

The table below lists the features that are supported for Amazon S3 sources:

Supports Distinct Count? Yes
Supports Live Mode/Playback? No
SparkIt Capable? Yes
Supports Group-by Time? Yes
Supports Multi Group-by Charts? Yes
Supports Histogram? Yes
Supports Box Plot? No
Custom SQL Capable? No
Supports Last Value? No
Supports Partition? No

CONFIGURING THE CONNECTION

For details about what is provided on each page of the connection process, review the article Source Connection Workflow. Depending on your needs, you can either follow the steps in order from start to finish or jump to a specific section of the connection process.

Start

After setting up Spark, follow the steps below to connect Zoomdata to your Amazon S3 source:

  1. Click the Sources menu item.

    Figure 1
  2. Click the connector icon.

General Page

  1. Specify the name of your source and add a description (if desired).


Figure 2

  2. Click Next to continue to the next setup page.

File Path Page

This page defines the connection that Zoomdata uses to access the data source. Perform the steps below.

  1. From the Remote File Settings (Spark It) list, select the number of entries to be displayed in the file preview.

  2. Specify the path to the file. This is the path to the remote file that you want to upload into Zoomdata.
    (You can use this publicly available dataset:
    s3n://AKIAI535P5R2QX7NYAQQ:[email protected]/consolidated_olympic_events.csv)


Figure 3

  3. Select the Read Headers checkbox if you want to use the first row of your data source as column names.

  4. Specify the Value Separator that is used in your data source. Standard separators include commas (,) and semicolons (;).

  5. Toggle the caching setting (caching is enabled by default).

  6. Click Preview to see a preview of the data file.

    Figure 4

  7. Click Next.
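
The settings on this page can be pictured with a small Python sketch. It is illustrative only and does not show how Zoomdata or Spark reads the file. It assumes the path follows the `s3n://ACCESS_KEY:SECRET_KEY@bucket/object` shape seen in the example above, and the names `split_s3_path`, `preview_rows`, the placeholder credentials, and the generated `column_N` names are all hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit


def split_s3_path(path):
    """Split an s3n:// path into credentials, bucket, and object key.

    Assumes the s3n://ACCESS_KEY:SECRET_KEY@bucket/object layout.
    A secret key that contains "/" would not parse correctly here.
    """
    parts = urlsplit(path)
    return {
        "scheme": parts.scheme,
        "access_key": parts.username,
        "secret_key": parts.password,
        "bucket": parts.hostname,
        "key": parts.path.lstrip("/"),
    }


def preview_rows(text, separator=",", read_headers=True, limit=5):
    """Return (column_names, rows) for at most `limit` data rows.

    `separator` plays the role of the Value Separator setting and
    `read_headers` the role of the Read Headers checkbox.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    if not rows:
        return [], []
    if read_headers:
        header, data = rows[0], rows[1:]
    else:
        # Hypothetical fallback names; the names Zoomdata assigns may differ.
        header = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return header, data[:limit]


# Placeholder values only, not real credentials or a real bucket.
info = split_s3_path(
    "s3n://EXAMPLE_ACCESS_KEY:EXAMPLE_SECRET@example-bucket/data/events.csv"
)
header, rows = preview_rows("year;city\n2008;Beijing\n2012;London\n", separator=";")
```

With Read Headers cleared (`read_headers=False`), the first line is treated as data rather than as column names, which is the main difference the checkbox makes in this sketch.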

Fields Page

The Fields page lets you (1) configure attribute options, (2) create custom labels for the fields in your data source (that will be displayed in the charts), (3) manage the Volume metric, and (4) work with Calculations.

  1. Determine whether the field should be visible or not to the user.
  2. Create unique label names, as needed, for each Label field.
When you create a data source, Zoomdata saves a specific number of distinct values for the attribute fields, based on the data sample taken from your data set. You can filter the data on your chart by these values. While editing a data source, if you want the filter to use all distinct values (that is, from the whole data source), click the Refresh button in the Statistics column.
  3. For the Type column, you have the option to edit the field type (although usually you won't need to do this).
  4. For the Configure column, numeric and time-based fields may be edited:
    • Numeric types including Money, Number and Integer - ability to select a default aggregation function
    • Time fields - ability to define the default time pattern and granularity; if the time field provides granularities of hour, minute and second, then a time zone label may be applied
  5. Select fields for Distinct Counts as needed.
  6. Refresh the connection to a particular field, as desired.
  7. Configure Filter Display settings for fields.
  8. Edit the Volume Metric settings, as needed.
  9. Work with Calculations, if available and as needed.
    If you are setting up a new connection, the Calculations section will not be available until after the connection is saved.
  10. Click Next to continue.
When you click Next, it may take some time to load the data set into memory, depending on its size (from several minutes to possibly over half an hour).
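
The note above about distinct values coming from a data sample can be illustrated with a short Python sketch. It is not Zoomdata's implementation: how the sample is drawn is not documented here, so taking the first few rows is an assumption, and the function name `distinct_values` and the sample data are hypothetical.

```python
def distinct_values(rows, field):
    """Return the sorted distinct values of `field` across `rows`."""
    return sorted({row[field] for row in rows})


# Hypothetical data set; the sample is assumed to be the first three rows.
rows = [
    {"city": "Athens"},
    {"city": "Beijing"},
    {"city": "Athens"},
    {"city": "London"},
    {"city": "Rio"},
]
sample = rows[:3]

sampled = distinct_values(sample, "city")   # values a sample-based filter would offer
full = distinct_values(rows, "city")        # values from the whole data source
missing = sorted(set(full) - set(sampled))  # values the sample did not contain
```

In this sketch the sample-based list lacks two values that exist in the full data set, which is the situation the Refresh button in the Statistics column is meant to address.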


Figure 5

Refresh Page

The Refresh page lets you schedule asynchronous jobs to update the source metadata. For guidance on setting up a refresh schedule, refer to the article Using the Zoomdata Scheduler.

Charts Page

On the Charts page, you can:

  1. Edit Global Default Settings.
  2. Select the Standard and, if available, Custom chart styles to be used with the data source.
  3. Set default parameters (group, sub-group, colors, sorting, and so on) for each chart style.


Figure 6

Learn more about how to customize a chart.

Click Finish to save your changes. Once your data connection has been established, it will be listed under the My Data Sources section of the page.