
Connecting to Amazon S3

Amazon Simple Storage Service (S3) provides a “web service interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web” [1]. Zoomdata connects to S3 sources using the Apache Spark processing framework.
[1]: Excerpted from the AWS documentation, “What is Amazon S3?”

As a result, the Amazon S3 source uses Zoomdata's embedded Spark server. For more information, see the article Configuring an Embedded Spark Server in Zoomdata.

To learn more about Spark functionality and how it is used and enabled in Zoomdata, see How Zoomdata Uses Apache Spark.

CONFIGURING THE S3 CONNECTOR

After setting up Spark, follow the steps below to connect Zoomdata to your Amazon S3 source:

  1. Click the Sources menu item.

Figure 1

  1. Click the S3 connector icon.
  2. Specify the name of your source and add a description (if desired).

Figure 2

  1. Click Next.

  2. Specify the path to the file. This is the path to the remote file that you want to load into Zoomdata.
    (You can use this publicly available dataset:
    s3n://AKIAI535P5R2QX7NYAQQ:[email protected]/consolidated_olympic_events.csv)

Figure 3
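
The path in the example above follows the s3n URI convention of embedding credentials ahead of the bucket name: s3n://ACCESS_KEY:SECRET_KEY@bucket/object. As a minimal sketch of how such a path could be assembled and checked before pasting it into the connector, the Python helper below builds and parses a URI of that shape. The function names and the sample key, secret, and bucket values are hypothetical, and whether the connector accepts percent-encoded secrets is an assumption you should verify against your own environment.

```python
from urllib.parse import quote, urlsplit


def build_s3n_uri(access_key: str, secret_key: str, bucket: str, object_key: str) -> str:
    """Assemble a path of the form s3n://ACCESS_KEY:SECRET_KEY@bucket/object_key."""
    # Percent-encode the credentials so that characters such as "/" or "+"
    # inside a secret key are not mistaken for URI delimiters.
    # (Assumption: the consumer of the URI decodes them again.)
    user = quote(access_key, safe="")
    password = quote(secret_key, safe="")
    # Keep "/" in the object key because it separates "folders" in the bucket.
    path = quote(object_key.lstrip("/"), safe="/")
    return f"s3n://{user}:{password}@{bucket}/{path}"


def describe_s3n_uri(uri: str) -> dict:
    """Split an s3n URI into its parts without revealing the secret key."""
    parts = urlsplit(uri)
    if parts.scheme != "s3n":
        raise ValueError(f"expected an s3n:// path, got scheme {parts.scheme!r}")
    return {
        "bucket": parts.hostname,
        "object_key": parts.path.lstrip("/"),
        "has_credentials": parts.username is not None,
    }


if __name__ == "__main__":
    # Hypothetical placeholder values, not real credentials.
    uri = build_s3n_uri("EXAMPLEKEY", "abc/def+ghi", "example-bucket", "data/events.csv")
    print(describe_s3n_uri(uri))
```

Because a path of this form carries the secret key in plain text, treat it like a password: avoid pasting it into shared documents or logs.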

  1. Select the Read Headers checkbox if you want to use the first row of your data source as column names.

  2. Specify the Value Separator used in your data source. Standard separators include commas (,) and semicolons (;).

  3. Toggle the caching setting if needed (caching is enabled by default).

  4. Click the Preview button. A preview of the data file is displayed.

Figure 4
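
To illustrate what the Read Headers and Value Separator settings control, the following Python sketch parses a small delimited sample in the same spirit as the preview step. It is not Zoomdata's implementation; the function name, the generated column names, and the sample rows are hypothetical and serve only to show how the two options change the result.

```python
import csv
import io


def preview_delimited(text, separator=",", read_headers=True, max_rows=5):
    """Return (column_names, first_rows) for a block of delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    if not rows:
        return [], []
    if read_headers:
        # The first row supplies the column names; data starts on row two.
        columns, data = rows[0], rows[1:]
    else:
        # No header row: generate placeholder names and keep every row as data.
        columns = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return columns, data[:max_rows]


if __name__ == "__main__":
    # Hypothetical sample using a semicolon as the value separator.
    sample = "year;city;sport\n2008;Beijing;Swimming\n2012;London;Cycling\n"
    columns, rows = preview_delimited(sample, separator=";")
    print(columns)
    for row in rows:
        print(row)
```

If the separator does not match the file, every line comes back as a single wide column, which is usually the first sign in a preview that the Value Separator needs to be changed.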

  1. Click Next. On the Fields page, you can create unique label names for the available fields in your data source. These labels are displayed in the charts.

When you click Next, loading the dataset into memory may take some time depending on its size (from several minutes to possibly over half an hour).

Figure 5

If you have changed the fields in your data set, click Refresh Fields to synchronize them.
  1. If necessary, change the Type and Default options, and select the checkboxes in the Distinct Count column. Configure the Filter Display settings for the required fields.
  2. Click Next to continue.
  3. On the Refresh page, you can schedule asynchronous jobs to refresh fields in your data source. Refer to the article Using the Zoomdata Scheduler for more information.
  4. On the Charts page, you can enable the charts that will be available for the data source and edit their settings. For example, you can select the styles that will be available for the data source and change the global default settings. Learn more about how to customize a chart. Click Finish to save your changes.


Figure 6