
Using the Zoomdata Scheduler

Overview

The Zoomdata Scheduler is a component of the Zoomdata server that runs jobs asynchronously to refresh Zoomdata’s connection to its data sources. The Scheduler is integrated with the data connectors and supports the following types of jobs:

  1. Refreshing the data sources connected to Zoomdata (that is, refreshing metadata and clearing the cache)
  2. Refreshing specific fields of a data source

The Zoomdata Administrator and users with admin privileges can access the Scheduler from the Refresh page of a specific data source (Figure 1).

Figure 1

The Refresh page is not available for live streaming data sources (such as Twitter and Kinesis).

Administrators can view the status of scheduled jobs in the Zoomdata Console, which is available from the Settings menu (Figure 2).

Figure 2


How the Zoomdata Scheduler Works

When you first connect Zoomdata to your data source, the following activities run automatically:

  • An asynchronous job is started to calculate the min/max values of all refreshable fields. For attribute fields, the distinct values are refreshed.
  • A sample of the dataset is queried to determine distinct values for all fields of type Attribute and min/max values for all fields of type ‘Number’, ‘Integer’, and ‘Money’ (these values are used by a chart's Filter controls).

To refresh the lookup and min/max values for your fields, you can set the fields as refreshable.
You can also refresh them manually by clicking Refresh next to the corresponding field on the data source's Fields page. This action starts an asynchronous job that determines the distinct values and min/max range from the entire dataset rather than from the sample.
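The difference between sampled and full-dataset statistics can be illustrated with a short Python sketch. This is not Zoomdata's implementation: the function name `field_statistics`, the row format (a list of dicts), and the type-name strings are assumptions made for illustration. It computes distinct values for attribute fields and a (min, max) pair for numeric fields, and shows that a sample can miss values that a full refresh finds.

```python
from typing import Any

# Field types named in the text; the exact strings are an assumption.
ATTRIBUTE_TYPE = "Attribute"
NUMERIC_TYPES = {"Number", "Integer", "Money"}


def field_statistics(rows: list[dict[str, Any]],
                     field_types: dict[str, str]) -> dict[str, Any]:
    """Distinct values for attribute fields, (min, max) for numeric fields."""
    stats: dict[str, Any] = {}
    for name, ftype in field_types.items():
        # Ignore missing values (None) when computing statistics.
        values = [row[name] for row in rows if row.get(name) is not None]
        if ftype == ATTRIBUTE_TYPE:
            stats[name] = sorted(set(values))
        elif ftype in NUMERIC_TYPES and values:
            stats[name] = (min(values), max(values))
    return stats


full_dataset = [
    {"region": "East", "sales": 120},
    {"region": "West", "sales": 75},
    {"region": "North", "sales": 300},
]
types = {"region": "Attribute", "sales": "Money"}

sample = full_dataset[:2]  # stand-in for the initial sample
print(field_statistics(sample, types))        # {'region': ['East', 'West'], 'sales': (75, 120)}
print(field_statistics(full_dataset, types))  # {'region': ['East', 'North', 'West'], 'sales': (75, 300)}
```

In this toy data, the sample misses the "North" region and the maximum sales value of 300, which is why a manual or scheduled refresh over the entire dataset can change a chart's filter ranges.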

In addition to these initial activities, administrators can configure the Scheduler to run jobs related to the data connectors. Table 1 lists the currently supported jobs, the triggers for each job, and the activities that occur when the job runs.

Future integrations of the Scheduler with other Zoomdata features are being researched and evaluated.
Job Type: Data Source Refresh
  Triggers:
    1. After the initial connection to the data source is saved
    2. After edits to the Table or Fields of the data source are saved
    3. On a set schedule (configured on the Refresh page)
  Activities:
    • Both the SparkIt and Zoomdata caches are cleared
    • For SparkIt-enabled sources, the SparkIt cache is reloaded
    • The min/max values for non-attribute fields and the distinct values for attribute fields are refreshed

Job Type: Data Source Fields Refresh
  Triggers:
    • Manual selection of the Refresh button on the Fields page
  Activities:
    • The min/max values for non-attribute fields and the distinct values for attribute fields are refreshed

Table 1

Setting up Data Source Refresh Job

By default, when you connect your data source to Zoomdata, the Refresh page is set to the No Schedule option (Figure 3). With this option, the Scheduler runs an initial data source refresh job after the source has been successfully created and saved, but does not run the job on a schedule.

Figure 3

Configuring the 'Periodically' Option

To enable the Scheduler to run at predetermined points in time, perform the following steps (Figure 4):

Figure 4

  1. Select the Periodically option.
  2. Set the Start on date and time.
    Zoomdata uses the UTC time zone.
  3. Select how often the job runs from the Runs list (Monthly, Weekly, Daily, or Hourly). The options available in the Run every section depend on your selection:
  • Monthly - specify the interval, in months, between job runs.
    The job runs every M months counting from January (inclusive), where M is the value specified in the Run every field.
    For example, if your job starts on March 10, 2016 and is scheduled to run every 3 months, it runs every third month counted from January at the specified time (that is, in April, July, and October, then the following January, and so on).

Figure 5

  • Weekly - select the days of the week for the job to be run.

Figure 6

  • Daily - specify the interval, in days (1 to 31), between job runs.
    The job runs every D days counting from the first day of the month (inclusive), where D is the value specified in the Run every field. The first job runs on the date and time specified in the Start on field. For example, if you set the job to start on March 10, 2016 at 5:00 AM and to run every five days, the next job runs on March 11 at 5:00 AM (counting from March 1, the run days are 1, 6, 11, 16, and so on), and subsequent jobs run every fifth day at the specified time (March 16, 21, 26, and 31) until the end of the month.

Figure 7

  • Hourly - specify the interval in hours (1-23) and minutes (1-59) between job runs.
    Set the specific hour and minute of the initial job in the Start on field, then set the interval between runs, at hour-and-minute granularity, in the Run every field. For example, if you set your job to start on March 10, 2016 at 5:00 AM and to run every 3 hours and 20 minutes, the next job runs at 8:20 AM, the one after that at 11:40 AM, and so on.

Figure 8

  4. Your configuration summary is displayed in the Summary section.

Figure 9
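The interval rules for the Monthly, Daily, and Hourly options can be approximated with Python's `datetime` and `calendar` modules. This is a hedged sketch based only on the rules and examples described above: the helper names are hypothetical, all times are treated as UTC, and only runs after the start date are listed. Zoomdata's own handling of edge cases may differ.

```python
import calendar
from datetime import datetime, timedelta, timezone


def monthly_runs(start: datetime, every_m: int, count: int) -> list[tuple[int, int]]:
    """(year, month) of the next `count` runs after the start month.

    A month qualifies when it is a multiple of every_m months after January.
    """
    runs: list[tuple[int, int]] = []
    year, month = start.year, start.month
    while len(runs) < count:
        month += 1
        if month > 12:
            year, month = year + 1, 1
        if (month - 1) % every_m == 0:
            runs.append((year, month))
    return runs


def daily_runs(start: datetime, every_d: int) -> list[int]:
    """Days after the start day, within the start month, on which the job runs.

    A day qualifies when it is a multiple of every_d days after the 1st.
    """
    last_day = calendar.monthrange(start.year, start.month)[1]
    return [d for d in range(start.day + 1, last_day + 1) if (d - 1) % every_d == 0]


def hourly_runs(start: datetime, hours: int, minutes: int, count: int) -> list[datetime]:
    """The next `count` run times, each a fixed interval after the previous one."""
    step = timedelta(hours=hours, minutes=minutes)
    return [start + step * k for k in range(1, count + 1)]


start = datetime(2016, 3, 10, 5, 0, tzinfo=timezone.utc)  # Zoomdata uses UTC
print(monthly_runs(start, 3, 4))  # [(2016, 4), (2016, 7), (2016, 10), (2017, 1)]
print(daily_runs(start, 5))       # [11, 16, 21, 26, 31]
print([t.strftime("%H:%M") for t in hourly_runs(start, 3, 20, 3)])  # ['08:20', '11:40', '15:00']
```

The outputs reproduce the three examples in the steps above: Monthly and Daily runs are aligned to the calendar (January and the 1st of the month), while Hourly runs are a plain fixed interval from the start time.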

Configuring the Advanced Option

For more complicated update schedules, use the Advanced option to set Cron expressions (as shown in Figure 10).


Figure 10

A Cron expression defines a schedule as a string of six fields separated by spaces. The format of a Cron expression is:

[seconds]  [minutes]  [hours]  [days of the month]  [months]  [days of the week]

Each field supports the following values in Zoomdata’s Scheduler:

Field               Allowed Values     Additional Characters
Seconds             0-59               , - * /
Minutes             0-59               , - * /
Hours               0-23               , - * /
Day of the month    1-31               , - * / ? L W
Month               1-12 or Jan-Dec    , - * /
Day of the week     1-7 or Sun-Sat     , - * / ? L W #

When creating a Cron expression, keep the following requirements in mind:

  • Specify either ‘Day of the month’ or ‘Day of the week’, but not both; enter a question mark (?) as a placeholder in the field you do not specify.
  • Month and day-of-week names are not case sensitive; for example, ‘FRI’ and ‘fri’ are both acceptable.
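Before saving an expression in the Advanced option, you may want a quick sanity check. The following Python sketch is a hypothetical helper (`check_cron` is not a Zoomdata function) that applies only the rules stated above: exactly six fields, a ‘?’ in exactly one of the two day fields, and every number within its field's allowed range. It is a rough check under those assumptions; it does not validate names such as Mon or Jan, or the meaning of L, W, and #.

```python
import re

# (field name, minimum, maximum), in the order the six fields appear.
FIELDS = [
    ("Seconds", 0, 59),
    ("Minutes", 0, 59),
    ("Hours", 0, 23),
    ("Day of the month", 1, 31),
    ("Month", 1, 12),
    ("Day of the week", 1, 7),
]


def check_cron(expr: str) -> list[str]:
    """Return the problems found in a six-field expression (empty list = passed)."""
    parts = expr.split()
    if len(parts) != 6:
        return [f"expected 6 fields, got {len(parts)}"]
    problems = []
    # Exactly one of 'Day of the month' / 'Day of the week' must be '?'.
    if (parts[3] == "?") == (parts[5] == "?"):
        problems.append("use '?' in exactly one of the day-of-month and day-of-week fields")
    for (name, lo, hi), token in zip(FIELDS, parts):
        # Rough range check: every number in the field (including step and
        # '#' operands) must fall within the field's allowed values.
        for number in re.findall(r"\d+", token):
            if not lo <= int(number) <= hi:
                problems.append(f"{name}: {number} is outside {lo}-{hi}")
    return problems


print(check_cron("0 0 12 * * ?"))     # []
print(len(check_cron("0 * 0 0 0 0")))  # 4
```

The second expression fails four checks: neither day field uses ‘?’, and 0 is outside the allowed range for the day-of-month, month, and day-of-week fields.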

Special Characters

Special Characters What It Means
*

All values. Represents every value in the field. For example, * in the ‘Minutes’ field means a job runs every minute.

0  *  *  *  *  ?

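?

No specific value. Used as a placeholder in whichever of the ‘Day of the month’ and ‘Day of the week’ fields is not specified. For example, if you enter a value in the ‘Day of the month’ field, enter ‘?’ in the ‘Day of the week’ field. The following expression runs at midnight on June 1:

0  0  0  1  6  ?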

-

Range. Specifies a range of values for the field. For example, 3-6 in the ‘Hours’ field means a job runs at 3:00, 4:00, 5:00, and 6:00 AM.

0  0  3-6  *  *  ?

,

Comma. Separates a list of values for the field. For example, Wed,Thu,Fri in the ‘Day of the week’ field means a job runs on Wednesdays, Thursdays, and Fridays.

0  0  0  ?  *  Wed,Thu,Fri

/

Forward slash. Specifies a starting value and an increment. For example, 0/5 in the ‘Minutes’ field means starting at minute 0 and running a job every 5 minutes.

0  0/5  *  *  *  ?

L

Last. Used in two fields only: ‘Day of the month’ and ‘Day of the week’.

  • In the ‘Day of the month’ field, L means the last day of the month.
  • In the ‘Day of the week’ field, L by itself means 7 (Saturday); when it follows a day value, it means the last occurrence of that day in the month. Because day values run from 1 (Sunday) to 7 (Saturday), 6L means the last Friday of the month.

0  0  0  ?  *  6L

W

Weekday. Used in two fields only - ‘Day of the month’ and ‘Day of the week’.

Identifies the weekday closest to the given day. For example, 15W means the closest weekday to the 15th of the month. The following results are possible:

  • If the 15th falls on a Saturday, then the result returned would be Friday the 14th
  • If the 15th falls on a Sunday, then the result is Monday the 16th
  • If the 15th falls on a weekday, that specific day is returned
#

Number sign. Used only in the ‘Day of the week’ field; d#n identifies the nth occurrence of day d in the month. For example, both Wed#2 and 4#2 identify the second Wednesday of the month.

0  0  0  ?  *  Wed#2
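The calendar logic behind L, W, and # can be sketched in Python with the standard `calendar` and `datetime` modules. The helper names below are hypothetical and the sketch is not Zoomdata's scheduler code. It assumes day-of-week numbers run from 1 (Sunday) to 7 (Saturday), matching the 1-7 / Sun-Sat order in the field table, and it assumes (as the Quartz scheduler documents) that W never moves the result into a neighboring month.

```python
import calendar
from datetime import date, timedelta
from typing import Optional


def last_day_of_month(year: int, month: int) -> date:
    """'L' in the Day of the month field."""
    return date(year, month, calendar.monthrange(year, month)[1])


def nearest_weekday(year: int, month: int, day: int) -> date:
    """'nW' in the Day of the month field: the weekday closest to the given day.

    Assumption: the result never moves into another month.
    """
    d = date(year, month, day)
    last = calendar.monthrange(year, month)[1]
    if d.weekday() == 5:  # Saturday: Friday before, or Monday after if it is the 1st
        return d - timedelta(days=1) if day > 1 else d + timedelta(days=2)
    if d.weekday() == 6:  # Sunday: Monday after, or Friday before if it is the last day
        return d + timedelta(days=1) if day < last else d - timedelta(days=2)
    return d  # already a weekday


def _to_python_weekday(cron_dow: int) -> int:
    """Convert 1 (Sunday) .. 7 (Saturday) to Python's 0 (Monday) .. 6 (Sunday)."""
    return (cron_dow - 2) % 7


def last_weekday_of_month(year: int, month: int, cron_dow: int) -> date:
    """'dL' in the Day of the week field, e.g. 6L = last Friday."""
    last = last_day_of_month(year, month)
    back = (last.weekday() - _to_python_weekday(cron_dow)) % 7
    return last - timedelta(days=back)


def nth_weekday_of_month(year: int, month: int, cron_dow: int, n: int) -> Optional[date]:
    """'d#n' in the Day of the week field, e.g. 4#2 = second Wednesday; None if absent."""
    first = date(year, month, 1)
    offset = (_to_python_weekday(cron_dow) - first.weekday()) % 7 + 7 * (n - 1)
    result = first + timedelta(days=offset)
    return result if result.month == month else None


print(last_day_of_month(2016, 2))           # 2016-02-29
print(nearest_weekday(2016, 10, 15))        # 2016-10-14 (the 15th is a Saturday)
print(nearest_weekday(2016, 5, 15))         # 2016-05-16 (the 15th is a Sunday)
print(last_weekday_of_month(2016, 3, 6))    # 2016-03-25
print(nth_weekday_of_month(2016, 3, 4, 2))  # 2016-03-09
```

`nth_weekday_of_month` returns None when the month has no such occurrence (for example, a fifth Friday in March 2016), which is one way a scheduler could skip that month.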

Examples of Cron Expressions

Cron Expression                Meaning
0  0  12  *  *  ?              Noon every day
0  30  20  ?  *  *             8:30 PM every night
0  0/10  17  *  *  ?           Every 10 minutes from 5:00 PM through 5:50 PM, every day
0  15-30  20  *  *  ?          Every minute from 8:15 PM through 8:30 PM, every day
0  45  20  ?  *  Mon,Wed,Fri   8:45 PM every Monday, Wednesday, and Friday
0  0  20  3/3  *  ?            8:00 PM every 3 days in every month, starting on the third day of the month
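To see how the numeric special characters combine, the following Python sketch expands a single numeric field into the values it matches and applies it to some of the examples above. `expand_field` is a hypothetical helper, not part of Zoomdata; it follows the meanings given in the Special Characters table (for example, 0/10 in the minutes field yields minutes 0, 10, ..., 50) and does not handle names or the L, W, and # characters.

```python
def expand_field(token: str, lo: int, hi: int) -> list[int]:
    """Expand one numeric field that uses '*', '?', ',', '-' and '/'.

    lo and hi are the field's allowed range (for example 0-59 for minutes).
    Names such as Mon or Jan and the L, W and # characters are not handled.
    """
    if token in ("*", "?"):
        # '?' places no restriction from this field, so it is treated like '*'.
        return list(range(lo, hi + 1))
    values: set[int] = set()
    for part in token.split(","):  # ',' separates a list of items
        if "/" in part:  # start/increment, optionally over a range
            base, step = part.split("/", 1)
            if base == "*":
                first, last = lo, hi
            elif "-" in base:
                a, b = base.split("-", 1)
                first, last = int(a), int(b)
            else:
                first, last = int(base), hi
            values.update(range(first, last + 1, int(step)))
        elif "-" in part:  # inclusive range
            a, b = part.split("-", 1)
            values.update(range(int(a), int(b) + 1))
        else:  # single value
            values.add(int(part))
    return sorted(values)


print(expand_field("0/10", 0, 59))        # [0, 10, 20, 30, 40, 50]
print(len(expand_field("15-30", 0, 59)))  # 16
print(expand_field("3/3", 1, 31))         # [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
```

The first line matches the minutes of 0 0/10 17 * * ? (5:00 PM through 5:50 PM), the second counts the 16 minutes from 8:15 through 8:30, and the third lists the days of the month matched by 3/3.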

Defining Fields to be Refreshed

You can select the fields from your data source to be refreshed on the Configuration page.

Figure 11

All the fields from your data source are listed in the Refresh Fields Metadata section. By default, only fields of type Time are selected. To refresh all the fields from your data source, click Select All. Otherwise, select the checkboxes for specific fields in the Refreshable column.

Refreshing the Data Source

To update the entire list of fields from the data source, open the Fields page and click Refresh Fields (Figure 12). Unlike the scheduled jobs configured on the Refresh page, this is a manual, on-demand refresh of the fields contained in the data source.

Figure 12

You can also refresh specific fields from your data source. Click the Refresh button in the Statistics column for the field. The job starts immediately, and its status is shown in that cell.

The Zoomdata Console

Administrators can monitor jobs using the Console, which is located in the Settings menu (Figure 13). The Console refreshes automatically every 15 seconds.

Figure 13

In the Console, each job is identified by the name of its data source (Figure 14).


Figure 14

If you have scheduled many jobs, you can filter by a specific job status: Upcoming, In Progress, or Finished. To return to the full list of jobs, click Clear.

Figure 15

The Console provides the following details for jobs:

  • Data Source - the name of the data source for which the job was created
  • Status - the status of the job
  • Last Finished - the date and time the job most recently finished
  • Next Run - the next scheduled run of the job
  • Job History - opens a pop-up window listing all the jobs that have been run for the data source

You can also sort the Jobs table by the following column headers:

  • Data Source
  • Job Type
  • Status
  • Last Finished

Note that sorting resets to the default order every time the table refreshes.

The Source Refresh window provides a historical view of the jobs that have been run for the selected data source, including each job's start time, finish time, and execution status (Figure 16). Use the quick filters to view jobs with the In Progress or Finished status.

Figure 16

The Status column uses three values to indicate the outcome of the most recent job run:

  • COMPLETE: The job was successfully completed
  • INCOMPLETE: The job has only been partially completed
    For example, the min/max values were successfully refreshed, but the distinct values were not refreshed.
  • FAILED: The job could not run or could not be completed due to an error
    For example, Zoomdata may be experiencing connection issues with the data source. Click the arrow to view details about the issues that occurred while the job was running.
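The three status values can be modeled as a small decision rule. The following Python sketch is an interpretation, not Zoomdata's code: the step names (`min_max`, `distinct_values`), the `error` flag, and the rule that a run with no successful steps counts as FAILED are all assumptions. It reproduces the INCOMPLETE example given above.

```python
from enum import Enum


class JobStatus(Enum):
    COMPLETE = "COMPLETE"
    INCOMPLETE = "INCOMPLETE"
    FAILED = "FAILED"


def job_status(step_results: dict[str, bool], error: bool = False) -> JobStatus:
    """Derive a Console-style status from per-step outcomes (True = succeeded).

    Assumptions: a system error, or a run in which no step succeeded,
    is FAILED; a run in which only some steps succeeded is INCOMPLETE.
    """
    if error or not any(step_results.values()):
        return JobStatus.FAILED
    if all(step_results.values()):
        return JobStatus.COMPLETE
    return JobStatus.INCOMPLETE


# Example from the text: min/max refreshed, distinct values not refreshed.
print(job_status({"min_max": True, "distinct_values": False}).value)   # INCOMPLETE
print(job_status({"min_max": True, "distinct_values": True}).value)    # COMPLETE
print(job_status({"min_max": False, "distinct_values": False}).value)  # FAILED
```

Treating a system error as FAILED even when some steps succeeded is a design choice in this sketch; the text does not say how Zoomdata classifies that case.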