Welcome to the Visual Flow ETL Demo!

P.S.Click through the navigation menu to immerse into your own user experience.
Introduction to Visual Flow
Demo Task ETL Process
Extract, Join & Cache Data
Load to Elasticsearch, AWS S3, Local File
More Transformation / Load to DB2
Job Execution

Once a project is selected, you can go to its overview page which shows the number of jobs / pipelines and their status.

On the left menu, go to Settings → Parameters to view the parameters created at the project level. These are some connection parameters used in our demo job.

On the Jobs tab there is a list of existing jobs within a project with a search bar, sorting and filtering options.

With the Palette button clicked, go to Job Designer. Switch from Palette to the Params tab to see the job parameters.

Getting started
Project Parameters
Jobs Overview
Job Designer

We are using two tables from MSSQL database.
The first one is Event, and it stores the events data like Event ID, Venue ID, Event Category, Event Name and Start time

The second one is Venue and it contains data about Venue (Venue ID, Venue Name, Venue City, Venue State, Venue Seats Quantity)

We use  the following stages in the job:
* Read * Join * Cache * GroupBy * Transformer * Sort * Filter * Write 

Table Event
Table Venue
Transformation and Load
Stages used in the job

For Read stages, MSSQL storage is used as we read the data from 2 MSSQL tables Event and Venue.
For such MSSQL parameters as JDBC URL, User, Password, Schema, and Table, we usE parameters created earlier via the Settings menu.

Two Read stages are connected to the Join stage as we join Event and Venue tables.
In the Join stage, we use the type Inner join, and the key for joining is venueid.

Cache stage is used here for keeping intermediate data (in our case it is the result of join) in memory before it is loaded to multiple destinations.

Read Stages
Join Stage
Cache Stage

To record Event and Venue joined data to Elasticsearch, we selected Elasticsearch storage in the Write stage configuration.
Its parameters like nodes, port, user, and password are also pre-created via the Settings → Parameters menu option.

To load the data to AWS S3 we partition it by venue state using Partition By option.
So the result dataset on AWS S3 is split into multiple files per each venue state.

We filter the data to the specific time frame we are interested in. This data is to be loaded to the local parquet file.

Now we load the filtered data to the local file of Parquet format.
Also, we partition the data by eventname.

Load to Elasticsearch
Load to AWS S3
Filter by Start Time
Load to Local File

With Group By stage we group the joined data by venue city and apply aggregation function SUM to count venue seats per venue city.

We use Transformer Stage here to rename a couple of columns.

With Sort Stage we sort the data by the number of seats ascending.

Finally the result is recorded to DB2 table count_seats.

Note: we use  the existing DB2 connection called DB2.

This  was previously created via Settings→Connections option, so it is visible for all jobs within the project “Demo Visual Flow”.

Group By Stage
Transformer Stage
Sort Stage
Load to DB2
Using Existing Connection

Now let’s run the job. First, its status turns to Pending, then to Running.

Now our job has succeeded. The status is reflected as Succeeded on the Job Designer control bar.

The Logs button on the job control bar, leads you to the  Logs screen

With the info button, you can learn more about each stage and how you can use it to manage your data.

Running the job
Job Completion
Job Logs
More About Job Stages

Watch Visual Flow demonstration and
UI review from the solution architect

Contact us

youtube github linkedin facebook instagram twitter
Support Assistance