Once a project is selected, you can go to its overview page, which shows the number of jobs and pipelines and their status.
On the left menu, go to Settings → Parameters to view the parameters created at the project level. These are the connection parameters used by our demo job.
The Jobs tab lists the existing jobs within the project, with a search bar and sorting and filtering options.
Open the Job Designer with the Palette panel shown, then switch from the Palette tab to the Params tab to see the job parameters.
We use two tables from an MSSQL database.
The first, Event, stores event data: Event ID, Venue ID, Event Category, Event Name, and Start Time.
The second, Venue, contains venue data: Venue ID, Venue Name, Venue City, Venue State, and Venue Seats Quantity.
We use the following stages in the job:
* Read
* Join
* Cache
* GroupBy
* Transformer
* Sort
* Filter
* Write
For the Read stages, MSSQL storage is used because we read the data from the two MSSQL tables, Event and Venue.
For MSSQL parameters such as JDBC URL, User, Password, Schema, and Table, we use the parameters created earlier via the Settings menu.
Both Read stages are connected to the Join stage, where the Event and Venue tables are joined.
In the Join stage we use the Inner join type, and the join key is venueid.
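The inner join can be pictured in plain Python. This is only a conceptual sketch with made-up sample rows, not code generated by the tool:

```python
# Conceptual sketch of the Join stage (sample rows and column values are
# assumptions for illustration, not the demo data).
events = [
    {"eventid": 1, "venueid": 10, "eventname": "Concert A"},
    {"eventid": 2, "venueid": 20, "eventname": "Play B"},
    {"eventid": 3, "venueid": 99, "eventname": "Orphan C"},  # no matching venue
]
venues = [
    {"venueid": 10, "venuename": "Arena", "venuestate": "NY"},
    {"venueid": 20, "venuename": "Theater", "venuestate": "CA"},
]

# Inner join on venueid: keep only rows whose key appears in both tables.
venues_by_id = {v["venueid"]: v for v in venues}
joined = [
    {**e, **venues_by_id[e["venueid"]]}
    for e in events
    if e["venueid"] in venues_by_id
]
# "Orphan C" is dropped because venueid 99 has no matching venue row.
```

An inner join keeps only matching rows; events without a known venue simply disappear from the result.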
The Cache stage keeps intermediate data (in our case, the join result) in memory before it is loaded to multiple destinations.
To write the joined Event and Venue data to Elasticsearch, we selected Elasticsearch storage in the Write stage configuration.
Its parameters, such as nodes, port, user, and password, were also pre-created via the Settings → Parameters menu option.
To load the data to AWS S3, we partition it by venue state using the Partition By option, so the resulting dataset on S3 is split into multiple files, one set per venue state.
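Partitioning can be sketched as bucketing rows by the partition column; each bucket then becomes its own file set in the output. The rows and the S3 path layout below are illustrative assumptions:

```python
# Sketch of Partition By on venue state: rows are bucketed so each state's
# data lands in its own output location (sample rows are made up).
from collections import defaultdict

rows = [
    {"eventname": "Concert A", "venuestate": "NY"},
    {"eventname": "Play B", "venuestate": "CA"},
    {"eventname": "Gala C", "venuestate": "NY"},
]

partitions = defaultdict(list)
for row in rows:
    partitions[row["venuestate"]].append(row)

# On S3 this corresponds to one file set per partition value, e.g. a layout
# like .../output/venuestate=NY/part-0000 (hypothetical path).
for state, part in sorted(partitions.items()):
    print(f"venuestate={state}: {len(part)} row(s)")
```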
We filter the data to the specific time frame we are interested in, then load the filtered data to a local file in Parquet format, partitioned by eventname.
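The Filter stage keeps only rows whose start time falls inside the chosen window. The dates and rows below are assumed purely for illustration:

```python
# Sketch of the Filter stage: keep events within a chosen time frame
# (sample rows and window dates are assumptions).
from datetime import datetime

rows = [
    {"eventname": "Concert A", "starttime": datetime(2023, 6, 1)},
    {"eventname": "Play B", "starttime": datetime(2023, 9, 15)},
]
window_start = datetime(2023, 5, 1)
window_end = datetime(2023, 7, 1)

# Half-open window: start inclusive, end exclusive.
filtered = [r for r in rows if window_start <= r["starttime"] < window_end]
```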
With the GroupBy stage, we group the joined data by venue city and apply the SUM aggregation function to total the venue seats per city.
We use the Transformer stage here to rename a couple of columns.
With the Sort stage, we sort the data by the number of seats in ascending order.
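The GroupBy, Transformer, and Sort steps above can be sketched together in plain Python. The sample rows and the renamed column names are assumptions, not the actual demo schema:

```python
# Sketch of GroupBy (SUM of seats per city), Transformer (column rename),
# and Sort (ascending by seat total). Sample data is made up.
from collections import defaultdict

rows = [
    {"venuecity": "Boston", "venueseats": 500},
    {"venuecity": "Boston", "venueseats": 300},
    {"venuecity": "Austin", "venueseats": 200},
]

# GroupBy venuecity with SUM(venueseats).
totals = defaultdict(int)
for r in rows:
    totals[r["venuecity"]] += r["venueseats"]

# Transformer: rename columns (target names "city"/"seats" are hypothetical).
result = [{"city": city, "seats": seats} for city, seats in totals.items()]

# Sort ascending by seat total.
result.sort(key=lambda r: r["seats"])
```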
Finally, the result is written to the DB2 table count_seats.
Note: we use the existing DB2 connection called DB2, which was previously created via the Settings → Connections option, so it is visible to all jobs within the project "Demo Visual Flow".
Now let's run the job. First, its status changes to Pending, then to Running.
Now our job has succeeded, and the status is shown as Succeeded on the Job Designer control bar.
The Logs button on the job control bar leads you to the Logs screen.
With the info button, you can learn more about each stage and how you can use it to manage your data.