Overview: Execution In Detail

In Discovery Hub 101 - Basics we briefly introduced the concepts such as Deployment & Execution, and how these processes interact with the Raw table, Valid table, and Transformation View etc.

While the diagram above gives a good overview, the process and objects involved are a bit more complex. In this section we will explain this process in more detail so you can better understand some of the higher level features covered in this course.

Step by step

Deployment

  1. During deployment Discovery Hub creates all the database objects necessary for every table. This includes (table postfix in parenthesis):
    1. The Raw Table (_R) - Stores raw copy of source table.
    2. The Transformation View (_T) - Performs all transformation through CAST and AS and other operators.
    3. The List Table (_L) - Holds the list of all DW_IDs of rows in the raw table that were flagged by a data validation rule.
    4. The Messages Table (_M) - Includes a description of why each record in the List table was flagged.
    5. The Data Cleansing Stored Procedure (_Clean) - This is the the script that does "everything else" including, processing conditional lookups, enforcing data validation rules, handling History, and processing incremental load, among other things.
    6. The Valid Table (no postfix or _V if enabled) - All resulting "Valid" records are inserted here.
Depending on the features enabled on a table, there may be additional tables created such as:
  1. The Custom Data table (_CustomData) - Discovery Hub can insert custom rows of data that is manually typed in by a user. The table is created containing the custom data during deployment.
  2. The Custom Data stored procedure (_CustomDataFill) - This contains the data transfer script responsible for inserting the custom data into the Valid table.
  3. The Incremental table (_I) - Stores the last max incremental selection value, enabling Incremental Loads.

Execution

Once Deployment is complete Execution begins:

  1. Either ADO.Net or SSIS (Depending on configuration) performs a bulk load Data Transfer from the source table, storing the raw values into the Raw Table.
  2. Custom Data is inserted using the (_CustomDataFill) Stored procedure
  3. The Data Cleansing Stored Procedure then handles the rest:
    1. Disable indexes on valid table
    2. Process Incremental Load
    3. Process conditional lookups and insert values into the Raw table.
    4. Apply all transformations through Transformation view (_T)
    5. It identifies any records in the Raw table that do not meet the configured data cleansing rules (Primary & Foreign key checks, Custom field validations) and lists the DW_Id of those rows in the List table (_L) with reference to the Message table.
    6. A Description of why each record in the List table was flagged is recorded in the Messages table (_M)
    7. Handle all functions necessary for enabling History.
    8. Load all records from the Transformation view into the Valid table.
    9. Rebuild indexes

Discussion

1 comments