Overview: Incremental Load
Incremental Load is the process of loading ONLY the new or modified records during an execution. This configuration is typically preferable on large transaction tables. Say for example a grocery store that logs thousands of transactions each day in a Sales table. Over time, this table would become very large, storing millions or billions of records. This considerable amount of data could take hours to load for just this single table. If Incremental Load is enabled on the transaction tables in each database, it reduces the total number of records that need to be loaded, and therefore drastically reduces the total time it takes to refresh an organization's analytical infrastructure.
While this configuration may be preferable for transaction tables, it may not be necessary or even desired for dimension tables. Using our grocery store example, while they may have thousands of updated transaction each day, their list of items on the shelves changes at a much slower pace. They may have 1-2 new items each day, which makes for significantly smaller tables without the need to incrementally load.
What is needed?
For Discovery Hub to enable this feature, there are two prerequisites needed in the source data of the table.
- A reliable primary key: The table needs to have a unique key applied to every single record in the table.
- An incremental selection field: This is commonly a modified date field, where the value contains the date (and preferably the time) that the record was added or modified. However some systems track modifications to records with some other data type. As long as the value of the field in the updated record can be evaluated as greater than the value previously recorded value (e.g. 12345 is updated to 12367).
It is possible in some systems for these requirements to be met by the same field. In other words, the Primary key is also the Incremental selection field. This is okay.
How does it work?
To enable this feature, Discovery Hub captures the greatest value in the incremental selection field and stores it in a separate table known as the Incremental table.
During the subsequent execution, Discover Hub only loads records where the value in the Incremental selection field are greater than the value previously stored in the Incremental table. Records with a new primary key are added to the table, and records with an existing primary key are updated.
The Incremental table can be found in the SQL database with the same name of the original table followed by "_I" (SalesOrderTransactions_I).
It's important to be aware that when incremental load is enabled on a table, historical records become static. This means that all values, including transformations and conditional lookup values, will only be updated when the records are updated/modified in the source, or you fully load all records in the table.
A full load can be run manually on a table by Deploying & Executing it, clicking the "Review Tasks" button in the Deploy/Execute menu, then selecting the following deployment steps.
Hard and Soft Deletes
By default, incrementally loaded tables will ignore fields that have been deleted from the source database. However, while it does add a slight bit of overhead (additional load time), Discovery Hub does support this.
Enabling Hard Deletes will identify deleted records in the source and delete these records from the Discovery Hub table.
Enabling Soft Deletes will identify deleted records from the source and flag these records in the Discovery Hub table. A field named "IsTombstone" is added to the table, and the value is set to 1 on records that have been deleted from the source.