Dataset lineage

Overview

A dataset doesn't exist in isolation. Datasets can be built on top of tables that are derived from other datasets, so changing one dataset can affect downstream datasets, and data issues in a dataset may originate in upstream datasets. Dataset lineage lets you review data sources and dependencies directly in Simon and visualize the relationships between datasets.

For example, if dataset A fails, you can quickly see that dataset B is also affected because it depends on A. Likewise, updating a query in dataset A could break dataset B. Checking the lineage before making changes can help you prevent these issues.
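
The A and B example can be pictured as a walk over a dependency graph. The following is a minimal sketch, not Simon's implementation: the `DOWNSTREAM` map, the extra dataset `C`, and the `affected_by` function are hypothetical names used only for illustration.

```python
from collections import deque

# Hypothetical lineage map: each dataset -> the datasets that read from it directly.
# Dataset "C" is an assumption added to show an indirect dependency.
DOWNSTREAM = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
}

def affected_by(dataset, downstream=DOWNSTREAM):
    """Return every dataset that depends on `dataset`, directly or indirectly."""
    affected = []
    queue = deque(downstream.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in affected:
            continue  # already recorded via another path
        affected.append(current)
        queue.extend(downstream.get(current, []))
    return affected

print(affected_by("A"))
```

In this sketch, a failure or query change in A reaches B directly and C through B, which is the kind of impact the lineage view is meant to surface before you make a change.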

[Image: Dataset lineage]

Dataset lineage is available at the dataset level. It shows dependencies within Simon for the following dataset types that use the Simon Snowflake database:

  • Contact Data
  • Data Lake
  • Simon Warehouse

🚧

Upstream datasets must complete their extract step prior to dependent downstream datasets

This means downstream dataset lineage is not up to date until the upstream dataset has completed its extract step. In the image above, Lineage Test is not up to date until Cody DAG 1 has completed its extract step.
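
The ordering rule in this note can be sketched as a topological sort. This is an illustration under assumptions, not how Simon schedules extracts: the `UPSTREAM` map and both helper functions are hypothetical, and the freshness check looks only at direct upstream datasets for simplicity.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: each dataset -> the upstream datasets it reads from.
UPSTREAM = {
    "Cody DAG 1": set(),
    "Lineage Test": {"Cody DAG 1"},
}

def extract_order(upstream):
    """Return an order in which every upstream dataset comes before its dependents."""
    return list(TopologicalSorter(upstream).static_order())

def lineage_is_current(dataset, extracted, upstream=UPSTREAM):
    """Treat lineage as current only when all direct upstream extracts have finished."""
    return all(parent in extracted for parent in upstream.get(dataset, ()))

print(extract_order(UPSTREAM))
print(lineage_is_current("Lineage Test", extracted=set()))
print(lineage_is_current("Lineage Test", extracted={"Cody DAG 1"}))
```

Here Lineage Test is reported as current only after Cody DAG 1 appears in the set of completed extracts, mirroring the behavior described above.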


View dataset lineage

  1. From the left navigation, expand Datasets.
  2. Click Datasets.
  3. Search for and click the dataset name you want to view lineage for. The dataset summary displays.
  4. Click the Lineage tab.
  5. Click the pink links within the lineage to open the dataset in the editor.

[Image: Click link to open editor]

  6. Click the Lineage tab to return to the lineage view.