A dataset doesn't exist in isolation. Datasets can be built on top of tables that are derived from other datasets, so changing one dataset can affect the datasets downstream of it, and a data issue in one dataset may originate in a dataset upstream of it. Dataset lineage lets you review the sources of a dataset's data and its dependencies directly in Simon, so you can see how datasets relate to one another.
For example, if dataset A fails, you can quickly see that dataset B is also affected because it depends on A. Likewise, if you update a query in dataset A, you could break dataset B. Checking the lineage before you make changes helps you prevent these issues.
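Conceptually, lineage is a directed graph, and finding every dataset affected by a failure or a query change in dataset A means following its dependency edges downstream, including indirect ones. The sketch below illustrates that idea only. Simon computes and displays lineage for you; the `LINEAGE` map, its dataset names, and the `downstream_of` helper are hypothetical and are not part of any Simon API.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets that read from it
# (its direct downstream dependents). Names are illustrative, not from Simon.
LINEAGE: dict[str, list[str]] = {
    "dataset_a": ["dataset_b"],
    "dataset_b": ["dataset_c"],
    "dataset_c": [],
    "dataset_d": ["dataset_c"],
}


def downstream_of(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return every dataset that depends, directly or transitively, on `dataset`."""
    affected: set[str] = set()
    queue = deque(lineage.get(dataset, []))  # start from direct dependents
    while queue:
        current = queue.popleft()
        if current in affected:
            continue  # already reached through another path
        affected.add(current)
        queue.extend(lineage.get(current, []))  # follow its dependents next
    return affected


print(sorted(downstream_of("dataset_a", LINEAGE)))  # ['dataset_b', 'dataset_c']
```

A change to `dataset_a` reaches `dataset_c` only through `dataset_b`, which is why checking the full downstream chain, not just direct dependents, matters before editing a query.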
Dataset lineage is available at the dataset level and shows dependencies within Simon for the following dataset types that use the Simon Snowflake database:
- Contact Data
- Data Lake
- Simon Warehouse
Upstream datasets must complete their extract step before their dependent downstream datasets.
This means downstream dataset lineage is not up to date until the upstream dataset has completed its extract step. In the image above, Lineage Test is not up to date until Cody DAG 1 has completed its extract step.
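The ordering rule above is a topological order: each dataset comes after every dataset it reads from. As an illustration only, Python's standard `graphlib` module can produce that order from a map of each dataset to its upstream datasets. The `UPSTREAMS` map is hypothetical and simply mirrors the Cody DAG 1 and Lineage Test example; it does not reflect how Simon schedules extracts internally.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each dataset maps to the upstream datasets it
# reads from. This mirrors the example above and is not read from Simon.
UPSTREAMS: dict[str, set[str]] = {
    "Lineage Test": {"Cody DAG 1"},
    "Cody DAG 1": set(),
}

# static_order() yields a dataset only after all of its upstream datasets,
# i.e. the order in which extracts must finish for downstream lineage to be current.
extract_order = list(TopologicalSorter(UPSTREAMS).static_order())
print(extract_order)  # ['Cody DAG 1', 'Lineage Test']
```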
To view lineage for a dataset:
- From the left navigation, expand Datasets.
- Click Datasets.
- Search for and click the dataset name you want to view lineage for. The dataset summary displays.
- Click the Lineage tab.
- Click a pink link in the lineage to open that dataset in the editor.
- Click the Lineage tab to return to the lineage view.