Before allowing you to create or edit a dataset, Simon runs certain validation checks on the dataset’s configuration. These validations are designed to ensure the dataset can be successfully ingested by Simon for use in your account. To see if a dataset is valid, click Validate.
- If successful, a small sample result displays.
- If unsuccessful, a description of the error(s) and remediation steps display.
|Identifier||Your dataset must contain a customer identifier (e.g. ‘email’).|
|Unique field names||Each dataset is associated with a unique set of fields, and no two datasets can be associated with a field with the same name. If your dataset returns a field that already exists in your data, the dataset will be invalid. An exception to this unique constraint is the identifier, which must be present in every dataset.|
|No field deletion||Fields cannot be removed from a dataset. If for some reason an existing field is causing issues and needs to be removed, please contact your Client Solutions Manager.|
|Non-zero rows returned||The dataset must contain data.|
|Valid syntax (queries only)||Your query must have valid syntax and successfully run against your database.|
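As a rough illustration of how the checks in the table above fit together, here is a minimal Python sketch. It is not Simon's implementation: the function name, its parameters, and the `IDENTIFIER_FIELDS` set are hypothetical, and the syntax check is left out because it depends on your database.

```python
# Hypothetical sketch of the configuration checks; not Simon's actual code.
IDENTIFIER_FIELDS = {"email"}  # assumption: the identifier(s) used in your account


def check_dataset_config(new_fields, rows, existing_fields, previous_fields=()):
    """Return a list of problems; an empty list means every check passed.

    new_fields      -- field names returned by the dataset being validated
    rows            -- sample rows returned by the dataset
    existing_fields -- field names already used by other datasets
    previous_fields -- field names this dataset had before the edit, if any
    """
    errors = []
    new_set = set(new_fields)

    # Identifier: at least one customer identifier must be present.
    if not new_set & IDENTIFIER_FIELDS:
        errors.append("missing customer identifier")

    # Unique field names: identifiers are exempt from the uniqueness rule.
    clashes = (new_set & set(existing_fields)) - IDENTIFIER_FIELDS
    if clashes:
        errors.append("field names already in use: " + ", ".join(sorted(clashes)))

    # No field deletion: every previously defined field must still be returned.
    removed = set(previous_fields) - new_set
    if removed:
        errors.append("fields cannot be removed: " + ", ".join(sorted(removed)))

    # Non-zero rows returned.
    if not rows:
        errors.append("dataset returned no rows")

    return errors
```

For example, a dataset that returns `email` and a new field name, with at least one row, would pass all four checks in this sketch.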
The Contact Event or Object Dataset Settings tab has Extract Schedule configurations that require a Created Timestamp and an Updated Timestamp.
Fields selected for these configurations require additional validations, so you need to click the Validate button again. You'll see an error message if any of the following conditions are not met:
- Timestamps can't be in the future.
- Timestamps can't contain milliseconds.
- Timestamps can't be null.
- Updated Timestamp must be greater than or equal to Created Timestamp.
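If you want to pre-check your own data against the timestamp conditions above before clicking Validate, a sketch along these lines may help. The function name and messages are hypothetical, and it assumes both values are timezone-aware Python `datetime` objects (or `None`); it is not how Simon runs the check internally.

```python
from datetime import datetime, timezone


def check_timestamps(created, updated, now=None):
    """Return a list of problems for one row's timestamps; empty means valid."""
    # Assumption: timestamps are timezone-aware, so they compare cleanly to `now`.
    now = now or datetime.now(timezone.utc)
    errors = []
    for name, value in (("created", created), ("updated", updated)):
        if value is None:
            errors.append(f"{name} timestamp is null")
            continue
        if value > now:
            errors.append(f"{name} timestamp is in the future")
        # Any non-zero sub-second component (including milliseconds) is rejected.
        if value.microsecond != 0:
            errors.append(f"{name} timestamp contains sub-second precision")
    if created is not None and updated is not None and updated < created:
        errors.append("updated timestamp is earlier than created timestamp")
    return errors
```

Truncating timestamps to whole seconds in your query is usually the simplest way to satisfy the milliseconds condition.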
The dataset details page has a tab called Settings, which contains additional dataset-level and field-level validation rules that you can configure to operate on your datasets.
- Unlike the query-level validations above, these rules are run against the entire dataset during extraction, not just a sample. While validations are designed to confirm a dataset’s configuration, dataset rules are able to detect anomalies in the data itself.
- Unlike the mandatory configuration validations listed above, these dataset rules can be edited and removed as appropriate.
Dataset rules are applied by default to every new dataset. These validations currently exist (and more are coming soon!):
|Dataset should not be empty||The dataset is always expected to contain some data. A failure occurs if there are no rows during the dataset extract.|
If you are creating a dataset that is expected to only occasionally contain data, toggle this rule to off.
|Row count should not decrease below threshold on refresh||In general, Simon expects that datasets will contain consistent or increasing amounts of data, which means that row count should not decrease dramatically. After a dataset is extracted, Simon will compare the total number of rows in the dataset to the previous extracts. If there is a significant decrease in row count, the extract job will fail.|
This rule is created by default for all new datasets with a threshold of 75% (in other words, the row count should never decrease by more than 25%). The threshold may be manually adjusted from the rules tab.
If you change the threshold, remember to click Save at the top of the dataset details page or you will lose your changes.
If you expect your datasets to have large fluctuations in data quantity, toggle this rule to off.
|Skippable (only applicable to Contact Data)||If the dataset fails during extract for any reason, Simon continues with the remaining Contact Data extraction and uses data from the last successful extract.|
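To make the row count threshold arithmetic concrete, here is a minimal sketch. It assumes the comparison is made against the most recent previous extract and that a previous count of zero never triggers a failure; both are assumptions for illustration, and the function name is hypothetical.

```python
def row_count_ok(previous_rows, current_rows, threshold=0.75):
    """Return True if the new extract keeps at least `threshold` of the previous rows.

    With the default of 0.75, a drop of more than 25% fails the check.
    """
    if previous_rows == 0:
        # Assumption: with no earlier rows there is nothing to compare against.
        return True
    return current_rows >= previous_rows * threshold
```

With the default threshold, a dataset that previously had 1,000 rows passes with 750 rows and fails with 749.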
Field-level rules apply to individual fields. If there are currently no field-level rules, only the Add Field Rules button displays.
To configure new field-level rules:
- Click Add Field Rules.
- From the drop-down, choose a dataset field to add rules to.
- Toggle validation options on or off:
|The number of null values should not increase on refresh||Alerts you when the number of null values within a column increases unexpectedly.|
This validation is meant to detect upstream data issues that need to be resolved before retrying the extract.
|Field should always have data from today||Enable this validation for any dataset where you want to ensure there has been an update before Simon extracts. This ensures that each time your data is extracted, the specified field contains fresh data.|
If the latest timestamp in a dataset does not have today’s date, this rule will prevent the dataset from extracting successfully.
|Duplicate values should not exist||Simon expects that certain types of non-event contact datasets will only contain one record per user. This rule evaluates whether there are multiple rows mapped to a single identifier and alerts you if that validation fails.|
To “delete” field-level rules, just toggle them off.
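The three field-level rules above can be approximated in a few lines if you want to test your data before an extract. This is only a sketch under simple assumptions: null values are represented as `None`, timestamps are Python `datetime` objects, and all function names are hypothetical rather than part of Simon.

```python
from collections import Counter
from datetime import date, datetime


def null_count(values):
    """Count the None entries in a column."""
    return sum(v is None for v in values)


def nulls_increased(previous_values, current_values):
    """True if the column has more nulls now than in the previous extract."""
    return null_count(current_values) > null_count(previous_values)


def has_data_from_today(timestamps, today):
    """True if the latest non-null timestamp in the field falls on `today`."""
    latest = max((t for t in timestamps if t is not None), default=None)
    return latest is not None and latest.date() == today


def duplicate_identifiers(identifiers):
    """Return, in sorted order, the identifiers that appear on more than one row."""
    counts = Counter(identifiers)
    return sorted(i for i, n in counts.items() if n > 1)
```

Passing `today` explicitly, rather than reading the clock inside the function, keeps the freshness check easy to test and avoids surprises around time zones.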
You can explicitly flag specific datasets as skippable to keep them from holding up your pipe.
- From the dataset view, click the Rules tab.
- Toggle the skippable option to On.
- If a dataset is not flagged as skippable, isn't used within your account, and there has been at least one successful pipe run (data refresh), the dataset is implicitly skippable to keep your campaigns going. You don't have to flag it as skippable yourself, but doing so does no harm.
- You can configure a notification to let you know when an extract fails for a dataset that has been flagged as skippable.
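The explicit and implicit cases described above can be summarized as a single condition. The sketch below is one reading of those rules, with hypothetical parameter names; it is not Simon's actual logic.

```python
def is_skippable(flagged_skippable, used_in_account, successful_pipe_runs):
    """Return True if a failed extract for this dataset would not block the pipe."""
    # Explicit: the dataset has been flagged as skippable.
    if flagged_skippable:
        return True
    # Implicit: the dataset is unused and at least one pipe run has succeeded.
    return (not used_in_account) and successful_pipe_runs >= 1
```

Under this reading, a dataset that is in use is only skippable if you flag it yourself.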
If there are other rules that you would like to apply to your datasets, please reach out to your Account Manager and let us know!