Overview

  • Recurring file feeds are files dropped into a Simon-owned S3/SFTP bucket on a frequent cadence. These files can be loaded incrementally or can overwrite the existing data.
  • If you have a file feed that you’d like to load using the Recurring File Feeds product, please ensure that it meets the requirements outlined here: File ingestion via S3 or SFTP

Set up a new Recurring File Feed

  1. From the left navigation, expand Datasets then click Datasets.
  2. Click Create Dataset.
  3. Choose File Feed then click Next.

  4. The Create a Dataset screen appears. Complete the following:

  • Name: Name your dataset using SNAKE_CASE. This will be the name of the corresponding table in Simon's Data Warehouse.
  • Select File Source from the drop-down. This option is the default.
  • Select the S3 / SFTP location that your feed is in.
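
The naming rule for the Name field can be checked before you open the form. The sketch below is only an illustration: it assumes SNAKE_CASE means letters, digits, and single underscores, starting with a letter. Simon's actual validation rule may differ, and the function names are hypothetical.

```python
import re

# Assumption: letters and digits separated by single underscores,
# starting with a letter. Simon's real rule may be stricter or looser.
_SNAKE_CASE = re.compile(r"[A-Za-z][A-Za-z0-9]*(_[A-Za-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if the whole name matches the assumed pattern."""
    return _SNAKE_CASE.fullmatch(name) is not None


def to_snake_case(name: str) -> str:
    """Turn spaces, hyphens, and other separators into underscores and lowercase."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", name.strip()).strip("_")
    return cleaned.lower()
```

For example, a name such as "Daily Orders" fails the check, while its converted form passes.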

🚧

Names must be in SNAKE_CASE

If you don't use SNAKE_CASE when naming your dataset, an error displays.

  5. Click Start. The Editor tab opens. There are two primary components:
    • The Directory Structure (on the left): the directory of the file source you selected on the previous page when you created the dataset. You can navigate down to the file name.
    • File Configuration (on the right): the fields required for validating the File Feed display here. See below for descriptions.

Editor tab

All fields are required in order to proceed to the next step:

  • File Path - what you enter in this field depends on whether your files are incremental or overwrite.
    • If incremental, enter the file path down to the last folder, e.g. file_path/example
    • If overwrite, enter the file path down to the file name, e.g. file_path/example/feed1.csv
  • Replication Method - select whether your dataset is incremental or overwrite

🚧

Common validation errors and how to fix them

If the file path and the replication method do not match, an error displays. See Common Validation Errors below for details.

  • File Format - select whether your files are .csv, .tsv, or .json
  • Compression - select whether your files are zipped
  • Encryption - select whether your files are PGP or GPG encrypted

❗️

Zip then Encrypt!

If you encrypt and then zip your files, Simon will not be able to process them. Please make sure you are zipping your files and then encrypting via PGP or GPG.
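
The order of operations in the callout above can be sketched as follows. This is a minimal illustration, not Simon's tooling: it assumes "zipped" means a standard .zip archive, that the gpg command-line tool is installed, and that the recipient key ID and the output file names are placeholders you would replace with your own.

```python
import subprocess
import zipfile
from pathlib import Path
from typing import List


def zip_file(src: Path) -> Path:
    """Step 1: compress the feed file into <name>.zip next to it."""
    dest = src.with_name(src.name + ".zip")
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(src, arcname=src.name)
    return dest


def gpg_encrypt_command(zipped: Path, recipient: str) -> List[str]:
    """Step 2: build the gpg command that encrypts the already-zipped file."""
    encrypted = zipped.with_name(zipped.name + ".gpg")
    return [
        "gpg", "--batch", "--yes",
        "--output", str(encrypted),
        "--encrypt", "--recipient", recipient,
        str(zipped),
    ]


def zip_then_encrypt(src: Path, recipient: str) -> None:
    """Zip first, then encrypt the zip (never the other way around)."""
    zipped = zip_file(src)
    # Requires gpg on the PATH and the recipient's public key in the keyring.
    subprocess.run(gpg_encrypt_command(zipped, recipient), check=True)
```

Calling zip_then_encrypt(Path("feed1.csv"), "RECIPIENT_KEY_ID") would produce feed1.csv.zip and then feed1.csv.zip.gpg, where RECIPIENT_KEY_ID is a hypothetical key identifier.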

  6. Click Validate. A sample of your file appears.
  7. Mark the data types for each column in the Fields tab.
  8. In the Settings tab, there are four configuration options to choose from:
| Configuration Option | Description |
| --- | --- |
| Record ID | Required for incremental feeds. An identifier that is guaranteed to be unique per record. Any records with duplicate or null IDs are filtered out. |
| Updated Timestamp | Required for incremental feeds. When the record was created. Must be in epoch time. |
| Skippable | See Common terms. |
| Cadence | When Simon can expect to receive the file drop. We also use this time to notify you if a feed is missing. See also Common terms. |
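
Before dropping an incremental file, it can help to check the two required fields locally. The sketch below is an assumption-laden example: the field name record_id is hypothetical, epoch time is taken to mean whole seconds since 1970-01-01 UTC (confirm whether seconds or milliseconds are expected), and timestamps without a time zone are treated as UTC.

```python
from collections import Counter
from datetime import datetime, timezone


def to_epoch_seconds(ts: datetime) -> int:
    """Convert a datetime to epoch seconds, treating naive values as UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return int(ts.timestamp())


def find_bad_ids(records, id_field="record_id"):
    """Report how many IDs are null/empty and which IDs appear more than once."""
    ids = [record.get(id_field) for record in records]
    null_count = sum(1 for i in ids if i is None or i == "")
    counts = Counter(i for i in ids if i is not None and i != "")
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return null_count, duplicates
```

Records flagged by find_bad_ids are the ones that would be filtered out on load, so fixing them upstream avoids silently losing rows.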

Common terms

| Term | Definition |
| --- | --- |
| Cadence | The time at which Simon can expect a file drop, so that we can notify the client whether the file is missing or on time. |
| Incremental | Only contains net-new data. The data in the new files is appended to the existing data in the dataset. |
| Overwrite | Contains the data you want in the downstream dataset (Snowflake). Each drop replaces the previous file, so the dataset only has data from the latest overwrite file. |
| Skippable | A skippable feed does not impact your customer pipe. When the toggle is off, Simon considers this to be a critical feed that holds up the customer pipe refresh. When the toggle is on, the customer pipe continues as scheduled. |
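
The difference between the Incremental and Overwrite definitions above can be illustrated with a toy model. This is only a sketch of the described behavior using in-memory lists, not how Simon loads data; the function name is hypothetical.

```python
def apply_drop(dataset, new_rows, replication_method):
    """Return the dataset contents after one file drop."""
    if replication_method == "incremental":
        # Net-new rows are appended to what is already there.
        return list(dataset) + list(new_rows)
    if replication_method == "overwrite":
        # Only the latest file's rows remain.
        return list(new_rows)
    raise ValueError(f"unknown replication method: {replication_method}")


existing = [{"id": 1}, {"id": 2}]
latest_drop = [{"id": 3}]

after_incremental = apply_drop(existing, latest_drop, "incremental")
after_overwrite = apply_drop(existing, latest_drop, "overwrite")
```

Here after_incremental holds all three rows, while after_overwrite holds only the row from the latest drop.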

Common Validation Errors

  • A file feed dataset already exists with this file path. Multiple file feed datasets cannot ingest files from the same file path.

    You can't have multiple file feed datasets that run on the same file path. Choose a new file path.

  • Provided file is either a directory or a file without extension. Overwrite feeds must have a path that leads to a file with an extension.

    The file path drills down to a folder, not a file name with an extension. Either set this up as an incremental feed, or adjust the file path so that it drills down to a file name with an extension.
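
The second error above comes down to whether the path ends in a file name with an extension. A rough local pre-check might look like the sketch below. It is a heuristic under stated assumptions, not Simon's validator: the function name and messages are hypothetical, and a folder whose name contains a dot (for example v1.2) would be mistaken for a file.

```python
from pathlib import PurePosixPath
from typing import Optional


def check_file_path(file_path: str, replication_method: str) -> Optional[str]:
    """Return a warning if the path shape does not fit the replication method."""
    has_extension = PurePosixPath(file_path).suffix != ""
    if replication_method == "overwrite" and not has_extension:
        return "Overwrite feeds need a path that ends in a file name with an extension."
    if replication_method == "incremental" and has_extension:
        return "Incremental feeds need a path that ends at the last folder, not a file."
    return None
```

With the examples used earlier on this page, file_path/example passes as incremental and file_path/example/feed1.csv passes as overwrite, while swapping the two produces a warning.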


Give Access to a Third-Party Vendor

Your account manager can help you give your vendor(s) access to a part of your org’s bucket. We need:

  • The vendor name (we'll use this as their username).
  • SSH public key
    • Ask your vendor to generate an SSH key pair (by running the ssh-keygen -P "" -f key_name command in the terminal) and provide you with the public key (key_name.pub if they ran the command above).

📘

Public Keys Must Be in the OpenSSH RSA Format

We only accept public keys in the OpenSSH RSA format (the key should start with ssh-rsa), because that is the only format that Amazon Web Services (AWS) accepts. If your RSA key is in a different format (e.g. SSH2), please convert it to OpenSSH.
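
A vendor's public key can be sanity-checked before it is sent to your account manager. The sketch below is a minimal format check, not a security review: it confirms that the line starts with ssh-rsa and that the base64 payload itself declares the ssh-rsa key type, which is how OpenSSH public keys are laid out. The function name is hypothetical.

```python
import base64
import binascii
import struct


def is_openssh_rsa_public_key(line: str) -> bool:
    """Return True if the line looks like 'ssh-rsa <base64> [comment]'."""
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        return False
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except (binascii.Error, ValueError):
        return False
    if len(blob) < 4:
        return False
    # The payload begins with a 4-byte big-endian length, then the key type.
    (name_len,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_len] == b"ssh-rsa"
```

If the check fails because the key is in SSH2 format, ssh-keygen -i -f key_file can print an OpenSSH version of it. If it fails because the key starts with a different type (for example ssh-ed25519), the vendor's ssh-keygen may default to a non-RSA key type, and adding -t rsa to the command requests an RSA key.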