Amazon S3

Loading data from your Amazon S3 bucket.

To connect your existing S3 bucket with your DCN, you can add an Amazon S3 source. This allows you to map data such as identifiers, associated identifiers, and traits from a bucket containing files of any supported type that follow the tabular data format.

By directly connecting your Amazon S3 bucket with your DCN, you can easily integrate your data and gain valuable insights without the need for complex data migrations or transformations.

Pre-requisites:

  • Have an Amazon S3 account with permissions to manage buckets.

  • Create an Amazon S3 storage bucket specific to this source.

  • Create IAM credentials that will be used by your DCN to access the previously created S3 bucket.

  • Load file(s) into the bucket that are based on the tabular data schema.
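As a rough illustration of the IAM prerequisite, the sketch below builds a policy document that grants only the read permissions named in the steps (s3:GetObject and s3:ListBucket). The helper name `build_read_only_policy` and the bucket and prefix values are hypothetical. Whether your DCN needs anything beyond these two actions is not stated here, so treat this as a starting point rather than a confirmed policy.

```python
import json


def build_read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Return an IAM policy document granting read access to one bucket.

    Hypothetical helper for illustration; adjust bucket and prefix to your setup.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # s3:GetObject is granted on the objects under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the IAM console.
    print(json.dumps(build_read_only_policy("mybucket", "mydirectory/"), indent=2))
```

The printed JSON can be attached to the IAM user or role whose credentials you later enter in the source form.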

Steps:

  1. Open the source create form for Amazon S3 and name your source.

  2. Enter the Bucket URL (for example, s3://mybucket/mydirectory/) and select the Bucket Region.

  3. Set the expiry and the ingestion frequency.

  4. In the next section, enter the previously created IAM credentials that will grant your DCN read access to your bucket.

The IAM credentials, <access-id> and <secret-key>, are required. They should specify the AWS access key ID and secret access key of a service account that has at least read permissions on the specified bucket (s3:GetObject, s3:ListBucket).

  5. Click Create.

With these steps completed, your DCN service account should now have access to the Amazon S3 bucket you entered and will start to ingest files based on the ingestion frequency you set at creation.
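Before filling in the form, it can help to check that the Bucket URL has the expected s3://bucket/directory/ shape. The following is a minimal sketch using only the Python standard library. The function name `parse_bucket_url` is hypothetical and is not part of any DCN tooling.

```python
from urllib.parse import urlparse


def parse_bucket_url(url: str) -> tuple[str, str]:
    """Split an s3:// URL into (bucket name, key prefix).

    Hypothetical helper: raises ValueError if the URL is not an s3:// URL
    or has no bucket name.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://bucket/prefix, got {url!r}")
    # The path starts with "/", which is not part of the object key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    bucket, prefix = parse_bucket_url("s3://mybucket/mydirectory/")
    print(bucket, prefix)
```

For the example URL in step 2, this yields the bucket `mybucket` and the prefix `mydirectory/`.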

Notes:

Your DCN will automatically check for new files in the storage container at the frequency that you set at source creation. If your DCN finds multiple files, it will trigger multiple ingestions simultaneously.

Cloud storage sources have a default rejection threshold of 100%. This means that even if 99% of the records in a file contain errors, your DCN will ingest the remaining 1%. However, if a file is 100% invalid, your DCN will return an error for that specific file and continue attempting to ingest the rest of the files in the container.
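The threshold rule can be pictured with a small sketch. This is an illustration of the behavior described above, not the DCN's actual implementation: the function name `should_reject`, the `>=` comparison, and the handling of empty files are all assumptions.

```python
def should_reject(total_records: int, errored_records: int,
                  threshold_pct: float = 100.0) -> bool:
    """Return True if a file's error rate reaches the rejection threshold.

    Assumption: a file is rejected when its error percentage is greater than
    or equal to the threshold. The text above only confirms the 100% case.
    """
    if total_records <= 0:
        # Assumption: empty files are out of scope for this sketch.
        raise ValueError("total_records must be positive")
    error_pct = 100.0 * errored_records / total_records
    return error_pct >= threshold_pct


if __name__ == "__main__":
    print(should_reject(100, 99))   # 99% errors: below the 100% threshold
    print(should_reject(100, 100))  # 100% errors: the file is rejected
```

With the default threshold, a file with 99 bad records out of 100 is still ingested, while a fully invalid file is rejected.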

If you modify a file in the container, your DCN will see it as modified and attempt to re-ingest it. Files are deemed "New" or "Updated" if the "Last Modified" timestamp has changed.
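The "Last Modified" rule can be sketched as a comparison between the timestamps recorded at the previous check and those in the current listing. The function `files_to_ingest` and the dictionary layout are hypothetical. How your DCN stores this state internally is not described here.

```python
from datetime import datetime, timezone


def files_to_ingest(previous: dict[str, datetime],
                    current: dict[str, datetime]) -> list[str]:
    """Return keys that are new or whose Last Modified timestamp changed.

    Both arguments map an object key to its Last Modified timestamp.
    """
    return sorted(
        key for key, modified in current.items()
        if previous.get(key) != modified  # unseen key, or timestamp differs
    )


if __name__ == "__main__":
    t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
    previous = {"a.csv": t1, "b.csv": t1}
    current = {"a.csv": t1, "b.csv": t2, "c.csv": t2}
    print(files_to_ingest(previous, current))
```

In this example `a.csv` is skipped because its timestamp is unchanged, `b.csv` counts as "Updated", and `c.csv` counts as "New".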
