Tabular Data Schema
Conforming your data to the proper schema lets you bring your first-party data into your node, which is the first step toward collaborating with your partners.
Before uploading a TSV file or connecting to your Snowflake database, you must format your data using a predefined convention so that your DCN can ingest it successfully. Three column types are processed: identifier, scope and trait. Each column type requires a specific header format to be processed properly.
The identifier column must match one of the following headers for your node to recognize its values.
The scope column indicates either "person" or "household" scope. It defaults to "person". The scope is applied to the primary ID and all the traits of that record.
Trait columns are prefixed with trait_ followed by the trait key name.
- For example, a column named trait_owns_house maps to the trait owns_house.
- All traits are associated with the primary identifier, not with the neighbouring IDs.
- Trait values are optional.
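The Python sketch below illustrates these header conventions by classifying a single header into one of the three column types and deriving the trait key from a trait header. It is illustrative only: the accepted identifier headers are not reproduced in this section, so "id" (taken from Example 4) and "email" are placeholders, and the literal header name "scope" is an assumption.

```python
from typing import Optional, Tuple

# Placeholder identifier headers: "id" appears in Example 4, "email" is assumed.
# Replace these with the identifier headers your DCN actually accepts.
IDENTIFIER_HEADERS = {"id", "email"}
SCOPE_HEADER = "scope"  # assumed literal header name for the scope column
TRAIT_PREFIX = "trait_"


def classify_header(header: str) -> Tuple[str, Optional[str]]:
    """Return (column_type, trait_key) for one header, or raise ValueError."""
    if header in IDENTIFIER_HEADERS:
        return ("identifier", None)
    if header == SCOPE_HEADER:
        return ("scope", None)
    if header.startswith(TRAIT_PREFIX) and len(header) > len(TRAIT_PREFIX):
        # The trait key is whatever follows the "trait_" prefix.
        return ("trait", header[len(TRAIT_PREFIX):])
    raise ValueError(f"invalid header: {header!r}")


print(classify_header("trait_owns_house"))  # ('trait', 'owns_house')
print(classify_header("scope"))             # ('scope', None)
```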
If the header formatting conventions are followed and the DCN can ingest all profiles, the following rules are applied upon ingestion:
- The first ID column is treated as the primary ID of the associated ID profile.
- The value in the first ID column must be valid; otherwise the entire row is ignored.
- Additional ID columns (if any) are treated as neighbouring IDs.
- Neighbouring identifiers are considered to have the scope "person". To change it to "household", insert a subsequent row with that identifier as the primary ID and the scope "household".
- The values of additional ID columns are optional. If no value is provided, that neighbour ID is ignored, but the rest of the row is still ingested.
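The ingestion rules above can be expressed as a small function that turns one parsed row into a profile, or ignores it. This is a minimal sketch, not the DCN's implementation: the Profile type and the is_valid_id check are hypothetical (real ID validation depends on the ID type and is not described here), and falling back to "person" for a blank scope is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Profile:  # hypothetical container for one ingested row
    primary_id: str
    scope: str
    neighbours: List[str] = field(default_factory=list)
    traits: Dict[str, str] = field(default_factory=dict)


def is_valid_id(value: str) -> bool:
    # Hypothetical check: any non-blank value counts as valid.
    return bool(value.strip())


def ingest_row(id_values: List[str], scope: str,
               traits: Dict[str, str]) -> Optional[Profile]:
    """Apply the ingestion rules to one row; return None if the row is ignored.

    id_values holds the ID column values in header order.
    """
    # An invalid first (primary) ID causes the whole row to be ignored.
    if not id_values or not is_valid_id(id_values[0]):
        return None
    # Remaining ID columns are neighbouring IDs; empty ones are skipped
    # while the rest of the row is still ingested.
    neighbours = [v.strip() for v in id_values[1:] if is_valid_id(v)]
    # Scope and traits attach to the primary ID only.
    # Assumption: a blank scope falls back to "person".
    return Profile(
        primary_id=id_values[0].strip(),
        scope=scope.strip() or "person",
        neighbours=neighbours,
        traits=dict(traits),
    )


row = ingest_row(["a@example.com", "", "+15550100"], "", {"owns_house": "true"})
print(row.neighbours)  # ['+15550100']
print(ingest_row(["", "b@example.com"], "household", {}))  # None
```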
The following examples show how your data should be structured for different cases:
Example 1: A simple list of emails
In the example above, 2 distinct person clusters will be created with their respective email identifiers.
Example 2: Two profiles with multiple emails and some traits.
In the example above, two distinct person clusters are created, but only the first has two email identifiers.
Example 3: Two profiles with a neighbouring ID and some traits.
In the example above, the email provided is the primary identifier and the phone number is a neighbouring ID within this profile. Any trait or scope definition is associated with the email.
Example 4: Four profiles with a mixture of ID types with multiple neighbours and traits.
In the example above, the primary identifier for each profile is in the id column, which accepts any type of ID. The email column holds the neighbouring ID, and the scope and trait columns are associated with the primary identifier.
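If you generate your TSV file programmatically, the household-scope rule translates into an extra row. The sketch below uses Python's standard csv module to write a tab-separated file with a primary ID, a neighbouring email, a scope and one trait, followed by a second row that promotes the neighbouring email to household scope. The "email" and "scope" header names and all values are hypothetical, and the exact value format the id column expects is not specified here.

```python
import csv
import io

# Hypothetical layout: "id" follows Example 4; "email" and "scope" are
# assumed header names. All values are made up for illustration.
HEADERS = ["id", "email", "scope", "trait_owns_house"]

rows = [
    # Primary ID first; the email is a neighbouring ID of that profile, and
    # the scope and trait apply to the primary ID only.
    ["user-123", "a@example.com", "person", "true"],
    # Neighbouring IDs are "person" scope. To make the email a household ID,
    # add a row with it as the primary ID and scope "household".
    # Empty neighbour and trait values are allowed.
    ["a@example.com", "", "household", ""],
]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(HEADERS)
writer.writerows(rows)
print(buf.getvalue(), end="")
```

To write to disk instead, open the target file with open(path, "w", newline="") and pass it to csv.writer in place of the StringIO buffer.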
Once your data follows the accepted convention, you can proceed with your File Upload source setup or begin ingestion from your Snowflake database. You can then create an audience based on your source and use it for Matching, Activation or Export to a destination.
When you start an ingestion, the DCN may reject the file or view for one or more of the following reasons:
All headers of TSV files must be valid. A single invalid header prevents the entire file from being ingested, and the last successfully ingested file (if any) remains active.
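An all-or-nothing header check like this can be sketched as a function that inspects every header, records the resulting column layout, and rejects the file if any single header is invalid. The sketch is hypothetical: "id" comes from Example 4, "email" and the literal "scope" header are assumptions, and rejecting a file with no identifier column is inferred from the rule that the first ID column is the primary ID rather than stated in this section.

```python
from typing import Dict, List, NamedTuple, Optional

# Placeholder identifier headers ("id" from Example 4, "email" assumed).
IDENTIFIER_HEADERS = {"id", "email"}
TRAIT_PREFIX = "trait_"


class Layout(NamedTuple):
    id_columns: List[int]          # first index is the primary ID column
    scope_column: Optional[int]
    trait_columns: Dict[int, str]  # column index -> trait key


def validate_header(header_line: str) -> Layout:
    """Validate every header; raise ValueError if any one is invalid."""
    headers = header_line.rstrip("\r\n").split("\t")
    id_cols: List[int] = []
    scope_col: Optional[int] = None
    trait_cols: Dict[int, str] = {}
    invalid: List[str] = []
    for i, name in enumerate(headers):
        if name in IDENTIFIER_HEADERS:
            id_cols.append(i)
        elif name == "scope":  # assumed literal header name
            scope_col = i
        elif name.startswith(TRAIT_PREFIX) and len(name) > len(TRAIT_PREFIX):
            trait_cols[i] = name[len(TRAIT_PREFIX):]
        else:
            invalid.append(name)
    # A single invalid header rejects the whole file.
    if invalid:
        raise ValueError(f"file rejected, invalid headers: {invalid}")
    # Assumption: a file without any identifier column is also rejected.
    if not id_cols:
        raise ValueError("file rejected, no identifier column")
    return Layout(id_cols, scope_col, trait_cols)


print(validate_header("id\temail\tscope\ttrait_owns_house"))
# Layout(id_columns=[0, 1], scope_column=2, trait_columns={3: 'owns_house'})
```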
During ingestion of a CSV (or TSV) file, your DCN first validates the header and, if it passes validation, generates a schema from it. The DCN then checks individual records and rows for errors that may cause an ingestion failure. There are three possible errors:
Your data is rejected if the DCN cannot recognize more than 10% of its records; if this threshold is exceeded, nothing is ingested. Format your data according to the accepted convention to avoid this.
For example, if 5 out of 10 records have no identifier, the entire file is rejected because the rejection rate (50%) is above the 10% threshold. The last successfully ingested file (if any) remains active.
The same logic applies to every type of error: the file is rejected if more than 10% of its records are invalid.
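The threshold check reduces to a single comparison. This minimal sketch assumes the invalid count already includes records failing for any of the error types (whether the DCN totals errors across types or per type is not stated here), and it treats exactly 10% as acceptable, following the "more than 10%" wording. The function name is hypothetical.

```python
def is_rejected(total_records: int, invalid_records: int) -> bool:
    """Return True when more than 10% of the records are invalid."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    # invalid / total > 1 / 10, rearranged to avoid floating-point rounding.
    return invalid_records * 10 > total_records


print(is_rejected(10, 5))  # True: 50% is above the 10% threshold
print(is_rejected(10, 1))  # False: exactly 10% is not above the threshold
```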