Tabular Data Schema
To ingest data via file upload or Snowflake sources, you must follow the established schema so that the ingestion process is seamless. By conforming your data to the proper schema, you will be able to bring your first-party data into your node, which is the first step toward collaborating with your partners.
Prior to uploading an XLSX, CSV, or TSV file or connecting to your Snowflake database, you must format your data using a predefined convention so your DCN can ingest it successfully. There are three column types that are processed: identifier, scope, and trait. Each column type requires a specific header format to be properly processed.

There are 2 classes of identifiers: matchable and custom. For more information on what the values of each type could look like, head over to identifier types.

- The identifier column must match one of the following headers in order for your node to recognize the values.
Identifier | Header | Format (Value) |
---|---|---|
Email address | id_e | Raw email or pre-normalized (lowercase + space trimmed) and SHA256 hashed |
Phone number | id_p | Raw phone or pre-normalized (E.164) and SHA256 hashed |
IPv4 address | id_i4 | Four period (.) separated octets, each of which contains a number from 0 to 255 |
IPv6 address | id_i6 | Eight colon (:) separated segments, each of which contains 4 hexadecimal digits, lowercase |
Apple IDFA | id_a | Down-cased alphanumeric value, including dashes if any |
Google GAID | id_g | Down-cased alphanumeric value, including dashes if any |
Roku RIDA | id_r | Down-cased alphanumeric value, including dashes if any |
Samsung TV TIFA | id_s | Down-cased alphanumeric value, including dashes if any |
Amazon Fire AFAI | id_f | Down-cased alphanumeric value, including dashes if any |
NetID | id_n | Base64 web-safe encoded value with 70 characters (TPID) |
Custom ID | id_c, id_c1, id_c2, id_c3, id_c4, id_c5, id_c6, id_c7, id_c8, id_c9 | Free-form text, case sensitive. Cannot be used for matching. For each subsequent ID, append a number from 1 to 9 to the header name |
Optable VID | id_v | Alphanumeric string. |
Identifier (catch-all) | id | Any type-prefixed ID. If we do not successfully detect a valid prefix (for example "e:", "i4:", "a:", etc.) for the identifier, we will attempt to auto-detect raw email addresses and phone numbers and automatically encode and type-prefix them at import time |
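For the matchable email and phone formats above, pre-normalization and hashing can be sketched as follows. This is a minimal illustration: the helper names are ours, and the E.164 logic is deliberately simplified to 10-digit North American numbers; a production pipeline should use a dedicated phone-number library.

```python
import hashlib
import re

def normalize_email(email: str) -> str:
    # Pre-normalization per the table above: trim whitespace, lowercase.
    return email.strip().lower()

def normalize_phone_na(phone: str) -> str:
    # Simplified E.164 normalization for 10-digit North American numbers
    # only (illustrative assumption; real E.164 handling needs a library
    # such as phonenumbers).
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:
        digits = "1" + digits
    return "+" + digits

def sha256_hex(value: str) -> str:
    # SHA256-hash the pre-normalized value, hex-encoded.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

print(normalize_email("  Jane.Doe@Example.COM "))  # jane.doe@example.com
print(normalize_phone_na("(555) 555-5555"))        # +15555555555
print(len(sha256_hex("jane.doe@example.com")))     # 64 (hex characters)
```

Whether you upload raw or pre-hashed values, normalization must happen before hashing, since hashing different casings of the same email yields different digests.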
Scope
- The scope column indicates either person or household scope. It defaults to person if unspecified.
- The scope is applied on the primary ID.
Traits
- All trait columns are prefixed with trait_ followed by the trait key name.
- For example, columns named trait_age, trait_gender and trait_owns_house would map to traits age, gender and owns_house, respectively.
- All traits are associated to the primary identifier and not the neighbouring IDs.
- Trait values are optional.
- While a trait can be associated to a household-scoped identifier, the trait itself is scope-less. See Identity Resolution for more details.
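As an illustration of the trait naming convention, a CSV with trait columns could be produced like this. The hashed-email placeholders and trait names are hypothetical:

```python
import csv
import io

fieldnames = ["id_e", "trait_age", "trait_gender", "trait_owns_house"]

rows = [
    # Hypothetical SHA256-hashed email values stand in for real identifiers.
    {"id_e": "<hashed-email-1>", "trait_age": "34", "trait_gender": "f", "trait_owns_house": "yes"},
    # Trait values are optional: empty cells are simply omitted traits.
    {"id_e": "<hashed-email-2>", "trait_age": "", "trait_gender": "", "trait_owns_house": "no"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Here trait_age, trait_gender and trait_owns_house would map to the traits age, gender and owns_house on each primary email identifier.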
If the header formatting conventions are followed and the DCN can properly ingest all profiles, the following rules are applied upon ingestion:
- The first ID column is treated as the primary ID in the associated ID profile
- The value specified for the first ID column must be valid, otherwise the entire row is ignored
- Additional ID columns (if any) are treated as neighbouring IDs
- Neighbouring identifiers are considered of scope "person". To change it to "household", insert a subsequent row with that identifier as primary ID and with the scope "household".
- The values specified for additional ID columns are optional. If no value is provided, that specific neighbour ID will be ignored but the remaining data specified in the row will still be ingested.
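The rules above can be sketched as a small row parser. This is a simplification: here "valid" just means non-empty, whereas the DCN also validates the value's format.

```python
def parse_row(header, row):
    # Columns named "id" or starting with "id_" are identifier columns.
    id_idx = [i for i, h in enumerate(header) if h == "id" or h.startswith("id_")]
    primary = row[id_idx[0]].strip()
    if not primary:
        # Rule 2: an invalid primary ID means the whole row is ignored.
        return None
    # Rules 3 and 5: remaining ID columns are neighbours; empty ones are dropped.
    neighbours = [row[i].strip() for i in id_idx[1:] if row[i].strip()]
    return {"primary": primary, "neighbours": neighbours}

header = ["id_e", "id_p", "trait_age"]
print(parse_row(header, ["<hashed-email>", "", "34"]))  # empty neighbours list
print(parse_row(header, ["", "<hashed-phone>", "34"]))  # None: row ignored
```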
The DCN allows identifier columns to share the same name or to include a suffix such as _suffix. For example, the following two header rows are identical to the DCN:

id,id,id_e,id_e
id,id_1,id_e,id_e_1
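A sketch of that equivalence, assuming the suffix is a numeric disambiguator of the form _1, _2, and so on, as in the example above (the DCN's actual rules may accept other suffixes):

```python
import re

def canonical_header(header: str) -> str:
    # Strip a trailing numeric disambiguation suffix, e.g. id_e_1 -> id_e.
    # (Assumption: suffixes take the form _<number>.)
    return re.sub(r"_\d+$", "", header)

print([canonical_header(h) for h in ["id", "id_1", "id_e", "id_e_1"]])
# Both example header rows reduce to the same canonical form:
# ['id', 'id', 'id_e', 'id_e']
```

Note that custom ID headers such as id_c1 have no underscore before the digit, so they are left untouched by this rule.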
In the following section, you can find examples of how your data should be structured with varying parameters:
Example 1: A simple list of emails

id_e |
---|
<email-1> |
<email-2> |

In the example above, 2 distinct person clusters will be created with their respective email identifiers.
Example 2: Two profiles with multiple emails and some traits.
id_e | id_e_1 | trait_income | trait_ownscat |
---|---|---|---|
<email-1> | <email-2> | 25000 | yes |
<email-3> | | | no |
In the example above, 2 distinct person clusters will be created but only the first will have 2 email identifiers.
Example 3: Two profiles with a neighbouring ID and some traits.
id_e | id_p | scope | trait_subscribed | trait_household_income |
---|---|---|---|---|
<email-1> | 555-555-5555 | person | yes | |
<email-2> | | household | no | 100000 |
In the example above, the email provided is the primary identifier and the phone number is a neighbouring ID within this profile. Any trait or scope definition is associated to the email.
Example 4: Four profiles with a mixture of ID types with multiple neighbours and traits.
id | id_p | trait_income | trait_age |
---|---|---|---|
i4:8.8.8.8 | 555-555-5555 | 50000 | 54 |
s:e0ef86a8-6ebf-4c9d-9127-e69407fe748d | +15145551234 | 25000 | 24 |
f:6e853799-ef31-4a30-8706-9742be254d38 | | 24000 | |
i6:2001:0db8:85a3:0000:0000:8a2e:0370:7334 | +21630010123 | | 56 |
In the example above, the primary identifier for a profile is the id column, which allows for any type of ID. The phone number column provides a neighbouring ID, and any scope or trait columns are associated to the primary ID. There is a limit of 64 traits and 64 neighbours per identifier; any record which exceeds those limits will be ignored. Note that the time to ingest the file will depend on the number of rows, identifier columns, and trait columns it contains.
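The 64-trait and 64-neighbour limits can be pre-checked against a header row before upload. This is a header-level approximation, since the actual limits apply per ingested record:

```python
def within_limits(header, max_traits=64, max_neighbours=64):
    # The first ID column is the primary; any further ID columns are
    # potential neighbours.
    id_cols = [h for h in header if h == "id" or h.startswith("id_")]
    trait_cols = [h for h in header if h.startswith("trait_")]
    neighbours = max(len(id_cols) - 1, 0)
    return neighbours <= max_neighbours and len(trait_cols) <= max_traits

print(within_limits(["id", "id_p", "trait_income", "trait_age"]))  # True
print(within_limits(["id"] + [f"id_e_{i}" for i in range(65)]))    # False: 65 neighbours
```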
Once the data has been formatted to the accepted convention, you will be able to proceed with your File Upload source setup or begin ingestion from your Snowflake database. When done, you can create an audience based on your source, which can be used for Matching, Activation, or to Export to a destination.
Upon initiating an ingestion, the DCN may reject the file or view due to one or more of the following errors:
All headers of CSV, TSV, or XLSX files must be valid. Any invalid header will prevent the entire file from being ingested, and the last successfully ingested file (if any) will remain active.

Error | Description |
---|---|
Unrecognized header(s) | The header(s) do not follow Optable's convention and could not be recognized. All headers must be recognized; rename the unrecognized headers to the accepted convention. |
Could not find any valid ID header | No id or id_<type> header was found. The file must contain at least one ID column, either with the id_<type> prefix or an id catch-all column with a type prefix declared for each value. |
Duplicate scope header(s) | There are at least two headers with the name scope. Only one scope column is supported. |
Duplicate trait header(s) | There are at least two trait_ headers with the same name. Each trait_ header must have a unique suffix. |
Empty header(s) | There is at least one header without any name. Headers with empty names (or only whitespace) are forbidden. You must name each column according to Optable's convention or discard it. |
Trait header must start with "trait_" and must be followed by at least 1 character. | The trait header has no name; you must add at least 1 character after the trait_ prefix in order for your trait column to be recognized. |
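The header checks in the table above can be approximated in code. This is a sketch: the real DCN validates id_<type> suffixes against the known identifier types, which this version does not.

```python
def validate_headers(headers):
    # Returns a list of error messages; an empty list means the header
    # row passes this (approximate) validation.
    errors = []
    stripped = [h.strip() for h in headers]
    if any(not h for h in stripped):
        errors.append("Empty header(s)")
    if not any(h == "id" or h.startswith("id_") for h in stripped):
        errors.append("Could not find any valid ID header")
    if stripped.count("scope") > 1:
        errors.append("Duplicate scope header(s)")
    traits = [h for h in stripped if h.startswith("trait_")]
    if len(traits) != len(set(traits)):
        errors.append("Duplicate trait header(s)")
    for h in stripped:
        if h == "trait_":
            errors.append("Trait header must be followed by at least 1 character")
        elif h and h != "scope" and h != "id" and not h.startswith("id_") and not h.startswith("trait_"):
            errors.append(f"Unrecognized header(s): {h}")
    return errors

print(validate_headers(["id_e", "scope", "trait_age"]))  # []
print(validate_headers(["email", "scope", "scope"]))
```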
During file ingestion, if the file is a CSV, TSV, or XLSX, your DCN validates the header first; if the header passes validation, a schema is generated from it. Your DCN then checks individual records and rows for errors that may cause an ingestion failure. There are three possible errors:

Error | Description |
---|---|
Wrong number of fields. | The number of fields does not match the schema derived from the header. Typically this is caused by an extra field delimiter (, for CSV and a tab for TSV). |
Bare quotes found in unquoted field. | A " inside an unquoted field. Example: ...,to"[email protected],... |
Extraneous or missing quote in quoted field. | Either a " at the beginning or end of a field, or an extra " inside a quoted field. |
Your data is rejected if the DCN is unable to recognize more than 10% of the records within it; when this threshold is exceeded, nothing is ingested and the last successfully ingested file (if any) will remain active. The file will still be ingested as long as the error rate stays under 10%. To reduce the error rate further, fix the rejected records and initiate ingestion again. For example, if 5 records out of 10 have no identifier, the entire file is rejected because the rejection rate (50%) is above the 10% threshold. The same logic applies to all error types: the file is rejected if more than 10% of its records are invalid.
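The threshold logic can be expressed as follows. Whether a file sitting at exactly 10% passes or fails is an assumption inferred from the wording above:

```python
def should_reject(total_records: int, invalid_records: int, threshold: float = 0.10) -> bool:
    # Reject the whole file when the share of invalid records exceeds
    # the threshold; at or under the threshold, the file is ingested
    # (minus the rejected records).
    if total_records == 0:
        return False
    return invalid_records / total_records > threshold

print(should_reject(10, 5))   # True: 50% > 10%, whole file rejected
print(should_reject(100, 8))  # False: 8% of records dropped, rest ingested
```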