Snowflake

Loading data from Snowflake

You can add a Snowflake source to connect your Snowflake instance directly to your DCN. This lets you map data such as identifiers, associated identifiers, and traits from your existing data warehouse into your DCN, from a view or a table, using a predefined tabular data schema.

Granting Your DCN Access to Your Snowflake Account

Before connecting your Snowflake account to your DCN, you must execute the script below to pre-configure the user, role, database, warehouse, and schema that will be used when configuring the new source.

Make sure to select the All Queries option when executing the script in Snowflake.

We recommend that you define a unique and secure password for the user_password field.

-- create variables for user / password / role / warehouse / database / schema (object names need to be uppercase)
set role_name = 'OPTABLE_ROLE';
set user_name = 'OPTABLE_USER';
set user_password = '*********';
set database_name = 'OPTABLE_DATABASE';
set warehouse_name = 'OPTABLE_WAREHOUSE';
set schema_name = 'OPTABLE_SCHEMA';

-- change role to securityadmin for user / role steps
use role securityadmin;
   
-- create role for optable
create role if not exists identifier($role_name);
grant role identifier($role_name) to role SYSADMIN;
   
-- create a user for optable
create user if not exists identifier($user_name)
password = $user_password
default_role = $role_name
default_warehouse = $warehouse_name;
 
grant role identifier($role_name) to user identifier($user_name);

-- change role to sysadmin for warehouse / database steps
use role sysadmin;

-- create database for optable
create database if not exists identifier($database_name);
use database identifier($database_name);

-- create a schema for optable
create schema if not exists identifier($schema_name);

-- access to database
grant usage
  on database identifier($database_name)
  to role identifier($role_name);

-- access to schema
grant usage
  on schema identifier($schema_name)
  to role identifier($role_name);
    
-- read-only rights on schema
grant select
  on all tables in schema identifier($schema_name)
  to role identifier($role_name);

-- create a warehouse for optable to perform SELECT operations
create warehouse if not exists identifier($warehouse_name)
  warehouse_size = xsmall
  warehouse_type = standard
  auto_suspend = 60
  auto_resume = true
  initially_suspended = true;
   
-- access to warehouse resources
grant usage
  on warehouse identifier($warehouse_name)
  to role identifier($role_name);
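
Optionally, you can verify the new user's access before adding the source to your DCN. Below is a minimal sketch using Snowflake's SnowSQL client, assuming SnowSQL is installed locally; your_account is your account identifier and your_table is a placeholder for a table you plan to ingest from:

# connect as the read-only user created above (SnowSQL prompts for the password);
# your_account and your_table are placeholders
snowsql -a your_account -u OPTABLE_USER -r OPTABLE_ROLE \
        -w OPTABLE_WAREHOUSE -d OPTABLE_DATABASE -s OPTABLE_SCHEMA \
        -q "select count(*) from your_table;"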

Organizing Your Data

For your table or view to be successfully ingested into your DCN, it must follow the established schema outlined in our Tabular Data Schema section.

Once the columns in your table or view follow the proper schema, your first-party data can be carried into your node from Snowflake as soon as the connection is complete.

Adding a Snowflake Source via DCN UI

Once you have executed the script and ensured that the schema is properly formatted, you can add a new Snowflake source to your DCN in the Integrations section.

You can add Snowflake as a source by navigating to Integrations > Sources > Add Source and proceeding to the Snowflake source form.

Some fields have a default value set and only need to be modified if required; the default values are the ones used in the script to set up your Snowflake account.

You must fill out the Account ID, Password, and Table View fields to connect to your Snowflake account and point to the right view within your database.

In the ingestion frequency section, you can schedule when your DCN ingests new data. The options are Now, Once, Daily, and Hourly. Once everything is set, click Save and the source will be added to your DCN. A test connection is then performed, and ingestion occurs according to the set schedule. You can also manually force an ingestion from the listing by clicking Ingest; the process begins right away.

Your Snowflake source can be in one of several states, such as Active, Configured, Processing, and Error.

Adding a Snowflake Source via CLI

You can also add a source via the CLI command optable-cli source create snowflake <name> <account> <password> <table>. An example of what a created Snowflake source looks like:

{
  "source": {
    "id": 11,
    "name": "snowman",
    "state": "SOURCE_STATE_ACTIVE",
    "kind": "SOURCE_KIND_SNOWFLAKE",
    "created_at": "2022-03-08T19:29:38.726938Z",
    "updated_at": "2022-03-08T19:33:07.938140Z",
    "snowflake": {
      "account": "your_account",
      "user": "OPTABLE_USER",
      "database": "OPTABLE_DATABASE",
      "schema": "OPTABLE_SCHEMA",
      "warehouse": "OPTABLE_WAREHOUSE",
      "role": "OPTABLE_ROLE",
      "table": "your_view"
    }
  }
}
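
For reference, a hypothetical create invocation that would produce a source like the one above (the account identifier, password, and view name are placeholders):

# positional arguments: <name> <account> <password> <table>
optable-cli source create snowflake snowman your_account '*********' your_view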

Refer to Snowflake's documentation for details on how to determine your account ID.

The remaining optional fields are:

--user         which is the username of the read-only Snowflake account. Defaults to OPTABLE_USER.
--database     which is the database assigned to the read-only Snowflake account. Defaults to OPTABLE_DATABASE.
--schema       which is the schema for the read-only Snowflake account. Defaults to OPTABLE_SCHEMA.
--warehouse    which is the warehouse used to power the ingestion operations. Defaults to OPTABLE_WAREHOUSE.
--role         which is the role of the read-only Snowflake account. Defaults to OPTABLE_ROLE.

If you use the recommended setup scripts for Snowflake, these values will be set as the defaults and you will not need to pass any of the above optional fields.
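
If your Snowflake setup uses non-default object names, you can pass the optional fields explicitly. A hypothetical example, where all values are placeholders:

# override the defaults created by the setup script
optable-cli source create snowflake snowman your_account '*********' your_view \
  --user MY_USER --database MY_DATABASE --schema MY_SCHEMA \
  --warehouse MY_WAREHOUSE --role MY_ROLE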

Additionally, you can pass:

--expiry       which is the time after which ingested data expires. Defaults to no expiration. It accepts a duration parameter such as --expiry 10h for a 10-hour expiration or --expiry 2d for a 2-day expiration.

Scheduling Ingestion from Snowflake

If you pass no scheduling parameters at all, the Snowflake source is created to ingest only once, at the current time. The scheduling parameters are designed to infer as much as possible, and you can adjust them as needed:

--at                    which is the start date/time. It must be passed in the RFC 3339 (ISO 8601) format, for example 2006-01-02T15:04:05-05:00. Defaults to the current date and time.
--ingestion-frequency   which must be one of now, once, daily or hourly. Defaults to now.

If you only pass --ingestion-frequency and omit the --at option, ingestion will run on the specified schedule (once, daily, hourly) starting at the current time and repeating at that same time thereafter.
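
For example, --ingestion-frequency hourly creates a Snowflake source that starts ingesting at the current time and then repeats every hour.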

Conversely, only passing the --at flag without the --ingestion-frequency flag means the ingestion will run only once, at the date/time you specify. For example:

--at 2022-03-10T15:00:00-05:00 will create a Snowflake source scheduled to ingest once on March 10, 2022 at 3 p.m.

Combining both flags allows you to specify any combination of the scheduling parameters. For example:

--ingestion-frequency daily --at 2022-03-10T15:00:00-05:00 creates a Snowflake source scheduled to ingest every day at 3 p.m. starting on March 10, 2022.
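
Putting it together, a full create command with scheduling flags could look like this (the account identifier, password, and view name are placeholders):

# ingest daily at 3 p.m. Eastern, starting March 10, 2022
optable-cli source create snowflake snowman your_account '*********' your_view \
  --ingestion-frequency daily --at 2022-03-10T15:00:00-05:00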

Once complete, a test connection with your data warehouse is performed, looking for the table or view you entered. You will see an error message if the connection fails or the values entered do not match.

Source Update

You can also update source fields (except the scheduling fields) using optable-cli source update snowflake <id>. Specify any combination of the fields you want to update. The options are name, account, password, table, expiry, user, database, schema, warehouse and role; they represent the same values as in source creation.
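
For example, to rename a source and point it at a different view, assuming the fields are passed as flags (the id, name, and view name below are placeholders):

optable-cli source update snowflake 11 --name snowman2 --table your_new_view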

Manual Ingestion

If you want to manually refresh the source and bypass the schedule that has been set, use the command optable-cli source ingest <id> and the source will refresh.
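
For example, using the source id from the JSON output above:

optable-cli source ingest 11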

Once all of the parameters have been set, you will be able to see the Snowflake source in both the UI and the CLI, and ingestion will occur at the scheduled time.
