
Google BigQuery

Loading data into BigQuery using external tables & data sources.
Optable recommends using the Advanced Query Interface (AQI), a streamlined solution for accessing your entire identity graph in BigQuery without data replication.
This guide explains how to export data from your DCN into Google BigQuery using Google Cloud Storage or Amazon S3 as an intermediary storage bucket, and is meant to act as an alternative to the Advanced Query Interface (AQI). It leverages BigQuery's External Data Sources feature, which lets you query data stored in an external stage as if it were inside a BigQuery table.

Prerequisites

Before proceeding, ensure that you have the following:
  • An active Google BigQuery account with necessary permissions.
  • An active Google Cloud Storage (GCS) bucket where data from your DCN will be exported to.
  • Alternatively, BigQuery supports Data Transfers from Amazon S3 buckets. If you plan to load data from Amazon S3, be sure to have the proper IAM credentials to grant BigQuery read access to your S3 bucket.
Please note that this guide assumes you have the necessary permissions to create and manage resources in BigQuery, GCP & AWS. Always ensure data is handled in a secure and compliant manner.

Step 1 - Export Data from your DCN:

The first step is to export the data from your DCN into the previously created Google Cloud Storage (GCS) or Amazon S3 bucket in Parquet format.
Once the data is exported to the GCS bucket in Parquet format, it can be queried in BigQuery via the external table you will create in the next step.
Exporting in Parquet to GCS & Amazon S3 buckets.
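Before moving on, it can help to sanity-check that the exported object layout will match the URI patterns you configure in the next step. A minimal sketch, using hypothetical bucket and audience names (substitute your own), and plain wildcard matching from the Python standard library:

```python
from fnmatch import fnmatch

# Hypothetical audience name -- substitute your own.
# GCS external tables take a bare wildcard; S3 data transfers
# additionally require the .parquet extension in the pattern.
gcs_pattern = "my-audience-name/*"
s3_pattern = "my-audience-name/*.parquet"

# Object keys as a DCN export might lay them out (illustrative).
exported_keys = [
    "my-audience-name/part-0000.parquet",
    "my-audience-name/part-0001.parquet",
]

for key in exported_keys:
    assert fnmatch(key, gcs_pattern)
    assert fnmatch(key, s3_pattern)
print("export layout matches both URI patterns")
```

If an assertion fails, the export landed under a different prefix than the pattern you are about to configure, and BigQuery would find no files.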

Step 2 - Configure BigQuery to ingest your data:

The next step will be to set up the ingestion pipeline into your BigQuery project/dataset by using External Tables (GCS) or Data Transfers (S3).
Google Cloud Storage
Amazon S3 (Data Transfers)

Step 1: Click "Add Data" in BigQuery.

In BigQuery, navigate to the project you want to export your audience to and click + ADD. This will open a dialog where you can choose a data source to load.
Adding data to your BigQuery project.

Step 2: Select "Google Cloud Storage" and enter your configuration details.

You will then be prompted to configure your Google Cloud Storage source.
  1. Enter the URI pattern for the GCS bucket you previously exported to. This should include the whole path and a wildcard after the audience. For example, if you exported my-audience-name, the path might look like gs://my-gcs-bucket/my-audience-name/*
  2. Select Parquet as the format.
  3. Select the Project, Dataset and Table you want to associate with exports of this audience.
  4. Set the table type to External Table.
  5. Under Schema, select the Auto Detect option.
Configuring your Google Cloud Storage source in BigQuery.
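The UI choices above can equivalently be captured as a table-definition file, the same shape the `bq mkdef` CLI produces. A minimal sketch, using hypothetical bucket, audience, and file names; verify the exact field names against the bq CLI reference before relying on them:

```python
import json

# External table definition mirroring the UI steps above
# (hypothetical bucket and audience names).
table_def = {
    "sourceFormat": "PARQUET",                                # step 2: format
    "sourceUris": ["gs://my-gcs-bucket/my-audience-name/*"],  # step 1: URI pattern
    "autodetect": True,                                       # step 5: Auto Detect schema
}

# Write the definition to a file for use with the bq CLI.
with open("table_def.json", "w") as f:
    json.dump(table_def, f, indent=2)
```

With the definition saved, something like `bq mk --external_table_definition=table_def.json my_dataset.my_audience` would create the same external table from the command line.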

Step 1: Click "Add Data" in BigQuery.

In BigQuery, navigate to the project you want to export your audience to and click + ADD. This will open a dialog where you can choose a data source to load.
Adding data to your BigQuery project.

Step 2: Select "Amazon S3 Data Transfer" and enter your configuration details.

You will then be prompted to configure your Amazon S3 data transfer. This will create a BigQuery job that loads data from the Amazon S3 bucket you exported to on a regular basis.
  1. Set the Display Name of this data transfer. It should be specific to your DCN and recognizable.
  2. Select the Dataset and Destination Table you want to associate with exports of this audience.
  3. Enter the URI pattern for the Amazon S3 bucket you previously exported to. This should include the whole path, a wildcard after the audience, and the .parquet extension. For example, if you exported my-audience-name, the path might look like s3://my-s3-bucket/my-audience-name/*.parquet
  4. Enter your Amazon S3 IAM ID and Access Keys.
  5. Select Parquet as the file format.
Configuring your Amazon S3 data transfer job.
  6. Click Save.
  7. Click Run Transfer Now.
BigQuery will then enqueue a job to load data from the Amazon S3 bucket of your choice. The duration of this job can vary depending on the amount of data being loaded.
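The same transfer can be described programmatically as a transfer configuration for the BigQuery Data Transfer Service. A sketch of the equivalent configuration, with hypothetical names and credentials; the parameter keys below follow common Data Transfer Service conventions for Amazon S3 sources and are assumptions to verify against the DTS reference:

```python
import json

# Sketch of a transfer configuration mirroring the UI steps above.
# All names, paths, and keys are placeholders; parameter names are
# assumptions to check against the Data Transfer Service reference.
transfer_config = {
    "display_name": "my-dcn-audience-transfer",   # step 1: display name
    "data_source_id": "amazon_s3",
    "destination_dataset_id": "my_dataset",       # step 2: dataset
    "params": {
        "destination_table_name_template": "my_audience",  # step 2: table
        "data_path": "s3://my-s3-bucket/my-audience-name/*.parquet",  # step 3
        "access_key_id": "YOUR_AWS_ACCESS_KEY_ID",          # step 4
        "secret_access_key": "YOUR_AWS_SECRET_ACCESS_KEY",  # step 4
        "file_format": "PARQUET",                           # step 5
    },
}

print(json.dumps(transfer_config, indent=2))
```

Keeping the configuration in source control like this makes it easier to recreate the transfer for additional audiences or environments.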

Query Data in BigQuery:

You can now run SQL queries on the external table in BigQuery, as you would with any other table. Note that querying data in an external table might be slower than querying data stored natively in BigQuery. To improve performance, consider creating a materialized view based on the external table.
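As a concrete illustration, the external table is queried with standard SQL exactly like a native table; the project, dataset, and table names below are hypothetical. A plain CTAS snapshot is shown as one alternative for speeding up repeated analysis, since each query against an external table re-reads the underlying files:

```python
# Hypothetical names -- substitute your own project, dataset, and table.
query = """
SELECT COUNT(*) AS row_count
FROM `my-project.my_dataset.my_audience`
"""

# One option for faster repeated queries: materialize a native copy.
snapshot_ddl = """
CREATE OR REPLACE TABLE `my-project.my_dataset.my_audience_snapshot` AS
SELECT * FROM `my-project.my_dataset.my_audience`
"""

print(query.strip())
```

Either statement can be pasted into the BigQuery console or submitted through a client library or the bq CLI.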
For more information, refer to the BigQuery External Tables Documentation.
Following successful data export and querying, you can now perform further data analysis, create data visualizations, or develop machine learning models using BigQuery's comprehensive set of tools.