Google BigQuery

Loading data into BigQuery using external tables & data sources.

Optable Recommends using the Advanced Query Interface (AQI) to access your entire identity graph in BigQuery without data replication. It's a streamlined solution for viewing your ID graph without data replication.

This guide will help you understand how to export data from your DCN into Google BigQuery using Google Cloud Storage & Amazon S3 as intermediary storage buckets, and is meant to act as an alternative to the Advanced Query Interface (AQI). This leverages BigQuery's External Data Sources feature, which allows you to query data stored in an external stage as if it were inside a BigQuery table.

Prerequisites

Before proceeding, ensure that you have the following:

  • An active Google BigQuery account with necessary permissions.

  • An active Google Cloud Storage (GCS) bucket where data from your DCN will be exported to.

  • Alternatively, BigQuery supports Data Transfers from Amazon S3 buckets to load data from Amazon S3, be sure to have the proper IAM credentials to grant BigQuery access to read from your S3 bucket.

Please note that this guide assumes you have the necessary permissions to create and manage resources in BigQuery, GCP & AWS. Always ensure data is handled in a secure and compliant manner.

Step 1 - Export Data from your DCN:

The first step is to export the data from your DCN into the previously created Google Cloud Storage (GCS) or Amazon S3 bucket in Parquet format.

Once the data is exported to the GCS bucket in Parquet format, it can be queried in BigQuery via the external table you will create in the next step.

Step 2 - Configure BigQuery to ingest your data:

The next step will be to set up the ingestion pipeline into your BigQuery project/dataset by using External Tables (GCS) or Data Transfers (S3).

Step 1: Click "Add Data" in BigQuery.

In BigQuery navigate to the project you want to export your audience to and click + ADD this will open a dialog where you will be able to load

Step 2: Select "Google Cloud Storage" and enter your configuration details.

You will then be prompted to configure your Google Cloud Storage source.

  1. Enter the URI pattern for the GCS bucket you previously exported to. This should include the whole path and wildcard after the audience. For example if you exported my-audience the path might look like gs://my-gcs-bucket/my-audience-name/*

  2. Select Parquet as the format.

  3. Select the Project, Dataset and Table you want to associate with exports of this audience.

  4. Set the table type to External Table

  5. Under Schema select the Auto Detect option.

Query Data in BigQuery:

You can now run SQL queries on the external table in BigQuery, as you would with any other table. Note that querying data in an external table might be slower than querying data stored natively in BigQuery. To improve performance, consider creating a materialized view based on the external table.

For more information, refer to the BigQuery External Tables Documentation.

Following successful data export and querying, you can now perform further data analysis, create data visualizations, or develop machine learning models using BigQuery's comprehensive set of tools.

Last updated