---
title: "BigQuery Loader"
sidebar_label: "BigQuery Loader"
sidebar_position: 2
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
import LoaderDiagram from '@site/docs/api-reference/loaders-storage-targets/bigquery-loader/_diagram.md';
import DeployOverview from '@site/docs/api-reference/loaders-storage-targets/bigquery-loader/_deploy-overview.md';

Overview

:::tip Schemas in BigQuery

For more information on how events are stored in BigQuery, check the mapping between Snowplow schemas and the corresponding BigQuery column types.

:::

Configuring the loader

The loader config file is in HOCON format and controls many aspects of how the loader runs.

The simplest possible config file just needs a description of your pipeline inputs and outputs:

- AWS (Kinesis): https://github.com/snowplow-incubator/snowplow-bigquery-loader/blob/v2/config/config.kinesis.minimal.hocon
- GCP (Pub/Sub): https://github.com/snowplow-incubator/snowplow-bigquery-loader/blob/v2/config/config.pubsub.minimal.hocon
- Azure: https://github.com/snowplow-incubator/snowplow-bigquery-loader/blob/v2/config/config.azure.minimal.hocon

See the configuration reference for all possible configuration parameters.

Iglu

The BigQuery Loader requires an Iglu resolver file which describes the Iglu repositories that host your schemas. This should be the same Iglu configuration file that you used in the Enrichment process.
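
As a rough illustration of what such a resolver file contains, the sketch below parses a resolver document and lists its repositories ordered by their priority value. The repository entry, the cache size, and the `list_repositories` helper are illustrative assumptions rather than values taken from this page. The self-describing `resolver-config` envelope follows the general Iglu resolver layout, so compare it against the file you already use for Enrich.

```python
import json

# Illustrative resolver document. The repository details and cache size
# are assumptions for this sketch, not values required by the loader.
RESOLVER_JSON = """
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {"http": {"uri": "http://iglucentral.com"}}
      }
    ]
  }
}
"""


def list_repositories(resolver_text):
    """Return (priority, name, uri) tuples sorted by priority value."""
    doc = json.loads(resolver_text)
    # A resolver file is a self-describing JSON with a resolver-config schema.
    if "resolver-config" not in doc.get("schema", ""):
        raise ValueError("not an Iglu resolver config")
    repos = []
    for repo in doc["data"]["repositories"]:
        # Repositories without an HTTP connection are reported as "(embedded)".
        uri = repo["connection"].get("http", {}).get("uri", "(embedded)")
        repos.append((repo["priority"], repo["name"], uri))
    return sorted(repos)


if __name__ == "__main__":
    for priority, name, uri in list_repositories(RESOLVER_JSON):
        print(priority, name, uri)
```

A check like this can be run against the Enrich resolver file and the loader resolver file to confirm that both point at the same repositories.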

Metrics

The BigQuery Loader can be configured to send the following custom metrics to a StatsD receiver:

| Metric | Definition |
|---|---|
| `events_good` | A count of events that are successfully written to BigQuery. |
| `events_bad` | A count of failed events that could not be loaded, and were instead sent to the bad output stream. |
| `latency_millis` | The time in milliseconds from when events are written to the source stream of events (i.e. by Enrich) until when they are read by the loader. |
| `e2e_latency_millis` | The end-to-end latency of the Snowplow pipeline. The time in milliseconds from when an event was received by the collector, until it is written into BigQuery. |

See the monitoring.metrics.statsd options in the configuration reference for how to configure the StatsD receiver.
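
To make the metrics concrete, here is a minimal sketch of how a receiver might parse StatsD datagrams carrying them. The `snowplow.bigquery-loader` prefix, the sample values, and the metric types (`c` for counters, `g` for gauges) are assumptions for illustration; only the four metric names come from the list above. The `name:value|type` line layout is the usual StatsD wire format.

```python
def parse_statsd(payload):
    """Parse a StatsD datagram into (name, value, type) tuples."""
    metrics = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "<name>:<value>|<type>[|extra fields]".
        name, _, rest = line.partition(":")
        fields = rest.split("|")
        if len(fields) < 2:
            continue  # skip malformed lines rather than fail
        metrics.append((name, float(fields[0]), fields[1]))
    return metrics


# Hypothetical datagram: the prefix, values, and types are assumptions.
SAMPLE = "\n".join([
    "snowplow.bigquery-loader.events_good:120|c",
    "snowplow.bigquery-loader.events_bad:3|c",
    "snowplow.bigquery-loader.latency_millis:850|g",
    "snowplow.bigquery-loader.e2e_latency_millis:2400|g",
])

if __name__ == "__main__":
    for name, value, metric_type in parse_statsd(SAMPLE):
        print(name, value, metric_type)
```

In practice a StatsD server or agent does this parsing for you; the sketch only shows what to expect on the wire when checking that the loader is reporting.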

import Telemetry from "@site/docs/reusable/telemetry/_index.md"

<Telemetry name="BigQuery Loader" since="2.0.0" idSetting="telemetry.userProvidedId" disableSetting="telemetry.disable" />