SnowCannon setup guide


Before You Begin

Assumptions

For the purpose of this guide, we are going to assume:

That you want to log the Snowplow event data collected by SnowCannon to Amazon S3
That you want to use Fluentd to handle the actual file upload to S3 (versus SnowCannon's built-in S3 sink)
That you will use Upstart as your service wrapper
That you will use Monit as your process monitor

If any of these assumptions do not hold, you may also need to consult the SnowCannon README file, the Fluentd user manual and/or the manual for your preferred service daemon or process monitor.

Hardware and OS

If you can choose your hardware, we recommend running SnowCannon on a multi-core machine (preferably quad-core or better), to take advantage of SnowCannon's built-in clustering support.
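As a quick, unofficial check of a candidate machine, the core count can be read with `nproc` from GNU coreutils. This is only a sketch: the threshold of four mirrors the quad-core recommendation above, and the variable name `cores` is illustrative.

```shell
# Count the CPU cores available to the current process (GNU coreutils).
cores=$(nproc)

# Compare against the quad-core recommendation from this guide.
if [ "$cores" -ge 4 ]; then
  echo "OK: $cores cores available"
else
  echo "Note: only $cores core(s) available; clustering will have less to work with"
fi
```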

If you can choose your OS, we recommend deploying SnowCannon on Ubuntu 12.04 LTS / Precise. This is the most modern Debian-based OS supported by Fluentd (which we use alongside SnowCannon).

Installing SnowCannon

Section to come.

Configuring SnowCannon

Section to come.

Setting up Fluentd

Overview

Fluentd is a lightweight log collector from the team at Treasure Data. SnowCannon supports Fluentd as an event sink, and Fluentd in turn supports a variety of "output plugins", meaning that SnowCannon can use Fluentd to send Snowplow events to Amazon S3 as well as other data stores.

Installation options

As explained in the Fluentd installation guide, there are four ways to install Fluentd:

Binary package (as td-agent, maintained by Treasure Data)
RubyGem
.tar.gz file
Git repository

If you are running one of the OSes supported by td-agent, then we strongly recommend installing the binary package, as it is the most straightforward option and also bundles jemalloc. If the binary package is not available, then we recommend the RubyGem installation using RVM.

Installation

Binary package

To install the binary package, please follow the appropriate td-agent installation instructions.

Happily, with the td-agent binary packages, the Amazon S3 output plugin fluent-plugin-s3 is installed by default.

Manual installation

For any of the three manual installation options, please consult the Fluentd installation guide.

With the manual options, you will need to install jemalloc and the Amazon S3 output plugin fluent-plugin-s3 separately. Installing jemalloc is beyond the scope of this guide; happily, installing the Amazon S3 plugin is easy:

$ fluent-gem install fluent-plugin-s3
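One way to confirm that the plugin is now visible to Fluentd's gem environment is to list it. This sketch assumes that `fluent-gem` forwards its arguments to the standard RubyGems `gem` command, so that `list` behaves as it does there; the variable name `plugin_status` is illustrative.

```shell
# Assumption: fluent-gem accepts the same subcommands as RubyGems' gem.
if command -v fluent-gem >/dev/null 2>&1; then
  # List installed gems whose name matches fluent-plugin-s3.
  plugin_status=$(fluent-gem list fluent-plugin-s3 2>&1)
else
  plugin_status="fluent-gem not found on PATH"
fi
echo "$plugin_status"
```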

Testing

To confirm that Fluentd was installed successfully, run the following commands:

$ cd ~
$ fluentd --setup ./fluent
$ fluentd -c ./fluent/fluent.conf -vv &
$ echo '{"json":"message"}' | fluent-cat debug.test

The last command sends Fluentd the message {"json":"message"} with a "debug.test" tag. If the installation was successful, Fluentd will output a message like the following:

2011-07-10 16:49:50 +0900 debug.test: {"json":"message"}
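Because Fluentd runs with `-vv` here, the expected line can be buried among other log output. The following is a minimal sketch of a helper that recognises a line of that shape; the function name `is_debug_test_line` is hypothetical, and the pattern assumes the timestamp format shown in the sample above.

```shell
# Hypothetical helper: succeeds if the given line has the form
# "<date> <time> <utc offset> debug.test: {"json":"message"}".
is_debug_test_line() {
  printf '%s\n' "$1" | grep -Eq \
    '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4} debug\.test: \{"json":"message"\}$'
}

# Check the sample line from this guide.
if is_debug_test_line '2011-07-10 16:49:50 +0900 debug.test: {"json":"message"}'; then
  echo "Fluentd test message found"
fi
```

If Fluentd's output were redirected to a file, each line of that file could be passed to the helper in the same way.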

Configuration

Setting up log folders

To come

Updating fluent.conf

To come

Rest of section to come
