SnowCannon setup guide
For the purpose of this guide, we are going to assume:
- That you want to log the Snowplow event data collected by SnowCannon to Amazon S3
- That you want to use Fluentd to handle the actual file upload to S3 (versus SnowCannon's built-in S3 sink)
- That you will use Upstart as your service wrapper
- That you will use Monit as your process monitor
If any of these assumptions do not hold, you may also need to consult the SnowCannon README file, the Fluentd user manual and/or the manual for your preferred service daemon or process monitor.
If you can choose your hardware, we recommend running SnowCannon on a multi-core machine (preferably quad-core or better), to take advantage of SnowCannon's built-in clustering support.
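As a quick sanity check of the hardware recommendation above, the following sketch reports whether a machine meets the quad-core suggestion. The helper name `core_advice` is invented for this example, and it assumes `nproc` (from GNU coreutils) is available on the host.

```shell
# Hypothetical helper: prints advice for a given core count.
# The threshold of 4 mirrors the "quad-core or better" suggestion above.
core_advice() {
  cores="$1"
  if [ "$cores" -ge 4 ]; then
    echo "ok: $cores cores"
  else
    echo "warning: $cores core(s); quad-core or better is recommended"
  fi
}

# nproc prints the number of processing units available to this process.
core_advice "$(nproc)"
```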
If you can choose your OS, we recommend deploying SnowCannon on Ubuntu 12.04 LTS / Precise. This is the most modern Debian-based OS supported by Fluentd (which we use alongside SnowCannon).
Section to come.
Section to come.
Fluentd is a lightweight log collector from the team at Treasure Data. SnowCannon supports Fluentd as an event sink, and Fluentd in turn supports a variety of "output plugins", meaning that SnowCannon can use Fluentd to send Snowplow events to Amazon S3 as well as to other data stores.
As explained in the Fluentd installation guide, there are four ways to install Fluentd:
- Binary package (as `td-agent`, maintained by Treasure Data)
- RubyGem
- `.tar.gz` file
- Git repository
If you are running one of the OSes supported by `td-agent`, then we strongly recommend installing the binary package, as it is the most straightforward option and also bundles jemalloc. If the binary package is not available, then we recommend the RubyGem installation using RVM.
To install the binary package, please follow the appropriate td-agent installation instructions.
Happily, with the `td-agent` binary packages, the Amazon S3 output plugin (fluent-plugin-s3) is installed by default.
For any of the three manual installation options, please consult the Fluentd installation guide.
With the manual options, you will need to install jemalloc and the Amazon S3 output plugin (fluent-plugin-s3) separately. Installing jemalloc is outside the scope of this guide; happily, installing the Amazon S3 plugin is easy:
$ fluent-gem install fluent-plugin-s3
To confirm that Fluentd was installed successfully, run the following commands:
$ cd ~
$ fluentd --setup ./fluent
$ fluentd -c ./fluent/fluent.conf -vv &
$ echo '{"json":"message"}' | fluent-cat debug.test
The last command sends Fluentd the message `{"json":"message"}` with a `debug.test` tag. If the installation was successful, Fluentd will output the following message:
2011-07-10 16:49:50 +0900 debug.test: {"json":"message"}
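If you would rather check for this line automatically than read it off the console, a minimal sketch follows. The helper name `check_fluentd_echo` and the log file name in the usage comments are assumptions made for this example, not part of Fluentd.

```shell
# Hypothetical helper: succeeds if stdin contains a Fluentd record for the
# debug.test tag carrying the test payload sent with fluent-cat.
# -F matches the pattern as a fixed string; -q suppresses output.
check_fluentd_echo() {
  grep -F -q 'debug.test: {"json":"message"}'
}

# Example usage, assuming Fluentd's output is redirected to a file
# (./fluent/fluentd.log is a name chosen for this sketch):
#   fluentd -c ./fluent/fluent.conf -vv > ./fluent/fluentd.log 2>&1 &
#   sleep 5
#   echo '{"json":"message"}' | fluent-cat debug.test
#   check_fluentd_echo < ./fluent/fluentd.log && echo "Fluentd is working"
```

Only the tag and payload are matched, so the check does not depend on the timestamp.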
To come
To come
Rest of section to come
Copyright © 2012-2013 Snowplow Analytics Ltd