SnowCannon is an event collector server for SnowPlow snowplow, written in node.js by Simon Rumble [email protected].
There are three benefits of using SnowCannon over SnowPlow's [CloudFront-based collector] cloudfront-collector:
- Allows use of a third-party cookie, making user tracking across domains possible for all except mobile Safari. This is something even the Google Analytics JS-set cookies approach struggles with
- Enables real-time analytics, if you want it. CloudFront has 10-20 minute delays to get the logs into S3
- Supports (via Fluentd fluentd) saving SnowPlow events to data stores other than Amazon S3 - e.g. Google Storage or Cassandra
In a nutshell: SnowCannon receives events from the [SnowPlow JavaScript tracker] snowplow-js, sets/updates a third-party user tracking cookie, and logs the events in gzipped files to Amazon S3.
In pseudocode terms:
if (request contains an "sp" cookie) {
Record that cookie as the user identifier
Set that cookie with a now+1 year cookie expiry
Add the headers and payload to the output array
} else {
Set the "sp" cookie with a now+1 year cookie expiry
Add the headers and payload to the output array
}
Every 10 minutes, push the data out to S3 as a gzipped JSON file
SnowCannon depends on the following NPM packages:
- cluster cluster, creates workers to exploit multi-core servers
- knox knox, an Amazon S3 library
- node-uuid node-uuid, generates UUIDs for user tracking
- measured measured, for SnowCannon performance metrics
- fluent-logger fluent-logger, for logging events to Fluentd fluentd
Note that using Fluentd or Knox for logging is optional - but the dependencies are required.
First, you need to [install Node.js] node-install.
Now checkout and install SnowCannon:
$ git clone git://github.com/shermozle/SnowCannon.git
$ cd SnowCannon
$ npm install .
And finally run it:
$ node snowcannon.js
If you are using the default HTTP port 80, you may need to run node using sudo
(and potentially supply the full path to your node binary).
You can run a manual check on SnowCannon by loading the following web page in your web browser:
tests/manual/async.html
To check this is working:
- Use e.g. Chrome Developer Tools to check that
ice.png
is being successfully fetched from your SnowCannon instance running onhttp://localhost
. - Check that your events are being printed to
stdout
in your terminal running SnowCannon
SnowCannon supports three different event sinks:
- stdout - events (and only events) are printed to
stdout
. Use your process control system (e.g. supervisord or daemontools) to handle thestdout
eventstream (e.g. uploading it to S3 or Google Storage). Also useful for debugging - s3 - events are collected by SnowCannon, and then regularly gzipped and uploaded to S3 by SnowCannon itself
- fluentd - events are sent by SnowCannon to Fluentd fluentd, a lightweight log collector. Configure Fluentd to forward the events on to S3, Google Storage or equivalent
You can configure your event sink in the config.js
file - to take each sink option in turn:
SnowCannon's event sink is set to stdout by default - you don't need to change anything to use this sink.
To change the event sink to s3, set the config.sink.out
variable like so:
config.sink.out = "s3";
And then update the following configuration section:
// S3 bucket name
config.sink.s3.bucket = 'S3 BUCKET NAME GOES HERE';
// AWS access details
config.sink.s3.key = process.env.AWS_ACCESS_KEY_ID || 'KEY GOES HERE IF ENV NOT SET';
config.sink.s3.secret = process.env.AWS_SECRET_KEY || 'SECRET GOES HERE IF ENV NOT SET';
Note that you do not have to add your AWS access details into this file if they are already available to node.js in your shell environment.
To change the event sink to fluentd, set the config.sink.out
variable like so:
config.sink.out = "fluentd";
Depending on how you have configured Fluentd is configured, you should be able to leave the config.sink.fluentd
variables untouched - they correspond to the Fluentd default configuration found in:
etc/fluentd/fluent.conf
Please note that setting up and configuring Fluentd is out of scope of this README - but the SnowPlow team has included instructions on this as part of their [SnowCannon Setup Guide] snowcannon-setup-guide.
Tested on an Amazon EC2 Small with Siege, SnowCannon handles up to about 10,000 concurrent requests a second before it starts dropping connections.
When deploying node.js apps to production, it is [generally recommended] node-js-deployment that you additionally setup:
- A service wrapper - e.g. Upstart upstart or Forever forever. A service wrapper runs SnowCannon in a separate process and restarts SnowCannon if it dies
- A process monitor - e.g. Monit monit or God god. A utility for monitoring and managing processes; use it to send a simple HTTP request to SnowCannon and restart SnowCannon if it does not respond
Setting up a service wrapper and process monitor is out of scope of this README; however the SnowPlow team will be including instructions on setting up Upstart and Monit as part of their [SnowCannon Setup Guide] snowcannon-setup-guide. In the meantime, default Upstart and Monit configuration files can be found here:
etc/monit/monit.conf
etc/upstart/snowcannon.conf
- Work on supporting infrastructure of auto-scaling and load balancing
SnowCannon is copyright 2012 Simon Rumble [email protected].
Licensed under the [Apache License, Version 2.0] license (the "License"); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.