This repository was archived by the owner on Jan 29, 2022. It is now read-only.

MapReduce Usage

Luke Lovett edited this page Feb 13, 2015 · 9 revisions

This page describes how to use the MongoDB Hadoop Connector with vanilla MapReduce.

Installation

  1. Obtain the MongoDB Hadoop Connector. You can either build it from source or download the prebuilt JARs. The releases page also includes instructions for use with Maven and Gradle. For vanilla MapReduce, only the "core" JAR is required.
  2. Get a JAR for the MongoDB Java Driver.
  3. Make the MongoDB Hadoop Connector JARs and the MongoDB Java Driver JAR available to every node in the cluster. You can either provision each machine with the necessary JARs in $HADOOP_HOME/lib, or use the Hadoop DistributedCache to distribute the JARs to pre-existing nodes.
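
After provisioning a node, it can be useful to confirm that both JARs are actually visible to the JVM. The following is a minimal sketch of such a check, not part of the connector itself. The class name `ClasspathCheck` and the `missing` helper are hypothetical, and the two probed class names (`com.mongodb.hadoop.MongoInputFormat` from the "core" JAR and `com.mongodb.MongoClient` from the Java driver) are assumptions about the versions you installed; adjust them if your JARs differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which of the expected classes cannot be loaded.
class ClasspathCheck {

    // Returns the subset of classNames that are absent from the classpath
    // or cannot be linked (for example, because a superclass is missing).
    static List<String> missing(String... classNames) {
        List<String> notFound = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: locate the class without running static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Assumed class names; one from the connector "core" JAR, one from the Java driver.
        List<String> notFound = missing(
                "com.mongodb.hadoop.MongoInputFormat",
                "com.mongodb.MongoClient");
        if (notFound.isEmpty()) {
            System.out.println("Connector and driver classes found.");
        } else {
            System.out.println("Missing from classpath: " + notFound);
            System.exit(1);
        }
    }
}
```

Run the check with the same classpath your Hadoop tasks use on that node. It only inspects the local JVM classpath, so it will not see JARs that are shipped per job through the DistributedCache.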

Writing MapReduce Jobs

There are a number of examples for writing MapReduce jobs using the MongoDB Hadoop Connector.
