Added sum_mapreduce.py #94

Open: wants to merge 2 commits into base: master
76 changes: 76 additions & 0 deletions SimpleAddition/sum_mapreduce.py
@@ -0,0 +1,76 @@
#!/usr/bin/env python
'''
This program uses MapReduce techniques to compute the sum of numbers.
How it works:
The mapper runs first: it reads tab-separated lines from standard input and prints the relevant data (the number column) to the terminal.

The reducer is then called on that output and prints the sum of the numbers.
This is a very basic example and is of little practical use on small inputs.
But if you have gigabytes of numbers to add, you can run it with Hadoop streaming.
'''

import sys

def mapper():
    # Read input line by line
    for line in sys.stdin:
        # Strip off surrounding whitespace and split on tab
        data = line.strip().split('\t')

        # We'll use the data stored in data/mapreduce/,
        # which has two columns of tab-separated values
        # (a serial number and the number to add).
        # Skip any line that doesn't have exactly two fields.
        if len(data) == 2:
            sr, num = data

            # Print only the number, which is all the reducer needs
            print("{0}".format(num))

mapper()

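'''

# Testing :

# Columns are tab-separated, matching what mapper() splits on
test_data = "1\t54\n2\t64\n3\t1\n"

def main():

    # Used for testing the mapper function

    try:
        from StringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO  # Python 3
    sys.stdin = StringIO(test_data)
    mapper()
    sys.stdin = sys.__stdin__

'''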

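def reducer():
    # Following the mapper, we need to compute the total sum.
    total = 0

    for line in sys.stdin:

        data = line.strip().split("\t")

        # The mapper emits one value per line; skip anything else,
        # including blank lines.
        if len(data) != 1 or not data[0]:
            # This keeps a check on data errors
            continue

        # data is a one-element list, so take the value itself
        thisnum = data[0]
        total += float(thisnum)

    # Print the sum once, after all input has been read
    print(total)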


# Calling the reducer function

#reducer()
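
To check the two phases together without a Hadoop cluster, the pipeline can be simulated in memory. The sketch below is not part of this pull request: `map_lines`, `reduce_lines`, and `run_local` are hypothetical names, and it assumes Python 3 and the two-column tab-separated input described in the file. The `sorted` call stands in for the sort step that Hadoop streaming performs between the mapper and the reducer.

```python
import io


def map_lines(lines):
    """Yield the number column of each 'serial<TAB>number' line (hypothetical helper)."""
    for line in lines:
        fields = line.strip().split("\t")
        # Keep only lines with exactly two tab-separated fields
        if len(fields) == 2:
            _serial, num = fields
            yield num


def reduce_lines(lines):
    """Return the sum of one-number-per-line input (hypothetical helper)."""
    total = 0.0
    for line in lines:
        fields = line.strip().split("\t")
        # Skip blank lines and lines with more than one field
        if len(fields) != 1 or not fields[0]:
            continue
        total += float(fields[0])
    return total


def run_local(text):
    """Simulate mapper -> sort -> reducer on an in-memory string."""
    mapped = map_lines(io.StringIO(text))
    return reduce_lines(sorted(mapped))


# Sample input assumed for illustration; the last line is malformed on purpose
sample = "1\t54\n2\t64\n3\t1\nbad line\n"
print(run_local(sample))  # prints 119.0
```

Returning values instead of printing keeps the helpers easy to assert on; the same logic reads from `sys.stdin` when the scripts run under streaming.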