Lab Assignment 1: Finding Facebook Mutual Friends Using MapReduce, and Comparing HBase and Cassandra

Team Id : 14

Member 1 : Ruthvic Punyamurtula

Class Id : 16

Member 2 : Shankar Pentyala

Class Id : 15

Source Code : Click Here

Introduction

This lab assignment has two parts. The first covers the concepts of Hadoop MapReduce and implements a MapReduce algorithm that finds the mutual friends of pairs of users. The second compares HBase and Cassandra on a chosen use case with a user-defined data set.

Objective

1. Implement a MapReduce algorithm for the Facebook common-friends problem and run the MapReduce job on Apache Hadoop. Show the implementation through a map-reduce diagram.

Approach

First, take the input as discussed in the use case: "A -> B C D, B -> A C D E, C -> A B D E, D -> A B C E, E -> B C D". In the map phase, pair each user with each of their friends and emit that pair as the key, with the user's friend list as the value. The records are then grouped by pair key, and the reduce phase compares the friend lists for each pair to produce its list of mutual friends.

Workflow

Create a mapper class. Each line of the input file is split on the tab character, and the split is checked to have length 2: the first part is the source (base) user, and the second part is the list of that user's friends. Each key is then written as (A,B) or (B,A), ordered by the integer values of A and B in the input, so that both users of a pair produce the same key.

Create a reducer class. The data arrives grouped by key, such as (A,B) or (B,C), together with the friend lists emitted for that key. The reducer then compares those lists to find the mutual friends of the pair, for example of (A,B).

Create a main method that acts as the driver: it sets the mapper and reducer classes, takes the input, and produces the output.
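The three steps above can be sketched in plain Python, without Hadoop, to show the data flow. This is a minimal illustration and not the lab's Java implementation: the function names (`map_line`, `reduce_pair`, `mutual_friends`) are hypothetical, the friends on each line are assumed to be separated by spaces, and keys are ordered lexicographically because the sample uses letters, whereas the lab orders them by integer user id.

```python
from collections import defaultdict

# Sample input from the Approach section. A tab after the user and spaces
# between friends are assumptions about the file layout.
SAMPLE = [
    "A\tB C D",
    "B\tA C D E",
    "C\tA B D E",
    "D\tA B C E",
    "E\tB C D",
]


def map_line(line):
    """Map step: emit (ordered pair, friend list) for every friend on a line."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:  # skip malformed lines, as the mapper's length check does
        return []
    user, friends = parts[0], parts[1].split()
    # Sorting the pair gives (A,B) and (B,A) the same key.
    return [(tuple(sorted((user, friend))), friends) for friend in friends]


def reduce_pair(friend_lists):
    """Reduce step: intersect the two friend lists collected for one pair."""
    if len(friend_lists) != 2:  # only one side listed the other as a friend
        return []
    first, second = friend_lists
    return sorted(set(first) & set(second))


def mutual_friends(lines):
    """Driver: run map, group by key (the shuffle), then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, friends in map_line(line):
            grouped[key].append(friends)
    return {key: reduce_pair(lists) for key, lists in sorted(grouped.items())}


if __name__ == "__main__":
    for pair, common in mutual_friends(SAMPLE).items():
        print(pair, common)
```

For the sample input, the pair (A,B) receives the lists "B C D" and "A C D E", whose intersection is C and D.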

Data set and Parameter

The input file for this can be found at https://github.com/Ruthvicp/CS5590_BigDataProgramming/blob/master/Lab/Lab1/Source/MutualFriend/Input3/demo.txt

Evaluation

Hadoop MapReduce suits the common/mutual friends problem well: once the friend lists of two users are grouped under the same pair key, the reduce operation only has to filter them down to the friends the two users share.

Conclusion

Representing the mutual friends problem using the map reduce diagram

2. NoSQL comparison - Cassandra vs HBase

Introduction

Consider the Netflix use case. We create a Netflix users database model to find users by region, check the last activity of trial users, find the paid users, and look up their favorite genres so that recommendations can be provided to each user.

Objectives

a) Consider the Netflix use case and use a simple data set. Describe the use case based on your assumptions, and report the data set, its fields, data types, etc.

b) Use HBase to implement a solution for the use case. Report at least 3 queries with their input and output. The query’s relevance towards solving the use case is important.

c) Use Cassandra to implement a solution for the use case. Report at least 3 queries with their input and output. The query’s relevance towards solving the use case is important.

d) Compare Cassandra and HBase for your use case. Present a table comparing the use case as implemented in both NoSQL systems.

Approach

Cassandra : Create a table in Cassandra to store the data set.

HBase : In HBase as well, create a similar table to process the Netflix users data.

Workflow

Cassandra Queries :

Insert data
Find inactive users
Find paid users
Find trial end date of a new user

HBase Queries :

Find trial users
Find users who watched a particular movie on Netflix
Find the region and other personal details of a user
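The intent of a few of the queries listed above can be sketched in plain Python over in-memory records, independent of either store. This is an illustration only, not CQL or HBase shell syntax: the rows, the field names (`user_id`, `region`, `plan`, `signup`, `last_active`, `favorite_genre`), the 30-day trial length, and the 14-day inactivity threshold are all hypothetical and are not taken from the lab's data set.

```python
from datetime import date, timedelta

# Hypothetical rows, made up for illustration; the real data set may differ.
USERS = [
    {"user_id": "u1", "region": "US", "plan": "paid",
     "signup": date(2018, 1, 10), "last_active": date(2018, 6, 1),
     "favorite_genre": "drama"},
    {"user_id": "u2", "region": "IN", "plan": "trial",
     "signup": date(2018, 5, 20), "last_active": date(2018, 5, 21),
     "favorite_genre": "comedy"},
    {"user_id": "u3", "region": "US", "plan": "trial",
     "signup": date(2018, 6, 1), "last_active": date(2018, 6, 9),
     "favorite_genre": "action"},
]

TRIAL_DAYS = 30  # assumed trial length


def paid_users(users):
    """Find paid users."""
    return [u["user_id"] for u in users if u["plan"] == "paid"]


def trial_users(users):
    """Find trial users."""
    return [u["user_id"] for u in users if u["plan"] == "trial"]


def inactive_users(users, today, max_idle_days=14):
    """Find users whose last activity is older than the idle threshold."""
    cutoff = today - timedelta(days=max_idle_days)
    return [u["user_id"] for u in users if u["last_active"] < cutoff]


def trial_end_date(user):
    """Compute the trial end date from the signup date."""
    return user["signup"] + timedelta(days=TRIAL_DAYS)


def users_in_region(users, region):
    """Find users by region."""
    return [u["user_id"] for u in users if u["region"] == region]


if __name__ == "__main__":
    print(paid_users(USERS))
    print(inactive_users(USERS, today=date(2018, 6, 10)))
    print(trial_end_date(USERS[2]))
```

In Cassandra or HBase, each of these filters would instead depend on how the table's keys and columns are laid out, which is part of what the comparison in this section is about.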

Datasets & parameters

The input data set for both Cassandra and HBase can be found here (Netflix data)

Evaluation

Conclusion

Cassandra's key characteristics are high availability, minimal administration, and no single point of failure (SPoF). HBase, on the other hand, is good for fast reads and writes with linear scalability.
