Lab Assignment 1: Finding Facebook Mutual Friends Using MapReduce, and Comparing HBase and Cassandra
Source Code : Click Here
Video/Demo : Youtube-click here
This lab assignment covers the concepts of Hadoop MapReduce and implements a MapReduce algorithm to find the mutual friends of pairs of users. The second part compares HBase and Cassandra based on a use case and a user-defined data set.
1. Implement a MapReduce algorithm for the Facebook common friends problem and run the MapReduce job on Apache Hadoop. Show your implementation through a map-reduce diagram.
First, take the input as discussed in the use case: "A -> B C D, B -> A C D E, C -> A B D E, D -> A B C E, E -> B C D". In the map phase, emit a key for each pair made of a user and one of their friends, with the user's friend list as the value. The lists are then grouped by pair key, and the reduce phase intersects them to get the mutual friends list for each pair.
Create a mapper class as shown in the code snippet below. Each line of the input file is split on the tab character. The split is expected to have length 2: the first part is the source (base) user and the second part is that user's list of friends. The keys are then built as (A,B) or (B,A), ordered by the integer values of A and B in the input, so that both users of a pair produce the same key.
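The original Java snippet is not reproduced here, so the following is only a minimal Python sketch of the map step described above, not the Hadoop mapper class from the repository. The function names `pair_key` and `map_line` are hypothetical, and the sketch assumes the friends field is separated by commas or spaces, which the text does not specify.

```python
def pair_key(a, b):
    """Order a pair so that (A,B) and (B,A) map to the same key."""
    # Numeric IDs are compared by integer value, as the text describes;
    # non-numeric IDs (like the letters in the example) fall back to
    # plain string comparison.
    if a.isdigit() and b.isdigit():
        return (a, b) if int(a) < int(b) else (b, a)
    return (a, b) if a < b else (b, a)


def map_line(line):
    """Emit (pair_key, friend_list) for each friend of the user on this line."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:
        # Skip malformed lines and users with no friend list.
        return []
    user, friends_field = parts
    # Assumption: friends are separated by commas or whitespace.
    friends = friends_field.replace(",", " ").split()
    return [(pair_key(user, friend), friends) for friend in friends]
```

For the line `A<TAB>B,C,D`, this emits the keys (A,B), (A,C) and (A,D), each carrying A's full friend list as the value.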
Create a reducer class in which the data is grouped by key, such as (A,B) or (B,C), together with the friend lists emitted for that key. The reducer then intersects these lists to find the mutual friends of the pair, for example (A,B).
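As a rough illustration of the reduce step, the sketch below intersects the friend lists that arrive under one pair key. It is plain Python rather than the Hadoop reducer class, and the name `reduce_pair` is hypothetical. It assumes friendships are symmetric, so a pair key normally receives two lists, one from each user.

```python
def reduce_pair(pair, friend_lists):
    """Return (pair, sorted mutual friends) from the lists grouped under pair."""
    lists = list(friend_lists)
    if len(lists) < 2:
        # Only one side listed the other as a friend: nothing to intersect.
        return pair, []
    common = set(lists[0])
    for other in lists[1:]:
        common &= set(other)
    return pair, sorted(common)
```

With the example input, the key (A,B) receives A's list `[B, C, D]` and B's list `[A, C, D, E]`, and their intersection is `[C, D]`.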
Finally, write a main method that acts as the driver: it sets the mapper and reducer classes, takes the input, and produces the output.
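In Hadoop, the framework performs the grouping between map and reduce. To make that flow concrete, here is a small in-memory stand-in for the driver, written in Python as a sketch only. `run_job` is a hypothetical helper, not a Hadoop API. It assumes the mapper returns a list of (key, value) pairs for each line and the reducer returns a (key, result) pair for each group.

```python
from collections import defaultdict


def run_job(lines, map_fn, reduce_fn):
    """Run map, group-by-key (shuffle), and reduce over an iterable of lines."""
    groups = defaultdict(list)
    # Map phase: collect every emitted value under its key.
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Reduce phase: one call per key, visited in sorted key order.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))
```

Passing a mapper that emits sorted user pairs with friend lists, and a reducer that intersects those lists, yields one mutual-friends entry per pair of friends in the input.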
The input file for this can be found at https://github.com/Ruthvicp/CS5590_BigDataProgramming/blob/master/Lab/Lab1/Source/MutualFriend/Input3/demo.txt
Hadoop MapReduce suits the common/mutual friends problem well: once the friend lists of two users are grouped under the same key, the reduce operation filters them down to the friends they share.
Representing the mutual friends problem using the map reduce diagram
Consider the use case of Netflix. We create a Netflix users database model to find users by region, track the last activity of trial users, and identify paid users and their favorite genres in order to provide recommendations.
a) Consider the Netflix use case and use a simple data set. Describe the use case based on your assumptions, and report the data set, its fields, data types, etc.
b) Use HBase to implement a solution for the use case. Report at least 3 queries with their input and output. Each query’s relevance to solving the use case is important.
c) Use Cassandra to implement a solution for the use case. Report at least 3 queries with their input and output. Each query’s relevance to solving the use case is important.
d) Compare Cassandra and HBase for your use case. Present a table comparing the implementation of your use case in both NoSQL systems.
Cassandra : Create a table in Cassandra to store the data set as shown below.
HBase : In HBase as well, create a similar table to hold the Netflix users data.
Cassandra Queries :
- Insert data
- Find inactive users
- Find paid users
- Find trial end date of a new user
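The query statements themselves are not reproduced on this page. To show what the three lookups are meant to answer, here is an in-memory Python sketch. It is not CQL and not the author's schema: the `NetflixUser` fields, the 30-day trial length, and the 30-day inactivity threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

TRIAL_DAYS = 30  # assumed trial length, not taken from the data set


@dataclass
class NetflixUser:
    # Hypothetical fields based on the use case description.
    user_id: int
    name: str
    region: str
    plan: str  # "trial" or "paid"
    signup_date: date
    last_active: date
    favorite_genre: str


def inactive_users(users, today, max_idle_days=30):
    """Users whose last activity is older than the idle threshold."""
    cutoff = today - timedelta(days=max_idle_days)
    return [u for u in users if u.last_active < cutoff]


def paid_users(users):
    """Users on a paid plan."""
    return [u for u in users if u.plan == "paid"]


def trial_end_date(user):
    """Date on which a new user's trial ends, under the assumed trial length."""
    return user.signup_date + timedelta(days=TRIAL_DAYS)
```

In Cassandra, filters like these usually shape the table design, since rows are looked up by partition key and queries on other columns need an index or a dedicated table.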
HBase Queries :
- Find trial users
- Find users who watched a particular movie on Netflix
- Find the region and other personal details of a user
The input data set for both Cassandra and HBase can be found here (Netflix data)
Cassandra's key characteristics include high availability, minimal administration, and no single point of failure (SPoF). HBase, on the other hand, is good for fast reads and writes with linear scalability.