# Parsing Text Using Map-Reduce Programming Model

The evolution of big data systems builds on Map-Reduce, a foundational programming paradigm for large-scale data processing on a network of commodity hardware. This project illustrates an implementation of Map-Reduce and parallelizes the process.

### TABLE OF CONTENTS
* [Objective](#objective)
* [Technologies](#technologies)
* [Data](#data)
* [Map-Reduce](#map-reduce)
* [Implementation](#implementation)
* [Results](#results)
## OBJECTIVE

Process text and count the occurrences of each word using the Map-Reduce model, mimicking the Hadoop infrastructure with parallel processing. Multithreading is used to execute two mapper and reducer functions.

## TECHNOLOGIES

The project is created with:

* Python (multithreading)

## DATA

The data is available [here](https://github.com/skotak2/Pasrsing-Text-with-MapReduce-programming-Paradigm-with-multithreading/blob/master/Data/Data.txt).
## MAP-REDUCE

Consider the following text: "I am a human being. I am a Data Scientist"
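To make the example concrete, here is a minimal sequential sketch of the map, shuffle/sort and reduce phases applied to the sentence above. It assumes words are lower-cased and punctuation is stripped before counting; the function names `map_phase`, `shuffle_phase` and `reduce_phase` are illustrative and are not taken from the project's code.

```python
import re
from collections import defaultdict

text = "I am a human being. I am a Data Scientist"


def map_phase(line):
    # Map: emit a (word, 1) pair for every word (assumption: lower-cased, punctuation stripped)
    return [(word, 1) for word in re.findall(r"[a-z]+", line.lower())]


def shuffle_phase(pairs):
    # Shuffle/sort: group every emitted value under its key, in sorted key order
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: sum the grouped values for each key
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle_phase(map_phase(text)))
print(counts)
# {'a': 2, 'am': 2, 'being': 1, 'data': 1, 'human': 1, 'i': 2, 'scientist': 1}
```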
Here we use multithreading to parallelize the process: the Map-Reduce job is divided into subtasks that run in parallel, and their subtotals are aggregated into the final output. Threads carry out both the mapping of keys to values and the subsequent aggregation by the reducers.
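The subtotal-and-aggregate idea can be sketched with Python's standard `threading` module and `collections.Counter`. This is a hypothetical illustration, not the project's own code: `parallel_word_count` and `count_chunk` are made-up names, and splitting the input line by line with lower-cased whitespace tokenization is an assumption.

```python
import threading
from collections import Counter


def count_chunk(lines, results, index):
    # Each thread computes a subtotal for its own chunk of lines
    subtotal = Counter()
    for line in lines:
        subtotal.update(line.lower().split())
    # Each thread writes only to its own slot, so no lock is needed
    results[index] = subtotal


def parallel_word_count(lines, num_threads=2):
    # Deal the lines round-robin into one chunk per thread
    chunks = [lines[i::num_threads] for i in range(num_threads)]
    results = [None] * num_threads
    threads = [
        threading.Thread(target=count_chunk, args=(chunk, results, i))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Aggregate the subtotals into the final output
    total = Counter()
    for subtotal in results:
        total.update(subtotal)
    return total


lines = ["I am a human being", "I am a Data Scientist"]
print(parallel_word_count(lines))
```

Giving each thread its own result slot avoids sharing one mutable counter between threads. In standard CPython builds the global interpreter lock lets only one thread execute Python bytecode at a time, so for CPU-bound counting this shows the structure of the parallel job rather than a real speed-up; `multiprocessing` is the usual swap when throughput matters.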
## IMPLEMENTATION

With the above concept in place, we implement the setup in the following steps:
*Step 1*: Map the input to key-value pairs using multiple mappers.

*Step 2*: Sort the values and load them into the partition holder.

*Step 3*: Multiple reducers pick from the partitions and aggregate them.

These steps yield a list of outputs from the reducers, which can be concatenated and loaded into a DataFrame or a spreadsheet.
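The three steps can be put together in one small job. This sketch assumes two mapper and two reducer threads, lower-cased whitespace tokenization, and a CRC32-based partitioner standing in for the "partition holder"; `mapper`, `partition`, `reducer`, `run_threads` and `run_job` are hypothetical names, not the project's API.

```python
import threading
import zlib
from collections import defaultdict

NUM_MAPPERS = 2   # assumption: two mapper threads
NUM_REDUCERS = 2  # assumption: two reducer threads, one per partition


def run_threads(target, inputs):
    """Start one thread per input; each thread writes to its own result slot."""
    results = [None] * len(inputs)
    threads = [threading.Thread(target=target, args=(item, results, i))
               for i, item in enumerate(inputs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def mapper(lines, out, index):
    # Step 1: emit a (word, 1) pair for every word in this mapper's split
    out[index] = [(word, 1) for line in lines for word in line.lower().split()]


def partition(mapped, num_partitions):
    # Step 2: sort all pairs, then route each key to exactly one partition
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in sorted(pair for pairs in mapped for pair in pairs):
        partitions[zlib.crc32(key.encode()) % num_partitions][key].append(value)
    return partitions


def reducer(part, out, index):
    # Step 3: aggregate the values collected for each key in this partition
    out[index] = [(key, sum(values)) for key, values in part.items()]


def run_job(lines):
    splits = [lines[i::NUM_MAPPERS] for i in range(NUM_MAPPERS)]
    mapped = run_threads(mapper, splits)
    partitions = partition(mapped, NUM_REDUCERS)
    reduced = run_threads(reducer, partitions)
    # Concatenate the reducers' outputs into a single sorted list
    return sorted(pair for output in reduced for pair in output)


result = run_job(["I am a human being", "I am a Data Scientist"])
print(result)
# [('a', 2), ('am', 2), ('being', 1), ('data', 1), ('human', 1), ('i', 2), ('scientist', 1)]
```

Using `zlib.crc32` rather than the built-in `hash()` keeps the key-to-partition assignment stable between runs, since string hashing is randomized per process. The returned list of `(word, count)` tuples could then be loaded into a table, for example with `pandas.DataFrame(result, columns=["word", "count"])`.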
The code is available in `mapreduce.py`.
## RESULTS

The deployed model can be accessed from the URL on any system to translate Kannada sentences to English.