Skip to content

Commit 8ad9c90

Browse files
committed
Add presentation text
1 parent b23b25b commit 8ad9c90

File tree

1 file changed

+127
-38
lines changed

1 file changed

+127
-38
lines changed

presentation.txt

+127-38
Original file line numberDiff line numberDiff line change
@@ -1,70 +1,159 @@
1-
------------------------
1+
------------------------------------------------
22

3-
Title:
4-
MapReduce in Scheme
3+
MapReduce System (MRS)
4+
Lars Johnson + Tej Kanwar
55

6-
------------------------
6+
[Diagram of Network]
77

8-
Objectives
8+
------------------------------------------------
99

10-
tl;dr Explore scheme / 6.945 ideas in a distribtued systems context.
10+
Project Objectives
1111

12-
------------------------
12+
* Explore applying concepts from 6.945 to systems programming
13+
14+
* Emphasize flexibility by creating a clean abstraction
15+
16+
* Enable operations on data sets in a simple and fundamental way
17+
18+
------------------------------------------------
19+
20+
What is MapReduce?
21+
22+
Google -- Jeffrey Dean and Sanjay Ghemawat
23+
24+
------------------------------------------------
1325

1426
What is MapReduce?
1527

16-
------------------------
28+
Map: (k, v) => (k', v')+
29+
30+
(doc-id, doc-text) => (word, 1)...
31+
32+
Reduce: (k', [list of v']) -> (k'', v'')
33+
34+
(word, [1, 1, 1 …]) => (word, total)
35+
36+
------------------------------------------------
37+
38+
Basic example using (mrs:run-computation)
39+
40+
[word count scheme code example]
41+
[+ simple network]
42+
43+
------------------------------------------------
44+
45+
Extending MapReduce:
46+
47+
Map: (k,v) --> (k', v')
48+
Filter: (k,v) --> #t/#f
49+
Reduce: (k,[v]) --> (k', v')
50+
Aggregate: [(k1,v1), (k2, v2), (k1, v3)] -->
51+
[(k1, [v1,v3]), (k2, [v2])]
52+
53+
Common theme: Operate on data sets
54+
[(k1, v1) (k2, v2) (k3, v3) ... ]
55+
56+
------------------------------------------------
57+
58+
Key Ideas:
59+
60+
1. Build a graph of data sets connected by operations
61+
62+
2. Feed data into data sets and it will be processed in a distributed
63+
manner across a worker pool
64+
65+
3. Abstraction system to allow for streaming implementations
66+
67+
4. Provide programmers with a combinator-like family of reusable parts
68+
69+
------------------------------------------------
70+
71+
Abstract Design
72+
73+
* Data set: (list (k1 . v1) (k2 . v2) (k3 . v3) ...)
74+
75+
* Operations: Full multi-map function (DS in --> DS out)
76+
- Combinator-like design, for higher-level operations
77+
- Special case: Aggregate
78+
79+
* Workers: Single instance of multi-map function
80+
- (k,v) in --> zero or more (k’, v’) out
81+
82+
------------------------------------------------
83+
84+
Combinators!
85+
86+
Reduce => [Aggregate + aggregated + Map]
87+
88+
------------------------------------------------
1789

18-
Types for map/reduce functions
90+
Design Implementation
1991

20-
------------------------
92+
User Operations:
93+
(mrs:map...)
94+
(mrs:reduce...)
95+
(mrs:create-data-set)
96+
(mrs:run-computation)
2197

22-
Simple examples of map/reduce paradigm
98+
Master:
99+
create-distributor
100+
create-data-sets
101+
manage data sets...
23102

24-
------------------------
103+
Distributor:
104+
create-workers
105+
distribute tasks
106+
poll data-sets and workers
25107

26-
[Have programming model, now discuss the actual system...]
108+
Worker:
109+
multi-map function
110+
111+
------------------------------------------------
27112

28-
MapReduce System:
29-
Master + Workers, etc.
113+
Design Implementation:
30114

31-
------------------------
115+
* Data sets: multi-reader, multi-writer queues of elements
32116

33-
How to implement in Scheme?
117+
* Operations: Distributor system with multi-map workers
118+
- Generic operators for flexibility
34119

35-
Need multitasking + interprocess communication
120+
* User control:
121+
- Build data sets and operation bindings
122+
- Feed inputs to data sets
123+
- (run-computation thunk)
36124

37-
------------------------
125+
------------------------------------------------
38126

39-
Many options:
40-
- conspire threads
41-
- actors
42-
- multiple processes..
127+
Demonstration
43128

44-
------------------------
129+
[Scheme + 2d/3d vector demo]
45130

46-
Extensions / Failure
131+
[Map/Reduce Network]
47132

48-
Failure
133+
------------------------------------------------
49134

50-
------------------------
135+
"Done" Propagation
51136

52-
Discuss interesting things we found while implementing?
137+
Problem:
138+
* Aggregator different from other operations
139+
* Branched case, looped case
53140

54-
- What are our abstractions that can be swapped out?
141+
Current solution:
142+
* Restrict to DAGs
143+
* Top-down propagation
55144

56-
- Demo?
145+
------------------------------------------------
57146

58-
- Instrument / stats?
147+
Future Work:
59148

60-
- Extensions?
149+
1) Correctly handle operation loops
150+
* Currently non-aggregator loops work
61151

62-
------------------------
152+
2) Proper REPL environment for interactive data set operations
63153

64-
User-facing methods / interface:
65-
interpreter?
154+
3) Additional implementation of generic workers, for multi-computer
155+
processing
66156

67-
------------------------
157+
4) Failure handling
68158

69-
Automatic inference / conversion to map/reduce paradigm?
70-
--- that'd be really cool to at least do something with?
159+
------------------------------------------------

0 commit comments

Comments
 (0)