|
1 |
| ------------------------- |
| 1 | +------------------------------------------------ |
2 | 2 |
|
3 |
| -Title: |
4 |
| -MapReduce in Scheme |
| 3 | +MapReduce System (MRS) |
| 4 | +Lars Johnson + Tej Kanwar |
5 | 5 |
|
6 |
| ------------------------- |
| 6 | +[Diagram of Network] |
7 | 7 |
|
8 |
| -Objectives |
| 8 | +------------------------------------------------ |
9 | 9 |
|
10 |
| -tl;dr Explore Scheme / 6.945 ideas in a distributed systems context. |
| 10 | +Project Objectives |
11 | 11 |
|
12 |
| ------------------------- |
| 12 | +* Explore applying concepts from 6.945 to systems programming |
| 13 | + |
| 14 | +* Emphasize flexibility by creating a clean abstraction |
| 15 | + |
| 16 | +* Enable operations on data sets in a simple and fundamental way |
| 17 | + |
| 18 | +------------------------------------------------ |
| 19 | + |
| 20 | +What is MapReduce? |
| 21 | + |
| 22 | +Google -- Jeffrey Dean and Sanjay Ghemawat |
| 23 | + |
| 24 | +------------------------------------------------ |
13 | 25 |
|
14 | 26 | What is MapReduce?
|
15 | 27 |
|
16 |
| ------------------------- |
| 28 | +Map: (k, v) => (k', v')+ |
| 29 | + |
| 30 | + (doc-id, doc-text) => (word, 1)... |
| 31 | + |
| 32 | +Reduce: (k', [list of v']) -> (k'', v'') |
| 33 | + |
| 34 | + (word, [1, 1, 1 …]) => (word, total) |
| 35 | + |
| 36 | +------------------------------------------------ |
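The Map and Reduce signatures above can be made concrete with a small word-count sketch. It is written in Python purely for illustration, not in Scheme, and it does not use the MRS API: `map_word_count`, `reduce_word_count`, and `run_map_reduce` are hypothetical names, and everything runs sequentially in one process.

```python
from collections import defaultdict


def map_word_count(doc_id, doc_text):
    """Map: (doc-id, doc-text) => zero or more (word, 1) pairs."""
    return [(word, 1) for word in doc_text.split()]


def reduce_word_count(word, counts):
    """Reduce: (word, [1, 1, 1, ...]) => (word, total)."""
    return (word, sum(counts))


def run_map_reduce(inputs, map_fn, reduce_fn):
    """Sequential stand-in for a MapReduce run (assumption: no distribution)."""
    grouped = defaultdict(list)
    for key, value in inputs:
        # Apply the map function and group its output values by output key.
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    # Apply the reduce function once per distinct key.
    return [reduce_fn(key, values) for key, values in grouped.items()]


docs = [("d1", "a b a"), ("d2", "b c")]
result = dict(run_map_reduce(docs, map_word_count, reduce_word_count))
print(result)
```

The grouping step in the middle is what the later slides call Aggregate; here it is folded into the driver for brevity.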
| 37 | + |
| 38 | +Basic example using (mrs:run-computation) |
| 39 | + |
| 40 | +[word count scheme code example] |
| 41 | +[+ simple network] |
| 42 | + |
| 43 | +------------------------------------------------ |
| 44 | + |
| 45 | +Extending MapReduce: |
| 46 | + |
| 47 | +Map: (k,v) --> (k', v') |
| 48 | +Filter: (k,v) --> #t/#f |
| 49 | +Reduce: (k,[v]) --> (k', v') |
| 50 | +Aggregate: [(k1,v1), (k2, v2), (k1, v3)] --> |
| 51 | + [(k1, [v1,v3]), (k2, [v2])] |
| 52 | + |
| 53 | +Common theme: Operate on data sets |
| 54 | + [(k1, v1) (k2, v2) (k3, v3) ... ] |
| 55 | + |
| 56 | +------------------------------------------------ |
| 57 | + |
| 58 | +Key Ideas: |
| 59 | + |
| 60 | +1. Build a graph of data sets connected by operations |
| 61 | + |
| 62 | +2. Feed data into data sets and it will be processed in a distributed |
| 63 | +manner across a worker pool |
| 64 | + |
| 65 | +3. Abstraction system to allow for streaming implementations |
| 66 | + |
| 67 | +4. Provide programmers with a combinator-like family of reusable parts |
| 68 | + |
| 69 | +------------------------------------------------ |
| 70 | + |
| 71 | +Abstract Design |
| 72 | + |
| 73 | +* Data set: (list (k1 . v1) (k2 . v2) (k3 . v3) ...) |
| 74 | + |
| 75 | +* Operations: Full multi-map function (DS in --> DS out) |
| 76 | + - Combinator-like design, for higher-level operations |
| 77 | + - Special case: Aggregate |
| 78 | + |
| 79 | +* Workers: Single instance of multi-map function |
| 80 | + - (k,v) in --> zero or more (k’, v’) out |
| 81 | + |
| 82 | +------------------------------------------------ |
| 83 | + |
| 84 | +Combinators! |
| 85 | + |
| 86 | +Reduce => [Aggregate + aggregated + Map] |
| 87 | + |
| 88 | +------------------------------------------------ |
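One way to read "Reduce => Aggregate + Map" is as function composition over data sets. The sketch below is in Python rather than Scheme and is not the MRS implementation; `make_map`, `make_filter`, `aggregate`, `compose`, and `make_reduce` are hypothetical names chosen for this example. Every operation takes a data set (a list of `(k, v)` pairs) and returns a new one, and Filter is treated as a multi-map that emits zero or one pair.

```python
def make_map(fn):
    """Lift a multi-map function (k, v) -> list of (k', v') to a data-set operation."""
    def operation(data_set):
        return [out for k, v in data_set for out in fn(k, v)]
    return operation


def make_filter(pred):
    """Filter as a special case of multi-map: emit the pair or nothing."""
    return make_map(lambda k, v: [(k, v)] if pred(k, v) else [])


def aggregate(data_set):
    """[(k1, v1), (k2, v2), (k1, v3)] -> [(k1, [v1, v3]), (k2, [v2])]."""
    grouped = {}
    for k, v in data_set:
        grouped.setdefault(k, []).append(v)
    return list(grouped.items())


def compose(*operations):
    """Chain data-set operations left to right."""
    def pipeline(data_set):
        for operation in operations:
            data_set = operation(data_set)
        return data_set
    return pipeline


def make_reduce(fn):
    """Reduce built from parts: aggregate, then map over (k, [v]) pairs."""
    return compose(aggregate, make_map(lambda k, vs: [fn(k, vs)]))


pairs = [("k1", 1), ("k2", 2), ("k1", 3)]
aggregated = aggregate(pairs)
totals = make_reduce(lambda k, vs: (k, sum(vs)))(pairs)
odd_only = make_filter(lambda k, v: v % 2 == 1)(pairs)
```

Aggregate is the only operation here that needs to see the whole data set before emitting anything, which is consistent with the outline treating it as a special case.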
17 | 89 |
|
18 |
| -Types for map/reduce functions |
| 90 | +Design Implementation |
19 | 91 |
|
20 |
| ------------------------- |
| 92 | +User Operations: |
| 93 | + (mrs:map...) |
| 94 | + (mrs:reduce...) |
| 95 | + (mrs:create-data-set) |
| 96 | + (mrs:run-computation) |
21 | 97 |
|
22 |
| -Simple examples of map/reduce paradigm |
| 98 | +Master: |
| 99 | + create-distributor |
| 100 | + create-data-sets |
| 101 | + manage data sets... |
23 | 102 |
|
24 |
| ------------------------- |
| 103 | +Distributor: |
| 104 | + create-workers |
| 105 | + distribute tasks |
| 106 | + poll data-sets and workers |
25 | 107 |
|
26 |
| -[Have programming model, now discuss the actual system...] |
| 108 | +Worker: |
| 109 | + multi-map function |
| 110 | + |
| 111 | +------------------------------------------------ |
27 | 112 |
|
28 |
| -MapReduce System: |
29 |
| -Master + Workers, etc. |
| 113 | +Design Implementation: |
30 | 114 |
|
31 |
| ------------------------- |
| 115 | +* Data sets: multi-reader, multi-writer queues of elements |
32 | 116 |
|
33 |
| -How to implement in Scheme? |
| 117 | +* Operations: Distributor system with multi-map workers |
| 118 | + - Generic operators for flexibility |
34 | 119 |
|
35 |
| -Need multitasking + interprocess communication |
| 120 | +* User control: |
| 121 | + - Build data sets and operation bindings |
| 122 | + - Feed inputs to data sets |
| 123 | + - (run-computation thunk) |
36 | 124 |
|
37 |
| ------------------------- |
| 125 | +------------------------------------------------ |
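The "multi-reader, multi-writer queues" design can be sketched with Python's standard `queue` and `threading` modules. This is an assumption-laden illustration, not the MRS code: the outline does not say how workers are scheduled, and the `DONE` sentinel, `worker`, and `run_operation` are hypothetical names.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-input marker, not part of MRS


def worker(in_q, out_q, multi_map):
    """One worker: read (k, v) pairs and write zero or more (k', v') pairs."""
    while True:
        item = in_q.get()
        if item is DONE:
            in_q.put(DONE)  # put the marker back so other readers also stop
            break
        k, v = item
        for out in multi_map(k, v):
            out_q.put(out)


def run_operation(in_q, out_q, multi_map, n_workers=3):
    """Run several workers over one input data set, then mark the output done."""
    threads = [
        threading.Thread(target=worker, args=(in_q, out_q, multi_map))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    out_q.put(DONE)


source, sink = queue.Queue(), queue.Queue()
for pair in [("a", 1), ("b", 2), ("c", 3)]:
    source.put(pair)
source.put(DONE)

run_operation(source, sink, lambda k, v: [(k, v * 10)])

results = []
while True:
    item = sink.get()
    if item is DONE:
        break
    results.append(item)
results.sort()  # worker interleaving is nondeterministic, so sort for display
```

Because several workers write to the same output queue, element order is not preserved; any operation that depends on order would need extra bookkeeping.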
38 | 126 |
|
39 |
| -Many options: |
40 |
| -- conspire threads |
41 |
| -- actors |
42 |
| -- multiple processes.. |
| 127 | +Demonstration |
43 | 128 |
|
44 |
| ------------------------- |
| 129 | +[Scheme + 2d/3d vector demo] |
45 | 130 |
|
46 |
| -Extensions / Failure |
| 131 | +[Map/Reduce Network] |
47 | 132 |
|
48 |
| -Failure |
| 133 | +------------------------------------------------ |
49 | 134 |
|
50 |
| ------------------------- |
| 135 | +"Done" Propagation |
51 | 136 |
|
52 |
| -Discuss interesting things we found while implementing? |
| 137 | +Problem: |
| 138 | +* Aggregator different from other operations |
| 139 | +* Branched case, looped case |
53 | 140 |
|
54 |
| -- What are our abstractions that can be swapped out? |
| 141 | +Current solution: |
| 142 | +* Restrict to DAGs |
| 143 | +* Top-down propagation |
55 | 144 |
|
56 |
| -- Demo? |
| 145 | +------------------------------------------------ |
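A minimal sketch of top-down "done" propagation on a DAG, assuming a data set counts as done once every data set feeding it is done. It is written in Python with the standard `graphlib` module (Python 3.9+); `propagate_done` and the example graph are hypothetical, and the real system would also have to wait for workers to drain in-flight elements.

```python
from graphlib import CycleError, TopologicalSorter


def propagate_done(upstream, sources_done):
    """Return {data_set: done?} by visiting data sets from sources downward.

    upstream maps each data set to the set of data sets that feed it.
    sources_done is the set of source data sets whose input is finished.
    """
    done = {}
    # static_order() yields every node after all of its upstream nodes.
    for node in TopologicalSorter(upstream).static_order():
        parents = upstream.get(node, set())
        if parents:
            done[node] = all(done[p] for p in parents)
        else:
            done[node] = node in sources_done
    return done


# Branched case: "words" feeds two branches that are merged again.
graph = {
    "words": {"docs"},
    "long": {"words"},
    "short": {"words"},
    "merged": {"long", "short", "extra"},
}
partial = propagate_done(graph, {"docs"})
complete = propagate_done(graph, {"docs", "extra"})

# Looped case: a cycle has no top-down order, so it is rejected outright.
try:
    propagate_done({"a": {"b"}, "b": {"a"}}, set())
    loop_rejected = False
except CycleError:
    loop_rejected = True
```

In this sketch the restriction to DAGs is what makes a single top-down pass sufficient; handling loops would need a different termination rule.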
57 | 146 |
|
58 |
| -- Instrument / stats? |
| 147 | +Future Work: |
59 | 148 |
|
60 |
| -- Extensions? |
| 149 | +1) Correctly handle operation loops |
| 150 | + * Currently non-aggregator loops work |
61 | 151 |
|
62 |
| ------------------------- |
| 152 | +2) Proper REPL environment for interactive data set operations |
63 | 153 |
|
64 |
| -User-facing methods / interface: |
65 |
| -interpreter? |
| 154 | +3) Additional implementation of generic workers, for multi-computer |
| 155 | +processing |
66 | 156 |
|
67 |
| ------------------------- |
| 157 | +4) Failure handling |
68 | 158 |
|
69 |
| -Automatic inference / conversion to map/reduce paradigm? |
70 |
| ---- that'd be really cool to at least do something with? |
| 159 | +------------------------------------------------ |