Add presentation text

lrsjohnson · lrsjohnson · commit 8ad9c9061e1a · 2014-05-11T19:41:03.000-04:00
diff --git a/presentation.txt b/presentation.txt
@@ -1,70 +1,159 @@
-------------------------
+------------------------------------------------
 
-Title:
-MapReduce in Scheme
+MapReduce System (MRS)
+Lars Johnson + Tej Kanwar
 
-------------------------
+[Diagram of Network]
 
-Objectives
+------------------------------------------------
 
-tl;dr Explore scheme / 6.945 ideas in a distribtued systems context.
+Project Objectives
 
-------------------------
+* Explore applying concepts from 6.945 to systems programming
+
+* Emphasize flexibility by creating a clean abstraction
+
+* Enable operations on data sets in a simple and fundamental way
+
+------------------------------------------------
+
+What is MapReduce?
+
+Google -- Jeffrey Dean and Sanjay Ghemawat
+
+------------------------------------------------
 
 What is MapReduce?
 
-------------------------
+Map: (k, v) => (k', v')+
+
+    (doc-id, doc-text) => (word, 1)...
+
+Reduce: (k', [list of v']) => (k'', v'')
+
+    (word, [1, 1, 1, ...]) => (word, total)
+
+------------------------------------------------
+
+Basic example using (mrs:run-computation)
+
+[word count scheme code example]
+[+ simple network]
+
+------------------------------------------------
+
+Extending MapReduce:
+
+Map:        (k,v) --> (k', v')
+Filter:     (k,v) --> #t/#f
+Reduce:     (k,[v]) --> (k', v')
+Aggregate:  [(k1,v1), (k2, v2), (k1, v3)] -->
+                     [(k1, [v1,v3]), (k2, [v2])]
+
+Common theme: Operate on data sets
+            [(k1, v1) (k2, v2) (k3, v3) ... ]
+
+------------------------------------------------
+
+Key Ideas:
+
+1. Build a graph of data sets connected by operations
+
+2. Feed data into data sets and it will be processed in a distributed
+manner across a worker pool
+
+3. Abstraction system to allow for streaming implementations
+
+4. Provide programmers with a combinator-like family of reusable parts
+
+------------------------------------------------
+
+Abstract Design
+
+* Data set: (list (k1 . v1) (k2 . v2)  (k3 . v3) ...)
+
+* Operations: Full multi-map function (DS in --> DS out)
+   - Combinator-like design, for higher-level operations
+   - Special case: Aggregate
+
+* Workers: Single instance of multi-map function
+   - (k,v) in --> zero or more (k', v') out
+
+------------------------------------------------
+
+Combinators!
+
+Reduce => [Aggregate + aggregated data set + Map]
+
+------------------------------------------------
 
-Types for map/reduce functions
+Design Implementation
 
-------------------------
+User Operations:
+  (mrs:map...)
+  (mrs:reduce...)
+  (mrs:create-data-set)
+  (mrs:run-computation)
 
-Simple examples of map/reduce paradigm
+Master:
+  create-distributor
+  create-data-sets
+  manage data sets...
 
-------------------------
+Distributor:
+  create-workers
+  distribute tasks
+  poll data-sets and workers
 
-[Have programming model, now discuss the actual system...]
+Worker:
+  multi-map function
+  
+------------------------------------------------
 
-MapReduce System:
-Master + Workers, etc.
+Design Implementation:
 
-------------------------
+* Data sets: multi-reader, multi-writer queues of elements
 
-How to implement in Scheme?
+* Operations: Distributor system with multi-map workers
+   - Generic operators for flexibility
 
-Need multitasking + interprocess communication
+* User control:
+   - Build data sets and operation bindings
+   - Feed inputs to data sets
+   - (run-computation thunk)
 
-------------------------
+------------------------------------------------
 
-Many options:
-- conspire threads
-- actors
-- multiple processes..
+Demonstration
 
-------------------------
+[Scheme + 2d/3d vector demo]
 
-Extensions / Failure
+[Map/Reduce Network]
 
-Failure
+------------------------------------------------
 
-------------------------
+"Done" Propagation
 
-Discuss interesting things we found while implementing?
+Problem:
+* Aggregator different from other operations
+* Branched case, looped case
 
-- What are our abstractions that can be swapped out?
+Current solution:
+* Restrict to DAGs
+* Top-down propagation
 
-- Demo?
+------------------------------------------------
 
-- Instrument / stats?
+Future Work:
 
-- Extensions?
+1) Correctly handle operation loops
+    * Currently only non-aggregator loops work
 
-------------------------
+2) Proper REPL environment for interactive data set operations
 
-User-facing methods / interface:
-interpreter?
+3) Additional implementation of generic workers, for multi-computer
+processing
 
-------------------------
+4) Failure handling
 
-Automatic inference / conversion to map/reduce paradigm?
---- that'd be really cool to at least do something with?
+------------------------------------------------
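
The operation signatures on the slides (Map, Filter, Aggregate, and Reduce built from Aggregate plus Map) can be sketched in a few lines. This is a minimal single-process illustration in Python, not the MRS Scheme code: every function name below (multi_map, filter_op, aggregate, reduce_op) is hypothetical, and there is no distributor, worker pool, or "done" propagation. It assumes a data set is simply a list of (key, value) pairs.

```python
from collections import defaultdict


def multi_map(fn, data_set):
    """Apply fn to each (k, v) pair; fn returns zero or more (k', v') pairs."""
    out = []
    for k, v in data_set:
        out.extend(fn(k, v))
    return out


def filter_op(pred, data_set):
    """Filter as a multi-map that emits the pair itself or nothing."""
    return multi_map(lambda k, v: [(k, v)] if pred(k, v) else [], data_set)


def aggregate(data_set):
    """Group values by key: [(k1,v1),(k2,v2),(k1,v3)] -> [(k1,[v1,v3]),(k2,[v2])]."""
    groups = defaultdict(list)
    for k, v in data_set:
        groups[k].append(v)
    # Dicts keep insertion order, so keys appear in first-seen order.
    return list(groups.items())


def reduce_op(fn, data_set):
    """Reduce expressed as Aggregate followed by a Map over (k, [v]) pairs."""
    return multi_map(lambda k, vs: [fn(k, vs)], aggregate(data_set))


# Word count: (doc-id, doc-text) => (word, 1)..., then (word, [1, 1, ...]) => (word, total)
docs = [("doc1", "a b a"), ("doc2", "b c")]
ones = multi_map(lambda doc_id, text: [(w, 1) for w in text.split()], docs)
counts = reduce_op(lambda word, vs: (word, sum(vs)), ones)
print(counts)
```

Writing Reduce as a composition keeps Aggregate as the only operation that needs to see the whole data set; everything else stays a per-element multi-map, which is the property the slides rely on for distributing work.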