index.html

<!DOCTYPE HTML>
<html>
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <script src="http://yandex.st/highlightjs/6.0/highlight.min.js"></script>
    <link rel="stylesheet" href="http://yandex.st/highlightjs/6.0/styles/github.min.css">
    <link href='http://fonts.googleapis.com/css?family=Droid+Sans:400,700|Droid+Serif:400,400italic|Inconsolata' rel='stylesheet'>
    <script>hljs.initHighlightingOnLoad();</script>
    <title>Gevent Tutorial</title>
</head>
<style>
    body {
        font: 13px/20px "Droid Sans",Arial,"Helvetica Neue","Lucida Grande",sans-serif;

        color: #4C4C4C;
        background-color: #c9dbb2;
        margin-left: 0px;
        margin-top: 5px;
    }

    aside {
        position: fixed;
        float: left;
        width: 150px;

        background-color: #FAFBFC;
        border: 1px solid #FAFBFC;

        -webkit-border-top-right-radius: 10px;
        -webkit-border-bottom-right-radius: 10px;
        -moz-border-radius-topright: 10px;
        -moz-border-radius-bottomright: 10px;

        border-top-right-radius: 10px;
        border-bottom-right-radius: 10px;

        padding-left: 5px;
        height: 100%;
    }

    aside ul {
        margin: 0px;
        padding: 0px;
        list-style-type: none;
    }

    aside ul ul {
        font-size: 80%;
        padding-left: 5px;
    }

    #content {
        max-width: 960px;
        margin-left: 160px;
        padding: 10px;
        border: 1px solid #FAFBFC;

        background-color: #FAFBFC;

        -webkit-border-radius: 10px;
        -moz-border-radius: 10px;
        border-radius: 10px;
       
        -webkit-box-shadow: 1px 1px 1px 1px rgba(0, 0, 0, 0.5);
        -moz-box-shadow: 1px 1px 1px 1px rgba(0, 0, 0, 0.5);
        box-shadow: 1px 1px 1px 1px rgba(0, 0, 0, 0.5);
    }

    a {
        color: inherit;
    }

    pre {
        border: 1px solid #ccc;
        background-color: ghostWhite;
        font: 15px/19px Inconsolata,"Lucida Console",Terminal,"Courier New",Courier;

        -webkit-border-radius: 10px;
        -moz-border-radius: 10px;
        border-radius: 10px;
        padding: 5px;
    }

    p code {
        font-family: monospace;
        border-bottom: 1px dashed #ccc;

    }

    .label { 
        display: block;
        margin-bottom: 0px;
        padding: 0px;
        font-family: sans;
        font-weight: bold;
        width: 200px;
        padding: 0.5em;
    }

    .author {
        text-align: center;
    }

    .email {
        text-align: center;
        font-size: 10pt;
    }


    h1 {
        text-align: center;
        padding-bottom: 5px;
    }

    h2 {
        padding-bottom: 10px;
    }

    .green {
        color: #7c9a5e;
    }
</style>
<body>


<aside>
    <ul>
        <h3>Contents</h3>
        <li>
            <a href="#intro">
                Introduction
            </a>
        </li>
        <li>
            <a href="#parallel">
                Greenlets
            </a>
        </li>
        <li>
            <a href="#structures">
                Data Structures
            </a>

            <ul>
                <li>Events</li>
                <li>Queues</li>
                <li>Locks and Semaphores</li>
                <li>Actor Model</li>
                <li>Groups and Pools</li>
            </ul>

        </li>
        <li>
            <a href="#applications">
                Real World Applications
            </a>
            <ul>
                <li>Holding Side Effects</li>
                <li>WSGI Servers</li>
                <li>Long Polling</li>
                <li>Chat Server</li>
            </ul>
        </li>

        <h3>Other Resources</h3>

        <li>
            <a href="http://www.gevent.org/contents.html">Gevent Docs</a>
        </li>
    </ul>
</aside>

<div id="content">

<header>
    <h1><span class="green">gevent</span> For the Working Python Developer</h1>
    <h3 class="author">
        <a href="http://www.stephendiehl.com">Stephen Diehl</a>
    </h3>
</header>

<blockquote>
gevent is a concurrency library based around libev.  It provides a clean API for a variety of concurrency and network related tasks.
</blockquote>

<!-- ================================================= -->
<section id="intro">

<h1>Intro</h1>


<p>The structure of this tutorial assumes an intermediate level
knowledge of Python but not much else. No knowledge of
concurrency is expected. The goal is to give you
the tools you need to get going with gevent and use it to solve
or speed up your applications today.</p>

<p>
The primary pattern provided by gevent is the <strong>Greenlet</strong>, a
lightweight coroutine provided to Python as a C extension module.
Greenlets all run inside of the OS process for the main
program but are scheduled cooperatively by libevent. This differs from
<a href="http://docs.python.org/library/subprocess.html">subprocceses</a> 
which are new processes are spawned by the OS.
</p>

</section>
<!-- ================================================= -->

<!-- ================================================= -->
<section id="parallel">

<h2>Greenlets</h2>

<h3>Synchronous & Asynchronous Execution</h3>

The core idea of concurrency is that a larger task can be broken
down into a collection of subtasks whose operation does not
depend on the other tasks and thus can be run side-by-side (
<strong>asynchronously</strong> ) instead of one at a time (
<strong>synchronously</strong> ). For example:

<p>
A somewhat synthetic example defines a <code>task</code> function
which is <strong>non-deterministic</strong> 
(i.e. its output is not guaranteed to give the same result for
the same inputs). In this case the side effect of running the
function is that the task pauses its execution for a random
number of seconds.
</p>

<pre>
<code class="python">import gevent
import random

def task(pid):
    """
    Some non-deterministic task
    """
    gevent.sleep(random.randint(0,2))
    print 'Task', pid, 'done'

def synchronous():
    for i in range(1,10):
        task(i)

def asynchronous():
    threads = []
    for i in range(1,10):
        threads.append(gevent.spawn(task, i))
    gevent.joinall(threads)

print 'Synchronous:'
synchronous()

print 'Asynchronous:'
asynchronous()
</code>
</pre>

<p>
In the synchronous case all the tasks are run sequentially,
which results in the main programming <strong>blocking</strong> (
i.e. pausing the execution of the main program )
while each task executes.
</p>

The important parts of the program are the
<code>gevent.spawn</code> which wraps up the given function
inside of a Greenlet thread. The list of initialized greenlets 
are stored in the array <code>threads</code> which is passed to
the <code>gevent.joinall</code> function which blocks the current
program to run all the given greenlets. The execution will step
forward only when all the greenlets terminate.

<p>The output is:</p>

<pre>
<code>
Synchronous:
Task 1 done
Task 2 done
Task 3 done
Task 4 done
Task 5 done
Task 6 done
Task 7 done
Task 8 done
Task 9 done
Task 10 done
Asynchronous:
Task 2 done
Task 3 done
Task 5 done
Task 10 done
Task 8 done
Task 6 done
Task 9 done
Task 1 done
Task 4 done
Task 7 done
</code>
</pre>

<p>
The important fact to notice is that the order of execution in
the async case is essentially random and that the total execution
time in the async case is much less than the sync case. In fact
the maximum time for the synchronous case to complete is when
each tasks pauses for 2 seconds resulting in a 20 seconds for the
whole queue. In the async case the maximum runtime is roughly 2
seconds since none of the tasks block the execution of the
others.
</p>

<p>A more common use case, fetching data from a server
asynchronously, the runtime of <code>fetch()</code> will differ between
requests given the load on the remote server.</p>

<pre><code class="python">import gevent
import urllib2
import simplejson as json

def fetch(pid):
    response = urllib2.urlopen('http://json-time.appspot.com/time.json')
    result = response.read()
    json_result = json.loads(result)
    datetime = json_result['datetime']

    print 'Process ', pid, datetime
    return json_result['datetime']

def synchronous():
    for i in range(1,10):
        fetch(i)

def asynchronous():
    threads = []
    for i in range(1,10):
        threads.append(gevent.spawn(fetch, i))
    gevent.joinall(threads)

print 'Synchronous:'
synchronous()

print 'Asynchronous:'
asynchronous()
</code>
</pre>

<h3>Race Conditions</h3>

<p>The perennial problem involved with concurrency is known as a
<b>race condition</b>. Simply put is when two concurrent threads
/ processes depend on some shared resource but also attempt to
modify this value. This results in resources whose values become
time-dependent on the execution order. This is a problem, and in
general one should very much try to avoid race conditions since
they result program behavior which is globally non-deterministic.</p>

<p>One approach to avoiding race conditions is to simply not
have any global <b>state</b> shared between threads. To
communicate threads instead pass stateless messages between each 
other.</p>

</section>

<!-- ================================================= -->
<section>
<h3>Spawning threads</h3>

gevent provides a few wrappers around Greenlet initialization.
Some of the most common patterns are:

<pre>
<code class="python">import gevent
from gevent import Greenlet

def foo(message, n):
    """
    Each thread will be passed the message, and n arguments
    in its initialization.
    """
    print message
    gevent.sleep(n)

# Initialize a new Greenlet instance running the named function
# foo
thread1 = Greenlet.spawn(foo, "Hello", 1)
thread1.start()

# Wrapper for creating and runing a new Greenlet from the named 
# function foo, with the passd arguments
thread2 = gevent.spawn(foo, "I live!", 2)

# Lambda expressions
thread3 = gevent.spawn(lambda x: (x+1), 2)

threads = [thread1, thread2, thread3]

# Block until all threads complete.
gevent.joinall(threads)
</code>
</pre>

<p>
In addition to using the base Greenlet class, you may also subclass
Greenlet class and overload the <code>_run</code> method.
</p>

<pre>
<code class="python">from gevent import Greenlet

class MyGreenlet(Greenlet):

    def __init__(self, message, n):
        Greenlet.__init__(self)
        self.message = message
        self.n = n

    def _run(self):
        print self.message
        gevent.sleep(self.n)

g = MyGreenlet("Hi there!", 3)
g.start()
g.join()
</code>
</pre>

</section>

<!-- ================================================= -->

<section>
<h3>Greenlet State</h3>

<p>Like any other segement of code Greenlets can fail in various
ways. A greenlet may fail throw an exception, fail to halt or
consume too many system resources.</p>

<p>The internal state of a greenlet is generally a time-dependent
parameter. There are a number of flags on greenlets which let
you monitor the state of the thread</p>

<ul>
    <li><code>started</code> -- Boolean, indicates whether the Greenlet has been started. </li>
    <li><code>ready()</code> -- Boolean, indicates whether the Greenlet has halted</li>
    <li><code>successful()</code> -- Boolean, indicates whether the Greenlet has halted and not thrown an exception</li>
    <li><code>value</code> -- arbitrary, the value returned by the Greenlet</li>
    <li><code>exception</code> -- exception, uncaught exception instance thrown inside the greenlet</li>
</ul>

<pre>
<code class="python">import gevent

def win():
    return 'You win!'

def fail():
    raise Exception('You fail at failing.')

winner = gevent.spawn(win)
loser = gevent.spawn(fail)

print winner.started # True
print loser.started  # True

# Exceptions raised in the Greenlet, stay inside the Greenlet.
try:
    gevent.joinall([winner, loser])
except Exception as e:
    print 'This will never be reached'

print winner.value # 'You win!'
print loser.value  # None

print winner.ready() # True
print loser.ready()  # True

print winner.successful() # True
print loser.successful()  # False

# The exception raised in fail, will not propogate outside the
# greenlet. A stack trace will be printed to stdout but it
# will not unwind the stack of the parent.

print loser.exception

# It is possible though to raise the exception again outside
raise loser.exception
# or with
loser.get()
</code>
</pre>

<p>
Greenlets that fail to yield when the main program receives a
SIGQUIT may hold the program's execution longer than expected.
This results in so called "zombie processes" which need to be
killed from outside of the Python interpreter.
</p>

A common pattern is to listen SIGQUIT events on the main program
and to invoke <code>gevent.shutdown</code> before exit.

<pre>
<code class="python">import gevent

def run_forever():
    gevent.sleep(1000)

def main():
    thread = gevent.spawn(run_forever)

if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        gevent.shutdown()
</code>
</pre>

</section>

<!-- ================================================= -->

<section>
<h3>Timeouts</h3>

Timeouts are a constraint on the runtime of a block of code or a
Greenlet.

<pre>
<code class="python">
from gevent import Timeout

timeout = Timeout(seconds)
timeout.start()

def wait():
    gevent.sleep(10)

try:
    gevent.spawn(wait).join()
except Timeout:
    print 'Could not complete'

</code>
</pre>

Or with a context manager in a <code>with</code> a statement.

<pre>
<code class="python">import gevent
from gevent import Timeout

time_to_wait = 5 # seconds

class TooLong(Exception):
    pass

with Timeout(time_to_wait, TooLong):
    gevent.sleep(10)
</code>
</pre>

<!-- ================================================= -->

In addition, gevent also provides timeout arguments for a
variety of Greenlet and data stucture related calls. For example:

<pre>
<code class="python">import gevent
from gevent import Timeout

def wait():
    gevent.sleep(2)

timer = Timeout(1).start()

thread1 = gevent.spawn(wait)
thread1.join(timeout=timer)

# --

timer = Timeout.start_new(1)

thread2 = gevent.spawn(wait)
thread2.get(timeout=timer)

# --

gevent.with_timeout(1, wait)
</code>
</pre>

</section>

<!-- ================================================= -->

<section id="structures">
<h2>Data Structures</h2>

<h3>Events</h3>

Events are a form of asynchronous communication between
Greenlets.

<!--
<pre>
<code class="python">
</code>
</pre>
-->

</section>

<!-- ================================================= -->

<section>
<h3>Queues</h3>

<p>Queues are ordered sets of data that have the usual <code>put</code> / <code>get</code>
operations but are written in a way such that they can be safely
manipulated across Greenlets.</p>

<p>
For example if one Greenlet grabs an item off of the queue, the
same item will not grabbed by another Greenlet executing
simultaneously.
</p>

<pre>
<code class="python">import gevent
from gevent.queue import Queue

tasks = Queue()

def worker(n):
    while not tasks.empty():
        task = tasks.get()
        print 'Worker %s got task %s' % (n, task)
        gevent.sleep(0.5)

    print 'Quitting time!'

def boss():
    for i in xrange(1,25):
        tasks.put_nowait(i)

gevent.spawn(boss).join()

gevent.joinall([
    gevent.spawn(worker, 'steve'),
    gevent.spawn(worker, 'john'),
    gevent.spawn(worker, 'nancy'),
])
</code>
</pre>

<p>
Queues can also block on either <code>put</code> or <code>get</code> as the need arises. 

Each of the <code>put</code> and <code>get</code> operations has a non-blocking
counterpart, <code>put_nowait</code> and 
<code>get_nowait</code> which will not block, but instead raise
either <code>gevent.queue.Empty</code> or
<code>gevent.queue.Full</code> in the operation is not possible.
</p>

<p>
In this example we have the boss running simultaneously to the
workers and have a restriction on the Queue that it can contain no
more than three elements. This restriction means that the <code>put</code>
operation will block until there is space on the queue.
Conversely the <code>get</code> operation will block if there are
no elements on the queue to fetch, it also takes a timeout
argument to allow for the queue to exit with the exception
<code>gevent.queue.Empty</code> if no work can found within the
time frame of the Timeout.
</p>

<pre>
<code class="python">import gevent
from gevent.queue import Queue, Empty

tasks = Queue(maxsize=3)

def worker(n):
    try:
        while True:
            task = tasks.get(timeout=1) # decrements queue size by 1
            print 'Worker %s got task %s' % (n, task)
            gevent.sleep(0.5)
    except Empty:
        print 'Quitting time!'

def boss():
    """
    Boss will wait to hand out work until a individual worker is
    free since the maxsize of the task queue is 3.
    """

    for i in xrange(1,10):
        tasks.put(i)
    print 'Assigned all work in iteration 1'

    for i in xrange(10,20):
        tasks.put(i)
    print 'Assigned all work in iteration 2'

gevent.joinall([
    gevent.spawn(boss),
    gevent.spawn(worker, 'steve'),
    gevent.spawn(worker, 'john'),
    gevent.spawn(worker, 'bob'),
])
</code>
</pre>

</section>

<!-- ================================================= -->

<section>
<h3>Locks and Semaphores</h3>

<code class="python">
</code>

</section>

<!-- ================================================= -->

<section>
<h3>Groups and Pools</h3>

<code class="python">
</code>

</section>

<!-- ================================================= -->

<section>
<h3>Actor Model</h3>

<p>
The actor model is a higher level concurrency model popularized
by the language Erlang. In short the main idea is that you have a
collection of independent Actors which have an inbox from which
they receive messages from other Actors. The main loop inside the
Actor iterates through its messages and takes action according to
its desired behavior. 
</p>

<p>
Gevent does not have a primitive Actor type, but we can define
one very simply using a Queue inside of a subclassed Greenlet.
</p>

<pre>
<code class="python">
class Actor(gevent.Greenlet):

    def __init__(self):
        self.inbox = queue.Queue()
        Greenlet.__init__(self)

    def recieve(self, message):
        """
        Define in your subclass.
        """
        raise NotImplemented()

    def _run(self):
        self.running = True

        while self.running:
            message = self.inbox.get()
            self.recieve(message)

</code>
</pre>

In a use case:

<pre>
<code class="python">
class Echo(Actor):
    def recieve(self, message):
        print message

class Speaker():
    def recieve(self, message):
        if message == 'start':
            for i in xrange(1,5):
                echo.inbox.put('Hey there!')

echo = Echo()
speak = Speaker()

echo.start()
speak.start()

speak.inbox.put('start')
gevent.joinall([echo, speak])
</code>
</pre>

</section>

<!-- ================================================= -->

<h2>Real World Applications</h2>

<section id="applications">

<h3>Holding Side Effects</h3>

In this example we hold the side effects of executing an
arbitrary string, 

<pre>
<code class="python">from gevent import Greenlet

env = {}

def run_code(code, env={}):
    local = locals()
    local.update(env)
    exec(code, globals(), local)
    return local

while True:
    code = raw_input('>')

    g = Greenlet.spawn(run_code, code, env)
    g.join() # block until code executes

    # If succesfull then pass the locals to the next command
    if g.value:
        env = g.get()
    else:
        print g.exception
</code>
</pre> 

</section>


<section>
<h3>WSGI Servers</h3>

<pre>
<code class="python">from gevent.pywsgi import WSGIServer

def application(environ, start_response):
    status = '200 OK'
    body = 'Hello Cruel World!'

    headers = [
        ('Content-Type', 'text/html')
    ]

    start_response(status, headers)
    return [body]

WSGIServer(('', 8000), application).serve_forever()
</code>
</pre> 

</section>

<section>
<h3>Long Polling</h3>
</section>

<section>
<h3>Chat Server</h3>
</section>

</section>

<h3>License</h3>

<p>
This is a collaborative document published under MIT license. Forking 
on <a href="https://github.com/sdiehl/gevent-tutorial">GitHub</a> is encouraged
</p>

</div>

</body>
</html>