|
1 | 1 | ---
|
2 | 2 | layout: tutorial
|
3 | 3 | categories: tutorial
|
4 |
| -sections: ['Message Ordering', 'Selective Receive', 'Advanced Mailbox Processing'] |
| 4 | +sections: ['The Thing About Nodes', 'Message Ordering', 'Selective Receive', 'Advanced Mailbox Processing', 'Process Lifetime', 'Monitoring And Linking', 'Getting Process Info'] |
5 | 5 | title: Getting to know Processes
|
6 | 6 | ---
|
7 | 7 |
|
| 8 | +### The Thing About Nodes |
| 9 | + |
| 10 | +Before we can really get to know _processes_, we need to consider the role of |
| 11 | +the _Node Controller_ in Cloud Haskell. As per the [_semantics_][4], Cloud |
| 12 | +Haskell makes the role of _Node Controller_ (occasionally referred to by the original |
| 13 | +"Unified Semantics for Future Erlang" paper on which our semantics are modelled |
| 14 | +as the "ether") explicit. |
| 15 | + |
| 16 | +Architecturally, Cloud Haskell's _Node Controller_ consists of a pair of message |
| 17 | +bus processes, one of which listens for network-transport level events whilst the |
| 18 | +other is busy processing _signal events_ (most of which pertain to either message |
| 19 | +delivery or process lifecycle notification). Both of these _event loops_ run sequentially |
| 20 | +in the system at all times. |
| 21 | + |
| 22 | +With this in mind, let's consider Cloud Haskell's lightweight processes in a bit |
| 23 | +more detail... |
| 24 | + |
8 | 25 | ### Message Ordering
|
9 | 26 |
|
10 | 27 | We have already met the `send` primitive, which is used to deliver
|
@@ -86,6 +103,19 @@ subject was covered briefly in the first tutorial. Matching on messages allows
|
86 | 103 | us to separate the type(s) of messages we can handle from the type that the
|
87 | 104 | whole `receive` expression evaluates to.
|
88 | 105 |
|
| 106 | +Consider the following snippet: |
| 107 | + |
| 108 | +{% highlight haskell %} |
| 109 | +usingReceive = do  -- the pattern type annotations need ScopedTypeVariables |
| 110 | +  receiveWait [ |
| 111 | +      match (\(s :: String) -> say s) |
| 112 | +    , match (\(i :: Int) -> say $ show i) |
| 113 | +    ] |
| 114 | +{% endhighlight %} |
| 115 | + |
| 116 | +Note that each of the matches in the list must evaluate to the same type, |
| 117 | +as the type signature indicates: `receiveWait :: [Match b] -> Process b`. |
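
When the handlers naturally produce different types, one way to satisfy that constraint is to wrap the results in a common type. The following is a minimal sketch of that idea; the names `nextItem` and `describe` are hypothetical and are not part of distributed-process.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module ReceiveSketch where

import Control.Distributed.Process

-- | Hypothetical helper: block until a String or an Int arrives.
-- Both handlers return 'Either String Int', so the list has the
-- single type [Match (Either String Int)].
nextItem :: Process (Either String Int)
nextItem =
  receiveWait
    [ match (\(s :: String) -> return (Left s))
    , match (\(i :: Int)    -> return (Right i))
    ]

-- | Pure rendering of whichever message arrived (hypothetical).
describe :: Either String Int -> String
describe (Left s)  = "string: " ++ s
describe (Right i) = "int: " ++ show i
```

A caller could then write `nextItem >>= say . describe` inside any `Process` action.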
| 118 | + |
89 | 119 | The behaviour of `receiveWait` differs from `receiveTimeout` in that it
|
90 | 120 | blocks forever (until a match is found in the process' mailbox), whereas the
|
91 | 121 | variant taking a timeout will return `Nothing` unless a match is found within
|
@@ -183,7 +213,101 @@ distributed-process can be utilised to develop highly generic message processing
|
183 | 213 | All the richness of the distributed-process-platform APIs (such as `ManagedProcess`) which
|
184 | 214 | will be discussed in later tutorials are, in fact, built upon these families of primitives.
|
185 | 215 |
|
| 216 | +### Process Lifetime |
| 217 | + |
| 218 | +A process will continue executing until it has evaluated to some value, or is abruptly |
| 219 | +terminated either by crashing (with an un-handled exception) or being instructed to |
| 220 | +stop executing. Instructions to stop take one of two forms: a `ProcessExitException` |
| 221 | +or `ProcessKillException`. As the names suggest, these _signals_ are delivered in the form |
| 222 | +of asynchronous exceptions. However, you should not rely on that fact! After all, |
| 223 | +we cannot throw an exception to a thread that is executing in some other operating |
| 224 | +system process or on a remote host! Instead, you should use the [`exit`][5] and [`kill`][6] |
| 225 | +primitives from distributed-process, which not only ensure that remote target processes |
| 226 | +are handled seamlessly, but also maintain a guarantee that if you send a message and |
| 227 | +*then* an exit signal, the message will be delivered to the destination process (via its |
| 228 | +local node controller) before the exception is thrown - note that this does not guarantee |
| 229 | +that the destination process will have time to _do anything_ with the message before it |
| 230 | +is terminated. |
| 231 | + |
| 232 | +The `ProcessExitException` signal is sent from one process to another, indicating that the |
| 233 | +receiver is being asked to terminate. A process can choose to tell itself to exit, and since |
| 234 | +this is a useful way for processes to terminate _abnormally_, distributed-process provides |
| 235 | +the [`die`][7] primitive to simplify doing so. In fact, [`die`][7] has slightly different |
| 236 | +semantics from [`exit`][5], since the latter involves sending an internal signal to the |
| 237 | +local node controller. A direct consequence of this is that the _exit signal_ may not |
| 238 | +arrive immediately, since the _Node Controller_ could be busy processing other events. |
| 239 | +On the other hand, the [`die`][7] primitive throws a `ProcessExitException` directly |
| 240 | +in the calling thread, thus terminating it without delay. |
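
As a rough illustration of the difference, the sketch below pairs a send-then-`exit` sequence with a direct call to `die`. The function names `stopWithNotice` and `giveUp` and the reason strings are hypothetical; only `send`, `exit` and `die` come from distributed-process.

```haskell
module ExitSketch where

import Control.Distributed.Process

-- | Send a final message, then ask the target to exit. The exit signal
-- travels via the node controller, and the message is delivered to the
-- target's mailbox before the exception is thrown. The target may still
-- be terminated before it gets to process that message.
stopWithNotice :: ProcessId -> Process ()
stopWithNotice pid = do
  send pid "shutting you down"
  exit pid "requested by peer"

-- | Terminate the calling process straight away. 'die' throws the
-- ProcessExitException in the calling thread rather than routing a
-- signal through the node controller.
giveUp :: String -> Process a
giveUp why = die ("giving up: " ++ why)
```

Both functions need to run inside a `Process` on a local node; the node setup is omitted here.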
| 241 | + |
| 242 | +The `ProcessExitException` type holds a _reason_ field, which is serialised as a raw `Message`. |
| 243 | +This exception type is exported, so it is possible to catch these _exit signals_ and decide how |
| 244 | +to respond to them. Catching _exit signals_ is done via a set of primitives in |
| 245 | +distributed-process, and the use of them forms a key component of the various fault tolerance |
| 246 | +strategies provided by distributed-process-platform. For example, most of the utility |
| 247 | +code found in distributed-process-platform relies on processes terminating with a |
| 248 | +`ProcessKillException` or `ProcessExitException` where the _reason_ has the type |
| 249 | +`ExitReason` - processes which fail with other exception types are routinely converted to |
| 250 | +`ProcessExitException $ ExitOther reason {- reason :: String -}` automatically. This pattern |
| 251 | +is most prominently found in supervisors and supervised _managed processes_, which will be |
| 252 | +covered in subsequent tutorials. |
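
One of those primitives is `catchExit`. The sketch below shows a hypothetical `worker` that traps exit signals whose _reason_ is a `String`; it assumes the sender used something like `exit pid "some reason"`. If the reason has a different type, the handler is not applied and the exception propagates.

```haskell
module TrapExitSketch where

import Control.Distributed.Process

-- | Hypothetical worker: waits for one String message, but if an exit
-- signal with a String reason arrives first, logs it and returns
-- normally instead of crashing.
worker :: Process ()
worker =
  (expect >>= \msg -> say ("got: " ++ msg))
    `catchExit` \from reason ->
      say ("asked to exit by " ++ show from ++ ": " ++ reason)
```

Here the types of `msg` and `reason` are both inferred as `String` from their use with `++`.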
| 253 | + |
| 254 | +A `ProcessKillException` is intended to be an _untrappable_ exit signal, so its type is |
| 255 | +not exported and therefore you can __only__ handle it by catching all exceptions, which |
| 256 | +as we all know is very bad practice. The [`kill`][6] primitive is intended to be a |
| 257 | +_brutal_ means for terminating a process - e.g., it is used to terminate supervised child |
| 258 | +processes that haven't shut down on request, or to terminate processes that don't require |
| 259 | +any special cleanup code to run when exiting - although it does behave like [`exit`][5] |
| 260 | +in so much as it is dispatched (to the target process) via the _Node Controller_. |
| 261 | + |
| 262 | +### Monitoring and Linking |
| 263 | + |
| 264 | +Processes can be linked to other processes (or nodes or channels). A link, which is |
| 265 | +unidirectional, guarantees that once any object we have linked to *dies*, we will also |
| 266 | +be terminated. A simple way to test this is to spawn a child process, link to it and then |
| 267 | +terminate it, noting that we will subsequently die ourselves. Here's a simple example, |
| 268 | +in which we link to a child process and then cause it to terminate (by sending it a message |
| 269 | +of the type it is waiting for). Even though the child terminates "normally", our process |
| 270 | +is also terminated since `link` will _link the lifetime of two processes together_ regardless |
| 271 | +of exit reasons. |
| 272 | + |
| 273 | +{% highlight haskell %} |
| 274 | +demo = do |
| 275 | +  pid <- spawnLocal (expect :: Process ())  -- child exits normally once it receives () |
| 276 | +  link pid |
| 277 | +  send pid () |
| 278 | +  expect :: Process ()  -- we block here until the link terminates us |
| 279 | +{% endhighlight %} |
| 280 | + |
| 281 | +The medium that link failures use to signal exit conditions is the same as exit and kill |
| 282 | +signals - asynchronous exceptions. Once again, it is a bad idea to rely on this (not least |
| 283 | +because it might fail in some future release) and the exception type (`ProcessLinkException`) |
| 284 | +is not exported so as to prevent developers from abusing exception handling code in this |
| 285 | +special case. |
| 286 | + |
| 287 | +Whilst the built-in `link` primitive terminates the link-ee regardless of exit reason, |
| 288 | +distributed-process-platform provides an alternate function `linkOnFailure`, which only |
| 289 | +dispatches the `ProcessLinkException` if the link-ed process dies abnormally (i.e., with |
| 290 | +some `DiedReason` other than `DiedNormal`). |
| 291 | + |
| 292 | +Monitors, on the other hand, do not cause the *listening* process to exit at all, instead |
| 293 | +putting a `ProcessMonitorNotification` into the process' mailbox. This signal and its |
| 294 | +constituent fields can be introspected in order to decide what action (if any) the receiver |
| 295 | +can/should take in response to the monitored process' death. |
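
A minimal sketch of monitoring, assuming the same child process as in the linking example above: we monitor the child, let it finish, and then inspect the notification. The names `watchChild` and `describeDeath` are hypothetical.

```haskell
module MonitorSketch where

import Control.Distributed.Process

-- | Pure, hypothetical rendering of the reason carried by a
-- ProcessMonitorNotification.
describeDeath :: DiedReason -> String
describeDeath DiedNormal        = "exited normally"
describeDeath (DiedException e) = "crashed: " ++ e
describeDeath other             = "stopped: " ++ show other

-- | Spawn a child, monitor it, let it terminate, then report why it died.
-- Unlike 'link', the monitoring process keeps running afterwards.
watchChild :: Process ()
watchChild = do
  pid <- spawnLocal (expect :: Process ())
  ref <- monitor pid
  send pid ()  -- the child returns as soon as it receives ()
  receiveWait
    [ -- only accept the notification that belongs to our monitor
      matchIf (\(ProcessMonitorNotification ref' _ _) -> ref' == ref)
              (\(ProcessMonitorNotification _ who why) ->
                  say (show who ++ " " ++ describeDeath why))
    ]
```

Filtering on the `MonitorRef` with `matchIf` keeps notifications from unrelated monitors in the mailbox for other handlers.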
| 296 | + |
| 297 | +Linking and monitoring are foundational tools for *supervising* processes, where a top level |
| 298 | +process manages a set of children, starting, stopping and restarting them as necessary. |
| 299 | + |
| 300 | +### Getting Process Info |
| 301 | + |
| 302 | +The `getProcessInfo` function provides a means for us to obtain information about a running |
| 303 | +process. The `ProcessInfo` type it returns contains the local node id and a list of |
| 304 | +registered names, monitors and links for the process. The call returns `Nothing` if the |
| 305 | +process in question is not alive. |
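
For instance, a small reporting helper might look like the sketch below. The name `reportOn` is hypothetical, and the field names `infoNode`, `infoRegisteredNames` and `infoLinks` are assumed to be exported with `ProcessInfo` in the version of distributed-process you are using.

```haskell
module InfoSketch where

import Control.Distributed.Process

-- | Hypothetical helper: log what the node knows about a process,
-- or note that it is no longer alive.
reportOn :: ProcessId -> Process ()
reportOn pid = do
  info <- getProcessInfo pid
  case info of
    Nothing -> say (show pid ++ " is not alive")
    Just i  -> say $ show pid
                  ++ " on node " ++ show (infoNode i)
                  ++ ", names " ++ show (infoRegisteredNames i)
                  ++ ", links " ++ show (infoLinks i)
```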
| 306 | + |
186 | 307 | [1]: hackage.haskell.org/package/distributed-process/docs/Control-Distributed-Process.html#v:receiveWait
|
187 | 308 | [2]: hackage.haskell.org/package/distributed-process/docs/Control-Distributed-Process.html#v:expect
|
188 | 309 | [3]: http://hackage.haskell.org/package/distributed-process-0.4.2/docs/Control-Distributed-Process.html#v:match
|
189 | 310 | [4]: /static/semantics.pdf
|
| 311 | +[5]: http://hackage.haskell.org/package/distributed-process-0.4.2/docs/Control-Distributed-Process.html#v:exit |
| 312 | +[6]: http://hackage.haskell.org/package/distributed-process-0.4.2/docs/Control-Distributed-Process.html#v:kill |
| 313 | +[7]: http://hackage.haskell.org/package/distributed-process-0.4.2/docs/Control-Distributed-Process.html#v:die |