Reasons for modules #569

masak · 2021-10-06T12:41:58Z

Issue #302, that enormous discussion that hovers around fexprs and Kernel but sometimes veers out into the unknown and asks really deep and important questions about computation itself (and how it's managed to manifest itself in our timeline's history of programming languages), contains the following paragraph (by me, replying to @raiph):

This week, as I've been pondering writing this down, I've been struck by something else, which I'm at even more of a disadvantage to express clearly: namely, how practically every major innovation or focal point in computer science manages to place itself on a spectrum between lambda and process: functions, co-routines, libraries, modules, tasks, threads, OS processes, cores, processors, network nodes, services... all of them "proto-actors" in the sense that they stake out some things that are separate, and others that are shared. Actors go the whole hog, jumping to the obvious end point where everything's separate. Only asynchronous messaging remains. The proto-actors are of course still interesting to study on their own, but maybe partly because they help tease out different aspects of actor separation.

I'm on vacation, so I thought I would tackle this "tower" of abstractions, describing them one by one. I will try to do all of them justice, extracting both their angelic "essence" or perfect form of each feature — what remains, in a sense, when you don't have to be all that concerned with the inconveniences of actual implementation — and also their demonic "substance", the imperfect form of each feature — what inevitably happens when you try to trap it in a machine.

(As a concrete example: the angelic essence of functions is something like the isolation guarantees and re-use that they give you, whereas the demonic substance is what leads to stack overflows or references outliving their stack-allocated data.)

As much as possible, I will also refer to real actual languages out there, especially when they make for pure exemplars of the feature.

I will try to get through all this in a single evening, aiming for coherence of thought rather than comprehensiveness. I'm not 100% sure what I will get out of this, except that I'm hoping to have a better overview afterwards of these different rungs on the abstraction ladder:

Functions
Coroutines
First-class functions
Continuations
Modules
Types
Packages
Containers
Components
Communicating Sequential Processes
Actors

There is still a clear connection with Alma: during the design of Alma I've sometimes encountered parts of the challenge of introducing these things into a language. (Maybe most keenly felt with modules and types, but also to some extent with coroutines.) I'm not saying that this issue will provide any insights that I wish I'd had earlier, but... it's something to aim towards, I guess. Or if not insights, then at least a much-needed vantage point.

This issue is called "reasons for modules", which is not a particularly good name. But it's how I've been thinking about it — modules sit somewhere towards the middle of the ladder of abstractions, and the word "module" means different things to different people, but they are the prototypical thing which helps bring structure and separation to a program. In some vague sense, all the features on the abstraction ladder are modules, of one kind or another.

masak · 2021-10-06T13:59:23Z

Functions

I've written "Functions (units of functionality)" in my notes. I'll have "units of <noun>" written down for each of the rungs on the abstraction ladder, but I realize that those easily become just empty words, so let me try to back it up: you put code inside a function, and that code forms a unit.

In the early days of assembly code, the whole code section was a single thing, and there was no strong syntactic boundary separating a function from its outside. It all ran on convention. When you went through the motions of jumping back to what called you, that's when the function ended. It was an easy thing to jump into the function at different starting points — somewhat weakening (what we now think about as) the integrity of the function boundary.

I recall vividly that in Knuth's TAoCP books, the MIX machine uses as its calling convention a kind of self-modifying code which sets the return address of the caller on the jump instruction at the end of the function. (Yes, really!) MIX is and old standard, and that particular design decision didn't stand the test of time — indeed, the newer MMIX architecture uses a more traditional stack-based approach. More on this under "First-class functions", below.

Functions provide a rudimentary separation of concerns. Many of the programs in "BASIC Computer Games" just ran on and did what they did with GOTOs. Maybe there was the occasional GOSUB and RETURN; I'll need to go back and check. If there was, it was still not fundamentally more "bundled up" into individual functional units than assembly code.

In the paper The Use of Sub-Routines in Programmes, David Wheeler begins with this sentence: "A sub-routine may perhaps be described as a self-contained part of a programme, which is capable of being used in different programmes." Already from the start, the motivation was a kind of modularity.

The separation of concerns that functions provide takes many forms, each helped by the pure idea of a function:

Isolation. What happens in Vegas stays in Vegas. The added idea of lexical scope helps splendidly with this.
Re-use. If two parts of a program (or larger system) both want to call the same function, they can do so. The cost of more and more callers of a function scales linearly with the number of callers, and scales more slowly than (say) re-writing the function by hand at each call site. (This is true whether we refer to program size or programmer time.)
Coupling/cohesion. By encouraging re-use, functions can help decrease coupling. That's very abstract when put like that; sorry. I guess the coupling they remove is the kind that would have appeared in the form of duplicated code... Heaven knows different forms of coupling and other problems are still possible, though. (Wikipedia lists a large flora of couplings.) Similarly, functions encourage but do not guarantee cohesion. There's a rabid "clean code" posse out there that insist methods should be small, sometimes indefensibly so. I'm pretty sure what they're after is this kind of "do one thing" ideal, poorly expressed.
By existing, a function establishes a consumer/provider relation. The ideal form of a function takes parameters as input and gives a return value as output. (We'll stay silent on whether a function is "allowed" to perform side effects, by which I mean we'll skirt the issue and implicitly allow it. Part of the reason is that the story about side effects is still very much being written/explored, by languages like Rust and Koka.)
Building on that last point, functions are also what opens the door to Design by Contract, and specifications. Whether these are specified formally and machine-checked, or informally with documentation and targeting the human reader, the main idea is that the function is no longer just its code, but also a set of assumptions and guarantees. This idea will come back later, several times.

Why does lexical scoping (also known as "static scoping") fit so well with functions? From the angelic perspective, it's because this scoping discipline provides a form of stability, tying variable scopes to "blocks" in the code — that is, in the written program text. It doesn't get much more clearer than that: your variable lives until the } of the block in which it was declared. From the demonic perspective, it doesn't fit particularly well, and is not the first scoping discipline that tends to come to mind — dynamic scoping is.

Or rather, dynamic scoping is the natural thing for an interpreter to adopt. The interpreter follows the control flow of the program itself; if it carries around a shallow ("one level deep") set of name bindings, then dynamic scoping is pretty much inevitable. Lexical scoping is the perspective taken by the compiler, which follows the program text rather than the control flow. It took from the late 50s to the 80s for the Lisp world to make the transition from one to the other; indeed, they started with interpreters and only gradually became keen on compilation.

Functions provide a guarantee which will be cast in a sharper light when it is broken later. It doesn't have a name in the literature that I know of, so let's call it the dynamic embedding guarantee. It has two parts:

The entire machine has a single thread of control.
When a function is called, the single thread of control is transferred/borrowed from caller to callee. When the function returns, it is handed back.

Think of it as a "spark" running around executing your program. (This is the entity that debugging/stepping through code makes manifest.) Calling a function means transferring that spark. Because the dynamic extent of the function's execution (the interval on the timeline when the function was active) is strictly contained in that of the caller's function's execution, the whole program run forms a perfectly nested structure, a tree of calls.

It is at this end of the spectrum that functions are perfectly oblivious, not "aware" at all of the process that's running them. Looking ahead towards the other end of the spectrum: actors are the total opposite — not only are they aware of the process running them, they've taken it over to the extent that they are the process that's running them.

Function calls compose well. This is why we can talk of call trees, because a function can take on the role both as a callee and as a caller. I'll save the discussion about re-entrant functions till the "First-class functions" section. Same goes for when functions compose in two other ways. Here I'll just note that it's interesting that the angelic nature of functions demands a lot of them, and functions, it seems, are up to the task.

There is a slight asymmetry between parameters and return values: in almost all languages, you pass 0 or more of the former, but receive exactly one of the latter. The introduction of tuples means there's a way to reconcile this apparent conflict — if what you're passing in is always a tuple, it can be both a single thing (to the implementation) and 0 or more things (to the user). It's a really nice idea, but I can't recall seeing it implemented perfectly in any language in practice. The same trick allows a language to emulate multiple return values — for some reason, I see this implemented well in many languages (C++, Perl, Raku, Bel).

masak · 2021-10-06T15:07:19Z

Coroutines

Known under many names: fibers, green threads. (The term "green threads" comes from the fact that some part of Java implementation was called "Green" at some point.)

I've written "units of cooperative multitasking" in my notes. The operative word is "cooperative" — if one agent refuses to pass on the baton, the multitasking doesn't work.

Recall the dynamic embedding guarantee. We now weaken this in the smallest way possible: there is still only a single thread of control, but there is now a way to transfer it between active functions. ("Active" here simply means "currently being called", that is, we're "in" that function's body; there's an activation frame for it.)

This breaks the LIFO assumption of call completion: functions now get the ability to "pause" execution and hand the baton to some other active function.

In a sense, it also breaks the assumption of "single thread of control" — but it depends what you mean, actually. It does break it if you mean "we have one and only one call stack". It doesn't break it if you mean "at any given moment in time, exactly one function is executing". (Real threads break that, but not green threads.)

Interestingly, this affords a greater separation of some concerns, even though it surely feels as if we're making things more mixed-up. The reasons for this I cannot verbalize well, but it has something to do with separating concerns along producer/consumer lines.

Coroutines enable stream processing. A producer can generate data in individual sequence items, and the consumer can accept those sooner than if it had to wait for the whole sequence to be computed. The total "delay" involved is the same if we don't have real threads, but we might see partial results sooner.

Unix pipelines are an excellent example of this. (Except that they can sometimes use more than cooperative multitasking.)

Russ Cox has a good article on the original reason for CSP (communicating sequential processes) being separation of concerns. That is, just as functions help us separate some concerns, coroutines help us separate others.

A demonic implementation concern that gets mentioned here is a "spaghetti stack" (or "cactus stack"). This is just the idea that the stack is no longer a singly linked list of call frames, but a tree of them. The leaves of this tree correspond to the active-or-paused functions.

Since this is our first encounter with suspending the execution of a function, we also need to bring forth what (demonically) needs to be saved for the function to be able to resume later:

The "instruction pointer" (which statement or part of expression we should continue on later)
Local state (local variables if you're the human reader, local registers if you're the machine)

This is enough as far as "information that needs to be saved" goes, but there's one additional detail that matters from an angelic perspective: assume that the keyword for "cede control to caller" is yield. What would be an appropriate value of a yield expression? The question at first appears not to make sense — we've just ceded control. But at some point, it will be handed back to us. And at that point, it might be handed back with a value, much as a function is called with an argument.

Python implements exactly this model. I think it was Erik Meijer who said that the bidirectional aspect of yield is what makes it especially difficult to give Python coroutines a sensible type.

Coroutines anticipate CSP (Communicating Sequential Processes) — more on which later — in the sense that you can implement the latter with the former. Or maybe better said that coroutines look like a line or a singly linked list (like Unix pipelines), but CSP looks more like a directed graph.

masak · 2021-10-06T16:05:42Z

First-class functions

Taking a step in an orthogonal direction: given how absolutely awesome functions are, is there any way they could be made more awesome?

According to Christopher Strachey, yes: we can make functions first-class, that is, reified values in the same sense booleans and numbers are values.

"First-class" is not a complicated concept. It means we're allowed to do these things:

Pass as a parameter
Store as a variable
Return as a return value

For some reason, Computer Science rewards those who make concepts first-class citizens. My best guess as to why: Denotational Semantics already talks about "valuations" of program fragments: turning a bit of syntax into an actual value. By making those values first-class, we are allowing the program to hold and refer to those valuations directly. This tends to bring more direct power into the language. I've written elsewhere about the drawbacks of making things first-class.

The Lisp family of languagues ended up being the ones who struggled with these questions. With the comfort of hindsight, it was simply a transition from dynamic scoping to lexical scoping... but in the coincidences of actual history, they were called "downward funargs" (function values passed as arguments) and "upwards funargs" (function values returned as return values). (John Shutt remarks on the fact that "funargs" implies that the "arguments"/downwards case somehow takes priority, which is indeed how the Lisp world thought of it.)

Take the "downwards funarg" case first. This is what happens when you give an anonymous function to map. Lexical scoping and dynamic scoping actually agree on what should happen here, which is what makes this the easier case: variables accessed that were declared in the outer function should still be accessible normally. Lambda calculus and the semantics of binders makes this a non-issue; it is the demonic aspects of making this work on a machine that causes any trouble at all.

Now then, "upwards funargs". These were, historically, the real problem. Lexical scoping says it should just work: the function value returned should still have access to the outer function's variables. But from the perspective of dynamic scoping, and from the demonic perspective of "returning from a function means de-allocating its activation frame", upwards funargs are a real problem.

Upwards funargs "escape" from the function that defined them. This is the first push from the world of stack allocation to the world of heap allocation.

The historical answer to all this is closures: the joining of a function with the (lexical) environment in which it was created. The environment is what gives the closure access to the necessary outer variables.

In the history of Computer Science, there is no greater story than that of FP and OOP setting off in two different directions and yet reaching the same goal. From the point of view of closures, the story is how closures accidentally take on all the power of objects. The koan about Qc Na recounts this. From a demonic perspective, the act of returning a closure from its active function means the closure has to save the active function's environment, thus turning into something the shape of an object.

To me personally, JavaScript was the language that demonstrated this, since its first-class functions are faithful enough... but historically, Scheme was the language in which these realizations were made, and embodied.

Next up is continuations. I would say that the closures⇄objects analogy gets even stronger in the form of continuations⇄actors. But all in due time.

masak · 2021-10-06T17:38:05Z

Continuations

I have nothing in my notes about continuations. Fortunately, I've spent the last week implementing them, again, in Bel. I needed them for my "fastfuncs" mechanism, which is a cheaty way to make the Bel interpreter faster without having a full-fledged compiler. (I had falsely believed the fastfuncs could make calls of their own freely, but this ends up interacting badly with other continuation-based Bel functionality (such as exceptions), and so in the end, the fastfuncs need to implement continuations as well.)

Continuations are what you get if you start from regular second-class functions, and then add both coroutine functionality and first-class-ness:

Continuations store exactly the same data as coroutines: the instruction pointer, and the environment to use at that point. (And they accept a single value as a parameter.)
Continuations don't have to be first-class, but when they are, they are indistinguishable from first-class functions.

Holey manoley! I've never seen anyone describe it like that in the literature, but I believe it's true: continuations are simply first-class coroutines. Categorically, they are the pushout of coroutines and first-class functions, taking plain functions as their common base.

A corollary of this is that if your languages has first-class continuations, you don't separately need it to have coroutines; they are but a special case.

My mental model is this: functions abandon their role as the fundamental unit of computation, and hand it to a smaller entity: what typically is called a "basic block", characterized by the fact that jumps/state transitions can only happen between basic blocks, never within them.

If it gives you warmth and comfort, you can think of these basic blocks as "mini-functions". They have the additional restrictions that they do not contain internal calls — a call counts as a jump, and so needs to happen at the end of a basic block.

There is a one-to-one correspondence between the "control-flow graphs" people used to draw in the 70s, and basic blocks/"mini-functions".

Continuations turn the call tree into a directed graph. In doing so, they also erase any distinction between "calling a function/passing a parameter" and "returning from a function/passing a return value" — these are both just instances of a invoking a continuation. This seems to be a difference that stems from turning coroutines first-class: by mixing "control" and "values", we destroy the idea of "caller".

Continuations are useful as a basis for advanced language features: coroutines, exceptions, finally semantics, backtracking semantics... The implementation can still choose to keep them continuations hidden and "second-class" in userspace, if it does not wish to confer such awesome power to the language implemented.

Coming from the other direction, even a language which itself does not expose first-class continuations could use a programming style which imitates them (using first-class functions). This is known as "Continuation-Passing Style" (CPS). The clearest example of this that I know about is Node.js with its callback parameters, circa 2009. It's now 12 years later, and async/await is recommended instead, but there was a point when CPS was your best bet.

Since I've spent the past week writing (Perl) code in continuation-passing style, I feel I'm somewhat an authority at this style, more than I ever was when I wrote Node.js code in that style. You're writing your basic blocks as functions, which either return normally, or invoke another basic block. All you need to make sure is that all the basic blocks you need from the current one are lexically in scope. You can do that like this:

Declare all the basic blocks that mutually call each other (but don't initialize them)
Initialize them (in any order)
Call one of them to get things started

This covers the hardest case, that of looping and forward jumps. The cases of sequencing and choice are easier, and I won't cover them. Altogether, we can express things in such a CPS style, using calls for anything that previously required if statements, while loops, or even more exotic control flow.

Pro tip: if you try this at home, you should really do it in a language that implements Tail Call Optimization. The reason is that, since continuations don't believe there's a call stack at all, they will be sorely disappointed if there is one in the implementing language, and it overflows. Amusingly, the way this resolves itself in Node.js (and in my Bel fastfuncs) is that the stack gets reset regularly by async calls.

(Very Late Edit: And, whatever you do, don't mutate your local variables! (Something like a for (int i = 0; i < L; i++) { ... } is a bit no-no.) What you want to do is emulate mutation by passing such values around, kind of like the SSA approach to variables. If you don't do this, someone will take a continuation and catch you red-handed. I learned this the hard way.)

It is perhaps here it's also worthwhile to mention that, as part of the development of Scheme, Steele and Sussman found out (by implementing it) that continuations and actors are equivalent... under certain other reasonable assumptions. More precisely, passing a value to a continuation and passing a message to an actor can be seen as equivalent in power. Perhaps the biggest difference in practice is that actors are fundamentally isolated on the process level (and therefore asynchronous), whereas continuations are agnostic in this dimension: they allow it, but don't require it. To Steele and Sussman, that didn't matter much; to Hewitt, it did.

I feel I could write a lot more about continuations. I also have literally dozens of papers queued up about continuations, most of which I have not even skimmed. It's clear to me that they are powerful and fundamental, and that I still do not understand them well enough.

The removal of the basic block's power to call other functions is significant. Here is the first time we're saying "you can do many things, but you cannot do this". Continuations do not work properly if we also allow normal function calls. It is by limiting them that we make them work as they should. This act of limitation is the start of removing control from functions, and making it a global/distributed concern rather than a local one.

masak · 2021-10-06T17:39:01Z

This is as far as I got tonight. Looking back at my progress, I think I would need two more evenings to finish up. I'll try to rise to that challenge.

vendethiel · 2021-10-11T14:16:07Z

The same trick allows a language to emulate multiple return values — for some reason, I see this implemented well in many languages (C++, Perl, Raku, Bel).

That's a weird list of languages with tuple support. I'm not sure what those share in common wrt multiple return values?

masak · 2021-10-11T15:23:36Z

That's a weird list of languages with tuple support. I'm not sure what those share in common wrt multiple return values?

Aye, I fear I did not do that bit justice. Not sure it's about tuples what I'm after. Let me unpack (pun totally intended) my given list of language examples in search of some unifying trend:

C++ actually does have tuples. Someone with a better understanding of compiler optimizations might do a better job explaining them, but my impression is that they are essentially zero-cost; the tuple's existence on the returning and receiving side of a return can be completely eliminated, and only the values themselves transferred. I guess this is only really true with a number of caveats involving inlining and whatnot. My main point is that under some circumstances, C++ truly allows multiple values to be returned with zero overhead.
Perl 5 does not have tuples, and makes none of the claims to efficiency that C++ makes. But what it does have is an admirable symmetry in its functions' inputs and outputs. The inputs are found in a dedicated array @_; the most general form of the outputs is a list of values. This list can easily be assigned en masse on the receiving side. Even without the efficiency, it gets full points for allowing multiple values to leave the function and be taken care of on the other side.
Raku is much the same (although it got civilized parameter lists on the way). I honestly forget whether what return returns in Raku is a List or a Seq, but at least it can contain multiple values. What's more, the binding operator can make it absolutely clear that passing values into a function and passing them out of a function are symmetric acts.
I can't speak for Common Lisp, which I know does something with multiple-value returns as well, but... Scheme has a mechanism called values which seems absolutely bizarre to me, like someone had an honest momentary lapse of language design sense. The Kernel spec similarly rips into R5RS on this point, pointing out how only a confusion of ideas could have led to something like values. Interestingly, the solution both Kernel and Bel arrive on instead is just good ol' destructuring — even JavaScript has that nowadays.

Python doesn't have destructuring as such, although it does have some mitigating ways to assign from an iterator into a tuple of lvalues.

Which only leaves Java among the more popular languages. Java has absolutely nothing in this area of design space, which seems strange until you realize that Java was put on this earth to make the von Neumann bottleneck as narrow and uncomfortable as possible.

vendethiel · 2021-10-11T15:34:30Z

Common Lisp and Scheme share that values mechanism. I'm not sure why they thought stealing that particular feature was a good idea, but they did nevertheless.
If you don't explicitly ask for all the return values, you only get one.

This is similar to how this happens in Lua, which is the only language with different semantics for "return values" than just unpacking (that lots of languages have):

In Lua, a multiple value is collapsed in several positions, but it "auto-splats" when used as the last argument of a function, inside an object, and is blocked by parentheses.

function f()
  return 2, 3
end
function g(...) end

local x = f() -- Cannot splat: assigns x = 2
local v = { f() } -- Can splat: assigns v = {2,3}

g(f()) -- No parentheses: calls g(2, 3)
g((f()) -- Parenthesized: calls g(2)

g(1, f()) -- Last arg: calls g(1, 2, 3)
g(f(), 1) -- Not last arg: calls g(2, 1)

masak · 2021-10-12T06:29:18Z

If you don't explicitly ask for all the return values, you only get one.

This isn't even my main beef with the feature (although that sounds like a pretty horrible default).

My main beef is that it feels like their goal was to deliberately create a second-class value in the language just for passing many values... OK in general maybe, but not in the context of a language whose central feature is to build rich tree structures out of a universal structure type. It's like the values feature was designed by someone who hadn't used Lisp/Scheme for an entire day.

In Lua, a multiple value is collapsed in several positions

hnn 😞

but it "auto-splats" when used as the last argument of a function, inside an object

eww 😱

and is blocked by parentheses

One thing that is really really hard in language design, for some reason, is to make parentheses in expressions keep meaning "just grouping, nothing else".

vendethiel · 2021-10-12T11:10:20Z

It's like the values feature was designed by someone who hadn't used Lisp/Scheme for an entire day.

Or most programming languages, really. I'm just describing it for the sake of completeness, but I certainly won't say I like it at all.

One thing that is really really hard in language design, for some reason, is to make parentheses in expressions keep meaning "just grouping, nothing else".

It'd be better, yes. Well, the language with the worst example of this that comes to mind right away is C++ (if a function is decltype(auto), the deduced type is different for return 3; and return (3);).

masak · 2021-12-25T08:16:26Z

I haven't forgotten about this thread, but getting back into it is going to require bigger chunks of time — realistically, the next time I'll have those will be around Spring Festival.

In the meantime, I just found From Procedures, Objects, Actors, Components, Services, to Agents – A Comparative Analysis of the History and Evolution of Programming Abstractions. It comes without a strong recommendation — I have only skimmed it very lightly, not read it in detail — but the title itself was similar enough to the thrust of this issue that I wanted to mention it.

Might as well mention Sans-Papiers as First-Class Citizens, which I will need to pick apart and review in detail at some point as well.

masak · 2022-02-24T02:58:57Z

On Abstraction by Eric Scrivner:

Substituting a call to a subroutine for the contents of a subroutine builds a lexicon which can be used to reduce duplication and heterogeneity. Using a pointer or reference instead of a value reduces duplication and copying. Both facilitate compression of data as well as compression of semantics. Both of these give us leverage - the ability to do a lot with a little.

It is precisely here that a risk arises. We can view the whole purpose of abstraction as the dissection of a program into its conceptual categories. This was the profound error and misstep of object-oriented programming (OOP). This style of abstraction leads to the traditional problems of reason, logic, and language as thoroughly covered by Beiser. Alternatively, we can view the purpose of abstraction as compression guided by resemblance and analogy. This style of abstraction has poetry and literature as its exemplary arts. We are given these techniques of substitution in order find the boundaries and limits of resemblance and analogy in pursuit of compression. The choice of path on this fork is the momentous decision that determines the whole of the method we pursue in construction.

This reminds me of two things. One is the point about premature abstraction that sometimes pops up when discussing abstraction with @jnthn — the hard part isn't creating an abstraction, the hard part is the mental work necessary to arrive at the right/useful abstraction. In the limit, refactoring makes us brave in constantly re-evaluating and re-creating abstractions as needed.

The other point is that Bel's predicate-based type system seems to fit the description about "compression guided by resemblance and analogy". Unlike in OOP, there's not really a reified "thing" corresponding to a class or type. Just a predicate that answers t or nil when you give it an object.

masak · 2022-03-09T03:33:25Z

First-class functions

I just realized one thing.

Here's a sentence from cppreference:

When a user-defined class overloads the function call operator, operator(), it becomes a FunctionObject type.

So, a "function object" (which I've heard people call a "functor") is an object that acts like a function, in that you can use the () function call operator on it.

This is approaching first-class functions from the other direction, adding "it's a function" to an object instead of (as is traditional) adding "it's a first-class value" to a function. It's first-class functions from an OO perspective.

It clicked for me when reading this SO answer:

There are a couple of nice things about functors. One is that unlike regular functions, they can contain state.

This is a very C/C++ perspective, when "regular functions" mean "lower-order functions" by default, and these don't contain state. (Um. Modulo the static keyword, of course. The thing they don't contain is the kind of "instance-level state" that comes from evaluating a function declaration into a closure at runtime.)

Of course, I realized all this on some rough level when writing the original comment about first-class functions:

In the history of Computer Science, there is no greater story than that of FP and OOP setting off in two different directions and yet reaching the same goal.

Despite this, there's something concrete and satisfying about seeing the path actually walked in an objects-and-classes language, adding () to an object just because we can, and seeing a closure pop out.

JavaScript doesn't have operator overloading, but the JavaScript MOP consists only of Object and Function, and the only real difference is that Function allows you to () on the value. Instead of directly overriding the () on a class, you just make sure to create it as a function declaration, and then you can treat it as a regular object with properties. If you squint, that's actually a kind of operator overloading, too.

masak · 2022-03-16T01:44:18Z

My mental model is this: functions abandon their role as the fundamental unit of computation, and hand it to a smaller entity: what typically is called a "basic block", characterized by the fact that jumps/state transitions can only happen between basic blocks, never within them.

I don't know if it helps in any way, but "jumps/state transitions can only happen between basic blocks, never within them" — or, equivalently, "basic blocks are stretches of sequentially executed instructions" — is a kind of quotient construction; roughly a way to decrease resolution, making the notion of "instruction" bigger and more inclusive, and the notion of "(conditional) jump" more universal. Something related happens when looking for strongly connected components in a directed graph; the result is lower-resolution, but a part of the structure remains.

If it gives you warmth and comfort, you can think of these basic blocks as "mini-functions". They have the additional restrictions that they do not contain internal calls — a call counts as a jump, and so needs to happen at the end of a basic block.

There is a one-to-one correspondence between the "control-flow graphs" people used to draw in the 70s, and basic blocks/"mini-functions".

Continuations turn the call tree into a directed graph. In doing so, they also erase any distinction between "calling a function/passing a parameter" and "returning from a function/passing a return value" — these are both just instances of a invoking a continuation. This seems to be a difference that stems from turning coroutines first-class: by mixing "control" and "values", we destroy the idea of "caller".

All of this is true, but (as I come back to re-read this) applies equally well to coroutines. I think the CPS transform has already happened, conceptually, at least, the moment we introduce yield into a language.

masak · 2022-04-02T08:18:56Z

The removal of the basic block's power to call other functions is significant. Here is the first time we're saying "you can do many things, but you cannot do this". Continuations do not work properly if we also allow normal function calls.

It bothers me a little bit that we need to talk about a removal of power here; continuations in general feel incredibly enabling, in that they literally enable other features like generators, exceptions, nondeterminism, and other wonderful types of control flow.

Note that in Continuation-Passing Style it is the style itself which imposes a restriction, and with basic blocks it is our definition of what a basic block is that imposes the restriction. It doesn't come from continuations themselves, which are highly unconstrained and non-constraining.

Also important that the "stack paradigm" naturally suggested by regular LIFO function calls is also very restrictive, and by introducing continuations, we're transferring to a "heap paradigm". The non-obviousness and pain of doing so was the essence of the funarg problem, especially the upwards half where the stack is insufficient.

The stack paradigm is great, not just because it's faster in practice but because it ties an important kind of RAII thinking into the structure of the code itself. One of the tricky parts of good compilation is to get enough static knowledge of data flow inside of a function to determine that nothing continuation-shaped escapes, and therefore things can be stack allocated and therefore faster. The nature of such determinations means that we can do this less often than we'd like (and even when we can, it takes a bit of work).

raiph · 2022-04-02T23:24:19Z

I'm curious if you get anything useful out of this.

(And/or a reddit discussion about that article.)

masak · 2022-04-08T07:02:19Z

I'm curious if you get anything useful out of this.

Yes! My immediate reaction is that "substructural type systems are cool and useful" — vide Rust, and Perceus, and, I dunno, unique_ptr (for all its flaws and weaknesses). Linear types can change the world, and all that.

But I think I'll leave it at that. There's a sense in which the whole substructural thing falls under the heading "Types", which I almost didn't even include in the abstraction ladder/table of contents/topic plan for this issue. I have an irrational fear of type systems, equal and opposite to Bob Harper's love of them. I mean, clearly the regimentation they impose ends up helping and helps us see further. But on whose authority except its own does the type checker reject my program? I'm not hating on type checkers, I just don't think we should be slaves under them.

I have more to say/write about well-foundedness, and the distinction between what Harper calls T++ and PCF++ (which mirrors what Hofstadter calls BLooP and FLooP) — it seems to me that ownership-tracking is to memory what T++/BLooP is to termination/totality/decidability. Possibly there's even a direct connection — but even if there isn't, the topics feel a little bit related.

masak · 2022-04-08T08:27:39Z

Relevant to the way continuations generalize regular functions and the stack paradigm (from a 2012 blog post by John Shutt about guarded continuations):

When a function f calls another function g, f gives up some of its ability to regulate the future evolution of the computation. The programmer tends to assume that not all such power has been lost — that (1) the call to g won't terminate without control passing back through the point at which f called it, and (2) that particular call to g won't return through that point more than once. In the presence of first-class continuations, though, even these weak assumptions needn't hold.

I called (1) above the "dynamic embedding guarantee". I don't think that I spelled it out that continuations obliterate the dynamic embedding guarantee.

Also note how (1) is about (returning) "at least once", while (2) is about "at most once". Continuations but with (2) are sometimes called "one-shot continuations", I think.

masak · 2022-05-25T09:01:26Z

A quote I just ran into, which corresponds to where I was planning to make this issue end up:

"The most important problem right now in computing is — how do we deal with concurrency and distribution?" — Philip Wadler - Propositions as Types (Q&A)

masak · 2022-07-12T08:07:40Z

Parts of this story gain a sharper edge by more carefully separating the pure/applicative/expression world from the monadic/heap-memory/sequential world, like PFPL does, which I'm currently reading with great appreciation.

On page 307 (of the paper book, second edition), a bit after the midpoint of the book, things culminate in this factorial procedure declaration:

proc (x:nat) {
  dcl r := 1 in
  dcl a := x in
  { while ( @ a ) {
      y ← @ r
    ; z ← @ a
    ; r := (x-z+1) × y
    ; a := z-1
    }
  ; x ← @ r
  ; ret x
  }
}

Given how what we're expressing here could be written in Raku as sub ($x) { my $r = 1; for 1..$x -> $i { $r *= $i }; $r }, some explanation of Harper's Modernized Algol syntax is in order:

dcl introduces a (stack-allocated) assignable
@ reads from an assignable (into the monadic world)
← binds the results of a monadic computation to a variable (in the pure world)
:= outside of a dcl evaluates its (pure) rhs and writes the value to an assignable
ret is the only way to communicate a value from the monadic world back to the pure world

Assignables only exist in the monadic part. Variables are typically associated with the pure part.

I wrote this earlier:

(We'll stay silent on whether a function is "allowed" to perform side effects, by which I mean we'll skirt the issue and implicitly allow it. Part of the reason is that the story about side effects is still very much being written/explored, by languages like Rust and Koka.)

The entire point of Harper/PFPL's separation is that side effects are locked into the monadic half of the language, letting the pure part keep many of its algebraic/equational properties. This is nothing new, of course; Haskell does exactly this with monads, and Koka does exactly this with effects.

All the talk about side effects anticipates the later talk about process isolation and actors. It's by making the side effects and the heap memory explicit (like PFPL does) that we can later isolate those things, and hide/encapsulate them inside something.

raiph · 2022-07-13T22:45:20Z

It's by making the side effects and the heap memory explicit ... that we can later isolate those things, and hide/encapsulate them inside something.

In recent years I've been thinking of 6model as "actor model ready". Do you get what I mean? Am I deluded? Have I mentioned / are you aware of Pony and ORCA?

masak · 2022-07-14T03:28:39Z

In recent years I've been thinking of 6model as "actor model ready". Do you get what I mean?

I think so, but to the extent I do, I'm not sure I agree. I'll quickly add that this might be because you have pieces of the 6model puzzle that I lack.

In order to answer in more detail, I need to anticipate a few points from the future of this thread, specifically the last two topics in the projected list of "module-like things":

Communicating Sequential Processes

Actors

But as a baseline, the story must start with the status quo of shared memory, threads, and locks. I'll take all those mechanisms as given — there's oodles of literature on it — but I will mainly point out here that locks don't compose and threads don't scale. This book, specifically, spends the early parts showing how a perfectly well-designed class for single-threaded use can break under many-threaded use — it's humbling — the problem is a bit like "monsters from the 9th dimension" in that it manifests orthogonally to the correct code you wrote — and then spends the later parts advocating safer/higher-order building blocks, like atomic data, built-in cuncurrent data structures, and architectures built on the producer-consumer metaphor. (Further confirmed with BG's answer here of also covering "fork-join, parallel decomposition, and the new parallel bulk data operations".)

Communicating Sequential Processes takes the initial interesting step that assignment (to shared memory) is not a primitive operation, communication is. Specifically, this is a two-party action, so every such communication is also a synchronization point, and establishes a happened-before relation à la Lamport. This simple shift in perspective makes things scalable, and the focus now shifts to the (quite possibly dynamic) structure of communication channels between processes. Shared mutable memory is no longer a problem, because there's no global memory to share; you can only share along channels.

Actors similarly take a new primitive operation: (asynchronous) messaging. Actors are by construction so well-protected from each other that not only is the "shared mutable memory" problem completely defined away — but by themselves, actors don't seem to be able to synchronize anything between themselves. I'm still learning about this, but one solution seems to be synchronizers, which could maybe be seen as a common region in which synchronization on shared resources between actors can happen.

These two models are clearly related. In fact, we can encode either model using the building blocks of the other. Which means we can zoom out to a sufficiently high-altitude perspective where they both look the same, and present a common solution next to threads/locks: while threads and locks start out with a fundamentally un-synchronized resource (shared mutable memory) and then tries unsuccessfully to apply massive amounts of cleverness and discipline to apply synchronization in just-enough places to restore correctness, CSP and actors start out with a fundamentally synchronized resource, and then spend the rest of their lives happily ever after. (Although actors also need to add synchronizers as an extra component.)

Threads and locks are still popular despite being fundamentally unworkable, because people naturally tend to think of a solution as a concrete feature. CSP and actors start out by removing and restricting; more of a "negative feature". There's a quote about Dijkstra somewhere about how his main insight about concurrency is that it's not about adding something (such as "threading"), it's about removing a restriction/assumption (such as "the order of the statements (alone) determines the sequencing of the operations at runtime").

The story doesn't end there, either. There's something really interesting going on with what Rust and Koka are doing, basically exploiting the fact that "has a unique owner" is a necessary and sufficient condition for mutability. This is like carving out a "safe space" from the shared mutable memory, and the fact that something shaped a bit like a type system can do that is... interesting. You don't need to write actor { ... } around things to protect them, you just show statically that unique ownership is maintained.

Tying everything back to whether 6model is "actor model ready" — it would need to be in terms of some of the above primitives, I think. CSP/channels, or asynchronous messaging with provably no sharing, or (à la Rust) just provably no sharing. I'm not sure if 6model is more or less ready for any of those than other object systems.

masak · 2022-07-14T06:03:59Z

Have I mentioned / are you aware of Pony and ORCA?

@raiph I don't think so, on both counts.

Skimming the linked page, it sounds amazing, but it's also extremely short on specifics. I would be a thousand times less doubtful of the claims (sound + complete + concurrent) if there was a link to a paper containing algorithms and a number of proofs. Several questions arise, such as:

How can a dedicated actor collect other actors using just message-passing?
How, specifically, are the very different form factors of reference counting (which is usually centralized) and actor distribution (which has no built-in synchronization mechanism) reconciled?
When the word "transitively" is used, that implies a non-local property that needs checking in non-constant time. How is that property checked in a world where a number of synchronization mechanisms are proudly enumerated as not being used as the solution? (I.e. once you have traversed the object graph in order to confirm the "transitively blocked" property, how do you know that the nodes you checked at the start are still unchanged? "Blocked" implies "has no pending messages in its queue" — what prevents new messages from arriving anywhere in the graph while we are traversing it to assert all actors have empty queues?)

MoarVM's garbage collector is pretty cool, but it's far from this design along several dimensions: it's tracing/generational, not reference-counting; and it's based around threads and condvars, not actors and a mysterious lack of a need for synchronization.

masak · 2022-07-14T08:25:20Z

There's a quote about Dijkstra somewhere about how his main insight about concurrency is that it's not about adding something (such as "threading"), it's about removing a restriction/assumption (such as "the order of the statements (alone) determines the sequencing of the operations at runtime").

Ah; the quote I thought about is from this blog post:

Non-determinism

In 1967 R.W. Floyd had introduced non-determinism as an additional language feature. The proposal concentrated on how to implement it. For EWD it was a subtraction: by avoiding what he considered overspecification, it became a natural consequence of the semantics of the guarded-command language [footnote].

masak · 2022-07-19T07:07:29Z

A further thrust at the question of 6model's actor-model-readiness. Here's a quote by Robin Milner from his Turing award lecture:

Now, the pure lambda-calculus is built with just two kinds of thing: terms and variables. Can we achieve the same economy for a process calculus? Carl Hewitt, with his actors model, responded to this challenge long ago; he declared that a value, an operator on values, and a process should all be the same kind of thing: an actor.

This goal impressed me, because it implies the homogeneity and completeness of expression ... But it was long before I could see how to attain the goal in terms of an algebraic calculus...

So, in the spirit of Hewitt, our first step is to demand that all things denoted by terms or accessed by names—values, registers, operators, processes, objects—are all of the same kind of thing; they should all be processes.

I think this pinpoints something important: when Hewitt says "everything is an actor", he means something concrete/specific about his imagined system. 6model can reasonably claim that "everything is an object", and back that up with evidence in terms of abstraction/encapsulation boundaries. For it to take the step to claiming that "everything is an actor", it would need to witness that in terms of process boundaries and inviolable message-passing mechanisms, as the foundation of the whole model. (But when it's put like that, I'm not sure that is, or should be, the goal of 6model.)

If what we mean is "some things can be actors" (rather than the whole system being actor-based), then I believe this was conclusively asserted with the release of jnthn's oo-actors module in 2014. 😄

masak · 2022-07-21T05:54:32Z

Here's another reference for future reading and incorporation into the themes of this thread: The Purely Functional Software Deployment Model (Eelco Dolstra's PhD thesis). It's the basis for the Nix package manager, whose main selling point is that identical inputs give identical outputs — that is, builds are reproducible.

As a relevant aside, it would appear to me that golang is following a convergent evolution here, independently discovering the value of reproducible builds [1] [2]. I hope to get back to both Nix and golang/vgo when I write about "Packages", listed in the OP as one of the tour destinations. The thrust of my argument will be just this, that as an abstraction, packages are not stable/usable unless they guarantee reproducible builds. It's a wonder we get anything done in mainstream software engineering, absent that guarantee.

masak · 2022-07-29T07:19:07Z

The abstraction ladder, identified in the OP, also spans another axis: that between values and resources. Computer science tends to talk about values, but software engineering has an understandable focus on resources.

What's a resource? A database handle is a resource, a file handle is a resource — or they stand in for/represent resources, maybe. It depends how you categorize things. Threads and green threads are resources. Memory (including too-big-to-copy values in memory) is a resource. Input/output on a port is a resource (or the port itself is a resource?). Communication bandwidth is a resource. Network access is a resource. The abstract concept of compute (as in, computational throughput) is a resource. If I plugged a quantum co-processor from 2043 into my machine, it could constitute a resource.

I can give these specific examples, and I feel there is something that they have in common, but I can't clearly delineate that something. Values can avoid being about "the real world" in some sense, but resources are anchored in the real, physical world, and inherit limitations/properties from it. There are probably physicists who love that connection, whereby physics kind of bleeds into computer science.

It feels like there would be some straightforward connection to effects (and coeffects). Are effects simply about acting on a resource? The resource is the noun, the effect is the verb?

At its most pure, lambda calculus can be all about values. At their most expansive, CSP and actor systems can be all about resources. The abstraction ladder seems to straddle the values-resources axis. I wish I understood better how these things are connected.

masak · 2022-07-30T03:20:43Z

But as a baseline, the story must start with the status quo of shared memory, threads, and locks. I'll take all those mechanisms as given — there's oodles of literature on it — but I will mainly point out here that locks don't compose and threads don't scale.

I just stumbled over the userver framework, which looks nicely designed and quite flexible/powerful. This is what met me in the first tutorial I decided to click on:

Warning

Handle* functions are invoked concurrently on the same instance of the handler class. Use synchronization primitives or do not modify shared data in Handle*.

In other words:

Here be dragons (race conditions).
By notifying you (the framework user) of these risks, we (the framework authors) have done our due diligence; race-free semantics is not part of the framework itself, it's part of a general vigilance/discipline wherein sharing/modification is always guarded by the appropriate synchronization.
What, you thought this would be easy?

Later edit: It's all good and well to be sarcastic, but when coming back and reading the above, I didn't want that to be my main point/final word. Specifically, the authors of the userver framework are operating well within the expectations given to them. This (concurrent access) is just that big of a challenge — I think it's fair to say none of us knows the final answer — people of the Rust and Koka communities can see a bit further, sure, but in the end we're all still hunting for solutions, trying them out, and learning for the next iteration.

masak · 2022-07-30T06:39:09Z

I want to call attention to this polemic post whose main message is this:

Provided your problem admits certain concessions that let you retreat from full-on garbage collection into reference counting, you're going to come out way ahead in the bargain.

I realized that I agree with this particular sentence because it's qualified, but then disagreed with the message of the rest of the post because I don't believe the qualification ("...your problem admits certain concessions...") applies in a general-purpose language.

In the post above about first-class functions, we showed that the special subtype of first-class function that people historically called "upwards funargs" have a tendency to escape their location of birth, thereby bringing their lexical context with them in a closure:

Upwards funargs "escape" from the function that defined them. This is the first push from the world of stack allocation to the world of heap allocation.

Of course I should have defined a static embedding guarantee that this breaks along the way:

When a function is called, the environment of the caller is a compatible subset of the environment of the called function.

This is true (a) if there's only ever one global environment and not many small lexical ones, or (b) if functions are not first-class/mobile. I know that's a little bit abstract, so let's break the guarantee three times:

In Bel:

$ perl -Ilib bin/bel
Language::Bel 0.58 -- msys.
> (def make-closure () (let env "closure" (fn () env)))
> (let c (make-closure) (let env "caller" (c)))
"closure"

In Perl 5:

$ perl -wle'
    sub make_closure {
        my $env = "closure";
        sub { $env }
    }

    my $c = make_closure;
    my $env = "caller";
    print($c->());
'
closure

In Alma (untested, but I have no reason to think it wouldn't work; #577):

func makeClosure() {
    my env = "closure";
    return func () { return env };
}

my c = makeClosure();
my env = "caller";
print(c());            // closure

Most modern mainstream languages have "given in" and introduced a mechanism that allows this kind of freedom of movement, whether via escaping first-class functions, or via heap objects being allowed to reference other heap objects. This is where the abstract graph of objects and references in memory turns from a tree into a graph, specifically one with cycles.

That's also the point where reference counting becomes problematic, and garbage collection turns into a more robust solution. The qualification in the post ("Provided your problem admits certain concessions...") is no more nor less than the static embedding guarantee: things are not allowed to escape their location of birth, or mutually reference each other, directly or transitively.

(Wait, what? Where did the "mutually reference" condition come from, weren't we talking about functions and lexical scopes? It comes from the Y combinator, which can be seen as a device that transforms firstclasshood of functions into function self-reference. Similarly, any "tying the knot" technique transforms freedom of heap reference (during object construction) into a cyclical object graph.)

The HN discussion seems pretty balanced. The top comment right now is saying that this is a never-ending and unwinnable debate. The top answer to it is reporting spectacular success with (mostly) reference counting.

The abstraction ladder also spans a spectrum between the specific (like reference counting) and the general (like garbage collection). Maybe the real lesson is that there is no one-size-fits-all point on that spectrum.

masak · 2022-08-09T03:05:18Z

I'm getting ahead of myself here, but not by a whole lot. I realized on the way to work today that there's a great transition between the idea of functions and the idea of modules.

Briefly, it's this: thanks to lexical scoping, functions own much of their data, and it's perfectly hidden from the world. (We'll sweep complexities around stack/heap allocation, first-class closures, and escaping under the rug right now. Just assume they are appropriately taken care of.) Modules are primarily that, a "perfectly hidden" guarantee for some data, but wrapping a group of functions. Keywords like export tend to switch off this perfect hiding; sometimes, inspired by the OO vocabulary, we'll be speaking of private and public instead, even though it's a module.

Various related concepts make this effect clearer. Separate compilation for optimizing the data parts that are private to the module. Types for communicating (statically) across module boundaries. Contract-based programming for refining ensure/provides relationships even further, and separation logic to reason about what memory is owned by what module.

That's it. Modules are, in this sense, a kind of "jumbo function" — the "jumbo" concept borrowed from Levy — they are the answer to the question "what is the biggest programming language feature we can imagine with the features we appreciate in functions?".

masak · 2023-10-02T14:11:48Z

Modules

You know, it's funny. A module, at its barest definition, is a "managed namespace" — literally a set of statically known key/value bindings. But that also means that it's the "step up" that we're looking for, from individual functions (or other types of "behavior", including continuations) to groups of them.

Why would you want to group functions together? I think quite possibly it's a human thing. The computer doesn't much care — the code all gets bundled together in the end, and damn the abstraction boundaries. But when I code up (say) a red-black tree, then there are functions that belong to the red-black tree abstractions, and functions that don't. From that point of view, modules are bounded contexts. No-one's forcing me to; I want to. I like boundaries.

The perfect storm that was ES3 could do modules, but it did it with a function (acting as a "module constructor" and as a privacy boundary) and a returned object (acting as the "managed namespace" of public exports). I'm not saying that's the only way to do it; but seeing it done with as little as functions and objects makes you think, doesn't it?

I think the one who got forever associated with this idea was David Parnas, with his 1971 paper "On the criteria to be used in decomposing systems into modules". It's still a good text, even though it shows its age. tl;dr: Don't modularize according to the "flow chart" (control flow and/or data flow) of your projeact — modularize according to separation of concerns and "information hiding" (that is, the separation of non-public implementation details from public API details).

The ulterior motive when grouping my red-black tree functions together into a module isn't just that I like to group things that I think thematically go together. It's that they share a common representation of the red-black tree that they are operating on. Any change to that representation, or assumptions surrounding it, would affect those functions within the module boundary together, and no functions outside of the boundary.

Things like types and interfaces, contracts and invariants, further help to make explicit those assumptions — which can serve both the module author inside of the boundary, and module users outside of it. I still think something like GOSPEL should be made part of the "you have to be at least this tall to ride" criterion of publishing a module. What especially thrills me about it is that it does for static analysis and interaction what classes do for API/language — it creates a new level of abstraction on which to be absolutely precise.

masak · 2023-10-12T05:04:43Z

But as a baseline, the story must start with the status quo of shared memory, threads, and locks. I'll take all those mechanisms as given — there's oodles of literature on it — but I will mainly point out here that locks don't compose and threads don't scale.

Case in point: this blog post by Chris Wellons, Emacs 26 Brings Generators and Threads, spends most of its expository text about the new Emacs threads pointing out that they don't work:

ThreadSanitizer (TSan) quickly shows that Emacs’ threading implementation has many data races, making it completely untrustworthy. Until this is fixed, nobody should use Emacs threads for any purpose, and threads should disabled at compile time.

As of this writing, we're up to Emacs 29.1, so the situation might for all I know have improved. But the bigger question is why this kind of thing happens with thread implementations — Emacs is just an example here. My main point is — nobody sets out to write a broken thread implementation! And still it happens all over the place.

On the way back to lunch, I got a mental image of a Turing Machine with one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work on the single tape. That's threads. It's a fundamentally unworkable idea.

The solutions, while obvious, are more complicated, or have more moving parts, or require more maintenance.

Processes (or actors) are like N Turing machines each with their own tape. There's no conflict and there can be no conflict. Pretty quickly there is a need for sharing and coordination, though.
Ownership/borrowing/immutability/uniqueness solutions do more to track, statically, the read/write capabilities of some data and references to it, as well as who (uniquely) owns the data or has borrowed it. This clearly seems like a good idea, and leads to a nice "shared XOR mutable" state of affairs; it's just more work, kind of like how maintaining your static types and keeping them consistent is more work.

masak · 2023-10-17T02:57:55Z

On the way back to lunch, I got a mental image of a Turing Machine with one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work on the single tape. That's threads. It's a fundamentally unworkable idea.

As a further (damning) data point on this, check out the opening paragraph of this recent blog article by Rachel by the Bay:

There are more than a few bear traps in the larger Unix environment that continue to catch people off-guard. One of the perennial favorites is thread safety, particularly as it applies to the environment manipulation functions under glibc. The usual warning is that if you run multiple threads, you'd best not call setenv, because if someone else calls getenv, you run a decent chance of segfaulting.

"If you do threads, you'd best not write to shared memory." Wise advice; what's stupid is that we created the possibility in the first place.

masak · 2023-10-19T05:37:13Z

Functions provide a guarantee which will be cast in a sharper light when it is broken later. It doesn't have a name in the literature that I know of, so let's call it the dynamic embedding guarantee. It has two parts:

The entire machine has a single thread of control.

When a function is called, the single thread of control is transferred/borrowed from caller to callee. When the function returns, it is handed back.

I'm watching Kevlin Henney's 2017 talk Procedural Programming: It's Back? It Never Went Away. He mentions the book Software Architecture in Practice which outlines an architecture style called "Main program and subroutine" (in other words, procedural programming), and which has this quote:

There is typically a single thread of control and each component in the hierarchy gets this control (optionally along with some data) from its parent and passes it along to its children.

(Emphasis mine, not Henney's or the authors'.)

Henney says as an aside about the "single thread of control": "yeah, that's gonna come back and bite us later".

It doesn't mention the children handing control back (in LIFO order) to their parents. Which, fair enough, that's not always the case, I guess. There's such a thing as tail calls. Other than that, it's quite stunningly similar to what I described as the "dynamic embedding guarantee" above.

masak · 2023-10-19T05:59:45Z

Whoa. And then later, towards the end of that same talk, Henney quotes a paper by David Gelernter and Nicholas Carriero named Coordination Languages and their Significance:

We can build a complete programming model out of two separate pieces — the computation model and the coordination model.

That is, lambda calculus in the small, pi calculus (or actors) in the large.

I'd call that a pithy restatement of the point of this whole thread.

(Edit: Two further good, clarifying quotes from the same paper.)

The computation model allows programmers to build a single computational activity: a single-threaded, step-at-a-time computation.

The coordination model is the glue that binds separate activities into an ensemble.

That is, mutability + sharing are bad when too close together; but tease them apart into computation and (separately) coordination, and they become manageable.

(Another edit: I just feel I need to quote verbatim from the conclusions section, which again aligns with the goals of this whole thread.)

A broad research effort aimed at the development of general-purpose coordination languages is long overdue. The tangible result would be a tool of great power and significance. The intangible one would be a better understanding of the root problems of computer science. There appears to be an unspoken consensus in much of the research community that every twist and turn in the hardware development path, particularly where parallel machines or networks are concerned, calls for a new language or programming model, a new design, new implementation and new coding methods. In the long run, this approach is intellectually crippling. What are the fundamental questions here?

masak · 2023-10-31T09:53:09Z

On the topic of modules, one of the most significant bits in the the design of a module system is whether to allow cyclic imports. This even hit Alma's very own module issue, #53, and is in some sense still unresolved.

The case for forbidding cyclical imports: They make the module system more complicated. More specifically, the naive/straightforward idea of a module fully binding its dependencies before itself loading, doesn't quite work.
The case for allowing cyclical imports: Mother Nature doesn't care if your modules are mutually dependent or not. More precisely, such a mutual dependency will happen sooner or later (for example, I have one here and here). And yes, they can always be eliminated using enough architectural contortions and dependency injection and stuff. But at that point we're talking about punishing the user on behalf of the implementor.

As you can probably tell, I'm currently leaning towards allowing cycles. In a sense, allowing cycles between modules is tantamount to defining modules via letrec; and when you put it like that, it doesn't sound so bad.

masak · 2023-11-02T06:58:48Z

While lurking around on nlab a while ago, I found this:

A partial function f: A → B is like a function from A to B except that f(x) may not be defined for every element x of A. (Compare a multi-valued function, where f(x) may have several possible values.)

A multi-valued function f: A → B is like a function from A to B except that there may be more than one possible value f(x) for a given element x of A. (Compare a partial-function, where f(x) may not exist at all.)

A while after writing the above, I took the time to understand what nlab means by span, and how that ties into the understanding (or category-theoretical representation/implementation, perhaps) of partial functions and multivalued functions.

It's actually quite simple. No, really.

Start with a normal function, f: A → B; drawn as a category-theory diagram, this looks like two objects A and B, with an arrow f going between them.

Now "split" this diagram into A ← D → B; the new object D in the middle (for "domain") is identical with A for the time being, and so the "→" arrow is the same f as before. The "←" arrow is an identity, since A and D are equal; being an identity, this arrow is both surjective and injective by definition. We have turned the thing into a span, but we've also effectively done nothing.

Now we're set up to implement partial functions and multivalued functions:

Partial function: by making A bigger than D — or perhaps better said, by making the A ← D arrow an embedding — we're saying that the span represents a function that only maps some elements of A (namely, those mapped to from D, the domain) to B. In other words, a partial function. A ← D is no longer surjective.
Multi-valued function: by making the A ← D arrow no-longer-injective — in other words, we allow distinct D elements to map to the same A element — we're saying that the span represents a function that maps certain elements of A to more than one element of B. In other words, a multi-valued function.

Sweet. I particularly like how we create a new backwards arrow from nothing, and it is then "tuned" to successfully explain the otherwise-baffling partial and multivalued functions.

Maybe a good way to say it is that the "→" arrow corresponds to a normal function, while the new factored-out "←" arrow corresponds to a kind of "multiplicity" knob we can tweak.

masak · 2023-11-13T07:11:59Z

Actors similarly take a new primitive operation: (asynchronous) messaging. Actors are by construction so well-protected from each other that not only is the "shared mutable memory" problem completely defined away — but by themselves, actors don't seem to be able to synchronize anything between themselves. I'm still learning about this, but one solution seems to be synchronizers, which could maybe be seen as a common region in which synchronization on shared resources between actors can happen.

Allow me to throw in here, without further comment, the first paragraph of the abstract of the paper When Concurrency Matters: Behaviour-Oriented Concurrency which I just found:

Expressing parallelism and coordination is central for modern concurrent programming. Many mechanisms exist for expressing both parallelism and coordination. However, the design decisions for these two mechanisms are tightly intertwined. We believe that the interdependence of these two mechanisms should be recognised and achieved through a single, powerful primitive. We are not the first to realise this: the prime example is actor model programming, where parallelism arises through fine-grained decomposition of a program's state into actors that are able to execute independently in parallel. However, actor model programming has a serious pain point: updating multiple actors as a single atomic operation is a challenging task.

masak · 2023-11-25T07:11:07Z

Inserting a small reminder here to myself, to later expand this comment into a small discussion about progress indicators, which are a nice illustration of... something. I guess independent processes (UI and task) with bidirectional communication.

As part of researching this, I also found a concept by Donald Norman called Gulf of Execution, a kind of reification of the lack of insight the user has of what the system is currently doing. That feeling when you hit a submit button and... nothing. Is the page submitting my form data now, or did my click accidentally miss the button? There's no feedback, so there's no way to know.

There's also the labor illusion, which arises from the UI and the task being two disconnected activities. The psychological feeling of seeing the progress bar move (either in real percentage or some indeterminate animation) is broken the moment we understand that the task will not progress any more, but the UI claims it will, or does. A bad or unreliable "remaining time" estimate is like a watered-down version of this.

raiph · 2024-01-06T23:21:54Z

Hi, and happy 2024,

I've been disciplined about remaining in pure lurk mode until I'm confident of contributing with a healthy signal-to-not-even-a-tangent ratio, so I will not mention stuff such as concurrency, actors / independent processes with communication, in case I accidentally mention other stuff like what relationship there is, if any, between things like the busy beaver game, unbounded indeterminacy, and quantum indeterminacy.

Instead I am convinced you are exactly the right person, and this is exactly the right thread to be in, to ask that, if you have time this year, please consider reading and doing justice to what is being discussed in this article that was published yesterday: scheme modules vs whole-program compilation: fight, with commentary like this:

This is an annoying result! What do other languages do? Well, mostly they aren’t programmable, in the sense that they don’t have macros. There are some ways to get programmability using e.g. eval in JavaScript, but these systems are not very amenable to “offline” analysis of the kind needed by an ahead-of-time compiler.

For those declarative languages with macros, Scheme included, I understand the state of the art is to expand module-by-module and then stitch together the results of expansion later, using a kind of link-time optimization. You visit a module’s definitions twice: once to evaluate them while expanding, resulting in live definitions that can be used by further syntax expanders, and once to residualize an abstract syntax tree, which will eventually be spliced into the compilation unit.

masak · 2024-01-08T03:05:27Z

Happy 2024, @raiph.

what is being discussed in this article that was published yesterday: scheme modules vs whole-program compilation: fight

Interesting. I will read. And yes, that does feel relevant here to this thread, maybe even to #302.

Will get back to you once I've read the article.

masak · 2024-01-08T06:57:37Z

On the way back to lunch, I got a mental image of a Turing Machine with one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work on the single tape. That's threads. It's a fundamentally unworkable idea.

I'm not the first one to rage against threads, of course. How about, for example, this slide deck Why Threads Are A Bad Idea by John Ousterhout from nineteen ninety five? (Also summarized here.)

Slides are two a page, so on page 3/slide 6, after some contextualization of threads, you finally find out why they are a bad idea:

Synchronization:

Must coordinate access to shared data with locks.

Forget a lock? Corrupted data.

"...one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work..."

Deadlock:

Circular dependencies among locks

Each process waits for some other process: system hangs.

"Use locks, they said. Your threads will be fine if you just use locks, they said." How utterly unfair! Not only do locks, as their primary job, cancel out the concurrency that was the entire point in the first place; there are also ample opportunities to simply hold them wrong and introduce new and painful ways to shoot yourself in the foot. As I wrote in an earlier comment:

locks don't compose and threads don't scale

It doesn't have to be this way. Threads and locks are just poor building blocks. Take somehing utterly comparable, Kahn networks, and see them rush away towards the horizon, accelerating all the while, leaving threads and locks in the dust. Kahn networks scale. Kahn himself is highly aware of this, as seen in his final words in the article:

Our last conclusion is to recall a principle that has been so often fruitful in Computer Science and that is central in Scott’s theory of commutation: a good concept is one that is closed

under arbitrary composition

under recursion.

This (well-earned) pride in how well Kahn networks compose still haunts me. How can we make sure to do more of that, instead of investing in the castle built in a swamp that is locks and threads?

masak · 2024-01-22T03:58:40Z

So what is a nondeterministic algorithm? I make here a provisional attempt at a definition.

A transition relation describes the transitions that are allowed between states inside the system. If this relation is univalent (or functional), that is, from each state (and event) the relation maps to exactly one target state, then the algorithm described by this transition relation is deterministic. Whether or not the univalence restriction holds, the algorithm is nondeterministic.

The inimitable Stephen Wolfram writes about something very similar which he calls multicomputation. The rough idea being this: when we deal with deterministic algorithms, running them is an unambiguous, stepwise activity. But when we deal with a nondeterministic algorithm, the interesting thing is the entire global transition graph — since, almost by definition, most of that graph constitutes the road not taken in any particular run.

Put a bit more simply or poetically, deterministic computation is narrow, but nondeterministic "multicomputation" is wide. There are things going on which involve choices of paths. Things such as winning a chess endgame, or angelic nondeterminism, take as their starting point that we can make informed choices now based on the desired result we want later.

masak · 2024-01-24T02:53:26Z

I find I keep thinking about concurrency. I just stumbled over Wikipedia's article about it, with a surprisingly cogent first two paragraphs, which I will quote verbatim (stripped of links and footnotes, though):

In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability of a program, algorithm, or problem into order-independent or partially-ordered components or units of computation.

According to Rob Pike, concurrency is the composition of independently executing computations, and concurrency is not parallelism: concurrency is about dealing with lots of things at once but parallelism is about doing lots of things at once. Concurrency is about structure, parallelism is about execution, concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.

Several takeaways, in some order:

The description vibes well with jnthn's description (reproduced above), that "concurrency is in the problem domain, parallelism is in the solution domain". Also Russ Cox's piercing comment that concurrency "is interesting for reasons not of efficiency but of clarity", that is, it's the decomposition that matters here; a kind of design/analysis of the problem.
The whole thing reminds me of samefringe. Summarizing that problem, quickly: you have two binary trees. Define a tree's fringe as the sequence of its leaves in traversal order. We want to find out if the two tree's fringes are equal. A non-lazy solution will just generate two lists in memory, and compare them elementwise. A lazy solution can do traversal (of both trees) and comparison in an interleaved fashion, using less memory and break earlier upon finding a difference. But the beauty is indeed in the decomposition: the two traversals, and the task that governs them and does that equality comparison. A lazy language like Haskell will have a solution that is basically two lines of code, and which to the untrained eye looks sequential (because laziness/concurrency is baked in). In C#, the concurrency is moderated via the IEnumerable interface, and operators like Zip and All. Whereas Ada... seems to throw reams of code at the problem, involving tasks and whatnot, but in the end still not get a solution that stops at the first detected difference.
Finally, this put me in mind of Applicative vs Monad from Haskell. Conor McBride has a good description here (section 4.7):

I came to Haskell from Standard ML, where I was used to writing programs in direct style even if they did naughty things like throw exceptions or mutate references. What was I doing in ML? Working on an implementation of an ultra-pure type theory (which may not be named, for legal reasons). When working in that type theory, I couldn’t write direct-style programs which used exceptions, but I cooked up the applicative combinators as a way of getting as close to direct style as possible.

When I moved to Haskell, I was horrified to discover the extent to which people seemed to think that programming in pseudo-imperative do-notation was just punishment for the slightest semantic impurity (apart, of course, from non-termination). I adopted the applicative combinators as a style choice (and went even closer to direct style with "idiom brackets") long before I had a grasp of the semantic distinction, i.e., that they represented a useful weakening of the monad interface. I just didn’t (and still don’t) like the way do-notation requires fragmentation of expression structure and the gratuitous naming of things.

That’s to say, the same things that make functional code more compact and readable than imperative code also make applicative style more compact and readable than do-notation. I appreciate that ApplicativeDo is a great way to make more applicative (and in some cases that means faster) programs that were written in monadic style that you haven’t the time to refactor. But otherwise, I’d argue applicative-when-you-can-but-monadic-when-you-must is also the better way to see what’s going on.

Pure expressions are tree-like, and the data dependencies are more like the "partial order" mentioned in the Wikipedia article. (That is, there are usually many possible ways to topologically sort and execute the individual subexpressions; as long as everything's pure, the result comes out the same.) This expresses a kind of concurrency, a decomposition of the problem where "happens-before" relationships do not appear exactly everywhere. But in monads and strictly sequential computation, they do — the Applicative viewpoint represents an kind of loosening-up of the strict sequentiality, which can lead both to more beautiful decomposition of subproblems, and perhaps even to opportunities for optimization.

jnthn · 2024-01-24T10:39:37Z

Which a succinct description can't be expected to get into subtleties, the final clause of this:

In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome.

Is glossing over the heart of why I say concurrency is part of the problem domain: how do we define an outcome as "not affected"? That's very much domain specific.

When I used to teach on these topics, I had an example involving a data structure where we first applied locking to all of it, and then switched to fine-grained locking to get better throughput on updates, and then I sprung the question: is this still correct? It was a trick question: the problem specification wasn't actually clear either way on whether the partially applied update that we'd now made observable was problematic or not.

masak · 2024-01-25T01:47:26Z

how do we define an outcome as "not affected"? That's very much domain specific.

Paraphrasing you in order to understand better myself, here's what I got:

Yesterady I happened to read about trace monoids (aka dependency graphs, aka history monoids), which are like regular strings except that some pairs of adjacent characters are "loose", commuting with each other freely in zero or more substrings of the string. (Other parts are "fixed", and work like regular substrings.) We define equivalence on traces so that two traces with the same fixed sections, and loose sections which can be permuted into each other, are considered equal. Seen differently, we "quotient away" all the commuting that can go on, and consider each big chunk of inter-commutable substrings as a single individual thing (and the fixed parts a boring special case) — comparison then becomes a matter of directly comparing such things.

The choice is in which things get to commute. ("Things"? They're characters when the traces are presented as strings, but more realistically we're talking about observable events. Hm, event structures seem a slightly distinct thing from trace monoids.) There's a spectrum of sorts between "nothing commutes" (and our trace degenerates to a string) and "everything commutes" (and our trace degenerates to a Bag, I guess). But the spectrum is not a line; it has some interesting lattice-like structure.

and then switched to fine-grained locking to get better throughput on updates, and then I sprung the question: is this still correct? It was a trick question: the problem specification wasn't actually clear either way on whether the partially applied update that we'd now made observable was problematic or not.

By switching to fine-grained locking, some new pairs of events get to commute in the trace monoid. Is this still correct? It depends if we were fine with those events commuting. Were we fine with that? That's a question of specification, and most projects do not have a formal specification process (at least not to that level of detail), and do not ask that question before it arises in practice. "We don't quite know what we want" is the norm, or rather, formal specifications are considered to be for NASA and Boeing, but not for most of the rest of us. Certainly not a random project choosing between a ConcurrentHashMap and a synchronized HashMap.

masak · 2024-04-22T14:26:24Z

Now we're set up to implement partial functions and multivalued functions:

Partial function: by making A bigger than D — or perhaps better said, by making the A ← D arrow an embedding — we're saying that the span represents a function that only maps some elements of A (namely, those mapped to from D, the domain) to B. In other words, a partial function. A ← D is no longer surjective.

Multi-valued function: by making the A ← D arrow no-longer-injective — in other words, we allow distinct D elements to map to the same A element — we're saying that the span represents a function that maps certain elements of A to more than one element of B. In other words, a multi-valued function.

This may or may not have been obvious when I wrote the above, but now I think think the above is about trying to invert a function.

If the function f is bijective, its inverse f⁻¹ is a (bijective) function
If the function f fails to be surjective, its inverse is a partial function
If the function f fails to be injective, its inverse is a multi-valued function

Because the red herring principle applies in the latter two cases, what we're saying is that inverting f didn't quite give us a function; the inverse is partial to the extent f fails to be surjective, and multi-valued to the extent f fails to be injective.

Edit: Applying this perspective to parsing is surprisingly fruitful. You fix a grammar, and take your f to be a forwards function from derivations ("parse trees") to their corresponding input strings. In other words, f just collapses the whole tree structure by straightforwardly concatenating all the terminals together. Now what about the inverse, f⁻¹?

If the function f fails to be surjective, its inverse is a partial function — meaning some input strings don't have a derivation. Very likely to happen, since intuitively "most inputs are not syntactically valid programs", no matter what your grammar is. The negative space, the "holes" where no valid programs reside, are there partly to throw up an error when you wrote some nonsense, instead of blithely accepting it. (Both Perl and APL have been unjustly accused of having too few such holes.)
If the function f fails to be injective, its inverse is a multi-valued function — meaning some inputs have several possible derivations, meaning the grammar is ambiguous.

masak · 2024-04-25T03:46:42Z

Coming back to Pony and ORCA:

Skimming the linked page, it sounds amazing, but it's also extremely short on specifics. I would be a thousand times less doubtful of the claims (sound + complete + concurrent) if there was a link to a paper containing algorithms and a number of proofs. Several questions arise, such as:

How can a dedicated actor collect other actors using just message-passing?

How, specifically, are the very different form factors of reference counting (which is usually centralized) and actor distribution (which has no built-in synchronization mechanism) reconciled?

When the word "transitively" is used, that implies a non-local property that needs checking in non-constant time. How is that property checked in a world where a number of synchronization mechanisms are proudly enumerated as not being used as the solution? (I.e. once you have traversed the object graph in order to confirm the "transitively blocked" property, how do you know that the nodes you checked at the start are still unchanged? "Blocked" implies "has no pending messages in its queue" — what prevents new messages from arriving anywhere in the graph while we are traversing it to assert all actors have empty queues?)

I just found this grimoire of memory management techniques, and Pony+ORCA are number 11 on the list:

11: Tracing garbage collection is familiar to all of us, but there's a surprising twist: there's a secret way to make a garbage collector without the stop-the-world pauses! Pony does this: by separating each actor into its own little world, each actor can do its own garbage collection without stopping the other ones. Its ORCA mechanism then enables sharing data between the worlds via an interesting reference counting message passing mechanism.

This is reassuring (especially as it's written by someone who seems to know what he's talking about). I will try to find out more about how, specifically, Pony achieves this.

masak · 2024-05-08T02:10:57Z

I found another quote by Dijkstra, which talks about parallelism, nondeterminism, and concurrency. (It's in an EWD which criticizes Ada, then called "GREEN".)

When they write (RAT p.1-25) "It was also felt that whereas _nondeterminism_ _is_ _the_ _essence_ _of_ _parallelism_, its introduction for sequential programming would not appear natural to most users." (my underlining) they reflect several misunderstandings in a single sentence. First of all they ignore that it has been recognized already several years ago that nondeterminism and concurrency are separate issues in the sense that both nondeterministic programming languages whose obvious implementation does not introduce concurrency and deterministic programming languages whose implementation obviously admits concurrency are not only quite conceivable, but even worth our attention. Secondly they confuse in their appeal to what seems "natural to most users" the notion "convenient" with the notion "conventional".

Two takeaways:

Dijkstra's main point is that the deterministic/nondetermistic axis is orthogonal from the non-concurrent/concurrent axis. That point is well taken, and I agree.
Dijkstra says "parallelism" when he literally quotes the GREEN rationale, and then consistently uses the word "concurrency" himself. I would write that off as confusion or coincidence, but he did exactly the same in that other quote I found. To me, the most likely hypothesis is that Dijkstra thinks of "parallelism" and "concurrency" as identical concepts, but he prefers to use the term "concurrency" himself.

masak · 2024-06-20T11:01:26Z

Again, on parallelism versus concurrency; I just found this very nice summary by Guy Steele, in his talk "How to Think about Parallel Programming: Not", @ timestamp 29:45:

I want to distinguish between parallelism and concurrency here.

There are some kinds of code which are modeling things which are going on in the real world, and are connected with other processes you have no control over, other people, the internet, that kind of thing — that calls for concurrency. Usually those are situations where you have different things going on at the same time, and they're competing for resources. They're competing for processors, they're competing for memory, for a database, for the user's attention. There's some form of competition.

I'm talking about parallelism where there's either one task at hand, but you have many resources and you're trying to bring all those resources to bear to solve one problem. They're working cooperatively rather than competitively.

In a typical application, there are some parts that need to be concurrent, there are some parts where you really care that something be sequential — I really care that this happens, then this, then this — and there's a large mass in the middle where it doesn't have to be sequential, it doesn't have to be strictly concurrent, but if there are parallel resources that can be brought to bear we'd like to do that, and do it effectively without having to worry too much about it.

Interesting. I take two things from this:

Concurrency is imposed upon us (by the problem domain), parallelism is a tool we choose (in the solution domain).
Parallelism is somehow in the middle of a spectrum, where the endpoints are "this needs to be sequential" and "this needs to be concurrent/is concurrent whether you want it or not".

masak · 2024-07-24T02:32:28Z

Dropping this blog post, Ruby methods are colorless, into this ongoing issue, because it feels like it provides additional value in the discussion. I like the nesting relation Fiber < Thread < Ractor < Process; it feels clean, somehow.

Hoping I can circle back and delve a bit more into the text of that post sometime later.

HN discussion.

masak · 2024-07-26T06:01:57Z

Picking up the partial function/multivalued function trail a little bit (which is next door, conceptually, to failure semantics and nondeterminism):

I found the first half of the video Converting Non-Deterministic Finite Automata to Deterministic Finite Automata quite helpful in gently providing a conceptual understanding of what's happening in automata:

First introducing "the empty set" as meaning "we transitioned to no state"
Then showing how we can model this introducing a fresh auxiliary "failure state"
(Somehow along the way sneaking in the notion that we are actually transitioning to sets of states, not states)
Then showing how nondeterminism in automata is nothing more or less than transitioning to a set of states

The second half of the video basically goes through the powerset construction and how to carry it out in practice.

Then showing how to "upgrade" states to actually mean "sets of states"; this move ends up being a closure operator rather than an infinite regress, because we can identify the (old) states with the (new) singleton states, and then we just get O(2^N) new states to handle

I plan to, when time permits, do a deeper dive into Finite Automata and Their Decision Problems; with all the confusion out there about nondeterminism, this seems to be the only way to ground things in a definitive truth.

masak · 2025-01-08T07:54:16Z

Just dropping in here to quickly dump Why I Want Tail Recursion (by Joe Marshall) into the stew that is this ongoing discussion of large-scale language features.

The reason I want tail recursion is [...] to write in continuation passing style if I need to.

If your language has lambda and tail recursion, it can implement any other control flow that might have been overlooked by the language designer. If it doesn’t, you're limited to the control flow the language designer bothered to implement.

I agree with this message, and think that TCO should be the default in more languages, not just Scheme, SML, OCaml and a few others. But there's something going on wherein enter/exit stack-based thinking acts as a very strong attractor, and the "majority vote", whereas continuation-based thinking is relegated to the sidelines as a fringe idea.

I hope to write more about that here, and something I've started thinking of as "the Standard Model" of computing/evaluation in programming languages. It encompasses the above stack/CPS axis, but also several more.

raiph · 2025-01-08T20:51:34Z

Happy 2025. 😊

something I've started thinking of as "the Standard Model" of computing/evaluation

.oO ( "Classic human thinking in early 2025!" -- o3 )

Excerpting the abstract of a paper published in April 2024:

In this work, we provide a complete characterization of the properties of control flow abstractions that are correctly realizable on a quantum computer. First, we prove [a theorem that] denies the ability to directly lift general abstractions for control flow such as the lambda-calculus from classical to quantum programming. In response, we present the necessary and sufficient conditions for control flow to be correctly realizable on a quantum computer. We introduce the quantum control machine, an instruction set architecture featuring a conditional jump that is restricted to satisfy these conditions. We show how this design enables a developer to correctly express control flow in quantum algorithms using a program counter in place of logic gates.

masak · 2025-01-09T01:59:38Z

Happy 2025. 😊

And to you as well, sir. Here's to this perfect square of a year 🍻; may it be the appropriate amount of perfect, especially as far as realizing the incipient Alma spec is concerned!

something I've started thinking of as "the Standard Model" of computing/evaluation

.oO ( "Classic human thinking in early 2025!" -- o3 )

Heh, I just realized that writing things in public about the current LLM/chatbot hype is extremely fraught, since things are changing so rapidly, and it's just overwhelmingly easy to end up on the wrong side of history somehow. 😄 So, in response, I'll just chuckle dryly, and then drop the subject...

Instead of outlining the entire Standard Model (for which the margin in which I write this reply is not wide enough), let me just give one obvious example: mainstream languages are deterministic by default. This is a strong default for "physical" reasons: the machine the program executes on has one "current state" per thread of execution, and the program makes progress by destructively updating that state. A properly nondeterministic setup would allow for the current state (a) to be duplicated into N copies, each of which is then destructively updated and each of which is equally considered the new current state, or (b) to be used as the basis for non-destructively creating N new current states (without destructive updates). Either way, physical machines only have one "slot" for the current state (usually in the form of physical parts of the CPU), and so can't model or even easily emulate nondeterminism. (And when we try, we end up with crude approximations, such as backtracking, where different potential paths "take turns" being the current state.)

And yet, there is a sense in which nondeterminism is a cleaner, more pure, less special-cased notion of computation. (Leslie Lamport explains it well, in my opinion. But it's really Dijkstra's old insight about nondeterminism being a way to avoid overspecification, to constrain the program less.) I guess the long and short of it is that if we think abstractly about computation, then nondeterminism is a beautiful (and powerful) tool; but if we think concretely about machines ("It's called Computer Science, not Adam and Steve!") then we're constrained by pesky physical limitations.

Excerpting the abstract of a paper published in April 2024:

In this work, we provide a complete characterization of the properties of control flow abstractions that are correctly realizable on a quantum computer. First, we prove [a theorem that] denies the ability to directly lift general abstractions for control flow such as the lambda-calculus from classical to quantum programming. In response, we present the necessary and sufficient conditions for control flow to be correctly realizable on a quantum computer. We introduce the quantum control machine, an instruction set architecture featuring a conditional jump that is restricted to satisfy these conditions. We show how this design enables a developer to correctly express control flow in quantum algorithms using a program counter in place of logic gates.

I will read this; thanks. By sheer coincidence, this week I've been cracking open my Mike and Ike in order to re-acquaint myself with quantum computing. I guess my feeling was that quantum computing is in the air these days, and it doesn't hurt to be up-to-date on how it works.

(One thing I learned, incidentally, is that although quantum teleportation is real and Actually Works, it cannot be used the way described in Singularity Sky by Charles Stross, for faster-than light communication using separated entangled qubits in a "one-time pad". The reason is that for each qubit instantaneously "sent", one needs to also send two classical bits, otherwise there's no way to read off the qubit at the other end. Much like the Eschaton in that story, Nature seems to have natural limitations that prevent causality violations.)

masak · 2025-02-07T03:20:45Z

Taking a step back, there's a sense in which this issue thread derailed (in the best of ways) to talking about all kinds of "in the small" issues: multiple values, the object/closure koan, event models, basic blocks, CPS, structural type systems, assignables/mutables, threads and locks, happened-before relations and weak memory models, the ever-elusive nondeterminism, effects, stack-based VMs, what bloody color your function is, parallelism and concurrency, angelic models, optional chaining and the management of partiality, cyclic imports, asynchronous tasks and progress indicators (which I still hope to write more about), synchronization and deadlock, Stephen Wolfram's bloody multicomputation, Guy Steele's adorable obsession with monoids and the associative law, and (barely mentioned but on my mind a lot, and lingering in the fringes of some of the discussions) linear types, substructural thinking, and resources. All of which is to say, when we dream about computing in our philosophy, there are more things in Heaven and Earth. Some of them end up in this issue thread. I like that.

I just wanted to make a relatively simple point in this comment, actually. Namely, the three terms "library", "module", and "package" have all been used for a similar concept during the history of computing: a unit of code promoting distribution, re-use, encapsulation, and modularity. But because of their different flavor/origin/connotation, they seem to put the focus on different things, even when there's overlap.

I thought it might be interesting to highlight the origins and etymology of the three words.

Library. From Latin libraria ("collection of books") and librarium ("container for books"). See also Ancient Greek βιβλιοθήκη (bibliothēkē), also meaning "book container". This idea was so obvious for computing that Babbage used the term "library" in the software sense in a paper in 1888. Fortran, Cobol, and Simula all seem to have encouraged use of the "library" concept in the software sense. IBM System/360 did, too. More than the other two concepts, libraries are more directly associated with the act of linking, which says more about our actual history of computing than about the original term.
Module. Latin modus ("measure" or "manner") is the same root as in e.g. "modal logic". But modulus is a diminutive and means "a small measure". This is a very popular word, in physics, mathematics, and computing. (For example, the "modulo operation" % is based on this root; probably an ablative modulō?) Sometimes other words are used for this concept, such as "assembly" (.NET), "package" (Go, Dart, Java), or "unit" (Pascal). Inextricably associated with the languages Modula and Modula-2 (especially the latter), the idea of a module is clearly born out of the preceived solutions to the software crisis of 1968 (which never truly was declared resolved). The focus is of a module as an interchangeable, independent part of a bigger software system. This leads naturally to other beneficial concerns such as information hiding (à la David Parnas), and APIs/interfaces (à la Reynolds and Liskov, to name just two).
Package. From Dutch pakkage ("baggage"); the ending -age is clearly French, and the word pack "appears early in Anglo-Latin and Anglo-Norman French in connection with the wool trade". A synonym is "bundle", also a word that comes from Dutch, and which seems to be a past participle of "bind", which comes from Sanskrit बन्धति ("bhandhati"). This word "bundle" has a more specific sense in the JavaScript community, ultimately rooted in the lack of a language-sanctioned module system pre-ES6. Anyway, the focus of "package" is about the container itself, enclosing or protecting a product for distribution and use. There's a sense in which the package is "active" during shipping to a destination, and then this outer shell is opened and discarded, and the sensitive contents are "unpacked" and used. Rust speaks here of crates, which is also a clear shipping metaphor. I don't know if crates are different from packages in any significant way. Docker has a whole container metaphor going; although now moving away a little from the pure library/module/package axis, clearly also a shipping metaphor. Docker's logotype is a whale carrying shipping containers. Some package managers, notably CPAN, make a noun out of "distribution" and treat that as a shippable unit, which then contains one or more packages.

Anyway, this has been today's etymology and connotation musing. I don't have a grand conclusion to this, but I find it endlessly fascinating how programming is inextricably made up of metaphor, and the words themselves are like rich lodes of buried ideas that we can mine for connection and meaning.

masak · 2025-02-07T03:36:22Z

The focus is of a module as an interchangeable, independent part of a bigger software system.

I wanted to also mention, but forgot, how this ideal of "interchangeable parts" was practically never reached in real-world languages and systems. There's probably a fascinating untold story about why that is — and digging deeper into it might be worthwhile — but as a first approximation, the long list of "in the small" aspects of programming that I summarized in the above comment tend to leak out into the modularization effort, and we're simply not at a point in history where we're on top of that leakage. Attempts at making this better sometimes fail (such as Java's checked exceptions), and sometimes fare better (Rust's ownership and borrow checker, Koka's effects, GOSPEL). But the story on plugging the leakage is clearly not fully written.

masak · 2025-02-07T05:28:36Z

[...] as a first approximation, the long list of "in the small" aspects of programming that I summarized in the above comment tend to leak out into the modularization effort [...]

Meanwhile, by Bartosz Milewski on Twitter:

If you define OO as programming against well-defined interfaces, it makes perfect sense. The problem is how much information is encoded in the interface and how much is hidden. OO opts for hiding state and side effects, which is fine as long as you don't care about concurrency.

(This was the insight I had when reading the first chapter of Java Concurrency in Practice. The feeling was mostly wordless, but basically "Oh wow, OOP really sucks at its job, doesn't it? It promised encapsulation of private data, and then it falls apart like a house of cards at the first data race." Like, if those are your fundamental building blocks, there's no reliable way to get it right.)

masak · 2025-02-07T06:40:59Z

A while after writing the above, I took the time to understand what nlab means by span, and how that ties into the understanding (or category-theoretical representation/implementation, perhaps) of partial functions and multivalued functions.

Another useful way to think about it is via preimages and fibers.

Visualize that usual image with two sets A and B (drawn as standing-up ellipses, traditionally). The function A → B is shown as directed arrows going left-to-right, from elements in A to elements in B. Now, as usual with functions (as opposed to more general binary relations):

Each element in A has at least one arrow going out of it. (The function is total, that is, it's not a partial function.)
Each element in A has at most one arrow going out of it. (The function is single-valued, that is, it's not a multivariate function.)

Note how we're imposing restrictions only on the domain A's end of the arrows; there are no restrictions at all on the codomain B's arrowheads. In particular, a function is allowed to fail to be surjective, which means there will be elements of B without an arrow pointing to them; and a function is allowed to fail to be injective, which means there will be elements of B to which more than one arrow points. (Again, none of this is much of a problem, until you start wanting to invert the function.)

The preimage or "inverse image" of a subset of B is the set of all elements of A having an arrow pointing into that subset of B. So it's "back-translating" the subset of B along the function. Since we're doing it on a subset and not a single element, we have already shielded us against the issues of partiality and multivariance — an element of B to which no arrow points simply doesn't show up in the resulting subset of A; an element of B to which several arrows point is simply represented by all those elements in A that point to it.

A fiber is a special case of a preimage, when we're considering a singleton subset of B; that is, we're asking the question about just one point of B. Wikipedia points out that there is frequent "abuse of notation": people want to treat this inverse operation as if it were a function, even though there's no guarantee that it is.

Is this useful? I say that it is. One of my favorite papers, "Parsing as a lifting problem and the Chomsky-Schützenberger representation theorem", invokes taking fibers on the task of parsing using a context-free grammar:

We will see that many standard concepts and properties of context-free grammars and languages can be formulated within this framework, thereby admitting simpler analysis, and that parsing may indeed be profitably considered from a fibrational perspective, as a lifting problem along a functor from a freely generated operad.

The setup is something like this: a grammar generates a language, which is a set of strings. This generation process is the "forwards" direction of the function. When we ask "does this input string parse according to this grammar", we're conceptually running the generation function in reverse, going from a possible string to zero, one, or many possible derivations/parse trees of that string. We're asking a question about a fiber.

masak mentioned this issue Aug 5, 2024

Relationship of P6 and/or 007 macros to FEXPRs #302

Closed

Reasons for modules #569

Reasons for modules #569

Comments

masak commented Oct 6, 2021 • edited Loading

masak commented Oct 6, 2021

Functions

masak commented Oct 6, 2021 • edited Loading

Coroutines

masak commented Oct 6, 2021 • edited Loading

First-class functions

masak commented Oct 6, 2021 • edited Loading

Continuations

masak commented Oct 6, 2021

vendethiel commented Oct 11, 2021

masak commented Oct 11, 2021

vendethiel commented Oct 11, 2021 • edited by masak Loading

masak commented Oct 12, 2021

vendethiel commented Oct 12, 2021 • edited Loading

masak commented Dec 25, 2021

masak commented Feb 24, 2022

masak commented Mar 9, 2022

First-class functions

masak commented Mar 16, 2022

masak commented Apr 2, 2022

raiph commented Apr 2, 2022 • edited Loading

masak commented Apr 8, 2022 • edited Loading

masak commented Apr 8, 2022

masak commented May 25, 2022

masak commented Jul 12, 2022 • edited Loading

raiph commented Jul 13, 2022

masak commented Jul 14, 2022 • edited Loading

masak commented Jul 14, 2022

masak commented Jul 14, 2022

Non-determinism

masak commented Jul 19, 2022

masak commented Jul 21, 2022

masak commented Jul 29, 2022

masak commented Jul 30, 2022 • edited Loading

Warning

masak commented Jul 30, 2022 • edited Loading

masak commented Aug 9, 2022

masak commented Oct 2, 2023

Modules

masak commented Oct 12, 2023 • edited Loading

masak commented Oct 17, 2023 • edited Loading

masak commented Oct 19, 2023

masak commented Oct 19, 2023 • edited Loading

masak commented Oct 31, 2023 • edited Loading

masak commented Nov 2, 2023 • edited Loading

masak commented Nov 13, 2023 • edited Loading

masak commented Nov 25, 2023 • edited Loading

raiph commented Jan 6, 2024

masak commented Jan 8, 2024

masak commented Jan 8, 2024 • edited Loading

masak commented Jan 22, 2024 • edited Loading

masak commented Jan 24, 2024

jnthn commented Jan 24, 2024

masak commented Jan 25, 2024 • edited Loading

masak commented Apr 22, 2024 • edited Loading

masak commented Apr 25, 2024

masak commented May 8, 2024 • edited Loading

masak commented Jun 20, 2024

masak commented Jul 24, 2024

masak commented Jul 26, 2024 • edited Loading

masak commented Jan 8, 2025

raiph commented Jan 8, 2025 • edited by masak Loading

masak commented Jan 9, 2025

masak commented Feb 7, 2025 • edited Loading

masak commented Feb 7, 2025

masak commented Feb 7, 2025

masak commented Feb 7, 2025

masak commented Oct 6, 2021 •

edited

Loading

masak commented Oct 6, 2021 •

edited

Loading

masak commented Oct 6, 2021 •

edited

Loading

masak commented Oct 6, 2021 •

edited

Loading

vendethiel commented Oct 11, 2021 •

edited by masak

Loading

vendethiel commented Oct 12, 2021 •

edited

Loading

raiph commented Apr 2, 2022 •

edited

Loading

masak commented Apr 8, 2022 •

edited

Loading

masak commented Jul 12, 2022 •

edited

Loading

masak commented Jul 14, 2022 •

edited

Loading

masak commented Jul 30, 2022 •

edited

Loading

masak commented Jul 30, 2022 •

edited

Loading

masak commented Oct 12, 2023 •

edited

Loading

masak commented Oct 17, 2023 •

edited

Loading

masak commented Oct 19, 2023 •

edited

Loading

masak commented Oct 31, 2023 •

edited

Loading

masak commented Nov 2, 2023 •

edited

Loading

masak commented Nov 13, 2023 •

edited

Loading

masak commented Nov 25, 2023 •

edited

Loading

masak commented Jan 8, 2024 •

edited

Loading

masak commented Jan 22, 2024 •

edited

Loading

masak commented Jan 25, 2024 •

edited

Loading

masak commented Apr 22, 2024 •

edited

Loading

masak commented May 8, 2024 •

edited

Loading

masak commented Jul 26, 2024 •

edited

Loading

raiph commented Jan 8, 2025 •

edited by masak

Loading

masak commented Feb 7, 2025 •

edited

Loading