losing language features: some stories about disjoint unions
Thursday, 17 July 2025 23:56![[personal profile]](https://www.dreamwidth.org/img/silk/identity/user.png)
A long time ago I wrote on twitter (now erased): "surprising how much computer stuff makes sense viewed as tragic deprivation of sum types".
Sum types (a.k.a. disjoint unions a.k.a. tagged unions a.k.a. safe variant types) are of course wonderful and nice and everyone should have them. ML and Haskell have them, and now Rust has them, as does Swift, and Scala with case classes, and C++ kinda with std::variant, and .. lots of modern languages are growing them. Great, hurrah.
But there is just a little bit of subtlety to doing them right. The subtlety is that you have to build the language to couple together two pieces of data -- a tag (or "discriminant") and an unsafe union -- with mandatory syntactic constructs (switch or match expressions) so that you only get to access the unsafe union elements when you've already checked the tag, and the type system and name resolution systems know you're in a given case-block, and so only let you access to the union-fields (properly typed) corresponding to the tag case. It's not a hugely complex thing to get right, but you have to get it right.
There are three main degenerate ways you can fail to get it right.
Ok so .. Casey Muratori gave a great recent talk on the origin of, well, certain OOP-y habits of encapsulation, which is mostly about Entity-Component-System organization, but .. also a bit about sum types. He spent a lot of time digging in the PL literature and reconstructing the history of idea transmission. I just watched it, and it's a great talk and you should go watch it, it's here:
One of the things he discusses in here is that safe and correctly-designed disjoint unions aren't just an ML thing, they were around in the early 60s at least. Wikipedia thinks Algol 68. Muratori places their origin in Doug Ross and/or Tony Hoare talking about amendments to Algol 60, linking to this Tony Hoare paper but of course Hoare references PL/I there and I'm honestly not sure about the exact origin, it's somewhere around there. Point being it predates ML by probably a decade.
But another thing Muratori points out is that is that Dahl and Nygaard copied the feature in safe working form into Simula, and Stroustrup knew about it and intentionally dropped it from C++, thinking it inferior to the encapsulation you get from inheritance. This is funny! Because of course C already had case #3 above -- completely unchecked/unsafe unions, they only showed up in 1976 C, goodness knows why they decided on that -- and the safe(ish) std::variant type has taken forever to regrow in C++.
I was happy to hear this, because it mirrors to some extent another funny story I have reconstructed from my own digging in the literature. Namely about the language Mesa, a Butler Lampson project from PARC, very far ahead of its time too. There's far too much to discuss about Mesa to get into here -- separate compilation, interface/implementation splits, etc. -- but it did also have safe variant types (along with a degenerate form called "computed" which is unsafe). Presumably it picked them up from Algol 68, or Simula 67, or one of probably dozens of languges in the late 60s / early 70s it emerged from. No big deal.
Where that relatively unremarkable fact turns into a funny story is during a "technology transfer" event that happened during Mesa's life: Niklaus Wirth took a sabbatical to PARC, and became quite enamoured with Mesa, and went back home to work on what became Modula and eventually Modula 2. But Modula 2 had only degenerate variant types! He copied them not from Mesa, but in the same busted form they existed in Pascal, completely missing that Mesa did them correctly. Modula 2 variants are degenerate case #1 above (if you turn on a special checking compilation mode) otherwise case #2: tags declared but no checking at all! He even writes it up in the report on Modula 2 and Oberon, criticizing its lack of safety while simultaneously citing all the ways Mesa influenced his design. Evidently not enough.
Anyway, all this is to say: language features are easily broken, mis-copied, forgotten or intentionally omitted due to the designer's pet beliefs. Progress is very circuitous, if it exists at all!
Sum types (a.k.a. disjoint unions a.k.a. tagged unions a.k.a. safe variant types) are of course wonderful and nice and everyone should have them. ML and Haskell have them, and now Rust has them, as does Swift, and Scala with case classes, and C++ kinda with std::variant, and .. lots of modern languages are growing them. Great, hurrah.
But there is just a little bit of subtlety to doing them right. The subtlety is that you have to build the language to couple together two pieces of data -- a tag (or "discriminant") and an unsafe union -- with mandatory syntactic constructs (switch or match expressions) so that you only get to access the unsafe union elements when you've already checked the tag, and the type system and name resolution systems know you're in a given case-block, and so only let you access to the union-fields (properly typed) corresponding to the tag case. It's not a hugely complex thing to get right, but you have to get it right.
There are three main degenerate ways you can fail to get it right.
- You can give users syntactically unguarded access to union members, say by using container.field syntax, in which case all you can do if the tag doesn't match that field at runtime is to raise a runtime error, which you can at least do systematically, but the ergonomics are lousy: it's inefficient (you wind up checking twice) and it doesn't help the user avoid the runtime error by statically forcing cases to be handled.
- You can do #1 but then also fail to even raise a runtime error when the tag is wrong. At which point the tag is basically "advisory", so then...
- You can even go a step further than #2 and not even require users to declare a tag. Just let the user "know the right case" using some unspecified method, an invariant they have to track themselves. They can use a tag if they like, or some bits hidden somewhere else, who knows.
Ok so .. Casey Muratori gave a great recent talk on the origin of, well, certain OOP-y habits of encapsulation, which is mostly about Entity-Component-System organization, but .. also a bit about sum types. He spent a lot of time digging in the PL literature and reconstructing the history of idea transmission. I just watched it, and it's a great talk and you should go watch it, it's here:
One of the things he discusses in here is that safe and correctly-designed disjoint unions aren't just an ML thing, they were around in the early 60s at least. Wikipedia thinks Algol 68. Muratori places their origin in Doug Ross and/or Tony Hoare talking about amendments to Algol 60, linking to this Tony Hoare paper but of course Hoare references PL/I there and I'm honestly not sure about the exact origin, it's somewhere around there. Point being it predates ML by probably a decade.
But another thing Muratori points out is that is that Dahl and Nygaard copied the feature in safe working form into Simula, and Stroustrup knew about it and intentionally dropped it from C++, thinking it inferior to the encapsulation you get from inheritance. This is funny! Because of course C already had case #3 above -- completely unchecked/unsafe unions, they only showed up in 1976 C, goodness knows why they decided on that -- and the safe(ish) std::variant type has taken forever to regrow in C++.
I was happy to hear this, because it mirrors to some extent another funny story I have reconstructed from my own digging in the literature. Namely about the language Mesa, a Butler Lampson project from PARC, very far ahead of its time too. There's far too much to discuss about Mesa to get into here -- separate compilation, interface/implementation splits, etc. -- but it did also have safe variant types (along with a degenerate form called "computed" which is unsafe). Presumably it picked them up from Algol 68, or Simula 67, or one of probably dozens of languges in the late 60s / early 70s it emerged from. No big deal.
Where that relatively unremarkable fact turns into a funny story is during a "technology transfer" event that happened during Mesa's life: Niklaus Wirth took a sabbatical to PARC, and became quite enamoured with Mesa, and went back home to work on what became Modula and eventually Modula 2. But Modula 2 had only degenerate variant types! He copied them not from Mesa, but in the same busted form they existed in Pascal, completely missing that Mesa did them correctly. Modula 2 variants are degenerate case #1 above (if you turn on a special checking compilation mode) otherwise case #2: tags declared but no checking at all! He even writes it up in the report on Modula 2 and Oberon, criticizing its lack of safety while simultaneously citing all the ways Mesa influenced his design. Evidently not enough.
Anyway, all this is to say: language features are easily broken, mis-copied, forgotten or intentionally omitted due to the designer's pet beliefs. Progress is very circuitous, if it exists at all!