Keep Working, Worker Bee!

11.08.2005

Are disjoint unions a lurking horror of typed programming languages?

I've been reading about the XDuce project recently for our PL reading group here at Chicago. XDuce is a domain-specific language for writing XML transformations in a totally type-safe way: not only do XML values need to match their schemas, but your program needs to respect that schema too. For instance, if you're writing a program to process XML documents that look like

<A><B>[some string]</B><B>[some other string]</B> ...</A>

then the type system won't let you try to select a tag named <C>...<C> out of it.

To make that happen, XDuce introduces a really neat idea called regular-expression types. Regular expression types are just what they sound like: in addition to ground types like string, you also get sequencing, Kleene-star, alternation, and so on. (Also, because it's XML, you get "tag types": A[ty] is a tag named A whose contents have the given type ty.) So a possible type for the XML documents above might be A[B[string]*]. Pretty cool. But the really cool part is that they define the subtyping relation to be the same as the subset relation: each type describes a set of trees, and if you have two types t1 and t2, and t1 describes a strict subset of t2 then t1 <: t2. For instance,

A[B[string],B[string],B[string]] <: A[B[string]*]

and
A[B[string]*] <: A[any]

This is given in the XDuce literature purely as a way of making a type system that would work with XML data, and I think it's obviously useful for that purpose. During our discussion, though, the reading group started speculating as to whether it would be useful for general programming too. The MLians in the group (i.e. everyone but me) argued that it wouldn't be, since ML's tagged union types give you the same power in a more organized way.

That got me thinking: at the most fundamental level, how is this system different from ML's normal system? I think the answer is, ML forces all unions to be completely disjoint. You can say
datatype myunion = A of int | B of int
but any value of type myunion can only ever be one of the two possibilities, not both simultaneously. At its core, I think XDuce's regular expression types don't force that on you: you might well have a single value that is an instance of both alternatives in a union type, and that's where both the power and the complication come in.

So here's what I'm wondering now. When programming in ML, I've never actively noticed myself contorting programs so that I can make a particular union type disjoint when it wouldn't naturally be that way, but is that just because I've never thought of that as an option? Are lots of ML programs more complicated than they ought to be because of this restriction?

I think every language has its set of "lurking horrors," limitations or design decisions that force you to write particular programs in unnatural ways. I call them "lurking" because you don't necessarily realize you're being limited while you're coding; usually you've got to have some sort of epiphany, like learning a new language that doesn't have the same restriction, to realize that you've been tied up the whole time. Lack of higher-order functions is a big example of a lurking horror; type systems without polymorphism or option types are another. Are disjoint unions a third?

0 Comments:

Post a Comment

<< Home