Keep Working, Worker Bee!


What's wrong with computer science?

At U of C, computer science is a part of the Physical Sciences Division (along with math; hey, I don't make the org charts). It turns out that twice a year, a committee comprising about thirty business and academic leaders from around the country and the world comes in to visit; the PSD puts on several presentations saying what it's doing and what it plans to do, and the committee gives it advice. The most recent meeting of this committee was yesterday, and as part of it I was asked to be on a discussion panel about my research experience.

I told them about PLaneT; I told them about the Freshman Dating Study (which incidentally just had some of its results published in a journal paper); I told them about going to Estonia and Aachen and so on. It was fun. After I and the rest of the panel had given our little introductions, the visiting committee asked us questions for about an hour. Most of them were about being a woman in the sciences, which I'm not really very qualified to answer, but I did get asked one really good question: what, a committee member asked me, was the worst part of the computer science department?

I had to take a second to think about that. What I ended up saying was that there is a tremendous amount of need in all the sciences for the expertise we've got, and we're letting a lot of it go unmet. We actually do a lot of outreach — the scientific computing people, the computational linguistics people, the whole bioinformatics group — but these efforts are drops in the bucket. Everyone in every scientific discipline needs to know enough about programming to make computers work for them, and we in computer science could be making sure that all those people learn how to do things the right way. As it is they learn from each other, which sounds to me like a recipe for propagating suboptimal programming techniques and depending on old and bad technologies.


Bruce Tate's Technologies to Watch article at O'Reilly's OnJava site points out four new technologies that he says may threaten Java's dominance. They are:
  1. Dynamic Languages
  2. Continuation Servers
  3. Convention Over Configuration
  4. Metaprogramming

Interesting ... Scheme has been at the center of three of these four new technologies since their beginnings.


Are disjoint unions a lurking horror of typed programming languages?

I've been reading about the XDuce project recently for our PL reading group here at Chicago. XDuce is a domain-specific language for writing XML transformations in a totally type-safe way: not only do XML values need to match their schemas, but your program needs to respect that schema too. For instance, if you're writing a program to process XML documents that look like

<A><B>[some string]</B><B>[some other string]</B> ...</A>

then the type system won't let you try to select a tag named <C>...<C> out of it.

To make that happen, XDuce introduces a really neat idea called regular-expression types. Regular expression types are just what they sound like: in addition to ground types like string, you also get sequencing, Kleene-star, alternation, and so on. (Also, because it's XML, you get "tag types": A[ty] is a tag named A whose contents have the given type ty.) So a possible type for the XML documents above might be A[B[string]*]. Pretty cool. But the really cool part is that they define the subtyping relation to be the same as the subset relation: each type describes a set of trees, and if you have two types t1 and t2, and t1 describes a strict subset of t2 then t1 <: t2. For instance,

A[B[string],B[string],B[string]] <: A[B[string]*]

A[B[string]*] <: A[any]

This is given in the XDuce literature purely as a way of making a type system that would work with XML data, and I think it's obviously useful for that purpose. During our discussion, though, the reading group started speculating as to whether it would be useful for general programming too. The MLians in the group (i.e. everyone but me) argued that it wouldn't be, since ML's tagged union types give you the same power in a more organized way.

That got me thinking: at the most fundamental level, how is this system different from ML's normal system? I think the answer is, ML forces all unions to be completely disjoint. You can say
datatype myunion = A of int | B of int
but any value of type myunion can only ever be one of the two possibilities, not both simultaneously. At its core, I think XDuce's regular expression types don't force that on you: you might well have a single value that is an instance of both alternatives in a union type, and that's where both the power and the complication come in.

So here's what I'm wondering now. When programming in ML, I've never actively noticed myself contorting programs so that I can make a particular union type disjoint when it wouldn't naturally be that way, but is that just because I've never thought of that as an option? Are lots of ML programs more complicated than they ought to be because of this restriction?

I think every language has its set of "lurking horrors," limitations or design decisions that force you to write particular programs in unnatural ways. I call them "lurking" because you don't necessarily realize you're being limited while you're coding; usually you've got to have some sort of epiphany, like learning a new language that doesn't have the same restriction, to realize that you've been tied up the whole time. Lack of higher-order functions is a big example of a lurking horror; type systems without polymorphism or option types are another. Are disjoint unions a third?


Between his talk here at U of C and his invited talk at ICFP this year, Bob Harper has convinced me I ought to look into Twelf: even us "regular folks" who aren't interested making up logics and seeing what neat theorems a program can deduce unaided (ivory tower!) but instead just want to prove regular old theorems about properties of programming languages (practically blue-collar!) can derive benefit from it. So I went to the web site, downloaded and installed the software and started in on Andrew Appel's tutorial.

First impression: cool! I'm not at all skilled at it yet — I've been able to prove that (A or B) implies (B or A), (A implies B implies C) implies ((A and B) implies C) and similar teensy little logical theorems but nothing bigger. Nonetheless, when I'm writing theorems in Twelf I feel like I'm in a cool place where I'm doing programming (easy!) but the result is math (hard!). Of course that's not really fair, since the programming is very mathematical and the proofs very easy; we'll see how I feel when I get to the point where I'm using it to prove type-soundness theorems and the like.

(Oh, and note to the myriad Twelf implementors that undoubtedly read this page regularly: the README doesn't tell you that if you've only got SML/NJ installed you've got to say ./smlnj/ rather than make. I was about to give up when I noticed that smlnj/ directory in the distribution archive.)