Keep Working, Worker Bee!: 2005.06

6.30.2005

Truly frustrating day today — I spent nearly the whole day trying to get a particular programming thing for the contest working, and couldn't get it working. There is lots and lots of technology in the way at the moment. No fun.

I also started, over the course of yesterday and today, writing up the multilanguage semantics paper after finally deciding that no new actual research was going to be done. What I've got so far is a pretty good picture of how to model the interactions between multiple-language systems, but it is lacking concrete examples of non-lambda-calculus-based languages and it has no large language system worked out. I had initially looked into modelling one of the two popular Java calculi (Featherweight and Classic) connected to a lambda-calculus-based system, but I quickly discovered that even though they are simple for Java models, they are not in any way simple enough that I could actually say interesting things about the embedding without huge extraneous gobs of gibberish about the Java model itself, and that's not something I really care to do. I was also recently pointed to the calculi by Abadi and Cardelli, but I haven't been able to figure them out yet and time is probably a bit too tight if I'm going to submit to POPL.

I definitely want to flesh out an example with OO, and since the author of ProfessorJ is here and recently got an OOPSLA paper about ProfJ/Scheme interoperability accepted, it seems like a perfect fit to try to prove something like soundness for that. But of course I can't do that before the POPL deadline. I think what I've already got now is a good story, and complete in itself, and worth reporting as is, but since I can see all these next steps I can't help but seeing it as a list of things that aren't done yet instead of a list of things I've already done. Oh well, we'll see what they say (if I can even get a draft ready in time!).

6.28.2005

PLaneT is free now. It's no secret that PLaneT's submission mechanism totally blows, and fortunately for me others think this too and decided to help me out. Thanks to them, people may get their packages updated. Thanks.

I spent an inordinate amount of time writing and rewriting the first section of my POPL paper. It is very very hard to get these ideas expressed; I hope they make sense three weeks from now. When I wasn't doing that, I was working on contest stuff and class stuff. Tomorrow we'll be doing the 3D lab.

6.27.2005

Contest phase 1 is over. Running the ICFP contest has been a blast so far; I recommend that you all do it. I was very worried everyone would hate the game, or we'd get no submissions, or whatever, but nothing seriously bad seems to have happened other than (1) a team discovering a degenerate strategy as a result of a dumb last-minute addition we made to the game board and (2) a few tempers flaring over some OS-specific problems (there's a nasty intermittent bug with pipes in Windows that people were getting angry about, apparently).

Now there's just the huge, daunting task of evaluating the submissions, coupled with the huger, daunting-er task of doing this whole thing again, faster.

Bob Harper gave a talk today, and I missed it due to class. He was talking about Twelf; I really wish I'd seen it, Twelf looks like it might be immediately applicable to my work but I can't figure out exactly how it works from glancing over the tutorial.

6.25.2005

You know what website I really hate? Slashdot.

As you may be aware, we launched the ICFP contest on Friday morning. Before we did, we submitted a story to Slashdot announcing that we'd done so. Very relevant to Slashdot, right? I mean, they've run a story about the contest every year for the past three or four years at least, and for last year's they actually ran three stories (maybe more). So we figured, people will be expecting to hear about it from Slashdot, we'll submit a story about it. We even gave away the contest overview a little bit early to them so that even if it took a while to get posted the information would still be relevant when it was.

Then we waited. 9:00 AM rolled around, and no slashdot. No worries, they're pretty slow with every story anyway, though it sure would've been good to get it on that page Friday morning so all the start-the-workday-with-Slashdot folks would see it (and maybe participate). But whatever, we figured, it'll get posted soon enough.

And we waited some more. We went to lunch. We came back. But every time we checked, the story was still 'pending'. At around 3:00PM or so, right as the workday window was closing, we decided to we should submit something else, shorter and punchier maybe. So we submit another story that points people to the contest, this one carefully edited to fit on the slashdot frontpage, no problem. And we waited some more. Nothing.

Saturday morning, both of us who had submitted stories checked, and both had been rejected during the night. Oh well, we decide, Slashdot must just not be interested in the ICFP contest this year. Its their website, they make the rules.

So now, today, halfway through the contest, they run some random guy's link to the contest web page that's basically the same thing we submitted. Thanks a whole friggin' lot, Slashdot. I guess you do want to run stories about the contest after all, but only after enough time has passed that there's no way anyone reading the article could actually participate in it if they wanted to. You guys are awesome.

Yes I'm grumpy about it.

6.24.2005

Okay, the programming contest is up now. Fun! People all over the world are now (possibly) chuckling at all the stupid little jokes we wrote in and (probably) puzzling their way through the gawdawful rules text that we couldn't figure out how to get rid of. On the plus side, it's now 13 hours in and the worst thing that has happened so far is that there've been some platform-specific problems with the support code we provided and we accidentally let slip that that game had to do with cops and robbers a bit early. Considering that we've been thinking about the game in more or less this form since, um, at least January or so, it's pretty good that we didn't let more than that out. It seems to be pretty well-received, though we're going to have to wait for the submissions to roll in and particularly for the second phase to really see whether we succeeded or not. But for right now, I'm just pleased that things have gone so smoothly — I have to say I was more or less imagining this whole ordeal somehow managing to set all our offices on fire and burn down our building.

In the midst of the contest going up, I also had to teach one of the key HtDP lectures: the dreaded "lists" lecture. Introducing kids to the notion of recursive data structures and recursive functions all in one fell swoop and making it seem natural is a pretty tough trick, and I was definitely not sure I could do it. I eased them into it using a trick I got from Matthew and Robby: you first give a "brain-teaser" of how to store two numbers in a box structure that has a left and a right; then you ask how to store four numbers using boxes (just store two copies of your solution for two numbers); then 8. 16, and so on. Then you ask how to store 5 numbers in a box (make a new box with a number on the left and your 4-number box on the right), then how to do six (repeat the same process), then 7 and so on. Then ask how to store one number, and relatedly how to store zero numbers. At that point, you've developed the idea behind linked lists on the board.

This strategy seemed to work as well as I could've hoped; some people looked kind of blown away, others really took to it, and some people thought it was kind of obvious, but everyone seemed to be able to grasp cons lists by the end of the lecture. We'll see how well they did it when the homeworks come in, of course, but people were answering my questions in class, so that's something.

6.23.2005

What did I do today? You'll find out all about it at 9:00AM tomorrow when the contest starts. As for now, I need to finish up my lecture notes for tomorrow's class - I'm teaching about recursively-defined data and the functions ~~who love them~~ that process them, which is a bit of a doozy considering the relative cakewalk the class has been up until now. I have had very little time to prepare the lecture, though, so let's hope I can make it all sound reasonable tomorrow.

Y'all had better like this damn contest :)

6.22.2005

Fun with exceptions!

How does your favorite foreign function interface deal with exceptions? I'm thinking particularly of the situation where you've got language A, let's say it's Scheme for the sake of argument, and it calls a language B (say Java) with a function argument. Then your language B in the midst of processing calls that function, and that function raises a (Scheme) exception. So what does the FFI do?

The way I see it there are several sane choices.

if any exception would have to "cross FFI lines" — i.e., from Scheme into Java or Java into Scheme — the program just dies. Moby, the language being developed by Kathleen Fisher and John Reppy, does it this way.

say that Scheme exceptions ignore the presence of a foreign function on the stack and propagate up to the next Scheme handler unconditionally. I believe that the PLT Scheme FFI does it this way.

if a Scheme exception reaches an FFI border, the FFI converts it into an "equivalent" Java exception and raises that on the Java side, and similarly if a Java exception reaches the border it is converted into a Scheme exception. This is how SML.NET works.

Are there any other reasonable strategies here I'm missing? There's also the whole complication that if one of your languages doesn't support exceptions (say you're doing a Scheme/C FFI) that you might want to map Scheme exceptions into C error status codes and certain C values into Scheme exceptions.

Anyway. Taught a class today, and did my first lab. It's really hard to come up with four hours' worth of material, but it worked out okay. The lab was pretty fun, and I was pleased that people seemed to be getting the material. It was a simple exercise, but still I was surprised and glad to see the first finishers do the whole thing in just 30 minutes or so.

6.21.2005

Preparing for class took me an amazing amount of time today. I can see why professors complain about teaching — for me, anyway, it seems like I don't have time to do anything else. At the moment the programming contest is only a few days away and the POPL deadline isn't that much further off, so it seems totally extravagant to be planning lectures, but what can I do? It's what they're paying me for.

6.20.2005

So, first day of class today. I was very surprised to discover that there were five women and one man in my class — I don't think there were five woment total in the 30-or-so-person intro class I took back in the day, so it's weird that they were the vast majority. Also, lots of econ majors, but I guess that's to be expected here. Anyway, as to how it went: meh. It's harder when you're on the other side of the blackboard, and though I've thought a lot about what I like in teachers I'm finding that it's really hard to actually make it happen. On my performance: I think I failed to thrill them, and it was like pulling teeth to get anyone to respond when I asked questions, but I don't think I was was super-boring either. The "teaching the experienced" bit turned out to be a basically moot point, since only one person had any serious experience and she didn't seem to have any problem with learning Scheme. I talked for a bit about how when you take English, you're not learning how to speak English, you're learning how to communicate effectively, and in the same sense this class was about teaching you to program effectively regardless of the language you choose to program in. People seemed to respond to that, so I think I'll use it again.

Wednesday is the next class, and due to an unfortunate schedule the (two-hour) lab follows the (two-hour) class after a 30-minute break. I have to keep people absorbing computer science for essentially 4 hours straight. Heaven help me.

By the way, the contest proceeds apace. 9:00AM on Friday I'll have a little heart attack and then I'll do a little dance.

6.18.2005

Ever have one of those days where you start coding something in the morning, and by the time you look around you realize it's after dark? Today was a gorgeous day in Chicago, and at about 8AM I started trying to figure out how to model various exception-handling strategies in a multi-language context; by the time I was done I had four systems in PLT Redex and it was nine o'clock at night. Research is fun! Much more fun than, say, sunlight or social interaction.

6.17.2005

Hard to believe it's Friday. I spent today divided pretty evenly among: 1) secret stuff involving the contest, 2) preparing for my class, and 3) taking stock and a few next directions for the interoperability paper. The secret stuff I obviously can't talk about, and I'd like to hold off on discussing the interoperability stuff for at least a while, but I do have something to say about teaching.

I'm going to be teaching my first intro CS class starting on Monday, and I'm going to do it using the How to Design Programs teaching methodology. This is my first time teaching it myself, but I've been involved in classes that taught that way many times, and while it's amazing how effective it is (particularly since the book lulls you into thinking it's kids' stuff the whole way through) one troubling problem always crops up. Intro classes always have a few people who have done a lot of programming in high school or elsewhere and are taking the intro class even though in their view they don't need to because they feel like they know whatever introductory material you could possibly be teaching already. Maybe they've even got AP tests or chess-playing programs or something that serve as evidence. These people tend to see the HtDP curriculum as a joke - "why are we programming in a goofy language like Scheme, where you can't even assign values to variables and there are no loops and you don't get printf and you've got all these silly parentheses? I wrote a chess player once, I'm obviously a computer genius who ought to be trusted the raw power of do while loops and languages that don't baby you by checking to see if you're writing outside array boundaries!"

The traditional way to deal with these students in the classes I've observed has been essentially to ignore them. The teacher knows best, and students ought to understand that their professors do in fact realize that there exist languages with for loops in them and chose against those languages for a good reason. There are a couple of problems with this tack in my view. First, there's a deeply engrained notion in just about everyone that if you're learning how to program and you're learning language X then you must necessarily be learning how to program IN language X and that any skills you pick up in the process don't apply to languages Y and Z, so it's a waste of time to learn any programming skills in a language you won't be using to write real programs. I've heard this opinion expressed in no uncertain terms by many people with PhDs in computer science, and in fact at my undergraduate university the electrical engineering department used to offer their own intro to programming class specifically because they thought it was a waste of time for their students to learn any language other than C. If it's a majority opinion among PhDs that you can't learn anything useful about programming by studying Scheme, surely we can't blame freshmen for thinking it too. Second, when people who have had a taste of a C-like language see Scheme as it's presented in HtDP, they can't really help but notice that you can't really do anything in it. Sure, we teachers know that they can do everything they need to for us to be able to teach about natural recursion or whatever, but they're thinking "video games! graphics! inline assembly!" and, well, none of the teaching languages anywhere in HtDP let you make any kind of practical program of more than a schoolwork interest. If we don't even tell people that these things are possible, how will they know? And can we blame them for not thinking they're learning anything useful?

I'm not sure exactly what I'd propose to do about it. I think I may have a little spiel at some point in the first class about how computer languages aren't like human languages in some ways, and that the skills you learn in one computer language really do transfer over to others to a much greater degree. I'm also going to try as hard as I can to make homeworks and labs that have students doing things they can imagine really being useful and/or cool, like the 3D lab I did last year that I'm pretty proud of (that was taught in the fourth week of class to people who had no prior programming experience, and most people got the whole thing working!). But I feel like that may not be enough. When I teach this class again in the fall, I'm thinking about making Practical Common Lisp an optional supplementary text and giving people extra credit for doing the practical sections of that book in Scheme (of course I'd have to help them with that, but that's fine with me). That might do it.

Has anyone out there taught a class and figured out a solution to this problem? Or been a student in a class where the teacher solved it particularly well? I'd like to hear about it if so.

6.16.2005

Hi Lambda the Ultimaters! Now that PLDI's over, I'm afraid you're going to have to hear me ramble on dully about my regular job. And since it's mostly about the programming contest you're not even going to get the juicy details. We aim to please, but we miss so often that observers question whether we're even really trying.

I finally explained to a third party what my recent semantics work has been all about. Kathy and Scott from the PLT are in town and Kathy's been working in the same general area as my semantics, so I got out the trusty whiteboard and gamma'ed it up with reduction rules and so on and showed them the overview of everything I've finished up to this point. I was encouraged; they both followed me and seemed to think it was interesting, which gives me hope that the POPL folks will think the same thing. Also, hopefully Kathy will want to take her OOPSLA paper about Scheme and Java and try to prove it sound using our framework. That'd be awesome.

Otherwise: we spent a pretty crazy amount of time working on a new iteration of the LiveCD for the contest, which would probably have been released a few minutes ago were it not for the fact that I forgot to get a list of changed packages and the guy who is in possession of said list currently on a bus headed back up to his Internetless apartment. Oh well, y'all'll live if it doesn't come out until tomorrow.

And naturally there was a lot of crazy hacking and various other work done that I can't really tell you about. Suffice it to say that the bits were a-flyin' at the Worker Bee! headquarters today. We made some progress, but the programming contest starts in 7 and a half days and that's incredibly nerve-wracking to me.

Also planned for my intro to computing class. They're actually going to let me teach over the summer, and it looks like they're even going to give me some students to teach to. I've got seven students enrolled in my class, everywhere from a high-school student to some college juniors. I've never taught a full class solo before, and that's pretty nerve-wracking too.

Man, I will be about 80 times more relaxed by the end of July.

6.15.2005

It's the final day of the Worker Bee! PLDI Roundup! Today had a lot of interesting talks, mainly I think because it focused more on the PLD part and less of the I part. Good stuff.

Invited Talk

The Transactional Manifesto: Software Engineering and Non-Blocking Synchronization
By Maurice Herlihy

As people have been saying for years now, the days of Moore's Law translating directly into faster execution of the same programs are ending. It's not that Moore's Law is over — it doesn't appear to be — it's that the era when cramming more transistors onto a chip actually makes a sequential computation take less time is over, meaning that all the major chip manufacturers are moving towards multicore systems that only make your programs faster if you can keep multiple independent processors busy. However, current parallelism techniques (by which the speaker meant pthreads) are a terrible solution to the concurrency problem and can't scale up to a computing environment in which everyone needs to write multithreaded programs all the time. For instance, he gave a challenge: write a library for a doubly-linked list that allows insertion and deletion from either end, but that allows inserts and deletes from opposite ends to happen concurrently unless that would lead to data corruption (e.g., the queue only has one element or something). After going through all the problems, he concluded that a nice elegant solution to this problem would be a publishable result. The solution to a problem this simple being a publishable result is not a good sign!

He had a vision for overcoming these obstacles, borrowed from the databases community: transactions. The idea is that you send a function that's supposed to be executed atomically to a transaction manager, which executes it and if a problem arises, rolls it back and tries it again. He's implemented a C# library that does this and says that while it's got a lot of problems, it allowed an undergraduate intern to code up a working concurrent red-black-tree implementation in a week — he said he's seen a PhD thesis a pthreads solution to the same problem.

I'm pretty happy with the CML-style concurrency in PLT Scheme, but that's actually more about addressing "concurrency" (= running multiple threads because it models your problem) rather than "parallelism" (= running multiple threads because you want your program to run faster). It seems like a very interesting idea, and I'd be interested to see how far the transactional-memory paradigm can go.

Session 9: Domain-Specific Tools

Programming by Sketching for Bitstreaming Programs
As Simon Peyton-Jones said during Q&A, this seems a bit like magic. You want to make a program that processes a bitstream, taking some input stream and producing an output, for instance dropping every third bit from the input. As it turns out there are several different algorithms for doing this that vary widely in efficiency, and even though what you want can be described concisely the algorithm is far from trivial to implement. So what these folks have done is found a way that you can describe what you want ("drop every third bit") and it'll automatically produce some algorithm for doing it; you can also add in "sketches" that guide how it chooses an algorithm, partial definitions of the algorithm that guide code generation and allow you to tune the output algorithm. For the drop-third program they found that users of their sketching tool made twice as fast algorithms as people writing in C, even when both groups were encouraged to tweak their solutions as much as they wanted to make performance better. It works because bitstream algorithms generally have the property that the goals are much easier to describe than they are to derive efficient algorithms for, but this technique might be applicable to other domains. (This paper shared the best paper award with the one I mentioned on Monday.)

PADS: A domain-specific language for processing ad hoc data
You've got log files in all kinds of ad hoc formats, and if you want to read them programmatically the typical solution is to hack up a custom thing in perl or something. This is a brittle solution, especially if you want to do any kind of error handling. PADS is a language for writing parsers for ad hoc data like this where you describe the file format very naturally -- "there's the string 'FILE' followed by whitespace followed by a number between 0 and 100 followed by ...." and PADS gives you back not just a parser with robust error handling but also a whole suite of fancy tools like an XML converter, statistical analyzer, and so on. The speaker said that their goal was to make the suite of free tools so compelling that people would be crazy not to use PADS if they're doing anything that has to do with ad-hoc data formats.

Composing Security Policies with Polymer
I'm not sure what to think of this. The idea is, you'd rather have security policies be both (a) centralized and (b) composable than the alternative. So, they built a framework of security policies that compose with each other: you have "security events" in your code and then a central security manager that says what to do about it. I could go on, but that's the gist. Here's the thing: security is very tricky, and I'm really uncomfortable using somebody else's security code for my application. I don't know, maybe it's totally benign and really works great, but the road to security breaches is paved with clever ideas about how to avoid having to worry about security.

Formal Loop Merging for Signal Transforms
The actual domain of this work was making a compiler optimization for a specialized signal-processing language so that it could merge loops that go over the same dataset multiple times. I don't know anything about signal processing, so that aspect of the talk went right over my head. However, the way they did it was illuminating: they observed that traditional approaches tried to special-case some particular construct in the source language into a particular object-code result, but that there's a better way to do it. What they did was extended one of the intermediate representations they used so that it explicitly tracked where loops occured, and then made the optimization a very very simple rewrite on that IR. Just that alone was enough to optimize all of the cases those special cases had handled, and several more besides. The moral of the story is that sometimes having the right information in your intermediate representation is very powerful.

Okay, that was PLDI. It was a very good conference, and I met a bunch of cool people and found a bunch of good leads for my current work. And the coveted Worker Bee! Paper-I-Thought-Was-Cool Prize goes to: Garbage Collection Without Paging! The best implementation papers at this conference weren't ones that blew your mind with things you couldn't imagine, but ones that took things that are easy to imagine and proved that they actually work. This one showed that if you puncture the barrier that hides real memory from the memory manager, you have an opportunity to avoid all the performance penalties the OS would otherwise incur. The larger picture it paints is that while abstractions like virtual memory can help us, sometimes they hurt us too by not letting us deal with the messy cases in the right way. In the case of garbage collectors, they hurt us very very much.

Next one of these I do will be for ICFP, I suppose.

6.14.2005

It's Worker Bee! PLDI Roundup, Day 2! My attendence today was less than perfect, so my report here isn't going to be all that complete, but on the plus side they served soft pretzels so that's something.

Session 5: Register Allocation

Register Allocation for Software Pipelined Multi-Dimensional Loops
Differential Register Allocation
Demystifying On-the-fly Spill Code

I missed this session, unfortunately, when I had a flash of insight on how to solve a particular theory problem and spent over an hour writing things on the backs of conference call-for-paper flyers proving a theorem over and over. So, sorry, but if you're trying to allocate registers you're gonna have to look to another weblog.

Session 6: Instrumentation and Testing

Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation
Those wacky Intel folks were out in force today. This was a tool that allows you to write other tools that monitor programs. I have to admit, most of the talks today were very technical in areas I don't really care about all that much, so I didn't really follow them as well as I did yesterday. A lot of people during the Q&A said they really liked Pin, so I guess it's probably awesome.

TraceBack: First Fault Diagnosis by Reconstruction of Distributed Control Flow
This was kind of a compliment to the statistical bug finding talk from yesterday. These people annotate production code with enough information to reconstruct just the control flow that lead to the program's current state when it crashes (though not the values present at each step of computation up to that point). They have a fancy graphical debugger, et cetera along with it it, and they say it works well. During Q&A the speaker indicated that people tend to think not having the values along with control flow will make the reported information useless, but in practice people tend not to need it.

DART: Directed Automated Random Testing
So you want to unit-test some code, but that's a pain. Can the computer do it for you? Yes it can. Say some function takes only "flat" values as inputs (i.e. values that have no particular structure and that you can just pull out of a bag without too many problems). Pick some of them at random and run the function, also watching to see how they were examined by the program (did the function test to see if the number you gave it was greater than 0? That it was less than some global? Et cetera). For every constraint you gather this way, try to find inputs that violate the constraint and see what the program does for those. In that manner you can (at least try to) get complete code coverage rapidly and if there are bugs in any branch you'll find them.

This talk gave me the eeriest feeling that I'd heard it before somewhere, but I couldn't figure out where.

During Q&A the speaker clarified that the program only really works for functions whose inputs are easy to generate randomly. In particular, it doesn't do very well with function pointers because it's hard to pluck those out of the ether.

Session 7: Network Processing

Shangri-La: Achieving High Performance from Compiled Network Applications while Enabling Ease of Programming
Intel again. They introduced a DSL for programming network processors and talked about how to compile that DSL really well.

Automatically Partitioning Packet Processing Applications for Pipelined Architectures
Another talk about parallel processors. This one was kind of the opposite of the first: they took programs that had an explicit processing loop written in standard C and figured out how to chop it into something more parallelizable.

Both these talks are the sort of thing I find it hard to get in to. Sorry if these reports are a bit cursory, but that's the reason.

Programming Ad-Hoc Networks of Mobile and Resource-Constrained Devices
Okay, there are bunches of networkable devices these days: cell phones, PDAs, Game Boys, whatever. You'd like for them all to work together to solve problems, but to do that you need to be able to write programs that will automatically distribute themselves to the nodes on the network that have the resources, even in the face of geographic and power constraints. This work gives some linguistic tools to do that. The coolest part was the example applications: a game of Pac-Man where you and your friends are Pac-Man and the ghosts and you run around an actual physical board with your Palm Pilots tracking which dots you've eaten and whether you've been captured, and the idea of finding your friend at the airport by having every camera-phone simultaneously take a picture and then having some powerful processor do face recognition on each image taken this way. Totally awesome!

Threads

Threads Cannot be Implemented as a Library
Somebody set up us the Boehm! Hans Boehm talked about how you can't design a thread system if the underlying language specification doesn't support it. As an example he "picked on" the pthreads library and concluded that "it is impossible to write portable pthreads code" due to the fact that memory locations are integral to the thread model but undefined by the C standard and various related reasons. Convincing, interesting talk. Read the paper.

Mitosis Compiler: an Infrastructure for Speculative Threading Based on Pre-Computation Slices
This looked like a compiler for chopping up sequential programs into parallel threadable chunks. I skipped out on this talk, unfortunately, so I can't tell you how it went.

6.13.2005

It's the Worker Bee! PLDI Roundup, Day 1! (Yes, the exclamation point is part of the title.) Quick summaries of the PLDI talks.

Session 1: Bug Detection

A Serializability Violation Detector for Shared-Memory Server Programs
The basic idea here is to detect places where multithreaded code should have atomic regions, though it may or may not actually have them, so that a serializability detector can come along and find bugs. All this is done on the fly — serialization violations are only detected post-mortem, but hopefully quick enough to roll back the changes the faulty execution made and then restart that section of code with a coarser thread schedule (or at least alert a human operator). They do this by forming what they call CU's, for computational units, which are sections of code that need to be atomic because (e.g.) they read some data in from a shared variable and use it without reading it again. It's heuristic and based on guessing standard progamming patterns, but it still seemed pretty cool. Someone asked whether the standard programming patterns would be violated because compilers emit optimized and possibly un-paradigmatic code; the speaker said they run on binaries rather than source code already and it doesn't seem to be a problem.

Scalable Statistical Bug Isolation
Take a program, make up a whole bunch of properties that you might want to test (e.g., x < y, f() was called, etc), and scatter them throughout your binary as you compile it. Then when you distibute the code to your users, you get feedback from them automatically in the form of whether these predicates were true or not. Even though each run is only giving you a tiny sliver of information, you can aggregate it all and analyze it statistically to detect bugs -- just find the predicate most likely to predict a crash and from there you can start searching for your culprit. This was very cool research, made more cool by the fact that they found several bugs in real software with it. A questioner asked whether a single predicate was really the best way to find a bug; the speaker responded that in reality they get clusters of predicates that are likely to predict crashes and that in practice they use those.

VYRD: VerifYing Concurrent Programs by Runtime Refinement-Violation Detection
I was out in the hallway for this one, so I can't really say anything about it.

Path Slicing
There are lots of tools that will help you figure out problems given a program trace, and there are lots of tools that will come up with possible program traces that could cause problems. However, those traces might be infeasible, or they might be overly complicated. These guys found a way to improve program traces by eliminating unnecessary junk and possibly refining infeasible traces by finding related but feasible ones.

Session 2: Function Interfaces

Jungloid Mining: Helping to Navigate the API Jungle
I'm in Java and I've got a String object and I want an ParseTree object from it. How do I use the library to make that happen? Can't I just, you know, push a button and make the computer figure out how to get from here to there?

Yes. Yes I can.

This was so awesome.

Checking Type Safety of Foreign Function Calls
I was really interested in this talk. They took the OCaml-to-C foreign function interface, wherein OCaml provides everything as a (mostly) opaque lump into C, and built an inferrer that verifies that the C code does the right thing with those lumps. It was pretty cool, and talking with the speaker after the session I realized that we were looking into very related things. He gave me a really good tip on a paper to read, too.

Session 3: Types

Essential Language Support for Generic Programming
C++ lets you do some "generic programming" by means of template hackery. ML and Haskell also have some stuff like that, but the authors of this paper wanted some stuff from the ML solution, some stuff from the Haskell solution, and they wanted the whole thing to at least look like the C++ solution. So they invented F^G and threw in their laundry list. Someone sitting close to me pointed out that specializing code based solely on the type doesn't work; integers, for instance, form a monoid under both (0,+) and (1,*) and if you've got some Monoid<T> generic object you can only define one or the other.

Semantic Type Qualifiers
You'd like your programming language's type system to be extensible in the sense that you can add your own statically-verifyable constraints and the type checker will make sure they're true. There are many approaches to this already; these guys made another one that seems like it's got a particularly nice syntax and also actually relates the new "types" the user makes up with the program's dynamic semantics, which is cool.

Permission Based Ownership: Encasulating State in Higher-Order Typed Languages
So, if your language has a module system that hides a component's implementation with existential types, your code is encapsulated, right? Wrong! Consider (in Java-ey pseudocode):


class A {
  private int[] ints;
  public int[] bad() { return ints; }
  public int[] good() { return copy(ints); }
}

Since clients could mutate the array that bad() returns to them, they could change A's internal state (even though A objects can only be ascribed type A). The good() function has the exact same type but doesn't suffer from the same problem. Crazy, huh? The paper shows an extension to F_ω that adds ownership types to catch these problems; you say "the world can access the public interface and the public interface can access or update the private implementation, but the world can't update the private implementation" and the type checker makes sure that doesn't ever happen.

The notion that existential types didn't cover the entire problem was a real eye-opener for me, and the talk was worth it for that insight alone.

Session 4: Optimization

Code Placement for Improving Branch Placement Accuracy
The observation is this: branch prediction is very important to the performance of modern CPUs, and said modern CPUs implement branch prediction by storing statistics on each jump instruction in a table somewhere. Alas, the table has limited buckets, so some hashing has to go on, and that hashing is typically on-chip branch predictors suffer from "destructive interference" where one branch is usually taken and some other unrelated branch that happens to get stored in the same bucket is usually not taken. This paper suggests arranging your jumps so that the ones that get stored in the same buckets tend to have the same behavior, and interference is constructive rather than destructive. Great idea, but (a) the microprocessor people aren't telling how they choose those buckets and (b) you can't just arbitrarily move jump instructions around. So the author spent a tremendous amount of time trying to figure out how branch prediction worked on the CPUs he was working with, and got something that gives you a modest performance boost. I can't help but wonder whether all the work will be lost next time a chip comes out.

Optimizing AspectJ
They wanted to make AspectJ faster. They did. I find it hard to get worked up about it.

Automatic Pool Allocation
I really regret that this talk was the one I picked to ignore the first half of while I worked out a problem with the multilanguage stuff that's been bothering me, because from the second half it looked really cool. Anyway, they figured out a way to analyze the program to improve data locality in the heap objects it creates using points-to analysis. Sounds awesome, and it won the best paper award. Question asked: what should programmers do to make sure they can get the benefits of this stuff? Answer: don't use a custom allocator, don't do fancy pointer arithmetic.

Garbage Collection Without Paging
Make the virtual-memory agent in your kernel notify your garbage collector when it wants to page a particular page out, and let your garbage collector either (1) free some memory up in response or (2) nominate an alternate page instead, and you can get a garbage collection algorithm that essentially never pages memory in during collection and thus runs crazily faster than ordinary garbage collectors when memory is tight. Very neat.

Overall an auspicious first day with lots of interesting talks and very little that seemed completely bizarre, incomprehensible, or silly. I am a little put off by the emphasis on quantitative performance gains, but, as Ron Burgundy would say, "When in Rome ..."

6.12.2005

Another Worker Bee! weekend update, this time much shorter: I submitted the Scheme Workshop paper, the revision of which turned out to be not nearly as bad as I'd thought it would be. I also built a codebook for the speed-daters out in HTML -- it's just essentially an XML transformation on the source files that puts the name of the question in the database, its text, its type, and the possible answers. Ought to have been much easier than it was, but then again I knew I'd screwed up the question representation before we even started the study. We'll get it right for the real Topsl.

PLDI tomorrow!

6.11.2005

Special Worker Bee! Weekend Edition! I biked in along the lake-shore path from my Wrigleyville-area apartment to Hyde Park today to accomplish a couple tasks. (I did that bike commute somewhere between 1 and 5 times a week until the weather got really bad last year and as it's gotten warmer I've started up on it again.) We've been having something of a heat wave over the last couple weeks here in Chicago, so the path has been crowded like crazy. Today the season seemed to be conspiring against my commute at nearly every turn: there was a stiff headwind on the way down, not to mention a path full of pedestrians an casual cyclists who weren't really paying a lot of attention to where they were going clogging up the path. On the south side there was some kind of walk against sickle-cell anemia that blocked the entire path several times; I actually had to dodge off the path into the grass at one point to avoid a huge group of people who were just standing there in the middle of the road at one point. And once I got in, my normal route to my building was completely blocked off by (a) cops and (b) a nearly endless line of U of C students who were graduating. Hrmph. (Said graduating students also caused me trouble a few hours later when I was starving to death but couldn''t find a place to eat anywhere that wasn't packed to the gills with grads and their parents.)

Then work: I added one more table to the database the psychologists wanted. My website gathered lots of information about people before the event and after the event, but the psychologists actually had people fill out forms during the event on paper as well, and after they made an excel spreadsheet of the data they had me turn it into another table in the relational database. Not a big deal. I also made an editing pass on what is now the Scheme Workshop paper, basically fixing everything the ICFP editors specifically said they didn't like.

On the way home, more clogged path. Fortunately the stiff headwind on my way in became a strong tailwind on my way back, so I was booking it at like 20mph the whole way. The whole way, that is, except on the half of the path that is remotely close to anything scenic. On those stretches I went like 10mph stuck behind hordes of people who were evidently high or something. Public service announcement: when you are on a bike path, you are on a road. Would you wander around in the middle of a road without paying attention to your surroundings at all? No. So don't do it on a bike path either. Thanks.

The last bit of the return trip was the worst, because I live close enough to Wrigley Field that I'm affected by Cubs traffic (both vehicular and, more problematically, pedestrian). I got home right around the end of today's game. Cubs fans are even worse than pedestrians on the bike path; they walk in giant herds, there are many many more of them, and they're often drunk. Thankfully that was brief, and there were only a couple near misses in the block and a half of heavy Cubs-fans traffic I had to avoid.

6.10.2005

Though I generally don't pay much attention to this sort of thing, the end of the school year is here and it's now technically summer session. Whatever, it doesn't really affect me too much except that there are fewer people around and this year I'm teaching my first course. But this year several people are leaving, so it's a little more significant to me. Goodbye Dave! Goodbye (for the summer) Jon! Goodbye (in a few months) Jake and Don!

PLDI starts on Monday. The programming contest starts in two weeks. My class starts the following Monday. The second phase of the contest goes out two weeks from then, and then the POPL deadline is upon us. Not really much like the summers I used to have when I was in high school.

6.09.2005

I've been home sick today, and for whatever reason I couldn't stop thinking about PLT Scheme as a language for getting work done. I know that recently my research has been mostly theoretical, but I'm not actually much into that stuff — I'm much more interested in implementation. And as I've written more and more software in PLT Scheme, I think I've decided that I'm not a fan of its being untyped, as I think I've said; I really want static guarantees of correctness and I'm probably willing to deal with the occasional hacking around an artificial type shell game to get them (I'm thinking here of SML's system-interface libraries mostly). On the other hand, the ability to get complicated tree-structured data like XHTML without having to explicitly describe its shape first is useful.

But I'm not sure whether that's really appropriate for software that needs to be relied on; sure, unit tests are supposed to catch everything that type errors catch, but when you're coding at 100 miles an hour trying to get a feature done before tomorrow you'd really rather have it be caught before then. Also I can say from experience that taking a program written that way and releasing it to the world is an absolutely nerve-wracking experience no matter how much you've tested the code beforehand. (As it happens, I discovered exactly one type error on the speeddating site "in the wild"; in some cases I stuck something that wasn't supposed to be there onto an obscure error page that I'd forgotten to test. It affected a couple people before I fixed it.)

That said, I'm not really sure what the best alternative is. Whether it's purely an accident of history or not, the best web programming system I know of relies on Scheme and the typed alternatives seem pretty bad by comparison (with the exception of Haskell's WASH, though I like the Scheme server better). Maybe someone's doing something like the PLT server for OCaml -- that would be kind of exciting, actually, I really want to do some work with OCaml.

There are tremendous problems with web development in Scheme, some of which are shared by other functional languages (popular servers will cause resource-utilization issues eventually, but it's harder to get rid of them in languages that attempt to abstract all that stuff away than languages that put it in your face) and some of which aren't (PLT's database support still sucks and still needs to be improved dramatically --- another nerve-wracking part of launching the speed-dating site was that I was using a postgres driver I had no reason to trust with heavy load, though it turned out to work fine). But the killer language that would make me switch probably hasn't been invented yet. Maybe Links is what I want? I hope so.

6.08.2005

You might have noticed that I stopped posting things here sometime last week, despite my ostensible goal of posting something every work day. That's because I finally got tired of writing entries that basically said "I can't say anything about what I did today"; pretty much my whole academic experience for the past week has been programming contest related and there was nothing much to say.

But yesterday I actually did pick back up on some non-contest-related stuff, so I ought to mention that. My paper for ICFP that got rejected was semantics for Scheme, and it was rejected basically because the reviewers thought it wasn't interesting to the larger ICFP audience. That may or may not be true, but it will certainly be interesting to the Scheme community, so we're incorporating the reviewers' suggestions into our draft and sending the result to the Scheme Workshop. That's what I did for a fair bit of yesterday, along with moving a little closer in my understanding of the weirdness that is theoretical work on foreign-function interfaces (I'm about ready to make the claim that particular embeddings give you isorecursive types, which is kind of neat). As the POPL deadline approaches, I'm realizing that a big system with Java or something like that embedded may not be ready on time, but I think I'm going to write up what I've got anyway.

Also yesterday was some database monkey work that will find its way in some form into Topsl. It's great that so many people seem interested in it. I really want to get a real thing out the door; I think it has the potential to be huge.

6.01.2005

Much discussion was had on our mailing list that stemmed from misunderstandings of what our LiveCD was for and who could compete in what; in response I wrote up a FAQ and added it to the contest web page. Again, that's about the only public bit of knowledge I can report.

Yes, I do regret my poor timing in launching this particular writing endeavor precisely at the moment when I can't say anything about what I'm currently doing.

Keep Working, Worker Bee!