Keep Working, Worker Bee!

6.13.2005

It's the Worker Bee! PLDI Roundup, Day 1! (Yes, the exclamation point is part of the title.) Quick summaries of the PLDI talks.

Session 1: Bug Detection



A Serializability Violation Detector for Shared-Memory Server Programs
The basic idea here is to detect places where multithreaded code should have atomic regions, whether or not it actually has them, so that a serializability detector can come along and find bugs. All this happens at runtime: violations are only caught after the fact, but hopefully quickly enough to roll back the changes the faulty execution made and then restart that section of code with a coarser thread schedule (or at least alert a human operator). They do this by forming what they call CUs, for computational units, which are sections of code that need to be atomic because (e.g.) they read some data in from a shared variable and use it without reading it again. It's heuristic and based on guessing standard programming patterns, but it still seemed pretty cool. Someone asked whether the standard programming patterns would be violated because compilers emit optimized and possibly un-paradigmatic code; the speaker said they already run on binaries rather than source code and it doesn't seem to be a problem.
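
To make the CU idea concrete, here's the classic shape in Java (my illustration, not an example from the talk): a read of shared state followed by a dependent write. The detector's job is to infer that this region needs to be atomic and flag executions that interleave inside it.

class Counter {
    private int count;                // shared among threads

    void unsafeIncrement() {
        int tmp = count;              // CU begins: read shared state
        count = tmp + 1;              // CU ends: write back a derived value;
    }                                 // an interleaving between the two loses updates

    synchronized void safeIncrement() {
        count++;                      // the same CU inside one atomic region
    }
}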

Scalable Statistical Bug Isolation
Take a program, make up a whole bunch of properties that you might want to test (e.g., x < y, f() was called, etc.), and scatter them throughout your binary as you compile it. Then when you distribute the code to your users, you get feedback from them automatically in the form of whether these predicates were true or not. Even though each run is only giving you a tiny sliver of information, you can aggregate it all and analyze it statistically to detect bugs -- just find the predicate most likely to predict a crash and from there you can start searching for your culprit. This was very cool research, made cooler by the fact that they found several bugs in real software with it. A questioner asked whether a single predicate was really the best way to find a bug; the speaker responded that in reality they get clusters of predicates that are likely to predict crashes and that in practice they use those.
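
A drastically simplified sketch of the ranking idea (mine, not the paper's actual scoring machinery): for each predicate, compare how often runs crash when it is true against how often they crash when it is merely reached, and rank by the difference.

class PredicateStats {
    int trueAndCrashed, trueAndFine;      // runs where the predicate held
    int seenAndCrashed, seenAndFine;      // runs where it was merely observed

    double failure() {                    // crash rate when the predicate holds
        return (double) trueAndCrashed / (trueAndCrashed + trueAndFine);
    }

    double context() {                    // baseline crash rate at that site
        return (double) seenAndCrashed / (seenAndCrashed + seenAndFine);
    }

    double increase() {                   // predictiveness beyond the baseline
        return failure() - context();
    }
}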

VYRD: VerifYing Concurrent Programs by Runtime Refinement-Violation Detection
I was out in the hallway for this one, so I can't really say anything about it.

Path Slicing
There are lots of tools that will help you figure out problems given a program trace, and there are lots of tools that will come up with possible program traces that could cause problems. However, those traces might be infeasible, or they might be overly complicated. These guys found a way to improve program traces by eliminating unnecessary junk and possibly refining infeasible traces by finding related but feasible ones.

Session 2: Function Interfaces



Jungloid Mining: Helping to Navigate the API Jungle
I'm in Java and I've got a String object and I want a ParseTree object from it. How do I use the library to make that happen? Can't I just, you know, push a button and make the computer figure out how to get from here to there?

Yes. Yes I can.

This was so awesome.
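
For flavor, here's the kind of multi-hop call chain you'd want synthesized for you, written out by hand against the real JAXP API (my example, not the String-to-ParseTree query from the talk):

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class JungloidFlavor {
    // String -> StringReader -> InputSource -> Document: four hops that
    // nothing in the javadoc spells out end-to-end, which is exactly the
    // navigation problem the tool solves.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}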

Checking Type Safety of Foreign Function Calls
I was really interested in this talk. They took the OCaml-to-C foreign function interface, wherein OCaml hands everything to C as a (mostly) opaque lump, and built an inference tool that verifies that the C code does the right thing with those lumps. It was pretty cool, and talking with the speaker after the session I realized that we were looking into very related things. He gave me a really good tip on a paper to read, too.

Session 3: Types



Essential Language Support for Generic Programming
C++ lets you do some "generic programming" by means of template hackery. ML and Haskell also have some stuff like that, but the authors of this paper wanted some stuff from the ML solution, some stuff from the Haskell solution, and they wanted the whole thing to at least look like the C++ solution. So they invented FG and threw in their laundry list. Someone sitting close to me pointed out that specializing code based solely on the type doesn't work; integers, for instance, form a monoid under both (0,+) and (1,*) and if you've got some Monoid<T> generic object you can only define one or the other.
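
The objection translates directly into Java-generics terms (my rendering of the questioner's point, not the paper's notation): with explicit instances you can have both monoids, but a system that picks the instance from the type alone has to choose one and lose the other.

interface Monoid<T> {
    T identity();
    T combine(T a, T b);
}

class SumMonoid implements Monoid<Integer> {        // the (0,+) monoid
    public Integer identity() { return 0; }
    public Integer combine(Integer a, Integer b) { return a + b; }
}

class ProductMonoid implements Monoid<Integer> {    // the (1,*) monoid
    public Integer identity() { return 1; }
    public Integer combine(Integer a, Integer b) { return a * b; }
}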

Semantic Type Qualifiers
You'd like your programming language's type system to be extensible in the sense that you can add your own statically verifiable constraints and the type checker will make sure they're true. There are many approaches to this already; these guys made another one that seems like it's got a particularly nice syntax and also actually relates the new "types" the user makes up to the program's dynamic semantics, which is cool.
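
In Java-annotation terms, a user-defined qualifier might look like the hypothetical sketch below (the @NonZero name and the Java setting are mine, not the paper's); the novel part is that the checker must prove, against the language's dynamic semantics, that a @NonZero value really can never be zero.

@interface NonZero {}

class SafeMath {
    static int divide(int num, @NonZero int denom) {
        return num / denom;           // the qualifier rules out division by zero
    }
}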

Permission-Based Ownership: Encapsulating State in Higher-Order Typed Languages
So, if your language has a module system that hides a component's implementation with existential types, your code is encapsulated, right? Wrong! Consider (in Java):


class A {
    private int[] ints;

    // Leaks the internal array: callers get a live alias into A's state.
    public int[] bad() { return ints; }

    // Returns a defensive copy, so callers can't touch the original.
    public int[] good() { return ints.clone(); }
}


Since clients could mutate the array that bad() returns to them, they could change A's internal state (even though A objects can only be ascribed type A). The good() function has the exact same type but doesn't suffer from the same problem. Crazy, huh? The paper shows an extension to Fω that adds ownership types to catch these problems; you say "the world can access the public interface and the public interface can access or update the private implementation, but the world can't update the private implementation" and the type checker makes sure that doesn't ever happen.
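
A two-line client makes the leak concrete (assuming the class A above):

class Client {
    static void poke(A a) {
        a.bad()[0] = 42;     // reaches through the interface and mutates A's private state
        a.good()[0] = 42;    // harmless: only the throwaway copy changes
    }
}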

The notion that existential types didn't cover the entire problem was a real eye-opener for me, and the talk was worth it for that insight alone.

Session 4: Optimization



Code Placement for Improving Dynamic Branch Prediction Accuracy
The observation is this: branch prediction is very important to the performance of modern CPUs, and said modern CPUs implement branch prediction by storing statistics on each jump instruction in a table somewhere. Alas, the table has a limited number of buckets, so some hashing has to go on, and as a result on-chip branch predictors suffer from "destructive interference": one branch is usually taken, and some other unrelated branch that happens to get stored in the same bucket is usually not. This paper suggests arranging your jumps so that the ones that share a bucket tend to have the same behavior, making the interference constructive rather than destructive. Great idea, but (a) the microprocessor people aren't telling how they choose those buckets and (b) you can't just arbitrarily move jump instructions around. So the author spent a tremendous amount of time figuring out how branch prediction worked on the CPUs he was working with, and got something that gives you a modest performance boost. I can't help but wonder whether all the work will be lost next time a chip comes out.
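
A toy simulation shows how bad the aliasing can get (my sketch; real predictors use multi-bit counters and fancier indexing): two branches that are each perfectly predictable on their own drive each other's accuracy to zero once they share a slot.

class AliasDemo {
    public static void main(String[] args) {
        final int SIZE = 16;
        boolean[] table = new boolean[SIZE];   // 1-bit "last outcome" predictor
        int slotA = 0x40 % SIZE;               // branch A: always taken
        int slotB = (0x40 + SIZE) % SIZE;      // branch B: never taken, same slot
        int correct = 0, total = 0;
        for (int i = 0; i < 1000; i++) {
            correct += predict(table, slotA, true)  ? 1 : 0;
            correct += predict(table, slotB, false) ? 1 : 0;
            total += 2;
        }
        System.out.println("accuracy with aliasing: " + (double) correct / total);
    }

    static boolean predict(boolean[] table, int slot, boolean taken) {
        boolean hit = table[slot] == taken;    // predict the slot's last outcome
        table[slot] = taken;                   // then train on what really happened
        return hit;
    }
}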

Optimizing AspectJ
They wanted to make AspectJ faster. They did. I find it hard to get worked up about it.

Automatic Pool Allocation
I really regret that this was the talk I picked for working out a nagging problem with my multilanguage stuff during the first half, because the second half looked really cool. Anyway, they use points-to analysis to figure out which heap objects belong to which data structure, and then allocate each data structure's objects in its own pool, which improves data locality. Sounds awesome, and it won the best paper award. Question asked: what should programmers do to make sure they can get the benefits of this stuff? Answer: don't use a custom allocator, don't do fancy pointer arithmetic.
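
Here's the flavor, done by hand in Java (the paper does the equivalent automatically, for C, with no source changes): all of one list's nodes live in one contiguous pool, so a traversal walks sequential memory instead of chasing pointers all over the heap.

class PooledList {
    private final int[] value, next;   // the "pool": one contiguous block per list
    private int free = 0, head = -1;

    PooledList(int capacity) {
        value = new int[capacity];
        next = new int[capacity];
    }

    void push(int v) {                 // allocate the new node from this list's pool
        value[free] = v;
        next[free] = head;
        head = free++;
    }

    int sum() {                        // traversal touches cache-friendly sequential memory
        int s = 0;
        for (int n = head; n != -1; n = next[n]) s += value[n];
        return s;
    }
}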

Garbage Collection Without Paging
Make the virtual-memory system in your kernel notify your garbage collector when it wants to evict a particular page, and let your garbage collector either (1) free some memory up in response or (2) nominate an alternate page instead, and you can get a garbage collection algorithm that essentially never pages memory in during collection and thus runs crazily faster than ordinary garbage collectors when memory is tight. Very neat.
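
The protocol's shape, as a hypothetical interface (the names are mine, not the paper's actual kernel API):

interface EvictionListener {
    // The kernel calls this before evicting victimPage. The collector can
    // free memory so the eviction becomes unnecessary, or return a colder
    // page it would rather sacrifice; either way, nothing the collector is
    // about to touch gets paged out behind its back.
    long onEvictionRequested(long victimPage);
}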


Overall an auspicious first day with lots of interesting talks and very little that seemed completely bizarre, incomprehensible, or silly. I am a little put off by the emphasis on quantitative performance gains, but, as Ron Burgundy would say, "When in Rome ..."
