cobra_kai, it's funny because I had the exact same thought process... When I failed to run Octave on either platform, I looked into SciPy, since it might also become useful for my work. I spent a little time with it (Spyder on Windows) and liked it enough (though the learning curve coming from Matlab is about 10x that of going to Scilab) to try to get a similar tool on my Mac at home. Turns out getting a nice environment like Spyder running on the Mac was way beyond my IT skills, and I ended up having to restore Lion from my backup. Then I started the quest to dual boot Lion and Ubuntu, which went nowhere (the 2011 MBP is not well supported, for example no wireless or trackpad, so it was not really practical, plus I could never get it to boot anyway). Anyhow, I ended up installing Windows 7 and have actually been very happy with it. The OS feels almost as nice as OS X and I can run Python on it... In all, although I like to recommend Macs to almost anyone, I would say that in this particular case it's been hell!!
TMoney: Stereophile's introduction to their standard speaker tests explains what it is and, more importantly, what it tells you that a regular 2D magnitude response doesn't: http://www.stereophile.com/content/measuring-loudspeakers-part-two-page-5
As cobra_kai said concisely, the third dimension is time. Think of it as the frequency response magnitude of the headphone, but only analysing the trailing edge of the impulse response. The first line (at t=0) is actually the exact same curve as published by Tyll up to now: it is the frequency response magnitude of the headphone using the whole data, from the rise of the impulse to its full decay a few ms later. The FRF slices further down the time axis truncate the beginning of the data, so you're essentially looking at the ringing signature of the headphone. What you typically see is that peaks in the standard FRF magnitude curve ring for some time, so that a ridge forms at those frequencies.
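To make the slicing idea concrete, here is a minimal sketch in Python/NumPy of how such a plot could be computed from an impulse response. It is only an illustration of the description above, not how Tyll, Purrin or Stereophile actually process their data. The function name `csd` and the parameters `n_slices`, `step_ms` and `n_fft` are my own choices, the impulse is assumed to start at sample 0, and the truncation is a plain cut (real tools usually apply a smoother window).

```python
import numpy as np

def csd(impulse, fs, n_slices=20, step_ms=0.1, n_fft=4096):
    """Return (freqs, times_ms, db): one magnitude spectrum per time slice.

    Slice k is the spectrum of the impulse response with its first
    k * step_ms milliseconds removed, so slice 0 is the ordinary
    frequency response magnitude and later slices show only the ringing.
    """
    impulse = np.asarray(impulse, dtype=float)
    step = max(1, int(round(step_ms * 1e-3 * fs)))  # slice spacing in samples
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    mags = []
    for k in range(n_slices):
        start = k * step
        segment = impulse[start:start + n_fft]  # drop the beginning of the data
        buf = np.zeros(n_fft)                   # zero-pad to a fixed FFT length
        buf[:len(segment)] = segment
        mags.append(np.abs(np.fft.rfft(buf)))
    mags = np.array(mags)

    # Levels in dB relative to the highest peak of the t=0 slice
    ref = mags[0].max()
    db = 20.0 * np.log10(np.maximum(mags, 1e-12) / ref)
    times_ms = np.arange(n_slices) * step * 1e3 / fs
    return freqs, times_ms, db
```

Feeding it a decaying sine (a stand-in for a resonance) should show the peak at that frequency surviving in the later slices while everything else drops away, which is the ridge described above.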
In the case of the HD800 example, it is a bit hard to see, as the headphone measures near perfect with just a tiny bit of ringing at 6kHz. In the case of the HF2, acoustic resonances of the undamped rear chamber, for example (somewhere between 2 and 4kHz), clearly show up, as they take several ms to decay by 40dB or so. As mentioned in the Stereophile introduction, you're pretty much guaranteed to hear those resonances in the midrange / lower treble, so the graph is very useful to really gauge the behavior of the headphone. 2D magnitude curves typically make you miss that point when the resonances do not clearly stand out.
Now, the issue at hand with headphone measurements is that you may not be able to look at such a CSD graph with standard data, which includes reflections from the pinna (artificial or not). In the case of speakers, the measurement is performed in an anechoic room, or in a standard room but truncating the impulse response before you see the first wall reflections (such as 6-10ms). In the case of headphones, Purrin's data looks really good, but the man has some talent!
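For the speaker case, the truncation before the first wall reflection could look something like the sketch below. Again, this is just an illustration under my own assumptions: the function name `gate_impulse`, the 6 ms default gate and the 1 ms half-cosine fade are hypothetical choices, not anything taken from Stereophile's procedure.

```python
import numpy as np

def gate_impulse(impulse, fs, gate_ms=6.0, fade_ms=1.0):
    """Keep the first gate_ms of an impulse response and zero the rest.

    The last fade_ms of the kept part is tapered with a half-cosine so
    the cut does not add a sharp edge of its own.
    """
    impulse = np.asarray(impulse, dtype=float)
    n_gate = min(len(impulse), int(round(gate_ms * 1e-3 * fs)))
    n_fade = min(n_gate, int(round(fade_ms * 1e-3 * fs)))

    window = np.ones(n_gate)
    if n_fade > 0:
        # Half-cosine taper going from 1 towards 0 over the last n_fade samples
        taper = 0.5 * (1.0 + np.cos(np.pi * np.arange(n_fade) / n_fade))
        window[n_gate - n_fade:] = taper

    gated = np.zeros_like(impulse)
    gated[:n_gate] = impulse[:n_gate] * window
    return gated
```

The price of gating is low-frequency resolution: with only about 6 ms of data you cannot resolve detail much finer than roughly 1/0.006 s, or about 170 Hz.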
However, personally, I am not yet convinced that a standard headphone test on a dummy head cannot be used. The reason is that you are looking at some reflections no matter the test setup, if only the acoustic reflection from the back of the chamber to the ear. In the case of speakers, Stereophile rightfully mentions the test should be performed in anechoic conditions, but that is simply to remove the room dynamics (else you'll see the decay of the room modes, not that of the speaker). As discussed in the HF thread, at least with Purrin's data, the reflections from the room where the test was done are almost in the noise floor (50dB down) and don't prevent you from looking at the CSD of the headphone. Tyll is doing his tests in anechoic conditions (at least above 1kHz), so this should return even cleaner graphs.
Sorry for the long rambling post.