Thank you for posting that explanation.
Suppose you have your own HRTF, measured with two stereo speakers in a low-reverberation (anechoic) room, and a DSP that not only convolves a ".wav" file recorded with a binaural head microphone with that HRTF, without adding crosstalk, but also equalises your headphones to a flat frequency response at your ear canal.
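To make the setup concrete, here is a rough Python sketch of the kind of DSP chain described above. It is only an illustration under my own assumptions, not a real product's implementation: the function name `render_without_crosstalk` and all of its parameters are hypothetical, the HRTF is assumed to be available as one time-domain impulse response per ear (same-side paths only, so no crosstalk terms), and the headphone equalisation is assumed to be a precomputed inverse filter per ear.

```python
import numpy as np

def render_without_crosstalk(binaural, hrir_ll, hrir_rr, eq_left, eq_right):
    """Hypothetical sketch of the DSP chain described in the post.

    binaural : (n, 2) array, left/right channels of the binaural recording
    hrir_ll  : impulse response, left speaker to left ear (assumed given)
    hrir_rr  : impulse response, right speaker to right ear (assumed given)
    eq_left, eq_right : headphone equalisation filters (assumed to be
        inverse filters that flatten the response at the ear canal)

    hrir_ll/hrir_rr must have equal length, as must eq_left/eq_right,
    so that both output channels end up the same length.
    """
    # Each channel is filtered only by its same-side path: the
    # cross paths (left speaker to right ear and vice versa) are
    # deliberately left out, which is the "no crosstalk" condition.
    left = np.convolve(binaural[:, 0], hrir_ll)
    right = np.convolve(binaural[:, 1], hrir_rr)

    # Headphone equalisation applied as a second convolution per ear.
    left = np.convolve(left, eq_left)
    right = np.convolve(right, eq_right)

    return np.stack([left, right], axis=1)
```

With unit impulses for every filter, the output is simply the input recording, which is a quick sanity check that no cross-channel mixing happens.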
Do you think the elevation cues, filtered by the binaural head microphone's transfer function, only shift the listener's perception of the elevation of a recorded point source (in other words, the listener understands that the source is above or below 0 degrees, but does not perceive the true, original elevation of the recorded point source), or do they completely ruin elevation perception (the listener does not hear the source as being above or below 0 degrees elevation at all)?
Now suppose you have an HRTF measured with a spherical arrangement of sixteen speakers (eight at 0 degrees elevation, four at +45 degrees and four at -45 degrees) in a low-reverberation (anechoic) room, and the same DSP as above (now there is no crosstalk between speakers separated by the sagittal plane, and you deal with comb filtering between speakers on the same side of that plane).
Does this second arrangement improve the listener's elevation perception compared to the first arrangement?
If you think the second arrangement is worse than the first, how many channels would the second arrangement need in order to match the perceptual performance of the first?
I would like to know your opinion.