The problem of high fidelity – a small introductory guide 2018-02-02 17:11:48

The problem of high fidelity – a small introductory guide for folks interested in the problem of the “best sound”. I presume you'll google HRTF and pink noise first and have a very general idea of what those are.





1) A small (unnecessary, but maybe fun) intro



The goal of hi-fi (be it in-ear monitors, headphones or speakers) is to create the best possible illusion of reality – to recreate the conditions of the original sound as exactly as possible. Any deviation in any aspect will change our perception – something will be „odd“ or „wrong“, or different at the very least – perhaps subtly, sometimes more noticeably.



Science has enabled us to calibrate microphones (artificial ears) and interfaces (artificial ear-nerve-brain impulse transformation) to near-objective perfection. With proper, quality tools, a microphone can pick up, store and process the sound energy that arrives at its position in spacetime. Science has also given us amazing instruments for measuring that spacetime – atomic clocks (the best), and various oscillating crystals – some even use GPS (basically an atomic clock) to discipline themselves and stay amazingly accurate. One of our current best understandings – probably flawed in some way – is that space and time are really one and the same (thanks, Albert). There are models which predict that the „time“ dimension has slightly different properties than the „space“ dimensions, but generally they are the same medium. Imagine that. A distance in space is quite literally a distance in time. It profoundly affects reality – time is measured only locally. The only thing in reality about which everyone agrees is the speed of causality – the speed which dictates how fast an event at one „point“ of spacetime (an energy transfer) can influence an event at another point. This speed is the absolute cosmic reference and maximum. It is, you maybe guessed, „c“. It just so happens that light in vacuum travels at this speed. If there were no speed limit, chances are reality could not exist – everything would affect everything, everywhere, instantly – events could not take place and, on a global scale, time could probably not exist.



Sound in air propagates at another, much slower speed, also sometimes dubbed „c“.



Why we evolved to perceive spacetime as „space“ and „time“ is probably because it was the most useful way of doing so in order to adapt and survive. Energy exists „within“ this spacetime – or, perhaps more accurately, energy is a property of it. Each given point in spacetime has a set of descriptors.



Imagine a big-ass painting. Now overlay a fine measurement grid over it. You can say, for every dot on the painting: this position has this colour, that position has that colour, etc. Spacetime is a 4-d painting of everything there ever was and will be in our universe. Overlaying it are 4-d „fields“: the electron field, the quark fields, the Higgs field, the photon field, … and the positions in spacetime at which a given „wave“-field is not at rest but has some sort of „energy“ or excitation are what we define as particles – and at these spots the fields interact. Going backwards, such a „particle“ can be described as having a position in space and time, and some sort of energy or momentum with an orientation – it's probably going to „go“ somewhere. It's very much impossible to measure both to complete precision. The universe won't allow it.



Frequency is a „rate of change“. It can be spatial. Draw a sinusoid. Through „time“ it remains the same; the ink's position on the paper changes with space. It can also be a rate of change through time, such as an oscillating ball-on-spring going up and down, or air particles moving forwards and backwards. Air particles moving forwards and backwards in a sinusoidal-like (harmonic), or any other, rate of change – with this behaviour propagating through space and time – are soundwaves. Each particle can be described with position and momentum ( = it's going where and how fast?).
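As a toy illustration of a „rate of change through time“, here's a minimal Python sketch that samples a sinusoid the way a digital audio chain would. The 1 kHz tone and 48 kHz sample rate are arbitrary example values of mine, not anything from the text above:

```python
import math

def sine_samples(freq_hz, sample_rate_hz, n):
    """Sample a unit-amplitude sinusoid: each sample is the displacement
    of the „air particle“ at that instant of time."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

# A 1 kHz tone sampled at 48 kHz completes one full cycle every 48 samples.
wave = sine_samples(1000, 48000, 48)
```

Each entry in `wave` is the position of the oscillation at one tick of the sampling clock – frequency literally as a rate of change through time.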



Anyone further interested in this should google „quantum field theory“ and „double slit experiment“ to be properly amazed. A warning, though, for an inquisitive soul – this is a big rabbit hole.



Most of what we can accomplish in the standardization of time(space) measurement is far more than needed for audio measurement (of the transducer – speaker or headphone – and of amps or DACs too). A good-quality quartz crystal in your oscilloscope or proper audio measurement tool is way more than precise enough to analyse the audible spectrum of frequencies. An amp measurement, or a speaker measurement, will be the objective truth – if the gear is right and calibrated, the experiment is properly set up, the resolution is good enough, the sampling is good enough, etc.



So that's great and easy. Can I make a measurement and apply it to further my hi-fi dreams? Probably not. Yet.



While it's true that we can describe and model the behaviour of reality – sound included – to a very high degree, there are a loooot of „if's“.



2) Main part – probably still unnecessary



As Mr. Linkwitz said, you are „listening“ to whatever caught your attention in the superimposed soundwaves arriving at your eardrum. Since birth, we have been using and adapting to our constantly changing tissue – skin, cartilage, bones, nerves, even brain. Each of us is different, and yet we can all agree something is blue. Or that something is a high tone, or a low tone. We probably perceive it differently in our minds – what you see as your blue, I might see as your red, my blue – but we all agree it's objectively blue. Humans can be fooled, though. Really easily. Try some visual illusions. We perceive reality through filters imposed by our sensory organs and our brains. These filters help us predict the nature of reality while actually being „wrong“, but they are applied to everything, so our relative observations are quite OK even if our absolute observations suck. What does that even mean? It means that context is king in human perception. The exact same colour of red, in shadow or in light, will not be the same red for us – the brain understands that a red against a brighter background is dimmer than the same red against a darker background, so we perceive the two as different – and a real-life object probably would differ, even though the reality of the red light waves is that they are exactly the same. Context. Is. King. Also. Contrast. Too. But not for the microphone.



Properly calibrated mics and interfaces should all give the same result. They don't give a crap about „context“. They „see“ the raw, unanalysed, unmodified reality of the sound particles pushing and pulling them. What does that have to do with anything? As it turns out, everything.



If humans were infallible in their judgements of context, we'd probably be utterly inept at seeing patterns as well as we do now, and probably wouldn't have created and achieved everything we did. But we'd be great at analysing audio objectively. A human who understands that he can be fooled by context, who is trained to perceive things differently and to remove as much subjective bias as possible, will still pick up soundwaves coloured by his anatomy. But for him or her, it's been like that their entire life. We learned that our modifications, which are applied to everything equally (eh… we'll see about that in a moment), are our standard. Using our own standard hearing properly, with as little bias as possible, we're actually not so bad as measurement tools ourselves. We'll still be trumped by mics and interfaces, though. They are not amazed by violins. I am.



What is hi-fi? What is the best illusion? What is the best, most trustworthy reproduction of the original? Well that depends. Here come those „If's“.



a) If we are speaking about speakers, again, it depends.



1) Do we want to achieve a perfect reproduction in the sense that our eardrums „hear“ what the „mic“ heard? (That would be wrong without an HRTF – more on that in a moment – but that's the general idea)



2) Do we want to make the speakers emit what the mics „heard“? Picked up?



If it's 1) – we want to achieve the highest fidelity in the sense that our head is now where the mics were during recording – we're going to have a bad time. Even assuming a flat frequency response, very low distortion and speakers more than 1 m away from us. Why? Well: a) the room, b) speaker crosstalk, c) speaker directivity.



a) The room will colour the sound. It can be helped a bit, but that's expensive and weird. You will perceive yourself as being in your room, with the sound source between the speakers.



b) One mic heard something. The other mic heard something else. Now both of your ears hear everything. This will damage the illusion. Ambiophonics DSP, or whatever the name is, can try to help here, but it will screw something else up. Eh. Such is life.



c) Even if you somehow magically solved problems a and b, you're still left with the fact that lower frequencies from the speaker go everywhere, while higher ones radiate much more like a cone or a reflector beam – directively. It's only going to be „perfect“ in a very limited area in front of the tweeter.



If we want 2) – to have the speakers barf up what the mics picked up – then problems b and c remain, but problem a is sort of solved. We accept that, for example, a jazz band recorded in a studio will sound like it's standing between the speakers, playing inside your room. That's not so bad.



But for more orchestral or ambience-heavy music that needs a large soundstage, that can be an issue. To try to kill the room effects and add ambience, one can use dipole speakers – learn about these from Mr. Linkwitz. To make everything more directive, one can use horns; for bass they get stupidly big. You could also go all-in like me and try to design a full omni speaker. What good that's going to do, I have no idea. Some people have already tried, but not in the way I imagined. We'll see.



Anyway, measuring the response of a speaker is much easier at home. It's still a bit hard, but much easier than headphones or IEMs. Basically, you put a calibrated mic right next to the woofer and the ports and make nearfield measurements (valid for low frequencies; highs are not picked up properly there), then transform them into far-field equivalents. Next, make a properly time-gated far-field measurement of the speaker (gating kills the room reflections but is blind below a certain frequency), then splice the two responses together. I suggest the REW software – it's free and very well supported and documented.
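To make the splicing step concrete, here's a minimal Python sketch of the idea. This is not REW's actual algorithm – the function name, the simple level-matching, and the example splice frequency are my own illustrative assumptions:

```python
def splice_responses(nearfield, farfield, f_splice):
    """nearfield, farfield: lists of (freq_hz, spl_db) pairs on the same frequency grid.
    Below f_splice the nearfield curve (valid at low frequencies) is used,
    level-shifted so the two curves meet; above it the gated far field takes over."""
    nf_at = next(db for f, db in nearfield if f >= f_splice)
    ff_at = next(db for f, db in farfield if f >= f_splice)
    offset = ff_at - nf_at  # align the nearfield level to the far field at the joint
    spliced = []
    for (f, nf_db), (_, ff_db) in zip(nearfield, farfield):
        spliced.append((f, nf_db + offset) if f < f_splice else (f, ff_db))
    return spliced
```

Real tools also apply the nearfield-to-farfield transform and baffle-step corrections before joining; this only shows the level-match-and-splice idea.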



b) If we're talking about headphones, things get more complicated.



We're no longer in the „free field“. Our head, torso and body no longer colour the sound from the headphones as much – the driver has a direct line of sight to our ear. That's good, no?



It's good for the problem of speaker crosstalk – there is none, or very little. It's very bad because, remember, we're used to hearing the sound coloured by our head and torso. It's our standard. If you could suddenly see and perceive radio waves (however that would look…), for example, you'd be very confused. Confusion kills the illusion – the fidelity. This is why stereo sound sometimes seems to come „from inside the head“ – there was no head to block the sound when the mics were recording. Unless, of course, there was – a binaural recording – which is about as good as a generic HRTF can get.



Also, headphones are chambers. Open or not, there are special acoustic properties within that chamber which we need to account for. Basically, the goal here is to have your eardrum pick up the same frequency response it would if you were listening to a flat speaker. That's not going to happen. Why? Well, turn your head a bit. The entire sound illusion just turned with you. You are effectively fixed at one listening point, and you gave your brain the information that you just turned your head. The brain expects the incoming sound to change, and it doesn't. This kills the illusion. Research is being done, and some products exist that try to address this; in games, for example, altering sound in this way is called virtual surround (yes, I hear the dragon at 5 o'clock, now it's behind me, while I'm running naked through Riverwood).
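One of the cues a head tracker has to keep updating as you turn is the interaural time difference (ITD) – a sound from the side reaches the far ear slightly later than the near ear. A minimal sketch using Woodworth's classic spherical-head approximation (the head radius here is an assumed average value, not from the text above):

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average human head radius
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at roughly 20 degrees C

def itd_seconds(azimuth_deg):
    """Woodworth's spherical-head estimate of the interaural time difference
    for a distant source at the given azimuth (0 deg = straight ahead)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (math.sin(theta) + theta)
```

A source straight ahead gives 0, while one at 90° gives roughly 650 microseconds – exactly the kind of delay cue that stays frozen with headphones unless something like head tracking recomputes it.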



There are different approaches to measuring headphones. I personally can't decide which I like more: Tyll's dummy heads, or the C.U.N.T. from changstar/sbaf. Dummy heads have a more or less „known“ head-related transfer function – basically all the linear colorations that happen to the soundwave on its way to your eardrum. If one measures a headphone with a dummy head whose free-field transfer function is known, gets a response, then applies the inverse of that function to it, the result should be what the ear would hear from a speaker with such a response in an anechoic free field – with the speaker 1 m or more away from us. On Tyll's measurements, the red/blue curve is the response with a similar HRTF applied in reverse, and the lot of them below are the raw measurements – with the coloration from the HRTF still included. Note that there is much more to it than what I'm writing – the orientation of the headphone driver, for example. These transfer functions are angle-dependent. If you read more about them, and manage to get your own made, it's going to be a nice experience to toy with EQ.
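When responses are expressed in dB, „applying the reverse of the HRTF“ boils down to a subtraction per frequency band. A minimal sketch of that idea (the function name and the plain per-band dB lists are my own illustrative assumptions, not Tyll's actual pipeline):

```python
def compensate(raw_db, hrtf_db):
    """raw_db: headphone response measured at the dummy head's eardrum (dB per band).
    hrtf_db: the dummy head's known free-field transfer function (dB per band).
    Subtracting the HRTF undoes the head's coloration, leaving what a speaker
    with this response would look like in an anechoic free field."""
    return [r - h for r, h in zip(raw_db, hrtf_db)]
```

A headphone whose raw curve exactly matches the HRTF would come out flat after compensation – the „neutral“ target in this view.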



C.U.N.T. is well documented on changstar/sbaf/headfi, I think.



With IEM responses, we bypass even more of the ear and are limited to only part of the ear canal for adding the colorations we're used to. That means that, for it to sound as hi-fi as possible – to get the best illusion – the driver must produce the exact same colorations that our ears, head, torso and body would. That's not possible unless you get a custom-measured HRTF and a completely flat IEM with your custom HRTF applied to it, which is a hard goal to achieve. Most IEMs and headphones try to apply parts of an HRTF for the most average human head and auditory system, but that's not as good for you as a custom one would potentially be.







So. What to do about all of this?



Well, get yourself a calibrated mic and a DSP box if you're going to DIY your own speakers. You can get them really nice and flat.



Use the appropriate type of speaker for the job: normal omni/directional ones for experiencing sound in your room, or dipoles (plus maybe room correction) to kill off the room as much as you can and get the ambience.



If you want the best possible ambience for music: binaural recordings on IEMs or headphones. If you apply EQ with your custom HRTF – even better. If you apply some kind of head tracking to change the HRTF depending on angle – even better, maybe? No idea.



If you want to know which headphones will sound the most neutral to you, a good start is to check Tyll's measurements on innerfidelity, or here on sbaf, and see which are the flattest, have the fewest spikes in the lower treble range, and have low distortion. Then I suggest donating to Voicemeeter and making and applying a custom EQ – or doing the same in foobar, a miniDSP amp, or wherever. Try to get the sound as flat as possible as it sounds to YOU. YOU are the one listening, with your ears. To do this, I recommend equalization with pink noise; there are some good tutorials online on how to achieve it. On top of that, the best possible ambience for gaming? Virtual surround with headphones. People usually recommend Razer, as in the software.
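If you want to generate your own pink noise for those EQ-by-ear sessions, here's a minimal Python sketch of the classic Voss-McCartney algorithm. The row count and seed are arbitrary choices of mine; real measurement tools ship better generators, this only shows the idea:

```python
import random

def pink_noise(n_samples, n_rows=16, seed=0):
    """Voss-McCartney pink noise: sum several random rows, where row k is
    re-rolled every 2**k samples, giving roughly equal energy per octave."""
    rng = random.Random(seed)
    rows = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    out = []
    for i in range(n_samples):
        if i > 0:
            k = (i & -i).bit_length() - 1  # index of lowest set bit of the counter
            if k < n_rows:
                rows[k] = rng.uniform(-1.0, 1.0)
        out.append(sum(rows) / n_rows)
    return out
```

Because the low-index rows change often and the high-index rows change rarely, the summed signal gets progressively more energy per band toward the low end – the ~3 dB/octave slope that makes pink noise sound „even“ to our ears.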







Disclaimer: some of this text is facts and can easily be googled – don't ask me to provide proof, I can't be assed to do that, this is not a scientific paper. If you disagree or find errors, point them out (and I'll be happy to investigate and learn, really). Some of it is my understanding of the subject, written in such a manner as to make it much easier for a regular reader to understand. I'm not perfect, very susceptible to bias as any human is, susceptible to poor understanding of science, and I quite possibly failed in my attempt to constructively transfer my “knowledge” to someone else – the reader – I know that already. It's also entirely possible that science, and views on audio from an engineering perspective, will change. Perhaps tomorrow, perhaps drastically. Who knows. It's much more likely that they won't, and anyone trying to sell you magic is trying to rob you.



This is a small introductory guide to the reality and problems of hi-fi, written and given by me – and it should be treated as such. Now cringe: make of it what you will and take from it what you will. It was meant as a „guide of sorts“ for total newbies, to give them a general idea of figuring out what they want to achieve and what's there to achieve. If it helps someone, I'm happy. Feel free to add whatever I missed or whatever you think is important. I hope this can generate some discussion from which I, and all of us, can learn something new. If admins/mods deem this text stupid, by all means kill it. I won't even be mad, and sorry for the inconvenience caused. Also, sorry if I placed this somewhere wrong – I wasn't sure where to put it. I just feel there is no good, short, all-encompassing FAQ for beginners; this was my (maybe bad/misguided) attempt. I plan to add small posts next to get the regular dude ready to measure speakers. Tell me not to if you hate this stuff.