2020, Chapter 3:

Lighted by the Blind

Phono cartridges (twice)

Tubes (double-blind!)

Delta-sigma versus multibit DACs (twice)

Magni 3+ and Heresy and Vali 2

Even a couple of prototypes

Why Blind Testing is Wrong

No differentiation pressure: we’re mainly asking whether or not you like something better, not forcing you to pick which is which.

Large choice of music: we want you to be able to choose something familiar, to maximize your chance of hearing any differences.

Choice of transducers: when we’re doing headphones, we try to have several models available, so you can choose one you’re comfortable with.

No time limits: so you can listen as long as you want, because sometimes it takes time to hear small differences.

No expected outputs: there are no scoring sheets, no ranking system, no component list, no expectation to hear anything at all. Everyone is told that “no difference” is a totally valid answer.

Level-match the output of the electronics into the actual transducer to 0.05 dB or better, to eliminate any difference in volume caused by output impedance variations.

Carefully hide all the products under test.

Eliminate the need to plug and unplug headphones between products.

Provide a user-controlled instant switch that is as seamless as possible.

Offer an external volume control that applies equally to all products under test.
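For a sense of scale, here is a minimal Python sketch of what a 0.05 dB level match means in voltage terms. It assumes you are comparing RMS voltages measured across the actual transducer. The function names are hypothetical, and this is not a description of the matching procedure actually used at the Schiitr.

```python
import math


def db_to_ratio(db: float) -> float:
    """Convert a level difference in dB to a voltage ratio."""
    return 10 ** (db / 20)


def level_difference_db(v_a: float, v_b: float) -> float:
    """Level difference in dB between two RMS voltages (assumed measured at the transducer)."""
    return 20 * math.log10(v_a / v_b)


def is_level_matched(v_a: float, v_b: float, tolerance_db: float = 0.05) -> bool:
    """True if the two voltages are within the given dB tolerance of each other."""
    return abs(level_difference_db(v_a, v_b)) <= tolerance_db


# How big is 0.05 dB as a percentage of voltage?
print(f"{(db_to_ratio(0.05) - 1) * 100:.2f}%")  # prints 0.58%
```

So a 0.05 dB match means the two outputs differ by roughly 0.6% in voltage, which is why the matching has to be done with a meter and the real load rather than by ear.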

When listening to phono cartridges as a group, you have a limited library, and you also have non-instant switching and volume changes…but those are more gross differences.

When we did the tube test, we were sent tubes that had been sealed in PVC pipe, so nobody, not even Schiit staff, knew what they were, so we had a true double-blind listening session.

Well, What Have You Discovered?

There were fairly large differences between tubes in a double-blind comparison. Fairly large, compared to what I expected anyway. We were testing PVC-embalmed tubes that a friend had sent in using Saga Plus. Saga Plus isn’t the best platform for showcasing tube differences, because it’s simply a buffer, and a hybrid one at that. So, a half-solid-state no-gain buffer demonstrating audible differences is pretty amazing. At least to me. When we switched from the first tube to the second, I really expected to hear nothing. But the difference in tonality and stage was readily apparent. Now, this was on speakers, in a group, so maybe it’s group hypnosis, but…who knows.

There are always large differences between phono cartridges. This should be expected, since they are actually transducers. What’s interesting is some very anti-Grado folks turned over to liking Grado once the logos were hidden. Hmmm!

We’ve chosen True Multibit over Delta-Sigma once, but it was a bit of a flawed test with some changeover time between listening. This was on speakers in a group as well.

Due to the flaws, we decided to do True Multibit versus Delta-Sigma again, this time with two different DACs (Modi and Modi Multibit, and Gungnir and Gungnir Multibit). This time we chose Delta-Sigma over True Multibit both times! Yeah, like I said, the results can be unexpected.

We used blind listening internally to answer a question about Bifrost 2, when some staff thought that the old Bifrost sounded better. We set up a blind listening environment, and several of us listened with headphones. All chose Bifrost 2, answering the question definitively.

Internally, blind listening has correlated with anecdotal and sighted listening of several in-process prototypes, surprisingly even on prototypes with measured performance so good that there should be no audible differences. Two of these prototypes were taken to the Schiitr, and the results there tracked the internal results as well.

The differences between electronics are very, very small. Smaller than many subjectivists would have you believe. If you’ve ever read reviews that talk about amp or DAC changes as night-and-day, chicken-crap-or-chicken-salad, revelatory, or life-changing, you’re gonna roll your eyes hard after doing any blind listening. I’d expect most people to have the same reaction I did: spin the knob and say, “What difference?”

If you’re impaired by noise, distraction, or booze, you’re not going to hear much. Differences collapse when people are yammering in the background, or if you feel pressured, or if you’ve had a couple glasses of wine. Do some blind listening, and you’ll immediately consider any “reports from the audio show floor” as delusional or superhuman, because it’s simply going to be impossible to discern much through the din.

Measurements don’t correlate well with blind listening results. It’s insanely hard to pick Vali 2 from Magni Heresy, even though one amp has 1000x more distortion and its AP report would probably engender much projectile spewing from the pure objectivist camp. At the same time, we’ve reliably heard, and correlated, differences in prototypes running at the 120dB SINAD level (yes, we are serious…perhaps we are insane.)

It seems that some unexpectedly larger differences exist. Tube type doesn’t change the measurements of Saga Plus’ buffer very much, but that’s one of the larger electronics differences we’ve heard. Again, maybe this was group delusion, but we had a couple of people who picked their favorite tube multiple times (from a group of 8), without knowing what was what. And again, these differences are still small. Just not TINY.

Sighted and blind listening results have a tendency to correlate. What we hear sighted is what we hear blind. At least in limited testing. Furthermore, products that have been described by other subjectivists as having a certain sound have been described the same way during blind testing. No, we don’t have lab-grade data on this. But there has been enough anecdotal evidence to suggest there may be something going on here.
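To put “picked their favorite tube multiple times (from a group of 8)” in context, here is a rough sketch of the odds of doing that by pure guessing. The trial counts below are made up for illustration, since the actual number of repeats isn’t given, and `p_at_least` is a hypothetical helper, not part of the listening setup described here.

```python
import math


def p_at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k hits in n independent guesses, each with chance p."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


# Illustrative assumption: landing on the same 1-of-8 tube in 3 out of 3 tries.
print(f"{p_at_least(3, 3, 1 / 8):.4%}")   # prints 0.1953%

# Illustrative assumption: 14 of 15 correct, generously treating each call as a coin flip.
print(f"{p_at_least(14, 15, 0.5):.4%}")   # prints 0.0488%
```

None of this makes an informal session lab-grade, but it does show why a few repeated picks from a group of 8 are hard to write off as luck alone.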

What Does This All Mean?

Supposition, as in “guess?” Sure, we like guessing.

Limited evidence? Yep, got that.

Starting point for further investigation? Sounds good.

Stoddard’s Hypothesis

There are small audible differences in audio electronics that cannot be readily explained by measurements, or that exist below the level commonly accepted as inaudible; furthermore, some people can hear these small differences, and some of the people who can hear them may consider them important.

This is a hypothesis. As in, a guess based on limited results. I’m drawing on Tyll’s Big Sound and our own blind testing results in formulating this, as well as a couple other ad-hoc tests I’m aware of.

Hypotheses can be disproved. It’s entirely possible I’m full of it. The point is, this gives us something to shoot at, right or wrong. Once, we thought tectonic shift was a load of bollocks. Of course, we also believed in aether. So who knows how this will go.

A lot of people are gonna flip out anyway. Because they didn’t read the preceding narrative or the hypothesis itself carefully. Key words: SMALL. As in tiny, tiny differences. That’s also why it says “some people” and “can.” This is not a judgement. This is just a fact that human capabilities vary. Also note that it says that only some people consider these tiny differences important.

Moffat’s Corollary

Human hearing seems to be more integrative than differential, so those small differences between components may be magnified over time, and therefore seem larger and more important than during rapid switching.

“This DAC is a total POS and sounds like absolute butt compared to the Arglebargle X2000!” Ah, nah. Again, unless it’s actually broken (like, fried output stage, a couple of tenths of DC offset, one rail gone and clipped, etc), it’s not going to be THAT different. Not even if it costs like a car. Or a house.

“I changed this cable and it totally changed everything, the soundstage opened up and the inner detail emerged, I can’t believe mouthbreathers don’t think cables make any difference!” Yeah, unless the cable is broken, that’s a didn’thappen.com

“This designer is a total idiot because he mixes tubes and transistors, and sometimes even adds an op-amp, you can’t trust him, once there’s anything in there but glass, it’s gonna sound horrible!” Uh-huh. Hey guys, I’ve designed with all of those. They all have their place. And it’s a real eye-opener when, in blind listening, most people can’t tell a tube amp from a solid state one. No, seriously.

“These measurements are horrible, I can’t believe the SINAD is only 65dB! With incompetent engineering like this, it can’t sound good!” Yep. Cool. Let’s do a blind test between that and a 115dB amp and let me know how it goes. I suspect you’ll find the differences much, much smaller than you expect.

“Audio performance can be distilled to a single measurement, and only an idiot would get something with lesser measurements, unless it’s out of their budget.” Oh really? Then how do we have people who have reliably distinguished between two amps running 0.0002% THDish in level-matched blind testing? (And no, it’s not noise floor, they are the same in that regard.) Maybe we are insane.

“All of the high-end is just a scam to separate dumb people from their wallets, since there’s no correlation between price and measurements.” Yeah, and I’ll agree there’s some silly-priced stuff out there, but if there are small differences, and if those small differences are audible, who knows? Maybe to some it’s worth it.
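The arguments above throw around both percentages (0.0002% THD) and decibels (65dB versus 115dB SINAD). As a rough aid, here is a small Python sketch of the standard conversion between the two, using 20·log10 of an amplitude ratio. The function names are hypothetical. Note that SINAD includes noise as well as distortion, so a SINAD figure and a THD figure are only loosely comparable.

```python
import math


def percent_to_db(percent: float) -> float:
    """Express a distortion percentage (amplitude ratio x 100) in dB relative to the signal."""
    return 20 * math.log10(percent / 100)


def db_to_percent(db: float) -> float:
    """Express a level in dB relative to the signal as a percentage."""
    return 100 * 10 ** (db / 20)


# THD figures quoted for Vali 2 and Magni Heresy elsewhere in this chapter
print(f"{percent_to_db(0.3):.1f} dB")      # prints -50.5 dB
print(f"{percent_to_db(0.0003):.1f} dB")   # prints -110.5 dB

# SINAD figures from the example above, as noise-plus-distortion percentages
print(f"{db_to_percent(-65):.4f}%")        # prints 0.0562%
print(f"{db_to_percent(-115):.6f}%")       # prints 0.000178%
```

A 1000x difference in distortion works out to 60 dB, which is what makes the difficulty of telling those two amps apart blind so surprising.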

I may be going off the deep end here.

Yeah, I’ve written about controversial stuff before, like “Measurements (With A Side Order of Sanity),” “The Elephant In the Room,” and “The Subjectivist/Objectivist Synthesis.” But this one goes a bit farther. It claims a bunch of stuff has been done wrong, it reports results that will likely irritate both the subjectivist and objectivist crowds, and advances a couple of pretty nutty ideas.

So what’s this all about?

Two words: blind listening.

“Ooh, scary!” some subjectivists are thinking, imagining some stressful A/B/X test composed entirely of baroque chamber music with an Orwellian proctor standing cross-armed behind them.

Nope! Note the terminology. We’re not saying “blind testing.” We’re saying “blind listening.”

And we think that’s an important distinction. Because nobody wants to take a test. They don’t need the stress.

What is blind listening?

To us, it’s

Afterwards, you might pick a favorite, comment on any differences you might hear, or try to identify the product. But there’s no forced choices, no time limit, no pressure…and “there is no difference” is a totally cool answer.

If this sounds a bit weird, it is.

After all, 99.9%+ of all audio reviewing is not blind, but sighted. The reviewer knows exactly what product or products they are reviewing. They see the finely-crafted front panel, the badge carrying the storied name…or the chintzy casework and unknown brand. They touch the finely-weighted controls, or flick the rubbery switches. They also know the price, whether burger-like or car-adjacent.

Furthermore, sighted listening is the order of the day, whether or not the reviewer is a subjective or objective reviewer.

Stop. Go back. Read that again.

To repeat:

In an ideal world, craftsmanship, name, and price wouldn’t matter. Everything would just be about the sound.

But this is the real world.
And to think that seeing, touching, and knowing the gear has absolutely no impact on the review…now that might be expecting a bit too much.

And it’s not just reviewers who can be misled. Audio designers (like us) are going to find it difficult to be 100% unbiased. If they’ve always done discrete, like me, they might expect an op-amp based design to automatically sound worse. If they’ve always done gear based solely on measurements, they might immediately dismiss a tube amp as being inferior. Plus, there’s always the “new baby syndrome” when your latest creation is up and running.

So, to take away these unconscious biases, we started “going blind.”

Over the past year, we’ve been running a series of blind listening events at the Schiitr. We’ve listened to a whole lot of different stuff, in a whole lot of different ways, and we’ve gotten some really surprising results. We’ve done:

At the same time, we’ve started to incorporate blind listening into product development. This has reached the point where it’s integral enough to change the direction of a product; the prototype listening mentioned above was an outgrowth of two possible product directions we’d been wrangling over internally for some time. We see blind listening as immensely valuable from every angle, from developing products to bringing things together.

“I still don’t like it,” some subjectivists might say. “What if I like the cheap one?”

we’d tell you.

“Or the wrong one?” the subjectivist continues.

we would ask.

“Or if I don’t hear any difference?” they gasp.

we’d say.

Of course, some objectivists may have their own questions or objections.

Like, “Well, once you reach a certain level of measurement, everything will sound the same.”

we’d ask in return.

“Nobody can hear the differences at those levels,” they might assert.

we’d say.

Or, the final be-all, end-all: “How can we trust you if you don’t follow the proper scientific procedure and produce data that can be peer-reviewed for publication?” they might say.
“This is all just…marketing stuff.”

we’d say.

Plus, in our opinion,

Gasp if you want, but read on…

Most blind testing that has been done to date is of the A/B/X variety: we play you “A,” we play you “B,” and then we play you “X,” and you try to determine whether X is A or B. Or maybe you have control, but the focus is still on “you must discern one from another!”

Sounds like a ton of fun, right? Yeah. Like a math test.

And that’s the problem. Blind testing ain’t fun. It’s hella stressful. People are gonna get the cold sweats doing this kind of test. They’re gonna screw up. They’re gonna check out. They’re simply not even gonna show up.

And it gets worse. In some cases, you don’t get to choose the music. I don’t know about you, but if I don’t know the music, I have exactly zero chance of trying to pick out any differences between gear.

And worse: frequently, you won’t be familiar with the gear. Depending on the headphones or speakers, you’re gonna dramatically increase the chance of a null result.

Of course, it could be made even worse: add, say, a time limit, and you have the perfect storm of stress and unfamiliarity to get absolutely no usable results. It’s almost as if the test was designed to obscure any actual differences!

That isn’t to say people haven’t tried to make things better. Most notably, in 2015, Tyll Hertsens at Innerfidelity tried a different approach: blind listening in a much more relaxed environment.

Tyll invited a wide range of people to listen to a whole bunch of gear, including some not-so-expensive and price-no-object stuff. He allowed them to choose their music and take their time.
As one of his challenges, however, he did ask the listener to try to identify specific products, which could be stressful.

And…here’s the amazing thing: Tyll had a couple of guys who went through and scored wayyyyyyy beyond chance, with one identifying 14 out of 15 amps correctly!

Objectivists largely ignored the results, which showed that there were some people who could perceive small audible differences in level-matched blind comparisons, at least with amplifiers.

Go back and read that again.

The subjectivist holy grail!

There should be dancing in the streets, right?

Weeeeeeeellllll….maybe not. As Tyll said, the differences were

And not everyone did so well on the blind test.

But the results of Big Sound 2015 always stuck with me, and when it came time to set up our own blind listening, I used Tyll’s results as a signpost. And I added some more goals. Specifically: let’s make it easy, familiar, and comfortable.

This meant:

At the same time, though, we:

Again, this is not blind testing.

Now, all of this didn’t happen overnight. All the parts of our blind listening evolved over time (the large choice of music and transducer choice came about most recently, where we used it with Magni 3+ and Heresy). And sometimes you can’t implement all of it, or you can implement more:

And yeah, I know, there are plenty of criticisms that can be leveled against our approach.

First and foremost, the group sessions are gonna be thrown right out. Group dynamics mean the Loud Shouty Man will dominate the opinion. And people will throw out their opinions freely, despite us asking them to **** until the event is over.

Furthermore, critics will cite lack of level matching and instant switching in phono cartridges, but that’s a bigggg undertaking. Maybe we’ll take that on someday, but that will require multiple phono preamps and turntables…eeek.

Beyond that, the lack of structure in what we’re asking for may come under fire.
Or the fact that the blind listening events at the Schiitr are fun, public events where people might or might not have been drinking.

In response, I’ll say this: we’re not looking to publish a paper. We’re looking to have some fun…and maybe discover something in the process.

And oh boy, has it been interesting.

Let me start with an anecdote about when we did the blind listening with Magni 3+, Magni Heresy, and Vali 2.

This was at the Schiitr, on a night when we first deployed a level-matched, instant-switching system to the public. We had three setups, each with a different headphone. Each had a rotary switch that allowed instant switching between three amps. For this test, we level-matched with the actual transducer as a load, so it would be as precise as possible. Level-matching was done to 0.02 dB or thereabouts.

Listening was easy. Sit down, put on the headphones, select the music you wanted (from our own library, Tidal, or Qobuz), and switch between the amps.

There was only one problem: all the amps were shorted together. Or at least that’s what it seemed like to me. Because there were no differences between them. They all sounded the same.

I called David over. “Is this working?” I asked, twisting the rotary knob.

David laughed. “Great matching, huh?”

“No, I mean, there’s no difference at all,” I told him. I switched the knob back and forth, trying to hear any difference. Hell, I couldn’t even hear the switch glitching when it changed to the next amp. It had to be broken.

“Try the other headphones, they’re more resolving,” David said, motioning me at another listening station.

“I really think they’re shorted together,” I persisted.

David grinned, reached around the back of the switchbox, and unplugged one of the cables connected to it.
Immediately one of the amps dropped out of the rotation.

“Ah,” I said, thinking, Where is your god?

I mean,

“So it’s working,” I said, finally.

David nodded, looking very pleased with himself.

Argh.

“And one of these…is the Vali 2?” As in, the amp with 1000x more distortion than a Magni Heresy?

“Yep,” David assured me.

That one should be easy to pick out! But, switching back and forth, I really didn’t hear a difference!

“Take your time,” he said. “And seriously, go for the other headphones, it’s easier to hear a difference there.”

Hmm. Well, I wasn’t about to give up yet. Instead, I went back to the music library and picked something that I was very familiar with, rather than the generic Audiophile Approved stuff that had been playing. (Don’t laugh, it’s Crash Test Dummies, The Psychic. I also use stuff like the B-52s Deadbeat Club. Yeah. Bite me. I can also use the mythical golden Muddy Waters Folk Singer album, recorded through Mike’s insane GAIN 1 system.)

Now…now I could hear some differences!

But man oh man, they were

Still, one amp seemed a bit softer in the highs, so that had to be Vali 2, and one was crispier, so that was probably Heresy. I went back and forth for another minute or so, then announced to David I knew which amps they were.

“Go for it,” he said.

I named them.

David laughed. “One hundred percent wrong!”

Yeah. Boom. Duh.

To make a very long anecdote a bit shorter, I tried again on the most resolving headphones, and this time I was able to call all three amps right.

But. Still. TINY.

I mean, these were really small differences…even between op-amps and a tube hybrid!

It got weirder, because when we did more blind listening using the same system back at Schiit, the differences were more apparent, and things were easier to call.

Still.

So why was it harder to pick them at the Schiitr meet? Several reasons, but mainly because it was noisy and drinky. People weren’t quiet, and I’d had a couple of beers.
This isn’t conducive to hearing differences.

But that wasn’t the only blind listening we did. In other tests, we discovered:

So what can we learn from all of this?

In short, we really like blind listening, and we’re going to continue using it and reporting on the results. It allows us to be honest with ourselves, which allows us to make even better products.

“Sounds great!” someone might say. “So how come there hasn’t been more blind listening?”

Simple:

Yes. Even as comfortable and stress-free as our methodology is, many find it a really, really scary prospect to face the switch, both subjectivist and objectivist alike!

I mean, if you’re a subjectivist, what happens if you choose Modi 3 over a car-priced DAC? What will your friends think? Do you lose your Golden Ears award? Does it mean there is no meaning to life and all is lost?

No.

Worse, though, if you’re a subjectivist, and the differences between electronics are so small, what does that mean for cables and power cords and fuses and magic stickers and stones and Schumann resonance tuners and all the other crazy stuff people say “transformed” their systems?

Well,

It’d be easy to set up a blind listening event at the Schiitr, right?

No, seriously. Bring it, it would be fun to try!

But objectivists may be feeling uneasy as well. Those who worship at the altar of numbers might find themselves in a cold sweat if they can’t tell a Vali 2 running 0.3% THD from a Magni Heresy running 0.0003%.

I was surprised as well.

Or, what if they listen to, say, a Lyr 3 and a Magni Heresy, and prefer the Lyr? What does that mean? Is their lifetime Card of Objectivity null and void? Did they just join the dark side? Is there no hope for sanity and reason in audio?

No.

Nothing wrong with that.

So, relax. Have a listen. A blind listen. You might be surprised what you hear.

(Or not, because, like I said, TINY.)

Not so fast.

We haven’t been doing blind testing for all that long, and we certainly don’t have an AES-quality methodology.
We started doing this on a lark, we kept doing it because the results were surprising, and we continue doing it because it’s interesting and fun. To imply we have enough data for a summation (or summary judgement) is just silly.

That said, I think we have enough for a hypothesis.

Before you start screaming, here’s the definition of a hypothesis: “a supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation.”

So here we go:

Sounds pretty basic, huh?

Well, it’s not. It’s a big departure from past blind testing of electronics, which has usually (or at least seemingly) been seeking to disprove audible differences.

Now, let’s be totally clear:

Furthermore, before we get into the yelly bits, let me offer a corollary:

This is based on Mike’s 40+ years of experience in audio, and it coincides with my own experience, as well as countless anecdotal reports. Again, not AES-level stuff, but also again, it’s a guess, a hypothesis…

And heck, if it’s true, it pretty much completely blows up the “burn-in” thing. If this is true, there is likely no burn-in except brain burn-in; your brain simply adapted over much listening time to the small differences in your current components, so when you swap one out, the difference seems much bigger at first.

Okay, so now let’s start the yelling.

While we don’t have enough data for summary judgement, or to prove our hypotheses (and we may never have), I do think blind listening has made one thing abundantly clear:

What do I mean?

Ah, come on:

Yes. The BS comes from both the objectivist and subjectivist sides.

And, all too frequently, the BS is weaponized.

Did you notice each example above contains an insult?

Do you know how people respond when they are insulted?

There are times when I think we need a middle path between objectivists and subjectivists…what, call it Sensiblist or Independent or something like that.
Because that seems to correlate better with what we hear: that yes, the differences are small, but they are there…and who are we to tell someone else what to enjoy?

I would really like to see more blind listening, more research into where measurements matter, and more open conversations about both. Because I think both sides have a lot to learn from each other.

We just need to stop shouting.

And start listening.