The purpose of this article is to highlight challenges in HiFi testing and share the author’s own experience related to these challenges. It is not intended to convince anyone about differences or lack thereof between different HiFi equipment – readers are encouraged to draw their own conclusions in this matter reflecting upon their individual experiences.
Regardless of what HiFi equipment is being compared, whether it is loudspeakers, amplifiers, DACs, CD players or turntables, the challenges described below will apply.
Challenge 1 – Associated Equipment & Location
The idea of any test is to eliminate as many other variables as possible, and concentrate on the one that we want to test. This is no different for testing HiFi equipment, and yet you often see people (including myself) comparing what they have heard at a dealer or an audio show to what they have at home.
So, what’s the problem you may ask… Well, first of all, regardless of what piece of equipment you are interested in, all other equipment associated with it is likely to be different from what you have at home. Thus, you are not only comparing one piece of equipment but whole setups.
Even if you were lucky enough to find a dealer that had the same components as you, and was able to only change the one you are interested in, that still leaves us with the room problem. Yes, the room has significant impact on what we hear – the same equipment will sound different in different rooms. I have lived in 3 different properties since I got into HiFi, and I had one setup which I really liked in my old listening room but wasn’t at all happy with it in my new house. The only thing that changed was the room, all equipment was exactly the same.
Despite all of the above, people often make negative or positive comments about particular equipment in comparison to something else, based on the listening sessions conducted in different locations. Yes, you may not have liked the sound you’ve heard in that location, but most of the time, you don’t have enough evidence to attribute that to a particular component.
When you compare HiFi equipment, try to change only one component at a time and keep everything else the same. More importantly, test the equipment in the same room, ideally in the same physical location if testing speakers.
Challenge 2 – Audio Volume & Effect on Our Perception
According to Everest and Pohlmann (2009) the minimum difference in loudness that humans are able to detect is 0.25dB. This of course varies with frequency changes and SPL level, and in certain circumstances we may need as much as 9dB to detect the difference.
Moreover, according to the research conducted by Fletcher and Munson (1933) and later by Robinson and Dadson (1956), our subjective perception of loudness for different frequencies changes with the SPL level. To put it simply, if you play music quietly, you hear less bass and treble. If you play music louder, what you hear is more linear.
If we then combine these two findings, one can easily see how small changes in listening volume can affect our perception of what we hear. Despite this, even when people listen for very subtle differences between the equipment, volume matching is rarely taken into consideration. Furthermore, based on my own experience, providing that we are testing equipment at normal listening level, the equipment that plays louder is often perceived as more engaging and more detailed. Thus, the question that you want to be asking is – is my preference of one piece of equipment over the other a result of the difference in equipment or difference in the volume levels?
The volume levels should always be matched to ensure that the comparison happens on a level playing field. For equipment such as CD players, DACs and amplifiers, this can be done using the amplifier volume control and measuring the output voltage at the speaker terminals while playing a given test tone (either from a pre-recorded CD or a streamer, depending on your setup). Just make sure that your multimeter or voltmeter is capable of accurately measuring AC voltage at frequencies higher than the standard 50Hz or 60Hz.
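As a rough illustration of the voltage method, two meter readings can be converted into a level difference in decibels using 20 × log10(V1 / V2). The snippet below is only a sketch: the function names and the example readings are made up, and the 0.25dB tolerance simply reuses the detection threshold quoted earlier in this section.

```python
import math


def level_difference_db(v_a: float, v_b: float) -> float:
    """Return the level of reading A relative to reading B in dB.

    Both readings are AC voltages taken at the speaker terminals
    while the same test tone is playing.
    """
    if v_a <= 0 or v_b <= 0:
        raise ValueError("voltage readings must be positive")
    return 20.0 * math.log10(v_a / v_b)


def is_level_matched(v_a: float, v_b: float, tolerance_db: float = 0.25) -> bool:
    """True when the two readings are within tolerance_db of each other."""
    return abs(level_difference_db(v_a, v_b)) <= tolerance_db


# Hypothetical readings, in volts, for two DACs playing the same tone.
reading_dac_a = 2.00
reading_dac_b = 1.90
diff = level_difference_db(reading_dac_a, reading_dac_b)
matched = is_level_matched(reading_dac_a, reading_dac_b)
print(f"A is {diff:+.2f} dB relative to B; matched: {matched}")
```

With these made-up readings, a voltage difference of about 5% already works out to roughly 0.4dB, which is above the tolerance, so the volume control would need a further nudge before the comparison starts.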
This is a little more difficult with loudspeaker comparisons. One will still control the volume at the amplifier, but voltage measurement at the amplifier terminals is no longer an option because speakers may have different sensitivities. Using something like a universal dB meter is a good idea, and although it is far from accurate, it is still better than not doing anything. One could therefore set the dB meter to A-weighting (dBA) and place it in a fixed position (ideally not far behind your head in your usual sitting position). You could then play something such as white noise using either a pre-recorded CD or a signal generator mobile app plugged into one of your amp inputs. I suggested white noise, but it can be any noise that covers a broad frequency spectrum. The idea is that loudspeaker frequency response is far from flat, so if we use a sine tone at one specific frequency, we may end up with far less accurate results (i.e. one set of loudspeakers may have a dip at that particular frequency and the other may have a peak, thus using that specific frequency to balance the volume would be far from ideal).
Once the volume of the first set of loudspeakers is set to the level that we are happy with, we mark the volume control position on the amplifier (I find that narrow strips of masking tape are quite good for this). Then we plug in the second set of speakers and repeat the same process.
Now when we switch between them, we can quickly set the volume on the amplifier using the previously marked positions. Again, it is not the most accurate solution, but it gets us closer to the desired position.
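If neither a test CD nor a signal generator app is to hand, a broadband noise file for this procedure could be generated on a computer. The following is a minimal sketch using only the Python standard library. The function name, the default duration and the modest peak level are my own assumptions, chosen to keep the file safe to play, and uniformly distributed random samples are used as a simple stand-in for white noise.

```python
import random
import struct
import wave


def write_white_noise(path, seconds=30, sample_rate=44100, peak=0.25, seed=None):
    """Write a mono 16-bit WAV file filled with uniform random samples.

    peak is the fraction of full scale used (0.25 keeps the level modest).
    Returns the number of audio frames written.
    """
    rng = random.Random(seed)
    n_frames = int(seconds * sample_rate)
    amplitude = int(peak * 32767)
    samples = [rng.randint(-amplitude, amplitude) for _ in range(n_frames)]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16 bits per sample
        wav.setframerate(sample_rate)
        # Pack all samples as little-endian signed 16-bit integers.
        wav.writeframes(struct.pack("<%dh" % n_frames, *samples))
    return n_frames
```

Calling `write_white_noise("white_noise.wav")` would produce a 30 second file that can be burned to CD or streamed. Start with the amplifier volume low, because noise sounds much louder than its meter reading may suggest.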
Challenge 3 – Auditory Memory & Timing Between Listening Sessions
How confident are you that the sound you remember is accurate? How long do you think you could reliably remember it for? 5 sec, 1min, 10min, 2h, a day? Let’s say that you are comparing two amplifiers and you are listening to the same track. Couple of minutes on amplifier A, then you stop, take a few minutes to plug in the other amplifier and then you listen to amplifier B for a couple of minutes. In my experience, the longer the break between the listening sessions, the harder it is to reliably judge the differences.
Let’s use our visual memory for an analogy… if you have two photos which are virtually identical but have some subtle differences, and you open them on your computer or a tablet and flick between them, you will easily spot the differences, because your brain will be attracted to them. If, however, you look at the first photo, then have a 5 min break and look at the second photo, the likelihood of you spotting these differences is rather small. I’ve put together a couple of pictures to illustrate the point.
Open the first photo below, look at it and then close it. Wait for a couple of minutes and open the second one and see if you can spot the difference. Can you see it?
Now open one of the photos in the gallery below and quickly flick between them (this works best on a computer because there is no swiping effect – one instantly changes into the other) and see how much easier it is to notice the difference.
This works very similarly with audio, with one exception – according to the study conducted by Cohen et al. (2009), our auditory recognition memory is inferior to our visual recognition memory.
Where possible, try to switch between the equipment with minimum delay. This is fairly straightforward to achieve with most gear, where a remote-controlled relay-based switch could be used to switch between two components. Yes, purists may argue that the relay will deteriorate the sound and so on; however, if this were the case, the sound would be deteriorated equally for each component you are testing. Moreover, most amplifiers have relays in them, so provided you purchase a decent quality relay, I would not worry too much about the signal being deteriorated.
This solution becomes somewhat problematic for loudspeaker testing. You could put together a similar remote-controlled relay-based switch or use an amplifier with two sets of speaker outputs. Although this makes the switch instant, it requires you to have two sets of speakers near each other, which is bound to have an impact on how they sound. Not to mention the different sensitivity of the speakers, which will affect their volume when switching instantly without adjusting it (see Challenge 2). Nonetheless, as a first stage of testing I still prefer this method because it gives us a good indication of major differences between the loudspeakers.
Challenge 4 – Biases
For those of you not familiar with the term – bias is a disproportionate weight in favour of or against an idea or a thing (Steinbock, 1978). Psychologists identified a number of different cognitive biases (this is a great article if you are interested), but for the purpose of HiFi comparisons I’m going to outline the ones that affect us the most:
Placebo Effect
Some of you may be familiar with this from medical trials, where a group of test subjects is led to believe that they are being given medicine, whereas in fact they are given ‘empty’ pills. Despite this, the condition of the test subjects improves because they believe they are getting the real thing.
This is one of the biggest biases in HiFi. For instance, if you read a number of articles about how great something is supposed to sound, you start your testing preloaded, and the likelihood is that this will skew your perception. At this point you are probably thinking “I’m not affected by it, I’m very open minded”. To which I say – rubbish! The whole thing about biases is that they work on a subconscious level that we don’t have any control of. Let me give you an example from my own humbling experience…
My best friend flew over from Germany to stay with us for a couple of days. He is into HiFi too, but in contrast to me, he is really big on streaming high-resolution files. He brought some samples of the same recordings in various bitrates with him. I was always curious how different bitrates compare to CD quality, so we connected his phone and started playing these samples while looking at the display. We both could clearly hear differences as he switched to the higher bit rate files. I was so impressed that I started thinking of selling my collection of over 2000 CDs and getting into streaming – seriously!
But having an appreciation for science, I proposed a blind test, and to make it easier for us, we only compared the lowest quality file with the highest one. We did 20 tries each, and while one of us was switching which sample file was playing, the other one was listening and making notes of each playback. The first thing that struck me was that I could not hear as much difference as before. As a matter of fact, I was really struggling, and it was more of a guess than a confident choice when I was making my notes… When we compared the results for both of us, it turned out that we got 55% of our answers wrong…
Thus, my brain is telling me that something sounds better when I see that it should be better on the screen, and yet when I eliminate that knowledge and use only my ears I cannot hear the difference? Suffice to say that I did not end up selling my CD collection and it is still growing…
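To put a number on how unimpressive that result is, one can ask how likely it would be to score at least that well by flipping a coin for every answer. The sketch below is my own illustration, not part of the original test. It assumes that the two sets of 20 tries are pooled into 40 trials, so that 55% wrong means 18 right, and that each guess is an independent 50/50 choice.

```python
from math import comb


def chance_of_at_least(correct: int, trials: int) -> float:
    """Probability of getting `correct` or more answers right out of
    `trials` when every answer is a pure 50/50 guess."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Assumption: both listeners' tries pooled, 22 wrong and 18 right out of 40.
trials = 40
correct = 18
probability = chance_of_at_least(correct, trials)
print(f"Chance of {correct}+ right out of {trials} by guessing: {probability:.2f}")
```

Since 18 is below the 20 correct answers that guessing would produce on average, the probability comes out well above one half. In other words, nothing in the result suggests that we could hear a difference.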
Halo Effect & Implicit Bias (Stereotyping)
The Halo Effect bias relates to attribution of certain traits based on other traits, without having any evidence to support that (Nisbett and Wilson, 1977). For instance, you may assume that a person with a kind face is a kind person. The reality is that you don’t have any evidence to support that, and in fact, that person may well be a serial killer.
According to Greenwald and Banaji (1995), stereotyping relates to having beliefs that a member of a group will have certain characteristics that are common to other members in that group, despite not having any information about that individual. A good example of that would be a belief that a product is well engineered because it is German. Yes, there are a lot of well-engineered products from Germany, but it does not mean that all products coming from Germany are well engineered.
So how do these affect the HiFi you may ask… For instance, when listening to a premium set of loudspeakers designed by a high-end company, you naturally expect high quality sound. This may not always be the case.
Another example is that if you know that an amp is a solid-state amp, you may expect it not to sound warm. You don’t have any evidence to support this expectation, and in fact this particular amp might have been designed to have a warm and tube-like sound.
These expectations preload you before the listening test and feed into the placebo effect previously described.
To illustrate how bad this can get, let me bring up something discovered during loudspeaker comparisons conducted by Harman. It turned out that listeners evaluating loudspeakers rate them differently when they see what loudspeaker is playing than when they do not. Why do you think that is…? Could it be that look, brand, size and our preconceived opinions affect our judgment?
Confirmation Bias & Selective Perception
Confirmation bias is the tendency for us to search for and favour information that confirms our beliefs (Nickerson, 1998). For instance, if you believe in a particular political leader, you will tend to look for news articles that present that person in a favourable light. When you combine this with selective perception, which according to Griffin (2013) is the tendency to not notice stimuli that contradict our previous beliefs, you will not only look for information that confirms what you believe but also ignore information that contradicts it. This is partially the reason why in this day and age you still have people believing that the earth is flat, despite the overwhelming scientific evidence that it is not.
When you think of these two biases in the context of HiFi, you can easily see how they can preload your expectations and affect your judgment during a listening test.
I experienced these first hand when I ‘fell in love’ with a particular speaker brand. I not only searched for reviews that were positive about the speakers that I wanted to purchase, but also paid a lot less attention to any negatives mentioned in these reviews.
Most of the biases listed above rely on visual cues. It is therefore a good idea to test equipment blind. Moreover, depending on the test framework, it may be necessary to conduct a double-blind test. Let me expand on that… if you are testing equipment with your friend and he is aware which piece is playing when switching it for you, you may pick up on his body language or facial expressions if he has a preference for one over the other. Consequently, although you may not know yourself what is playing, the test results may be skewed by the person running the test. In a framework like this, it is advisable that neither the person running the test nor the listener knows what is playing and when.
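The playing order itself can also be left to chance rather than to the person doing the switching. The snippet below is a hypothetical helper, not a complete test rig: it draws a shuffled order with (near) equal numbers of A and B and later counts how many of the listener's guesses were right. How the schedule is kept hidden from everyone in the room (a third person, or a switch box driven by the script) is left open, and the names are my own.

```python
import random


def make_schedule(trials=20, seed=None):
    """Return a shuffled list of 'A'/'B' labels, balanced as evenly as possible."""
    rng = random.Random(seed)
    schedule = ["A", "B"] * (trials // 2)
    if trials % 2:
        # Odd number of trials: the extra one is picked at random.
        schedule.append(rng.choice("AB"))
    rng.shuffle(schedule)
    return schedule


def score(schedule, guesses):
    """Count how many guesses match what was actually playing."""
    if len(schedule) != len(guesses):
        raise ValueError("one guess is needed for every trial")
    return sum(played == guessed for played, guessed in zip(schedule, guesses))


# Example: a 20-trial schedule, revealed only after the listener's notes are in.
hidden_order = make_schedule(20)
print(len(hidden_order), "trials prepared")
```

Keeping the numbers of A and B equal stops a listener who always answers the same letter from scoring anything other than 50%.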
Challenge 5 – Lack of Recording Standards
I sometimes read in reviews that a particular singer’s voice does not sound correct and I wonder how the reviewer knows that. After all, he or she has probably only heard that voice through other speakers in their living room or through a PA system at a concert. And this is where the problem lies… there are no standards for recording. Moreover, albums are mastered on different loudspeakers in different studios. Thus, the only way for someone to know how a particular track should sound would be to be in that studio during the final mastering.
Based on all of the above, it is rather problematic to test speakers using a recording and trying to judge which one sounds more real. For instance, if an album was mixed on a bass heavy speaker, it will most likely sound bass-shy on balanced speakers. If it was mixed on dull sounding speakers, it will most likely sound bright on neutral sounding speakers, etc. It is therefore very probable that certain tracks will sound closer to how they were intended to sound on certain speakers, and others will sound closer to how they were intended to sound on other speakers.
When comparing speakers, choose a varied range of tracks: not only the best recorded audiophile albums but also other songs that may not sound as good on your current system.
If possible, try to attend unamplified concerts, as this should give you a reference point of how certain instruments are supposed to sound.
If you enjoy the music more than playing with gear, try to look for loudspeakers that make most of your recordings sound good.
Challenge 6 – Reference Point, Adaptability & Long-Term Listening
Humans are creatures of habit with an incredible ability to adapt, and this can be a double-edged sword. If you are used to a particular sound, you may perceive differences as negatives. For instance, if you have a set of speakers with a boosted bass and for the past 3 years you’ve been playing most recordings through these speakers, you have that sort of sound balance ‘engrained’ in your brain. If you then compare your speakers in an A/B test against speakers that have a more balanced bass response, you will most likely perceive the new speakers as not having enough bass. Consequently, your perception is heavily affected by your reference point. To illustrate this, have a look at the optical illusion below – which red circle appears larger to you?
Also, let me bring up my experience with Harbeths… I always had relatively bright sounding speakers, which added to the atmosphere of live recordings and made well recorded albums sound very impressive. Whenever I compared my speakers to Harbeth speakers during an A/B test, I never liked the Harbeths. They always appeared to sound dull and not as airy as the speakers I had back then. It was only after I had listened to them for a week and allowed my ears to adjust to a different tonal balance that I was able to appreciate the qualities that Harbeth speakers have to offer.
This would suggest that apart from instant switch A/B testing, it is very important to do long-term listening tests. However, this is a double-edged sword, because your ears not only adjust to a different sound, but they also adjust to shortfalls of the tested equipment. This is often the reason why manufacturers recommend a run-in period… yes, it might have a small impact on electronics, and a much bigger impact on speakers (due to the compliance of moving parts), but more often than not, the biggest effect is your ears (brain) adjusting to the new sound. Let me give you an example from my own experience to show our ability to adapt. Last year I received a compilation album from a series of concerts in Poland by various groups. I absolutely loved the songs, but when I listened to the album for the first time, it seemed to have a bit too much low end for my taste and it didn’t seem as engaging as I was hoping for it to be. Fast forward a couple of months, and I no longer seem to notice the bass issue and I really enjoy the atmosphere of the recording. Nothing has changed in my system or my living room; the only thing that has changed is the number of times that I have listened to that album. Of course, if there are some serious problems, your ears will never adapt to them and they will always bother you. However, you’d be surprised what your ears can adapt to.
Don’t rely on short A/B testing only. Use long-term listening to evaluate equipment but don’t forget about the capability of your ears to adjust. Always go back to your previous equipment to validate the test results.
Challenge 7 – Reliability & Consistency of Our Hearing
If all of the challenges described so far were not enough, consider this… we all like to think of ourselves as superior beings. Yes, our ability to hear is indeed a marvel of nature; however, like all our other senses, it is far from reliable. As a matter of fact, we are really poor data recording devices. This is one of the reasons why human testimony has the lowest value in science. After all, how accurately and consistently can your eyes measure distance, or how accurately and consistently can your hand measure weight?
Our perception of things is often affected not only by psychological factors already described but also by physiological factors. Have you ever found that there are times when you like a particular piece of music and other times you cannot get into it because you are ‘not in the mood’? Have you ever found that after listening loud, your ears become tired and you don’t enjoy the music as much? Unfortunately, this is the reality of our hearing – our ears are capable of amazing things but they can easily be affected by internal and external factors.
Despite what I have said above, human testimonies are indeed used in science, but only when combined with appropriate statistical measures. For example, how would a business know if HiFi enthusiasts will like their new speakers better than a competitor’s speakers? They obviously could not test all HiFi enthusiasts… However, they could use sampling to randomly select a group of people that are a good representation of their target market. They could then run multiple double-blind A/B tests and use maths to find out how statistically significant the results of those tests are. If it turned out that a majority of the tested people prefer their product to the competitor’s product, then thanks to statistical inference (Upton and Cook, 2008) they could be confident that a majority of their target market would prefer it too. The key to this is size… the larger the sample, the greater the probability of the results being applicable to the whole population.
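The effect of sample size can be made concrete with the usual margin-of-error formula for a proportion, z × sqrt(p × (1 − p) / n). The snippet below is a simplified sketch under stated assumptions: a truly random sample, the normal approximation to the binomial distribution, a 95% confidence level (z = 1.96), and a made-up observed preference of 60%.

```python
import math


def margin_of_error(share: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate margin of error for an observed share between 0 and 1.

    Uses the normal approximation; z = 1.96 corresponds to 95% confidence.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(share * (1.0 - share) / sample_size)


observed_share = 0.60  # hypothetical: 60% of listeners preferred the new speaker
for n in (25, 100, 400, 1600):
    moe = margin_of_error(observed_share, n)
    print(f"n={n:5d}: 60% +/- {moe * 100:.1f} percentage points")
```

Under these assumptions, 25 listeners leave the true figure anywhere between roughly 41% and 79%, so not even a majority is certain. The sample has to be quadrupled to halve the uncertainty.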
Before attempting to test equipment, make sure that you are in the mood to do it. If you are tired and you are having to concentrate on subtle differences, it may be really hard to spot them. You may miss things that you would have noticed if you felt better during your listening session.
We’ve established that sample size is your friend when it comes to statistical measures. However, when we compare equipment at home, we are not trying to find out if other people will like it, we are trying to find out if we are going to like it. Size is still your friend, but in the form of the number of tests. The more tests you run, the greater the likelihood of the test result being accurate. For instance, if you just switch between the equipment once and you form a preference, it is within the realm of possibility that your preference happened by chance. However, if you switched between the equipment 20 times, and 18 times out of 20 you preferred the same component, then you can be a lot more confident that you like this component because of what you’ve heard and not by chance.
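The 18-out-of-20 example can be checked with a few lines of arithmetic. The sketch below assumes blind trials in which a listener with no real preference would pick either component with equal probability. The 5% threshold is just the conventional choice and the function names are mine.

```python
from math import comb


def tail_probability(k: int, n: int) -> float:
    """Probability of k or more picks of the same component out of n
    trials when each pick is an independent 50/50 coin flip."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def min_consistent_choices(n: int, alpha: float = 0.05):
    """Smallest number of same-component picks out of n whose tail
    probability is at or below alpha, or None if no count qualifies."""
    for k in range(n + 1):
        if tail_probability(k, n) <= alpha:
            return k
    return None


print(f"18 of 20 by chance: {tail_probability(18, 20):.6f}")
print(f"Picks needed out of 20 at the 5% level: {min_consistent_choices(20)}")
```

For 18 out of 20 this gives roughly 0.0002 (211 in 1,048,576), and under these assumptions 15 consistent picks out of 20 would already clear the 5% threshold. A single comparison can never get there, which is why one quick switch tells you so little.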
We have established that there are a number of challenges with reliable comparisons between pieces of HiFi equipment. However, there are methods that we can implement to make these comparisons more reliable.
1. Only change one component at a time and test things in the same location.
2. Match equipment volume level during comparisons.
3. Minimise time between listening sessions, ideally use a ‘live’ switch box.
4. Do single-blind or double-blind tests to avoid biases.
5. Test using a variety of recordings that sound good and bad on your existing system.
6. Apart from A/B comparison, also use long-term listening tests.
7. Run multiple tests – repetition is your friend.
And remember – there is no one easier to fool than yourself! So, use these methods to keep yourself honest and potentially save yourself some money in the process.
Cohen, M.A., Horowitz, T.S. and Wolfe, J.M. (2009) ‘Auditory recognition memory is inferior to visual recognition memory’, Proceedings of the National Academy of Sciences of the United States of America, 106 (14) [online]. Available at: https://www.pnas.org/content/106/14/6008 (Accessed: 10th January 2021)
Colloms, M. (1997) High Performance Loudspeakers. 5th edt. John Wiley.
Everest, F.A. and Pohlmann, K.C. (2009) Master Handbook of Acoustics. 5th edt. New York, McGraw Hill Professional.
Fletcher, H. and Munson, W.A. (1933) ‘Loudness, Its Definition, Measurement and Calculation’, Journal of the Acoustical Society of America, 5, p82–108.
Greenwald, A. G. and Banaji, M. R. (1995) ‘Implicit social cognition: Attitudes, self-esteem, and stereotypes’, Psychological Review, 102 (1), p4–27.
Griffin, R.W. (2013) Fundamentals of Management. 7th edt. Cengage Learning.
Nickerson, R.S. (1998) ‘Confirmation bias: A ubiquitous phenomenon in many guises’, Review of General Psychology, 2 (2), p175–220.
Nisbett, R.E. and Wilson, T.D. (1977) ‘The halo effect: Evidence for unconscious alteration of judgments’, Journal of Personality and Social Psychology, 35 (4), p250–256.
Robinson, D.W. and Dadson, R.S. (1956) ‘A Re-Determination of the Equal-Loudness Relations for Pure Tones’, British Journal of Applied Physics, 7, p166–181.
Steinbock, B. (1978) ‘Speciesism and the Idea of Equality’, Philosophy, 53 (204), p247–256.
Upton, G. and Cook, I. (2008) Oxford Dictionary of Statistics. 3rd edt. Oxford University Press.
Written: January 2021 | Published: January 2021