How to Differentiate Lossless From Lossy Music

zeus · October 19, 2024, 6:40pm

I originally wasn’t going to triple post, but then this got huge so I did.

Here are three graphs of the same track at different qualities so you can visualize spectral thinning.

The first one is the most awful sounding one of the bunch. It’s labeled 128 CBR, but it doesn’t sound like it.

This is pretty egregious, about as bad as it can get, and in order to understand why you have to look at what a proper 128 CBR rip is supposed to look like.

You see how it’s a more solid cut at 16kHz and there’s a lot more blue compared to green in the upper kHz ranges (for MP3 128 CBR)? Yeah, that matters. You can actually hear that difference. Trust me. The color corresponds to how loud the sound is at that frequency. Green means it’s much louder, which is bad. Loud is not good. Loud strangles dynamics and makes everything the same level, so a quiet whisper is the same volume as a gravity blast. You want dynamics. The second rip, while still 128, is leagues better than that first rip. You see that bar around 16kHz in the first image? Yeah, that’s a whole lot of noise. We’ll get into that later…

And if you’re a Dir en grey fan, you should know exactly what song this is. You probably have the shit rip. It’s the easiest one to find.

Here is what the FLAC version looks like. Is it to your expectations?

Compare the good 128 rip to this. Look at how much information in the upper registers is literally gone. Then look at how the gradients of blue and green around the 12-16kHz range are not the same. Similar, but not the same. This is the result of the encoder, and this is probably reflecting some smearing and pre-echo. All that blue and purple is where the cymbals and harmonic information live. You may not be able to “hear” it, but it’s more of a sense thing.

And to switch tracks, since Karma doesn’t come in any quality higher than the FLAC above, here’s a 24/96 copy of Asrun Dream by Gackt.

What’s that above 35kHz? Ultrasonic frequencies! Ultrasonic frequencies are inaudible to humans. In clinical settings, lower frequencies have greater depth of penetration into the body, while higher frequencies have greater resolution but limited depth of penetration. These frequencies should be removed by your DAC. The DSD format that I discussed about a week ago uses noise shaping to move quantization noise to these ultrasonic frequencies, which allows for an extended frequency response and a wide dynamic range.

What is quantization noise? When you take an analog signal, convert it to digital, and then convert back to analog, the final signal is not the same as the original signal. The difference is quantization error, which comes out as noise.

This pic I stole from Wikipedia visualizes this. Above is the original analog signal (green), the quantized signal (black dots), the signal reconstructed from the quantized signal (yellow) and the difference between the original signal and the reconstructed signal (red), which is the quantization error.
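Here's the same idea in a few lines of code, as a minimal sketch. The 1kHz test tone, the 48kHz rate, and the helper names are all my assumptions for illustration. It rounds a sine wave onto an 8-bit and a 16-bit grid and measures what's left over, which is the red curve in the picture.

```python
import math

def quantize(x, bits):
    """Round x in [-1.0, 1.0] to the nearest level of a signed integer grid."""
    scale = 2 ** (bits - 1) - 1      # 127 for 8-bit, 32767 for 16-bit
    return round(x * scale) / scale

def sine(n, freq=1000.0, rate=48000.0, amp=0.9):
    """n samples of a test tone (frequency and rate are arbitrary choices)."""
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def quantization_error(signal, bits):
    """Difference between the quantized signal and the original."""
    return [quantize(s, bits) - s for s in signal]

signal = sine(4800)
for bits in (8, 16):
    err = quantization_error(signal, bits)
    print(bits, "bit: error RMS =", rms(err),
          "max =", max(abs(e) for e in err))
```

The error never exceeds half a step of the grid, so more bits means a finer grid and a quieter error.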

What is dithering? Dither is an intentionally applied form of noise used to randomize quantization error, preventing large-scale patterns such as color banding in images. Dither is routinely used in processing of both digital audio and video data, and is often one of the last stages of mastering audio to a CD.
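A toy example of why you'd add noise on purpose, assuming triangular (TPDF) dither and a constant signal that sits below half a quantization step. Both are my choices for the sketch, not something a particular mastering tool is known to do. Without dither the signal rounds to zero every single time and is simply gone. With dither, the average of the output lands back near the original level.

```python
import random

def quantize_steps(x):
    """Quantize to whole steps (1.0 = one least significant bit)."""
    return round(x)

def tpdf_dither(rng):
    """Triangular dither spanning +/-1 step: the sum of two uniform draws."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def average_output(level, n, dither, seed=0):
    """Average quantized value of a constant input, with or without dither."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        noise = tpdf_dither(rng) if dither else 0.0
        total += quantize_steps(level + noise)
    return total / n

level = 0.3  # a constant signal sitting below half a quantization step
print("no dither  :", average_output(level, 20000, dither=False))
print("with dither:", average_output(level, 20000, dither=True))
```

The price is a bit of steady hiss, which is a much nicer thing to listen to than an error pattern that follows the music.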

What is noise shaping? Noise shaping is a digital audio processing technique, usually in combination with dithering, which is used to increase the apparent signal-to-noise ratio of the final product. This is done by altering the spectral shape of the error that is introduced by dithering and quantization, to push it up into that ultrasonic range so it’s not audible!
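To make that concrete, here's a sketch of the simplest version I know of, first-order error feedback, where each sample's rounding error is carried into the next sample. This is an illustration under my own assumptions (a slow, quiet test signal, and a 64-sample block average as a crude stand-in for "the audible low band"), not the shaper any real DSD or mastering chain uses.

```python
import math

def plain_quantize(signal):
    """Round every sample to the nearest whole step, independently."""
    return [round(x) for x in signal]

def noise_shaped_quantize(signal):
    """First-order error feedback: carry each rounding error into the next sample."""
    out = []
    prev_error = 0.0
    for x in signal:
        target = x - prev_error      # compensate for the previous rounding error
        y = round(target)
        prev_error = y - target      # error made on this sample
        out.append(y)
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def total_error(signal, quantized):
    """RMS of the raw error across all frequencies."""
    return rms([q - x for q, x in zip(quantized, signal)])

def low_band_error(signal, quantized, block=64):
    """RMS of the error after block averaging, a crude low-pass filter."""
    err = [q - x for q, x in zip(quantized, signal)]
    means = [sum(err[i:i + block]) / block
             for i in range(0, len(err) - block + 1, block)]
    return rms(means)

# A slow, quiet signal that never reaches half a step (1.0 = one step).
signal = [0.3 + 0.15 * math.sin(2 * math.pi * i / 1000) for i in range(6400)]
plain = plain_quantize(signal)
shaped = noise_shaped_quantize(signal)

print("low-band error, plain :", low_band_error(signal, plain))
print("low-band error, shaped:", low_band_error(signal, shaped))
print("total error, plain    :", total_error(signal, plain))
print("total error, shaped   :", total_error(signal, shaped))
```

In this toy case the shaped version has far less error in the low band, but its total error is not smaller. The error gets relocated to high frequencies, not removed, which is the whole trick.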

Obviously, the result of all of this is low quantization noise and low distortion in the audible bandwidth necessary for high resolution audio. Single rate DSD64 with one-bit sampling can deliver a dynamic range of 120 dB from 20 Hz to 20 kHz and an extended frequency response up to 100 kHz, even though most recent SACD players specify an upper limit of 80 to 90 kHz.

DSD is not without issues. One is that it’s huge. Another is that DSD creates a tremendous amount of noise. So much, in fact, that Sony/Philips have created a noise-shaping system designed solely for the purpose of disguising the inherent noise in a DSD signal. Like I said above, the noise created by DSD’s one-bit sampling is shifted out of the lower frequencies, and shoved up into the ultrasonic range, making the noise “inaudible.” But not all of the noise is shifted all the time, and lower frequencies can still contain noise.

“Noesis” by Gackt is a wonderful example of how this noise isn’t always removed. There’s noise around 2:10 and 5:09 (but oddly enough, not 3:47) that I’ve never heard rendered that way before; it’s present in the PCM file as well but it’s not as noticeable. I’m gonna have to tag in golden ears @Aeolus so he can tell you if he hears it as well.

You can’t actually see this imperfection in the spectrogram, but you can see how there’s some noise in the ultrasonic range at 45kHz. I drew a little arrow next to it, so you can see it. See why the band of noise at 16kHz in the very first graph was no good? This is where it’s supposed to live!

Any imperfections in a DSD signal are time and amplitude imperfections. If one were to zoom in on a DSD signal, those amplitude fluctuations would be visible. This would imply that DSD is incapable of reproducing the same transient twice due to the time domain errors caused by the sampling. So DSD’s one-bit sampling is not better or worse than PCM (pulse code modulation), just different.

Also remember, any of these imperfections are so slight as to be imperceptible to the ear. You literally have to analyze waves to get this far into it. Me? I turn all of the DSD files I have into 24/96kHz PCM ALAC files, since I literally can’t tell the difference. The way I see it, taking DSD files and converting them down into PCM is one step above taking FLAC files and crunching them down into MP3! It’s as close as I can get to studio masters.

Put another way, if DSD sampling is represented as “squares”, and PCM sampling is represented as “circles”, and the process of converting from DSD to PCM is converting these “squares” into “circles”, then the conversation isn’t about “squares versus circles”, but about how to draw a better circle.

With the example that I started with, you can literally hear (and see) how starting from a FLAC file and converting down to 128 CBR sounds way better than whatever trash the original rip was, even though on the surface they’re both “128 CBR”. It’s not the same. And like I said, this conversation is just one step up. Starting with DSD and converting down to a FLAC container will yield better results than starting with the FLAC off the CD, because one is limited to 16 bits and the other can go as high as 24.

And since I have you here, there is a thing called 32-bit audio. You literally cannot hear the difference between 24-bit and 32-bit. Why care? It’s more if you’re an audio engineer, or a musician recording your own takes, because then you want to work with all the quality and mix down to a reasonable file size when you’re done. Anyone insisting they can hear differences in 32-bit files is full of shit. Watch how they’re suddenly allergic to A/B/X testing…
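Some back-of-the-envelope math on why the extra bits stop mattering. This uses the textbook figure for an ideal quantizer fed a full-scale sine wave (roughly 6.02 dB per bit plus 1.76 dB), and it assumes plain integer PCM, which is a simplification since 32-bit files are often floating point.

```python
import math

def ideal_sqnr_db(bits):
    """Signal-to-quantization-noise ratio of an ideal quantizer, full-scale sine."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24, 32):
    print(f"{bits}-bit: about {ideal_sqnr_db(bits):.1f} dB")
# 16-bit: about 98.1 dB
# 24-bit: about 146.3 dB
# 32-bit: about 194.4 dB
```

On paper, 24-bit is already well past the 120 dB quoted for DSD64 above, so the extra 48 dB from 32-bit is headroom for editing, not something to listen for.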

I should take this to its logical conclusion. Here’s a 24/192 file, courtesy of Bob Dylan.

All that green is the quantization noise! The purpose of keeping all of that extra noise is that you’re supposed to put all the artifacts that you don’t care about in this range, and then have your DAC filter it out, so you’re left with music. This is why even though I have the picture above implying that my playback is 32/96, I actually settled on 24/48. A 48kHz sample rate only carries frequencies up to 24kHz, and literally anything above that is noise that I don’t want to hear. If I set the playback rate as I had it above, then those ultrasonic noises will be played back, and that invites intermodulation distortion (IMD): frequencies higher than 20kHz combine to produce lower frequencies within our hearing range that color the high-end timbre of the music.
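Here's a small sketch of that intermodulation effect. The assumptions are all mine: two ultrasonic tones at 30kHz and 33kHz, a 192kHz sample rate, and a made-up mild nonlinearity (`y = x + 0.1*x^2`) standing in for an imperfect amp or driver. Neither tone is audible on its own, but once they pass through something nonlinear, a difference tone shows up at 3kHz.

```python
import cmath
import math

RATE = 192000   # sample rate in Hz (assumed for this sketch)
N = 1920        # 10 ms, a whole number of cycles for every tone involved

def tone_amplitude(samples, freq):
    """Amplitude of one frequency component via a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / RATE)
              for n, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)

def two_ultrasonic_tones(f1=30000.0, f2=33000.0, amp=0.5):
    """Two tones above the range of human hearing, added together."""
    return [amp * (math.sin(2 * math.pi * f1 * n / RATE) +
                   math.sin(2 * math.pi * f2 * n / RATE)) for n in range(N)]

def slightly_nonlinear(samples, k=0.1):
    """Hypothetical stand-in for imperfect playback gear: y = x + k*x^2."""
    return [s + k * s * s for s in samples]

clean = two_ultrasonic_tones()
distorted = slightly_nonlinear(clean)

print("3 kHz level before:", tone_amplitude(clean, 3000.0))
print("3 kHz level after :", tone_amplitude(distorted, 3000.0))
```

Before the nonlinearity there is effectively nothing at 3kHz. After it, there is a tone at 3kHz that was never in the recording.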

There’s reason to keep it in the file, but I don’t wanna hear it!

And since it’s legitimately impossible for me to find anything higher quality than that…I had to go make it! So here’s 32/192 courtesy of my guitar (if you’re curious it’s a cover of Constance by Spiritbox)! It’s also 317MB.
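For a sense of where a number like 317MB comes from, here's the arithmetic for uncompressed PCM. The channel count, the lack of compression, and 1 MB = 1,000,000 bytes are all assumptions on my part, since none of that is stated above.

```python
def pcm_bytes_per_second(bit_depth, sample_rate, channels=2):
    """Raw PCM data rate: bytes per sample * samples per second * channels."""
    return bit_depth // 8 * sample_rate * channels

def seconds_of_audio(size_bytes, bit_depth, sample_rate, channels=2):
    """How long a file of this size would play, ignoring headers and compression."""
    return size_bytes / pcm_bytes_per_second(bit_depth, sample_rate, channels)

for label, bits, rate in [("16/44.1", 16, 44100), ("24/48", 24, 48000),
                          ("24/96", 24, 96000), ("32/192", 32, 192000)]:
    print(f"{label}: {pcm_bytes_per_second(bits, rate) / 1e6:.3f} MB per second")

# Assuming stereo and no compression, 317 MB at 32/192 is about this many seconds:
print(seconds_of_audio(317e6, 32, 192000))
```

Under those assumptions, the same recording at 24/48 would take a bit under a fifth of the space.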

Peep the noise all the way at the top. It’s noise well into the ultrasonic region that’s not doing anything for anyone. Where did it come from? Turns out, it’s the effect of pitch shifting a guitar into the bass range! Pitch shifters can make ultrasonic sound audible by replacing the high frequencies that are lowered with ultrasonic frequencies.

Why doesn’t it stretch all the way up like the other graphs? Because this is literally just my guitar, which has a limited range it works in. I also didn’t do any fancy post-processing, so there’s just nothing in those ranges until you get up to around 48kHz, where there’s some ultrasonic noise.

So yeah, I really do think anything above 24/48 is a waste and possibly produces worse sound.

EDIT: There’s one small detail I forgot to add about DSD above. DSD features multichannel sound, which means that if you have a surround sound system, you can enjoy true multichannel playback of your favorite songs. When you convert from DSD to PCM for stereo playback, you also have to downmix the five channels to two, or you won’t hear everything. This could be another interesting reason to keep DSD files of your favorite releases.