How libmp3lame Decides Which Frequencies to Discard

This article explains how libmp3lame, the encoding library at the core of the LAME MP3 encoder, decides which parts of the audio spectrum to discard during compression. Using a psychoacoustic model, libmp3lame analyzes the incoming signal to identify components that a human listener is unlikely to perceive, then encodes them coarsely or drops them. This lets the encoder reduce file size substantially while keeping perceived audio quality high.

The Psychoacoustic Model (GPSYCHO)

At the heart of libmp3lame is its psychoacoustic model, known as GPSYCHO. This model approximates how the human auditory system responds to sound. Instead of treating all frequencies equally, libmp3lame uses GPSYCHO to analyze the spectrum of each block of audio and calculate a “masking threshold”: the level below which a sound, or the noise introduced by compression, is expected to be inaudible. Any frequency component that falls below this threshold is deemed inaudible and can be quantized very coarsely or discarded altogether.

Absolute Threshold of Hearing

The first criterion the encoder applies is the absolute threshold of hearing (ATH), the quietest level a listener can detect in silence. The human ear is most sensitive at roughly 2–5 kHz and much less sensitive at very low and very high frequencies, so quiet sounds at the extremes of the spectrum go unheard. libmp3lame compares the input signal against a standardized curve of this threshold across frequency. Any component that falls below the curve is treated as inaudible and can be discarded, because a listener would not hear it even with nothing else playing.
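As a rough illustration of such a curve, the sketch below evaluates Terhardt's widely quoted approximation of the threshold in quiet. It is a textbook formula, not code taken from libmp3lame, whose own ATH curves are tuned variants; the function names are hypothetical.

```python
import math

def ath_db_spl(freq_hz):
    """Approximate threshold in quiet (dB SPL) at freq_hz, using Terhardt's formula.

    freq_hz must be positive. This is a textbook approximation, not LAME's tuned curve.
    """
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible_in_quiet(freq_hz, level_db_spl):
    """True if a component at this level lies above the threshold in quiet."""
    return level_db_spl > ath_db_spl(freq_hz)

# Print the curve at a few frequencies: high at the extremes, lowest near 3-4 kHz.
for f in (50, 1000, 3500, 16000):
    print(f"{f:>6} Hz: {ath_db_spl(f):6.1f} dB SPL")
```

With this curve, a 20 dB SPL component would count as audible at 1 kHz but not at 50 Hz, which is the asymmetry the paragraph above describes.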

Simultaneous Masking (Frequency Masking)

Simultaneous masking occurs when a loud sound drowns out a quieter sound occurring at the same time and at a nearby frequency. It is one of the main effects libmp3lame relies on to discard unnecessary frequencies, and it takes two forms:

Tone-Masking-Noise: A pure tone (like a flute) will mask noise at nearby frequencies.
Noise-Masking-Tone: A noisy sound (like a cymbal crash) will mask pure tones close to it in frequency.

The encoder groups the spectrum into critical bands, the frequency ranges over which the ear integrates sound energy. If a dominant, loud component is present in a band, libmp3lame calculates a masking curve that is highest at the masker and falls off into neighboring bands. Quieter components that lie under this curve are discarded, because the louder sound prevents the listener from perceiving them.
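The idea of a masking curve around a loud component can be sketched with two standard textbook ingredients: the Zwicker-Terhardt approximation of the critical-band (Bark) scale and Schroeder's spreading function. This is a simplified single-masker model, not GPSYCHO itself. The two offset constants are illustrative round numbers chosen for this example (real models derive the offset from a tonality estimate), and all function names are hypothetical.

```python
import math

# Illustrative offsets (assumptions): how far below the masker the threshold sits.
TONE_MASKER_OFFSET_DB = 18.0   # a tonal masker hides less
NOISE_MASKER_OFFSET_DB = 6.0   # a noisy masker hides more

def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the critical-band (Bark) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def spreading_db(dz_bark):
    """Schroeder spreading function: attenuation in dB at dz_bark from the masker."""
    x = dz_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        offset_db=TONE_MASKER_OFFSET_DB):
    """Level below which a probe is assumed inaudible next to one masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    return masker_db + spreading_db(dz) - offset_db

def is_masked(masker_hz, masker_db, probe_hz, probe_db,
              offset_db=TONE_MASKER_OFFSET_DB):
    """True if the probe lies under the masker's threshold curve."""
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz, offset_db)

print(is_masked(1000, 80, 1100, 40))  # quiet probe close to the masker
print(is_masked(1000, 80, 4000, 40))  # same probe level, many bands away
```

The spreading function falls off more slowly above the masker than below it, so in this model a loud component hides more of the spectrum above its own frequency than beneath it.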

Temporal Masking (Time-Domain Masking)

Human hearing does not instantly reset after hearing a sound. libmp3lame exploits this limitation using temporal masking, which occurs in two ways:

Forward Masking: After a loud sound stops, the ear remains desensitized for up to 100–200 milliseconds. libmp3lame dynamically discards quieter frequencies that immediately follow a loud transient (like a drum hit).
Backward Masking: A quiet sound can also be masked if it occurs in a short window of about 5–20 milliseconds before a loud sound, an effect commonly attributed to the auditory system processing the louder sound more quickly. The encoder discards quiet signals immediately preceding a loud transient.
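The two time windows can be pictured with a toy model. The sketch below is not how libmp3lame computes temporal masking. It assumes, purely for illustration, that the masked threshold decays linearly in log-time from a fixed drop below the masker level down to 0 dB at the edge of each window. The window lengths are the upper ends of the ranges quoted above; the decay shape, the drop_db value, and the function names are all assumptions.

```python
import math

FORWARD_WINDOW_MS = 200.0   # upper end of the forward-masking range quoted above
BACKWARD_WINDOW_MS = 20.0   # upper end of the backward-masking range quoted above

def temporal_threshold_db(masker_db, offset_ms, drop_db=10.0):
    """Toy masked threshold for a probe offset_ms after (+) or before (-) a masker.

    The decay shape and drop_db are illustrative assumptions, not measured values.
    """
    window = FORWARD_WINDOW_MS if offset_ms >= 0 else BACKWARD_WINDOW_MS
    distance = abs(offset_ms)
    if distance >= window:
        return float("-inf")  # outside the window: no temporal masking at all
    # Fraction of the masking effect that remains at this distance in time.
    remaining = 1.0 - math.log1p(distance) / math.log1p(window)
    return (masker_db - drop_db) * remaining

def temporally_masked(masker_db, probe_db, offset_ms):
    """True if the probe falls under the toy temporal threshold."""
    return probe_db < temporal_threshold_db(masker_db, offset_ms)

# A 30 dB probe around a 90 dB drum hit, at several time offsets.
for offset in (-30, -5, 10, 150):
    print(offset, "ms:", temporally_masked(90, 30, offset))
```

Even in this toy form, the asymmetry is visible: the same quiet probe is hidden for much longer after the transient than before it.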

MDCT and Bit Allocation (Quantization)

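To actually remove the frequencies, libmp3lame converts the audio from the time domain to the frequency domain. As in any MP3 encoder, this is done with a hybrid filterbank: a 32-band polyphase filterbank splits the signal into subbands, and the Modified Discrete Cosine Transform (MDCT) then resolves each subband into finer frequency lines.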
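For readers unfamiliar with the MDCT, the following is a minimal sketch of the textbook transform in its direct O(N²) form, without the windowing or the fast algorithms a real encoder uses. It is not libmp3lame's implementation, and the function names are hypothetical. It shows the property that makes the transform suitable for audio coding: blocks overlap by half, yet each block of 2N samples yields only N coefficients.

```python
import math

def mdct(block):
    """Textbook MDCT: 2N time samples in, N frequency coefficients out."""
    n_half = len(block) // 2
    shift = 0.5 + n_half / 2.0
    return [
        sum(x * math.cos(math.pi / n_half * (n + shift) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_half)
    ]

def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out."""
    n_half = len(coeffs)
    shift = 0.5 + n_half / 2.0
    return [
        sum(c * math.cos(math.pi / n_half * (n + shift) * (k + 0.5))
            for k, c in enumerate(coeffs)) / n_half
        for n in range(2 * n_half)
    ]

# Two blocks of 8 samples that overlap by 4 samples.
signal = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1, 0.0, -0.9, 0.4, 1.5, -2.2, 0.8]
first = imdct(mdct(signal[0:8]))
second = imdct(mdct(signal[4:12]))

# Adding the overlapping halves cancels the aliasing and restores signal[4:8].
restored = [a + b for a, b in zip(first[4:8], second[0:4])]
print(restored)
```

A single inverse transform does not return the original block. Only the overlap-add of neighboring blocks does, which is why the transform is always applied to overlapping blocks.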

Once the frequency spectrum is mapped, the encoder applies the masking threshold calculated by GPSYCHO. During the quantization phase (where each frequency coefficient is rounded to an integer that can be stored compactly), the encoder spends bits where they are needed, keeping the rounding noise in each band below that band’s masking threshold where the bit budget allows. If a frequency band’s energy is below the masking threshold, its coefficients can be quantized to zero, which costs almost no bits. A band quantized to zero is effectively discarded from the final MP3 file, and the bits saved are spent on the parts of the signal that remain audible.
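How a coefficient ends up as zero can be sketched with the power-law quantizer that the MP3 format defines, in which magnitudes are compressed by an exponent of 0.75 before rounding. This is a simplified illustration, not libmp3lame code. The single step parameter is a hypothetical stand-in for the combined effect of the global gain and per-band scale factors, and the function names are assumptions.

```python
import math

def quantize(xr, step):
    """MP3-style power-law quantizer for one frequency coefficient.

    step is a simplified stand-in for the global gain and scale factors.
    """
    magnitude = (abs(xr) / step) ** 0.75
    ix = int(magnitude - 0.0946 + 0.5)  # round to nearest, with the format's offset
    return ix if xr >= 0 else -ix

def dequantize(ix, step):
    """Decoder side: undo the power law and the step size."""
    return math.copysign(abs(ix) ** (4.0 / 3.0) * step, ix)

coefficients = [100.0, 12.0, 0.8, -0.3]
for step in (1.0, 4.0):
    quantized = [quantize(c, step) for c in coefficients]
    print("step", step, "->", quantized)
```

Raising the step size for a band makes its quantization coarser. Once the step is large relative to the band's coefficients, they all round to zero and the band disappears from the output, which is the discarding described above.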