libmp3lame Low Bitrate Encoding Under 64 kbps
This article explores how the libmp3lame encoder
processes audio at extremely low bitrates below 64 kbps. It examines the
specific optimization techniques used by the encoder—such as automatic
sample rate reduction, aggressive lowpass filtering, channel downmixing,
and modified psychoacoustic masking—to maintain the best possible audio
quality within severe bandwidth constraints.
Sample Rate Reduction and Downsampling
When encoding audio below 64 kbps, libmp3lame
automatically downsamples the input signal. At a sample rate such as 44.1
kHz, a low bitrate leaves too few bits per sample to encode the signal
accurately. To resolve this, LAME reduces the sample rate to 24 kHz,
22.05 kHz, 16 kHz, or even lower. A sampled signal can only carry
frequencies up to half its sample rate (the Nyquist limit), so
downsampling from 44.1 kHz to 22.05 kHz, for example, narrows the coded
band from about 22 kHz to about 11 kHz. The encoder has less spectrum to
analyze and compress, so the remaining frequencies come out cleaner.
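The arithmetic behind this trade-off can be sketched in a few lines. The example below is an illustration, not LAME code: the function names are made up for this article. The frame lengths (1152 samples for MPEG-1 Layer III, 576 for the lower MPEG-2 and MPEG-2.5 sample rates) come from the MP3 format itself.

```python
# Illustrative helpers (hypothetical names, not part of libmp3lame).
MPEG1_RATES = (32000, 44100, 48000)    # MPEG-1 Layer III sample rates
MPEG2_RATES = (16000, 22050, 24000)    # MPEG-2 (low sampling frequency) rates
MPEG25_RATES = (8000, 11025, 12000)    # MPEG-2.5 extension rates


def samples_per_frame(sample_rate_hz):
    """Return the Layer III frame length in samples for a given sample rate."""
    if sample_rate_hz in MPEG1_RATES:
        return 1152
    if sample_rate_hz in MPEG2_RATES or sample_rate_hz in MPEG25_RATES:
        return 576
    raise ValueError(f"not an MP3 sample rate: {sample_rate_hz}")


def bits_per_sample(bitrate_bps, sample_rate_hz, channels=1):
    """Average number of coded bits available per input sample."""
    return bitrate_bps / (sample_rate_hz * channels)


def bits_per_frame(bitrate_bps, sample_rate_hz):
    """Average number of bits in one frame at a constant bitrate."""
    return bitrate_bps * samples_per_frame(sample_rate_hz) / sample_rate_hz


if __name__ == "__main__":
    for rate in (44100, 22050, 16000):
        print(rate,
              round(bits_per_sample(32000, rate), 3),
              round(bits_per_frame(32000, rate), 1))
```

At 32 kbps mono, halving the sample rate from 44.1 kHz to 22.05 kHz doubles the average budget from about 0.73 to about 1.45 bits per sample.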
Aggressive Lowpass Filtering
To prevent harsh digital distortion and “swirling” artifacts, LAME applies a steep lowpass filter at low bitrates. At bitrates under 64 kbps, the encoder cuts off high frequencies, typically placing the cutoff between roughly 8 kHz and 12 kHz, and lower still at the smallest bitrates. The cutoff can never exceed half the output sample rate. Removing these high-frequency components allows the encoder to dedicate its limited bit budget to the mid and low frequencies, where the human ear is most sensitive.
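To make the effect of a lowpass concrete, here is a minimal sketch of a generic windowed-sinc FIR filter. This is a textbook technique chosen for illustration. It is not LAME's filter, which lives inside the encoder, and the function names and the 10 kHz cutoff are assumptions made for the example.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Design a Hamming-windowed sinc lowpass filter with unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal lowpass impulse response (sinc), with the x == 0 limit handled.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Convolve the signal with the taps (zero history before the first sample)."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * signal[i - k]
        out.append(acc)
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


if __name__ == "__main__":
    rate = 44100
    taps = lowpass_fir(10000, rate)
    for tone_hz in (1000, 15000):
        tone = [math.sin(2 * math.pi * tone_hz * n / rate) for n in range(2000)]
        # Skip the first samples so the filter's start-up transient is excluded.
        print(tone_hz, rms(apply_fir(tone, taps)[200:]))
```

A 1 kHz tone passes through almost unchanged, while a 15 kHz tone is strongly attenuated, so an encoder fed the filtered signal has no high-frequency content left to spend bits on.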
Channel Downmixing and Joint Stereo
Encoding two independent stereo channels at low bitrates starves both
of bits, and both sound poor as a result.
libmp3lame addresses this by using Joint Stereo
(specifically Mid/Side stereo) encoding, which codes the sum and the
difference of the left and right channels so that content common to
both is not encoded twice. At extremely low bitrates (such as 32 kbps
or below), the audio is often downmixed to pure mono instead.
Consolidating the signal into a single channel roughly doubles the
available bits per channel, significantly improving clarity and
reducing compression artifacts.
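The Mid/Side idea can be illustrated with a short sketch. The helper names are hypothetical and the code works on plain lists of time-domain samples for readability, whereas an MP3 encoder applies the same sum/difference transform to spectral values. The 1/sqrt(2) scaling is the normalization used by MP3's M/S stereo and keeps the total energy unchanged.

```python
import math

SQRT2 = math.sqrt(2.0)


def ms_encode(left, right):
    """Convert left/right samples to mid (sum) and side (difference)."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode and recover left/right samples."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


def downmix_mono(left, right):
    """Average the two channels into a single mono channel."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


if __name__ == "__main__":
    # A nearly centered source: the right channel is a slightly quieter copy.
    left = [math.sin(2 * math.pi * 440 * n / 22050) for n in range(512)]
    right = [0.9 * v for v in left]
    mid, side = ms_encode(left, right)
    print("side/mid energy ratio:", energy(side) / energy(mid))
```

When the two channels are highly correlated, almost all of the energy lands in the mid signal, so the side signal can be coded with very few bits. A mono downmix goes one step further and discards the side signal entirely.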
Psychoacoustic Tuning and Masking Thresholds
The LAME psychoacoustic model adapts its behavior when target bitrates drop below 64 kbps. It raises the Absolute Threshold of Hearing (ATH), the level below which a sound is treated as inaudible, and increases frequency masking thresholds. This means the encoder deliberately discards subtle, quieter sounds that sit close in frequency to louder ones. While this results in a loss of acoustic detail, it frees bits for the most perceptually important parts of the audio, which helps keep quantization noise in those parts from becoming audible.
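As a rough illustration of what raising the ATH does, the sketch below uses the widely cited Terhardt approximation of the threshold in quiet. It is an assumption that this curve is close enough for illustration: LAME ships its own ATH variants, and the `offset_db` parameter, the function names, and the example levels are all invented for this example.

```python
import math


def ath_db_spl(freq_hz):
    """Approximate threshold in quiet (dB SPL), after Terhardt's formula."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible_components(components, offset_db=0.0):
    """Keep the (freq_hz, level_db_spl) pairs that exceed the raised threshold.

    offset_db is a hypothetical knob: a positive value lifts the whole curve,
    so more quiet components are treated as inaudible and dropped.
    """
    return [(f, level) for f, level in components
            if level > ath_db_spl(f) + offset_db]


if __name__ == "__main__":
    components = [(100, 30.0), (1000, 40.0), (3300, 0.0), (16000, 60.0)]
    print(audible_components(components))                # default threshold
    print(audible_components(components, offset_db=10))  # raised threshold
```

With the default curve, only the 16 kHz component falls below the threshold. Lifting the curve by 10 dB also removes the quiet 100 Hz and 3.3 kHz components, leaving only the loud 1 kHz tone to be coded.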
Bit Reservoir Constraints
LAME normally uses a “bit reservoir” to save unused bits from simple audio passages and spend them on complex ones, but at sub-64 kbps bitrates this reservoir is quickly depleted. Because there are rarely any “surplus” bits, LAME relies on coarser quantization (reducing the precision of the spectral coefficients) and groups scale factor bands together to fit the strict target bitrate constraint.
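The reservoir behavior can be pictured with a toy model. This is a simplified sketch, not LAME's actual allocation logic: the function name, the per-frame bit demands, the frame budget, and the reservoir cap are all made-up numbers chosen to show the mechanism.

```python
def simulate_reservoir(demands, frame_budget, reservoir_cap):
    """Return the bits granted to each frame under a simple reservoir model.

    demands       -- bits each frame would like to spend (hypothetical values)
    frame_budget  -- bits a frame receives at the target bitrate
    reservoir_cap -- maximum number of unused bits that can be carried forward
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = frame_budget + reservoir
        used = min(demand, available)
        granted.append(used)
        # Unused bits are saved for later frames, up to the cap.
        reservoir = min(reservoir_cap, available - used)
    return granted


if __name__ == "__main__":
    # Comfortable bitrate: easy frames save bits that hard frames then spend.
    print(simulate_reservoir([500, 500, 1500, 1500], 1000, 800))
    # Starved bitrate: every frame wants more than its budget, nothing is saved.
    print(simulate_reservoir([1200, 1200, 1500], 800, 800))
```

In the first run the two easy frames bank enough bits for the third frame to spend 1500 and the fourth 1300, both above the 1000-bit budget. In the second run every frame already wants more than its budget, so the reservoir stays empty and each frame gets exactly 800 bits. The shortfall has to be absorbed by coarser quantization.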