How libmp3lame Calculates MP3 Frame Sizes
This article explains the mathematical formulas and algorithms used
by the libmp3lame library (the core engine behind the LAME
MP3 encoder) to determine the size of MP3 audio frames. It covers the
deterministic algebraic formulas used for Constant Bitrate (CBR)
encoding, as well as the psychoacoustic algorithms and quantization
loops used to dynamically calculate frame sizes in Variable Bitrate
(VBR) encoding.
The Standard MP3 Frame Size Formula
For Constant Bitrate (CBR) streams, libmp3lame
determines the size of an MP3 frame using a standardized mathematical
formula derived from the MPEG-1 and MPEG-2 specifications. Because MP3
audio is divided into discrete frames that represent a fixed duration of
time, the frame size in bytes is directly proportional to the bitrate
and inversely proportional to the sample rate.
For MPEG-1 Layer III (which supports sample rates of 32 kHz, 44.1 kHz, and 48 kHz), each frame contains 1,152 samples. The mathematical formula to calculate the frame size in bytes is:
\[\text{Frame Size} = \left\lfloor 144 \times \frac{\text{Bitrate}}{\text{Sample Rate}} \right\rfloor + \text{Padding}\]
For MPEG-2 and MPEG-2.5 Layer III (which support lower sample rates of 8 kHz to 24 kHz), each frame contains 576 samples. The formula adjusts accordingly:
\[\text{Frame Size} = \left\lfloor 72 \times \frac{\text{Bitrate}}{\text{Sample Rate}} \right\rfloor + \text{Padding}\]
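The two formulas above can be sketched as a single helper. This is an illustrative sketch rather than code taken from libmp3lame; the function name `frame_size_bytes` and its parameters are assumptions made for this example.

```python
def frame_size_bytes(bitrate_bps: int, sample_rate_hz: int,
                     mpeg1: bool = True, padding: bool = False) -> int:
    """Frame size in bytes for a Layer III frame (hypothetical helper).

    mpeg1=True  uses the coefficient 144 (1,152 samples per frame).
    mpeg1=False uses the coefficient 72  (576 samples per frame, MPEG-2/2.5).
    """
    coefficient = 144 if mpeg1 else 72
    # Integer floor division implements the floor brackets in the formula.
    base = coefficient * bitrate_bps // sample_rate_hz
    return base + (1 if padding else 0)


# 128 kbps at 44.1 kHz: 144 * 128000 / 44100 is about 417.96, so the floor is 417.
print(frame_size_bytes(128_000, 44_100))                 # 417
print(frame_size_bytes(128_000, 44_100, padding=True))   # 418
print(frame_size_bytes(64_000, 22_050, mpeg1=False))     # 208
```

Integer arithmetic is used throughout so that the floor is exact and does not depend on floating-point rounding.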
Understanding the Variables
- Bitrate: The target bit rate in bits per second (bps).
- Sample Rate: The sampling frequency of the audio in hertz (Hz), for example 44100 for 44.1 kHz.
- Padding: An optional 1-byte (8-bit) slot. If the coefficient times the bitrate, divided by the sample rate, is not an integer (for example, 128 kbps at 44.1 kHz gives about 417.96 bytes), LAME adds a padding byte to some frames, making them 418 bytes instead of 417, so that the stream averages the target bitrate over time.
- The Coefficients (144 and 72): These constants are the number of samples per frame divided by 8 (the number of bits in a byte). For MPEG-1: \(1152 / 8 = 144\). For MPEG-2 and MPEG-2.5: \(576 / 8 = 72\).
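To make the role of the padding byte concrete, the sketch below distributes padding with a simple remainder accumulator. This is an assumed scheme chosen for illustration, not a copy of LAME's internal bookkeeping; the name `padding_schedule` is hypothetical.

```python
def padding_schedule(bitrate_bps: int, sample_rate_hz: int,
                     n_frames: int, coefficient: int = 144) -> list[int]:
    """Return a list of frame sizes in bytes, padding frames as needed.

    Assumed scheme: carry the fractional remainder of each frame forward
    and emit one extra byte whenever a whole byte has accumulated.
    """
    base, remainder = divmod(coefficient * bitrate_bps, sample_rate_hz)
    sizes = []
    accumulated = 0
    for _ in range(n_frames):
        accumulated += remainder
        if accumulated >= sample_rate_hz:
            accumulated -= sample_rate_hz
            sizes.append(base + 1)   # padded frame
        else:
            sizes.append(base)       # unpadded frame
    return sizes


sizes = padding_schedule(128_000, 44_100, 49)
# 49 frames cover 49 * 1152 / 44100 = 1.28 s, i.e. 20480 bytes at 128 kbps.
print(sum(sizes), sizes.count(418))   # 20480 47
```

At 48 kHz the same call yields no padded frames, because 144 × 128000 / 48000 is exactly 384.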
Dynamic Frame Size Calculation in VBR Mode
When encoding in Variable Bitrate (VBR) mode, libmp3lame
cannot rely solely on a static formula, because the bitrate is no
longer fixed in advance. Instead, it runs an iterative process that
chooses a bitrate, and consequently a frame size, for each individual
frame based on the complexity of the audio.
1. The GPSYCHO Psychoacoustic Model
LAME analyzes the input audio signal using a psychoacoustic model called GPSYCHO, an open-source psychoacoustic and noise-shaping model developed by the LAME project as an alternative to the model in the ISO reference encoder.
- Fast Fourier Transform (FFT): The encoder applies a 1024-point FFT to the audio signal to convert it from the time domain to the frequency domain.
- Masking Thresholds: The algorithm calculates the “Signal-to-Mask Ratio” (SMR). It determines which parts of the audio are audible to the human ear and which parts are masked (rendered inaudible) by louder, adjacent frequencies.
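The two steps above can be illustrated with a toy example: transform one 1024-sample block to the frequency domain, then express a Signal-to-Mask Ratio in decibels. This is a deliberately simplified sketch and not GPSYCHO itself, which computes its masking thresholds with a far more elaborate model. The function names, the Hann window, and the threshold value are assumptions made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(block):
    """Hann-window a block and return the power in each positive-frequency bin."""
    n = len(block)
    windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * i / n))
                for i, s in enumerate(block)]
    return [abs(c) ** 2 for c in fft(windowed)[: n // 2]]


def smr_db(signal_energy, mask_threshold):
    """Signal-to-Mask Ratio in dB; positive means the signal is above the mask."""
    return 10 * math.log10(signal_energy / mask_threshold)


N = 1024
SAMPLE_RATE = 44_100
tone_bin = 64  # a test tone placed exactly on an FFT bin
block = [math.sin(2 * math.pi * tone_bin * i / N) for i in range(N)]

spectrum = power_spectrum(block)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N
print(peak_bin, peak_hz)        # 64 2756.25
print(smr_db(100.0, 1.0))       # hypothetical energy and threshold values
```

A band with a large positive SMR needs fine quantization, while a band whose SMR is negative sits below the mask and can tolerate much coarser quantization.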
2. The Quantization Loop Algorithm (Bit Allocation)
Once the masking thresholds are established, libmp3lame
employs a nested, two-loop search algorithm to determine how many bits
(and what size frame) are required to encode the audio without
introducing audible distortion.
- The Outer Loop (Noise Control): This loop checks whether the quantization noise (the distortion introduced by quantizing the audio) in each scalefactor band exceeds the masking threshold calculated by the psychoacoustic model. Bands that are too noisy are amplified through their scalefactors so that they are quantized more finely, and the inner loop runs again. If the thresholds still cannot be met in VBR mode, more bits are requested for the frame.
- The Inner Loop (Rate Control): This loop quantizes the Modified Discrete Cosine Transform (MDCT) coefficients and counts the bits needed to Huffman-code them. It increases the quantizer step size until the data fits into the number of bits currently available.
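The interaction between the two loops can be sketched with a toy model. Everything here is a simplification made for illustration: the quantizer is linear (not the power-law quantizer of the standard), `bit_cost` is a stand-in for real Huffman tables, and the stopping rules are assumed. None of the names come from libmp3lame.

```python
def bit_cost(q_values):
    """Toy stand-in for Huffman coding: zeros are free, larger values cost more."""
    return sum(0 if q == 0 else 2 * abs(q).bit_length() + 1 for q in q_values)


def inner_loop(bands, scales, bit_budget, step=1.0):
    """Rate control: coarsen the step size until the coded size fits the budget."""
    while True:
        quantized = [[round(c * s / step) for c in band]
                     for band, s in zip(bands, scales)]
        bits = sum(bit_cost(q) for q in quantized)
        if bits <= bit_budget:
            return step, quantized, bits
        step *= 2 ** 0.25  # coarser quantization, fewer bits


def outer_loop(bands, allowed_noise, bit_budget, max_iterations=32):
    """Noise control: amplify bands whose noise exceeds the allowed level."""
    scales = [1.0] * len(bands)
    for _ in range(max_iterations):
        step, quantized, bits = inner_loop(bands, scales, bit_budget)
        # Mean squared reconstruction error per band.
        noise = [sum((c - q * step / s) ** 2 for c, q in zip(band, qb)) / len(band)
                 for band, qb, s in zip(bands, quantized, scales)]
        distorted = [i for i, (n, a) in enumerate(zip(noise, allowed_noise)) if n > a]
        if not distorted:
            return True, bits, noise      # every band is below its threshold
        if len(distorted) == len(bands):
            return False, bits, noise     # amplifying every band cannot help
        for i in distorted:
            scales[i] *= 2 ** 0.5         # quantize this band more finely
    return False, bits, noise


# Hypothetical MDCT coefficients in three bands, with assumed noise limits.
bands = [[40.0, -35.5, 28.2, 12.1], [6.3, -4.8, 3.9, 2.2], [0.9, -0.7, 0.4, 0.2]]
allowed_noise = [4.0, 1.0, 0.5]
for budget in (0, 40, 80, 10_000):
    ok, bits, _ = outer_loop(bands, allowed_noise, budget)
    print(budget, ok, bits)
```

A VBR encoder can be thought of as looking for the smallest budget at which the outer loop succeeds, whereas a CBR encoder has to accept whatever result the fixed budget allows.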
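Through this iterative feedback loop, libmp3lame decides
on the minimum number of bits required to maintain the user's desired
quality level (V0, the highest quality, through V9, the lowest). Because
an MP3 frame header can only signal a bitrate from a fixed table (32 to
320 kbps for MPEG-1 Layer III), the encoder then selects a table bitrate
whose frame is large enough to hold those bits. That bitrate is plugged
back into the standard MPEG frame size formula to give the size of the
frame that is physically written.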
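The final step, mapping a bit demand onto a frame size, can be sketched as a lookup over the MPEG-1 Layer III bitrate table. This is a simplified illustration: it ignores padding and the bit reservoir, and the function name `smallest_fitting_bitrate` is hypothetical rather than part of libmp3lame.

```python
# Bitrates (kbps) that an MPEG-1 Layer III frame header can signal.
MPEG1_L3_BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112,
                          128, 160, 192, 224, 256, 320)


def smallest_fitting_bitrate(required_bytes: int, sample_rate_hz: int):
    """Return (kbps, frame_size_bytes) for the smallest frame that fits.

    required_bytes is assumed to cover the whole frame, including the
    header and side information. Falls back to 320 kbps if nothing fits.
    """
    for kbps in MPEG1_L3_BITRATES_KBPS:
        size = 144 * kbps * 1000 // sample_rate_hz
        if size >= required_bytes:
            return kbps, size
    kbps = MPEG1_L3_BITRATES_KBPS[-1]
    return kbps, 144 * kbps * 1000 // sample_rate_hz


print(smallest_fitting_bitrate(100, 44_100))   # (32, 104)
print(smallest_fitting_bitrate(400, 44_100))   # (128, 417)
```

A quiet passage that needs only about 100 bytes lands in a 32 kbps frame, while a busier passage that needs 400 bytes is written as a 128 kbps frame.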