How libmp3lame Calculates MP3 Frame Sizes

This article explains the mathematical formulas and algorithms used by the libmp3lame library (the core engine behind the LAME MP3 encoder) to determine the size of MP3 audio frames. It covers the deterministic algebraic formulas used for Constant Bitrate (CBR) encoding, as well as the psychoacoustic algorithms and quantization loops used to dynamically calculate frame sizes in Variable Bitrate (VBR) encoding.

The Standard MP3 Frame Size Formula

For Constant Bitrate (CBR) streams, libmp3lame determines the size of an MP3 frame using the standard formula defined by the MPEG-1 and MPEG-2 audio specifications. Because every frame holds a fixed number of samples, and therefore covers a fixed duration at a given sample rate, the frame size in bytes is directly proportional to the bitrate and inversely proportional to the sample rate.

For MPEG-1 Layer III (which supports sample rates of 32 kHz, 44.1 kHz, and 48 kHz), each frame contains 1,152 samples. Dividing 1,152 by 8 bits per byte gives the constant 144 in the formula for the frame size in bytes:

\[\text{Frame Size} = \left\lfloor 144 \times \frac{\text{Bitrate}}{\text{Sample Rate}} \right\rfloor + \text{Padding}\]
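
As a worked example, take a 128 kbps stream sampled at 44.1 kHz (values chosen only for illustration, with the bitrate expressed in bits per second and the sample rate in hertz):

\[\left\lfloor 144 \times \frac{128000}{44100} \right\rfloor = \lfloor 417.96 \rfloor = 417 \text{ bytes}\]

An unpadded frame is therefore 417 bytes, and a frame with the padding byte added is 418 bytes.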

For MPEG-2 and MPEG-2.5 Layer III (which support lower sample rates of 8 kHz to 24 kHz), each frame contains 576 samples, so the constant halves to 72 (576 divided by 8):

\[\text{Frame Size} = \left\lfloor 72 \times \frac{\text{Bitrate}}{\text{Sample Rate}} \right\rfloor + \text{Padding}\]
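
The two formulas can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not code taken from libmp3lame; the function name `frame_size_bytes` and its parameters are hypothetical, and the bitrate is assumed to be given in bits per second.

```python
def frame_size_bytes(bitrate_bps: int, sample_rate_hz: int,
                     padding: int = 0, mpeg1: bool = True) -> int:
    """Size in bytes of one Layer III frame, following the formulas above.

    Hypothetical helper for illustration; not part of the libmp3lame API.
    """
    if padding not in (0, 1):
        raise ValueError("padding must be 0 or 1")
    # 144 = 1152 samples / 8 bits (MPEG-1); 72 = 576 samples / 8 bits (MPEG-2/2.5).
    coefficient = 144 if mpeg1 else 72
    # Integer floor division implements the floor in the formula exactly.
    return coefficient * bitrate_bps // sample_rate_hz + padding


print(frame_size_bytes(128_000, 44_100))             # MPEG-1, unpadded
print(frame_size_bytes(128_000, 44_100, padding=1))  # MPEG-1, padded
print(frame_size_bytes(64_000, 22_050, mpeg1=False)) # MPEG-2
```

Using integer arithmetic throughout avoids any floating-point rounding near whole-byte boundaries.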

Understanding the Variables

Bitrate: the bitrate of the frame in bits per second, not kilobits per second. A 128 kbps stream uses 128,000.

Sample Rate: the sampling frequency of the audio in hertz, for example 44,100.

Padding: either 0 or 1. A Layer III frame carries one extra byte when the padding bit in its header is set. Because the division rarely produces a whole number, an encoder pads some frames so that the average frame size matches the target bitrate.


Dynamic Frame Size Calculation in VBR Mode

When encoding in Variable Bitrate (VBR) mode, libmp3lame cannot fix the frame size in advance from a single target bitrate. Instead, it chooses a bitrate, and consequently a frame size, for each individual frame based on the complexity of the audio in that frame. This is an iterative process with two main stages.

1. The GPSYCHO Psychoacoustic Model

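LAME analyzes the input audio signal using a psychoacoustic model called GPSYCHO, a heavily tuned descendant of the ISO MPEG psychoacoustic model. For each frame, the model estimates masking thresholds: the level, per frequency band, below which quantization noise is inaudible to a listener.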

2. The Quantization Loop Algorithm (Bit Allocation)

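Once the masking thresholds are established, libmp3lame uses a nested, two-loop search to determine how many bits, and therefore what frame size, are required to encode the audio without introducing audible distortion. The inner loop (the rate loop) coarsens the quantizer step size until the Huffman-coded data fits the available bits. The outer loop (the noise control loop) compares the resulting quantization noise against the masking threshold in each scalefactor band, amplifies the bands where the noise would be audible, and runs the inner loop again.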
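
The shape of such a two-loop search can be shown with a toy model. The sketch below is not LAME's code and makes several simplifying assumptions: each band is a short list of floats, the bit count is a crude stand-in for Huffman coding, and all function names are hypothetical. It tries increasing bit budgets and reports the bits used by the first quantization whose noise stays under every threshold.

```python
def quantize(band, step, gain):
    """Amplify a band by its gain, then round each value to a multiple of step."""
    return [round(x * gain / step) for x in band]


def band_noise(band, quantized, step, gain):
    """Squared error between the original band and its dequantized values."""
    return sum((x - q * step / gain) ** 2 for x, q in zip(band, quantized))


def estimate_bits(quantized_bands):
    """Crude stand-in for Huffman coding: zeros are free, larger values cost more."""
    return sum(abs(q).bit_length() + 1 for band in quantized_bands for q in band if q)


def inner_loop(bands, gains, bit_budget):
    """Rate loop: coarsen the step size until the estimated bits fit the budget."""
    step = 0.01
    while True:
        quantized = [quantize(b, step, g) for b, g in zip(bands, gains)]
        if estimate_bits(quantized) <= bit_budget:
            return step, quantized
        step *= 1.1


def outer_loop(bands, thresholds, bit_budget, max_passes=32):
    """Noise control loop: amplify bands whose noise exceeds their threshold."""
    gains = [1.0] * len(bands)
    for _ in range(max_passes):
        step, quantized = inner_loop(bands, gains, bit_budget)
        noisy = [
            i for i, band in enumerate(bands)
            if band_noise(band, quantized[i], step, gains[i]) > thresholds[i]
        ]
        if not noisy:
            return estimate_bits(quantized)  # every band is under its threshold
        if len(noisy) == len(bands):
            return None  # amplifying every band cannot help at this budget
        for i in noisy:
            gains[i] *= 1.25
    return None


def bits_needed(bands, thresholds, budgets):
    """Try budgets in ascending order; return the bits used by the first success."""
    for budget in budgets:
        used = outer_loop(bands, thresholds, budget)
        if used is not None:
            return used
    return None


bands = [[0.9, -0.7, 0.5], [0.2, -0.1, 0.05], [0.02, 0.01, -0.01]]
thresholds = [1e-4, 1e-4, 1e-4]
print(bits_needed(bands, thresholds, range(0, 101, 5)))
```

Amplifying a band before quantization gives it finer effective resolution than its neighbours, which is how the outer loop moves noise away from the bands where it would be heard.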

Through this iterative feedback loop, libmp3lame decides on the minimum bitrate required to maintain the user’s desired quality level (-V 0 through -V 9, where lower numbers mean higher quality). Once this bitrate is determined for the current frame, it is plugged back into the standard MPEG frame size formula to give the size of the frame that is written.
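
An MP3 frame header can only signal a bitrate from a fixed table, so the chosen bitrate must be one of those values. The sketch below illustrates that last step for MPEG-1 Layer III under simplifying assumptions: it ignores the bit reservoir and padding, treats `required_bytes` as already including the header and side information, and uses a hypothetical function name rather than anything from the libmp3lame API.

```python
# Bitrates (kbps) that an MPEG-1 Layer III frame header can signal.
MPEG1_LAYER3_BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112,
                              128, 160, 192, 224, 256, 320)


def smallest_bitrate_for(required_bytes: int, sample_rate_hz: int) -> int:
    """Return the lowest table bitrate (kbps) whose unpadded frame holds required_bytes.

    Falls back to the highest bitrate if nothing in the table is large enough.
    """
    for kbps in MPEG1_LAYER3_BITRATES_KBPS:
        frame_bytes = 144 * kbps * 1000 // sample_rate_hz
        if frame_bytes >= required_bytes:
            return kbps
    return MPEG1_LAYER3_BITRATES_KBPS[-1]


print(smallest_bitrate_for(300, 44_100))
```

In this model a frame that needs even one byte more than a given bitrate provides jumps to the next table entry, which is why VBR frame sizes fall on a small set of discrete values.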