LAME Preset Extreme vs Insane Mathematical Comparison
This article provides a direct mathematical and technical comparison
between the legacy --preset extreme and
--preset insane encoding options in the
libmp3lame library. We analyze the differences in their
target bitrates, file sizes, compression ratios, and algorithmic
allocations to show how these two high-quality MP3 presets operate under
the hood.
Bitrate and Operating Modes
The fundamental mathematical difference between
--preset extreme and --preset insane lies in
their operational modes: Variable Bitrate (VBR) versus Constant Bitrate
(CBR).
* --preset insane (CBR 320 kbps): This preset forces the encoder to use a fixed bitrate of exactly 320 kilobits per second (kbps) for every frame. In LAME, this maps directly to the command line option -b 320.
* --preset extreme (VBR ~220-250 kbps): This preset uses a variable bitrate aiming for a target quality level. In modern LAME versions, this maps to -V 0. The bitrate fluctuates frame-by-frame depending on the complexity of the audio signal.
File Size and Bitrate Mathematics
The mathematical formula to calculate the final file size (\(S\)) of an encoded audio file in bytes is:
\[S = \frac{\text{Bitrate (bps)} \times \text{Duration (seconds)}}{8}\]
For a standard 5-minute (300 seconds) audio track, we can mathematically compare the output sizes:
1. --preset insane (320 kbps constant)
\[S_{\text{insane}} = \frac{320,000 \text{ bps} \times 300 \text{ s}}{8} = 12,000,000 \text{ bytes} \approx 11.44 \text{ MB}\]
The audio payload size for --preset insane is entirely
deterministic: apart from metadata tags, it is the same for any audio
file of the same duration. (The MB figures here use 1 MB = 1,048,576 bytes.)
2. --preset extreme (Variable, average ~240 kbps)
\[S_{\text{extreme\_avg}} = \frac{240,000 \text{ bps} \times 300 \text{ s}}{8} = 9,000,000 \text{ bytes} \approx 8.58 \text{ MB}\]
For --preset extreme, the average bitrate of a track typically falls between 220
kbps and 250 kbps depending on the complexity of the audio:
* Lower Bound (220 kbps): \(S_{\text{min}} \approx 7.87 \text{ MB}\)
* Upper Bound (250 kbps): \(S_{\text{max}} \approx 8.94 \text{ MB}\)
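Because duration cancels out, the size ratio equals the bitrate ratio:
\(320 / 250 = 1.28\) and \(320 / 220 \approx 1.45\). So
--preset insane produces files that are approximately 28% to 45% larger than
--preset extreme.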
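The arithmetic above can be reproduced with a short script. This is a minimal sketch: the function names are illustrative, the 240 kbps average is the article's assumed figure rather than a measured one, and "MB" is taken to mean 1,048,576 bytes to match the numbers in the text.

```python
MIB = 1024 * 1024  # the article's "MB" values correspond to 2**20 bytes


def file_size_bytes(bitrate_bps, duration_s):
    """Audio payload size in bytes: S = bitrate * duration / 8."""
    return bitrate_bps * duration_s / 8


def size_mb(bitrate_kbps, duration_s=300):
    """Payload size in binary megabytes for a bitrate given in kbps."""
    return file_size_bytes(bitrate_kbps * 1000, duration_s) / MIB


def percent_larger(bitrate_a_kbps, bitrate_b_kbps):
    """How much larger (in percent) a file at bitrate A is than one at bitrate B."""
    return (bitrate_a_kbps / bitrate_b_kbps - 1) * 100


if __name__ == "__main__":
    for label, kbps in [("insane", 320), ("extreme (avg)", 240),
                        ("extreme (low)", 220), ("extreme (high)", 250)]:
        print(f"{label:15s} {kbps} kbps -> {size_mb(kbps):.2f} MB")
    print(f"insane vs 250 kbps: {percent_larger(320, 250):.0f}% larger")
    print(f"insane vs 220 kbps: {percent_larger(320, 220):.0f}% larger")
```

These figures cover the audio frames only; ID3 tags and the VBR header frame add a small amount on top.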
Compression Ratios
Standard CD-quality audio (uncompressed PCM) has a sample rate of 44,100 Hz, 16-bit depth, and 2 channels (stereo). The uncompressed bitrate (\(B_{\text{uncompressed}}\)) is calculated as:
\[B_{\text{uncompressed}} = 44,100 \times 16 \times 2 = 1,411,200 \text{ bps} = 1411.2 \text{ kbps}\]
We can calculate the compression ratios (\(C\)) for both presets using the formula:
\[C = \frac{B_{\text{uncompressed}}}{\text{Bitrate}}\]
* --preset insane Compression Ratio: \[C_{\text{insane}} = \frac{1411.2 \text{ kbps}}{320 \text{ kbps}} = 4.41:1\] This means the uncompressed audio is compressed to approximately 22.68% of its original size.
* --preset extreme Compression Ratio (using a 240 kbps average): \[C_{\text{extreme}} = \frac{1411.2 \text{ kbps}}{240 \text{ kbps}} = 5.88:1\] This means the uncompressed audio is compressed to approximately 17.01% of its original size.
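The two ratios can be checked numerically. The sketch below assumes CD-quality PCM as the reference and, for --preset extreme, the article's assumed 240 kbps average; the helper names are illustrative.

```python
# Uncompressed CD-quality PCM: 44,100 samples/s * 16 bits * 2 channels
PCM_BPS = 44_100 * 16 * 2


def compression_ratio(bitrate_bps):
    """C = uncompressed bitrate / encoded bitrate."""
    return PCM_BPS / bitrate_bps


def percent_of_original(bitrate_bps):
    """Encoded size as a percentage of the uncompressed size (100 / C)."""
    return 100 * bitrate_bps / PCM_BPS


if __name__ == "__main__":
    for label, bps in [("insane", 320_000), ("extreme (avg)", 240_000)]:
        print(f"{label:14s} {compression_ratio(bps):.2f}:1 "
              f"({percent_of_original(bps):.2f}% of original)")
```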
Algorithmic and Frame-Level Differences
MP3 audio is divided into discrete segments called frames. At a sample rate of 44.1 kHz, each MPEG-1 Layer III frame represents 1,152 audio samples, which equates to approximately \(26.12 \text{ ms}\) of audio (\(1152 / 44100 \approx 0.02612\) seconds).
The mathematical frame size in bytes (\(F\)) is determined by:
\[F = 144 \times \frac{\text{Bitrate}}{\text{Sample Rate}} + \text{Padding}\]
Frame Allocation in --preset insane
Because the bitrate is fixed at 320,000 bps, the frame size remains mathematically constant:
\[F_{\text{insane}} = 144 \times \frac{320,000}{44,100} \approx 1044.90 \text{ bytes}\]
With padding adjustments, frames alternate between 1044 and 1045 bytes to maintain the exact 320 kbps average. The encoder must use this exact space for every 26.12 ms block of audio, regardless of whether the audio is a complex orchestral climax or absolute silence.
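The alternation between 1044 and 1045 bytes can be illustrated with a remainder accumulator. This is a sketch of the general idea only, not LAME's actual padding code: it carries the fractional part of the frame size forward and emits one padding byte whenever a whole byte has accumulated. The function name is hypothetical.

```python
def cbr_frame_sizes(bitrate_bps, sample_rate_hz, n_frames):
    """Return per-frame sizes in bytes for a CBR MPEG-1 Layer III stream.

    The ideal frame size 144 * bitrate / sample_rate is usually not an
    integer, so the fractional remainder is accumulated and a padding
    byte is added each time it reaches one full byte.
    """
    base, remainder = divmod(144 * bitrate_bps, sample_rate_hz)
    sizes = []
    accumulated = 0  # fractional bytes owed, in units of 1/sample_rate_hz
    for _ in range(n_frames):
        accumulated += remainder
        if accumulated >= sample_rate_hz:
            accumulated -= sample_rate_hz
            sizes.append(base + 1)  # padded frame
        else:
            sizes.append(base)      # unpadded frame
    return sizes


if __name__ == "__main__":
    sizes = cbr_frame_sizes(320_000, 44_100, 49)
    duration_s = len(sizes) * 1152 / 44_100
    print("padded frames:", sizes.count(1045), "of", len(sizes))
    print("average bitrate (bps):", sum(sizes) * 8 / duration_s)
```

At 320 kbps and 44.1 kHz the fractional part is 44/49 of a byte, so in this model the pattern repeats every 49 frames, with most frames carrying the extra padding byte.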
Frame Allocation in --preset extreme
Under --preset extreme, the frame size \(F\) varies dynamically. LAME uses a
psychoacoustic model to calculate the Mask-to-Noise Ratio (MNR). If a
26.12 ms audio frame is silent or simple, LAME may allocate a frame size
corresponding to 128 kbps:
\[F_{\text{simple}} = 144 \times \frac{128,000}{44,100} \approx 417.96 \text{ bytes}\]
If the frame is highly complex with transient signals, LAME can scale up to the maximum frame size, which corresponds to 320 kbps (1044 or 1045 bytes, depending on padding). This avoids spending bits on simple passages and allocates more bits only where the human ear would otherwise perceive quantization noise.
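To make the variable frame sizes concrete, the sketch below lists the unpadded frame size for each standard MPEG-1 Layer III bitrate at 44.1 kHz and computes the average bitrate of a mix of frames. The frame mix is entirely hypothetical and chosen for illustration; it is not measured LAME output, and the helper names are assumptions.

```python
# Standard MPEG-1 Layer III bitrates in kbps (excluding "free format")
MPEG1_L3_BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112,
                          128, 160, 192, 224, 256, 320]
SAMPLE_RATE_HZ = 44_100


def frame_bytes(bitrate_kbps, padding=0):
    """Whole-byte frame size: floor(144 * bitrate / sample_rate) + padding."""
    return 144 * bitrate_kbps * 1000 // SAMPLE_RATE_HZ + padding


def average_bitrate_kbps(frame_counts):
    """Average bitrate of a VBR stream from a {bitrate_kbps: frame_count} map.

    Every frame covers the same 1,152 samples, so the stream average is
    the frame-count-weighted mean of the per-frame bitrates.
    """
    total_frames = sum(frame_counts.values())
    return sum(k * n for k, n in frame_counts.items()) / total_frames


if __name__ == "__main__":
    for kbps in MPEG1_L3_BITRATES_KBPS:
        print(f"{kbps:3d} kbps -> {frame_bytes(kbps)} bytes")

    # Hypothetical distribution of frames per bitrate (illustration only)
    hypothetical_mix = {128: 5, 192: 15, 224: 30, 256: 30, 320: 20}
    print("average:", average_bitrate_kbps(hypothetical_mix), "kbps")
```

Because each frame has the same duration, a track with only a small share of 320 kbps frames can still average within the 220 to 250 kbps range quoted earlier.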