LAME Preset Extreme vs Insane Mathematical Comparison

This article provides a direct mathematical and technical comparison between the legacy --preset extreme and --preset insane encoding options in the libmp3lame library. We analyze the differences in their target bitrates, file sizes, compression ratios, and algorithmic allocations to show how these two high-quality MP3 presets operate under the hood.

Bitrate and Operating Modes

The fundamental mathematical difference between --preset extreme and --preset insane lies in their operational modes: Variable Bitrate (VBR) versus Constant Bitrate (CBR).

File Size and Bitrate Mathematics

The mathematical formula to calculate the final file size (\(S\)) of an encoded audio file in bytes is:

\[S = \frac{\text{Bitrate (bps)} \times \text{Duration (seconds)}}{8}\]

For a standard 5-minute (300 seconds) audio track, we can mathematically compare the output sizes:

1. --preset insane (320 kbps constant)

\[S_{\text{insane}} = \frac{320{,}000 \text{ bps} \times 300 \text{ s}}{8} = 12{,}000{,}000 \text{ bytes} \approx 11.44 \text{ MiB}\]

The file size for --preset insane is deterministic: apart from metadata such as ID3 tags and the encoder's header frame, any two files of the same duration encode to the same size.

2. --preset extreme (Variable, average ~240 kbps)

\[S_{\text{extreme\_avg}} = \frac{240{,}000 \text{ bps} \times 300 \text{ s}}{8} = 9{,}000{,}000 \text{ bytes} \approx 8.58 \text{ MiB}\]

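For --preset extreme, the average bitrate of a whole file typically falls between 220 kbps and 250 kbps depending on the complexity of the material (individual frames vary far more widely):
* Lower Bound (220 kbps): \(S_{\text{min}} = 8{,}250{,}000 \text{ bytes} \approx 7.87 \text{ MiB}\)
* Upper Bound (250 kbps): \(S_{\text{max}} = 9{,}375{,}000 \text{ bytes} \approx 8.94 \text{ MiB}\)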

At these bounds, --preset insane produces files that are approximately 28% larger (320 vs. 250 kbps) to 45% larger (320 vs. 220 kbps) than --preset extreme, and about 33% larger at the 240 kbps average.
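The size arithmetic above is easy to reproduce. The short Python sketch below assumes, like the formula, that the file is pure audio payload (no ID3 tags or encoder header frame) and uses the article's 220, 240, and 250 kbps figures for --preset extreme; the helper names are illustrative.

```python
# Sketch: compare encoded MP3 stream sizes for a fixed duration.
# Assumption: size = bitrate * duration / 8, ignoring ID3 tags and the
# encoder's header frame, as in the formula above.

MIB = 1024 * 1024  # bytes per mebibyte


def stream_size_bytes(bitrate_bps: int, duration_s: float) -> float:
    """Return the audio payload size in bytes."""
    return bitrate_bps * duration_s / 8


def percent_larger(a_bps: int, b_bps: int) -> float:
    """How much larger (in %) a stream at a_bps is than one at b_bps."""
    return (a_bps / b_bps - 1) * 100


DURATION = 300  # seconds (5-minute track)
rates = {
    "insane (CBR 320)": 320_000,
    "extreme, low (220)": 220_000,
    "extreme, avg (240)": 240_000,
    "extreme, high (250)": 250_000,
}

for name, bps in rates.items():
    size = stream_size_bytes(bps, DURATION)
    print(f"{name:22s} {size:>12,.0f} bytes  {size / MIB:6.2f} MiB")

print(f"insane vs 250 kbps: +{percent_larger(320_000, 250_000):.1f}%")
print(f"insane vs 220 kbps: +{percent_larger(320_000, 220_000):.1f}%")
```

Running it reproduces the figures above: 12,000,000 bytes (11.44 MiB) for --preset insane, 7.87 to 8.94 MiB for --preset extreme, and a size difference of roughly 28% to 45%.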

Compression Ratios

Standard CD-quality audio (uncompressed PCM) has a sample rate of 44,100 Hz, 16-bit depth, and 2 channels (stereo). The uncompressed bitrate (\(B_{\text{uncompressed}}\)) is calculated as:

\[B_{\text{uncompressed}} = 44{,}100 \times 16 \times 2 = 1{,}411{,}200 \text{ bps} = 1411.2 \text{ kbps}\]

We can calculate the compression ratios (\(C\)) for both presets using the formula:

\[C = \frac{B_{\text{uncompressed}}}{\text{Bitrate}}\]
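Applying the formula to both presets gives the ratios directly. This sketch assumes the CD-audio parameters stated above and the article's 220 to 250 kbps range for --preset extreme.

```python
# Sketch: compression ratio C = B_uncompressed / bitrate for CD audio.

SAMPLE_RATE = 44_100  # Hz
BIT_DEPTH = 16        # bits per sample
CHANNELS = 2          # stereo

B_UNCOMPRESSED = SAMPLE_RATE * BIT_DEPTH * CHANNELS  # 1,411,200 bps


def compression_ratio(bitrate_bps: float) -> float:
    """Ratio of the uncompressed PCM bitrate to the encoded bitrate."""
    return B_UNCOMPRESSED / bitrate_bps


for label, bps in [("insane", 320_000), ("extreme @ 220", 220_000),
                   ("extreme @ 240", 240_000), ("extreme @ 250", 250_000)]:
    print(f"{label:14s} C = {compression_ratio(bps):.2f}:1")
```

It prints about 4.41:1 for --preset insane and between about 5.64:1 (250 kbps) and 6.41:1 (220 kbps) for --preset extreme, with 5.88:1 at the 240 kbps average.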

Algorithmic and Frame-Level Differences

MP3 audio is divided into discrete segments called frames. Each MPEG-1 Layer III frame holds 1,152 samples per channel, so at a sample rate of 44.1 kHz a frame covers approximately \(26.12 \text{ ms}\) of audio (\(1152 / 44100 \approx 0.02612\) seconds).

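For MPEG-1 Layer III, the frame size in bytes (\(F\)) is determined by:

\[F = \left\lfloor 144 \times \frac{\text{Bitrate}}{\text{Sample Rate}} \right\rfloor + \text{Padding}\]

where the constant 144 equals 1,152 samples divided by 8 bits per byte, and Padding is either 0 or 1 byte.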
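The frame-size rule can be sketched in Python as follows. It makes explicit that the bitrate term is truncated to a whole number of bytes before the 0-or-1-byte padding slot is added (the MPEG-1 Layer III frame-length rule); the function names are illustrative.

```python
# Sketch: MPEG-1 Layer III frame length in bytes.
# 144 = 1152 samples per frame / 8 bits per byte.

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def frame_duration_s(sample_rate: int) -> float:
    """Duration of one frame in seconds."""
    return SAMPLES_PER_FRAME / sample_rate


def frame_bytes_exact(bitrate_bps: int, sample_rate: int) -> float:
    """Unrounded bytes per frame (the average an encoder must hit)."""
    return 144 * bitrate_bps / sample_rate


def frame_bytes(bitrate_bps: int, sample_rate: int, padding: int) -> int:
    """Actual frame length: whole-byte part plus the 0/1-byte padding slot."""
    if padding not in (0, 1):
        raise ValueError("padding must be 0 or 1")
    return 144 * bitrate_bps // sample_rate + padding


print(f"frame duration: {frame_duration_s(44_100) * 1000:.2f} ms")
for kbps in (128, 320):
    bps = kbps * 1000
    print(f"{kbps} kbps: {frame_bytes_exact(bps, 44_100):.2f} avg, "
          f"{frame_bytes(bps, 44_100, 0)}/{frame_bytes(bps, 44_100, 1)} bytes")
```

For 320 kbps this gives an unrounded 1044.90 bytes, realized as 1044-byte frames or 1045-byte padded frames; for 128 kbps, 417.96 bytes becomes 417 or 418.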

Frame Allocation in --preset insane

Because the bitrate is fixed at 320,000 bps, the unrounded number of bytes per frame is the same for every frame:

\[144 \times \frac{320{,}000}{44{,}100} \approx 1044.90 \text{ bytes}\]

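Truncating this value gives 1044 bytes, and the padding byte is set on about 89.8% of frames (matching the fractional part, about 0.898), so the stream is a mix of 1044- and 1045-byte frames that averages exactly 320 kbps. Every 26.12 ms block of audio gets this same amount of space, regardless of whether it is a complex orchestral climax or absolute silence. (The Layer III bit reservoir can shift a limited number of bits between neighboring frames, but it cannot change the frame sizes themselves.)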
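One simple way to decide which frames get the padding byte is to carry the fractional remainder forward from frame to frame and pad whenever it reaches a whole byte. The sketch below is an illustrative assumption, not LAME's actual source; `padded_frame_sizes` is a hypothetical helper, and exact rational arithmetic (`fractions.Fraction`) is used so that floating-point error cannot make the average drift.

```python
# Sketch: choosing the padding byte with a running remainder so that a CBR
# stream averages exactly the nominal bitrate. Illustrative scheme only,
# not necessarily LAME's exact implementation.

from fractions import Fraction


def padded_frame_sizes(bitrate_bps: int, sample_rate: int, n_frames: int):
    """Yield frame lengths in bytes for n_frames consecutive CBR frames."""
    exact = Fraction(144 * bitrate_bps, sample_rate)  # e.g. 1044 + 44/49
    base = exact.numerator // exact.denominator       # whole bytes: 1044
    frac = exact - base                               # leftover per frame
    remainder = Fraction(0)
    for _ in range(n_frames):
        remainder += frac
        if remainder >= 1:        # a whole byte is owed: pad this frame
            remainder -= 1
            yield base + 1
        else:
            yield base


sizes = list(padded_frame_sizes(320_000, 44_100, 44_100))
padded_share = sum(1 for s in sizes if s == 1045) / len(sizes)
avg_bps = sum(sizes) * 8 / (len(sizes) * 1152 / 44_100)
print(f"padded frames: {padded_share:.1%}, average: {avg_bps:,.0f} bps")
```

Over 44,100 frames (about 19 minutes of audio) it reports 89.8% padded frames and an average of exactly 320,000 bps.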

Frame Allocation in --preset extreme

Under --preset extreme, the frame size \(F\) varies dynamically. LAME uses a psychoacoustic model to calculate the Mask-to-Noise Ratio (MNR). If a 26.12 ms audio frame is silent or simple, LAME may allocate a frame size corresponding to 128 kbps:

\[144 \times \frac{128{,}000}{44{,}100} \approx 417.96 \text{ bytes}\]

In practice that is a 417-byte frame, or 418 bytes when the padding byte is set.

If the frame is highly complex, for example because it contains sharp transients, LAME can scale up to the maximum MPEG-1 Layer III bitrate of 320 kbps (1044 bytes, or 1045 with padding). This avoids spending bits on simple passages and concentrates them where the human ear would otherwise perceive quantization noise.
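The VBR decision can be illustrated with a deliberately simplified model. In this sketch a hypothetical `bits_needed` value per frame stands in for whatever LAME's psychoacoustic model and quantization loop would demand, and each frame gets the smallest standard MPEG-1 Layer III bitrate whose unpadded frame can hold those bits. It ignores the frame header, side information, padding, and the bit reservoir, so it is a toy illustration of the principle rather than LAME's algorithm.

```python
# Toy per-frame VBR bitrate choice (illustrative only, not LAME's algorithm).
# bits_needed is a hypothetical stand-in for a psychoacoustic model's demand.

# Bitrates (kbps) that an MPEG-1 Layer III frame may use.
BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)
SAMPLE_RATE = 44_100  # Hz


def frame_bytes(kbps: int) -> int:
    """Unpadded frame length in bytes at this bitrate and SAMPLE_RATE."""
    return 144 * kbps * 1000 // SAMPLE_RATE


def pick_bitrate(bits_needed: int) -> int:
    """Smallest bitrate whose unpadded frame holds bits_needed bits.

    Ignores header, side information, padding and the bit reservoir.
    """
    for kbps in BITRATES_KBPS:
        if frame_bytes(kbps) * 8 >= bits_needed:
            return kbps
    return BITRATES_KBPS[-1]  # demand exceeds 320 kbps: cap at the maximum


demands = [1_200, 3_300, 8_300]  # hypothetical bits needed per frame
chosen = [pick_bitrate(b) for b in demands]
print(chosen, f"average {sum(chosen) / len(chosen):.0f} kbps")
```

With three hypothetical frames needing 1,200, 3,300, and 8,300 bits, it picks 48, 128, and 320 kbps, an average of about 165 kbps, whereas a 320 kbps CBR stream would spend 1044 or 1045 bytes on each of them.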