VLevel Technical Info

Code Layout

The core of VLevel is the VolumeLeveler class. It contains the look-ahead buffer. It works on blocks of double-precision numbers, called values (value_t). Some number (channels) of values makes up a sample. Many of VolumeLeveler's functions take lengths in samples instead of values, so be careful.

vlevel-bin and vlevel-ladspa.so drive the VolumeLeveler class. vlevel-bin also uses the CommandLine class to parse it's options. CommandLine is interesting in it's own right, and pretty self-explanatory.

Soon, there will be a Ruby script using SOX that makes Ogg, FLAC, and wav files work with a nice drag-n-drop GUI using FXRuby.

General Idea

The intent is to make the quiet parts louder. This is done by computing the volume at each point, then scaling it as described in the Math section. The complex part is finding out what the volume is at each point (VolumeLeveler::avg_amp). It must vary smoothly so the volume changes aren't obvious, but it must always be above the peaks of the waveform, or clipping could occur.

To do this, there is a fifo buffer. Imagine a graph of position vs. amplitude showing the contents of the buffer. A horizontal line is drawn across it, representing VolumeLeveler::avg_amp. From where avg_amp crosses the y-axis, a line is drawn with the maximum possible slope to intercept one of the amplitude points. This line which is as steep as possible, is the smooth line that avg_amp will follow.

When the value at the head of the buffer is removed, it is scaled based on avg_amp. avg_amp is then incremented by the slope of the line. If we reach the point the line was drawn to (max_slope_pos), we search the fifo for the next point of highest slope. Otherwise, we only need to check the incoming sample to see if a line drawn to it has the highest slope.

    y                       y (a few samples later)

    ^       ^               ^
    |      / max_slope      |
    |     /                 |
    |    /s                 |s\---------- avg_amp
    |   / s                 |s \
    |  /  s                 |s  \ max_slope
    | /   s s               |s s \
    |--s-ss-s----avg_amp    |s s  \
    | ss ss s s             |s s ss
    |ssssssssss             |ssssss
    +------------> x        +---------> x

Sorry for the ASCII art. The result is that the average amplitude (avg_amp) varies slowly and always stays above the amplitude of each sample. When the samples are removed, they are scaled based on the next section.

Math

Once we have avg_amp, each sample is scaled when it is output according to this:

output_sample = sample * avg_amp ^ (-strength)

This is derived as follows:

First, we convert the amplitude of avg_amp to decibels (1 = 0dB):

avg_amp_db = 10 * log10(avg_amp)

avg_amp_db is less than zero. We want to scale it to be closer to zero, in such a way that if strength is 1, it will become zero, and if strength is 0 it will remain unchanged.

ideal_amp_db = avg_amp_db * (1 - strength)
ideal_amp_db = 10 * log10(avg_amp) * (1 - strength)

Now we convert back to samples:

ideal_amp = 10 ^ (ideal_amp_db / 10)
ideal_amp = 10 ^ (log10(avg_amp) * (1 - strength))
ideal_amp = (10 ^ log10(avg_amp)) ^ (1 - strength)
ideal_amp = avg_amp ^ (1 - strength)

Now we find out what we should multiply the samples by to change their peak amplitude, avg_amp, to their ideal peak amplitude, ideal_amp:

multiplier = ideal_amp / avg_amp multiplier = avg_amp ^ (1 - strength) / avg_amp multiplier = avg_amp ^ (-strength)

And finally, we multiply the sample by the multiplier:

output_sample = sample * multiplier output_sample = sample * avg_amp ^ (-strength)

Undoing the effect

If the original values for strength weren't too close to 1, you can undo the VLevel by giving the undo option. It works by changing strength as shown below.

When we first leveled, we scaled the amplitudes like so:

ideal_amp_db = avg_amp_db * (1 - strength)

To get that back, we solve for avg_amp_db:

avg_amp_db = ideal_amp_db * 1 / (1 - strength)

In this pass, however, the original avg_amp_db becomes ideal_amp_db, and the original (1 - strength) becomes 1 / (1 - strength). Now we skip ahead a bit:

multiplier = avg_amp ^ (1 - strength) / avg_amp

Substituting as explained above and continuing:

multiplier = avg_amp ^ (1 / (1 - strength)) / avg_amp multiplier = avg_amp ^ ((1 / (1 - strength)) - 1) multiplier = avg_amp ^ (strength / (1 - strength))

But how do we get VLevel to do this? Well, we can give it any strength, and it does this:

multiplier = avg_amp ^ -strength

And we want it to do this:

multiplier = avg_amp ^ (undo_strength / (1 - undo_strength))

So...

-strength = undo_strength / (1 - undo_strength) strength = undo_strength / (undo_strength - 1)

By choosing strength as above before starting VLevel, we can then undo the first VLevel, with no change to the main algorithm.

To be totally precise, we'd also have to make a min_multiplier with a value of 1 / orig_max_multiplier, but that would be slow, and does anybody care if we drop the static anyway?

It's not perfect, probably because avg_amp moves linearly, not logarithmically, so there are some rounding errors. Someday I might try changing that, but it's a big change.