
This document describes filters, sources, and sinks provided by the libavfilter library.

Filtering in FFmpeg is enabled through the libavfilter library.

In libavfilter, a filter can have multiple inputs and multiple outputs. To illustrate the sorts of things that are possible, we consider the following filtergraph.

                [main]
input --> split ---------------------> overlay --> output
            |                             ^
            |[tmp]                  [flip]|
            +-----> crop --> vflip -------+

This filtergraph splits the input stream in two streams, then sends one stream through the crop filter and the vflip filter, before merging it back with the other stream by overlaying it on top. You can use the following command to achieve this:

ffmpeg -i INPUT -vf "split [main][tmp]; [tmp] crop=iw:ih/2:0:0, vflip [flip]; [main][flip] overlay=0:H/2" OUTPUT

The result will be that the top half of the video is mirrored onto the bottom half of the output video.

Filters in the same linear chain are separated by commas, and distinct linear chains of filters are separated by semicolons. In our example, crop,vflip are in one linear chain, while split and overlay are each in a separate one. The points where the linear chains join are labelled by names enclosed in square brackets. In the example, the split filter generates two outputs that are associated with the labels [main] and [tmp].

The stream sent to the second output of split, labelled as [tmp], is processed through the crop filter, which crops away the lower half of the video, and is then vertically flipped. The overlay filter takes as input the first, unchanged output of the split filter (which was labelled as [main]), and overlays on its lower half the output generated by the crop,vflip filterchain.

Some filters take a list of parameters as input: they are specified after the filter name and an equal sign, and are separated from each other by a colon.
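As a rough illustration of the comma and semicolon rules described above, the following Python sketch splits the example filtergraph description into linear chains and filters. The helper name split_filtergraph is hypothetical and not part of FFmpeg; it ignores quoting and escaping, so it only handles simple descriptions such as this one.

```python
def split_filtergraph(description):
    """Split a simple filtergraph description into chains of filter strings.

    Hypothetical helper for illustration only: quoting and escaping are
    ignored, so ';' and ',' are always treated as separators.
    """
    chains = []
    for chain in description.split(";"):      # ';' separates linear chains
        filters = [f.strip() for f in chain.split(",")]  # ',' separates filters
        chains.append(filters)
    return chains


graph = ("split [main][tmp]; [tmp] crop=iw:ih/2:0:0, vflip [flip]; "
         "[main][flip] overlay=0:H/2")

for chain in split_filtergraph(graph):
    print(chain)
```

The second chain contains two filters (crop and vflip), while split and overlay each form a chain of their own.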

There exist so-called source filters that do not have an audio/video input, and sink filters that will not have audio/video output.

The graph2dot program included in the FFmpeg tools directory can be used to parse a filtergraph description and issue a corresponding textual representation in the dot language.

Invoke the command:

graph2dot -h

to see how to use graph2dot .

You can then pass the dot description to the dot program (from the graphviz suite of programs) and obtain a graphical representation of the filtergraph.

For example the sequence of commands:

echo GRAPH_DESCRIPTION | \
tools/graph2dot -o graph.tmp && \
dot -Tpng graph.tmp -o graph.png && \
display graph.png

can be used to create and display an image representing the graph described by the GRAPH_DESCRIPTION string. Note that this string must be a complete self-contained graph, with its inputs and outputs explicitly defined. For example if your command line is of the form:

ffmpeg -i infile -vf scale=640:360 outfile

your GRAPH_DESCRIPTION string will need to be of the form:

nullsrc,scale=640:360,nullsink

You may also need to set the nullsrc parameters and add a format filter in order to simulate a specific input file.

A filtergraph is a directed graph of connected filters. It can contain cycles, and there can be multiple links between a pair of filters. Each link has one input pad on one side connecting it to one filter from which it takes its input, and one output pad on the other side connecting it to one filter accepting its output.

Each filter in a filtergraph is an instance of a filter class registered in the application, which defines the features and the number of input and output pads of the filter.

A filter with no input pads is called a "source", and a filter with no output pads is called a "sink".

A filtergraph has a textual representation, which is recognized by the -filter / -vf / -af and -filter_complex options in ffmpeg and -vf / -af in ffplay , and by the avfilter_graph_parse_ptr() function defined in libavfilter/avfilter.h .

A filterchain consists of a sequence of connected filters, each one connected to the previous one in the sequence. A filterchain is represented by a list of ","-separated filter descriptions.

A filtergraph consists of a sequence of filterchains. A sequence of filterchains is represented by a list of ";"-separated filterchain descriptions.

A filter is represented by a string of the form: [ in_link_1 ]...[ in_link_N ] filter_name @ id = arguments [ out_link_1 ]...[ out_link_M ]

filter_name is the name of the filter class of which the described filter is an instance, and has to be the name of one of the filter classes registered in the program, optionally followed by "@id". The name of the filter class is optionally followed by a string "=arguments".

arguments is a string which contains the parameters used to initialize the filter instance. It may have one of two forms:

- A ':'-separated list of key=value pairs.

- A ':'-separated list of value. In this case, the keys are assumed to be the option names in the order they are declared. E.g. the fade filter declares three options in this order: type, start_frame and nb_frames. Then the parameter list in:0:30 means that the value in is assigned to the option type, 0 to start_frame and 30 to nb_frames.

- A ':'-separated list of mixed direct value and long key=value pairs. The direct value must precede the key=value pairs, and follow the same constraints order of the previous point. The following key=value pairs can be set in any preferred order.
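The three argument forms can be sketched in a few lines of Python. The function resolve_arguments below is a hypothetical illustration, not libavfilter code: it ignores quoting and escaping, and it treats any item containing '=' as a key=value pair, which would not hold for values that themselves contain an equal sign.

```python
def resolve_arguments(arguments, option_names):
    """Map a ':'-separated argument string to option names.

    Direct values are assigned to option_names in declaration order and
    must come before any key=value pair (hypothetical sketch).
    """
    resolved = {}
    seen_key_value = False
    position = 0
    for item in arguments.split(":"):
        if "=" in item:
            key, value = item.split("=", 1)
            seen_key_value = True
        else:
            if seen_key_value:
                raise ValueError("direct values must precede key=value pairs")
            if position >= len(option_names):
                raise ValueError("too many direct values")
            key, value = option_names[position], item
            position += 1
        resolved[key] = value
    return resolved


# Option order of the fade filter as stated in the text above.
FADE_OPTIONS = ["type", "start_frame", "nb_frames"]

print(resolve_arguments("in:0:30", FADE_OPTIONS))
print(resolve_arguments("in:nb_frames=30:start_frame=0", FADE_OPTIONS))
```

Both calls resolve to the same assignment: type is in, start_frame is 0 and nb_frames is 30.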

If the option value itself is a list of items (e.g. the format filter takes a list of pixel formats), the items in the list are usually separated by ‘ | ’.

The list of arguments can be quoted using the character ‘ ' ’ as initial and ending mark, and the character ‘ \ ’ for escaping the characters within the quoted text; otherwise the argument string is considered terminated when the next special character (belonging to the set ‘ []=;, ’) is encountered.

The name and arguments of the filter are optionally preceded and followed by a list of link labels. A link label allows one to name a link and associate it to a filter output or input pad. The preceding labels in_link_1 ... in_link_N , are associated to the filter input pads, the following labels out_link_1 ... out_link_M , are associated to the output pads.

When two link labels with the same name are found in the filtergraph, a link between the corresponding input and output pad is created.

If an output pad is not labelled, it is linked by default to the first unlabelled input pad of the next filter in the filterchain. For example in the filterchain

nullsrc, split[L1], [L2]overlay, nullsink

the split filter instance has two output pads, and the overlay filter instance two input pads. The first output pad of split is labelled "L1", the first input pad of overlay is labelled "L2", and the second output pad of split is linked to the second input pad of overlay, which are both unlabelled.

In a filter description, if the input label of the first filter is not specified, "in" is assumed; if the output label of the last filter is not specified, "out" is assumed.

In a complete filterchain all the unlabelled filter input and output pads must be connected. A filtergraph is considered valid if all the filter input and output pads of all the filterchains are connected.

Libavfilter will automatically insert scale filters where format conversion is required. It is possible to specify swscale flags for those automatically inserted scalers by prepending sws_flags= flags ; to the filtergraph description.

Here is a BNF description of the filtergraph syntax:

NAME             ::= sequence of alphanumeric characters and '_'
FILTER_NAME      ::= NAME["@"NAME]
LINKLABEL        ::= "[" NAME "]"
LINKLABELS       ::= LINKLABEL [LINKLABELS]
FILTER_ARGUMENTS ::= sequence of chars (possibly quoted)
FILTER           ::= [LINKLABELS] FILTER_NAME ["=" FILTER_ARGUMENTS] [LINKLABELS]
FILTERCHAIN      ::= FILTER [,FILTERCHAIN]
FILTERGRAPH      ::= [sws_flags=flags;] FILTERCHAIN [;FILTERGRAPH]

Filtergraph description composition entails several levels of escaping. See the "Quoting and escaping" section in the ffmpeg-utils(1) manual for more information about the employed escaping procedure.

A first level escaping affects the content of each filter option value, which may contain the special character : used to separate values, or one of the escaping characters \' .

A second level escaping affects the whole filter description, which may contain the escaping characters \' or the special characters [],; used by the filtergraph description.

Finally, when you specify a filtergraph on a shell commandline, you need to perform a third level escaping for the shell special characters contained within it.

For example, consider the following string to be embedded in the drawtext filter description text value:

this is a 'string': may contain one, or more, special characters

This string contains the ' special escaping character, and the : special character, so it needs to be escaped in this way:

text=this is a \'string\'\: may contain one, or more, special characters

A second level of escaping is required when embedding the filter description in a filtergraph description, in order to escape all the filtergraph special characters. Thus the example above becomes:

drawtext=text=this is a \\\'string\\\'\\: may contain one\, or more\, special characters

(note that in addition to the \' escaping special characters, also , needs to be escaped).

Finally an additional level of escaping is needed when writing the filtergraph description in a shell command, which depends on the escaping rules of the adopted shell. For example, assuming that \ is special and needs to be escaped with another \ , the previous string will finally result in:

-vf "drawtext=text=this is a \\\\\\'string\\\\\\'\\\\: may contain one\\, or more\\, special characters"
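The three escaping levels can be reproduced mechanically. The Python sketch below uses hypothetical helper names (they are not part of FFmpeg) and, for the third level, makes the same assumption as the text: that only the backslash is special inside the shell quoting being used.

```python
def backslash_escape(text, specials):
    """Prefix every character of `specials` found in `text` with a backslash."""
    return "".join("\\" + ch if ch in specials else ch for ch in text)


def escape_option_value(value):
    # First level: ':' separates values; '\' and "'" are escaping characters.
    return backslash_escape(value, "\\':")


def escape_filter_description(description):
    # Second level: '\' and "'" plus the filtergraph special characters.
    return backslash_escape(description, "\\'[],;")


def escape_for_shell(description):
    # Third level (assumption): only '\' needs doubling in the chosen shell.
    return description.replace("\\", "\\\\")


text = "this is a 'string': may contain one, or more, special characters"
level1 = "text=" + escape_option_value(text)
level2 = escape_filter_description("drawtext=" + level1)
level3 = escape_for_shell(level2)

print(level1)
print(level2)
print('-vf "' + level3 + '"')
```

Each printed line corresponds to one of the three escaped strings shown above.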

Some filters support a generic enable option. For the filters supporting timeline editing, this option can be set to an expression which is evaluated before sending a frame to the filter. If the evaluation is non-zero, the filter will be enabled, otherwise the frame will be sent unchanged to the next filter in the filtergraph.

The expression accepts the following values:

t
    timestamp expressed in seconds, NAN if the input timestamp is unknown
n
    sequential number of the input frame, starting from 0
pos
    the position in the file of the input frame, NAN if unknown
w
h
    width and height of the input frame if video

Additionally, these filters support an enable command that can be used to re-define the expression.

Like any other filtering option, the enable option follows the same rules.

For example, to enable a blur filter (smartblur) from 10 seconds to 3 minutes, and a curves filter starting at 3 seconds:

smartblur = enable='between(t,10,3*60)', curves = enable='gte(t,3)' : preset=cross_process
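To make the timing of this example concrete, the following Python sketch mimics the two enable expressions. The functions between and gte are re-implemented here following their documented meaning in the FFmpeg expression evaluator (inclusive range test, greater-or-equal test); enabled_filters is a hypothetical helper used only for illustration.

```python
def between(x, lo, hi):
    """Return 1 if lo <= x <= hi, 0 otherwise."""
    return 1 if lo <= x <= hi else 0


def gte(x, y):
    """Return 1 if x >= y, 0 otherwise."""
    return 1 if x >= y else 0


def enabled_filters(t):
    """List which filters of the example are active at timestamp t (seconds)."""
    active = []
    if between(t, 10, 3 * 60):   # smartblur: from 10 seconds to 3 minutes
        active.append("smartblur")
    if gte(t, 3):                # curves: from 3 seconds onwards
        active.append("curves")
    return active


for t in (0, 5, 10, 180, 181):
    print(t, enabled_filters(t))
```

A frame for which a filter is not listed is passed unchanged to the next filter in the graph.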

See ffmpeg -filters to view which filters have timeline support.

Some options can be changed during the operation of the filter using a command. These options are marked ’T’ on the output of ffmpeg -h filter=<name of filter> . The name of the command is the name of the option and the argument is the new value.

Some filters with several inputs support a common set of options. These options can only be set by name, not with the short notation.

eof_action
    The action to take when EOF is encountered on the secondary input; it accepts one of the following values:
    repeat
        Repeat the last frame (the default).
    endall
        End both streams.
    pass
        Pass the main input through.
shortest
    If set to 1, force the output to terminate when the shortest input terminates. Default value is 0.
repeatlast
    If set to 1, force the filter to extend the last frame of secondary streams until the end of the primary stream. A value of 0 disables this behavior. Default value is 1.

When you configure your FFmpeg build, you can disable any of the existing filters using --disable-filters . The configure output will show the audio filters included in your build.

Below is a description of the currently available audio filters.

A compressor is mainly used to reduce the dynamic range of a signal. Modern music in particular is mostly compressed at a high ratio to improve the overall loudness. This is done to capture the listener's attention, "fatten" the sound and bring more "power" to the track. If a signal is compressed too much it may sound dull or "dead" afterwards, or it may start to "pump" (which can be a powerful effect but can also destroy a track completely). The right compression is the key to reaching a professional sound and is the high art of mixing and mastering. Because of its complex settings it may take a long time to get the right feeling for this kind of effect.

Compression is done by detecting the volume above a chosen level threshold and dividing it by the factor set with ratio. So if you set the threshold to -12dB and your signal reaches -6dB, a ratio of 2:1 will result in a signal at -9dB. Because an exact manipulation of the signal would cause distortion of the waveform, the reduction can be levelled over time. This is done by setting attack and release. attack determines how long the signal has to rise above the threshold before any reduction will occur, and release sets the time the signal has to fall below the threshold to reduce the reduction again. Signals shorter than the chosen attack time will be left untouched. The overall reduction of the signal can be made up afterwards with the makeup setting. So compressing the peaks of a signal by about 6dB and raising the makeup to this level results in a signal twice as loud as the source. To gain a softer entry into the compression, the knee flattens the hard edge at the threshold in the range of the chosen decibels.
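The threshold and ratio arithmetic above can be checked with a small Python sketch. It models only a static, hard-knee curve in decibels; attack, release, knee and detection mode are deliberately left out, and the function names are hypothetical rather than taken from the filter's source.

```python
def compressed_level_db(input_db, threshold_db, ratio):
    """Static hard-knee compression: the excess over the threshold is divided by ratio."""
    if input_db <= threshold_db:
        return input_db                      # below threshold: untouched
    excess = input_db - threshold_db
    return threshold_db + excess / ratio


def db_to_linear(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


# Threshold -12dB, signal at -6dB, ratio 2:1, as in the text above.
print(compressed_level_db(-6, -12, 2))

# A makeup of about 6dB corresponds to roughly doubling the amplitude.
print(db_to_linear(6))
```

The first value is the -9dB from the example; the second shows why making up 6dB of reduction gives a signal about twice as loud.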

The filter accepts the following options:

level_in
    Set input gain. Default is 1. Range is between 0.015625 and 64.
mode
    Set mode of compressor operation. Can be upward or downward. Default is downward.
threshold
    If a signal of stream rises above this level it will affect the gain reduction. By default it is 0.125. Range is between 0.00097563 and 1.
ratio
    Set a ratio by which the signal is reduced. 1:2 means that if the level rose 4dB above the threshold, it will be only 2dB above after the reduction. Default is 2. Range is between 1 and 20.
attack
    Amount of milliseconds the signal has to rise above the threshold before gain reduction starts. Default is 20. Range is between 0.01 and 2000.
release
    Amount of milliseconds the signal has to fall below the threshold before reduction is decreased again. Default is 250. Range is between 0.01 and 9000.
makeup
    Set the amount by how much signal will be amplified after processing. Default is 1. Range is from 1 to 64.
knee
    Curve the sharp knee around the threshold to enter gain reduction more softly. Default is 2.82843. Range is between 1 and 8.
link
    Choose if the average level between all channels of input stream or the louder (maximum) channel of input stream affects the reduction. Default is average.
detection
    Should the exact signal be taken in case of peak or an RMS one in case of rms. Default is rms which is mostly smoother.
mix
    How much to use compressed signal in output. Default is 1. Range is between 0 and 1.

This filter supports all the above options as commands.

Simple audio dynamic range compression/expansion filter.

The filter accepts the following options:

contrast Set contrast. Default is 33. Allowed range is between 0 and 100.

Copy the input audio source unchanged to the output. This is mainly useful for testing purposes.

Apply cross fade from one input audio stream to another input audio stream. The cross fade is applied for specified duration near the end of first stream.

The filter accepts the following options:

nb_samples, ns
    Specify the number of samples for which the cross fade effect has to last. At the end of the cross fade effect the first input audio will be completely silent. Default is 44100.
duration, d
    Specify the duration of the cross fade effect. See the Time duration section in the ffmpeg-utils(1) manual for the accepted syntax. By default the duration is determined by nb_samples. If set this option is used instead of nb_samples.
overlap, o
    Should first stream end overlap with second stream start. Default is enabled.
curve1
    Set curve for cross fade transition for first stream.
curve2
    Set curve for cross fade transition for second stream.
    For description of available curve types see afade filter description.

Cross fade from one input to another: ffmpeg -i first.flac -i second.flac -filter_complex acrossfade=d=10:c1=exp:c2=exp output.flac

Cross fade from one input to another but without overlapping: ffmpeg -i first.flac -i second.flac -filter_complex acrossfade=d=10:o=0:c1=exp:c2=exp output.flac

Split audio stream into several bands.

This filter splits audio stream into two or more frequency ranges. Summing all streams back will give flat output.

The filter accepts the following options:

split
    Set split frequencies. Those must be positive and increasing.
order
    Set filter order, can be 2nd, 4th or 8th. Default is 4th.

Reduce audio bit resolution.

This filter is a bit crusher with enhanced functionality. A bit crusher is used to audibly reduce the number of bits an audio signal is sampled with. This doesn't change the bit depth at all, it just produces the effect. Material reduced in bit depth sounds more harsh and "digital". This filter is even able to round to continuous values instead of discrete bit depths. Additionally it has a DC offset which results in different crushing of the lower and the upper half of the signal. An anti-aliasing setting is able to produce "softer" crushing sounds.

Another feature of this filter is the logarithmic mode. This setting switches from linear distances between bits to logarithmic ones. The result is a much more "natural" sounding crusher which doesn’t gate low signals for example. The human ear has a logarithmic perception, so this kind of crushing is much more pleasant. Logarithmic crushing is also able to get anti-aliased.
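For readers unfamiliar with the effect, here is a minimal Python sketch of the basic idea in linear mode: samples are rounded to a small number of evenly spaced levels while the storage format stays the same. This is a generic uniform quantizer written for illustration; it is an assumption-laden simplification and not the algorithm used by the filter, which additionally offers logarithmic mode, DC offset, anti-aliasing and an LFO.

```python
def crush(sample, bits):
    """Quantize a sample in [-1.0, 1.0] as if it had only `bits` bits of resolution.

    Generic illustration only; the real filter is more elaborate.
    """
    levels = 2 ** (bits - 1)          # steps available per polarity
    return round(sample * levels) / levels


samples = [0.3, -0.7, 0.05]
for bits in (2, 3, 8):
    print(bits, [crush(s, bits) for s in samples])
```

With few bits, quiet samples such as 0.05 collapse to zero, which is the gating of low signals that the logarithmic mode is described as avoiding.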

The filter accepts the following options:

level_in
    Set level in.
level_out
    Set level out.
bits
    Set bit reduction.
mix
    Set mixing amount.
mode
    Can be linear: lin or logarithmic: log.
dc
    Set DC.
aa
    Set anti-aliasing.
samples
    Set sample reduction.
lfo
    Enable LFO. By default disabled.
lforange
    Set LFO range.
lforate
    Set LFO rate.

Delay audio filtering until a given wallclock timestamp. See the cue filter.

Remove impulsive noise from input audio.

Samples detected as impulsive noise are replaced by interpolated samples using autoregressive modelling.

w
    Set window size, in milliseconds. Allowed range is from 10 to 100. Default value is 55 milliseconds. This sets size of window which will be processed at once.
o
    Set window overlap, in percentage of window size. Allowed range is from 50 to 95. Default value is 75 percent. Setting this to a very high value increases impulsive noise removal but makes whole process much slower.
a
    Set autoregression order, in percentage of window size. Allowed range is from 0 to 25. Default value is 2 percent. This option also controls quality of interpolated samples using neighbour good samples.
t
    Set threshold value. Allowed range is from 1 to 100. Default value is 2. This controls the strength of impulsive noise which is going to be removed. The lower value, the more samples will be detected as impulsive noise.
b
    Set burst fusion, in percentage of window size. Allowed range is 0 to 10. Default value is 2. If any two samples detected as noise are spaced less than this value then any sample between those two samples will be also detected as noise.
m
    Set overlap method. It accepts the following values:
    a
        Select overlap-add method. Even not interpolated samples are slightly changed with this method.
    s
        Select overlap-save method. Not interpolated samples remain unchanged.
    Default value is a.

Remove clipped samples from input audio.

Samples detected as clipped are replaced by interpolated samples using autoregressive modelling.

w
    Set window size, in milliseconds. Allowed range is from 10 to 100. Default value is 55 milliseconds. This sets size of window which will be processed at once.
o
    Set window overlap, in percentage of window size. Allowed range is from 50 to 95. Default value is 75 percent.
a
    Set autoregression order, in percentage of window size. Allowed range is from 0 to 25. Default value is 8 percent. This option also controls quality of interpolated samples using neighbour good samples.
t
    Set threshold value. Allowed range is from 1 to 100. Default value is 10. Higher values make clip detection less aggressive.
n
    Set size of histogram used to detect clips. Allowed range is from 100 to 9999. Default value is 1000. Higher values make clip detection less aggressive.
m
    Set overlap method. It accepts the following values:
    a
        Select overlap-add method. Even not interpolated samples are slightly changed with this method.
    s
        Select overlap-save method. Not interpolated samples remain unchanged.
    Default value is a.

Delay one or more audio channels.

Samples in delayed channel are filled with silence.

The filter accepts the following options:

delays
    Set list of delays in milliseconds for each channel separated by '|'. Unused delays will be silently ignored. If the number of given delays is smaller than the number of channels, all remaining channels will not be delayed. If you want to delay by an exact number of samples, append 'S' to the number. If you want instead to delay in seconds, append 's' to the number.
all
    Use last set delay for all remaining channels. Disabled by default. If enabled, this option changes how the delays option is interpreted.

Delay first channel by 1.5 seconds, the third channel by 0.5 seconds and leave the second channel (and any other channels that may be present) unchanged. adelay=1500|0|500

Delay second channel by 500 samples, the third channel by 700 samples and leave the first channel (and any other channels that may be present) unchanged. adelay=0|500S|700S

Delay all channels by same number of samples: adelay=delays=64S:all=1
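The unit suffixes and the all option can be summarized with a short Python sketch. The helper names are hypothetical, the sample rate is passed in explicitly, and rounding to the nearest sample is an assumption made for the illustration.

```python
def delay_in_samples(spec, sample_rate):
    """Convert one delay entry to a number of samples.

    'S' suffix: samples; 's' suffix: seconds; no suffix: milliseconds.
    """
    if spec.endswith("S"):
        return int(spec[:-1])
    if spec.endswith("s"):
        return round(float(spec[:-1]) * sample_rate)
    return round(float(spec) * sample_rate / 1000)


def channel_delays(delays, channels, sample_rate, use_last_for_all=False):
    """Return the delay in samples for each channel."""
    parsed = [delay_in_samples(d, sample_rate) for d in delays.split("|")]
    parsed = parsed[:channels]                    # unused delays are ignored
    fill = parsed[-1] if use_last_for_all else 0  # behaviour of the 'all' option
    return parsed + [fill] * (channels - len(parsed))


print(channel_delays("1500|0|500", 4, 48000))
print(channel_delays("0|500S|700S", 3, 44100))
print(channel_delays("64S", 2, 44100, use_last_for_all=True))
```

The three calls correspond to the three adelay examples above, for a hypothetical four-channel, three-channel and stereo input respectively.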

Compute derivative/integral of audio stream.

Applying both filters (aderivative and aintegral) one after another produces the original audio.

Apply echoing to the input audio.

Echoes are reflected sound and can occur naturally amongst mountains (and sometimes large buildings) when talking or shouting; digital echo effects emulate this behaviour and are often used to help fill out the sound of a single instrument or vocal. The time difference between the original signal and the reflection is the delay , and the loudness of the reflected signal is the decay . Multiple echoes can have different delays and decays.

A description of the accepted parameters follows.

in_gain
    Set input gain of reflected signal. Default is 0.6.
out_gain
    Set output gain of reflected signal. Default is 0.3.
delays
    Set list of time intervals in milliseconds between original signal and reflections separated by '|'. Allowed range for each delay is (0 - 90000.0]. Default is 1000.
decays
    Set list of loudness of reflected signals separated by '|'. Allowed range for each decay is (0 - 1.0]. Default is 0.5.

Make it sound as if there are twice as many instruments as are actually playing: aecho=0.8:0.88:60:0.4

If delay is very short, then it sounds like a (metallic) robot playing music: aecho=0.8:0.88:6:0.4

A longer delay will sound like an open air concert in the mountains: aecho=0.8:0.9:1000:0.3

Same as above but with one more mountain: aecho=0.8:0.9:1000|1800:0.3|0.25
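The relationship between delays and decays can be illustrated with a simplified Python model. This sketch assumes a plain feed-forward echo in which each reflection is a delayed copy of the input scaled by its decay, with in_gain applied to the direct signal and out_gain to the sum; that combination is an assumption for illustration, and delays are given in samples rather than milliseconds to keep the example short.

```python
def echo(samples, in_gain, out_gain, delays, decays):
    """Simplified feed-forward echo (illustrative model, not the filter's code)."""
    output = []
    for n, x in enumerate(samples):
        y = in_gain * x
        for delay, decay in zip(delays, decays):
            if n >= delay:
                y += decay * samples[n - delay]   # delayed, attenuated copy
        output.append(out_gain * y)
    return output


# A single impulse followed by silence makes the reflections easy to see.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(echo(impulse, 0.8, 0.9, delays=[2, 4], decays=[0.3, 0.25]))
```

The impulse reappears two and four samples later, each time quieter, which is the "one more mountain" case with two reflections.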

The audio emphasis filter creates or restores material directly taken from LPs or emphasized CDs with different filter curves. For example, to store music on vinyl the signal has to be altered by a filter first to even out the disadvantages of this recording medium. Once the material is played back, the inverse filter has to be applied to undo the distortion of the frequency response.

The filter accepts the following options:

level_in
    Set input gain.
level_out
    Set output gain.
mode
    Set filter mode. For restoring material use reproduction mode, otherwise use production mode. Default is reproduction mode.
type
    Set filter type. Selects medium. Can be one of the following:
    col
        select Columbia.
    emi
        select EMI.
    bsi
        select BSI (78RPM).
    riaa
        select RIAA.
    cd
        select Compact Disc (CD).
    50fm
        select 50µs (FM).
    75fm
        select 75µs (FM).
    50kf
        select 50µs (FM-KF).
    75kf
        select 75µs (FM-KF).

Modify an audio signal according to the specified expressions.

This filter accepts one or more expressions (one for each channel), which are evaluated and used to modify a corresponding audio signal.

It accepts the following parameters:

exprs
    Set the '|'-separated expressions list for each separate channel. If the number of input channels is greater than the number of expressions, the last specified expression is used for the remaining output channels.
channel_layout, c
    Set output channel layout. If not specified, the channel layout is specified by the number of expressions. If set to 'same', it will use by default the same input channel layout.

Each expression in exprs can contain the following constants and functions:

ch
    channel number of the current expression
n
    number of the evaluated sample, starting from 0
s
    sample rate
t
    time of the evaluated sample expressed in seconds
nb_in_channels
nb_out_channels
    input and output number of channels
val(CH)
    the value of input channel with number CH

Note: this filter is slow. For faster processing you should use a dedicated filter.

Half volume: aeval=val(ch)/2:c=same

Invert phase of the second channel: aeval=val(0)|-val(1)
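The per-channel evaluation can be mimicked in Python, with lambdas standing in for the expression strings. This is only a conceptual sketch: the function aeval below is a hypothetical stand-in that supports val(CH) and ch but none of the other constants, and it always keeps the input channel count (similar in spirit to c=same).

```python
def aeval(frames, expressions):
    """Apply one expression per channel to each frame of samples.

    frames: list of per-sample lists, one value per channel.
    expressions: callables taking (val, ch); the last one is reused
    for channels that have no expression of their own.
    """
    output = []
    for frame in frames:
        def val(channel, frame=frame):
            return frame[channel]          # value of the given input channel
        out = []
        for ch in range(len(frame)):
            expr = expressions[min(ch, len(expressions) - 1)]
            out.append(expr(val, ch))
        output.append(out)
    return output


half_volume = [lambda val, ch: val(ch) / 2]                       # val(ch)/2
invert_second = [lambda val, ch: val(0), lambda val, ch: -val(1)]  # val(0)|-val(1)

stereo = [[0.5, -0.2], [1.0, 0.25]]
print(aeval(stereo, half_volume))
print(aeval(stereo, invert_second))
```

In the half volume case a single expression serves both channels, because the last specified expression is reused for the remaining channel.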

Apply fade-in/out effect to input audio.

A description of the accepted parameters follows.

type, t
    Specify the effect type, can be either in for fade-in, or out for a fade-out effect. Default is in.
start_sample, ss
    Specify the number of the start sample for starting to apply the fade effect. Default is 0.
nb_samples, ns
    Specify the number of samples for which the fade effect has to last. At the end of the fade-in effect the output audio will have the same volume as the input audio, at the end of the fade-out transition the output audio will be silence. Default is 44100.
start_time, st
    Specify the start time of the fade effect. Default is 0. The value must be specified as a time duration; see the Time duration section in the ffmpeg-utils(1) manual for the accepted syntax. If set this option is used instead of start_sample.
duration, d
    Specify the duration of the fade effect. See the Time duration section in the ffmpeg-utils(1) manual for the accepted syntax. At the end of the fade-in effect the output audio will have the same volume as the input audio, at the end of the fade-out transition the output audio will be silence. By default the duration is determined by nb_samples. If set this option is used instead of nb_samples.
curve
    Set curve for fade transition. It accepts the following values:
    tri
        select triangular, linear slope (default)
    qsin
        select quarter of sine wave
    hsin
        select half of sine wave
    esin
        select exponential sine wave
    log
        select logarithmic
    ipar
        select inverted parabola
    qua
        select quadratic
    cub
        select cubic
    squ
        select square root
    cbr
        select cubic root
    par
        select parabola
    exp
        select exponential
    iqsin
        select inverted quarter of sine wave
    ihsin
        select inverted half of sine wave
    dese
        select double-exponential seat
    desi
        select double-exponential sigmoid
    losi
        select logistic sigmoid
    nofade
        no fade applied

Fade in first 15 seconds of audio: afade=t=in:ss=0:d=15

Fade out last 25 seconds of a 900 seconds audio: afade=t=out:st=875:d=25
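How start_sample and nb_samples shape the gain can be sketched for the default linear (tri) curve. The Python below is an illustrative model with hypothetical function names; the other curve types and the exact behaviour outside the fade region are not taken from the filter's source.

```python
def fade_gain(n, fade_type, start_sample, nb_samples):
    """Linear fade gain for sample number n (sketch of the 'tri' curve only)."""
    progress = (n - start_sample) / nb_samples
    progress = min(max(progress, 0.0), 1.0)       # clamp to the fade region
    return progress if fade_type == "in" else 1.0 - progress


def fade_out_start(total_seconds, fade_seconds):
    """Start time so that a fade-out ends exactly at the end of the audio."""
    return total_seconds - fade_seconds


# One-second fade-in at 44100 Hz: silent, half volume, full volume.
print([fade_gain(n, "in", 0, 44100) for n in (0, 22050, 44100)])

# The 900 second example above: a 25 second fade-out starts at 875.
print(fade_out_start(900, 25))
```

The second result is where the st=875 in the fade-out example comes from.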

Denoise audio samples with FFT.

A description of the accepted parameters follows.

nr
    Set the noise reduction in dB, allowed range is 0.01 to 97. Default value is 12 dB.
nf
    Set the noise floor in dB, allowed range is -80 to -20. Default value is -50 dB.
nt
    Set the noise type. It accepts the following values:
    w
        Select white noise.
    v
        Select vinyl noise.
    s
        Select shellac noise.
    c
        Select custom noise, defined in bn option.
    Default value is white noise.
bn
    Set custom band noise for every one of 15 bands. Bands are separated by ' ' or '|'.
rf
    Set the residual floor in dB, allowed range is -80 to -20. Default value is -38 dB.
tn
    Enable noise tracking. By default is disabled. With this enabled, noise floor is automatically adjusted.
tr
    Enable residual tracking. By default is disabled.
om
    Set the output mode. It accepts the following values:
    i
        Pass input unchanged.
    o
        Pass noise filtered out.
    n
        Pass only noise.
    Default value is o.

This filter supports the following commands:

sample_noise, sn
    Start or stop measuring noise profile. Syntax for the command is: "start" or "stop" string. After measuring noise profile is stopped it will be automatically applied in filtering.
noise_reduction, nr
    Change noise reduction. Argument is single float number. Syntax for the command is: "noise_reduction"
noise_floor, nf
    Change noise floor. Argument is single float number. Syntax for the command is: "noise_floor"
output_mode, om
    Change output mode operation. Syntax for the command is: "i", "o" or "n" string.

Apply arbitrary expressions to samples in frequency domain.

real
    Set frequency domain real expression for each separate channel separated by '|'. Default is "re". If the number of input channels is greater than the number of expressions, the last specified expression is used for the remaining output channels.
imag
    Set frequency domain imaginary expression for each separate channel separated by '|'. Default is "im".
    Each expression in real and imag can contain the following constants and functions:
    sr
        sample rate
    b
        current frequency bin number
    nb
        number of available bins
    ch
        channel number of the current expression
    chs
        number of channels
    pts
        current frame pts
    re
        current real part of frequency bin of current channel
    im
        current imaginary part of frequency bin of current channel
    real(b, ch)
        Return the value of real part of frequency bin at location (bin, channel)
    imag(b, ch)
        Return the value of imaginary part of frequency bin at location (bin, channel)
win_size
    Set window size. Allowed range is from 16 to 131072. Default is 4096.
win_func
    Set window function. Default is hann.
overlap
    Set window overlap. If set to 1, the recommended overlap for selected window function will be picked. Default is 0.75.

Leave almost only low frequencies in audio: afftfilt="'real=re * (1-clip((b/nb)*b,0,1))':imag='im * (1-clip((b/nb)*b,0,1))'"

Apply robotize effect: afftfilt="real='hypot(re,im)*sin(0)':imag='hypot(re,im)*cos(0)':win_size=512:overlap=0.75"

Apply whisper effect: afftfilt="real='hypot(re,im)*cos((random(0)*2-1)*2*3.14)':imag='hypot(re,im)*sin((random(1)*2-1)*2*3.14)':win_size=128:overlap=0.8"
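The low-frequency example multiplies both the real and the imaginary part of each bin by the same factor, 1-clip((b/nb)*b,0,1). The Python sketch below evaluates just that factor so its shape can be inspected; nb is passed in as a parameter because the exact number of bins for a given win_size is not stated here, and the function names are hypothetical.

```python
def clip(x, lo, hi):
    """Clamp x to the range [lo, hi]."""
    return max(lo, min(x, hi))


def lowpass_gain(b, nb):
    """Gain applied to bin b out of nb bins: 1 - clip((b/nb)*b, 0, 1)."""
    return 1 - clip((b / nb) * b, 0, 1)


nb = 4096  # assumed bin count, for illustration only
for b in (0, 16, 32, 64, 1000):
    print(b, lowpass_gain(b, nb))
```

The gain falls from 1 at bin 0 to 0 once b*b reaches nb, so with this assumed nb everything from bin 64 upwards is removed.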

Apply an arbitrary Finite Impulse Response filter.

This filter is designed for applying long FIR filters, up to 60 seconds long.

It can be used as component for digital crossover filters, room equalization, cross talk cancellation, wavefield synthesis, auralization, ambiophonics, ambisonics and spatialization.

This filter uses the streams after the first one as FIR coefficients. If a non-first stream holds a single channel, it will be used for all input channels in the first stream; otherwise the number of channels in the non-first stream must be the same as the number of channels in the first stream.

It accepts the following parameters:

dry
    Set dry gain. This sets input gain.
wet
    Set wet gain. This sets final output gain.
length
    Set Impulse Response filter length. Default is 1, which means whole IR is processed.
gtype
    Enable applying gain measured from power of IR. Set which approach to use for auto gain measurement.
    none
        Do not apply any gain.
    peak
        select peak gain, very conservative approach. This is default value.
    dc
        select DC gain, limited application.
    gn
        select gain to noise approach, this is most popular one.
irgain
    Set gain to be applied to IR coefficients before filtering. Allowed range is 0 to 1. This gain is applied after any gain applied with gtype option.
irfmt
    Set format of IR stream. Can be mono or input. Default is input.
maxir
    Set max allowed Impulse Response filter duration in seconds. Default is 30 seconds. Allowed range is 0.1 to 60 seconds.
response
    Show IR frequency response, magnitude (magenta), phase (green) and group delay (yellow) in additional video stream. By default it is disabled.
channel
    Set for which IR channel to display frequency response. By default the first channel is displayed. This option is used only when response is enabled.
size
    Set video stream size. This option is used only when response is enabled.
rate
    Set video stream frame rate. This option is used only when response is enabled.
minp
    Set minimal partition size used for convolution. Default is 8192. Allowed range is from 1 to 32768. Lower values decrease latency at the cost of higher CPU usage.
maxp
    Set maximal partition size used for convolution. Default is 8192. Allowed range is from 8 to 32768. Lower values may increase CPU usage.
nbirs
    Set number of input impulse response streams which will be switchable at runtime. Allowed range is from 1 to 32. Default is 1.
ir
    Set IR stream which will be used for convolution, starting from 0. It should always be lower than the value supplied by the nbirs option. Default is 0. This option can be changed at runtime via commands.

Apply reverb to a stream using a mono IR file as the second input. Complete command using ffmpeg:

ffmpeg -i input.wav -i middle_tunnel_1way_mono.wav -lavfi afir output.wav
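The channel rule described above can be sketched in Python with plain direct convolution. This is only an illustration of FIR filtering with a mono or per-channel IR, not the partitioned convolution the filter performs; the names convolve and apply_ir are hypothetical.

```python
def convolve(signal, ir):
    """Direct FIR convolution: every input sample adds a scaled copy of the IR."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += x * h
    return out


def apply_ir(channels, ir_channels):
    """Use a mono IR for every channel, or one IR channel per input channel."""
    if len(ir_channels) == 1:
        ir_channels = ir_channels * len(channels)
    if len(ir_channels) != len(channels):
        raise ValueError("IR must be mono or match the input channel count")
    return [convolve(ch, ir) for ch, ir in zip(channels, ir_channels)]
```

For example, apply_ir([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.25]]) filters both channels with the same two-tap IR.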

Set output format constraints for the input audio. The framework will negotiate the most appropriate format to minimize conversions.

It accepts the following parameters:

sample_fmts, f
    A '|'-separated list of requested sample formats.
sample_rates, r
    A '|'-separated list of requested sample rates.
channel_layouts, cl
    A '|'-separated list of requested channel layouts. See the Channel Layout section in the ffmpeg-utils(1) manual for the required syntax.

If a parameter is omitted, all values are allowed.

Force the output to either unsigned 8-bit or signed 16-bit stereo:

aformat=sample_fmts=u8|s16:channel_layouts=stereo

A gate is mainly used to reduce lower parts of a signal. This kind of signal processing reduces disturbing noise between useful signals.

Gating is done by detecting the volume below a chosen level threshold and dividing it by the factor set with ratio . The bottom of the noise floor is set via range . Because an exact manipulation of the signal would cause distortion of the waveform the reduction can be levelled over time. This is done by setting attack and release .

attack determines how long the signal has to fall below the threshold before any reduction will occur and release sets the time the signal has to rise above the threshold to reduce the reduction again. Shorter signals than the chosen attack time will be left untouched.

level_in
    Set input level before filtering. Default is 1. Allowed range is from 0.015625 to 64.
mode
    Set the mode of operation. Can be upward or downward. Default is downward. If set to upward mode, higher parts of signal will be amplified, expanding dynamic range in upward direction. Otherwise, in case of downward, lower parts of signal will be reduced.
range
    Set the level of gain reduction when the signal is below the threshold. Default is 0.06125. Allowed range is from 0 to 1. Setting this to 0 disables reduction and then filter behaves like expander.
threshold
    If a signal rises above this level the gain reduction is released. Default is 0.125. Allowed range is from 0 to 1.
ratio
    Set a ratio by which the signal is reduced. Default is 2. Allowed range is from 1 to 9000.
attack
    Amount of milliseconds the signal has to rise above the threshold before gain reduction stops. Default is 20 milliseconds. Allowed range is from 0.01 to 9000.
release
    Amount of milliseconds the signal has to fall below the threshold before the reduction is increased again. Default is 250 milliseconds. Allowed range is from 0.01 to 9000.
makeup
    Set amount of amplification of signal after processing. Default is 1. Allowed range is from 1 to 64.
knee
    Curve the sharp knee around the threshold to enter gain reduction more softly. Default is 2.828427125. Allowed range is from 1 to 8.
detection
    Choose if exact signal should be taken for detection or an RMS like one. Default is rms. Can be peak or rms.
link
    Choose if the average level between all channels or the louder channel affects the reduction. Default is average. Can be average or maximum.

Apply an arbitrary Infinite Impulse Response filter.

It accepts the following parameters:

zeros, z
    Set numerator/zeros coefficients.
poles, p
    Set denominator/poles coefficients.
gains, k
    Set channels gains.
dry_gain
    Set input gain.
wet_gain
    Set output gain.
format, f
    Set coefficients format.
    tf
        digital transfer function
    zp
        Z-plane zeros/poles, cartesian (default)
    pr
        Z-plane zeros/poles, polar radians
    pd
        Z-plane zeros/poles, polar degrees
    sp
        S-plane zeros/poles
process, r
    Set kind of processing. Can be d (direct) or s (serial cascading). Default is s.
precision, e
    Set filtering precision.
    dbl
        double-precision floating-point (default)
    flt
        single-precision floating-point
    i32
        32-bit integers
    i16
        16-bit integers
normalize, n
    Normalize filter coefficients, by default is enabled. Enabling it will normalize magnitude response at DC to 0dB.
mix
    How much to use filtered signal in output. Default is 1. Range is between 0 and 1.
response
    Show IR frequency response, magnitude (magenta), phase (green) and group delay (yellow) in an additional video stream. By default it is disabled.
channel
    Set for which IR channel to display frequency response. By default the first channel is displayed. This option is used only when response is enabled.
size
    Set video stream size. This option is used only when response is enabled.

Coefficients in tf format are separated by spaces and are in ascending order.

Coefficients in zp format are separated by spaces and order of coefficients doesn’t matter. Coefficients in zp format are complex numbers with i imaginary unit.

Different coefficients and gains can be provided for every channel, in such case use ’|’ to separate coefficients or gains. Last provided coefficients will be used for all remaining channels.

Apply 2 pole elliptic notch at around 5000Hz for 48000 Hz sample rate:

aiir=k=1:z=7.957584807809675810E-1 -2.575128568908332300 3.674839853930788710 -2.57512875289799137 7.957586296317130880E-1:p=1 -2.86950072432325953 3.63022088054647218 -2.28075678147272232 6.361362326477423500E-1:f=tf:r=d

Same as above but in zp format:

aiir=k=0.79575848078096756:z=0.80918701+0.58773007i 0.80918701-0.58773007i 0.80884700+0.58784055i 0.80884700-0.58784055i:p=0.63892345+0.59951235i 0.63892345-0.59951235i 0.79582691+0.44198673i 0.79582691-0.44198673i:f=zp:r=s
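To illustrate how coefficients in tf format act on a signal, here is a minimal Python sketch of the direct-form difference equation. It assumes that "ascending order" means ascending powers of z^-1 (b0, b1, ... and a0, a1, ...); the function name iir_tf is hypothetical, and the sketch ignores gains, precision and the serial processing mode.

```python
def iir_tf(b, a, samples):
    """Filter samples with numerator coefficients b and denominator coefficients a."""
    out = []
    for n in range(len(samples)):
        # Feed-forward part: weighted sum of current and past inputs.
        acc = sum(b[k] * samples[n - k] for k in range(len(b)) if n - k >= 0)
        # Feedback part: subtract weighted past outputs (a[0] only scales the result).
        acc -= sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
        out.append(acc / a[0])
    return out
```

Feeding a unit impulse through iir_tf([0.5], [1.0, -0.5], ...) yields a response that halves on every sample and never reaches zero exactly, which is what makes the impulse response "infinite".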

The limiter prevents an input signal from rising over a desired threshold. This limiter uses lookahead technology to prevent your signal from distorting. It means that there is a small delay after the signal is processed. Keep in mind that the delay it produces is the attack time you set.

The filter accepts the following options:

level_in
    Set input gain. Default is 1.
level_out
    Set output gain. Default is 1.
limit
    Don’t let signals above this level pass the limiter. Default is 1.
attack
    The limiter will reach its attenuation level in this amount of time in milliseconds. Default is 5 milliseconds.
release
    Come back from limiting to attenuation 1.0 in this amount of milliseconds. Default is 50 milliseconds.
asc
    When gain reduction is always needed ASC takes care of releasing to an average reduction level rather than reaching a reduction of 0 in the release time.
asc_level
    Select how much the release time is affected by ASC, 0 means nearly no changes in release time while 1 produces higher release times.
level
    Auto level output signal. Default is enabled. This normalizes audio back to 0dB if enabled.

Depending on the chosen settings, it is recommended to upsample the input 2x or 4x with aresample before applying this filter.

Apply a two-pole all-pass filter with central frequency (in Hz) frequency , and filter-width width . An all-pass filter changes the audio’s frequency to phase relationship without changing its frequency to amplitude relationship.

The filter accepts the following options:

frequency, f
    Set frequency in Hz.
width_type, t
    Set method to specify band-width of filter.
    h
        Hz
    q
        Q-Factor
    o
        octave
    s
        slope
    k
        kHz
width, w
    Specify the band-width of a filter in width_type units.
mix, m
    How much to use filtered signal in output. Default is 1. Range is between 0 and 1.
channels, c
    Specify which channels to filter, by default all available are filtered.
normalize, n
    Normalize biquad coefficients, by default is disabled. Enabling it will normalize magnitude response at DC to 0dB.
order, o
    Set the filter order, can be 1 or 2. Default is 2.
transform, a
    Set transform type of IIR filter.
    di
    dii
    tdii

This filter supports the following commands:

frequency, f
    Change allpass frequency. Syntax for the command is: "frequency"
width_type, t
    Change allpass width_type. Syntax for the command is: "width_type"
width, w
    Change allpass width. Syntax for the command is: "width"
mix, m
    Change allpass mix. Syntax for the command is: "mix"

Loop audio samples.

The filter accepts the following options:

loop
    Set the number of loops. Setting this value to -1 will result in infinite loops. Default is 0.
size
    Set maximal number of samples. Default is 0.
start
    Set first sample of loop. Default is 0.

Merge two or more audio streams into a single multi-channel stream.

The filter accepts the following options:

inputs Set the number of inputs. Default is 2.

If the channel layouts of the inputs are disjoint, and therefore compatible, the channel layout of the output will be set accordingly and the channels will be reordered as necessary. If the channel layouts of the inputs are not disjoint, the output will have all the channels of the first input then all the channels of the second input, in that order, and the channel layout of the output will be the default value corresponding to the total number of channels.

For example, if the first input is in 2.1 (FL+FR+LFE) and the second input is FC+BL+BR, then the output will be in 5.1, with the channels in the following order: a1, a2, b1, a3, b2, b3 (a1 is the first channel of the first input, b1 is the first channel of the second input).

On the other hand, if both inputs are in stereo, the output channels will be in the default order: a1, a2, b1, b2, and the channel layout will be arbitrarily set to 4.0, which may or may not be the expected value.
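The two ordering cases above can be reproduced with a short Python sketch. The CANONICAL list is an assumed ordering covering only the channels used in these examples, and merge_order is a hypothetical helper, not part of any FFmpeg API.

```python
# Assumed canonical channel order, limited to the channels in the examples above.
CANONICAL = ["FL", "FR", "FC", "LFE", "BL", "BR"]


def merge_order(first, second):
    """Return output channel tags (a1, b1, ...) for two input layouts."""
    labeled = [(ch, f"a{i + 1}") for i, ch in enumerate(first)]
    labeled += [(ch, f"b{i + 1}") for i, ch in enumerate(second)]
    if set(first) & set(second):
        # Layouts overlap: keep all channels of the first input, then the second.
        return [tag for _, tag in labeled]
    # Disjoint layouts: reorder channels to match the combined layout.
    return [tag for _, tag in sorted(labeled, key=lambda p: CANONICAL.index(p[0]))]
```

With a 2.1 first input and FC+BL+BR as second input, the sketch returns a1, a2, b1, a3, b2, b3; with two stereo inputs it returns a1, a2, b1, b2.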

All inputs must have the same sample rate and format.

If inputs do not have the same duration, the output will stop with the shortest.

Merge two mono files into a stereo stream:

amovie=left.wav [l] ; amovie=right.mp3 [r] ; [l] [r] amerge

Multiple merges assuming 1 video stream and 6 audio streams in input.mkv:

ffmpeg -i input.mkv -filter_complex "[0:1][0:2][0:3][0:4][0:5][0:6] amerge=inputs=6" -c:a pcm_s16le output.mkv

Mixes multiple audio inputs into a single output.

Note that this filter only supports float samples (the amerge and pan audio filters support many formats). If the amix input has integer samples then aresample will be automatically inserted to perform the conversion to float samples.

For example

ffmpeg -i INPUT1 -i INPUT2 -i INPUT3 -filter_complex amix=inputs=3:duration=first:dropout_transition=3 OUTPUT

will mix 3 input audio streams to a single output with the same duration as the first input and a dropout transition time of 3 seconds.

It accepts the following parameters:

inputs
    The number of inputs. If unspecified, it defaults to 2.
duration
    How to determine the end-of-stream.
    longest
        The duration of the longest input. (default)
    shortest
        The duration of the shortest input.
    first
        The duration of the first input.
dropout_transition
    The transition time, in seconds, for volume renormalization when an input stream ends. The default value is 2 seconds.
weights
    Specify weight of each input audio stream as sequence. Each weight is separated by space. By default all inputs have same weight.

This filter supports the following commands:

weights
    Syntax is the same as for the option with the same name.
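As a rough illustration of what the weights option means, the following Python sketch computes a weighted average of the inputs with duration=longest. It is a simplification under stated assumptions: it always divides by the sum of all weights and does not model the gradual renormalization controlled by dropout_transition. The function name mix is hypothetical.

```python
def mix(inputs, weights=None):
    """Weighted average of several mono sample lists, as long as the longest input."""
    weights = weights or [1.0] * len(inputs)
    total = sum(weights)
    length = max(len(s) for s in inputs)
    out = []
    for n in range(length):
        # Inputs that have already ended contribute nothing.
        acc = sum(w * s[n] for s, w in zip(inputs, weights) if n < len(s))
        out.append(acc / total)
    return out
```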

Multiply first audio stream with second audio stream and store result in output audio stream. Multiplication is done by multiplying each sample from first stream with sample at same position from second stream.

With this element-wise multiplication one can create amplitude fades and amplitude modulations.
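A fade-in built this way can be sketched in Python as follows. The helper names multiply and linear_ramp are hypothetical; the second stream is simply a ramp from 0 to 1.

```python
def multiply(first, second):
    """Multiply each sample of the first stream by the sample at the same position."""
    return [a * b for a, b in zip(first, second)]


def linear_ramp(n):
    """Control signal rising linearly from 0.0 to 1.0 over n samples (n >= 2)."""
    return [i / (n - 1) for i in range(n)]


# A constant full-scale signal multiplied by the ramp becomes a fade-in.
faded = multiply([1.0] * 5, linear_ramp(5))
```

Replacing the ramp with a slow sine wave would give amplitude modulation instead of a fade.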

High-order parametric multiband equalizer for each channel.

It accepts the following parameters:

params
    This option string is in format: "cchn f=cf w=w g=g t=f | ..." Each equalizer band is separated by '|'.
    chn
        Set channel number to which equalization will be applied. If input doesn’t have that channel the entry is ignored.
    f
        Set central frequency for band. If input doesn’t have that frequency the entry is ignored.
    w
        Set band width in hertz.
    g
        Set band gain in dB.
    t
        Set filter type for band, optional, can be:
        0
            Butterworth, this is default.
        1
            Chebyshev type 1.
        2
            Chebyshev type 2.
curves
    With this option activated frequency response of anequalizer is displayed in video stream.
size
    Set video stream size. Only useful if curves option is activated.
mgain
    Set max gain that will be displayed. Only useful if curves option is activated. Setting this to a reasonable value makes it possible to display gain which is derived from neighbour bands which are too close to each other and thus produce higher gain when both are activated.
fscale
    Set frequency scale used to draw frequency response in video output. Can be linear or logarithmic. Default is logarithmic.
colors
    Set color for each channel curve which is going to be displayed in video stream. This is a list of color names separated by space or by '|'. Unrecognised or missing colors will be replaced by white color.

Lower gain by 10 dB at central frequency 200Hz and width 100 Hz for the first 2 channels using a Chebyshev type 1 filter:

anequalizer=c0 f=200 w=100 g=-10 t=1|c1 f=200 w=100 g=-10 t=1

This filter supports the following commands:

change
    Alter existing filter parameters. Syntax for the command is: "fN|f=freq|w=width|g=gain"

    fN is the existing filter number, starting from 0; if no such filter is available, an error is returned. freq sets the new frequency parameter. width sets the new width parameter in hertz. gain sets the new gain parameter in dB.

    Full filter invocation with asendcmd may look like this: asendcmd=c='4.0 anequalizer change 0|f=200|w=50|g=1',anequalizer=...

Reduce broadband noise in audio samples using Non-Local Means algorithm.

Each sample is adjusted by looking for other samples with similar contexts. This context similarity is defined by comparing their surrounding patches of size p . Patches are searched in an area of r around the sample.

The filter accepts the following options:

s
    Set denoising strength. Allowed range is from 0.00001 to 10. Default value is 0.00001.
p
    Set patch radius duration. Allowed range is from 1 to 100 milliseconds. Default value is 2 milliseconds.
r
    Set research radius duration. Allowed range is from 2 to 300 milliseconds. Default value is 6 milliseconds.
o
    Set the output mode. It accepts the following values:
    i
        Pass input unchanged.
    o
        Pass noise filtered out.
    n
        Pass only noise.
    Default value is o.
m
    Set smooth factor. Default value is 11. Allowed range is from 1 to 15.

This filter supports the following commands:

s
    Change denoise strength. Argument is a single float number. Syntax for the command is: "s"
o
    Change output mode. Syntax for the command is: "i", "o" or "n" string.

Apply Normalized Least-Mean-Squares algorithm to the first audio stream using the second audio stream.

This adaptive filter is used to mimic a desired filter by finding the filter coefficients that relate to producing the least mean square of the error signal (difference between the desired, 2nd input audio stream and the actual signal, the 1st input audio stream).

A description of the accepted options follows.

order
    Set filter order.
mu
    Set filter mu.
eps
    Set the filter eps.
leakage
    Set the filter leakage.
out_mode
    It accepts the following values:
    i
        Pass the 1st input.
    d
        Pass the 2nd input.
    o
        Pass filtered samples.
    n
        Pass difference between desired and filtered samples.
    Default value is o.

One of many usages of this filter is noise reduction: the input audio is filtered with the same samples delayed by a fixed amount. One such example for stereo audio is:

asplit[a][b],[a]adelay=32S|32S[a],[b][a]anlms=order=128:leakage=0.0005:mu=.5:out_mode=o

This filter supports the same commands as options, excluding option order .
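The adaptation described above can be sketched with a textbook leaky NLMS update in Python. This is an assumption-laden illustration: the exact update used by the filter may differ, the default values below are chosen for the example only, and the function name nlms is hypothetical. Here x plays the role of the 1st input and d the role of the desired 2nd input.

```python
def nlms(x, d, order=4, mu=0.5, eps=1e-6, leakage=0.0):
    """Return filtered samples y while adapting weights so that y tracks d."""
    w = [0.0] * order
    hist = [0.0] * order  # most recent input sample first
    y = []
    for xn, dn in zip(x, d):
        hist = [xn] + hist[:-1]
        yn = sum(wi * hi for wi, hi in zip(w, hist))
        e = dn - yn  # error between desired and filtered sample
        norm = eps + sum(h * h for h in hist)  # input power, eps avoids division by zero
        w = [(1.0 - mu * leakage) * wi + mu * e * hi / norm
             for wi, hi in zip(w, hist)]
        y.append(yn)
    return y
```

Returning y corresponds to out_mode=o; returning the error e for every sample would correspond to out_mode=n.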

Pass the audio source unchanged to the output.

Pad the end of an audio stream with silence.

This can be used together with ffmpeg -shortest to extend audio streams to the same length as the video stream.

A description of the accepted options follows.

packet_size
    Set silence packet size. Default value is 4096.
pad_len
    Set the number of samples of silence to add to the end. After the value is reached, the stream is terminated. This option is mutually exclusive with whole_len.
whole_len
    Set the minimum total number of samples in the output audio stream. If the value is longer than the input audio length, silence is added to the end, until the value is reached. This option is mutually exclusive with pad_len.
pad_dur
    Specify the duration of samples of silence to add. See the Time duration section in the ffmpeg-utils(1) manual for the accepted syntax. Used only if set to non-zero value.
whole_dur
    Specify the minimum total duration in the output audio stream. See the Time duration section in the ffmpeg-utils(1) manual for the accepted syntax. Used only if set to non-zero value. If the value is longer than the input audio length, silence is added to the end, until the value is reached. This option is mutually exclusive with pad_dur.

If none of the pad_len, whole_len, pad_dur or whole_dur options is set, the filter will add silence to the end of the input stream indefinitely.

Add 1024 samples of silence to the end of the input:

apad=pad_len=1024

Make sure the audio output will contain at least 10000 samples, pad the input with silence if required:

apad=whole_len=10000

Use ffmpeg to pad the audio input with silence, so that the video stream will always be the shortest and will be converted until the end in the output file when using the shortest option:

ffmpeg -i VIDEO -i AUDIO -filter_complex "[1:0]apad" -shortest OUTPUT
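The difference between pad_len and whole_len can be illustrated with a small Python sketch. The function pad_with_silence is hypothetical and models only these two options on a finite list of samples; indefinite padding is deliberately not modeled.

```python
def pad_with_silence(samples, pad_len=None, whole_len=None):
    """Append silence: a fixed amount (pad_len) or up to a minimum total (whole_len)."""
    if pad_len is not None and whole_len is not None:
        raise ValueError("pad_len and whole_len are mutually exclusive")
    if pad_len is not None:
        extra = pad_len
    elif whole_len is not None:
        # Nothing is added when the input is already long enough.
        extra = max(0, whole_len - len(samples))
    else:
        raise ValueError("indefinite padding is not modeled in this sketch")
    return list(samples) + [0.0] * extra
```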

Add a phasing effect to the input audio.

A phaser filter creates series of peaks and troughs in the frequency spectrum. The position of the peaks and troughs are modulated so that they vary over time, creating a sweeping effect.

A description of the accepted parameters follows.

in_gain
    Set input gain. Default is 0.4.
out_gain
    Set output gain. Default is 0.74.
delay
    Set delay in milliseconds. Default is 3.0.
decay
    Set decay. Default is 0.4.
speed
    Set modulation speed in Hz. Default is 0.5.
type
    Set modulation type. Default is triangular. It accepts the following values:
    triangular, t
    sinusoidal, s

Audio pulsator is something between an autopanner and a tremolo, but it can produce funny stereo effects as well. Pulsator changes the volume of the left and right channels based on an LFO (low frequency oscillator) with different waveforms and shifted phases. This filter has the ability to define an offset between the left and right channels. An offset of 0 means that both LFO shapes match each other, so the left and right channels are altered equally: a conventional tremolo. An offset of 0.5 means that the shape of the right channel is shifted in phase by exactly half a period, so pulsator acts as an autopanner. At 1 both curves match again. Every setting in between moves the phase shift gaplessly between all stages and produces some "bypassing" sounds with sine and triangle waveforms. The closer the offset is set to 1 (starting from 0.5), the faster the signal passes from the left to the right speaker.

The filter accepts the following options:

level_in
    Set input gain. By default it is 1. Range is [0.015625 - 64].
level_out
    Set output gain. By default it is 1. Range is [0.015625 - 64].
mode
    Set waveform shape the LFO will use. Can be one of: sine, triangle, square, sawup or sawdown. Default is sine.
amount
    Set modulation. Define how much of original signal is affected by the LFO.
offset_l
    Set left channel offset. Default is 0. Allowed range is [0 - 1].
offset_r
    Set right channel offset. Default is 0.5. Allowed range is [0 - 1].
width
    Set pulse width. Default is 1. Allowed range is [0 - 2].
timing
    Set possible timing mode. Can be one of: bpm, ms or hz. Default is hz.
bpm
    Set bpm. Default is 120. Allowed range is [30 - 300]. Only used if timing is set to bpm.
ms
    Set ms. Default is 500. Allowed range is [10 - 2000]. Only used if timing is set to ms.
hz
    Set frequency in Hz. Default is 2. Allowed range is [0.01 - 100]. Only used if timing is set to hz.

Resample the input audio to the specified parameters, using the libswresample library. If none are specified then the filter will automatically convert between its input and output.

This filter is also able to stretch/squeeze the audio data to make it match the timestamps or to inject silence / cut out audio to make it match the timestamps, do a combination of both or do neither.

The filter accepts the syntax [sample_rate:]resampler_options, where sample_rate expresses a sample rate and resampler_options is a list of key=value pairs, separated by ":". See the "Resampler Options" section in the ffmpeg-resampler(1) manual for the complete list of supported options.

Resample the input audio to 44100Hz:

aresample=44100

Stretch/squeeze samples to the given timestamps, with a maximum of 1000 samples per second compensation:

aresample=async=1000

Reverse an audio clip.

Warning: This filter requires memory to buffer the entire clip, so trimming is suggested.

Take the first 5 seconds of a clip, and reverse it:

atrim=end=5,areverse

Reduce noise from speech using Recurrent Neural Networks.

This filter accepts the following options:

model, m
    Set the trained model file to load. This option is always required.

Set the number of samples per each output audio frame.

The last output packet may contain a different number of samples, as the filter will flush all the remaining samples when the input audio signals its end.

The filter accepts the following options:

nb_out_samples, n
    Set the number of samples per each output audio frame. The number is intended as the number of samples per each channel. Default value is 1024.
pad, p
    If set to 1, the filter will pad the last audio frame with zeroes, so that the last frame will contain the same number of samples as the previous ones. Default value is 1.

For example, to set the number of per-frame samples to 1234 and disable padding for the last frame, use:

asetnsamples=n=1234:p=0
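The effect of the two options on the final frame can be sketched in Python. The function reframe is hypothetical and works on a single channel held in a list.

```python
def reframe(samples, nb_out_samples=1024, pad=True):
    """Split samples into frames of nb_out_samples, optionally zero-padding the last."""
    frames = [samples[i:i + nb_out_samples]
              for i in range(0, len(samples), nb_out_samples)]
    if pad and frames and len(frames[-1]) < nb_out_samples:
        missing = nb_out_samples - len(frames[-1])
        frames[-1] = frames[-1] + [0] * missing
    return frames
```

With five input samples and nb_out_samples=2, the last frame holds one real sample; it is extended with a zero when pad is enabled and left short otherwise.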

Set the sample rate without altering the PCM data. This will result in a change of speed and pitch.

The filter accepts the following options:

sample_rate, r
    Set the output sample rate. Default is 44100 Hz.
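Because the samples are unchanged and only the declared rate differs, the size of the speed and pitch change follows from simple arithmetic. A Python sketch, with hypothetical helper names:

```python
import math


def speed_factor(old_rate, new_rate):
    """Playback speed multiplier when the same samples are declared at new_rate."""
    return new_rate / old_rate


def pitch_shift_semitones(old_rate, new_rate):
    """Pitch change in equal-tempered semitones; 12 semitones is one octave."""
    return 12 * math.log2(speed_factor(old_rate, new_rate))
```

Declaring 44100 Hz material as 88200 Hz doubles the speed and raises the pitch by one octave.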

Show a line containing various information for each input audio frame. The input audio is not modified.

The shown line contains a sequence of key/value pairs of the form key : value .

The following values are shown in the output:

n
    The (sequential) number of the input frame, starting from 0.
pts
    The presentation timestamp of the input frame, in time base units; the time base depends on the filter input pad, and is usually 1/sample_rate.
pts_time
    The presentation timestamp of the input frame in seconds.
pos
    Position of the frame in the input stream, -1 if this information is unavailable and/or meaningless (for example in case of synthetic audio).
fmt
    The sample format.
chlayout
    The channel layout.
rate
    The sample rate for the audio frame.
nb_samples
    The number of samples (per channel) in the frame.
checksum
    The Adler-32 checksum (printed in hexadecimal) of the audio data. For planar audio, the data is treated as if all the planes were concatenated.
plane_checksums
    A list of Adler-32 checksums for each data plane.
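The relationship between checksum and plane_checksums for planar audio can be illustrated with Python's zlib module. This is a sketch only: zlib.adler32 starts from its own default seed, and the seed the filter uses is not stated here, so the values printed by the filter need not match. The function frame_checksums is hypothetical.

```python
import zlib


def frame_checksums(planes):
    """Return (checksum over concatenated planes, list of per-plane checksums)."""
    plane_checksums = [zlib.adler32(p) for p in planes]
    checksum = zlib.adler32(b"".join(planes))
    return checksum, plane_checksums


checksum, per_plane = frame_checksums([b"Wiki", b"pedia"])
line = f"checksum:{checksum:08X}"  # hexadecimal, as in the filter output
```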

Apply audio soft clipping.

Soft clipping is a type of distortion effect where the amplitude of a signal is saturated along a smooth curve, rather than the abrupt shape of hard-clipping.

This filter accepts the following options:

type
    Set type of soft-clipping. It accepts the following values:
    tanh
    atan
    cubic
    exp
    alg
    quintic
    sin
param
    Set additional parameter which controls sigmoid function.

This filter supports all the above options as commands.
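The contrast between hard and soft clipping can be shown with a short Python sketch using a hyperbolic tangent curve. It illustrates the general idea only; how the filter scales its curves or applies param is not covered, and the function names are hypothetical.

```python
import math


def hard_clip(x):
    """Abrupt clipping: anything beyond +/-1.0 is cut off flat."""
    return max(-1.0, min(1.0, x))


def soft_clip_tanh(x):
    """Smooth saturation: nearly linear for small x, approaching +/-1.0 gradually."""
    return math.tanh(x)
```

Small samples pass almost unchanged through soft_clip_tanh, while large samples approach but never reach full scale.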

Automatic Speech Recognition

This filter uses PocketSphinx for speech recognition. To enable compilation of this filter, you need to configure FFmpeg with --enable-pocketsphinx .

It accepts the following options:

rate
    Set sampling rate of input audio. Default is 16000. This needs to match the speech models, otherwise one will get poor results.
hmm
    Set directory containing acoustic model files.
dict
    Set pronunciation dictionary.
lm
    Set language model file.
lmctl
    Set language model set.
lmname
    Set which language model to use.
logfn
    Set output for log messages.

The filter exports recognized speech as the frame metadata lavfi.asr.text .

Display time domain statistical information about the audio channels. Statistics are calculated and displayed for each audio channel and, where applicable, an overall figure is also given.

It accepts the following options:

length
    Short window length in seconds, used for peak and trough RMS measurement. Default is 0.05 (50 milliseconds). Allowed range is [0.01 - 10].
metadata
    Set metadata injection. All the metadata keys are prefixed with lavfi.astats.X, where X is the channel number starting from 1 or the string Overall. Default is disabled.

    Available keys for each channel are: DC_offset Min_level Max_level Min_difference Max_difference Mean_difference RMS_difference Peak_level RMS_peak RMS_trough Crest_factor Flat_factor Peak_count Noise_floor Noise_floor_count Bit_depth Dynamic_range Zero_crossings Zero_crossings_rate Number_of_NaNs Number_of_Infs Number_of_denormals

    and for Overall: DC_offset Min_level Max_level Min_difference Max_difference Mean_difference RMS_difference Peak_level RMS_level RMS_peak RMS_trough Flat_factor Peak_count Noise_floor Noise_floor_count Bit_depth Number_of_samples Number_of_NaNs Number_of_Infs Number_of_denormals

    For example, a full key looks like lavfi.astats.1.DC_offset or lavfi.astats.Overall.Peak_count.

    For a description of what each key means, read below.
reset
    Set the number of frames after which stats are going to be recalculated. Default is disabled.
measure_perchannel
    Select the entries which need to be measured per channel. The metadata keys can be used as flags, default is all which measures everything. none disables all per channel measurement.
measure_overall
    Select the entries which need to be measured overall. The metadata keys can be used as flags, default is all which measures everything. none disables all overall measurement.

A description of each shown parameter follows:

DC offset
    Mean amplitude displacement from zero.
Min level
    Minimal sample level.
Max level
    Maximal sample level.
Min difference
    Minimal difference between two consecutive samples.
Max difference
    Maximal difference between two consecutive samples.
Mean difference
    Mean difference between two consecutive samples. The average of each difference between two consecutive samples.
RMS difference
    Root Mean Square difference between two consecutive samples.
Peak level dB
RMS level dB
    Standard peak and RMS level measured in dBFS.
RMS peak dB
RMS trough dB
    Peak and trough values for RMS level measured over a short window.
Crest factor
    Standard ratio of peak to RMS level (note: not in dB).
Flat factor
    Flatness (i.e. consecutive samples with the same value) of the signal at its peak levels (i.e. either Min level or Max level).
Peak count
    Number of occasions (not the number of samples) that the signal attained either Min level or Max level.
Noise floor dB
    Minimum local peak measured in dBFS over a short window.
Noise floor count
    Number of occasions (not the number of samples) that the signal attained Noise floor.
Bit depth
    Overall bit depth of audio. Number of bits used for each sample.
Dynamic range
    Measured dynamic range of audio in dB.
Zero crossings
    Number of points where the waveform crosses the zero level axis.
Zero crossings rate
    Ratio of Zero crossings to the number of audio samples.
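A few of the statistics above can be computed with a minimal Python sketch that follows the definitions given in the list. The function channel_stats is hypothetical, assumes samples normalized to the range [-1, 1] and a non-silent signal (the logarithm of zero is undefined), and does not reproduce the windowed measurements.

```python
import math


def channel_stats(samples):
    """Compute a subset of per-channel statistics for float samples in [-1, 1]."""
    n = len(samples)
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # A zero crossing is counted whenever consecutive samples differ in sign.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return {
        "DC_offset": sum(samples) / n,
        "Min_level": min(samples),
        "Max_level": max(samples),
        "Peak_level_dB": 20 * math.log10(peak),
        "RMS_level_dB": 20 * math.log10(rms),
        "Crest_factor": peak / rms,  # plain ratio, not in dB
        "Zero_crossings": crossings,
    }
```

A square wave alternating between 0.5 and -0.5 has a crest factor of 1, because its peak and RMS levels are equal.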

Boost subwoofer frequencies.

The filter accepts the following options:

dry
    Set dry gain, how much of original signal is kept. Allowed range is from 0 to 1. Default value is 0.5.
wet
    Set wet gain, how much of filtered signal is kept. Allowed range is from 0 to 1. Default value is 0.8.
decay
    Set delay line decay gain value. Allowed range is from 0 to 1. Default value is 0.7.
feedback
    Set delay line feedback gain value. Allowed range is from 0 to 1. Default value is 0.5.
cutoff
    Set cutoff frequency in hertz. Allowed range is 50 to 900. Default value is 100.
slope
    Set slope amount for cutoff frequency. Allowed range is 0.0001 to 1. Default value is 0.5.
delay
    Set delay. Allowed range is from 1 to 100. Default value is 20.

This filter supports all the above options as commands.

Adjust audio tempo.

The filter accepts exactly one parameter, the audio tempo. If not specified then the filter will assume nominal 1.0 tempo. Tempo must be in the [0.5, 100.0] range.

Note that tempo greater than 2 will skip some samples rather than blend them in. If for any reason this is a concern it is always possible to daisy-chain several instances of atempo to achieve the desired product tempo.

Slow down audio to 80% tempo:

atempo=0.8

To speed up audio to 300% tempo:

atempo=3

To speed up audio to 300% tempo by daisy-chaining two atempo instances:

atempo=sqrt(3),atempo=sqrt(3)
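Daisy-chaining works because the tempo factors of chained instances multiply. The following Python sketch builds such a chain so that no single instance exceeds 2 (or falls below 0.5); the function atempo_chain is a hypothetical helper that only produces a filter string.

```python
def atempo_chain(tempo, lo=0.5, hi=2.0):
    """Split tempo into factors within [lo, hi] whose product equals tempo."""
    factors = []
    while tempo > hi:
        factors.append(hi)
        tempo /= hi
    while tempo < lo:
        factors.append(lo)
        tempo /= lo
    factors.append(tempo)
    return ",".join(f"atempo={f:g}" for f in factors)
```

For a 300% tempo this gives atempo=2,atempo=1.5, an alternative to the sqrt(3) chain above with the same product.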

This filter supports the following commands:

tempo
    Change filter tempo scale factor. Syntax for the command is: "tempo"

Trim the input so that the output contains one continuous subpart of the input.

It accepts the following parameters:

start
    Timestamp (in seconds) of the start of the section to keep. I.e. the audio sample with the timestamp start will be the first sample in the output.
end
    Specify time of the first audio sample that will be dropped, i.e. the audio sample immediately preceding the one with the timestamp end will be the last sample in the output.
start_pts
    Same as start, except this option sets the start timestamp in samples instead of seconds.
end_pts
    Same as end, except this option sets the end timestamp in samples instead of seconds.
duration
    The maximum duration of the output in seconds.
start_sample
    The number of the first sample that should be output.
end_sample
    The number of the first sample that should be dropped.

start, end, and duration are expressed as time duration specifications; see the Time duration section in the ffmpeg-utils(1) manual.

Note that the first two sets of the start/end options and the duration option look at the frame timestamp, while the _sample options simply count the samples that pass through the filter. So start/end_pts and start/end_sample will give different results when the timestamps are wrong, inexact or do not start at zero. Also note that this filter does not modify the timestamps. If you wish to have the output timestamps start at zero, insert the asetpts filter after the atrim filter.

If multiple start or end options are set, this filter tries to be greedy and keep all samples that match at least one of the specified constraints. To keep only the part that matches all the constraints at once, chain multiple atrim filters.

The defaults are such that all the input is kept. So it is possible to set e.g. just the end values to keep everything before the specified time.

Examples:

Drop everything except the second minute of input:

ffmpeg -i INPUT -af atrim=60:120

Keep only the first 1000 samples:

ffmpeg -i INPUT -af atrim=end_sample=1000
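The half-open interval implied by start and end (the sample at start is kept, the sample at end is the first one dropped) can be sketched in Python. The function trim is hypothetical and assumes timestamps that start at zero and advance by exactly one sample period.

```python
def trim(samples, sample_rate, start=0.0, end=None):
    """Keep samples whose timestamp t satisfies start <= t < end."""
    kept = []
    for n, s in enumerate(samples):
        t = n / sample_rate  # timestamp in seconds, assuming the stream starts at 0
        if t >= start and (end is None or t < end):
            kept.append(s)
    return kept
```

As with the filter, leaving end unset keeps everything from start onwards.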

Calculate normalized cross-correlation between two input audio streams.

Resulting samples are always between -1 and 1 inclusive. A result of 1 means the two input samples are highly correlated in that selected segment. A result of 0 means they are not correlated at all. A result of -1 means the two input samples are out of phase, which means they cancel each other.

The filter accepts the following options:

size
    Set size of segment over which cross-correlation is calculated. Default is 256. Allowed range is from 2 to 131072.
algo
    Set algorithm for cross-correlation. Can be slow or fast. Default is slow. The fast algorithm assumes mean values over any given segment are always zero and thus needs far fewer calculations. This is generally not true, but is valid for typical audio streams.

Calculate correlation between channels in stereo audio stream:

ffmpeg -i stereo.wav -af channelsplit,axcorrelate=size=1024:algo=fast correlation.wav
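Normalized cross-correlation over one segment, and the zero-mean shortcut taken by the fast algorithm, can be sketched in Python as follows. The function xcorrelate is hypothetical and evaluates a single segment rather than a sliding window.

```python
import math


def xcorrelate(a, b, fast=False):
    """Normalized cross-correlation of two equal-length segments, in [-1, 1]."""
    # The fast variant skips mean removal by assuming both means are zero.
    mean_a = 0.0 if fast else sum(a) / len(a)
    mean_b = 0.0 if fast else sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0
```

For segments with a DC offset, such as [2, 0, 2, 0] against [0, 2, 0, 2], the two variants disagree, which shows why the zero-mean assumption matters.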

Apply a two-pole Butterworth band-pass filter with central frequency frequency, and (3dB-point) band-width width. The csg option selects a constant skirt gain (peak gain = Q) instead of the default: constant 0dB peak gain. The filter rolls off at 6dB per octave (20dB per decade).

The filter accepts the following options:

frequency, f
    Set the filter’s central frequency. Default is 3000.

csg
    Constant skirt gain if set to 1. Defaults to 0.

width_type, t
    Set method to specify band-width of filter.
    h    Hz
    q    Q-Factor
    o    octave
    s    slope
    k    kHz

width, w
    Specify the band-width of a filter in width_type units.

mix, m
    How much to use filtered signal in output. Default is 1. Range is between 0 and 1.

channels, c
    Specify which channels to filter, by default all available are filtered.

normalize, n
    Normalize biquad coefficients, by default is disabled. Enabling it will normalize magnitude response at DC to 0dB.

transform, a
    Set transform type of IIR filter.
    di
    dii
    tdii

This filter supports the following commands:

frequency, f
    Change bandpass frequency. Syntax for the command is: "frequency"

width_type, t
    Change bandpass width_type. Syntax for the command is: "width_type"

width, w
    Change bandpass width. Syntax for the command is: "width"

mix, m
    Change bandpass mix. Syntax for the command is: "mix"
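The two gain conventions selected by csg can be checked numerically. The sketch below uses the widely published Audio EQ Cookbook band-pass formulas with the width given as a Q-factor; that the filter uses exactly these formulas is an assumption, and the function names and the 48 kHz sample rate are only illustrative.

```python
import cmath
import math


def bandpass_coeffs(freq, q, sample_rate, csg=False):
    """Two-pole band-pass biquad coefficients (Audio EQ Cookbook form)."""
    w0 = 2.0 * math.pi * freq / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    # Constant skirt gain: peak gain = Q. Otherwise: constant 0 dB peak gain.
    b0 = math.sin(w0) / 2.0 if csg else alpha
    b = (b0, 0.0, -b0)
    a = (1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha)
    return b, a


def magnitude(b, a, freq, sample_rate):
    """Magnitude of the biquad transfer function at the given frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq / sample_rate)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


b, a = bandpass_coeffs(3000.0, 2.0, 48000.0)
b_csg, a_csg = bandpass_coeffs(3000.0, 2.0, 48000.0, csg=True)

print(magnitude(b, a, 3000.0, 48000.0))          # peak gain without csg
print(magnitude(b_csg, a_csg, 3000.0, 48000.0))  # peak gain with csg
print(magnitude(b, a, 300.0, 48000.0))           # well below the pass band
```

At the central frequency the first design has unity gain, while the constant skirt gain design peaks at a gain equal to Q.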

Apply a two-pole Butterworth band-reject filter with central frequency frequency, and (3dB-point) band-width width. The filter rolls off at 6dB per octave (20dB per decade).

The filter accepts the following options:

frequency, f
    Set the filter’s central frequency. Default is 3000.

width_type, t
    Set method to specify band-width of filter.
    h    Hz
    q    Q-Factor
    o    octave
    s    slope
    k    kHz

width, w
    Specify the band-width of a filter in width_type units.

mix, m
    How much to use filtered signal in output. Default is 1. Range is between 0 and 1.

channels, c
    Specify which channels to filter, by default all available are filtered.

normalize, n
    Normalize biquad coefficients, by default is disabled. Enabling it will normalize magnitude response at DC to 0dB.

transform, a
    Set transform type of IIR filter.
    di
    dii
    tdii

This filter supports the following commands:

frequency, f
    Change bandreject frequency. Syntax for the command is: "frequency"

width_type, t
    Change bandreject width_type. Syntax for the command is: "width_type"

width, w
    Change bandreject width. Syntax for the command is: "width"

mix, m
    Change bandreject mix. Syntax for the command is: "mix"

Boost or cut the bass (lower) frequencies of the audio using a two-pole shelving filter with a response similar to that of a standard hi-fi’s tone-controls. This is also known as shelving equalisation (EQ).

The filter accepts the following options:

gain, g
    Give the gain at 0 Hz. Its useful range is about -20 (for a large cut) to +20 (for a large boost). Beware of clipping when using a positive gain.

frequency, f
    Set the filter’s central frequency and so can be used to extend or reduce the frequency range to be boosted or cut. The default value is 100 Hz.

width_type, t
    Set method to specify band-width of filter.
    h    Hz
    q    Q-Factor
    o    octave
    s    slope
    k    kHz

width, w
    Determine how steep is the filter’s shelf transition.

mix, m
    How much to use filtered signal in output. Default is 1. Range is between 0 and 1.

channels, c
    Specify which channels to filter, by default all available are filtered.

normalize, n
    Normalize biquad coefficients, by default is disabled. Enabling it will normalize magnitude response at DC to 0dB.

transform, a
    Set transform type of IIR filter.
    di
    dii
    tdii

This filter supports the following commands:

frequency, f
    Change bass frequency. Syntax for the command is: "frequency"

width_type, t
    Change bass width_type. Syntax for the command is: "width_type"

width, w
    Change bass width. Syntax for the command is: "width"

gain, g
    Change bass gain. Syntax for the command is: "gain"

mix, m
    Change bass mix. Syntax for the command is: "mix"

Apply a biquad IIR filter with the given coefficients, where b0, b1, b2 and a0, a1, a2 are the numerator and denominator coefficients respectively, and channels, c specify which channels to filter (by default all available are filtered).

This filter supports the following commands:

a0
a1
a2
b0
b1
b2
    Change biquad parameter. Syntax for the command is: "value"

mix, m
    How much to use filtered signal in output. Default is 1. Range is between 0 and 1.

channels, c
    Specify which channels to filter, by default all available are filtered.

normalize, n
    Normalize biquad coefficients, by default is disabled. Enabling it will normalize magnitude response at DC to 0dB.

transform, a
    Set transform type of IIR filter.
    di
    dii
    tdii
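How the six coefficients act on a signal can be sketched with the textbook direct form I difference equation. This is an illustration rather than the filter's code: the function name is hypothetical, and the way mix blends the filtered and unfiltered signal is an assumption based on the option's description.

```python
def biquad(samples, b0, b1, b2, a0, a1, a2, mix=1.0):
    """Filter a mono sample list with a direct form I biquad section.

    y[n] = (b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]) / a0
    """
    x1 = x2 = y1 = y2 = 0.0   # previous inputs and outputs
    out = []
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        # Assumed blend: mix=1 is fully filtered, mix=0 is the dry input.
        out.append(mix * y + (1.0 - mix) * x)
    return out


# b = (1, 0, 0) and a = (1, 0, 0) pass the signal through unchanged.
print(biquad([1.0, 2.0, 3.0], 1, 0, 0, 1, 0, 0))
# b = (0.5, 0.5, 0) averages each sample with the previous one.
print(biquad([1.0, 1.0, 1.0], 0.5, 0.5, 0, 1, 0, 0))
```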

Bauer stereo to binaural transformation, which improves headphone listening of stereo audio records.

To enable compilation of this filter you need to configure FFmpeg with --enable-libbs2b .

It accepts the following parameters:

profile
    Pre-defined crossfeed level.
    default    Default level (fcut=700, feed=50).
    cmoy       Chu Moy circuit (fcut=700, feed=60).
    jmeier     Jan Meier circuit (fcut=650, feed=95).

fcut
    Cut frequency (in Hz).

feed
    Feed level (in units of 0.1 dB).

Remap input channels to new locations.

It accepts the following parameters:

map
    Map channels from input to output. The argument is a ’|’-separated list of mappings, each in the in_channel-out_channel or in_channel form. in_channel can be either the name of the input channel (e.g. FL for front left) or its index in the input channel layout. out_channel is the name of the output channel or its index in the output channel layout. If out_channel is not given then it is implicitly an index, starting with zero and increasing by one for each mapping.

channel_layout
    The channel layout of the output stream.

If no mapping is present, the filter will implicitly map input channels to output channels, preserving indices.

For example, assuming a 5.1+downmix input MOV file,

ffmpeg -i in.mov -filter 'channelmap=map=DL-FL|DR-FR' out.wav

will create an output WAV file tagged as stereo from the downmix channels of the input.

To fix a 5.1 WAV improperly encoded in AAC’s native channel order:

ffmpeg -i in.wav -filter 'channelmap=1|2|0|5|3|4:5.1' out.wav
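The index-based form of the map option can be modelled in a few lines. This sketch handles numeric indices only (channel names such as FL are not resolved), and the helper names are hypothetical.

```python
def parse_index_map(spec):
    """Parse a '|'-separated list of 'in' or 'in-out' index mappings.

    When the output index is omitted it is implied by the position of
    the mapping in the list, starting with zero.
    """
    mapping = []
    for position, item in enumerate(spec.split("|")):
        src, sep, dst = item.partition("-")
        mapping.append((int(src), int(dst) if sep else position))
    return mapping


def remap(frame, mapping):
    """Build one output frame by copying input channels to their new slots."""
    out = [None] * len(mapping)
    for src, dst in mapping:
        out[dst] = frame[src]
    return out


# One frame in the channel order assumed for the mis-encoded file.
aac_order = ["C", "L", "R", "Ls", "Rs", "LFE"]
print(remap(aac_order, parse_index_map("1|2|0|5|3|4")))
```

With the mapping from the last example, the centre channel moves from the first to the third position and LFE from the last to the fourth.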

Split each channel from an input audio stream into a separate output stream.

It accepts the following parameters:

channel_layout
    The channel layout of the input stream. The default is "stereo".

channels
    A channel layout describing the channels to be extracted as separate output streams or "all" to extract each input channel as a separate stream. The default is "all". Choosing channels not present in channel layout in the input will result in an error.

For example, assuming a stereo input MP3 file,

ffmpeg -i in.mp3 -filter_complex channelsplit out.mkv

will create an output Matroska file with two audio streams, one containing only the left channel and the other the right channel.

Split a 5.1 WAV file into per-channel files: ffmpeg -i in.wav -filter_complex 'channelsplit=channel_layout=5.1[FL][FR][FC][LFE][SL][SR]' -map '[FL]' front_left.wav -map '[FR]' front_right.wav -map '[FC]' front_center.wav -map '[LFE]' lfe.wav -map '[SL]' side_left.wav -map '[SR]' side_right.wav

Extract only LFE from a 5.1 WAV file: ffmpeg -i in.wav -filter_complex 'channelsplit=channel_layout=5.1:channels=LFE[LFE]' -map '[LFE]' lfe.wav

Add a chorus effect to the audio.

Can make a single vocal sound like a chorus, but can also be applied to instrumentation.

Chorus resembles an echo effect with a short delay, but whereas with echo the delay is constant, with chorus it is varied using sinusoidal or triangular modulation. The modulation depth defines the range the modulated delay is played before or after the delay. Hence the delayed sound will sound slower or faster, that is, the delayed sound is tuned around the original one, like in a chorus where some vocals are slightly off key.

It accepts the following parameters:

in_gain
    Set input gain. Default is 0.4.

out_gain
    Set output gain. Default is 0.4.

delays
    Set delays. A typical delay is around 40ms to 60ms.

decays
    Set decays.

speeds
    Set speeds.

depths
    Set depths.

A single delay: chorus=0.7:0.9:55:0.4:0.25:2

Two delays: chorus=0.6:0.9:50|60:0.4|0.32:0.25|0.4:2|1.3

Fuller sounding chorus with three delays: chorus=0.5:0.9:50|60|40:0.4|0.32|0.3:0.25|0.4|0.3:2|2.3|1.3

Compress or expand the audio’s dynamic range.

It accepts the following parameters:

attacks
decays
    A list of times in seconds for each channel over which the instantaneous level of the input signal is averaged to determine its volume. attacks refers to increase of volume and decays refers to decrease of volume. For most situations, the attack time (response to the audio getting louder) should be shorter than the decay time, because the human ear is more sensitive to sudden loud audio than sudden soft audio. A typical value for attack is 0.3 seconds and a typical value for decay is 0.8 seconds. If the specified number of attacks and decays is lower than the number of channels, the last set attack/decay will be used for all remaining channels.

points
    A list of points for the transfer function, specified in dB relative to the maximum possible signal amplitude. Each key points list must be defined using the following syntax: x0/y0|x1/y1|x2/y2|.... or x0/y0 x1/y1 x2/y2 .... The input values must be in strictly increasing order but the transfer function does not have to be monotonically rising. The point 0/0 is assumed but may be overridden (by 0/out-dBn). Typical values for the transfer function are -70/-70|-60/-20|1/0.

soft-knee
    Set the curve radius in dB for all joints. It defaults to 0.01.

gain
    Set the additional gain in dB to be applied at all points on the transfer function. This allows for easy adjustment of the overall gain. It defaults to 0.

volume
    Set an initial volume, in dB, to be assumed for each channel when filtering starts. This permits the user to supply a nominal level initially, so that, for example, a very large gain is not applied to initial signal levels before the companding has begun to operate. A typical value for audio which is initially quiet is -90 dB. It defaults to 0.

delay
    Set a delay, in seconds. The input audio is analyzed immediately, but audio is delayed before being fed to the volume adjuster. Specifying a delay approximately equal to the attack/decay times allows the filter to effectively operate in predictive rather than reactive mode. It defaults to 0.

Make music with both quiet and loud passages suitable for listening to in a noisy environment: compand=.3|.3:1|1:-90/-60|-60/-40|-40/-30|-20/-20:6:0:-90:0.2

Another example for audio with whisper and explosion parts: compand=0|0:1|1:-90/-900|-70/-70|-30/-9|0/-3:6:0:0:0

A noise gate for when the noise is at a lower level than the signal: compand=.1|.1:.2|.2:-900/-900|-50.1/-900|-50/-50:.01:0:-90:.1

Here is another noise gate, this time for when the noise is at a higher level than the signal (making it, in some ways, similar to squelch): compand=.1|.1:.1|.1:-45.1/-45.1|-45/-900|0/-900:.01:45:-90:.1

2:1 compression starting at -6dB: compand=points=-80/-80|-6/-6|0/-3.8|20/3.5

2:1 compression starting at -9dB: compand=points=-80/-80|-9/-9|0/-5.3|20/2.9

2:1 compression starting at -12dB: compand=points=-80/-80|-12/-12|0/-6.8|20/1.9

2:1 compression starting at -18dB: compand=points=-80/-80|-18/-18|0/-9.8|20/0.7

3:1 compression starting at -15dB: compand=points=-80/-80|-15/-15|0/-10.8|20/-5.2

Compressor/Gate: compand=points=-80/-105|-62/-80|-15.4/-15.4|0/-12|20/-7.6

Expander: compand=attacks=0:points=-80/-169|-54/-80|-49.5/-64.6|-41.1/-41.1|-25.8/-15|-10.8/-4.5|0/0|20/8.3

Hard limiter at -6dB: compand=attacks=0:points=-80/-80|-6/-6|20/-6

Hard limiter at -12dB: compand=attacks=0:points=-80/-80|-12/-12|20/-12

Hard noise gate at -35 dB: compand=attacks=0:points=-80/-115|-35.1/-80|-35/-35|20/20

Soft limiter: compand=attacks=0:points=-80/-80|-12.4/-12.4|-6/-8|0/-6.8|20/-2.8
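The points lists used in the examples above describe a piecewise-linear curve from input level to output level, both in dB. The sketch below evaluates such a curve between the listed points only. It deliberately ignores attack/decay averaging, the soft-knee rounding, the gain option and the implicit 0/0 point, and its function names are hypothetical.

```python
from bisect import bisect_right


def parse_points(spec):
    """Parse 'x0/y0|x1/y1|...' (or space-separated) into (x, y) pairs in dB."""
    points = []
    for item in spec.replace(" ", "|").split("|"):
        if item:
            x, y = item.split("/")
            points.append((float(x), float(y)))
    if len(points) < 2:
        raise ValueError("need at least two points")
    if any(b[0] <= a[0] for a, b in zip(points, points[1:])):
        raise ValueError("input values must be strictly increasing")
    return points


def transfer(points, level_db):
    """Linearly interpolate the output level for an input level in dB."""
    xs = [p[0] for p in points]
    if not xs[0] <= level_db <= xs[-1]:
        raise ValueError("level outside the listed points")
    i = min(bisect_right(xs, level_db), len(xs) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (level_db - x0) * (y1 - y0) / (x1 - x0)


limiter = parse_points("-80/-80|-6/-6|20/-6")   # hard limiter at -6dB
print(transfer(limiter, -20.0))   # below the threshold: unchanged
print(transfer(limiter, -3.0))    # above the threshold: held at -6
```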

Compensation Delay Line is a metric-based delay to compensate for differing positions of microphones or speakers.

For example, you have recorded guitar with two microphones placed in different locations. Because a sound wavefront travels at a fixed speed under normal conditions, the phasing of the microphones can vary and depends on their location and relative position. The best sound mix can be achieved when these microphones are in phase (synchronized). Note that a distance of ~30 cm between microphones makes one microphone capture the signal in antiphase to the other microphone. That makes the final mix sound moody. This filter helps to solve phasing problems by adding different delays to each microphone track, making them synchronized.

The best result can be reached when you take one track as base and synchronize other tracks one by one with it. Remember that synchronization/delay tolerance depends on sample rate, too. Higher sample rates will give more tolerance.

The filter accepts the following parameters:

mm
    Set millimeters distance. This is compensation distance for fine tuning. Default is 0.

cm
    Set cm distance. This is compensation distance for tightening distance setup. Default is 0.

m
    Set meters distance. This is compensation distance for hard distance setup. Default is 0.

dry
    Set dry amount. Amount of unprocessed (dry) signal. Default is 0.

wet
    Set wet amount. Amount of processed (wet) signal. Default is 1.

temp
    Set temperature in degrees Celsius. This is the temperature of the environment. Default is 20.
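How a distance and a temperature translate into a delay can be estimated as follows. This is a rough sketch under stated assumptions: the three distance options are taken to add up, the speed of sound uses the common linear approximation 331.3 + 0.606 * T m/s (not necessarily the formula used by the filter), and the function names and the 48 kHz sample rate are hypothetical.

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in air (m/s) at temp_c degrees Celsius."""
    return 331.3 + 0.606 * temp_c


def compensation_delay(mm=0.0, cm=0.0, m=0.0, temp_c=20.0, sample_rate=48000):
    """Return (delay in seconds, delay in whole samples) for a distance."""
    distance_m = m + cm / 100.0 + mm / 1000.0
    seconds = distance_m / speed_of_sound(temp_c)
    return seconds, round(seconds * sample_rate)


seconds, samples = compensation_delay(cm=30)
print(seconds, samples)
print(compensation_delay(cm=30, sample_rate=96000)[1])
```

The same distance maps to roughly twice as many samples at 96 kHz as at 48 kHz, which is why higher sample rates allow finer adjustment.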

Apply headphone crossfeed filter.

Crossfeed is the process of blending the left and right channels of a stereo audio recording. It is mainly used to reduce extreme stereo separation of low frequencies.

The intent is to produce a more speaker-like sound for the listener.

The filter accepts the following options:

strength
    Set strength of crossfeed. Default is 0.2. Allowed range is from 0 to 1. This sets gain of low shelf filter for side part of stereo image. Default is -6dB. Max allowed is -30dB when strength is set to 1.

range
    Set soundstage wideness. Default is 0.5. Allowed range is from 0 to 1. This sets cut off frequency of low shelf filter. Default is cut off near 1550 Hz. With range set to 1 cut off frequency is set to 2100 Hz.

slope
    Set curve slope of low shelf filter. Default is 0.5. Allowed range is from 0.01 to 1.

level_in
    Set input gain. Default is 0.9.

level_out
    Set output gain. Default is 1.

This filter supports all of the above options as commands.

Simple algorithm to expand audio dynamic range.

The filter accepts the following options:

i
    Sets the intensity of effect (default: 2.0). Must be in range between 0.0 (unchanged sound) to 10.0 (maximum effect).

c
    Enable clipping. By default is enabled.

This filter supports all of the above options as commands.

Apply a DC shift to the audio.

This can be useful to remove a DC offset (caused perhaps by a hardware problem in the recording chain) from the audio. The effect of a DC offset is reduced headroom and hence volume. The astats filter can be used to determine if a signal has a DC offset.

shift
    Set the DC shift, allowed range is [-1, 1]. It indicates the amount to shift the audio.

limitergain
    Optional. It should have a value much less than 1 (e.g. 0.05 or 0.02) and is used to prevent clipping.

Apply de-essing to the audio samples.

i
    Set intensity for triggering de-essing. Allowed range is from 0 to 1. Default is 0.

m
    Set amount of ducking on treble part of sound. Allowed range is from 0 to 1. Default is 0.5.

f
    How much of original frequency content to keep when de-essing. Allowed range is from 0 to 1. Default is 0.5.

s
    Set the output mode. It accepts the following values:
    i    Pass input unchanged.
    o    Pass ess filtered out.
    e    Pass only ess.
    Default value is o.

Measure audio dynamic range.

DR values of 14 and higher are found in very dynamic material. DR of 8 to 13 is found in transition material. Anything less than 8 has very poor dynamics and is very compressed.

The filter accepts the following options:

length
    Set window length in seconds used to split audio into segments of equal length. Default is 3 seconds.

Dynamic Audio Normalizer.

This filter applies a certain amount of gain to the input audio in order to bring its peak magnitude to a target level (e.g. 0 dBFS). However, in contrast to more "simple" normalization algorithms, the Dynamic Audio Normalizer *dynamically* re-adjusts the gain factor to the input audio. This allows for applying extra gain to the "quiet" sections of the audio while avoiding distortions or clipping the "loud" sections. In other words: The Dynamic Audio Normalizer will "even out" the volume of quiet and loud sections, in the sense that the volume of each section is brought to the same target level. Note, however, that the Dynamic Audio Normalizer achieves this goal *without* applying "dynamic range compressing". It will retain 100% of the dynamic range *within* each section of the audio file.

framelen, f
    Set the frame length in milliseconds. In range from 10 to 8000 milliseconds. Default is 500 milliseconds. The Dynamic Audio Normalizer processes the input audio in small chunks, referred to as frames. This is required, because a peak magnitude has no meaning for just a single sample value. Instead, we need to determine the peak magnitude for a contiguous sequence of sample values. While a "standard" normalizer would simply use the peak magnitude of the complete file, the Dynamic Audio Normalizer determines the peak magnitude individually for each frame. The length of a frame is specified in milliseconds. By default, the Dynamic Audio Normalizer uses a frame length of 500 milliseconds, which has been found to give good results with most files. Note that the exact frame length, in number of samples, will be determined automatically, based on the sampling rate of the individual input audio file.

gausssize, g
    Set the Gaussian filter window size. In range from 3 to 301, must be an odd number. Default is 31. Probably the most important parameter of the Dynamic Audio Normalizer is the window size of the Gaussian smoothing filter. The filter’s window size is specified in frames, centered around the current frame. For the sake of simplicity, this must be an odd number. Consequently, the default value of 31 takes into account the current frame, as well as the 15 preceding frames and the 15 subsequent frames. Using a larger window results in a stronger smoothing effect and thus in less gain variation, i.e. slower gain adaptation. Conversely, using a smaller window results in a weaker smoothing effect and thus in more gain variation, i.e. faster gain adaptation. In other words, the more you increase this value, the more the Dynamic Audio Normalizer will behave like a "traditional" normalization filter. On the contrary, the more you decrease this value, the more the Dynamic Audio Normalizer will behave like a dynamic range compressor.

peak, p
    Set the target peak value. This specifies the highest permissible magnitude level for the normalized audio input. This filter will try to approach the target peak magnitude as closely as possible, but at the same time it also makes sure that the normalized signal will never exceed the peak magnitude. A frame’s maximum local gain factor is imposed directly by the target peak magnitude. The default value is 0.95 and thus leaves a headroom of 5%. It is not recommended to go above this value.

maxgain, m
    Set the maximum gain factor. In range from 1.0 to 100.0. Default is 10.0. The Dynamic Audio Normalizer determines the maximum possible (local) gain factor for each input frame, i.e. the maximum gain factor that does not result in clipping or distortion. The maximum gain factor is determined by the frame’s highest magnitude sample. However, the Dynamic Audio Normalizer additionally bounds the frame’s maximum gain factor by a predetermined (global) maximum gain factor. This is done in order to avoid excessive gain factors in "silent" or almost silent frames. By default, the maximum gain factor is 10.0. For most inputs the default value should be sufficient and it usually is not recommended to increase this value. However, for input with an extremely low overall volume level, it may be necessary to allow even higher gain factors. Note, however, that the Dynamic Audio Normalizer does not simply apply a "hard" threshold (i.e. cut off values above the threshold). Instead, a "sigmoid" threshold function will be applied. This way, the gain factors will smoothly approach the threshold value, but never exceed that value.

targetrms, r
    Set the target RMS. In range from 0.0 to 1.0. Default is 0.0 (disabled). By default, the Dynamic Audio Normalizer performs "peak" normalization. This means that the maximum local gain factor for each frame is defined (only) by the frame’s highest magnitude sample. This way, the samples can be amplified as much as possible without exceeding the maximum signal level, i.e. without clipping. Optionally, however, the Dynamic Audio Normalizer can also take into account the frame’s root mean square, abbreviated RMS. In electrical engineering, the RMS is commonly used to determine the power of a time-varying signal. It is therefore considered that the RMS is a better approximation of the "perceived loudness" than just looking at the signal’s peak magnitude. Consequently, by adjusting all frames to a constant RMS value, a uniform "perceived loudness" can be established. If a target RMS value has been specified, a frame’s local gain factor is defined as the factor that would result in exactly that RMS value. Note, however, that the maximum local gain factor is still restricted by the frame’s highest magnitude sample, in order to prevent clipping.

coupling, n
    Enable channels coupling. By default is enabled. By default, the Dynamic Audio Normalizer will amplify all channels by the same amount. This means the same gain factor will be applied to all channels, i.e. the maximum possible gain factor is determined by the "loudest" channel. However, in some recordings, it may happen that the volume of the different channels is uneven, e.g. one channel may be "quieter" than the other one(s). In this case, this option can be used to disable the channel coupling. This way, the gain factor will be determined independently for each channel, depending only on the individual channel’s highest magnitude sample. This allows for harmonizing the volume of the different channels.

correctdc, c
    Enable DC bias correction. By default is disabled. An audio signal (in the time domain) is a sequence of sample values. In the Dynamic Audio Normalizer these sample values are represented in the -1.0 to 1.0 range, regardless of the original input format. Normally, the audio signal, or "waveform", should be centered around the zero point. That means if we calculate the mean value of all samples in a file, or in a single frame, then the result should be 0.0 or at least very close to that value. If, however, there is a significant deviation of the mean value from 0.0, in either positive or negative direction, this is referred to as a DC bias or DC offset. Since a DC bias is clearly undesirable, the Dynamic Audio Normalizer provides optional DC bias correction. With DC bias correction enabled, the Dynamic Audio Normalizer will determine the mean value, or "DC correction" offset, of each input frame and subtract that value from all of the frame’s sample values which ensures those samples are centered around 0.0 again. Also, in order to avoid "gaps" at the frame boundaries, the DC correction offset values will be interpolated smoothly between neighbouring frames.

altboundary, b
    Enable alternative boundary mode. By default is disabled. The Dynamic Audio Normalizer takes into account a certain neighbourhood around each frame. This includes the preceding frames as well as the subsequent frames. However, for the "boundary" frames, located at the very beginning and at the very end of the audio file, not all neighbouring frames are available. In particular, for the first few frames in the audio file, the preceding frames are not known. And, similarly, for the last few frames in the audio file, the subsequent frames are not known. Thus, the question arises which gain factors should be assumed for the missing frames in the "boundary" region. The Dynamic Audio Normalizer implements two modes to deal with this situation. The default boundary mode assumes a gain factor of exactly 1.0 for the missing frames, resulting in a smooth "fade in" and "fade out" at the beginning and at the end of the input, respectively.

compress, s
    Set the compress factor. In range from 0.0 to 30.0. Default is 0.0. By default, the Dynamic Audio Normalizer does not apply "traditional" compression. This means that signal peaks will not be pruned and thus the full dynamic range will be retained within each local neighbourhood. However, in some cases it may be desirable to combine the Dynamic Audio Normalizer’s normalization algorithm with a more "traditional" compression. For this purpose, the Dynamic Audio Normalizer provides an optional compression (thresholding) function. If (and only if) the compression feature is enabled, all input frames will be processed by a soft knee thresholding function prior to the actual normalization process. Put simply, the thresholding function is going to prune all samples whose magnitude exceeds a certain threshold value. However, the Dynamic Audio Normalizer does not simply apply a fixed threshold value. Instead, the threshold value will be adjusted for each individual frame. In general, smaller parameters result in stronger compression, and vice versa. Values below 3.0 are not recommended, because audible distortion may appear.

threshold, t
    Set the target threshold value. This specifies the lowest permissible magnitude level for the audio input which will be normalized. If the input frame volume is above this value, the frame will be normalized. Otherwise the frame may not be normalized at all. The default value is set to 0, which means all input frames will be normalized. This option is mostly useful when digital noise should not be amplified.

This filter supports all of the above options as commands.
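The interplay of the peak, maxgain and gausssize options can be sketched with a much simplified model: one gain factor per frame, bounded by maxgain, then smoothed with a Gaussian window in which missing boundary frames count as gain 1.0. Everything here is an assumption made for illustration (function names, the choice of sigma, the hard bound instead of the sigmoid threshold), and unlike the real filter this sketch does not guarantee that the smoothed gain never causes clipping.

```python
import math


def frame_gains(samples, frame_len, peak=0.95, maxgain=10.0):
    """Per-frame gain that would bring each frame's peak to the target."""
    gains = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        top = max(abs(s) for s in frame)
        gains.append(maxgain if top == 0.0 else min(peak / top, maxgain))
    return gains


def gaussian_weights(size):
    """Normalized Gaussian window with an odd number of taps."""
    if size < 3 or size % 2 == 0:
        raise ValueError("window size must be an odd number >= 3")
    half = size // 2
    sigma = size / 6.0   # assumption: the window spans about +/- 3 sigma
    raw = [math.exp(-(k * k) / (2.0 * sigma * sigma))
           for k in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth(gains, size=31):
    """Smooth gains over neighbouring frames; missing frames count as 1.0."""
    half = size // 2
    weights = gaussian_weights(size)
    out = []
    for i in range(len(gains)):
        acc = 0.0
        for k in range(-half, half + 1):
            j = i + k
            neighbour = gains[j] if 0 <= j < len(gains) else 1.0
            acc += weights[k + half] * neighbour
        out.append(acc)
    return out


raw = frame_gains([0.475] * 40, frame_len=10)   # four frames with peak 0.475
print(raw)
print(smooth(raw, size=3))
```

In the smoothed result the inner frames keep the gain of 2.0 while the first and last frames are pulled toward 1.0, which corresponds to the "fade in" and "fade out" described for the default boundary mode.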

Make audio easier to listen to on headphones.

This filter adds ‘cues’ to 44.1kHz stereo (i.e. audio CD format) audio so that when listened to on headphones the stereo image is moved from inside your head (standard for headphones) to outside and in front of the listener (standard for speakers).

Ported from SoX.

Apply a two-pole peaking equalisation (EQ) filter. With this filter, the signal-level at and around a selected frequency can be increased or decreased, whilst (unlike bandpass and bandreject filters) that at all other frequencies is unchanged.

In order to produce complex equalisation curves, this filter can be given several times, each with a different central frequency.

The filter accepts the following options:

frequency, f
    Set the filter’s central frequency in Hz.

width_type, t
    Set method to specify band-width of filter.
    h    Hz
    q    Q-Factor
    o    octave
    s    slope
    k    kHz

width, w
    Specify the band-width of a filter in width_type units.

gain, g
    Set the required gain or attenuation in dB. Beware of clipping when using a positive gain.

mix, m
    How much to use filtered signal in output. Default is 1. Range is between 0 and 1.

channels, c
    Specify which channels to filter, by default all available are filtered.

normalize, n
    Normalize biquad coefficients, by default is disabled. Enabling it will normalize magnitude response at DC to 0dB.

transform, a
    Set transform type of IIR filter.
    di
    dii
    tdii

Attenuate 10 dB at 1000 Hz, with a bandwidth of 200 Hz: equalizer=f=1000:t=h:width=200:g=-10

Apply 2 dB gain at 1000 Hz with Q 1 and attenuate 5 dB at 100 Hz with Q 2: equalizer=f=1000:t=q:w=1:g=2,equalizer=f=100:t=q:w=2:g=-5

This filter supports the following commands:

frequency, f
    Change equalizer frequency. Syntax for the command is: "frequency"

width_type, t
    Change equalizer width_type. Syntax for the command is: "width_type"

width, w
    Change equalizer width. Syntax for the command is: "width"

gain, g
    Change equalizer gain. Syntax for the command is: "gain"

mix, m
    Change equalizer mix. Syntax for the command is: "mix"
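The effect of chaining several peaking sections, as in the second example above, can be estimated numerically. The sketch uses the Audio EQ Cookbook peaking formulas with the width given as a Q-factor; that the filter computes its coefficients this way is an assumption, and the function names and the 48 kHz sample rate are illustrative.

```python
import cmath
import math


def peaking_coeffs(freq, q, gain_db, sample_rate):
    """Two-pole peaking EQ biquad coefficients (Audio EQ Cookbook form)."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a


def chain_gain_db(sections, freq, sample_rate):
    """Combined gain in dB of cascaded biquad sections at one frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq / sample_rate)  # z^-1
    total = 1.0 + 0.0j
    for b, a in sections:
        total *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / \
                 (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(total))


rate = 48000.0
boost = peaking_coeffs(1000.0, 1.0, 2.0, rate)   # +2 dB at 1000 Hz, Q 1
cut = peaking_coeffs(100.0, 2.0, -5.0, rate)     # -5 dB at 100 Hz, Q 2

print(chain_gain_db([boost], 1000.0, rate))
print(chain_gain_db([boost, cut], 100.0, rate))
print(chain_gain_db([boost, cut], 0.0, rate))
```

Each section reaches its requested gain at its own central frequency and leaves DC untouched; the cascade is simply the product of the individual responses.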

Linearly increases the difference between left and right channels which adds some sort of "live" effect to playback.

The filter accepts the following options:

m
    Sets the difference coefficient (default: 2.5). 0.0 means mono sound (average of both channels), with 1.0 sound will be unchanged, with -1.0 left and right channels will be swapped.

c
    Enable clipping. By default is enabled.

This filter supports all of the above options as commands.
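The behaviour described for the difference coefficient follows from scaling each channel's distance from the average of both channels. A minimal per-sample sketch (the function name is hypothetical, and clipping is modelled as a clamp to the -1.0 to 1.0 range):

```python
def extrastereo(left, right, m=2.5, clip=True):
    """Scale the difference between each channel and their average by m."""
    average = (left + right) / 2.0
    new_left = average + m * (left - average)
    new_right = average + m * (right - average)
    if clip:
        new_left = max(-1.0, min(1.0, new_left))
        new_right = max(-1.0, min(1.0, new_right))
    return new_left, new_right


print(extrastereo(0.5, -0.25, m=0.0))    # mono: both channels get the average
print(extrastereo(0.5, -0.25, m=1.0))    # unchanged
print(extrastereo(0.5, -0.25, m=-1.0))   # channels swapped
print(extrastereo(0.5, -0.25))           # default 2.5 widens the difference
```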

Apply FIR Equalization using arbitrary frequency response.

The filter accepts the following option:

gain
    Set gain curve equation (in dB). The expression can contain variables:
    f           the evaluated frequency
    sr          sample rate
    ch          channel number, set to 0 when multichannels evaluation is disabled
    chid        channel id, see libavutil/channel_layout.h, set to the first channel id when multichannels evaluation is disabled
    chs         number of channels
    chlayout    channel_layout, see libavutil/channel_layout.h
    and functions:
    gain_interpolate(f)     interpolate gain on frequency f based on gain_entry
    cubic_interpolate(f)    same as gain_interpolate, but smoother
    This option is also available as command. Default is gain_interpolate(f).

gain_entry
    Set gain entry for gain_interpolate function. The expression can contain functions:
    entry(f, g)    store gain entry at frequency f with value g
    This option is also available as command.

delay
    Set filter delay in seconds. Higher value means more accurate. Default is 0.01.

accuracy
    Set filter accuracy in Hz. Lower value means more accurate. Default is 5.

wfunc
    Set window function. Acceptable values are:
    rectangular    rectangular window, useful when gain curve is already smooth
    hann           hann window (default)
    hamming        hamming window
    blackman       blackman window
    nuttall3       3-terms continuous 1st derivative nuttall window
    mnuttall3      minimum 3-terms discontinuous nuttall window
    nuttall        4-terms continuous 1st derivative nuttall window
    bnuttall       minimum 4-terms discontinuous nuttall (blackman-nuttall) window
    bharris        blackman-harris window
    tukey          tukey window

fixed
    If enabled, use fixed number of audio samples. This improves speed when filtering with large delay. Default is disabled.

multi
    Enable multichannels evaluation on gain. Default is disabled.

zero_phase
    Enable zero phase mode by subtracting timestamp to compensate delay. Default is disabled.

scale
    Set scale used by gain. Acceptable values are:
    linlin    linear frequency, linear gain
    linlog    linear frequency, logarithmic (in dB) gain (default)
    loglin    logarithmic (in octave scale where 20 Hz is 0) frequency, linear gain
    loglog    logarithmic frequency, logarithmic gain

dumpfile
    Set file for dumping, suitable for gnuplot.

dumpscale
    Set scale for dumpfile. Acceptable values are same with scale option. Default is linlog.

fft2
    Enable 2-channel convolution using complex FFT. This improves speed significantly. Default is disabled.

min_phase
    Enable minimum phase impulse response. Default is disabled.

lowpass at 1000 Hz: firequalizer=gain='if(lt(f,1000), 0, -INF)'

lowpass at 1000 Hz with gain_entry: firequalizer=gain_entry='entry(1000,0); entry(1001, -INF)'

custom equalization: firequalizer=gain_entry='entry(100,0); entry(400, -4); entry(1000, -6); entry(2000, 0)'

higher delay with zero phase to compensate delay: firequalizer=delay=0.1:fixed=on:zero_phase=on

lowpass on left channel, highpass on right channel: firequalizer=gain='if(eq(chid,1), gain_interpolate(f), if(eq(chid,2), gain_interpolate(1e6+f), 0))':gain_entry='entry(1000, 0); entry(1001,-INF); entry(1e6+1000,0)':multi=on
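What gain_interpolate does with a set of gain entries can be approximated by plain linear interpolation. This is a sketch under assumptions, not the filter's code: the function name is hypothetical, and holding the first and last gain outside the listed frequencies is assumed behaviour.

```python
from bisect import bisect_right


def gain_interpolate(entries, freq):
    """Linearly interpolate a gain (dB) at freq from (frequency, gain) entries.

    Outside the listed range the nearest entry's gain is held (assumption).
    """
    entries = sorted(entries)
    freqs = [f for f, _ in entries]
    if freq <= freqs[0]:
        return entries[0][1]
    if freq >= freqs[-1]:
        return entries[-1][1]
    i = bisect_right(freqs, freq)
    (f0, g0), (f1, g1) = entries[i - 1], entries[i]
    return g0 + (freq - f0) * (g1 - g0) / (f1 - f0)


# Entries from the custom equalization example.
custom = [(100, 0.0), (400, -4.0), (1000, -6.0), (2000, 0.0)]
for f in (50, 250, 700, 5000):
    print(f, gain_interpolate(custom, f))
```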

Apply a flanging effect to the audio.

The filter accepts the following options:

delay
    Set base delay in milliseconds. Range from 0 to 30. Default value is 0.

depth
    Set added sweep delay in milliseconds. Range from 0 to 10. Default value is 2.

regen
    Set percentage regeneration (delayed signal feedback). Range from -95 to 95. Default value is 0.

width
    Set percentage of delayed signal mixed with original. Range from 0 to 100. Default value is 71.

speed
    Set sweeps per second (Hz). Range from 0.1 to 10. Default value is 0.5.

shape
    Set swept wave shape, can be triangular or sinusoidal. Default value is sinusoidal.

phase
    Set swept wave percentage-shift for multi channel. Range from 0 to 100. Default value is 25.

interp
    Set delay-line interpolation, linear or quadratic. Default is linear.

Apply Haas effect to audio.

Note that this makes most sense to apply on mono signals. Applied to a mono signal, this filter gives it some directionality and stretches its stereo image.

The filter accepts the following options:

level_in
    Set input level. By default is 1, or 0dB.

level_out
    Set output level. By default is 1, or 0dB.

side_gain
    Set gain applied to side part of signal. By default is 1.

middle_source
    Set kind of middle source. Can be one of the following:
    left     Pick left channel.
    right    Pick right channel.
    mid      Pick middle part signal of stereo image.
    side     Pick side part signal of stereo image.

middle_phase
    Change middle phase. By default is disabled.

left_delay
    Set left channel delay. By default is 2.05 milliseconds.

left_balance
    Set left channel balance. By default is -1.

left_gain
    Set left channel gain. By default is 1.

left_phase
    Change left phase. By default is disabled.

right_delay
    Set right channel delay. By default is 2.12 milliseconds.

right_balance
    Set right channel balance. By default is 1.

right_gain
    Set right channel gain. By default is 1.

right_phase
    Change right phase. By default is enabled.

Decodes High Definition Compatible Digital (HDCD) data. A 16-bit PCM stream with embedded HDCD codes is expanded into a 20-bit PCM stream.

The filter supports the Peak Extend and Low-level Gain Adjustment features of HDCD, and detects the Transient Filter flag.

ffmpeg -i HDCD16.flac -af hdcd OUT24.flac

When using the filter with wav, note the default encoding for wav is 16-bit, so the resulting 20-bit stream will be truncated back to 16-bit. Use something like -acodec pcm_s24le after the filter to get 24-bit PCM output.

ffmpeg -i HDCD16.wav 