Martin Storsjö [Sat, 29 Mar 2014 10:35:11 +0000 (12:35 +0200)]
golomb: Fix the implementation of get_se_golomb_long
This was only used in hevc muxing code so far.
This makes the return values match what get_se_golomb returns for
the same bitstream reader instances.
The logic for producing a signed golomb code out of an unsigned one
was based on the corresponding code in get_se_golomb, which operated
directly on the bitstream reader buffer - not on the equivalent
return value from get_ue_golomb.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
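For illustration, the conventional mapping from an unsigned Exp-Golomb code number to its signed value can be sketched as below. This is a minimal standalone sketch, not the libav code: the helper name se_from_ue is hypothetical, and it takes an already-decoded unsigned value rather than reading from a bitstream reader.

```c
#include <stdint.h>

/* Hypothetical helper (not part of libav): map an unsigned Exp-Golomb
 * code number k to its signed value.
 * k = 0, 1, 2, 3, 4, ...  ->  0, +1, -1, +2, -2, ...
 * Assumes k is small enough that the result fits in int32_t. */
static int32_t se_from_ue(uint32_t k)
{
    if (k & 1)
        return (int32_t)((k >> 1) + 1);  /* odd k: positive value */
    return -(int32_t)(k >> 1);           /* even k: zero or negative value */
}
```

With this mapping, odd code numbers become positive values and even code numbers become zero or negative values, which is the relationship a signed reader built on top of an unsigned one has to preserve.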
Ben Avison [Thu, 20 Mar 2014 18:58:40 +0000 (18:58 +0000)]
truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.
Profiling results for overall decode and the output_data function in
particular are as follows:
              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     339.6  15.1     329.3  16.0    95.8%       +3.1%  (insignificant)
6:2 function  24.6   6.0      9.9    3.1     100.0%      +148.5%
8:2 total     324.5  15.5     323.6  14.3    15.2%       +0.3%  (insignificant)
8:2 function  20.4   3.9      9.9    3.4     100.0%      +104.7%
6:6 total     572.8  20.6     539.9  24.2    100.0%      +6.1%
6:6 function  54.5   5.6      16.0   3.8     100.0%      +240.9%
8:8 total     741.5  21.2     702.5  18.5    100.0%      +5.6%
8:8 function  63.9   7.6      18.4   4.8     100.0%      +247.3%
The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.
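The Change column in the table above appears consistent with the ratio of the Before mean to the After mean, minus one, expressed as a percentage. This is an inference from the published numbers, not a formula stated in the commit, and the helper name change_percent is hypothetical. Because the printed means are rounded, recomputing from them only reproduces the column approximately.

```c
/* Hypothetical helper: relative speedup in percent, assuming
 * Change = (before / after - 1) * 100. This formula is inferred
 * from the table, not taken from the commit message. */
static double change_percent(double before_mean, double after_mean)
{
    return (before_mean / after_mean - 1.0) * 100.0;
}
```

For the "6:2 function" row, change_percent(24.6, 9.9) gives roughly 148.5, matching the table.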
Ben Avison [Thu, 20 Mar 2014 18:58:38 +0000 (18:58 +0000)]
truehd: tune VLC decoding for ARM.
Profiling on a Raspberry Pi revealed the best performance to correspond
with VLC_BITS = 5. Results for overall audio decode and the get_vlc2 function
in particular are as follows:
              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     348.8  20.1     339.6  15.1    88.8%       +2.7%  (insignificant)
6:2 function  38.1   8.1      26.4   4.1     100.0%      +44.5%
8:2 total     339.1  15.4     324.5  15.5    99.4%       +4.5%
8:2 function  33.8   7.0      27.3   5.6     99.7%       +23.6%
6:6 total     604.6  20.8     572.8  20.6    100.0%      +5.6%
6:6 function  95.8   8.4      68.9   8.2     100.0%      +39.1%
8:8 total     766.4  17.6     741.5  21.2    100.0%      +3.4%
8:8 function  106.0  11.4     86.1   9.9     100.0%      +23.1%
Ben Avison [Thu, 20 Mar 2014 18:58:37 +0000 (18:58 +0000)]
truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.
Profiling results for overall audio decode and the rematrix_channels function
in particular are as follows:
              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     370.8  17.0     348.8  20.1    99.9%       +6.3%
6:2 function  46.4   8.4      45.8   6.6     18.0%       +1.2%  (insignificant)
8:2 total     343.2  19.0     339.1  15.4    54.7%       +1.2%  (insignificant)
8:2 function  38.9   3.9      40.2   6.9     52.4%       -3.2%  (insignificant)
6:6 total     658.4  15.7     604.6  20.8    100.0%      +8.9%
6:6 function  109.0  8.7      59.5   5.4     100.0%      +83.3%
8:8 total     896.2  24.5     766.4  17.6    100.0%      +16.9%
8:8 function  223.4  12.8     93.8   5.0     100.0%      +138.3%
The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.
Ben Avison [Thu, 20 Mar 2014 18:58:35 +0000 (18:58 +0000)]
truehd: add hand-scheduled ARM asm version of mlp_filter_channel.
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:
              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     380.4  22.0     370.8  17.0    87.4%       +2.6%  (insignificant)
6:2 function  60.7   7.2      36.6   8.1     100.0%      +65.8%
8:2 total     357.0  17.5     343.2  19.0    97.8%       +4.0%  (insignificant)
8:2 function  60.3   8.8      37.3   3.8     100.0%      +61.8%
6:6 total     717.2  23.2     658.4  15.7    100.0%      +8.9%
6:6 function  140.4  12.9     81.5   9.2     100.0%      +72.4%
8:8 total     981.9  16.2     896.2  24.5    100.0%      +9.6%
8:8 function  193.4  15.0     103.3  11.5    100.0%      +87.2%
Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.
The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.
Anton Khirnov [Mon, 17 Mar 2014 09:03:47 +0000 (10:03 +0100)]
avconv: rewrite output data size tracking
Store a variable per OutputStream instead of globals for
audio/video/extradata. This makes the code simpler and cleaner and fixes
2pass with multiple output streams.
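The idea of per-stream size tracking can be sketched roughly as follows. This is an illustrative sketch only: the struct and function names (OutStreamSketch, account_packet, total_size) are hypothetical and do not reproduce the actual avconv OutputStream structure or its fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an output stream; not the avconv struct. */
typedef struct OutStreamSketch {
    uint64_t data_size;  /* bytes written for this stream only */
} OutStreamSketch;

/* Account a written packet against the stream it belongs to,
 * instead of adding it to a global per-media-type counter. */
static void account_packet(OutStreamSketch *ost, size_t pkt_size)
{
    ost->data_size += pkt_size;
}

/* Totals are derived on demand by summing over the streams. */
static uint64_t total_size(const OutStreamSketch *streams, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += streams[i].data_size;
    return total;
}
```

Because each stream keeps its own counter, two streams of the same media type no longer share one figure, and any per-stream report can read its own value directly.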
Peter Krefting [Thu, 6 Feb 2014 12:51:39 +0000 (12:51 +0000)]
configure: Remove dcbzl check for e500v1 and e500v2 architectures
The DCBZL instruction is not available for the e500v1 and e500v2
architectures, but may still be recognized by the toolchain, so we
need to explicitly disable it for these architectures.
References: PowerPC™ e500 Core Family Reference Manual (Freescale)