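qsort is called indirectly from filter_frame, which puts it on the
per-frame path and makes it performance critical. AV_QSORT is
substantially faster because the comparison callback is inlined, so the
gain in performance should be worth the increase in binary size. In the
benchmark below, the 1024-run average drops from 86783 to 39676
decicycles, roughly a 2.2x speedup.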
This optimization is just low-hanging fruit. Trac ticket 1430 is a
request for an improved deshake filter.
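To illustrate why inlining the comparison matters, here is a minimal
sketch contrasting a qsort() callback with a macro-expanded sort.
SKETCH_SORT, CMP_INT and the two wrapper functions are hypothetical
names made up for this example. SKETCH_SORT is a plain insertion sort
and is not the real AV_QSORT from libavutil/qsort.h, which is more
elaborate; it only mirrors the (p, num, type, cmp) argument shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Callback for qsort(): reached through a function pointer, so the
 * compiler usually cannot inline it into the sorting loop. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Comparison as a macro taking two pointers; it is expanded directly
 * at every use inside SKETCH_SORT. */
#define CMP_INT(a, b) ((*(a) > *(b)) - (*(a) < *(b)))

/* Hypothetical stand-in for a macro-based sort (NOT the real AV_QSORT):
 * a simple insertion sort whose comparison is pasted in by the
 * preprocessor instead of being called through a pointer. */
#define SKETCH_SORT(p, num, type, cmp)                                 \
    do {                                                               \
        type  *sk_p = (p);                                             \
        size_t sk_n = (num);                                           \
        for (size_t sk_i = 1; sk_i < sk_n; sk_i++) {                   \
            type   sk_key = sk_p[sk_i];                                \
            size_t sk_j   = sk_i;                                      \
            while (sk_j > 0 && cmp(&sk_key, &sk_p[sk_j - 1]) < 0) {    \
                sk_p[sk_j] = sk_p[sk_j - 1];                           \
                sk_j--;                                                \
            }                                                          \
            sk_p[sk_j] = sk_key;                                       \
        }                                                              \
    } while (0)

/* Sort via the C library, one indirect call per comparison. */
static void sort_with_callback(int *v, size_t n)
{
    assert(v != NULL);
    qsort(v, n, sizeof(*v), cmp_int);
}

/* Sort via the macro, comparison expanded inline. */
static void sort_with_macro(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    SKETCH_SORT(v, n, int, CMP_INT);
}
```

Both wrappers produce the same ordering; the difference is only in how
the comparison reaches the inner loop, which is the effect this patch
relies on.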
Sample benchmark (x86-64, Haswell, GNU/Linux):
File: original from https://trac.ffmpeg.org/ticket/1430
Command: ffmpeg -stream_loop 8 -i file.webm -vf deshake=rx=64:ry=64 -f null -
Timer truncated at 1024 runs.
new:
28260 decicycles in qsort, 1 runs, 0 skips
35570 decicycles in qsort, 2 runs, 0 skips
39010 decicycles in qsort, 4 runs, 0 skips
46897 decicycles in qsort, 8 runs, 0 skips
40442 decicycles in qsort, 16 runs, 0 skips
41611 decicycles in qsort, 32 runs, 0 skips
40345 decicycles in qsort, 64 runs, 0 skips
38967 decicycles in qsort, 128 runs, 0 skips
38647 decicycles in qsort, 256 runs, 0 skips
40238 decicycles in qsort, 512 runs, 0 skips
39676 decicycles in qsort, 1024 runs, 0 skips
old:
1740280 decicycles in qsort, 1 runs, 0 skips
923560 decicycles in qsort, 2 runs, 0 skips
511330 decicycles in qsort, 4 runs, 0 skips
309720 decicycles in qsort, 8 runs, 0 skips
194900 decicycles in qsort, 16 runs, 0 skips
142686 decicycles in qsort, 32 runs, 0 skips
112516 decicycles in qsort, 64 runs, 0 skips
98166 decicycles in qsort, 128 runs, 0 skips
88147 decicycles in qsort, 256 runs, 0 skips
88706 decicycles in qsort, 512 runs, 0 skips
86783 decicycles in qsort, 1024 runs, 0 skips
Reviewed-by: Nicolas George <george@nsup.org>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>