aarch64: cabac_encode_{decision,bypass,terminal}_asm
Benchmarks on a Nexus 9 (NVIDIA Denver):
101.3 cycles in x264_cabac_encode_decision_c,
67105369 runs, 3495 skips
97.3 cycles in x264_cabac_encode_decision_asm,
67105493 runs, 3371 skips
132.8 cycles in x264_cabac_encode_terminal_c,
1046950 runs, 1626 skips
116.1 cycles in x264_cabac_encode_terminal_asm,
1048424 runs, 152 skips
92.4 cycles in x264_cabac_encode_bypass_c,
16776192 runs, 1024 skips
89.6 cycles in x264_cabac_encode_bypass_asm,
16776453 runs, 763 skips
Cycle counts are not as stable as one would like. The dynamic code
optimisation seems to produce different results for small changes in a
binary. Repeated runs with the same binary produce stable results
though (ignoring the first run).