git.sesse.net Git - x264/log
x264
David Conrad [Sun, 23 Aug 2009 06:55:29 +0000 (23:55 -0700)]
GSOC merge part 3: ARM NEON pixel assembly functions
SAD, SADX3/X4, SSD, SATD, SA8D, Hadamard_AC, VAR, VAR2, SSIM

David Conrad [Sun, 23 Aug 2009 06:40:33 +0000 (23:40 -0700)]
GSOC merge part 2: ARM stack alignment
Neither GCC nor ARMCC supports 16-byte stack alignment, even though NEON loads require it.
These macros only work for arrays, but fortunately that covers almost all instances of stack alignment in x264.
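
A minimal sketch of the array-only approach, assuming a hypothetical ALIGNED_ARRAY_16 macro (x264's real macros may differ in name and form): declare a padded stack array, then point at the first 16-byte aligned element inside it. It assumes sizeof(type) is a power of two no larger than 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro, not x264's actual definition: declare a padded stack
 * array of the same element type, then point `name` at the first 16-byte
 * aligned element inside it. Assumes sizeof(type) is a power of two <= 16. */
#define ALIGNED_ARRAY_16(type, name, count)                                   \
    type name##_raw[(count) + 15 / sizeof(type) + 1];                         \
    type *name = (type *)(((uintptr_t)name##_raw + 15) & ~(uintptr_t)15)

/* Returns 1 if the array declared through the macro is 16-byte aligned
 * and all `count` elements are usable. */
static int aligned_array_demo(void)
{
    ALIGNED_ARRAY_16(int16_t, coeffs, 64);
    for (int i = 0; i < 64; i++)
        coeffs[i] = (int16_t)i;
    return ((uintptr_t)coeffs & 15) == 0 && coeffs[0] == 0 && coeffs[63] == 63;
}
```

Because the macro expands to two declarations (the padded storage and the aligned pointer), a trick like this only suits arrays, which is consistent with the limitation the commit message mentions.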

David Conrad [Fri, 21 Aug 2009 03:44:09 +0000 (20:44 -0700)]
Fix unaligned accesses in bitstream writer
Fixes x264 on CPUs with no unaligned access support (e.g. SPARC).
Improves performance marginally on CPUs with penalties for unaligned stores (e.g. some x86).
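
One generic way to avoid unaligned stores, sketched here with a hypothetical store_be32 helper: assemble the bytes explicitly and copy them with memcpy, so no wide store is issued to an unaligned address. This illustrates the problem class only and is not necessarily how the bitstream writer was changed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit value in big-endian byte order at an
 * arbitrary (possibly unaligned) address without a wide unaligned store. */
static void store_be32(uint8_t *dst, uint32_t v)
{
    uint8_t tmp[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16), (uint8_t)(v >> 8), (uint8_t)v
    };
    memcpy(dst, tmp, sizeof(tmp));
}

/* Writes at an odd offset and returns 1 if exactly the four target bytes
 * were written, in big-endian order. */
static int store_be32_demo(void)
{
    uint8_t buf[8] = {0};
    store_be32(buf + 1, 0x11223344u);
    return buf[0] == 0 && buf[1] == 0x11 && buf[2] == 0x22 &&
           buf[3] == 0x33 && buf[4] == 0x44 && buf[5] == 0;
}
```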

Fiona Glaser [Thu, 20 Aug 2009 20:08:25 +0000 (13:08 -0700)]
Fix bug in calculation of I-frame costs with AQ.

David Conrad [Thu, 20 Aug 2009 00:03:02 +0000 (17:03 -0700)]
GSOC merge part 1: Framework for ARM assembly optimizations
x264 will detect which ARM core it's building for and only build NEON asm if the target is ARMv6 or above, then enable NEON at runtime.

David Conrad [Wed, 19 Aug 2009 23:18:36 +0000 (16:18 -0700)]
Fix a bug in checkasm and two OSX fixes
MC chroma checkasm test could crash in some situations
Remove -lmx, as it's not needed and the iPhone doesn't have it.
Remove unused sqrtf emulation; it breaks if math.h is included.

Fiona Glaser [Wed, 19 Aug 2009 08:49:47 +0000 (01:49 -0700)]
Improve QPRD
Always check the last macroblock's QP, even if the normal search doesn't reach it.
Raise the failure threshold when moving towards the last macroblock's QP.
0.2-1% improved compression.

Fiona Glaser [Wed, 19 Aug 2009 04:53:28 +0000 (21:53 -0700)]
Fix MB-tree with keyint<3
Also slightly improve VBV keyint handling.

Fiona Glaser [Wed, 19 Aug 2009 02:25:45 +0000 (19:25 -0700)]
Fix bug in VBV lookahead + no MB-tree
I-frames need to have VBV lookahead run on them as well.

Fiona Glaser [Wed, 19 Aug 2009 01:37:26 +0000 (18:37 -0700)]
Add support for frame-accurate parameter changes
Parameter structs can now be passed with individual frames.
The previous method only changed the parameters of whatever frame was currently being encoded, which, because of encoder delay, could be far from the intended frame.
Also add support for changing aspect ratio.  Only works in a stream with repeating headers and requires the caller to force an IDR to ensure instant effect.

Fiona Glaser [Tue, 18 Aug 2009 22:46:26 +0000 (15:46 -0700)]
Fix x264_encoder_reconfig with multithreading
New behavior: reconfigging the encoder will result in changes being applied
to each of the encoding threads as they finish encoding the current frame.

Fiona Glaser [Sun, 16 Aug 2009 10:29:49 +0000 (03:29 -0700)]
Fix two bugs in QPRD
QPRD could in some cases force blocks to skip when they shouldn't (~+0.01db).
Force QPRD to abide by qpmin/qpmax restrictions.

Fiona Glaser [Sun, 16 Aug 2009 02:02:31 +0000 (19:02 -0700)]
Lookahead VBV
Use the large-scale lookahead capability introduced in MB-tree for ratecontrol purposes.
(Does not require MB-tree, however.)
Greatly improved quality and compliance in 1-pass VBV mode, especially in CBR; +2db OPSNR or more in some cases.
Fix some other bugs in VBV, which should improve non-lookahead mode as well.
Change the tolerance algorithm in row VBV to allow for more significant mispredictions when buffer is nearly full.
Note that due to the fixing of an extremely long-standing bug (>1 year), bitrates may change by nontrivial amounts in CRF without MB-tree.

Fiona Glaser [Fri, 14 Aug 2009 14:20:07 +0000 (07:20 -0700)]
Fix bug in b-adapt 1
B-adapt 1 didn't use more than MAX(1,bframes-1) B-frames when MB-tree was off.

Fiona Glaser [Fri, 14 Aug 2009 00:13:33 +0000 (17:13 -0700)]
Fix a potential failure in VBV
If VBV does underflow, ratecontrol could be permanently broken for the rest of the clip.
Revert part of the previous VBV changes to fix this.

Anton Mitrofanov [Thu, 13 Aug 2009 21:40:21 +0000 (21:40 +0000)]
new API function x264_encoder_delayed_frames.
fix x264cli on streams whose total length is less than the encoder latency.

Fiona Glaser [Thu, 13 Aug 2009 21:12:26 +0000 (14:12 -0700)]
Add no-mbtree to fprofile (and fix pyramid in fprofile)

Fiona Glaser [Sun, 9 Aug 2009 23:06:52 +0000 (16:06 -0700)]
Don't print a warning about direct=auto in 2pass when B-frames are off

Loren Merritt [Thu, 13 Aug 2009 05:02:59 +0000 (05:02 +0000)]
fix lowres padding, which failed to extrapolate the right side for some resolutions.
fix a buffer overread in x264_mbtree_propagate_cost_sse2. no effect on actual behavior, only theoretical correctness.
fix x264_slicetype_frame_cost_recalculate on I-frames, which previously used all 0 mb costs.
shut up a valgrind warning in predict_8x8_filter_mmx.

Loren Merritt [Sun, 9 Aug 2009 04:00:36 +0000 (04:00 +0000)]
simd part of x264_macroblock_tree_propagate.
1.6x faster on conroe.

Loren Merritt [Sat, 8 Aug 2009 14:53:27 +0000 (14:53 +0000)]
MB-tree fixes:
AQ was applied inconsistently, with some AQed costs compared to other non-AQed costs. Strangely enough, fixing this increases SSIM on some sources but decreases it on others. More investigation needed.
Account for weighted bipred.
Reduce memory, increase precision, simplify, and early terminate.

Fiona Glaser [Sun, 9 Aug 2009 00:51:01 +0000 (17:51 -0700)]
Add missing free()s for new data allocated for MB-tree
Eliminates a memory leak.

Fiona Glaser [Sat, 8 Aug 2009 19:53:06 +0000 (12:53 -0700)]
Fix keyframe insertion with MB-tree and no B-frames

Fiona Glaser [Sat, 8 Aug 2009 18:26:36 +0000 (11:26 -0700)]
Fix MP4 output (bug in malloc checking patch)

Steven Walters [Fri, 7 Aug 2009 23:18:01 +0000 (16:18 -0700)]
Gracefully terminate in the case of a malloc failure
Fuzz tests show that all mallocs appear to be checked correctly now.

Anton Mitrofanov [Fri, 7 Aug 2009 17:44:13 +0000 (10:44 -0700)]
Fix a potential infinite loop in QPfile parsing on Windows
ftell doesn't seem to work properly on Windows in text mode.

Fiona Glaser [Fri, 7 Aug 2009 17:31:16 +0000 (10:31 -0700)]
Fix delay calculation with multiple threads
Delay frames for threading don't actually count as part of lookahead.

Fiona Glaser [Fri, 7 Aug 2009 06:09:46 +0000 (23:09 -0700)]
Add "veryslow" preset
Apparently some people are actually *using* placebo, so I've added this preset to bridge the gap.

Fiona Glaser [Wed, 5 Aug 2009 00:46:33 +0000 (17:46 -0700)]
Macroblock-tree ratecontrol
On by default; can be turned off with --no-mbtree.
Uses a large lookahead to track temporal propagation of data and weight quality accordingly.
Requires a very large separate statsfile (2 bytes per macroblock) in multi-pass mode.
Doesn't work with b-pyramid yet.
Note that MB-tree inherently measures quality differently from the standard qcomp method, so bitrates produced by CRF may change somewhat.
This makes the "medium" preset a bit slower.  Accordingly, make "fast" slower as well, and introduce a new preset "faster" between "fast" and "veryfast".
All presets "fast" and above will have MB-tree on.
Add a new option, --rc-lookahead, to control the distance MB tree looks ahead to perform propagation analysis.
Default is 40; larger values will be slower and require more memory but give more accurate results.
This value will be used in the future to control ratecontrol lookahead (VBV).
Add a new option, --no-psy, to disable all psy optimizations that don't improve PSNR or SSIM.
This disables psy-RD/trellis, but also other more subtle internal psy optimizations that can't be controlled directly via external parameters.
Quality improvement from MB-tree is about 2-70% depending on content.
Strength of MB-tree adjustments can be tweaked using qcompress; higher values mean lower MB-tree strength.
Note that MB-tree may perform slightly suboptimally on fades; this will be fixed by weighted prediction, which is coming soon.
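
For a rough sense of the statsfile size quoted above, the sketch below multiplies the macroblock count by 2 bytes. It assumes 16x16 macroblocks with dimensions rounded up, and ignores any per-frame header bytes; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical estimate of MB-tree stats bytes for one frame, assuming
 * 16x16 macroblocks and the 2 bytes per macroblock quoted in the commit
 * message. Any per-frame header in the real file is ignored. */
static uint64_t mbtree_stats_bytes_per_frame(int width, int height)
{
    uint64_t mb_w = (uint64_t)(width  + 15) / 16;  /* macroblock columns */
    uint64_t mb_h = (uint64_t)(height + 15) / 16;  /* macroblock rows */
    return mb_w * mb_h * 2;
}
```

Under these assumptions a 1920x1080 frame has 120 x 68 = 8160 macroblocks, so about 16320 bytes per frame; multiply by the frame count for a whole clip.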

Fiona Glaser [Tue, 4 Aug 2009 03:52:30 +0000 (20:52 -0700)]
Various 1-pass VBV tweaks
Make predictors have an offset in addition to a multiplier.
This primarily fixes issues in sources with lots of extremely static scenes, such as anime and CGI.
We tried linear regressions, but they were very unreliable as predictors.
Also allow VBV to be slightly more aggressive in raising QPs, to avoid running out of bits in some situations.
Up to 1db improvement on some clips.
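
A minimal sketch of a size predictor with an offset as well as a multiplier. The struct layout, field names and formula are illustrative assumptions, not a copy of the code in ratecontrol.c.

```c
/* Hypothetical predictor state: a multiplier and an offset, plus a weight
 * for how many observations have been accumulated. */
typedef struct {
    double coeff;   /* multiplier applied to the complexity measure */
    double offset;  /* constant term, the addition this commit describes */
    double count;   /* weight of accumulated observations */
} predictor_t;

/* Predict the bits for a frame or row from its complexity and quantizer
 * scale q. With offset == 0 this reduces to a pure multiplier model. */
static double predict_bits(const predictor_t *p, double q, double complexity)
{
    return (p->coeff * complexity + p->offset) / (q * p->count);
}
```

With a pure multiplier, a nearly static scene (complexity close to zero) is predicted to cost almost nothing; the offset keeps the prediction above zero, which may be why it helps on very static content such as anime and CGI.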

Fiona Glaser [Wed, 29 Jul 2009 03:41:27 +0000 (20:41 -0700)]
Fix another 10L in QPRD
An entry in subpel_iterations was missing.
I have no idea how QPRD was working at all without this change.

Fiona Glaser [Tue, 28 Jul 2009 08:16:23 +0000 (01:16 -0700)]
Update help and cleanup in ratecontrol.c
Deal with some out-of-date information.

Loren Merritt [Tue, 28 Jul 2009 07:16:31 +0000 (07:16 +0000)]
15% faster refine_bidir_satd, 10% faster refine_bidir_rd (or less with trellis=2)
re-roll a loop (saves 44KB code size, which is the cause of most of this speed gain)
don't re-mc mvs that haven't changed

Fiona Glaser [Tue, 28 Jul 2009 04:03:00 +0000 (21:03 -0700)]
Faster bidir_rd plus some bugfixes
Cache chroma MC during refine_bidir_rd and use both the luma and chroma caches to skip MC in macroblock_encode.
Fix incorrect call to rd_cost_part; refine_bidir_rd output was incorrect for i8>0.
Remove some redundant clips.
~12% faster refine_bidir_rd.

Fiona Glaser [Mon, 27 Jul 2009 11:45:03 +0000 (04:45 -0700)]
Add "fastdecode" tune option
It does what it says it does.

Fiona Glaser [Sun, 26 Jul 2009 19:20:09 +0000 (12:20 -0700)]
Fix two bugs in QPRD
fprofile settings now actually fprofile QPRD.
Don't use i_mbrd before initializing it.

Fiona Glaser [Sun, 26 Jul 2009 10:03:12 +0000 (03:03 -0700)]
Fix 10l in QPRD
Trellis used wrong lambda with trellis=1

Fiona Glaser [Sun, 26 Jul 2009 05:31:06 +0000 (22:31 -0700)]
Fix a nondeterminism with threads and subme>7
Also add a few more checks to eliminate the need for spel_border.

Fiona Glaser [Thu, 23 Jul 2009 19:20:39 +0000 (12:20 -0700)]
Add QPRD support as subme=10
Refactor trellis lambda selection to be done in analyse_init instead of in trellis.
This will allow for easier adaptation of lambda later on; for now it allows constant lambda across variable QPs.
QPRD is only available with adaptive quantization enabled and generally improves SSIM and visual quality.
Additionally, weight the SSD values from RD based on the relative QP offset for chroma; helps visually at high QPs where chroma has a lower QP than luma.
This fixes some visual artifacts created by QPRD at high QPs.
Note that this generally hurts PSNR and SSIM, and so is only on when psy-RD is on.

Fiona Glaser [Wed, 22 Jul 2009 02:56:21 +0000 (19:56 -0700)]
SSSE3 cachesplit workaround for avg2_w16
Palignr-based solution for the most commonly used qpel function.
1-1.5% faster overall on Core 2 chips.

Loren Merritt [Wed, 22 Jul 2009 20:20:52 +0000 (20:20 +0000)]
shut up valgrind warnings in trellis

Anton Mitrofanov [Sat, 18 Jul 2009 23:30:18 +0000 (16:30 -0700)]
New AQ algorithm option
"Auto-variance" uses log(var)^2 instead of log(var) and attempts to adapt strength per-frame.
Generates significantly better SSIM; on by default with --tune ssim.
Whether it generates visually better quality is still up for debate.
Available as --aq-mode 2.

Fiona Glaser [Wed, 15 Jul 2009 19:43:35 +0000 (12:43 -0700)]
Cacheline-split SSSE3 chroma MC
~70% faster chroma MC on 32-bit Conroe
Also slightly faster SSSE3 intra_sad_8x8c

Fiona Glaser [Sun, 12 Jul 2009 19:07:01 +0000 (12:07 -0700)]
Improve documentation of qp/crf options

Fiona Glaser [Fri, 10 Jul 2009 02:02:57 +0000 (19:02 -0700)]
Merge array_non_zero into zigzag_sub
Faster lossless, cleaner code.
SSSE3 version of zigzag_sub_4x4_field, faster lossless interlaced coding.

James Darnley [Thu, 9 Jul 2009 18:25:55 +0000 (11:25 -0700)]
Fix bug in reference frame autoadjustment
For some types of input file, x264 did the adjustment before width/height were known.

Fiona Glaser [Tue, 7 Jul 2009 18:13:39 +0000 (11:13 -0700)]
Fix fprofile settings to match changes in defaults
Also add b-adapt 2 to fprofile.

Fiona Glaser [Fri, 3 Jul 2009 09:33:44 +0000 (02:33 -0700)]
Slightly faster dequant_flat assembly
Eliminate some redundant shifts.

Fiona Glaser [Thu, 2 Jul 2009 04:14:57 +0000 (21:14 -0700)]
Totally new preset system for x264.c (not libx264), new defaults
Other new features include "tune" and "profile" settings; see --help for more details.
Unlike most other settings, "preset" and "tune" act before all other options.
However, "profile" acts afterwards, overriding all other options.
Our defaults have also changed: new defaults are --subme 7 --bframes 3 --8x8dct --no-psnr --no-ssim --threads auto --ref 3 --mixed-refs --trellis 1 --weightb --crf 23 --progress.
Users will hopefully find these changes to greatly improve usability.

Fiona Glaser [Wed, 1 Jul 2009 23:33:12 +0000 (16:33 -0700)]
Update Gabriel's email address in AUTHORS

Fiona Glaser [Tue, 30 Jun 2009 22:20:32 +0000 (15:20 -0700)]
Early termination for chroma encoding
Faster chroma encoding by terminating early if heuristics indicate that the block will be DC-only.
This works because the vast majority of inter chroma blocks have no coefficients at all, and those that do are almost always DC-only.
Add two new helper DSP functions for this: dct_dc_8x8 and var2_8x8.  mmx/sse2/ssse3 versions of each.
Early termination is disabled at very low QPs because it is not useful there.
Performance increase is ~1-2% without trellis, up to 5-6% with trellis=2.
Increase is greater with lower bitrates.

David Conrad [Fri, 26 Jun 2009 20:09:44 +0000 (13:09 -0700)]
Fix bug in checkasm
frame_init_lowres_core check didn't check the C plane.
However, all x86 and PPC assembly was correct regardless of the unit test being incorrect.

Fiona Glaser [Wed, 24 Jun 2009 21:39:15 +0000 (14:39 -0700)]
Add subpartition cost for sub-8x8 blocks
Improves sub-p8x8 mode decision.

Fiona Glaser [Wed, 24 Jun 2009 20:24:18 +0000 (13:24 -0700)]
Yet more CABAC and CAVLC optimizations
Also clean up a lot of pointless code duplication in CAVLC MV coding.

Fiona Glaser [Sat, 20 Jun 2009 01:49:55 +0000 (18:49 -0700)]
Various CABAC optimizations and cleanups
Faster CABAC CBF context calculation for inter blocks.
Add x264_constant_p(), will probably be useful in the future as well.
Simpler subpartition functions.
Clean up and optimize mvd_cpn a bit more.
Various other minor optimizations.

David Wolstencroft [Sat, 20 Jun 2009 19:42:55 +0000 (21:42 +0200)]
AltiVec version of frame_init_lowres_core. 22.4x faster than C on PPC7450 and 25x on PPC970MP.

Fiona Glaser [Fri, 19 Jun 2009 23:03:18 +0000 (16:03 -0700)]
MMX CABAC mvd sum calculation
Faster CABAC mvd coding.

Fiona Glaser [Fri, 19 Jun 2009 23:02:39 +0000 (16:02 -0700)]
Faster MV prediction
Smaller code size, plus I get to use goto.

Fiona Glaser [Wed, 10 Jun 2009 17:37:01 +0000 (10:37 -0700)]
Fix potential crash in checkasm
ssim_end4_sse2 requires aligned sums

Fiona Glaser [Wed, 10 Jun 2009 17:11:00 +0000 (10:11 -0700)]
SSSE3, faster SSE2/MMX integral_init4v
The real reason I wrote this was an excuse to use shufpd.

Mike Frysinger [Thu, 11 Jun 2009 08:29:27 +0000 (08:29 +0000)]
configure check for uclinux

Loren Merritt [Thu, 11 Jun 2009 08:27:46 +0000 (08:27 +0000)]
fix a crash on frame width <= 48 pixels

Loren Merritt [Wed, 27 May 2009 20:47:18 +0000 (20:47 +0000)]
configure check for cc, rather than reporting lack of compiler as an asm error.
configure check for -mno-cygwin, since it's removed from gcc4.

Loren Merritt [Sun, 24 May 2009 05:01:26 +0000 (05:01 +0000)]
a better way to keep track of mv candidates.
2-4% faster dia, hex, and umh.

Loren Merritt [Sun, 24 May 2009 05:01:19 +0000 (05:01 +0000)]
reorder some motion estimation patterns.
this change is useless on its own, but segregates the bitstream-changing part out of my next optimization.

Loren Merritt [Mon, 25 May 2009 23:16:05 +0000 (19:16 -0400)]
Fix VBV warning broken in r915
x264 will now correctly warn about maxrate specified without bufsize even when a level is not set.

Loren Merritt [Mon, 25 May 2009 07:03:10 +0000 (07:03 +0000)]
configure check for ssse3-capable binutils

Fiona Glaser [Sun, 24 May 2009 20:58:08 +0000 (16:58 -0400)]
Fix 10L in r1155
Broke --me esa/tesa due to forgetting to add handling for x264_cost_mv_fpel.

Fiona Glaser [Sat, 23 May 2009 04:28:15 +0000 (21:28 -0700)]
Fix bug where satd was incorrectly used with subme<=1
Faster subme<=1 with i4x4 enabled.

Fiona Glaser [Sat, 23 May 2009 03:40:27 +0000 (20:40 -0700)]
Remove some pointless error handling code in cabac/cavlc

Fiona Glaser [Sat, 23 May 2009 01:40:12 +0000 (18:40 -0700)]
Save some memory on mv cost arrays
Have quantizers that use the same lambda share the same cost array.

Fiona Glaser [Fri, 22 May 2009 23:57:33 +0000 (16:57 -0700)]
Various CABAC and CAVLC optimizations
Backport CAVLC partial-inlining early termination to CABAC (~2-4% faster CABAC residual coding)

Loren Merritt [Tue, 19 May 2009 02:47:15 +0000 (02:47 +0000)]
fix a race condition at the end of thread_input

Fiona Glaser [Tue, 19 May 2009 02:40:45 +0000 (22:40 -0400)]
Various trellis speed optimizations

Fiona Glaser [Sat, 16 May 2009 19:16:34 +0000 (12:16 -0700)]
Make i686 the default arch on x86_32
Disabling asm will default to a generic arch.
Also fix configure for gcc 4.4.

Fiona Glaser [Sat, 16 May 2009 03:07:59 +0000 (20:07 -0700)]
Faster signed golomb coding
3% faster CAVLC RDO and bitstream writing.
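
For reference, a straightforward (unoptimized) sketch of signed Exp-Golomb coding as defined by H.264's se(v) descriptor: the signed value is mapped to an unsigned code number, which is then coded as ue(v). The function names are hypothetical and this is not the optimized routine from the commit.

```c
#include <stdint.h>

/* Map a signed value to the unsigned code number used by se(v):
 * positive v -> 2v - 1, zero or negative v -> -2v.
 * Intended for small magnitudes; 2 * v must not overflow int32_t. */
static uint32_t se_to_code_num(int32_t v)
{
    return v > 0 ? (uint32_t)(2 * v - 1) : (uint32_t)(-2 * v);
}

/* Length in bits of the Exp-Golomb codeword for code_num:
 * 2 * floor(log2(code_num + 1)) + 1. */
static int ue_size(uint32_t code_num)
{
    int bits = 0;
    uint32_t tmp = code_num + 1;
    while (tmp > 1) {
        tmp >>= 1;
        bits++;
    }
    return 2 * bits + 1;
}

/* Length in bits of the se(v) codeword for a signed value. */
static int se_size(int32_t v)
{
    return ue_size(se_to_code_num(v));
}
```

For example, 0, 1, -1, 2, -2 map to code numbers 0, 1, 2, 3, 4, so se_size(0) is 1 bit, se_size(1) and se_size(-1) are 3 bits, and se_size(2) and se_size(-2) are 5 bits.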

Fiona Glaser [Thu, 14 May 2009 11:11:15 +0000 (04:11 -0700)]
Faster spatial direct MV prediction
unroll/tweak col_zero_flag

Fiona Glaser [Mon, 4 May 2009 11:19:28 +0000 (04:19 -0700)]
More CABAC and CAVLC optimizations
Simplified function calling for block_residual_write_(cabac|cavlc) and improved sigmap coding.
Tried making 0/1-bit specific versions of CABAC asm, but benefit was minimal under GCC 4.3.
Helped a decent bit under 3.4, but you shouldn't be using such old versions anyways.

Fiona Glaser [Thu, 30 Apr 2009 05:54:52 +0000 (22:54 -0700)]
Various optimizations in frametype lookahead

Fiona Glaser [Mon, 27 Apr 2009 05:13:17 +0000 (22:13 -0700)]
Some cosmetics/cleanup
Move some macros to x86util.asm that should have been there to begin with.
Fix a typo that didn't cause any issues.

Guillaume Poirier [Tue, 21 Apr 2009 21:18:44 +0000 (21:18 +0000)]
fix "incompatible types in initialization" compilation issues with GCC 4.3 (which is stricter than previous compiler version)

Guillaume Poirier [Tue, 21 Apr 2009 15:32:21 +0000 (17:32 +0200)]
fix "conversions between vectors with differing element types or numbers of subparts" errors

Fiona Glaser [Sat, 18 Apr 2009 23:07:53 +0000 (16:07 -0700)]
Add "coded blocks" stat to output information.
This measures the total percentage of blocks, intra and inter, which have nonzero coefficients.
"y,uvDC,uvAC" refers to luma, chroma DC, and chroma AC blocks.
Note that skip blocks are included in this stat.

Fiona Glaser [Sat, 18 Apr 2009 06:38:29 +0000 (23:38 -0700)]
Enable asm predict_8x8_filter
I'm not entirely sure how this snuck its way out of holger's intra pred patch.

Fiona Glaser [Fri, 17 Apr 2009 13:00:39 +0000 (06:00 -0700)]
Remove various bits of dead code found by CLANG.

Fiona Glaser [Tue, 14 Apr 2009 21:47:02 +0000 (14:47 -0700)]
Slightly faster SSE4 SA8D, SSE4 Hadamard_AC, SSE2 SSIM
shufps is the most underrated SSE instruction on x86.

Fiona Glaser [Thu, 9 Apr 2009 09:14:41 +0000 (02:14 -0700)]
Various CABAC optimizations
Move calculation of b_intra out of the core residual loop and hardcode it where applicable.
Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size.  Inlining only cache_mvd is faster and significantly smaller.

Fiona Glaser [Wed, 8 Apr 2009 12:45:03 +0000 (05:45 -0700)]
CAVLC optimizations
faster bs_write_te, port CABAC context selection optimization to CAVLC.

Fiona Glaser [Sun, 5 Apr 2009 20:01:42 +0000 (13:01 -0700)]
Faster CABAC RDO
Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding,
it's faster to use a branch than a cmov.

Fiona Glaser [Tue, 31 Mar 2009 17:36:57 +0000 (10:36 -0700)]
Activate intra_sad_x3_8x8c in lookahead

Fiona Glaser [Tue, 31 Mar 2009 17:34:35 +0000 (10:34 -0700)]
MBAFF interlaced coding is not allowed in baseline profile

Fiona Glaser [Tue, 31 Mar 2009 02:30:59 +0000 (19:30 -0700)]
intra_sad_x3_8x8 assembly

Fiona Glaser [Mon, 30 Mar 2009 23:37:46 +0000 (16:37 -0700)]
intra_sad_x3_4x4 assembly

Fiona Glaser [Mon, 30 Mar 2009 11:07:50 +0000 (04:07 -0700)]
intra_sad_x3_8x8c assembly
Also fix intra_sad_x3_16x16's use of "n" as a loop variable (broke SWAP)

Fiona Glaser [Mon, 30 Mar 2009 01:27:32 +0000 (18:27 -0700)]
Shave one instruction off CABAC encode_decision
range_lps>>6 ranges from 4-7, so (range_lps>>6)-4 == (range_lps>>6) & 3
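
The identity can be checked exhaustively. The sketch below assumes the value being shifted stays within 256..511, so that the shifted result lies in 4..7 as the commit message states; the function name is hypothetical.

```c
/* Returns 1 if (r >> 6) - 4 == ((r >> 6) & 3) for every r in [256, 511],
 * i.e. whenever r >> 6 lies in 4..7; returns 0 on the first mismatch. */
static int check_range_identity(void)
{
    for (int r = 256; r <= 511; r++) {
        if (((r >> 6) - 4) != ((r >> 6) & 3))
            return 0;
    }
    return 1;
}
```

Subtracting 4 from a value in 4..7 just clears bit 2, which is exactly what masking with 3 does, so the subtraction can be replaced by the AND.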

Fiona Glaser [Fri, 27 Mar 2009 05:22:23 +0000 (22:22 -0700)]
Faster probe_skip
Add a second chroma threshold after the DC transform.

Fiona Glaser [Thu, 19 Mar 2009 19:28:21 +0000 (12:28 -0700)]
Add missing "static" qualifier to two arrays
Should slightly improve performance.

Fiona Glaser [Tue, 17 Mar 2009 18:01:57 +0000 (11:01 -0700)]
SSE2 zigzag_interleave
Replace PHADD with FastShuffle (more accurate naming).
This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.

Fiona Glaser [Tue, 10 Mar 2009 06:37:53 +0000 (23:37 -0700)]
Faster integral_init
palignr to avoid unaligned loads is worth it in inith, but not initv.

Holger Lubitz [Mon, 9 Mar 2009 21:05:16 +0000 (14:05 -0700)]
Faster SSSE3 hpel_filter_v
~10% faster hpel_filter on 64-bit Penryn.
32-bit version by Fiona Glaser.