git.sesse.net Git - x264/log
x264
11 years ago
Anton Mitrofanov [Tue, 26 Mar 2013 15:54:36 +0000 (19:54 +0400)]
Fix y4m input with C420paldv colorspace

11 years ago
Fiona Glaser [Sat, 2 Mar 2013 09:22:29 +0000 (01:22 -0800)]
x86: correctly check stack alignment for Atom hadamard_ac

Regression in r2265 (only affected compilers with broken stack alignment,
like ICL on win32).

11 years ago
Loren Merritt [Mon, 25 Feb 2013 21:23:55 +0000 (21:23 +0000)]
x86inc: fix some corner cases of SWAP

SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation

11 years ago
Fiona Glaser [Wed, 27 Feb 2013 21:30:22 +0000 (13:30 -0800)]
Fix array overreads that caused miscompilation in gcc 4.8

11 years ago
Fiona Glaser [Thu, 28 Feb 2013 21:32:37 +0000 (13:32 -0800)]
Fix undefined behavior in x264_ratecontrol_mb

11 years ago
Stefan Groenroos [Fri, 1 Mar 2013 20:35:34 +0000 (22:35 +0200)]
ARM: Fix bug in x264_quant_4x4x4_neon

Regression in r2273.

11 years ago
Stefan Groenroos [Mon, 25 Feb 2013 21:43:09 +0000 (23:43 +0200)]
ARM: update NEON mc_chroma to work with NV12 and re-enable it

Up to 10-15% faster overall.

11 years ago
Fiona Glaser [Thu, 14 Feb 2013 23:00:48 +0000 (15:00 -0800)]
CABAC/CAVLC: use the new bit-iterating macro here too

11 years ago
Fiona Glaser [Fri, 8 Feb 2013 23:34:38 +0000 (15:34 -0800)]
quant_4x4x4: quant one 8x8 block at a time

This reduces overhead and lets us use less branchy code for zigzag, dequant,
decimate, and so on.
Reorganize and optimize a lot of macroblock_encode using this new function.
~1-2% faster overall.

Includes NEON and x86 versions of the new function.
Using larger merged functions like this will also make wider SIMD, like
AVX2, more effective.
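As a rough C illustration of the idea (not the actual x264 code): one call quantizes the four 4x4 blocks that make up an 8x8 block and returns a bitmask of which blocks still hold nonzero coefficients, so later stages can test the mask once instead of branching per block. The function names, the coefficient types, and the 16-bit fixed-point rounding are assumptions made for this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: quantize one coefficient using a 16-bit
 * fixed-point multiplier `mf` (value / 65536) and a rounding offset
 * `bias`. Returns 1 if the quantized coefficient is nonzero. */
static int quant_one(int16_t *coef, uint16_t mf, uint16_t bias)
{
    int32_t c = *coef;
    uint32_t mag = (uint32_t)(c < 0 ? -c : c);
    /* 64-bit intermediate so (mag + bias) * mf cannot overflow. */
    int32_t q = (int32_t)(((uint64_t)(mag + bias) * mf) >> 16);
    *coef = (int16_t)(c < 0 ? -q : q);
    return q != 0;
}

/* Quantize four 4x4 blocks (one 8x8 block) in a single call.
 * Bit j of the result is set when block j kept a nonzero coefficient. */
static int quant_4x4x4_sketch(int16_t dct[4][16],
                              const uint16_t mf[16],
                              const uint16_t bias[16])
{
    int nz_mask = 0;
    for (int j = 0; j < 4; j++) {
        int nz = 0;
        for (int i = 0; i < 16; i++)
            nz |= quant_one(&dct[j][i], mf[i], bias[i]);
        nz_mask |= nz << j;
    }
    return nz_mask;
}
```

A caller can then skip zigzag and dequant for every block whose bit is clear in the returned mask.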

11 years ago
Stephen Hutchinson [Wed, 13 Feb 2013 02:55:43 +0000 (21:55 -0500)]
Add AvxSynth support to the AviSynth input module.

Uses dlopen to load AvxSynth on Linux and OS X.

Allows the use of --demuxer avs for AvxSynth, though the only source filter it
can currently use is FFMS2.

Add a local copy of avxsynth_c.h and its dependent headers in extras/ so that
users don't need to actually have AvxSynth development headers installed to
enable support for it (mirroring the AviSynth behavior).

Based on a patch by 0x09 (tab@lavabit.com)

11 years ago
Fiona Glaser [Fri, 8 Feb 2013 08:13:15 +0000 (00:13 -0800)]
Eliminate some branchiness in ME/analysis

Faster, fewer branch mispredictions.

11 years ago
Fiona Glaser [Thu, 7 Feb 2013 00:55:39 +0000 (16:55 -0800)]
Fix some store forwarding stalls
There are quite a few others, but fixing most of them doesn't help, or there's
no easy way to avoid them.

11 years ago
Fiona Glaser [Tue, 5 Feb 2013 09:23:23 +0000 (01:23 -0800)]
x86: faster AVX satd/sa8d/sa8d_satd/hadamard_ac

Use Conroe-style movddup in AVX transforms; both Sandy Bridge and Bulldozer
do movddup in the load unit, so it's totally free this way.

On Sandy Bridge:
~6% faster sa8d_satd
~5% faster hadamard_ac
~9% faster 32-bit satd
~2% faster sa8d

11 years ago
Fiona Glaser [Sat, 2 Feb 2013 20:37:08 +0000 (12:37 -0800)]
x86: detect Bobcat, improve Atom optimizations, reorganize flags

The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this
and apply the appropriate flags.

It also has an extremely slow palignr instruction; create a flag for this to
avoid massive penalties on palignr-heavy functions.

Improve Atom function selection and document exactly what the SLOW_ATOM flag
covers.

Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3
optimizations with the sse2 algorithm to avoid pmaddubsw, which is slow on
Atom along with other SIMD multiplies.

Drop TBM detection; it'll probably never be useful for x264.

Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe).

Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.

11 years ago
Oskar Arvidsson [Sat, 19 Jan 2013 00:47:09 +0000 (01:47 +0100)]
x86: combined SA8D/SATD dsp function

Speedup is most apparent for 8-bit (~30%), but gives some improvements
for 10-bit too (~12%).
64-bit only for now.

11 years ago
Oskar Arvidsson [Tue, 29 Jan 2013 22:44:32 +0000 (23:44 +0100)]
x86: port SSE2+ SATD functions to high bit depth

Makes SATD 20-50% faster across all partition sizes but 4x4.

11 years ago
Oskar Arvidsson [Wed, 6 Feb 2013 01:07:53 +0000 (02:07 +0100)]
x86: faster high bit depth ssd

About 15% faster on average.

11 years ago
Fiona Glaser [Sat, 19 Jan 2013 06:55:46 +0000 (22:55 -0800)]
x86: optimize and clean up predictor checking
Branchlessly handle elimination of candidates in MMX roundclip asm.
Add a new asm function, similar to roundclip, except without the round part.
Optimize and organize the C code, and make both subme>=3 and subme<3 consistent.
Add lots of explanatory comments and try to make things a little more understandable.
~5-10% faster with subme>=3, ~15-20% faster with subme<3.

11 years ago
Fiona Glaser [Tue, 22 Jan 2013 20:31:55 +0000 (12:31 -0800)]
Fix two bugs in predictor checking
pmv wasn't checked properly in some cases, and neither was the zero vector.
Output-changing portion of the following patch.

11 years ago
Fiona Glaser [Thu, 10 Jan 2013 21:15:52 +0000 (13:15 -0800)]
Improve lookahead-threads auto selection
Smarter decision to improve fast-first-pass performance in 2-pass encodes.
Dramatically improves CPU utilization on multi-core systems.

Tested on a quad-core Ivy Bridge (12 threads, 1080p):
Fast first pass:
veryfast:     ~7% faster
faster:      ~11% faster
fast/medium: ~15% faster
slow/slower: ~42% faster
veryslow:    ~55% faster
CRF/1-pass:
veryfast:     ~9% faster
(all others remained the same)

11 years ago
Henrik Gramner [Sun, 27 Jan 2013 22:01:59 +0000 (23:01 +0100)]
x86: Use SSE instead of SSE2 for copying data

Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu.
Also merge MMX and SSE versions of memcpy_aligned into a single macro.

11 years ago
Henrik Gramner [Sun, 13 Jan 2013 17:27:08 +0000 (18:27 +0100)]
64-bit cabac optimizations

~4% faster PIC

WIN64:
~3% faster and 16 bytes shorter cabac_encode_bypass
~8% faster cabac_encode_terminal
Benchmarked on Ivy Bridge

UNIX64:
One instruction less in cabac_encode_bypass

11 years ago
Mike Gorchak [Sun, 3 Feb 2013 07:35:00 +0000 (23:35 -0800)]
configure: add QNX support

11 years ago
Henrik Gramner [Sun, 20 Jan 2013 18:35:06 +0000 (19:35 +0100)]
Windows: Enable DEP and ASLR

11 years ago
Henrik Gramner [Thu, 17 Jan 2013 18:17:24 +0000 (19:17 +0100)]
x86inc: Set ELF hidden visibility for global constants

11 years ago
Diego Biurrun [Thu, 17 Jan 2013 10:18:31 +0000 (11:18 +0100)]
x86inc: Add cvisible macro for C functions with public prefix

This allows defining externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>

11 years ago
Diego Biurrun [Thu, 17 Jan 2013 19:30:37 +0000 (11:30 -0800)]
x86inc: rename program_name to private_prefix
Synced from libav.
The new name is more descriptive and will allow defining a separate public
prefix for externally visible library symbols.

11 years ago
Fiona Glaser [Mon, 14 Jan 2013 13:35:30 +0000 (05:35 -0800)]
x264.h: improve x264_encoder_reconfig documentation

11 years ago
Henrik Gramner [Sat, 16 Feb 2013 18:36:50 +0000 (19:36 +0100)]
Cosmetics: stricter definition of parameterless functions

11 years ago
Neil [Mon, 28 Jan 2013 02:47:38 +0000 (10:47 +0800)]
Update "Install and compile x264" in doc/regression_test.txt

11 years ago
Anton Mitrofanov [Thu, 24 Jan 2013 08:11:26 +0000 (12:11 +0400)]
Fix possible non-determinism with mbtree + open-gop + sync-lookahead

Code assumed keyframe analysis would only pull one frame off the list; this
isn't true with open-gop.

11 years ago
Anton Mitrofanov [Mon, 25 Feb 2013 15:28:19 +0000 (19:28 +0400)]
x86: don't use the red zone on win64

11 years ago
Fiona Glaser [Mon, 11 Feb 2013 00:12:34 +0000 (16:12 -0800)]
x86-64: fix trellis asm with interlacing

Regression in r2145.
Assembly assumed array was [2][64] when it was actually [2][63].
Tiny (~0.1%) compression improvement.
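The mismatch is easy to see in C terms. In a row-major two-dimensional array the start of row r sits at r times the row length, so code that hard-codes 64 elements per row lands one element past the real start of row 1 when the array is declared [2][63]. The helper names below are illustrative only and are not taken from x264.

```c
#include <stddef.h>

/* Element offset of table[row][col] in a row-major 2-D array
 * whose rows hold `cols` elements each. */
static size_t element_offset(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}

/* How far off (in elements) an access to the start of `row` is when
 * code assumes `assumed_cols` per row but the array was declared with
 * `actual_cols`. */
static ptrdiff_t stride_error(size_t row, size_t assumed_cols,
                              size_t actual_cols)
{
    return (ptrdiff_t)element_offset(row, 0, assumed_cols)
         - (ptrdiff_t)element_offset(row, 0, actual_cols);
}
```

Row 0 is unaffected, which is one way such a bug can go unnoticed when only the second row is used in a less common mode.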

11 years ago
Ronald S. Bultje [Wed, 30 Jan 2013 17:48:14 +0000 (09:48 -0800)]
x86-32: use simple nop codes for <= sse

The "CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags:
fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng
rng_en ace ace_en) SIGILLs on long nop codes.

11 years ago
Loren Merritt [Tue, 8 Jan 2013 21:30:57 +0000 (21:30 +0000)]
Bump dates to 2013

11 years ago
Henrik Gramner [Mon, 17 Dec 2012 20:54:00 +0000 (21:54 +0100)]
x86inc: Drop tzcnt workaround

It is no longer needed now that we've bumped the version requirement of yasm to 1.2.0.

11 years ago
Fiona Glaser [Mon, 12 Nov 2012 18:28:53 +0000 (10:28 -0800)]
AVX2/FMA3 version of mbtree_propagate
First AVX2 function for testing.
Bump yasm version to 1.2.0 for AVX2 support.

11 years ago
Henrik Gramner [Tue, 11 Dec 2012 15:05:34 +0000 (16:05 +0100)]
x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that were missing before.

11 years ago
Loren Merritt [Sun, 2 Dec 2012 15:56:30 +0000 (15:56 +0000)]
x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition.
REP_RET is still needed manually when it's a branch target, but that's much rarer.
The implementation involves lots of spurious labels, but that's ok because we strip them.

11 years ago
Ronald S. Bultje [Thu, 6 Dec 2012 23:40:13 +0000 (15:40 -0800)]
x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in 8-bit loopfilter functions so they can be used if
there is no aligned stack (e.g. x86-32 MSVC or ICC 10.x).

11 years ago
Henrik Gramner [Mon, 17 Dec 2012 21:15:02 +0000 (22:15 +0100)]
Update config.guess and config.sub

11 years ago
Anton Mitrofanov [Tue, 8 Jan 2013 21:29:49 +0000 (13:29 -0800)]
Fix crash if the first frame is forced to a non-keyframe
This is obviously bad user input, but x264 shouldn't crash if it happens.

11 years ago
Bernhard Rosenkränzer [Sun, 30 Dec 2012 20:18:00 +0000 (12:18 -0800)]
Fix build on ARM with binutils >= 2.23.51.0.6
GAS doesn't seem to like spaces in vld1 anymore, so remove those.

11 years ago
Anton Mitrofanov [Fri, 23 Nov 2012 14:26:53 +0000 (18:26 +0400)]
Fix pthread_join emulation on win32 and BeOS
Doesn't actually affect x264, but it's more correct.

11 years ago
Fiona Glaser [Tue, 27 Nov 2012 15:50:51 +0000 (07:50 -0800)]
Fix typo in r2222
Slightly wrong numbers in level table.

11 years ago
Sergio Basto [Fri, 23 Nov 2012 02:02:50 +0000 (18:02 -0800)]
configure: fix gpac detection with -Wp,-D_FORTIFY_SOURCE=2

11 years ago
Sean McGovern [Fri, 23 Nov 2012 02:01:16 +0000 (18:01 -0800)]
Solaris: use sysconf to get processor count
Solaris responds correctly to the same value as Cygwin, so let's use that.
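A minimal sketch of a sysconf-based processor count. The commit message does not name the value it queries; `_SC_NPROCESSORS_ONLN` is assumed here because it is the usual choice on Solaris, Linux, and Cygwin, and the function name and the fallback to 1 are likewise illustrative.

```c
#include <unistd.h>

/* Number of processors currently online, or 1 if the query fails.
 * _SC_NPROCESSORS_ONLN is a widely supported extension rather than
 * something every POSIX system is required to provide. */
static int online_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}
```

Falling back to 1 keeps thread-count heuristics sane when sysconf returns -1.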

11 years ago
Anton Khirnov [Tue, 13 Nov 2012 20:01:24 +0000 (21:01 +0100)]
lavf input: allocate AVFrame correctly
Allocate AVFrames correctly with avcodec_alloc_frame().
The previous allocation caused crashes with newer libavcodecs that try to free frame extradata.

11 years ago
Anton Mitrofanov [Sat, 10 Nov 2012 23:44:02 +0000 (03:44 +0400)]
Fix crash when using libx264.dll compiled with ICL for X86_64

11 years ago
Anton Mitrofanov [Thu, 8 Nov 2012 22:31:10 +0000 (02:31 +0400)]
Fix possible issues with out-of-spec QP values
Fixes a possible regression in r2228.

11 years ago
Fiona Glaser [Wed, 26 Sep 2012 20:49:02 +0000 (13:49 -0700)]
Attempt to optimize PPS pic_init_qp in 2-pass mode
Small compression improvement; up to ~0.5% in extreme cases.
Helps more with small slice sizes (tiny resolutions or slice-max-size).
Note that this changes the 2-pass stats file format.

11 years ago
Fiona Glaser [Wed, 26 Sep 2012 20:05:00 +0000 (13:05 -0700)]
Improve slice header QP selection
Use the first macroblock of each slice instead of the last of the previous.
Lets us pick a reasonable initial QP for the first slice too.
Slightly improved compression.

11 years ago
Fiona Glaser [Thu, 11 Oct 2012 20:27:48 +0000 (13:27 -0700)]
Update level dpb size calculation to match newer H.264 spec
Doesn't actually change encoding behavior, but makes it more correct.
Warning messages should now be accurate at higher bit depths and non-4:2:0.
Technically, since it redefines x264_level_t, this is an API version increment.

11 years ago
Jan Ekström [Sun, 7 Oct 2012 18:12:05 +0000 (21:12 +0300)]
Add support for the ffmpeg/vapoursynth high bit depth y4m extensions

11 years ago
Diego Biurrun [Tue, 6 Nov 2012 13:48:56 +0000 (14:48 +0100)]
x86inc: Rename 3dnow2 to 3dnowext
The name "3dnowext" is more common than "3dnow2". Doesn't affect x264.

11 years ago
Diego Biurrun [Wed, 31 Oct 2012 19:23:54 +0000 (12:23 -0700)]
x86inc: only define program_name if the macro is unset.
This allows overriding the value from outside the file.
This can be useful if x86inc.asm is used outside of x264.

11 years ago
David Wolstencroft [Mon, 29 Oct 2012 16:07:39 +0000 (09:07 -0700)]
Disable ARM NEON MRC CPU test for Apple devices
The Apple A6 CPU doesn't support performance counters, so this test caused a crash.

11 years ago
Fiona Glaser [Tue, 6 Nov 2012 20:03:20 +0000 (12:03 -0800)]
Fix crash with no-scenecut + mbtree

11 years ago
Anton Mitrofanov [Fri, 12 Oct 2012 19:43:40 +0000 (23:43 +0400)]
Fix reconfiguring to crf=0
Lossless mode can't currently be enabled mid-stream.

11 years ago
Derek Buitenhuis [Mon, 17 Sep 2012 18:09:20 +0000 (11:09 -0700)]
Fix ALIGNED_ARRAY_EMU macros on ICL
ICL's preprocessor doesn't handle it correctly.
This fix is similar to libav's fix in 0db2d9.

11 years ago
Jason Martens [Thu, 13 Sep 2012 18:20:40 +0000 (11:20 -0700)]
Fix use of deprecated av_close_input_file call

11 years ago
Brad Smith [Wed, 26 Sep 2012 21:13:27 +0000 (14:13 -0700)]
Fix pkg-config for dynamic vs static linking

11 years ago
Brad Smith [Tue, 11 Sep 2012 00:52:04 +0000 (17:52 -0700)]
Set libm in the configure script if the OS has libm
Prerequisite for another configure patch after this.
Idea copied from libpthread.

11 years ago
Fiona Glaser [Thu, 16 Aug 2012 20:40:32 +0000 (13:40 -0700)]
Enhance mb_info: add mb_info_update
This feature lets the callee know which decoded macroblocks have changed.

11 years ago
Fiona Glaser [Thu, 16 Aug 2012 20:01:17 +0000 (13:01 -0700)]
Fix mb_info_free with sliced threads
x264 would free mb_info before it was completely done using it.

11 years ago
Fiona Glaser [Tue, 7 Aug 2012 19:43:26 +0000 (12:43 -0700)]
Enhance nalu_process
Add the input frame opaque pointer to the arguments.
This makes it easier to use with multiple simultaneous x264 encodes.

11 years ago
Fiona Glaser [Mon, 6 Aug 2012 21:55:35 +0000 (14:55 -0700)]
Improve mb_info constant mb optimization
Allow fast skipping even if the pskip MV isn't zero.

11 years ago
Fiona Glaser [Mon, 30 Jul 2012 19:58:34 +0000 (12:58 -0700)]
Export the average effective CRF of each frame
Useful to judge the resulting quality of a frame when VBV is enabled.

11 years ago
Brad Smith [Tue, 21 Aug 2012 06:58:19 +0000 (23:58 -0700)]
Remove special-casing for OpenBSD pthread handling
Previously it was policy to use -pthread, but OpenBSD now recommends -lpthread.
It's been libpthread anyway, and policy has changed to stop using -pthread.

11 years ago
Ronald S. Bultje [Fri, 27 Jul 2012 01:01:49 +0000 (18:01 -0700)]
x86inc: automatically insert vzeroupper for YMM functions
Backported from libav.

11 years ago
Kieran Kunhya [Tue, 24 Jul 2012 15:47:45 +0000 (08:47 -0700)]
Free user supplied data when deleting a frame
This eliminates a memory leak when calling x264_encoder_close.

11 years ago
Fiona Glaser [Wed, 18 Jul 2012 15:33:41 +0000 (08:33 -0700)]
Revert r2204
People don't seem to like this so I'm just going to get rid of it.

11 years ago
Fiona Glaser [Tue, 10 Jul 2012 21:10:44 +0000 (14:10 -0700)]
Faster predictor checking with subme<3
Fix a typo that made an early-skip less effective.
Avoid a relatively unpredictable branch.
Slightly changed output due to the typo-fix.
~50 cycles faster on Core i7.

11 years ago
Fiona Glaser [Tue, 26 Jun 2012 01:01:29 +0000 (18:01 -0700)]
Try 8x8 transform analysis even when sub8x8 partitions are present
Turn off the sub8x8 partitions, try it, and turn them back on if it didn't help.
Small compression improvement with p4x4 on (~0.1-0.5%).
Also update related comments.

11 years ago
Fiona Glaser [Sat, 9 Jun 2012 01:19:59 +0000 (18:19 -0700)]
Support changing resolutions between passes with macroblock-tree
Implement a basic separable bilinear filter to rescale the quantizer offsets.
Structure inspired by swscale, but floating-point instead of fixed-point.
Not as optimized as it could be, but it's quite fast already.

Example compression penalties on a 720p video game recording:
First pass with 720p and second as 480p: ~-1.5% (vs. same res)
First pass with 480p and second as 720p: ~-3% (vs. same res)
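A separable bilinear rescale of a floating-point plane can be sketched in C as below. This is not the x264 implementation: the function names, the centre-aligned sample positions, and the edge clamping are assumptions chosen for illustration. The same 1-D routine is run along rows and then along columns, which is what makes the filter separable.

```c
/* Resample one line of n_in samples to n_out samples by linear
 * interpolation. Strides let the same routine walk a row or a column. */
static void resample_line(const float *src, int n_in, int src_stride,
                          float *dst, int n_out, int dst_stride)
{
    for (int i = 0; i < n_out; i++) {
        /* Map the centre of output sample i into input coordinates. */
        float pos = (i + 0.5f) * n_in / n_out - 0.5f;
        if (pos < 0.0f)
            pos = 0.0f;
        if (pos > n_in - 1)
            pos = (float)(n_in - 1);
        int i0 = (int)pos;
        int i1 = i0 + 1 < n_in ? i0 + 1 : i0;   /* clamp at the edge */
        float frac = pos - i0;
        dst[i * dst_stride] = src[i0 * src_stride] * (1.0f - frac)
                            + src[i1 * src_stride] * frac;
    }
}

/* Separable rescale of a w_in x h_in plane to w_out x h_out.
 * tmp must hold w_out * h_in floats, dst w_out * h_out floats. */
static void rescale_bilinear(const float *src, int w_in, int h_in,
                             float *tmp, float *dst, int w_out, int h_out)
{
    /* Horizontal pass: every input row becomes w_out samples wide. */
    for (int y = 0; y < h_in; y++)
        resample_line(src + y * w_in, w_in, 1, tmp + y * w_out, w_out, 1);
    /* Vertical pass: every column of tmp becomes h_out samples tall. */
    for (int x = 0; x < w_out; x++)
        resample_line(tmp + x, h_in, w_out, dst + x, h_out, w_out);
}
```

Working in floating point keeps the sketch short; a fixed-point variant would trade that simplicity for speed.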

11 years ago
Alexander Prikhodko [Tue, 12 Jun 2012 17:21:35 +0000 (20:21 +0300)]
Print elapsed time in encoding progress indicator

11 years ago
Anton Mitrofanov [Sat, 2 Jun 2012 17:27:50 +0000 (21:27 +0400)]
Cap ratecontrol predictor parameters
Limits VBV mispredictions after long periods of relatively constant video.

11 years ago
Loren Merritt [Tue, 3 Jul 2012 21:38:04 +0000 (14:38 -0700)]
x86inc: import patches from libav
Allow manual invocation of WIN64_SPILL_XMM even under INIT_MMX
SSE version of mova is movaps rather than movdqa.
YMM version of movnta.
Add mp size for named arguments.
Fix DEFINE_ARGS when used outside of a cglobal.
Define a few more cpuflags.
3-argument wrappers for a few more instructions.

11 years ago
Anton Mitrofanov [Fri, 22 Jun 2012 18:02:24 +0000 (22:02 +0400)]
Fix crash with --fps 0
Fix some integer overflows and check input parameters better.
Also fix incorrect type specifiers for demuxer info printing.

12 years ago
Fiona Glaser [Tue, 8 May 2012 22:42:56 +0000 (15:42 -0700)]
Threaded lookahead

Split each lookahead frame analysis call into multiple threads.  Has a small
impact on quality, but does not seem to be consistently any worse.

This helps alleviate bottlenecks with many cores and frame threads. In many
cases, this massively increases performance on many-core systems.  For example,
over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system.
Realtime 1080p30 at --preset slow should now be feasible on real systems.

For sliced-threads, this patch should be faster regardless of settings (~10%).

By default, lookahead threads are 1/6 of regular threads.  This isn't exact,
but it seems to work well for all presets on real systems.  With sliced-threads,
it's the same as the number of encoding threads.
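The default described above can be written as a small C helper. This is only a sketch: the function name is hypothetical, and rounding down with a minimum of one thread is an assumption, since the message gives just the 1/6 ratio and the sliced-threads rule.

```c
/* Hypothetical default for the number of lookahead threads.
 * threads:        number of regular encoding threads
 * sliced_threads: nonzero when sliced-threads mode is enabled */
static int default_lookahead_threads(int threads, int sliced_threads)
{
    if (sliced_threads)
        return threads;          /* same as the encoding threads */
    int n = threads / 6;         /* roughly one sixth, rounded down */
    return n < 1 ? 1 : n;        /* assumed floor of one thread */
}
```

With 12 encoding threads this gives 2 lookahead threads, or 12 in sliced-threads mode.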

12 years ago
Anton Mitrofanov [Fri, 4 May 2012 13:18:12 +0000 (17:18 +0400)]
Add support for RGB formats in bit-depth conversion filter

12 years ago
Anton Mitrofanov [Sat, 12 May 2012 09:57:49 +0000 (13:57 +0400)]
Fix some bugs in mb_info code

12 years ago
Fiona Glaser [Thu, 29 Mar 2012 21:14:07 +0000 (14:14 -0700)]
Add mb_info API for signalling constant macroblocks
Some use-cases of x264 involve encoding video with large constant areas of the frame.
Sometimes, the caller knows which areas these are, and can tell x264.
This API lets the caller do this and adds internal tracking of modifications to macroblocks to avoid problems.
This is really only suitable without B-frames.
An example use-case would be using x264 for VNC.

12 years ago
Henrik Gramner [Fri, 6 Apr 2012 22:40:09 +0000 (00:40 +0200)]
Faster chroma weight cost calculation

New assembly function with SSE2, SSSE3 and XOP implementations for calculating absolute sum of differences.

12 years ago
Lucien [Sat, 31 Mar 2012 12:42:49 +0000 (13:42 +0100)]
Add Level 5.2 support

12 years ago
Henrik Gramner [Thu, 12 Apr 2012 17:14:43 +0000 (19:14 +0200)]
Eradicate all mention of Extended Profile
x264 never supported it and never will because nobody uses it.

12 years ago
Anton Mitrofanov [Tue, 3 Apr 2012 17:46:52 +0000 (21:46 +0400)]
Fix disabling of mbtree when using 2pass encoding and zones

12 years ago
Alexander Prikhodko [Sat, 31 Mar 2012 09:06:21 +0000 (12:06 +0300)]
configure: force select -mXX gcc option for i386/x86-64
Makes multilib compilation more convenient.

12 years ago
Rafaël Carré [Mon, 16 Apr 2012 01:20:14 +0000 (21:20 -0400)]
Update config.guess and config.sub
Adds support for a bunch of targets, including:
aarch64 (armv8)
arm-linux-androideabi

12 years ago
Alexander Prikhodko [Sat, 31 Mar 2012 08:33:41 +0000 (11:33 +0300)]
configure: correct use of RC variable and add --extra-rcflags

12 years ago
Steven Walters [Thu, 29 Mar 2012 01:15:04 +0000 (21:15 -0400)]
ICL/MSVS: Fix shared library generation and usage
MSVS requires exported variables to be declared with the DATA keyword, and requires that imported variables be declared with dllimport.
This does not, however, fix x264 cli being unable to use a shared library built by ICL.

12 years ago
Kieran Kunhya [Tue, 27 Mar 2012 16:38:56 +0000 (17:38 +0100)]
Fix intra-refresh + hrd

12 years ago
Anton Mitrofanov [Sun, 25 Mar 2012 13:34:24 +0000 (17:34 +0400)]
Fix frame input colorspace check

12 years ago
Fiona Glaser [Thu, 22 Mar 2012 20:56:50 +0000 (13:56 -0700)]
Fix comment in deblock.c
The code does, in fact, handle CAVLC+8x8dct correctly already.

12 years ago
Fiona Glaser [Tue, 13 Mar 2012 21:37:26 +0000 (14:37 -0700)]
Fix sliced-threads ratecontrol bug
Was using qp instead of qscale; could cause NaNs (not to mention less accurate results).

12 years ago
Anton Mitrofanov [Mon, 12 Mar 2012 06:08:18 +0000 (23:08 -0700)]
Fix clobbering of mutex/cvs
Regression in r2183.
Bizarrely seemed to work on many platforms, but crashed on win64 and may have been slower.
Only affected sliced threads during encoding, but could cause crashes on x264 encoder close even without sliced threads.

12 years ago
Fiona Glaser [Fri, 24 Feb 2012 21:34:39 +0000 (13:34 -0800)]
Sliced-threads: do hpel and deblock after returning
Lowers encoding latency around 14% in sliced threads mode with preset superfast.
Additionally, even if there is no waiting time between frames, this improves parallelism, because hpel+deblock are done during the (single-threaded) lookahead.
For ease of debugging, dump-yuv forces all of the threads to wait and finish instead of setting b_full_recon.

12 years ago
Fiona Glaser [Fri, 24 Feb 2012 21:16:52 +0000 (13:16 -0800)]
Add full-recon API option
Fully reconstruct frames even without dump-yuv.

12 years ago
Fiona Glaser [Wed, 22 Feb 2012 21:33:36 +0000 (13:33 -0800)]
x86inc: switch to amdnops
Recent AMD CPUs' instruction decoders choke horribly on extremely long nops (i.e. with 4 prefixes).
Won't affect much, since we don't use ALIGN much.

12 years ago
Fiona Glaser [Wed, 15 Feb 2012 00:54:03 +0000 (16:54 -0800)]
BMI1 decimate functions
Intel was nice enough to make tzcnt equal to "rep bsf", which is backwards-compatible.
This means we don't actually have to add new functions to make it work.
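For context, tzcnt is encoded as the bsf opcode with an F3 (rep) prefix, so a CPU without BMI1 ignores the prefix and runs plain bsf. The two agree for any nonzero input; they differ only for zero, where tzcnt returns the operand width and bsf leaves the destination undefined. The C sketch below mirrors that behaviour with the GCC/Clang builtin `__builtin_ctz`; the function names and the bit-visiting loop are illustrative and are not x264's decimate code.

```c
#include <stdint.h>

/* Index of the lowest set bit, using the tzcnt convention that a zero
 * input yields the operand width (32). __builtin_ctz is undefined for
 * 0, much like bsf, hence the explicit check. */
static int trailing_zeros32(uint32_t x)
{
    return x ? __builtin_ctz(x) : 32;
}

/* Record the position of every set bit in `mask`, lowest first.
 * Returns the number of positions written to `out`. */
static int set_bit_positions(uint32_t mask, int out[32])
{
    int n = 0;
    while (mask) {
        out[n++] = trailing_zeros32(mask);
        mask &= mask - 1;   /* clear the lowest set bit */
    }
    return n;
}
```

Code that only ever scans nonzero masks, as in the loop above, gets the same result from either instruction.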