git.sesse.net Git - x264/log
x264
Janne Grunau [Wed, 12 Mar 2014 13:35:31 +0000 (14:35 +0100)]
arm: implement x264_pixel_var2_8x16_neon

checkasm --bench on a cortex-a9:
var2_8x16_c: 5677
var2_8x16_neon: 1421

Janne Grunau [Wed, 12 Mar 2014 12:16:00 +0000 (13:16 +0100)]
arm: implement x264_pixel_var_8x16_neon

checkasm --bench on a cortex-a9:
var_8x16_c: 4306
var_8x16_neon: 791
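
For orientation, here is a plain-C sketch of the kind of work an 8x16 variance helper does: it accumulates the sum and the sum of squares over the block. The names `var_acc_t`, `var_8x16_ref` and `var_8x16_scaled` are hypothetical and chosen for illustration; this is not claimed to be x264's actual C fallback or its return convention.

```c
#include <stdint.h>

/* Hypothetical accumulator: sum and sum of squares of an 8x16 block. */
typedef struct { uint32_t sum; uint32_t sqr; } var_acc_t;

/* Scalar reference loop over 16 rows of 8 eight-bit pixels. */
static var_acc_t var_8x16_ref(const uint8_t *pix, intptr_t stride)
{
    var_acc_t acc = { 0, 0 };
    for (int y = 0; y < 16; y++, pix += stride)
        for (int x = 0; x < 8; x++) {
            acc.sum += pix[x];
            acc.sqr += pix[x] * pix[x];
        }
    return acc;
}

/* Variance scaled by the pixel count (128): sqr - sum^2 / 128. */
static uint32_t var_8x16_scaled(var_acc_t acc)
{
    return acc.sqr - (uint32_t)(((uint64_t)acc.sum * acc.sum) >> 7);
}
```

A flat block yields a scaled variance of zero; a SIMD version would compute the same two accumulators several pixels at a time.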

Henrik Gramner [Sun, 23 Feb 2014 14:33:48 +0000 (15:33 +0100)]
x86: SSE2 and SSSE3 plane_copy_deinterleave_rgb

About 5.6x faster than C on Haswell.
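
As a rough illustration of the operation being vectorized, the scalar sketch below splits packed pixels into three planes. The function name, the parameter order and the single shared destination stride are assumptions made for brevity, not the signature used in x264.

```c
#include <stdint.h>

/* Hypothetical scalar version: split packed pixels (pw bytes per pixel,
 * 3 for RGB, 4 for layouts with a padding/alpha byte) into three planes. */
static void deinterleave_rgb_ref(uint8_t *dsta, uint8_t *dstb, uint8_t *dstc,
                                 intptr_t dst_stride,
                                 const uint8_t *src, intptr_t src_stride,
                                 int pw, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            dsta[x] = src[x * pw + 0];
            dstb[x] = src[x * pw + 1];
            dstc[x] = src[x * pw + 2];
        }
        dsta += dst_stride;
        dstb += dst_stride;
        dstc += dst_stride;
        src  += src_stride;
    }
}
```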

Henrik Gramner [Sun, 16 Feb 2014 20:24:54 +0000 (21:24 +0100)]
x86: Minor mbtree_propagate_cost improvements

Reduce the number of registers used from 7 to 6.
Reduce the number of vector registers used by the AVX2 implementation from 8 to 7.
Multiply fps_factor by 1/256 once per frame instead of once per macroblock row.
Use mova instead of movu for dst since it's guaranteed to be aligned.
Some cosmetics.

Henrik Gramner [Sun, 9 Feb 2014 22:58:04 +0000 (23:58 +0100)]
x86inc: Support arbitrary stack alignments

If the stack is known to be at least 32-byte aligned we can safely store ymm
registers on the stack without doing manual alignment.

Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
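
The "align first, then allocate" order can be sketched with ordinary address arithmetic. This is only a model of the idea in C, assuming a downward-growing stack and a power-of-two alignment; `align_down` and `alloc_stack_sketch` are hypothetical names and do not reproduce the ALLOC_STACK macro itself.

```c
#include <stdint.h>

/* Round an address down to a power-of-two boundary. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}

/* Align the stack pointer first, then reserve `size` bytes, rounded up to
 * the alignment so that the resulting pointer stays aligned. */
static uintptr_t alloc_stack_sketch(uintptr_t sp, uintptr_t size, uintptr_t align)
{
    uintptr_t aligned = align_down(sp, align);
    uintptr_t padded  = (size + align - 1) & ~(align - 1);
    return aligned - padded;
}
```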

Anton Mitrofanov [Fri, 14 Feb 2014 11:53:58 +0000 (15:53 +0400)]
x86inc: warn if XOP integer FMA instruction emulation is impossible

Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.

ffmpeg has an x86util emulation for that case; I'll add it if x264's asm ever
needs it.

Also add pmacsdql emulation.
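
The aliasing problem described above can be modelled in C with arrays standing in for vector registers. The sketch assumes a four-operand multiply-accumulate of the form dst = a * b + c on 32-bit lanes, emulated as a multiply followed by an add; `emulate_macc32` is a hypothetical name, and the real logic lives in assembler macros.

```c
#include <stdint.h>

enum { LANES = 4 };

/* Two-step emulation of dst = a * b + c with wrapping 32-bit lanes.
 * Returns -1 when dst aliases c (arguments 1 and 4 are the same register):
 * step 1 would overwrite the addend before step 2 could read it, so a
 * temporary would be needed. */
static int emulate_macc32(uint32_t *dst, const uint32_t *a,
                          const uint32_t *b, const uint32_t *c)
{
    if (dst == c)
        return -1;
    for (int i = 0; i < LANES; i++)
        dst[i] = a[i] * b[i];   /* step 1: multiply into dst */
    for (int i = 0; i < LANES; i++)
        dst[i] += c[i];         /* step 2: add the fourth operand */
    return 0;
}
```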

Loren Merritt [Sat, 1 Mar 2014 02:57:56 +0000 (02:57 +0000)]
x86inc: free up variable name "n" in global namespace

Henrik Gramner [Wed, 22 Jan 2014 18:09:12 +0000 (19:09 +0100)]
x86: Pass -Worphan-labels to yasm

Makes it easier to detect typos.

Steve Lhomme [Sun, 16 Feb 2014 12:15:09 +0000 (13:15 +0100)]
Write 3D metadata when outputting Matroska

For when --frame-packing is set.

Anton Mitrofanov [Sun, 23 Feb 2014 12:56:03 +0000 (16:56 +0400)]
Don't set chroma_loc_info_present_flag for non-4:2:0

The H.264 spec says it shouldn't be set in these cases.

Fiona Glaser [Mon, 10 Mar 2014 15:42:50 +0000 (08:42 -0700)]
x264.h: fix documentation

The full details of the return values of encoder_encode and encoder_headers
were mistakenly removed a while ago; re-add them.

Anton Mitrofanov [Sun, 23 Feb 2014 11:52:57 +0000 (15:52 +0400)]
Fix pointer cast warning for 64-bit builds

Anton Mitrofanov [Mon, 10 Mar 2014 12:48:02 +0000 (16:48 +0400)]
mbaff: fix mb_field_decoding_flag tracking and simplify allow skip check

Fixes an issue with too many forced non-skips in mbaff+cavlc, as well as
non-deterministic output with mbaff+cavlc+sliced-threads.

Anton Mitrofanov [Sun, 9 Mar 2014 23:22:57 +0000 (03:22 +0400)]
Fix memory overwrite in x264_deblock_h_chroma_mbaff_sse2

Fixes possible corruption with MBAFF+sliced threads.

Fiona Glaser [Sun, 2 Mar 2014 18:09:01 +0000 (10:09 -0800)]
Fix corruption with CAVLC overflow handling in MBAFF+main profile

Probably a regression in r2178.

Anton Mitrofanov [Mon, 10 Mar 2014 17:17:19 +0000 (21:17 +0400)]
Fix checkasm --bench output when nop_cycles is too large

Anton Mitrofanov [Wed, 22 Jan 2014 08:54:49 +0000 (12:54 +0400)]
Really fix quantization factor allocation

Actually allocate less (instead of just initialize less) and fix comments.

Yu Xiaolei [Sun, 23 Feb 2014 12:12:51 +0000 (04:12 -0800)]
Fix build with Android NDK

Android NDK does not expose sched_getaffinity.

Loren Merritt [Thu, 16 Jan 2014 21:34:46 +0000 (13:34 -0800)]
x86inc: speed up compilation with yasm

Work around yasm's inefficiency with handling large numbers of variables
in the global scope.

Kieran Kunhya [Fri, 10 Jan 2014 23:27:33 +0000 (23:27 +0000)]
Add support for AVC-Intra Class 200

James Weaver [Tue, 7 Jan 2014 10:31:58 +0000 (10:31 +0000)]
v210 input support

Assembly based on code by Henrik Gramner and Loren Merritt.
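
For readers unfamiliar with the format: v210 packs three 10-bit samples into each little-endian 32-bit word, and four such words (16 bytes) carry six 4:2:2 pixels. The helper below is a minimal sketch of unpacking one word; `v210_unpack_word` is a hypothetical name and is not taken from the patch.

```c
#include <stdint.h>

/* Each 32-bit v210 word carries three 10-bit samples in bits 0-9, 10-19
 * and 20-29; the top two bits are padding. */
static void v210_unpack_word(uint32_t word, uint16_t out[3])
{
    out[0] =  word        & 0x3ff;
    out[1] = (word >> 10) & 0x3ff;
    out[2] = (word >> 20) & 0x3ff;
}
```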

Fiona Glaser [Tue, 21 Jan 2014 21:39:33 +0000 (13:39 -0800)]
Fix quantization factor allocation

We don't need to wastefully allocate quant tables above QP_MAX_SPEC; they're
never used.

Henrik Gramner [Wed, 8 Jan 2014 00:06:56 +0000 (01:06 +0100)]
Avoid some unnecessary memory loads in macroblock_encode

Henrik Gramner [Sun, 5 Jan 2014 14:25:05 +0000 (15:25 +0100)]
Bump dates to 2014

Also update AUTHORS file and my e-mail address in the headers of various files.

Henrik Gramner [Sun, 5 Jan 2014 23:18:31 +0000 (00:18 +0100)]
Remove tools/xyuv.c

It's an old stand-alone application that isn't relevant to x264.

Anton Mitrofanov [Wed, 6 Nov 2013 22:37:23 +0000 (02:37 +0400)]
Use 8x16c wrappers with x86 asm functions for 4:2:2 with high bit depth

Henrik Gramner [Fri, 20 Dec 2013 21:44:28 +0000 (22:44 +0100)]
CLI: Avoid redundant 16-bit upconversions in piped raw input

It's not possible to seek in pipes, so if we want to skip frames we have to read and
discard unused ones. It's pointless to do bit-depth upconversions in those frames.
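
Skipping on a non-seekable stream amounts to reading and throwing away whole frames, with no per-frame processing. A minimal sketch, assuming raw frames of a fixed byte size; `skip_frames` is a hypothetical helper, not the CLI's actual code.

```c
#include <stdio.h>
#include <stddef.h>

/* Skip `count` frames of `frame_size` bytes by reading them into a scratch
 * buffer and discarding them. No conversion is applied to skipped data.
 * Returns the number of whole frames actually skipped. */
static int skip_frames(FILE *in, size_t frame_size, int count)
{
    unsigned char scratch[4096];
    int skipped = 0;
    for (; skipped < count; skipped++) {
        size_t left = frame_size;
        while (left > 0) {
            size_t want = left < sizeof(scratch) ? left : sizeof(scratch);
            size_t got  = fread(scratch, 1, want, in);
            if (got == 0)
                return skipped;   /* EOF or read error mid-frame */
            left -= got;
        }
    }
    return skipped;
}
```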

Anton Mitrofanov [Fri, 3 Jan 2014 16:06:06 +0000 (20:06 +0400)]
Fix input support from named pipes in Windows

Steve Clark [Wed, 20 Nov 2013 17:40:23 +0000 (21:40 +0400)]
Fix ARM asm compilation with Apple assembler

Anton Mitrofanov [Wed, 13 Nov 2013 15:24:48 +0000 (19:24 +0400)]
Fix uninitialized variable

Caused if the timebase is not specified in stats file. Found by Clang.

Anton Mitrofanov [Sun, 27 Oct 2013 15:27:23 +0000 (19:27 +0400)]
Remove --visualize option.

It probably wasn't used or maintained for the last few years.

Anton Mitrofanov [Tue, 15 Oct 2013 08:32:25 +0000 (12:32 +0400)]
Add L-SMASH support as preferable alternative for MP4-muxing

Kieran Kunhya [Sat, 21 Sep 2013 18:16:12 +0000 (19:16 +0100)]
Add AVC-Intra 1080p50/60 Class 100 parameters

Also add some compatibility fixes.

Fiona Glaser [Mon, 9 Sep 2013 19:37:59 +0000 (12:37 -0700)]
Add --filler option

Allows generation of hard-CBR streams without using NAL HRD.
Useful if you want to be able to reconfigure the bitrate (which you can't do
with NAL HRD on).

Anton Mitrofanov [Sun, 27 Oct 2013 11:22:51 +0000 (15:22 +0400)]
Make x264_encoder_reconfig more threadsafe

Do the reconfig when the next frame's encode begins.
Fixes some rare crashes with frame-threading and encoder_reconfig.

Fiona Glaser [Fri, 25 Oct 2013 00:19:00 +0000 (17:19 -0700)]
chroma-me: take shortcut in BI analysis

~100 cycles faster with subme>=9

Fiona Glaser [Thu, 24 Oct 2013 21:44:43 +0000 (14:44 -0700)]
CRF-max: don't warn if VBV underflow occurs

Only warn if underflow occurs for reasons other than CRF-max, as CRF-max
implies that VBV underflow is desired by the user.

Henrik Gramner [Fri, 18 Oct 2013 20:43:36 +0000 (22:43 +0200)]
x86inc: Make ym# behave the same way as xm#

This makes more sense for future implementations of templates with zmm registers.

Henrik Gramner [Fri, 18 Oct 2013 20:21:38 +0000 (22:21 +0200)]
Use calloc instead of malloc + memset
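
The pattern being replaced looks roughly like the following. Both wrapper names are hypothetical and only contrast the two idioms; the actual call sites in x264 are not shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear in a separate pass. */
static void *alloc_zeroed_old(size_t count, size_t size)
{
    void *p = malloc(count * size);   /* note: count * size may overflow */
    if (p)
        memset(p, 0, count * size);
    return p;
}

/* After: a single call that returns zero-initialized memory. */
static void *alloc_zeroed_new(size_t count, size_t size)
{
    return calloc(count, size);
}
```

Besides being shorter, calloc lets the allocator hand back pages that are already zero without touching them again.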

Henrik Gramner [Thu, 10 Oct 2013 14:54:12 +0000 (16:54 +0200)]
Replace gf_malloc with regular malloc in mp4 muxer

It was used as a workaround for a bug that only existed in the GPAC repository
for a few weeks back in 2010. There's no reason to keep it anymore.

Anton Mitrofanov [Tue, 8 Oct 2013 19:20:40 +0000 (23:20 +0400)]
Update to current libav/ffmpeg API

Rafaël Carré [Fri, 25 Oct 2013 14:12:24 +0000 (07:12 -0700)]
version.sh: change to use /bin/sh

Sean McGovern [Wed, 4 Sep 2013 21:15:00 +0000 (14:15 -0700)]
configure: don't generate a git version number if .git isn't present

Martin Storsjo [Tue, 3 Sep 2013 21:56:18 +0000 (14:56 -0700)]
configure: include dependency libs in the Libs pkg-config

If only a static library is built, the user of the library that just
tries to link to the lib using the flags provided by pkg-config
might not know that only a static lib exists and that he'd have to
pass --static to pkg-config to get the internal dependencies to
be able to link the library.

For a shared build, the internal dependencies are kept in Libs.private
as before.

This matches how libav's pkg-config files are generated.

Anton Mitrofanov [Thu, 17 Oct 2013 20:38:06 +0000 (00:38 +0400)]
Fix compilation in case of HAVE_LOG2F check fails spuriously

Anton Mitrofanov [Sat, 12 Oct 2013 08:01:57 +0000 (12:01 +0400)]
Fix compilation of shared library for Windows with original MinGW toolchain

Anton Mitrofanov [Tue, 8 Oct 2013 19:32:37 +0000 (23:32 +0400)]
Fix possible crashes in resize and crop filters with high bitdepth input

Tim Mooney [Tue, 3 Sep 2013 20:43:50 +0000 (13:43 -0700)]
Fix INSTALL in configure for Solaris systems

Henrik Gramner [Tue, 27 Aug 2013 22:50:31 +0000 (00:50 +0200)]
Workaround for FFMS indexing bug

If FFMS_ReadIndex is used with an empty index file it gets stuck in an infinite loop instead of returning NULL
like it's supposed to do on failure. Explicitly check if the file is empty before calling it as a workaround.
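
An emptiness check of this kind can be written with standard C I/O alone. The sketch below is one possible way to do it; `file_is_empty` is a hypothetical helper, and the FFMS call itself is deliberately left out.

```c
#include <stdio.h>

/* Returns 1 if the file exists and is empty, 0 if it has content,
 * -1 if it cannot be opened or measured. A caller would only go on to
 * read the index when the result is 0. */
static int file_is_empty(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int empty = -1;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0)
            empty = (size == 0);
    }
    fclose(f);
    return empty;
}
```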

Anton Mitrofanov [Mon, 26 Aug 2013 17:20:31 +0000 (21:20 +0400)]
Fix masked access violation in KERNEL32

Caused crashes under gdb in Windows and might cause other unknown problems.

Hiroki Taniura [Sat, 24 Aug 2013 16:18:57 +0000 (01:18 +0900)]
Fix GPAC support on Windows

Henrik Gramner [Sun, 11 Aug 2013 17:50:42 +0000 (19:50 +0200)]
Windows Unicode support

Windows, unlike most other operating systems, uses UTF-16 for Unicode strings while x264 is designed for UTF-8.

This patch does the following in order to handle things like Unicode filenames:
* Keep strings internally as UTF-8.
* Retrieve the CLI command line as UTF-16 and convert it to UTF-8.
* Always use Unicode versions of Windows API functions and convert strings to UTF-16 when calling them.
* Attempt to use legacy 8.3 short filenames for external libraries without Unicode support.
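
To make the UTF-16 to UTF-8 step concrete, here is a portable sketch of the transformation. On Windows the conversion would normally be delegated to the system API; the routine below, with the hypothetical name `utf16_to_utf8`, only illustrates what such a conversion does and is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-16 string to UTF-8.
 * Returns the number of bytes written (excluding the terminator), or -1 on
 * an unpaired surrogate or if `cap` bytes are not enough. */
static int utf16_to_utf8(const uint16_t *in, char *out, size_t cap)
{
    size_t n = 0;
    while (*in) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {           /* high surrogate */
            uint32_t lo = *in;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            in++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                                 /* stray low surrogate */
        }
        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len + 1 > cap)                         /* keep room for NUL */
            return -1;
        if (len == 1) {
            out[n++] = (char)cp;
        } else if (len == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    if (n + 1 > cap)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```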

Kieran Kunhya [Sat, 20 Jul 2013 17:47:59 +0000 (18:47 +0100)]
AVC-Intra support

This format has been reverse engineered and x264's output has almost exactly
the same bitstream as Panasonic cameras and encoders produce. It therefore does
not comply with SMPTE RP2027 since Panasonic themselves do not comply with
their own specification. It has been tested in Avid, Premiere, Edius and
Quantel.

Parts of this patch were written by Fiona Glaser and some reverse
engineering was done by Joseph Artsimovich.

Henrik Gramner [Mon, 8 Jul 2013 19:06:42 +0000 (12:06 -0700)]
Transparent hugepage support

Combine frame and mb data mallocs into a single large malloc.
Additionally, on Linux systems with hugepage support, ask for hugepages on
large mallocs.

This gives a small performance improvement (~0.2-0.9%) on systems without
hugepage support, as well as a small memory footprint reduction.

On recent Linux kernels with hugepage support enabled (set to madvise or
always), it improves performance up to 4% at the cost of about 7-12% more
memory usage on typical settings.

It may help even more on Haswell and other recent CPUs with improved 2MB page
support in hardware.
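
Asking for transparent hugepages on Linux is typically done by aligning a large allocation to the huge page size and advising the kernel with madvise. The sketch below assumes 2 MB huge pages and a POSIX/Linux environment; `alloc_large` and `HUGE_PAGE_SIZE` are hypothetical names, and the thresholds and rounding used by x264's allocator are not reproduced.

```c
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE ((size_t)2 * 1024 * 1024)   /* assumed 2 MB pages */

/* Allocate a large buffer aligned to the huge page size and hint that the
 * kernel may back it with huge pages. Release with free(). */
static void *alloc_large(size_t size)
{
    void *p = NULL;
    /* Round up so that the allocation covers whole huge pages. */
    size_t padded = (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
    if (posix_memalign(&p, HUGE_PAGE_SIZE, padded) != 0)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Advisory only: a failure here is harmless, so the result is ignored. */
    madvise(p, padded, MADV_HUGEPAGE);
#endif
    return p;
}
```

Rounding the size up is one way the extra memory usage mentioned above can arise.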

Henrik Gramner [Fri, 5 Jul 2013 19:15:54 +0000 (21:15 +0200)]
x86: SSSE3 implementation of pixel_sad_x3 and pixel_sad_x4

Henrik Gramner [Fri, 5 Jul 2013 19:15:49 +0000 (21:15 +0200)]
x86: Faster AVX2 pixel_sad_x3 and pixel_sad_x4

Diogo Franco [Wed, 24 Jul 2013 01:17:44 +0000 (22:17 -0300)]
configure: Support cygwin64

Derek Buitenhuis [Fri, 9 Aug 2013 17:39:27 +0000 (13:39 -0400)]
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"

This is also a valid value for WIN64.

Anton Mitrofanov [Tue, 23 Jul 2013 21:11:50 +0000 (14:11 -0700)]
Fix cases in which intra refresh allowed prediction from disallowed pixels

Anton Mitrofanov [Tue, 6 Aug 2013 21:56:34 +0000 (01:56 +0400)]
Fix a few minor bugs found with a static analyzer

Fiona Glaser [Fri, 12 Jul 2013 23:07:35 +0000 (16:07 -0700)]
Fix AVX2 detection bug with "limit CPUID" enabled in BIOS

Henrik Gramner [Fri, 5 Jul 2013 19:15:43 +0000 (21:15 +0200)]
x86: Remove X264_CPU_SSE_MISALIGN functions

Prevents a crash if the misaligned exception mask bit is cleared for some reason.

Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule.
They also require modifying the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.

VEX-encoded instructions also support unaligned memory operands. I tried adding AVX
implementations of all removed functions but there were no performance improvements on
Ivy Bridge. pixel_sad_x3 and pixel_sad_x4 had significant code size reductions though
so I kept them and added some minor cosmetic fixes and tweaks.

Fiona Glaser [Thu, 20 Jun 2013 22:51:39 +0000 (15:51 -0700)]
Tweak i16x16-delta-quant-avoidance code

Don't omit the delta quant if it'd raise the quantizer to do so; this fixes
a rare flickering issue caused by deblocking.

Fiona Glaser [Sun, 9 Jun 2013 16:06:27 +0000 (09:06 -0700)]
x86: faster AVX2 iDCT, AVX deblock_luma_h, deblock_luma_h_intra

Lucien [Mon, 17 Jun 2013 18:28:09 +0000 (18:28 +0000)]
Add new color primaries, transfer characteristics, matrix coefficients

Fiona Glaser [Sat, 1 Jun 2013 00:01:29 +0000 (17:01 -0700)]
Add "--stitchable" option for segmented encoding

Stops x264 from attempting to optimize global stream headers, ensuring that
different segments of a video will have identical headers when used with
identical encoding settings.

Fiona Glaser [Thu, 27 Jun 2013 15:29:06 +0000 (08:29 -0700)]
Interface: if vbv-maxrate < bitrate, set bitrate = vbv-maxrate

This probably makes more sense to the user than setting vbv-maxrate = bitrate,
as before.
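
The new rule is a one-line clamp. In the sketch below the struct and field names are hypothetical stand-ins for the rate-control parameters, and treating a maxrate of 0 as "not set" is an assumption of the example.

```c
/* Hypothetical parameter struct; field names are illustrative only. */
typedef struct {
    int bitrate;       /* target average bitrate, kbit/s */
    int vbv_maxrate;   /* VBV peak rate, kbit/s; 0 means "not set" here */
} rc_params_t;

/* If the VBV cap is below the target, lower the target and keep the cap. */
static void reconcile_rates(rc_params_t *p)
{
    if (p->vbv_maxrate > 0 && p->vbv_maxrate < p->bitrate)
        p->bitrate = p->vbv_maxrate;
}
```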

Anton Mitrofanov [Tue, 28 May 2013 12:02:42 +0000 (05:02 -0700)]
OpenCL cosmetics

Anton Mitrofanov [Mon, 17 Jun 2013 20:16:33 +0000 (00:16 +0400)]
Fix possible crash when writing very large filler NALUs

Bitstream-reallocation function didn't handle the case of filler.

Loren Merritt [Mon, 17 Jun 2013 18:27:09 +0000 (11:27 -0700)]
Fix build with PIC on some systems

Henrik Gramner [Sun, 2 Jun 2013 16:41:17 +0000 (18:41 +0200)]
Fix potential misalignment crash in AVX2 denoise_dct

Anton Mitrofanov [Mon, 27 May 2013 21:48:15 +0000 (01:48 +0400)]
Fix building with compilers without inline asm support

Also fix crash in high bit depth builds compiled with unaligned stack.

Anton Mitrofanov [Wed, 22 May 2013 18:43:59 +0000 (22:43 +0400)]
Fix compilation with OpenCL on MacOS X

Also fix crash in the case of OpenCL error during encoding.

Anton Mitrofanov [Mon, 6 May 2013 18:51:11 +0000 (22:51 +0400)]
OpenCL support improvement/refactoring

Autoload the OpenCL library so that it's not required to run an OpenCL-enabled
build of x264.

Update X264_BUILD, which should have been changed with the first patch.

Fiona Glaser [Thu, 16 May 2013 20:51:37 +0000 (13:51 -0700)]
x86: shave a few instructions off AVX deblock

Henrik Gramner [Tue, 14 May 2013 16:57:40 +0000 (18:57 +0200)]
x86: AVX2 dequant_4x4_dc

Henrik Gramner [Tue, 14 May 2013 16:53:12 +0000 (18:53 +0200)]
x86: AVX2 high bit-depth dequant

Fiona Glaser [Fri, 10 May 2013 00:20:05 +0000 (17:20 -0700)]
x86-64: 64-bit variant of AVX2 hpel_filter

~5% faster than 32-bit.

Henrik Gramner [Mon, 6 May 2013 16:41:24 +0000 (18:41 +0200)]
x86: AVX2 high bit-depth denoise_dct

28->15 cycles

Also reorder instructions to use fewer registers, 3 cycles faster on Ivy Bridge with 64-bit Windows.

Henrik Gramner [Sat, 4 May 2013 16:48:58 +0000 (18:48 +0200)]
x86: AVX2 high bit-depth quant

quant_4x4: 13->6 cycles
quant_4x4_dc: 14->8 cycles
quant_8x8: 47->24 cycles
quant_4x4x4: 48->25 cycles

Fiona Glaser [Wed, 1 May 2013 21:32:11 +0000 (14:32 -0700)]
x86: AVX2 add16x16_idct_dc

27 -> 19 cycles

Fiona Glaser [Mon, 29 Apr 2013 23:16:54 +0000 (16:16 -0700)]
x86: faster AVX2 quant_4x4x4

10->9 cycles

Fiona Glaser [Sun, 28 Apr 2013 04:03:32 +0000 (21:03 -0700)]
x86: AVX2 intra_sad_x3_8x8c

30->22 cycles

Henrik Gramner [Sun, 28 Apr 2013 09:11:03 +0000 (11:11 +0200)]
x86: AVX2 high bit-depth intra_sad_x3_8x8

43->24 cycles

Fiona Glaser [Wed, 24 Apr 2013 21:22:15 +0000 (14:22 -0700)]
x86: AVX2 deblock strength

30->18 cycles

Henrik Gramner [Wed, 1 May 2013 15:42:48 +0000 (17:42 +0200)]
x86: Faster high bit-depth intra_sad_x3_4x4

20->16 cycles on Ivy Bridge

Fiona Glaser [Wed, 1 May 2013 00:36:46 +0000 (17:36 -0700)]
x86: faster SSSE3 hpel

~7% faster using the pmulhrsw trick from mc_chroma.

Fiona Glaser [Mon, 29 Apr 2013 21:22:23 +0000 (14:22 -0700)]
x86-64: faster SSSE3 trellis

~2% faster trellis.

Fiona Glaser [Fri, 3 May 2013 00:10:26 +0000 (17:10 -0700)]
x86: 32-byte align the stack if possible

Avoids the need for manual 32 byte array alignment on compilers that support
-mpreferred-stack-boundary.

Henrik Gramner [Sat, 11 May 2013 21:39:09 +0000 (23:39 +0200)]
x86inc: Utilize the shadow space on 64-bit Windows

Store XMM6 and XMM7 in the shadow space in functions that clobber them.
This way we don't have to adjust the stack pointer as often,
reducing the number of instructions as well as code size.

Henrik Gramner [Fri, 3 May 2013 21:06:10 +0000 (23:06 +0200)]
x86: Don't use explicitly aligned versions of SAD on AVX CPUs

On modern CPUs movdqu isn't slower than movdqa when used on aligned data and using the same code in both cases saves cache.

This was already done for the high bit-depth AVX2 implementation but the aligned version still exists as dead code so remove that.

Henrik Gramner [Fri, 3 May 2013 18:18:03 +0000 (20:18 +0200)]
x86: Add missing initializations for high bit-depth sad_aligned

Fiona Glaser [Mon, 13 May 2013 23:52:18 +0000 (16:52 -0700)]
x86: add Jaguar CPU detection

Henrik Gramner [Tue, 7 May 2013 15:21:03 +0000 (17:21 +0200)]
x86inc: Remove .rodata kludges

The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old.

a.out was superseded by ELF on sane systems a few decades ago.

Henrik Gramner [Sat, 4 May 2013 14:21:32 +0000 (16:21 +0200)]
checkasm: Use 64-bit cycle counters

Prevents overflows that can occur in some cases.
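
The overflow in question is plain 32-bit wraparound when cycle counts are accumulated. The two helpers below are hypothetical and only demonstrate the difference between a 32-bit and a 64-bit running total; they are not checkasm's code.

```c
#include <stdint.h>

/* 32-bit running total: wraps once the sum reaches 2^32 cycles. */
static uint32_t total32(const uint32_t *deltas, int n)
{
    uint32_t t = 0;
    for (int i = 0; i < n; i++)
        t += deltas[i];
    return t;
}

/* 64-bit running total: same loop, but with ample headroom. */
static uint64_t total64(const uint32_t *deltas, int n)
{
    uint64_t t = 0;
    for (int i = 0; i < n; i++)
        t += deltas[i];
    return t;
}
```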

Henrik Gramner [Fri, 10 May 2013 11:55:32 +0000 (13:55 +0200)]
checkasm: Fix stack alignment bug

Fiona Glaser [Wed, 8 May 2013 17:48:41 +0000 (10:48 -0700)]
Fix invalid memcpy in sliced-threads

Likely didn't actually break in practice, but memcpy with src==dst
is incorrect.
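
One common way to avoid a self-copy is to guard the call. The helper name below is hypothetical, and the commit does not say whether the fix used a guard like this or restructured the caller instead.

```c
#include <string.h>

/* memcpy requires that source and destination do not overlap, so an exact
 * self-copy is skipped. Partially overlapping ranges are not handled here;
 * memmove would be the tool for those. */
static void copy_if_distinct(void *dst, const void *src, size_t n)
{
    if (dst != src)
        memcpy(dst, src, n);
}
```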

Fiona Glaser [Mon, 29 Apr 2013 19:14:01 +0000 (12:14 -0700)]
Fix two bugs in slice-min-mbs and slices-max

Slices-max broke slice-max-size when slice-max wasn't used.
Slice-min-mbs broke in rare cases near the end of a threadslice.

Fiona Glaser [Fri, 5 Apr 2013 01:00:23 +0000 (18:00 -0700)]
x86: SSSE3 LUT-based faster coeff_level_run

~2x faster coeff_level_run.
Faster CAVLC encoding: {1%,2%,7%} overall with {superfast,medium,slower}.
Uses the same pshufb LUT abuse trick as in the previous ads_mvs patch.

Fiona Glaser [Mon, 25 Mar 2013 21:03:37 +0000 (14:03 -0700)]
x86-64: BMI2 cabac_residual functions