git.sesse.net Git - x264/log

Janne Grunau [Wed, 12 Mar 2014 13:35:31 +0000 (14:35 +0100)]

arm: implement x264_pixel_var2_8x16_neon

checkasm --bench on a cortex-a9:
var2_8x16_c: 5677
var2_8x16_neon: 1421

Janne Grunau [Wed, 12 Mar 2014 12:16:00 +0000 (13:16 +0100)]

arm: implement x264_pixel_var_8x16_neon

checkasm --bench on a cortex-a9:
var_8x16_c: 4306
var_8x16_neon: 791
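
For orientation, a plain C sketch of what a var_8x16 routine computes is given here. It is not the x264 source: the function names, and the choice to pack the sum and the sum of squares into one 64-bit return value, are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum and sum of squares over an 8x16 block of 8-bit pixels.
 * Packing both into one 64-bit value (sum in the low half, squares in the
 * high half) is an assumption of this sketch. */
uint64_t var_8x16_ref(const uint8_t *pix, ptrdiff_t stride)
{
    uint32_t sum = 0, sqr = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 8; x++) {
            sum += pix[x];
            sqr += pix[x] * pix[x];
        }
        pix += stride;
    }
    return sum + ((uint64_t)sqr << 32);
}

/* Unnormalised variance: sum of squares minus sum^2 / pixel count. */
uint32_t block_variance_8x16(uint64_t packed)
{
    uint32_t sum = (uint32_t)packed;
    uint32_t sqr = (uint32_t)(packed >> 32);
    /* 8 * 16 = 128 pixels, so dividing by the count is a shift by 7. */
    return sqr - (uint32_t)(((uint64_t)sum * sum) >> 7);
}
```

A flat block yields a variance of zero; the NEON version in the commit would be expected to produce the same sums, only faster.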

Henrik Gramner [Sun, 23 Feb 2014 14:33:48 +0000 (15:33 +0100)]

x86: SSE2 and SSSE3 plane_copy_deinterleave_rgb

About 5.6x faster than C on Haswell.

Henrik Gramner [Sun, 16 Feb 2014 20:24:54 +0000 (21:24 +0100)]

x86: Minor mbtree_propagate_cost improvements

Reduce the number of registers used from 7 to 6.
Reduce the number of vector registers used by the AVX2 implementation from 8 to 7.
Multiply fps_factor by 1/256 once per frame instead of once per macroblock row.
Use mova instead of movu for dst since it's guaranteed to be aligned.
Some cosmetics.

Henrik Gramner [Sun, 9 Feb 2014 22:58:04 +0000 (23:58 +0100)]

x86inc: Support arbitrary stack alignments

If the stack is known to be at least 32-byte aligned we can safely store ymm
registers on the stack without doing manual alignment.

Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.

Anton Mitrofanov [Fri, 14 Feb 2014 11:53:58 +0000 (15:53 +0400)]

x86inc: warn if XOP integer FMA instruction emulation is impossible

Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.

ffmpeg has an x86util emulation for that case; I'll add it if x264's asm ever
needs it.

Also add pmacsdql emulation.

Loren Merritt [Sat, 1 Mar 2014 02:57:56 +0000 (02:57 +0000)]

x86inc: free up variable name "n" in global namespace

Henrik Gramner [Wed, 22 Jan 2014 18:09:12 +0000 (19:09 +0100)]

x86: Pass -Worphan-labels to yasm

Makes it easier to detect typos.

Steve Lhomme [Sun, 16 Feb 2014 12:15:09 +0000 (13:15 +0100)]

Write 3D metadata when outputting Matroska

Applies when --frame-packing is set.

Anton Mitrofanov [Sun, 23 Feb 2014 12:56:03 +0000 (16:56 +0400)]

Don't set chroma_loc_info_present_flag for non-4:2:0

The H.264 spec says it shouldn't be set in these cases.
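
A minimal sketch of the rule, assuming a hypothetical helper and parameter names (the real change lives in x264's SPS/VUI setup and may look different):

```c
#include <stdbool.h>

/* chroma_format_idc values from the H.264 spec: 0 = monochrome,
 * 1 = 4:2:0, 2 = 4:2:2, 3 = 4:4:4. */
enum { CHROMA_400 = 0, CHROMA_420 = 1, CHROMA_422 = 2, CHROMA_444 = 3 };

/* Hypothetical helper: signal a chroma sample location only for 4:2:0,
 * and only when the caller asked for a non-default location. */
bool chroma_loc_info_present(int chroma_format_idc, int chroma_loc)
{
    return chroma_format_idc == CHROMA_420 && chroma_loc > 0;
}
```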

Fiona Glaser [Mon, 10 Mar 2014 15:42:50 +0000 (08:42 -0700)]

x264.h: fix documentation

The full details of the return values of encoder_encode and encoder_headers
were mistakenly removed a while ago; re-add them.

Anton Mitrofanov [Sun, 23 Feb 2014 11:52:57 +0000 (15:52 +0400)]

Fix pointer cast warning for 64-bit builds

Anton Mitrofanov [Mon, 10 Mar 2014 12:48:02 +0000 (16:48 +0400)]

mbaff: fix mb_field_decoding_flag tracking and simplify allow skip check

Fixes an issue with too many forced non-skips in mbaff+cavlc, as well as
non-deterministic output with mbaff+cavlc+sliced-threads.

Anton Mitrofanov [Sun, 9 Mar 2014 23:22:57 +0000 (03:22 +0400)]

Fix memory overwrite in x264_deblock_h_chroma_mbaff_sse2

Fixes possible corruption with MBAFF+sliced threads.

Fiona Glaser [Sun, 2 Mar 2014 18:09:01 +0000 (10:09 -0800)]

Fix corruption with CAVLC overflow handling in MBAFF+main profile

Probably a regression in r2178.

Anton Mitrofanov [Mon, 10 Mar 2014 17:17:19 +0000 (21:17 +0400)]

Fix checkasm --bench output when nop_cycles is too large

Anton Mitrofanov [Wed, 22 Jan 2014 08:54:49 +0000 (12:54 +0400)]

Really fix quantization factor allocation

Actually allocate less (instead of just initialize less) and fix comments.

Yu Xiaolei [Sun, 23 Feb 2014 12:12:51 +0000 (04:12 -0800)]

Fix build with Android NDK

Android NDK does not expose sched_getaffinity.

Loren Merritt [Thu, 16 Jan 2014 21:34:46 +0000 (13:34 -0800)]

x86inc: speed up compilation with yasm

Work around yasm's inefficiency with handling large numbers of variables
in the global scope.

Kieran Kunhya [Fri, 10 Jan 2014 23:27:33 +0000 (23:27 +0000)]

Add support for AVC-Intra Class 200

James Weaver [Tue, 7 Jan 2014 10:31:58 +0000 (10:31 +0000)]

v210 input support

Assembly based on code by Henrik Gramner and Loren Merritt.

Fiona Glaser [Tue, 21 Jan 2014 21:39:33 +0000 (13:39 -0800)]

Fix quantization factor allocation

We don't need to wastefully allocate quant tables above QP_MAX_SPEC; they're
never used.

Henrik Gramner [Wed, 8 Jan 2014 00:06:56 +0000 (01:06 +0100)]

Avoid some unnecessary memory loads in macroblock_encode

Henrik Gramner [Sun, 5 Jan 2014 14:25:05 +0000 (15:25 +0100)]

Bump dates to 2014

Also update AUTHORS file and my e-mail address in the headers of various files.

Henrik Gramner [Sun, 5 Jan 2014 23:18:31 +0000 (00:18 +0100)]

Remove tools/xyuv.c

It's an old stand-alone application that isn't relevant to x264.

Anton Mitrofanov [Wed, 6 Nov 2013 22:37:23 +0000 (02:37 +0400)]

Use 8x16c wrappers with x86 asm functions for 4:2:2 with high bit depth

Henrik Gramner [Fri, 20 Dec 2013 21:44:28 +0000 (22:44 +0100)]

CLI: Avoid redundant 16-bit upconversions in piped raw input

It's not possible to seek in pipes, so if we want to skip frames we have to read and
discard unused ones. It's pointless to do bit-depth upconversions in those frames.
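
The read-and-discard approach might look roughly like this. It is a sketch only: `skip_raw_frames` and its parameters are hypothetical, and the actual CLI input code is structured differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Skip up to `count` frames of `frame_size` bytes on a non-seekable stream
 * by reading them into a scratch buffer. No bit-depth conversion is done on
 * the discarded data. Returns the number of whole frames actually skipped. */
int skip_raw_frames(FILE *in, size_t frame_size, int count)
{
    unsigned char *scratch = malloc(frame_size);
    if (!scratch)
        return 0;
    int skipped = 0;
    while (skipped < count && fread(scratch, 1, frame_size, in) == frame_size)
        skipped++;
    free(scratch);
    return skipped;
}
```

Only frames that will actually be encoded then need to go through the upconversion path.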

Anton Mitrofanov [Fri, 3 Jan 2014 16:06:06 +0000 (20:06 +0400)]

Fix input support from named pipes in Windows

Steve Clark [Wed, 20 Nov 2013 17:40:23 +0000 (21:40 +0400)]

Fix ARM asm compilation with Apple assembler

Anton Mitrofanov [Wed, 13 Nov 2013 15:24:48 +0000 (19:24 +0400)]

Fix uninitialized variable

Occurs when the timebase is not specified in the stats file. Found by Clang.

Anton Mitrofanov [Sun, 27 Oct 2013 15:27:23 +0000 (19:27 +0400)]

Remove --visualize option.

It probably hasn't been used or maintained for the last few years.

Anton Mitrofanov [Tue, 15 Oct 2013 08:32:25 +0000 (12:32 +0400)]

Add L-SMASH support as the preferred alternative for MP4 muxing

Kieran Kunhya [Sat, 21 Sep 2013 18:16:12 +0000 (19:16 +0100)]

Add AVC-Intra 1080p50/60 Class 100 parameters

Also add some compatibility fixes.

Fiona Glaser [Mon, 9 Sep 2013 19:37:59 +0000 (12:37 -0700)]

Add --filler option

Allows generation of hard-CBR streams without using NAL HRD.
Useful if you want to be able to reconfigure the bitrate (which you can't do
with NAL HRD on).

Anton Mitrofanov [Sun, 27 Oct 2013 11:22:51 +0000 (15:22 +0400)]

Make x264_encoder_reconfig more threadsafe

Do the reconfig when the next frame's encode begins.
Fixes some rare crashes with frame-threading and encoder_reconfig.

Fiona Glaser [Fri, 25 Oct 2013 00:19:00 +0000 (17:19 -0700)]

chroma-me: take shortcut in BI analysis

~100 cycles faster with subme>=9

Fiona Glaser [Thu, 24 Oct 2013 21:44:43 +0000 (14:44 -0700)]

CRF-max: don't warn if VBV underflow occurs

Only warn if underflow occurs for reasons other than CRF-max, as CRF-max
implies that VBV underflow is desired by the user.

Henrik Gramner [Fri, 18 Oct 2013 20:43:36 +0000 (22:43 +0200)]

x86inc: Make ym# behave the same way as xm#

This makes more sense for future implementations of templates with zmm registers.

Henrik Gramner [Fri, 18 Oct 2013 20:21:38 +0000 (22:21 +0200)]

Use calloc instead of malloc + memset
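
The pattern being replaced, shown as a before/after sketch with hypothetical function names:

```c
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear in a second step. */
int *alloc_counts_old(size_t n)
{
    int *p = malloc(n * sizeof(*p));
    if (p)
        memset(p, 0, n * sizeof(*p));
    return p;
}

/* After: calloc hands back memory that is already zeroed. */
int *alloc_counts_new(size_t n)
{
    return calloc(n, sizeof(int));
}
```

Both return a zero-filled array; the calloc form is shorter and leaves the size multiplication to the allocator.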

Henrik Gramner [Thu, 10 Oct 2013 14:54:12 +0000 (16:54 +0200)]

Replace gf_malloc with regular malloc in mp4 muxer

It was used as a workaround for a bug that only existed in the GPAC repository
for a few weeks back in 2010. There's no reason to keep it anymore.

Anton Mitrofanov [Tue, 8 Oct 2013 19:20:40 +0000 (23:20 +0400)]

Update to current libav/ffmpeg API

Rafaël Carré [Fri, 25 Oct 2013 14:12:24 +0000 (07:12 -0700)]

version.sh: change to use /bin/sh

Sean McGovern [Wed, 4 Sep 2013 21:15:00 +0000 (14:15 -0700)]

configure: don't generate a git version number if .git isn't present

Martin Storsjo [Tue, 3 Sep 2013 21:56:18 +0000 (14:56 -0700)]

configure: include dependency libs in the Libs pkg-config

If only a static library is built, a user who links against it with the
flags provided by pkg-config might not know that only a static lib exists
and that --static must be passed to pkg-config to get the internal
dependencies needed to link the library.

For a shared build, the internal dependencies are kept in Libs.private
as before.

This matches how libav's pkg-config files are generated.

Anton Mitrofanov [Thu, 17 Oct 2013 20:38:06 +0000 (00:38 +0400)]

Fix compilation when the HAVE_LOG2F check fails spuriously

Anton Mitrofanov [Sat, 12 Oct 2013 08:01:57 +0000 (12:01 +0400)]

Fix compilation of shared library for Windows with original MinGW toolchain

Anton Mitrofanov [Tue, 8 Oct 2013 19:32:37 +0000 (23:32 +0400)]

Fix possible crashes in resize and crop filters with high bitdepth input

Tim Mooney [Tue, 3 Sep 2013 20:43:50 +0000 (13:43 -0700)]

Fix INSTALL in configure for Solaris systems

Henrik Gramner [Tue, 27 Aug 2013 22:50:31 +0000 (00:50 +0200)]

Workaround for FFMS indexing bug

If FFMS_ReadIndex is used with an empty index file it gets stuck in an infinite loop instead of returning NULL
like it's supposed to do on failure. Explicitly check if the file is empty before calling it as a workaround.
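
The emptiness check could be as small as the following sketch. `file_has_data` is a hypothetical name, not the helper used in the patch; a caller would only hand the path to FFMS_ReadIndex when it returns 1.

```c
#include <stdio.h>

/* Returns 1 if the file can be opened and holds at least one byte, else 0. */
int file_has_data(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    int c = fgetc(f);
    fclose(f);
    return c != EOF;
}
```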

Anton Mitrofanov [Mon, 26 Aug 2013 17:20:31 +0000 (21:20 +0400)]

Fix masked access violation in KERNEL32

Caused crashes under gdb in Windows and might cause other unknown problems.

Hiroki Taniura [Sat, 24 Aug 2013 16:18:57 +0000 (01:18 +0900)]

Fix GPAC support on Windows

Henrik Gramner [Sun, 11 Aug 2013 17:50:42 +0000 (19:50 +0200)]

Windows Unicode support

Windows, unlike most other operating systems, uses UTF-16 for Unicode strings while x264 is designed for UTF-8.

This patch does the following in order to handle things like Unicode filenames:
* Keep strings internally as UTF-8.
* Retrieve the CLI command line as UTF-16 and convert it to UTF-8.
* Always use Unicode versions of Windows API functions and convert strings to UTF-16 when calling them.
* Attempt to use legacy 8.3 short filenames for external libraries without Unicode support.
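
To illustrate the UTF-16 to UTF-8 step, here is a portable sketch. On Windows this conversion is normally delegated to the WideCharToMultiByte API; the hand-written `utf16_to_utf8` below is a hypothetical stand-in so the example compiles anywhere, and is not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-16 string to NUL-terminated UTF-8.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * output buffer is too small or the input has an unpaired surrogate. */
int utf16_to_utf8(const uint16_t *in, char *out, size_t out_size)
{
    size_t o = 0;
    if (out_size == 0)
        return -1;
    while (*in) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {        /* high surrogate */
            uint32_t lo = *in;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;                         /* missing low surrogate */
            in++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                             /* stray low surrogate */
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= out_size)                  /* keep room for the NUL */
            return -1;
        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

Keeping every internal string in UTF-8 means a conversion like this is only needed at the boundary, where the command line is read and where Windows API functions are called.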

Kieran Kunhya [Sat, 20 Jul 2013 17:47:59 +0000 (18:47 +0100)]

AVC-Intra support

This format has been reverse engineered and x264's output has almost exactly
the same bitstream as Panasonic cameras and encoders produce. It therefore does
not comply with SMPTE RP2027 since Panasonic themselves do not comply with
their own specification. It has been tested in Avid, Premiere, Edius and
Quantel.

Parts of this patch were written by Fiona Glaser and some reverse
engineering was done by Joseph Artsimovich.

Henrik Gramner [Mon, 8 Jul 2013 19:06:42 +0000 (12:06 -0700)]

Transparent hugepage support

Combine frame and mb data mallocs into a single large malloc.
Additionally, on Linux systems with hugepage support, ask for hugepages on
large mallocs.

This gives a small performance improvement (~0.2-0.9%) on systems without
hugepage support, as well as a small memory footprint reduction.

On recent Linux kernels with hugepage support enabled (set to madvise or
always), it improves performance up to 4% at the cost of about 7-12% more
memory usage on typical settings.

It may help even more on Haswell and other recent CPUs with improved 2MB page
support in hardware.
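
A rough sketch of asking for hugepages on a large allocation. It assumes a 2 MB huge page size, a hypothetical `alloc_maybe_huge` helper, and C11 `aligned_alloc`; the thresholds and allocator x264 actually uses are not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__linux__)
#include <sys/mman.h>
#endif

#define HUGE_PAGE_SIZE ((size_t)2 * 1024 * 1024) /* assumed: 2 MB pages */

/* Hypothetical allocator: large requests are aligned to the huge page size
 * and the kernel is asked (not required) to back them with huge pages. */
void *alloc_maybe_huge(size_t size)
{
    if (size < HUGE_PAGE_SIZE)
        return malloc(size);

    /* aligned_alloc requires the size to be a multiple of the alignment. */
    size_t rounded = (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
    void *p = aligned_alloc(HUGE_PAGE_SIZE, rounded);
#if defined(__linux__) && defined(MADV_HUGEPAGE)
    if (p)
        madvise(p, rounded, MADV_HUGEPAGE); /* only a hint; errors ignored */
#endif
    return p;
}
```

Rounding up to whole huge pages is one place the extra memory usage mentioned above can come from. With transparent hugepages set to "madvise", the hint is what opts the region in; with "always", the kernel may use huge pages regardless.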

Henrik Gramner [Fri, 5 Jul 2013 19:15:54 +0000 (21:15 +0200)]

x86: SSSE3 implementation of pixel_sad_x3 and pixel_sad_x4

Henrik Gramner [Fri, 5 Jul 2013 19:15:49 +0000 (21:15 +0200)]

x86: Faster AVX2 pixel_sad_x3 and pixel_sad_x4

Diogo Franco [Wed, 24 Jul 2013 01:17:44 +0000 (22:17 -0300)]

configure: Support cygwin64

Derek Buitenhuis [Fri, 9 Aug 2013 17:39:27 +0000 (13:39 -0400)]

x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"

This is also a valid value for WIN64.

Anton Mitrofanov [Tue, 23 Jul 2013 21:11:50 +0000 (14:11 -0700)]

Fix cases in which intra refresh allowed prediction from disallowed pixels

Anton Mitrofanov [Tue, 6 Aug 2013 21:56:34 +0000 (01:56 +0400)]

Fix a few minor bugs found with a static analyzer

Fiona Glaser [Fri, 12 Jul 2013 23:07:35 +0000 (16:07 -0700)]

Fix AVX2 detection bug with "limit CPUID" enabled in BIOS

Henrik Gramner [Fri, 5 Jul 2013 19:15:43 +0000 (21:15 +0200)]

x86: Remove X264_CPU_SSE_MISALIGN functions

Prevents a crash if the misaligned exception mask bit is cleared for some reason.

Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule.
They also require modifying the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.

VEX-encoded instructions also support unaligned memory operands. I tried adding AVX
implementations of all removed functions but there were no performance improvements on
Ivy Bridge. pixel_sad_x3 and pixel_sad_x4 had significant code size reductions though
so I kept them and added some minor cosmetics fixes and tweaks.

Fiona Glaser [Thu, 20 Jun 2013 22:51:39 +0000 (15:51 -0700)]

Tweak i16x16-delta-quant-avoidance code

Don't omit the delta quant if it'd raise the quantizer to do so; this fixes
a rare flickering issue caused by deblocking.

Fiona Glaser [Sun, 9 Jun 2013 16:06:27 +0000 (09:06 -0700)]

x86: faster AVX2 iDCT, AVX deblock_luma_h, deblock_luma_h_intra

Lucien [Mon, 17 Jun 2013 18:28:09 +0000 (18:28 +0000)]

Add new color primaries, transfer characteristics, matrix coefficients

Fiona Glaser [Sat, 1 Jun 2013 00:01:29 +0000 (17:01 -0700)]

Add "--stitchable" option for segmented encoding

Stops x264 from attempting to optimize global stream headers, ensuring that
different segments of a video will have identical headers when used with
identical encoding settings.

Fiona Glaser [Thu, 27 Jun 2013 15:29:06 +0000 (08:29 -0700)]

Interface: if vbv-maxrate < bitrate, set bitrate = vbv-maxrate

This probably makes more sense to the user than setting vbv-maxrate = bitrate,
as before.
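
The new behaviour amounts to a clamp like the one sketched here. The struct, field names, and the convention that 0 means "vbv-maxrate not set" are assumptions for illustration, not x264's parameter layout.

```c
/* Hypothetical rate-control settings in kbit/s; in this sketch a
 * vbv_maxrate of 0 means the option was not given. */
typedef struct {
    int bitrate;
    int vbv_maxrate;
} rc_settings;

/* Lower the bitrate to vbv_maxrate when it exceeds it.
 * Returns 1 if the bitrate was changed, 0 otherwise. */
int reconcile_bitrate(rc_settings *rc)
{
    if (rc->vbv_maxrate > 0 && rc->vbv_maxrate < rc->bitrate) {
        rc->bitrate = rc->vbv_maxrate;
        return 1;
    }
    return 0;
}
```

The peak-rate limit is treated as the harder constraint, so the average target gives way rather than the other way round.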

Anton Mitrofanov [Tue, 28 May 2013 12:02:42 +0000 (05:02 -0700)]

OpenCL cosmetics

Anton Mitrofanov [Mon, 17 Jun 2013 20:16:33 +0000 (00:16 +0400)]

Fix possible crash when writing very large filler NALUs

Bitstream-reallocation function didn't handle the case of filler.

Loren Merritt [Mon, 17 Jun 2013 18:27:09 +0000 (11:27 -0700)]

Fix build with PIC on some systems

Henrik Gramner [Sun, 2 Jun 2013 16:41:17 +0000 (18:41 +0200)]

Fix potential misalignment crash in AVX2 denoise_dct

Anton Mitrofanov [Mon, 27 May 2013 21:48:15 +0000 (01:48 +0400)]

Fix building with compilers without inline asm support

Also fix crash in high bit depth builds compiled with unaligned stack.

Anton Mitrofanov [Wed, 22 May 2013 18:43:59 +0000 (22:43 +0400)]

Fix compilation with OpenCL on MacOS X

Also fix crash in the case of OpenCL error during encoding.

Anton Mitrofanov [Mon, 6 May 2013 18:51:11 +0000 (22:51 +0400)]

OpenCL support improvement/refactoring

Autoload the OpenCL library so that it's not required to run an OpenCL-enabled
build of x264.

Update X264_BUILD, which should have been changed with the first patch.

Fiona Glaser [Thu, 16 May 2013 20:51:37 +0000 (13:51 -0700)]

x86: shave a few instructions off AVX deblock

Henrik Gramner [Tue, 14 May 2013 16:57:40 +0000 (18:57 +0200)]

x86: AVX2 dequant_4x4_dc

Henrik Gramner [Tue, 14 May 2013 16:53:12 +0000 (18:53 +0200)]

x86: AVX2 high bit-depth dequant

Fiona Glaser [Fri, 10 May 2013 00:20:05 +0000 (17:20 -0700)]

x86-64: 64-bit variant of AVX2 hpel_filter

~5% faster than 32-bit.

Henrik Gramner [Mon, 6 May 2013 16:41:24 +0000 (18:41 +0200)]

x86: AVX2 high bit-depth denoise_dct

28->15 cycles

Also reorder instructions to use fewer registers, 3 cycles faster on Ivy Bridge with 64-bit Windows.

Henrik Gramner [Sat, 4 May 2013 16:48:58 +0000 (18:48 +0200)]

x86: AVX2 high bit-depth quant

quant_4x4: 13->6 cycles
quant_4x4_dc: 14->8 cycles
quant_8x8: 47->24 cycles
quant_4x4x4: 48->25 cycles

Fiona Glaser [Wed, 1 May 2013 21:32:11 +0000 (14:32 -0700)]

x86: AVX2 add16x16_idct_dc

27 -> 19 cycles

Fiona Glaser [Mon, 29 Apr 2013 23:16:54 +0000 (16:16 -0700)]

x86: faster AVX2 quant_4x4x4

10->9 cycles

Fiona Glaser [Sun, 28 Apr 2013 04:03:32 +0000 (21:03 -0700)]

x86: AVX2 intra_sad_x3_8x8c

30->22 cycles

Henrik Gramner [Sun, 28 Apr 2013 09:11:03 +0000 (11:11 +0200)]

x86: AVX2 high bit-depth intra_sad_x3_8x8

43->24 cycles

Fiona Glaser [Wed, 24 Apr 2013 21:22:15 +0000 (14:22 -0700)]

x86: AVX2 deblock strength

30->18 cycles

Henrik Gramner [Wed, 1 May 2013 15:42:48 +0000 (17:42 +0200)]

x86: Faster high bit-depth intra_sad_x3_4x4

20->16 cycles on Ivy Bridge

Fiona Glaser [Wed, 1 May 2013 00:36:46 +0000 (17:36 -0700)]

x86: faster SSSE3 hpel

~7% faster using the pmulhrsw trick from mc_chroma.

Fiona Glaser [Mon, 29 Apr 2013 21:22:23 +0000 (14:22 -0700)]

x86-64: faster SSSE3 trellis

~2% faster trellis.

Fiona Glaser [Fri, 3 May 2013 00:10:26 +0000 (17:10 -0700)]

x86: 32-byte align the stack if possible

Avoids the need for manual 32 byte array alignment on compilers that support
-mpreferred-stack-boundary.

Henrik Gramner [Sat, 11 May 2013 21:39:09 +0000 (23:39 +0200)]

x86inc: Utilize the shadow space on 64-bit Windows

Store XMM6 and XMM7 in the shadow space in functions that clobber them.
This way we don't have to adjust the stack pointer as often,
reducing the number of instructions as well as code size.

Henrik Gramner [Fri, 3 May 2013 21:06:10 +0000 (23:06 +0200)]

x86: Don't use explicitly aligned versions of SAD on AVX CPUs

On modern CPUs movdqu isn't slower than movdqa when used on aligned data and using the same code in both cases saves cache.

This was already done for the high bit-depth AVX2 implementation but the aligned version still exists as dead code so remove that.

Henrik Gramner [Fri, 3 May 2013 18:18:03 +0000 (20:18 +0200)]

x86: Add missing initializations for high bit-depth sad_aligned

Fiona Glaser [Mon, 13 May 2013 23:52:18 +0000 (16:52 -0700)]

x86: add Jaguar CPU detection

Henrik Gramner [Tue, 7 May 2013 15:21:03 +0000 (17:21 +0200)]

x86inc: Remove .rodata kludges

The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old.

a.out was superseded by ELF on sane systems a few decades ago.

Henrik Gramner [Sat, 4 May 2013 14:21:32 +0000 (16:21 +0200)]

checkasm: Use 64-bit cycle counters

Prevents overflows that can occur in some cases.
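
The kind of overflow in question can be reproduced with a small sketch. The accumulator functions are hypothetical and only contrast the two widths; checkasm's own bookkeeping is not shown.

```c
#include <stdint.h>

/* Accumulate per-run cycle counts into a 32-bit total (wraps at 2^32). */
uint32_t total_cycles_32(const uint32_t *samples, int n)
{
    uint32_t total = 0;
    for (int i = 0; i < n; i++)
        total += samples[i];
    return total;
}

/* Same accumulation with a 64-bit total, which does not wrap here. */
uint64_t total_cycles_64(const uint32_t *samples, int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++)
        total += samples[i];
    return total;
}
```

Three samples of 2^31 cycles each already wrap the 32-bit total back to 2^31, while the 64-bit total holds the true sum.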

Henrik Gramner [Fri, 10 May 2013 11:55:32 +0000 (13:55 +0200)]

checkasm: Fix stack alignment bug

Fiona Glaser [Wed, 8 May 2013 17:48:41 +0000 (10:48 -0700)]

Fix invalid memcpy in sliced-threads

Likely didn't actually break in practice, but memcpy with src==dst
is incorrect.
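
One way to avoid the self-copy is a guard like the following sketch. `copy_unless_same` is a hypothetical wrapper; the actual fix may simply skip the copy at the call site.

```c
#include <string.h>

/* memcpy requires non-overlapping regions, so skip the call when both
 * pointers refer to the same object. */
void copy_unless_same(void *dst, const void *src, size_t n)
{
    if (dst != src)
        memcpy(dst, src, n);
}
```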

Fiona Glaser [Mon, 29 Apr 2013 19:14:01 +0000 (12:14 -0700)]

Fix two bugs in slice-min-mbs and slices-max

Slices-max broke slice-max-size when slice-max wasn't used.
Slice-min-mbs broke in rare cases near the end of a threadslice.

Fiona Glaser [Fri, 5 Apr 2013 01:00:23 +0000 (18:00 -0700)]

x86: SSSE3 LUT-based faster coeff_level_run

~2x faster coeff_level_run.
Faster CAVLC encoding: {1%,2%,7%} overall with {superfast,medium,slower}.
Uses the same pshufb LUT abuse trick as in the previous ads_mvs patch.

Fiona Glaser [Mon, 25 Mar 2013 21:03:37 +0000 (14:03 -0700)]

x86-64: BMI2 cabac_residual functions
