git.sesse.net Git - x264/log
Mark Webster [Wed, 5 Aug 2015 03:28:17 +0000 (04:28 +0100)]
Simplify inclusion of x264.h in C++ projects
Name all structs to support forward declarations.
Add a conditional extern "C" wrapper in x264.h itself instead of having to
specify it in every location where it's included.
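
As an illustration of the pattern this commit describes, here is a minimal sketch of a header that carries its own conditional `extern "C"` wrapper and names its struct so it can be forward-declared. The header name, the struct `example_param_t` and the helper `example_param_pixels` are hypothetical and are not taken from x264.h.

```c
/* example_api.h: hypothetical header, NOT the actual x264.h */
#ifndef EXAMPLE_API_H
#define EXAMPLE_API_H

#ifdef __cplusplus
extern "C" {   /* only seen by C++ compilers; C compilers skip this line */
#endif

/* The struct has a tag name (not just a typedef of an anonymous struct),
 * so other headers can write `struct example_param_t;` as a forward
 * declaration without including this file. */
typedef struct example_param_t
{
    int i_width;
    int i_height;
} example_param_t;

/* Hypothetical helper, defined inline only to keep the sketch self-contained */
static inline int example_param_pixels( const example_param_t *param )
{
    return param->i_width * param->i_height;
}

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_API_H */
```

With a wrapper like this in the header, a C++ translation unit can include it directly instead of wrapping each `#include` in its own `extern "C" { ... }` block.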
Henrik Gramner [Sun, 16 Aug 2015 19:59:26 +0000 (21:59 +0200)]
checkasm: Properly save rdx/edx in checkasm_call() on x86
If the return value doesn't fit in a single register rdx/edx can in some
cases be used in addition to rax/eax.
Doesn't affect any of the existing checkasm tests but it's more correct
behavior and it might be useful in the future.
Henrik Gramner [Tue, 11 Aug 2015 15:19:35 +0000 (17:19 +0200)]
x86: Enable SSE2 by default on x86-32
It makes more sense to tune the defaults to benefit the vast majority of users.
Anyone still using a Pentium III for video encoding is of course free to
explicitly set different flags when compiling.
Henrik Gramner [Mon, 10 Aug 2015 20:30:21 +0000 (22:30 +0200)]
msvs/icl: Improve default CFLAGS
Use -fp:fast as a substitute for -ffast-math.
Increase warning level from -W0 to -W1 (the default setting).
Disable -GS (stack cookies) on MSVS. It's disabled by default on ICL.
Henrik Gramner [Wed, 12 Aug 2015 20:23:31 +0000 (22:23 +0200)]
Use a relative $SRCPATH for out-of-tree builds
Fixes out-of-tree MSVS builds on Cygwin.
Henrik Gramner [Sat, 8 Aug 2015 20:26:38 +0000 (22:26 +0200)]
cygwin: Enable MSVS support
`cl -showIncludes` creates absolute Windows paths for some files, attempt
to convert those to Unix paths.
Use relative paths for dependencies located in or below the working directory
in order to mimic the behavior of gcc and to make the paths more readable.
Make the dependency generation script a bit more robust in general.
Henrik Gramner [Sat, 8 Aug 2015 16:34:21 +0000 (18:34 +0200)]
cltostr.sh: Minor fixes
Henrik Gramner [Sat, 8 Aug 2015 10:21:54 +0000 (12:21 +0200)]
Simplify version.sh
Also remove some non-POSIX syntax and improve robustness.
As a bonus the script now runs about 2-3 times faster.
`git rev-list --count` could be used to simplify things even further,
but that functionality was added in git 1.7.2 so keep `wc -l` for now
to maintain compatibility with older git versions.
장영훈 [Fri, 7 Aug 2015 05:43:24 +0000 (14:43 +0900)]
msvs: Fix cl detection in non-English environments
Henrik Gramner [Mon, 3 Aug 2015 19:05:11 +0000 (21:05 +0200)]
x86inc: Sync minor changes from ffmpeg/libav
Henrik Gramner [Wed, 29 Jul 2015 17:30:52 +0000 (19:30 +0200)]
matroska: Add comments for the remaining element names
Henrik Gramner [Wed, 29 Jul 2015 17:30:41 +0000 (19:30 +0200)]
Silence various static analyzer warnings
Those are false positives, but it doesn't hurt to get rid of them.
Henrik Gramner [Sun, 26 Jul 2015 21:13:29 +0000 (23:13 +0200)]
mingw: Enable the tsaware linker flag
Avoids an irrelevant compatibility layer in Terminal Services environments.
https://msdn.microsoft.com/en-us/library/cc834995.aspx
Henrik Gramner [Sun, 26 Jul 2015 21:13:26 +0000 (23:13 +0200)]
msvs: Don't redefine snprintf for VS2015
Visual Studio 2015 has a proper snprintf implementation.
Henrik Gramner [Sun, 26 Jul 2015 21:13:19 +0000 (23:13 +0200)]
msvs: Prefer link.exe from the same directory as cl.exe
/usr/bin/link from coreutils may be located before the MSVS linker in $PATH
which causes linking to fail due to using the wrong binary.
Henrik Gramner [Sun, 26 Jul 2015 22:10:00 +0000 (00:10 +0200)]
frame_dump: check fseek() return value
Henrik Gramner [Sun, 26 Jul 2015 22:08:38 +0000 (00:08 +0200)]
x264_vfprintf: use va_copy
It's undefined behavior to use the same va_list twice.
This most likely didn't cause any issues in practice since the string would
have to be larger than 4 KiB to trigger the fallback path.
Use workaround for ICL as it doesn't define va_copy even for C99.
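
The `va_copy` issue can be sketched as follows. This is not the x264_vfprintf code: `vformat_alloc` and `format_alloc` are hypothetical helpers, and the 16-byte stack buffer is an assumption chosen to make the fallback path easy to reach (the commit message mentions 4 KiB). The ICL workaround is not shown.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format into a small stack buffer first and fall
 * back to a heap buffer of the exact size if the result does not fit.
 * Returns a malloc'd string, or NULL on error. */
static char *vformat_alloc( const char *fmt, va_list args )
{
    char small[16]; /* deliberately tiny for the sketch */
    va_list args_copy;

    /* The first vsnprintf() consumes a copy, so `args` stays valid
     * for the second pass below. */
    va_copy( args_copy, args );
    int len = vsnprintf( small, sizeof(small), fmt, args_copy );
    va_end( args_copy );
    if( len < 0 )
        return NULL;

    char *out = malloc( (size_t)len + 1 );
    if( !out )
        return NULL;

    if( (size_t)len < sizeof(small) )
        memcpy( out, small, (size_t)len + 1 );        /* fit on the first try */
    else
        vsnprintf( out, (size_t)len + 1, fmt, args ); /* fallback path */
    return out;
}

static char *format_alloc( const char *fmt, ... )
{
    va_list args;
    va_start( args, fmt );
    char *out = vformat_alloc( fmt, args );
    va_end( args );
    return out;
}
```

Without the `va_copy`, the fallback call would reuse a `va_list` that the first `vsnprintf()` had already traversed, which is the undefined behavior the commit message refers to.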
Henrik Gramner [Sun, 26 Jul 2015 22:08:31 +0000 (00:08 +0200)]
param_parse: Fix framerate rounding issues
Marcin Juszkiewicz [Mon, 1 Jun 2015 09:24:45 +0000 (11:24 +0200)]
aarch64: Remove broken CFLAGS in configure
GCC doesn't have an "-arch" switch, but works when that entire line is removed.
Rong Yan [Mon, 20 Jul 2015 08:34:20 +0000 (03:34 -0500)]
ppc: Add little-endian PowerPC support
Rishikesh More [Thu, 18 Jun 2015 12:18:46 +0000 (17:48 +0530)]
mips: MSA quant optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
Rishikesh More [Thu, 18 Jun 2015 12:18:45 +0000 (17:48 +0530)]
mips: MSA predict optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
Rishikesh More [Thu, 18 Jun 2015 12:18:44 +0000 (17:48 +0530)]
mips: MSA pixel optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
Rishikesh More [Thu, 18 Jun 2015 12:18:43 +0000 (17:48 +0530)]
mips: MSA deblock optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
Rishikesh More [Thu, 18 Jun 2015 12:18:42 +0000 (17:48 +0530)]
mips: MSA dct optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
Rishikesh More [Thu, 18 Jun 2015 12:18:40 +0000 (17:48 +0530)]
mips: MSA mc optimizations
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
Rishikesh More [Thu, 18 Jun 2015 12:18:38 +0000 (17:48 +0530)]
mips: Common MSA macros
Add macros for load/store, slide, shift, transpose and basic arithmetic
operations required by subsequent patches.
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
Rishikesh More [Tue, 12 May 2015 14:08:09 +0000 (19:38 +0530)]
mips: Add MSA support to checkasm
Signed-off-by: Rishikesh More <rishikesh.more@imgtec.com>
Kaustubh Raste [Fri, 17 Apr 2015 12:08:58 +0000 (17:38 +0530)]
mips: Initial MSA support
MSA is the MIPS SIMD Architecture.
Add X264_CPU_MSA define.
Update configure to detect MIPS platform and set flags.
CPU-specific gcc options are expected through --extra-cflags.
Sample command line for mips32r5:
./configure --host=mipsel-linux-gnu --cross-prefix=<TOOLCHAIN>/mips-mti-linux-gnu- \
    --extra-cflags="-EL -mips32r5 -msched-weight -mload-store-pairs"
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Anton Mitrofanov [Thu, 16 Jul 2015 21:22:29 +0000 (00:22 +0300)]
Limit autodetection of the number of threads according to the source height
Anton Mitrofanov [Thu, 16 Jul 2015 16:04:59 +0000 (19:04 +0300)]
Fine-tune frame size predictors at ratecontrol start
This is an attempt to improve VBV at the start of a video when many threads are
used, which delays feedback for the predictors.
Anton Mitrofanov [Thu, 16 Jul 2015 13:15:56 +0000 (16:15 +0300)]
Use forced frame types in slicetype analysis
This should improve MBTree and VBV when a lot of forced frame types are used.
Henrik Gramner [Mon, 1 Dec 2014 21:05:42 +0000 (22:05 +0100)]
x86: SSSE3 and AVX2 implementations of plane_copy_swap
For NV21 input.
Yu Xiaolei [Fri, 6 Jun 2014 08:05:27 +0000 (16:05 +0800)]
NV21 input support
Eliminates an extra copy when encoding Android camera preview images.
Checkasm test by Janne Grunau.
ARM assembly with improvements from Janne Grunau.
Henrik Gramner [Tue, 23 Jun 2015 15:00:47 +0000 (17:00 +0200)]
deblock: Write combining
Henrik Gramner [Tue, 23 Jun 2015 12:59:59 +0000 (14:59 +0200)]
Get rid of some tabs and trailing whitespaces
Henrik Gramner [Sat, 23 May 2015 17:44:16 +0000 (19:44 +0200)]
x86: Experimental nasm support
Enables the use of nasm as an alternative to yasm.
Note that nasm cannot assemble x264 with PIC enabled since it currently doesn't
support [symbol-$$] addressing which is used extensively by x264's PIC code.
This includes all 64-bit Windows and 64-bit OS X builds, even non-shared.
For the above reason nasm is currently intentionally not auto-detected, instead
the assembler must be explicitly specified using "AS=nasm ./configure".
Also drop -O2 from ASFLAGS since it's simply ignored anyway.
Timothy Gu [Tue, 26 May 2015 17:12:42 +0000 (19:12 +0200)]
x86inc: Prevent warnings when using `struc` and `endstruc`
struc and endstruc attempt to revert to the previous section state set by
the SECTION macro.
Use the primitive [SECTION] directive instead of the SECTION macro for the
.note.GNU-stack section to prevent it from being emitted again during endstruc.
Henrik Gramner [Wed, 27 May 2015 19:38:14 +0000 (21:38 +0200)]
x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
Henrik Gramner [Sat, 23 May 2015 11:38:05 +0000 (13:38 +0200)]
x86inc: Disable vpbroadcastq workaround in newer yasm versions
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
Henrik Gramner [Sun, 24 May 2015 20:57:00 +0000 (22:57 +0200)]
Prefer Unicode versions of Windows API calls
Just for consistency, doesn't affect behavior.
Henrik Gramner [Sun, 24 May 2015 21:21:20 +0000 (23:21 +0200)]
Get rid of fPIC warnings when compiling a shared library on Windows
PIC is always enabled when compiling for Windows so gcc complains when using
-fPIC since it doesn't do anything.
Henrik Gramner [Sat, 25 Jul 2015 20:42:59 +0000 (22:42 +0200)]
matroska: Write the correct DocTypeVersion when using frame-packing
The StereoMode element is only valid with DocTypeVersion 3 or higher.
Anton Mitrofanov [Fri, 24 Jul 2015 21:21:52 +0000 (00:21 +0300)]
dump_yuv: Fix file handle leak
Anton Mitrofanov [Fri, 24 Jul 2015 21:20:47 +0000 (00:20 +0300)]
mp4: Fix file handle leak
Henrik Gramner [Tue, 23 Jun 2015 22:40:45 +0000 (00:40 +0200)]
flv: Check fseek() and fwrite() return values
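
A minimal sketch of what checking both return values can look like, assuming a hypothetical helper `write_at` (this is not the flv muxer code):

```c
#include <stdio.h>

/* Hypothetical helper: seek to `offset` and write `size` bytes there.
 * Returns 0 on success and -1 if either the seek or the write fails. */
static int write_at( FILE *fh, long offset, const void *data, size_t size )
{
    if( fseek( fh, offset, SEEK_SET ) )
        return -1;                       /* fseek() returns nonzero on failure */
    if( fwrite( data, 1, size, fh ) != size )
        return -1;                       /* short write */
    return 0;
}
```

Callers can then propagate the -1 instead of silently producing a truncated or corrupt file.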
Henrik Gramner [Tue, 23 Jun 2015 22:22:56 +0000 (00:22 +0200)]
flv: Fix memory and file handle leaks
Henrik Gramner [Tue, 23 Jun 2015 23:23:35 +0000 (01:23 +0200)]
avs: Fix file handle leak
Henrik Gramner [Tue, 23 Jun 2015 11:38:02 +0000 (13:38 +0200)]
matroska: Fix memory leak
Henrik Gramner [Tue, 23 Jun 2015 11:24:29 +0000 (13:24 +0200)]
rdo: Fix potential CAVLC overflow issues
Henrik Gramner [Tue, 23 Jun 2015 20:08:35 +0000 (22:08 +0200)]
slurp_file: Various minor bug fixes
* Fix unsigned <= 0 check.
* Add additional size sanity check on 32-bit systems.
* Don't read uninitialized data if fread() fails.
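
The three fixes listed above can be illustrated with a generic read-whole-file routine. This is a sketch under assumptions, not the x264 slurp_file implementation: `read_whole_file` is a hypothetical name, and the exact checks in x264 may differ.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a NUL-terminated buffer.
 * Returns a malloc'd buffer, or NULL on any error or for an empty file. */
static char *read_whole_file( const char *filename )
{
    FILE *fh = fopen( filename, "rb" );
    if( !fh )
        return NULL;

    char *buf = NULL;
    if( fseek( fh, 0, SEEK_END ) == 0 )
    {
        /* Keep the size signed: ftell() returns -1 on error, and a
         * `<= 0` test on an unsigned value could never catch that. */
        long size = ftell( fh );

        /* Sanity check so that size + 1 cannot wrap around size_t. This
         * matters when the file size type is wider than size_t. */
        if( size > 0 && (uint64_t)size < SIZE_MAX && fseek( fh, 0, SEEK_SET ) == 0 )
        {
            buf = malloc( (size_t)size + 1 );
            /* Discard the buffer if fread() comes up short, so the
             * caller never sees uninitialized bytes. */
            if( buf && fread( buf, 1, (size_t)size, fh ) != (size_t)size )
            {
                free( buf );
                buf = NULL;
            }
            if( buf )
                buf[size] = '\0';
        }
    }
    fclose( fh );
    return buf;
}
```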
Henrik Gramner [Tue, 23 Jun 2015 20:47:53 +0000 (22:47 +0200)]
param_parse: Check strdup() return value
Henrik Gramner [Tue, 23 Jun 2015 13:38:16 +0000 (15:38 +0200)]
param_parse: Fix memory leak
Anton Mitrofanov [Fri, 19 Jun 2015 13:01:12 +0000 (16:01 +0300)]
Add FreeBSD's stdint.h header guard to allowed list
Patch written by Koop Mast <kwm@FreeBSD.org>
Henrik Gramner [Fri, 22 May 2015 17:23:33 +0000 (19:23 +0200)]
x86: Prevent overread of src in plane_copy_interleave
Could only occur in 4:2:2 with height == 1.
Also enable asm for inputs with different U/V strides as long as the strides
have identical signs.
Anton Mitrofanov [Wed, 20 May 2015 20:10:20 +0000 (23:10 +0300)]
checkasm: Fix incorrect memcmp size for ARM architecture
Anton Mitrofanov [Sun, 26 Apr 2015 17:51:05 +0000 (20:51 +0300)]
Fix possible use of uninitialized MVs in lookahead analysis for B-frames
Anton Mitrofanov [Tue, 21 Apr 2015 20:08:19 +0000 (23:08 +0300)]
Catch incorrect usage of libx264 API for delayed frames flushing
Anton Mitrofanov [Sat, 7 Mar 2015 20:00:09 +0000 (23:00 +0300)]
Fix detection of system libx264 configuration
Anton Mitrofanov [Mon, 23 Feb 2015 11:23:18 +0000 (14:23 +0300)]
Cosmetic changes
Anton Mitrofanov [Tue, 30 Dec 2014 23:15:05 +0000 (02:15 +0300)]
Update configure for auto detection of system libx264 configuration
Anton Mitrofanov [Tue, 3 Feb 2015 11:51:28 +0000 (14:51 +0300)]
Add tile format frame packing value
Defined in 2014-02 edition.
Anton Mitrofanov [Tue, 3 Feb 2015 10:39:14 +0000 (13:39 +0300)]
Stricter validation of crop-rect values
Vittorio Giovara [Tue, 20 Jan 2015 16:15:56 +0000 (16:15 +0000)]
Add mono frame packing value
Defined in 2013-04 edition.
Vittorio Giovara [Tue, 20 Jan 2015 15:57:41 +0000 (15:57 +0000)]
Validate frame packing value instead of clipping
Christophe Gisquet [Tue, 3 Feb 2015 19:40:41 +0000 (20:40 +0100)]
x86inc: Correctly warn on use of SSE2 instructions in SSE functions
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.
Christophe Gisquet [Tue, 3 Feb 2015 17:02:30 +0000 (18:02 +0100)]
x86inc: Fix instantiation of YMM registers
Vittorio Giovara [Tue, 20 Jan 2015 16:28:54 +0000 (16:28 +0000)]
matroska: Correctly write display width and height in stereo mode
According to the specifications, when stereo mode is set, these values
represent the single view size.
Kieran Kunhya [Tue, 20 Jan 2015 15:38:00 +0000 (09:38 -0600)]
Use POC type 0 for AVC-Intra
Based on a patch from Capella Systems
Anton Mitrofanov [Sat, 3 Jan 2015 12:46:19 +0000 (15:46 +0300)]
Fix ARCH variable name conflict with BSD ports (bsd.port.mk) read-only variable
Anton Mitrofanov [Sat, 27 Dec 2014 17:35:39 +0000 (20:35 +0300)]
Fix negative percentages in final stats output
They were caused by integer overflow when encoding long UHD video.
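
The commit message does not say which counter overflowed, so the following is only a sketch of the general failure mode, with hypothetical helper names. A 3840x2160 frame contains 240 x 135 = 32400 macroblocks of 16x16 pixels, so a running total held in a signed 32-bit integer exceeds INT32_MAX after roughly 66000 frames; widening to 64 bits before multiplying or accumulating avoids that.

```c
#include <stdint.h>

/* Hypothetical helper: total macroblock count, widened to 64 bits
 * BEFORE the multiplication so the product cannot overflow int. */
static int64_t total_macroblocks( int mb_per_frame, int frames )
{
    return (int64_t)mb_per_frame * frames;
}

/* Hypothetical helper: percentage of `part` in `total` using 64-bit counters. */
static double percent64( int64_t part, int64_t total )
{
    return total > 0 ? 100.0 * (double)part / (double)total : 0.0;
}
```

For example, 100000 UHD frames give 3240000000 macroblocks, which does not fit in a signed 32-bit integer but is handled correctly by the 64-bit version.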
Anton Mitrofanov [Sat, 3 Jan 2015 20:35:23 +0000 (23:35 +0300)]
Bump dates to 2015
Anton Mitrofanov [Mon, 15 Dec 2014 15:49:23 +0000 (18:49 +0300)]
x86: Update intel compiler cpu dispatcher override for new versions of ICC/ICL
Anton Mitrofanov [Tue, 6 Sep 2011 17:53:29 +0000 (21:53 +0400)]
New AQ mode: auto-variance AQ with bias to dark scenes
Also known as --aq-mode 3 or auto-variance AQ modification.
Anton Mitrofanov [Tue, 28 Aug 2012 23:02:27 +0000 (03:02 +0400)]
Improve HRD conformance
Henrik Gramner [Fri, 28 Nov 2014 22:24:56 +0000 (23:24 +0100)]
x86: SSE and AVX implementations of plane_copy
Also remove the MMX2 implementation and fix src overread for height == 1.
Anton Mitrofanov [Mon, 29 Sep 2014 19:26:19 +0000 (23:26 +0400)]
Update to the latest version of gas-preprocessor.pl from http://git.libav.org/?p=gas-preprocessor.git
Contributions by Janne Grunau, Martin Storsjo, Mans Rullgard, David Conrad, Martin Aumuller and others
Janne Grunau [Tue, 18 Nov 2014 23:33:55 +0000 (00:33 +0100)]
aarch64: cabac_encode_{decision,bypass,terminal}_asm
benchmarks on a Nexus 9 (nvidia denver):
101.3 cycles in x264_cabac_encode_decision_c, 67105369 runs, 3495 skips
97.3 cycles in x264_cabac_encode_decision_asm, 67105493 runs, 3371 skips
132.8 cycles in x264_cabac_encode_terminal_c, 1046950 runs, 1626 skips
116.1 cycles in x264_cabac_encode_terminal_asm, 1048424 runs, 152 skips
92.4 cycles in x264_cabac_encode_bypass_c, 16776192 runs, 1024 skips
89.6 cycles in x264_cabac_encode_bypass_asm, 16776453 runs, 763 skips
Cycle counts are not as stable as one would like. The dynamic code
optimisation seems to produce different results for small changes in a
binary. Repeated runs with the same binary produce stable results
though (ignoring the first run).
Janne Grunau [Thu, 6 Nov 2014 08:20:17 +0000 (09:20 +0100)]
checkasm: add cycle counter read for aarch64
Needs kernel support since user space access to the cycle counter is not
allowed on all available AArch64 systems (Android 5 and iOS).
Janne Grunau [Wed, 5 Nov 2014 10:35:13 +0000 (11:35 +0100)]
aarch64: nal_escape_neon
3-4 times faster.
Janne Grunau [Fri, 31 Oct 2014 13:49:04 +0000 (14:49 +0100)]
aarch64: {plane_copy,memcpy_aligned,memzero_aligned}_neon
2-3 times faster than C.
Janne Grunau [Wed, 29 Oct 2014 17:17:48 +0000 (18:17 +0100)]
aarch64: x264_mbtree_propagate_{cost,list}_neon
x264_mbtree_propagate_cost_neon is ~7 times faster.
x264_mbtree_propagate_list_neon is 33% faster.
Janne Grunau [Tue, 21 Oct 2014 13:18:49 +0000 (15:18 +0200)]
aarch64: x264_denoise_dct_neon
3.5 times faster.
Janne Grunau [Mon, 20 Oct 2014 11:12:14 +0000 (13:12 +0200)]
aarch64: x264_coeff_level_run{4,8,15,16}
All functions ~33% faster.
Janne Grunau [Tue, 14 Oct 2014 17:20:52 +0000 (19:20 +0200)]
aarch64: NEON asm for intra luma deblocking
deblock_luma_intra[0]_neon is 2 times faster,
deblock_luma_intra[1]_neon is ~4 times faster.
Janne Grunau [Mon, 13 Oct 2014 15:29:22 +0000 (17:29 +0200)]
aarch64: x264_deblock_h_chroma_422_neon
deblock_h_chroma_422 2.5 times faster
Janne Grunau [Mon, 13 Oct 2014 10:43:50 +0000 (12:43 +0200)]
aarch64: x264_deblock_h_chroma_mbaff_neon
deblock_chroma_420_mbaff_neon 2 times faster
Janne Grunau [Fri, 10 Oct 2014 08:29:15 +0000 (10:29 +0200)]
aarch64: NEON asm for intra chroma deblocking
deblock_h_chroma_420_intra, deblock_h_chroma_422_intra and
x264_deblock_h_chroma_intra_mbaff_neon are ~3 times faster.
deblock_chroma_intra[1] is ~4 times faster than C.
Janne Grunau [Tue, 2 Sep 2014 08:27:22 +0000 (10:27 +0200)]
aarch64: add myself as author to aarch64/mc.h
Janne Grunau [Thu, 14 Aug 2014 13:22:50 +0000 (14:22 +0100)]
aarch64: NEON asm for integral init
integral_init4h_neon and integral_init8h_neon are 3-4 times faster than
C. integral_init8v_neon is 6 times faster and integral_init4v_neon is 10
times faster.
Janne Grunau [Wed, 13 Aug 2014 12:30:53 +0000 (13:30 +0100)]
aarch64: NEON asm for 8x16c intra prediction
Between 10% and 40% faster than C.
Janne Grunau [Tue, 12 Aug 2014 15:26:10 +0000 (17:26 +0200)]
aarch64: NEON asm for decimate_score
decimate_score15 and 16 are 60% faster, decimate_score64 is 4 times
faster than C.
Janne Grunau [Fri, 8 Aug 2014 10:19:35 +0000 (11:19 +0100)]
aarch64: implement x264_sub8x16_dct_dc_neon
4 times faster than C.
Janne Grunau [Thu, 7 Aug 2014 17:46:07 +0000 (19:46 +0200)]
aarch64: implement x264_pixel_asd8_neon
7 times faster than C.
Janne Grunau [Thu, 7 Aug 2014 14:49:12 +0000 (16:49 +0200)]
aarch64: NEON asm for 4x16 sad, satd and ssd
pixel_sad_4x16_neon: 33% faster than C
pixel_satd_4x16_neon: 5 times faster
pixel_ssd_4x16_neon: 4 times faster
Janne Grunau [Wed, 30 Jul 2014 14:48:25 +0000 (15:48 +0100)]
aarch64: implement x264_pixel_ssd_nv12_core_neon
13 times faster than C.
Janne Grunau [Tue, 29 Jul 2014 17:26:11 +0000 (18:26 +0100)]
aarch64: implement x264_pixel_vsad_neon
35 times faster than C.
Janne Grunau [Tue, 29 Jul 2014 10:06:24 +0000 (11:06 +0100)]
aarch64: NEON asm for missing x264_zigzag_* functions
zigzag_scan_4x4_field_neon, zigzag_sub_4x4_field_neon,
zigzag_sub_4x4ac_field_neon, zigzag_sub_4x4_frame_neon,
zigzag_sub_4x4ac_frame_neon more than 2 times faster
zigzag_scan_8x8_frame_neon, zigzag_scan_8x8_field_neon,
zigzag_sub_8x8_field_neon, zigzag_sub_8x8_frame_neon 4-5 times faster
zigzag_interleave_8x8_cavlc_neon 6 times faster
Janne Grunau [Fri, 25 Jul 2014 10:53:17 +0000 (11:53 +0100)]
aarch64: implement x264_pixel_sa8d_satd_16x16_neon
~20% faster than calling pixel_sa8d_16x16 and pixel_satd_16x16
separately.
Janne Grunau [Thu, 14 Aug 2014 21:13:27 +0000 (23:13 +0200)]
aarch64: optimize x264_predict_8x8c_dc_left_neon
25% faster than the previous version.