narabu
master
Steinar H. Gunderson [Sun, 12 Nov 2017 10:30:20 +0000 (11:30 +0100)]
More fixes of hard-coded values.

Steinar H. Gunderson [Sun, 12 Nov 2017 10:07:14 +0000 (11:07 +0100)]
Parametrize STREAM_BUF_SIZE in the .cpp file.

Steinar H. Gunderson [Sun, 12 Nov 2017 10:00:09 +0000 (11:00 +0100)]
Fix some hardcoded resolutions in the encoder.

Steinar H. Gunderson [Tue, 17 Oct 2017 21:40:00 +0000 (23:40 +0200)]
Require GL_ARB_gpu_shader_int64 instead of GL_NV_gpu_shader5.

Steinar H. Gunderson [Tue, 17 Oct 2017 21:38:32 +0000 (23:38 +0200)]
Make rans.shader write uint32s, shedding the GL_NV_gpu_shader5 demand.

Steinar H. Gunderson [Tue, 17 Oct 2017 21:28:25 +0000 (23:28 +0200)]
Encapsulate things a bit better in rans.shader.

Steinar H. Gunderson [Tue, 17 Oct 2017 21:13:12 +0000 (23:13 +0200)]
Add an assert to die early if the shader really fails.

Steinar H. Gunderson [Tue, 17 Oct 2017 21:01:01 +0000 (23:01 +0200)]
Activate the same-block optimization for the GPU encoder, too.

Steinar H. Gunderson [Tue, 17 Oct 2017 20:36:42 +0000 (22:36 +0200)]
Tiny optimization.

Steinar H. Gunderson [Tue, 17 Oct 2017 20:33:54 +0000 (22:33 +0200)]
Pull the rANS distributions into uniforms instead of SSBOs. Speeds up stuff a bit.

Steinar H. Gunderson [Tue, 17 Oct 2017 20:05:45 +0000 (22:05 +0200)]
Kill the division in the rANS GPU encoder.

Steinar H. Gunderson [Mon, 16 Oct 2017 22:21:02 +0000 (00:21 +0200)]
Reverse the ac3/ac4 coefficients, too.

Steinar H. Gunderson [Mon, 16 Oct 2017 22:20:53 +0000 (00:20 +0200)]
Fix some texture permissions.

Steinar H. Gunderson [Mon, 16 Oct 2017 22:15:13 +0000 (00:15 +0200)]
Update a boring comment.

Steinar H. Gunderson [Mon, 16 Oct 2017 22:10:29 +0000 (00:10 +0200)]
Remove some redundant includes.

Steinar H. Gunderson [Mon, 16 Oct 2017 22:01:07 +0000 (00:01 +0200)]
Flip the encoding order so that it is correct.

Steinar H. Gunderson [Mon, 16 Oct 2017 21:58:36 +0000 (23:58 +0200)]
Add more off-by-one fixes.

Steinar H. Gunderson [Mon, 16 Oct 2017 21:27:04 +0000 (23:27 +0200)]
Fix an operator precedence issue.

Steinar H. Gunderson [Mon, 16 Oct 2017 21:13:13 +0000 (23:13 +0200)]
Remove coded.dat; it changes too much and is easily recreatable.

Steinar H. Gunderson [Mon, 16 Oct 2017 21:12:02 +0000 (23:12 +0200)]
Add a cargo-culting barrier that seems to help the tally shader.

Steinar H. Gunderson [Mon, 16 Oct 2017 20:30:57 +0000 (22:30 +0200)]
Save some needless sign extending, and fix an escaping bug.

Steinar H. Gunderson [Mon, 16 Oct 2017 20:16:41 +0000 (22:16 +0200)]
Implement sign_bias in the rANS GPU encoder. Still not working properly.

Steinar H. Gunderson [Mon, 16 Oct 2017 20:16:27 +0000 (22:16 +0200)]
Fix some off-by-ones in the tally shader.

Steinar H. Gunderson [Mon, 16 Oct 2017 19:43:24 +0000 (21:43 +0200)]
Fix some voting problems in the tally shader.

Steinar H. Gunderson [Mon, 16 Oct 2017 19:27:50 +0000 (21:27 +0200)]
Make the encoder 100% GPU. Not working yet, though.

Steinar H. Gunderson [Thu, 12 Oct 2017 17:05:31 +0000 (19:05 +0200)]
Silence some Mesa warnings.

Steinar H. Gunderson [Thu, 12 Oct 2017 17:02:07 +0000 (19:02 +0200)]
Add rANS normalization to the encoder.

Steinar H. Gunderson [Tue, 10 Oct 2017 20:49:21 +0000 (22:49 +0200)]
Speed up the histogram counting immensely by adding via local memory.

Steinar H. Gunderson [Tue, 10 Oct 2017 16:07:23 +0000 (18:07 +0200)]
Make quant_matrix a bit more compact.

Steinar H. Gunderson [Tue, 10 Oct 2017 16:04:07 +0000 (18:04 +0200)]
Start trying to count the rANS distributions from the encoding shader.

Steinar H. Gunderson [Mon, 9 Oct 2017 21:13:05 +0000 (23:13 +0200)]
Minor whitespace fix.

Steinar H. Gunderson [Mon, 9 Oct 2017 21:12:55 +0000 (23:12 +0200)]
Add some more debugging.

Steinar H. Gunderson [Mon, 9 Oct 2017 21:11:44 +0000 (23:11 +0200)]
Fix the DCT scaling (I believe).

Steinar H. Gunderson [Sun, 8 Oct 2017 21:26:08 +0000 (23:26 +0200)]
Fix the upload type of the image.

Steinar H. Gunderson [Sun, 8 Oct 2017 13:05:37 +0000 (15:05 +0200)]
Update qdd with newer DC coefficient predictions.

Steinar H. Gunderson [Sat, 7 Oct 2017 23:08:25 +0000 (01:08 +0200)]
Fix an IDCT error.

Steinar H. Gunderson [Sat, 7 Oct 2017 22:54:43 +0000 (00:54 +0200)]
A sign fix in the FDCT.

Steinar H. Gunderson [Sat, 7 Oct 2017 21:22:14 +0000 (23:22 +0200)]
Add the beginnings of a GPU encoder.

It doesn't really work currently (too buggy), only does DCT
(not the rANS part), and only encodes luma.

Steinar H. Gunderson [Fri, 6 Oct 2017 18:08:46 +0000 (20:08 +0200)]
Add support for repeating blocks. About 2% size reduction.

Steinar H. Gunderson [Thu, 5 Oct 2017 18:30:19 +0000 (20:30 +0200)]
Add some code for calculating maximum coefficient ranges, for bit allocation.

Steinar H. Gunderson [Tue, 3 Oct 2017 22:44:41 +0000 (00:44 +0200)]
Revert "Switch to 64-bit rANS, although probably due for immediate revert (just want to preserve history)."

A bit larger files, no real speed gain, a few slight bugs.

This reverts commit 3fb87c6b953be3382cd216c74ff6aa025c8eaa2a.

Steinar H. Gunderson [Tue, 3 Oct 2017 22:38:36 +0000 (00:38 +0200)]
Switch to 64-bit rANS, although probably due for immediate revert (just want to preserve history).

Steinar H. Gunderson [Tue, 3 Oct 2017 22:30:02 +0000 (00:30 +0200)]
Don't print out the shader on failure, as it's not autogenerated.

Steinar H. Gunderson [Sun, 24 Sep 2017 19:10:36 +0000 (21:10 +0200)]
Reduce the spam level from qdc a little bit.

Steinar H. Gunderson [Sun, 24 Sep 2017 17:46:00 +0000 (19:46 +0200)]
Make the number of GPU iterations a named constant.

Steinar H. Gunderson [Sun, 24 Sep 2017 17:45:40 +0000 (19:45 +0200)]
Make the GPU decoder (finally) work with any resolution.

Steinar H. Gunderson [Sun, 24 Sep 2017 17:39:25 +0000 (19:39 +0200)]
Make blocks per stream a named constant.

Steinar H. Gunderson [Sun, 24 Sep 2017 17:28:07 +0000 (19:28 +0200)]
Get -Wall clean.

Steinar H. Gunderson [Sun, 24 Sep 2017 16:42:10 +0000 (18:42 +0200)]
Stop hardcoding blocks per row in the shader.

Steinar H. Gunderson [Sun, 24 Sep 2017 13:35:10 +0000 (15:35 +0200)]
Predict Y DC from 128 instead of 0; microscopic improvement.

Steinar H. Gunderson [Sun, 24 Sep 2017 13:29:44 +0000 (15:29 +0200)]
Predict DC across the entire slice instead of resetting each row. Makes it easier for slices to cross rows.

Steinar H. Gunderson [Sun, 24 Sep 2017 13:23:03 +0000 (15:23 +0200)]
Sanitize compile flags.

Steinar H. Gunderson [Thu, 21 Sep 2017 21:50:58 +0000 (23:50 +0200)]
Make num_blocks a uniform.

Steinar H. Gunderson [Wed, 20 Sep 2017 21:22:20 +0000 (23:22 +0200)]
Use WIDTH and HEIGHT some places instead of 1280 and 720. narabu is still not ready for anything but 1280px wide, though.

Steinar H. Gunderson [Tue, 19 Sep 2017 23:01:19 +0000 (01:01 +0200)]
Prepare for more flexible slices.

They're now always 320 blocks long, but this will probably change in
the future. Note that changing it from “two rows” to “320 blocks”
made chroma blocks a lot longer, which saved 4.9% bitrate overall.

Steinar H. Gunderson [Sun, 17 Sep 2017 11:45:59 +0000 (13:45 +0200)]
DC predict chroma. ~1.5% lower bitrate.

Steinar H. Gunderson [Sun, 17 Sep 2017 09:53:21 +0000 (11:53 +0200)]
Symbolize NUM_SYMS a bit.

Steinar H. Gunderson [Sun, 17 Sep 2017 09:50:04 +0000 (11:50 +0200)]
Go down to 4 rANS streams instead of 8.

Costs approx. 0.8% bitrate, but reduces GPU cost from 1.3 to 1.2 ms
(~8%) due to less L1 cache pressure.

Steinar H. Gunderson [Sun, 17 Sep 2017 09:41:41 +0000 (11:41 +0200)]
Revert "k-means instead of k-medoids; doesn't work as well, so just keep it here to be immediately reverted."

This reverts commit fb83fc30cf33cec1d155b3a63c338bbb64adb4e3.

Steinar H. Gunderson [Sun, 17 Sep 2017 09:41:36 +0000 (11:41 +0200)]
k-means instead of k-medoids; doesn't work as well, so just keep it here to be immediately reverted.

Steinar H. Gunderson [Sun, 17 Sep 2017 09:06:32 +0000 (11:06 +0200)]
Add some code for (semi-)optimal assignment of rANS coefficients to streams.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:57:33 +0000 (15:57 +0200)]
Add a Makefile.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:57:24 +0000 (15:57 +0200)]
Add a .gitignore file.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:38:22 +0000 (15:38 +0200)]
Add some inactive debugging code to store the coefficients.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:38:11 +0000 (15:38 +0200)]
Add some parallel slicing code (not really a win).

Steinar H. Gunderson [Sat, 16 Sep 2017 13:36:56 +0000 (15:36 +0200)]
Remove some obsolete caching code.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:15:52 +0000 (15:15 +0200)]
Add a test image.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:11:24 +0000 (15:11 +0200)]
Add a PSNR measurement tool.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:10:19 +0000 (15:10 +0200)]
Encode sign bit directly in rANS, using some symmetry trickery.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:19:13 +0000 (15:19 +0200)]
Add the GPU decoder itself.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:14:36 +0000 (15:14 +0200)]
Add the decoder.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:03:57 +0000 (15:03 +0200)]
Add color support.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:03:21 +0000 (15:03 +0200)]
Change quantization to MPEG-2, some other changes.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:02:15 +0000 (15:02 +0200)]
Revert "Encoder with 4x4 blocks (using TF switching)."

This reverts commit 13db6a93d746d4152eb30f2d7cc7035d441df8ba.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:01:46 +0000 (15:01 +0200)]
Encoder with 4x4 blocks (using TF switching).

Steinar H. Gunderson [Sun, 27 Aug 2017 18:44:04 +0000 (20:44 +0200)]
Add support for optimal renormalization.

The current code for rounding probabilities down to a fixed resolution
is a bit too crude when resolution is low; whether it's optimal to round
up or down depends on the other frequencies, and the code for stealing
slots from other symbols also doesn't take this into account (as the
comment rightfully points out). These effects only really show up when
getting down to lower resolution, e.g. prob_bits = 10, but there, they
can be quite pronounced.

Add a function (0-clause BSD licensed for optimal usefulness, i.e. public
domain without potential difficulty about whether private persons can put
anything in the public domain) to calculate the optimal distribution of
encoding slots, basically through brute force plus some memoization.
For e.g. the basic example (main.cpp), the number of bytes for prob_bits = 10
goes down from 440895 to 437590 bytes. (At the default prob_bits = 14, the
difference is very low; from 435113 to 435093 bytes.)

For main_alias.cpp, which has prob_bits = 16, I've kept the existing code,
so that it's not lost to the mists of time in case someone wants something
that's nearly even cheaper in terms of startup cost.
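
As a rough illustration of the slot-distribution idea described in this commit message, here is a minimal sketch. It is not the function the commit adds. The name optimal_slots, its signature, and the plain dynamic-programming table (used here in place of memoized recursion) are all assumptions made for the example. It hands out exactly total = 1 << prob_bits slots so that the estimated coded size is minimal, and gives every symbol that occurs at least one slot.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not the code from the commit): distribute exactly
// `total` slots over the symbols so that the estimated coded size,
// sum(counts[i] * -log2(slots[i] / total)), is as small as possible.
// Symbols with a zero count get no slots; all others get at least one.
std::vector<uint32_t> optimal_slots(const std::vector<uint32_t>& counts, uint32_t total)
{
    const size_t n = counts.size();
    const double inf = std::numeric_limits<double>::infinity();

    // cost[i][r]: cheapest way to give exactly r slots to symbols i..n-1.
    // pick[i][r]: number of slots symbol i receives in that cheapest solution.
    std::vector<std::vector<double>> cost(n + 1, std::vector<double>(total + 1, inf));
    std::vector<std::vector<uint32_t>> pick(n + 1, std::vector<uint32_t>(total + 1, 0));
    cost[n][0] = 0.0;

    for (size_t i = n; i-- > 0; ) {
        for (uint32_t r = 0; r <= total; ++r) {
            if (counts[i] == 0) {
                cost[i][r] = cost[i + 1][r];  // unused symbol takes no slots
                continue;
            }
            for (uint32_t s = 1; s <= r; ++s) {  // try every slot count for symbol i
                if (cost[i + 1][r - s] == inf) continue;
                double c = cost[i + 1][r - s] - counts[i] * std::log2(double(s) / total);
                if (c < cost[i][r]) {
                    cost[i][r] = c;
                    pick[i][r] = s;
                }
            }
        }
    }

    // Requires at least one used symbol and no more used symbols than slots.
    assert(cost[0][total] != inf);

    // Walk the table forwards to recover the chosen slot counts.
    std::vector<uint32_t> slots(n, 0);
    uint32_t r = total;
    for (size_t i = 0; i < n; ++i) {
        slots[i] = pick[i][r];
        r -= slots[i];
    }
    return slots;
}
```

The table costs on the order of n * total^2 operations, so this form is only practical for small alphabets or low prob_bits. That is also the range where the message says the gain is pronounced.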

Steinar H. Gunderson [Sat, 16 Sep 2017 13:12:35 +0000 (15:12 +0200)]
Add the missing DCT code.

Steinar H. Gunderson [Sat, 16 Sep 2017 13:05:19 +0000 (15:05 +0200)]
Embed ryg_rans (from https://github.com/rygorous/ryg_rans).

Steinar H. Gunderson [Sat, 16 Sep 2017 13:01:23 +0000 (15:01 +0200)]
Initial checkin.