=============================================
-SNOW Video Codec Specification Draft 20070103
+Snow Video Codec Specification Draft 20080110
=============================================
-Intro:
-======
-This Specification describes the snow syntax and semmantics as well as
-how to decode snow.
-The decoding process is precissely described and any compliant decoder
-MUST produce the exactly same output for a spec conformant snow stream.
-For encoding though any process which generates a stream compliant to
-the syntactical and semmantical requirements and which is decodeable by
+Introduction:
+=============
+This specification describes the Snow bitstream syntax and semantics as
+well as the formal Snow decoding process.
+
+The decoding process is described precisely and any compliant decoder
+MUST produce the exact same output for a spec-conformant Snow stream.
+For encoding, though, any process which generates a stream compliant to
+the syntactical and semantic requirements and which is decodable by
the process described in this spec shall be considered a conformant
-snow encoder.
+Snow encoder.
Definitions:
============
max_ref_frames-1 u header_state
qlogs
}
+ if(!keyframe){
+ update_mc b header_state
+ if(update_mc){
+ for(plane=0; plane<2; plane++){
+ diag_mc b header_state
+ htaps/2-1 u header_state
+ for(i= p->htaps/2; i; i--)
+ |hcoeff[i]| u header_state
+ }
+ }
+ update_qlogs b header_state
+ if(update_qlogs){
+ spatial_decomposition_count u header_state
+ qlogs
+ }
+ }
spatial_decomposition_type s header_state
qlog s header_state
block(0)
block(level):
+ mvx_diff=mvy_diff=y_diff=cb_diff=cr_diff=0
if(keyframe){
intra=1
- y_diff=cb_diff=cr_diff=0
}else{
if(level!=max_block_depth){
s_context= 2*left->level + 2*top->level + topleft->level + topright->level
residual:
+ residual2(luma)
+ residual2(chroma_cr)
+ residual2(chroma_cb)
+
+residual2:
+ for(level=0; level<spatial_decomposition_count; level++){
+ if(level==0)
+ subband(LL, 0)
+ subband(HL, level)
+ subband(LH, level)
+ subband(HH, level)
+ }
+
+subband:
FIXME
maximum number of reference frames
this MUST NOT change within a bitstream
+update_mc
+ indicates that motion compensation filter parameters are stored in the
+ header
+
+diag_mc
+ flag to enable faster diagonal interpolation
+ this SHOULD be 1 unless it turns out to be covered by a valid patent
+
+htaps
+ number of half pel interpolation filter taps, MUST be even, >0 and <10
+
+hcoeff
+ half pel interpolation filter coefficients, hcoeff[0] are the 2 middle
+ coefficients [1] are the next outer ones and so on, resulting in a filter
+ like: ...eff[2], hcoeff[1], hcoeff[0], hcoeff[0], hcoeff[1], hcoeff[2] ...
+ the sign of the coefficients is not explicitly stored but alternates
+ after each coeff and coeff[0] is positive, so ...,+,-,+,-,+,+,-,+,-,+,...
+ hcoeff[0] is not explicitly stored but found by subtracting the sum
+ of all stored coefficients with signs from 32
+ hcoeff[0]= 32 - hcoeff[1] - hcoeff[2] - ...
+ a good choice for hcoeff and htaps is
+ htaps= 6
+ hcoeff={40,-10,2}
+ an alternative which requires more computations at both encoder and
+ decoder side and may or may not be better is
+ htaps= 8
+ hcoeff={42,-14,6,-2}
+
+
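The sign and sum rules for hcoeff above can be sketched in C as follows. This is only an illustration under stated assumptions: the function name build_hcoeff and the mag[] layout (mag[i-1] holding |hcoeff[i]|) are hypothetical and not part of the bitstream syntax.

```c
#include <assert.h>

/*
 * Rebuild signed half-pel filter coefficients from stored magnitudes.
 * Assumed (hypothetical) layout: mag[i-1] holds |hcoeff[i]| for
 * i = 1..n_stored; hcoeff must have room for n_stored + 1 entries.
 */
static void build_hcoeff(const int *mag, int n_stored, int *hcoeff)
{
    int sum = 0;

    assert(n_stored >= 0);
    for (int i = 1; i <= n_stored; i++) {
        /* signs alternate: odd indices are negative, even ones positive */
        hcoeff[i] = (i & 1) ? -mag[i - 1] : mag[i - 1];
        sum += hcoeff[i];
    }
    /* hcoeff[0] is not stored; it makes the signed coefficients sum to 32 */
    hcoeff[0] = 32 - sum;
}
```

With the magnitudes {10, 2} this yields {40, -10, 2}, and with {14, 6, 2} it yields {42, -14, 6, -2}, matching the two example filters given above.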
ref_frames
minimum of the number of available reference frames and max_ref_frames
for example the first frame after a key frame always has ref_frames=1
quant_table
quantiztation table
+
+Highlevel bitstream structure:
+=============================
+ --------------------------------------------
+|                   Header                   |
+ --------------------------------------------
+|    ------------------------------------    |
+|   |               Block0               |   |
+|   |             split?                 |   |
+|   |     yes              no            |   |
+|   |  .........         intra?          |   |
+|   | : Block01 :    yes         no      |   |
+|   | : Block02 :  .......   ..........  |   |
+|   | : Block03 : :  y DC : : ref index: |   |
+|   | : Block04 : : cb DC : : motion x : |   |
+|   |  .........  : cr DC : : motion y : |   |
+|   |              .......   ..........  |   |
+|    ------------------------------------    |
+|    ------------------------------------    |
+|   |               Block1               |   |
+|                    ...                     |
+ --------------------------------------------
+| ------------   ------------   ------------ |
+|| Y subbands | | Cb subbands| | Cr subbands||
+||  ---  ---  | |  ---  ---  | |  ---  ---  ||
+|| |LL0||HL0| | | |LL0||HL0| | | |LL0||HL0| ||
+||  ---  ---  | |  ---  ---  | |  ---  ---  ||
+||  ---  ---  | |  ---  ---  | |  ---  ---  ||
+|| |LH0||HH0| | | |LH0||HH0| | | |LH0||HH0| ||
+||  ---  ---  | |  ---  ---  | |  ---  ---  ||
+||  ---  ---  | |  ---  ---  | |  ---  ---  ||
+|| |HL1||LH1| | | |HL1||LH1| | | |HL1||LH1| ||
+||  ---  ---  | |  ---  ---  | |  ---  ---  ||
+||  ---  ---  | |  ---  ---  | |  ---  ---  ||
+|| |HH1||HL2| | | |HH1||HL2| | | |HH1||HL2| ||
+||    ...     | |    ...     | |    ...     ||
+| ------------   ------------   ------------ |
+ --------------------------------------------
+
+Decoding process:
+=================
+
+                                         ------------
+                                        |            |
+                                        |  Subbands  |
+                   ------------         |            |
+                  |            |         ------------
+                  |  Intra DC  |               |
+                  |            |    LL0 subband prediction
+                   ------------                |
+                                \       Dequantization
+ -------------------             \             |
+|  Reference frames |             \          IDWT
+| -------   ------- |    Motion    \           |
+||Frame 0| |Frame 1|| Compensation  .   OBMC   v      -------
+| -------   ------- | --------------. \------> + --->|Frame n|-->output
+| -------   ------- |                                 -------
+||Frame 2| |Frame 3||<----------------------------------/
+|        ...        |
+ -------------------
+
+
Range Coder:
============
+
+Binary Range Coder:
+-------------------
+The implemented range coder is an adapted version based upon "Range encoding:
+an algorithm for removing redundancy from a digitised message." by G. N. N.
+Martin.
+The symbols encoded by the Snow range coder are bits (0|1). The
+associated probabilities are not fixed but change depending on the symbol mix
+seen so far.
+
+
+bit seen | new state
+---------+-----------------------------------------------
+ 0 | 256 - state_transition_table[256 - old_state];
+ 1 | state_transition_table[ old_state];
+
+state_transition_table = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 20, 21, 22, 23, 24, 25, 26, 27,
+ 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40, 41, 42,
+ 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 56, 57,
+ 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
+ 74, 75, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
+ 89, 90, 91, 92, 93, 94, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
+104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 114, 115, 116, 117, 118,
+119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 133,
+134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+150, 151, 152, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+165, 166, 167, 168, 169, 170, 171, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 190, 191, 192, 194, 194,
+195, 196, 197, 198, 199, 200, 201, 202, 202, 204, 205, 206, 207, 208, 209, 209,
+210, 211, 212, 213, 215, 215, 216, 217, 218, 219, 220, 220, 222, 223, 224, 225,
+226, 227, 227, 229, 229, 230, 231, 232, 234, 234, 235, 236, 237, 238, 239, 240,
+241, 242, 243, 244, 245, 246, 247, 248, 248, 0, 0, 0, 0, 0, 0, 0};
+
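The state update rule in the table above can be sketched in C as follows. This is a minimal illustration, not the reference implementation: the function name next_state is hypothetical, the transition table is passed in as a parameter rather than repeated, and the sketch assumes the state stays in a range where the table entries are nonzero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the new adaptive state after coding one bit.
 * table is the 256-entry state_transition_table; old_state is assumed
 * to be nonzero so that 256 - old_state is a valid index.
 */
static uint8_t next_state(const uint8_t table[256], uint8_t old_state, int bit)
{
    assert(old_state != 0);
    if (bit)
        return table[old_state];
    /* a 0 bit uses the same table mirrored around 128 */
    return (uint8_t)(256 - table[256 - old_state]);
}
```

The mirroring means only one table is needed: a 0 bit moves the state down by the same amount that a 1 bit would move the mirrored state up.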
FIXME
+
+Range Coding of integers:
+-------------------------
+FIXME
+
+
Neighboring Blocks:
===================
left and top are set to the respective blocks unless they are outside of
the used luma and chroma is the sum of the predictor and y_diff, cb_diff, cr_diff
to reverse this in the decoder apply the following:
-block[y][x].dc[0] += block[y][x-1].dc[0];
-block[y][x].dc[1] += block[y][x-1].dc[1];
-block[y][x].dc[2] += block[y][x-1].dc[2];
+block[y][x].dc[0] = block[y][x-1].dc[0] + y_diff;
+block[y][x].dc[1] = block[y][x-1].dc[1] + cb_diff;
+block[y][x].dc[2] = block[y][x-1].dc[2] + cr_diff;
block[*][-1].dc[*]= 128;
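
As a rough illustration of the reversal above, the following C sketch reconstructs the DC values of one row of blocks from their decoded differences. It assumes every block in the row is intra coded and ignores block splitting; the BlockDC type and the function name are hypothetical.

```c
#include <assert.h>

/* Hypothetical container for the three DC values of a block: Y, Cb, Cr. */
typedef struct { int dc[3]; } BlockDC;

/*
 * Rebuild the DC values of a row of n blocks from their differences.
 * diff[x][0..2] are y_diff, cb_diff and cr_diff of block x.
 */
static void reconstruct_row_dc(BlockDC *row, int n, const int (*diff)[3])
{
    assert(n >= 0);
    for (int x = 0; x < n; x++) {
        for (int c = 0; c < 3; c++) {
            /* the block left of the picture edge counts as DC = 128 */
            int pred = x ? row[x - 1].dc[c] : 128;
            row[x].dc[c] = pred + diff[x][c];
        }
    }
}
```

For luma differences 2, -3, 5 the reconstructed luma DC values are 130, 127 and 132.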
Motion Compensation:
====================
+
+Halfpel interpolation:
+----------------------
+halfpel interpolation is done by convolution with the halfpel filter stored
+in the header:
+
+horizontal halfpel samples are found by
+H1[y][x] =    hcoeff[0]*(F[y][x  ] + F[y][x+1])
+            + hcoeff[1]*(F[y][x-1] + F[y][x+2])
+            + hcoeff[2]*(F[y][x-2] + F[y][x+3])
+            + ...
+h1[y][x] = (H1[y][x] + 32)>>6;
+
+vertical halfpel samples are found by
+H2[y][x] =    hcoeff[0]*(F[y  ][x] + F[y+1][x])
+            + hcoeff[1]*(F[y-1][x] + F[y+2][x])
+            + ...
+h2[y][x] = (H2[y][x] + 32)>>6;
+
+vertical+horizontal halfpel samples are found by
+H3[y][x] =    hcoeff[0]*(H2[y][x  ] + H2[y][x+1])
+            + hcoeff[1]*(H2[y][x-1] + H2[y][x+2])
+            + ...
+or, equivalently, by
+H3[y][x] =    hcoeff[0]*(H1[y  ][x] + H1[y+1][x])
+            + hcoeff[1]*(H1[y-1][x] + H1[y+2][x])
+            + ...
+h3[y][x] = (H3[y][x] + 2048)>>12;
+
+
+ F H1 F
+ | | |
+ | | |
+ | | |
+ F H1 F
+ | | |
+ | | |
+ | | |
+ F-------F-------F-> H1<-F-------F-------F
+ v v v
+ H2 H3 H2
+ ^ ^ ^
+ F-------F-------F-> H1<-F-------F-------F
+ | | |
+ | | |
+ | | |
+ F H1 F
+ | | |
+ | | |
+ | | |
+ F H1 F
+
+
+unavailable fullpel samples (outside the picture for example) shall be equal
+to the closest available fullpel sample
+
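The horizontal case, together with the edge rule for unavailable samples, can be sketched in C as follows. This is a minimal sketch under stated assumptions: the names clampi and halfpel_h are hypothetical, hcoeff holds the signed coefficients hcoeff[0..htaps/2-1], the result is not clipped to the sample range because the text above does not state a clipping step, and an arithmetic right shift is assumed for negative sums.

```c
#include <assert.h>
#include <stdint.h>

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Horizontal half-pel sample between row[x] and row[x+1].
 * row has w fullpel samples; positions outside 0..w-1 reuse the
 * closest available sample.
 */
static int halfpel_h(const uint8_t *row, int w, int x,
                     const int *hcoeff, int htaps)
{
    int sum = 0;

    assert(htaps > 0 && htaps < 10 && htaps % 2 == 0);
    for (int i = 0; i < htaps / 2; i++) {
        int xl = clampi(x - i,     0, w - 1);  /* i-th sample to the left  */
        int xr = clampi(x + 1 + i, 0, w - 1);  /* i-th sample to the right */
        sum += hcoeff[i] * (row[xl] + row[xr]);
    }
    /* the taps sum to 64, so round and shift by 6 */
    return (sum + 32) >> 6;
}
```

With hcoeff = {40, -10, 2}, a flat row of 100s gives 100 at every position, and the row {0, 0, 100, 100, 0, 0} gives 125 at x = 2, which shows the overshoot produced by the negative taps.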
+
+Smaller pel interpolation:
+--------------------------
+if diag_mc is set then points which lie on a line between 2 vertically,
+horizontally or diagonally adjacent halfpel points shall be interpolated
+linearly with rounding to nearest and halfway values rounded up.
+points which lie on 2 diagonals at the same time should only use the one
+diagonal not containing the fullpel point
+
+
+
+ F-->O---q---O<--h1->O---q---O<--F
+ v \ / v \ / v
+ O O O O O O O
+ | / | \ |
+ q q q q q
+ | / | \ |
+ O O O O O O O
+ ^ / \ ^ / \ ^
+ h2-->O---q---O<--h3->O---q---O<--h2
+ v \ / v \ / v
+ O O O O O O O
+ | \ | / |
+ q q q q q
+ | \ | / |
+ O O O O O O O
+ ^ / \ ^ / \ ^
+ F-->O---q---O<--h1->O---q---O<--F
+
+
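One possible reading of "interpolated linearly with rounding to nearest and halfway values rounded up" is sketched below in C. The function name lerp_round and its parameters are hypothetical, and the sketch assumes non-negative sample values and a point that lies k steps out of 2^shift along the line from a to b.

```c
#include <assert.h>

/*
 * Linear interpolation between samples a and b at position k / 2^shift,
 * rounded to nearest with halfway values rounded up.
 */
static int lerp_round(int a, int b, int k, int shift)
{
    int n = 1 << shift;

    assert(shift > 0 && k >= 0 && k <= n);
    /* adding n/2 before the shift rounds halfway cases upwards */
    return (a * (n - k) + b * k + (n >> 1)) >> shift;
}
```

For example, the midpoint of 10 and 13 is 11.5 and comes out as 12, and the point one quarter of the way from 0 to 10 is 2.5 and comes out as 3.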
+
+the remaining points shall be bilinearly interpolated from the
+up to 4 surrounding halfpel and fullpel points, again rounding should be to
+nearest and halfway values rounded up
+
+compliant Snow decoders MUST support 1-1/8 pel luma and 1/2-1/16 pel chroma
+interpolation at least
+
+
+Overlapped block motion compensation:
+-------------------------------------
FIXME
LL band prediction:
| + | + | + | + +1/2
-snows 9/7 Integer filter:
+Snow's 9/7 Integer filter:
1. s[i] -= (3*(s[i-1] + s[i+1]) + 4)>>3; for all even i < w
2. s[i] -= s[i-1] + s[i+1] ; for all odd i < w
3. s[i] += ( s[i-1] + s[i+1] + 4*s[i] + 8)>>4; for all even i < w
(3a)>>1 == a + (a>>1)
(a + 4b + 8)>>4 == ((a>>2) + b + 2)>>2
+16bit implementation note:
+The IDWT can be implemented with 16 bits, but this requires some care to
+prevent overflows. The following list gives the minimum number of bits
+needed for some terms:
+1. lifting step
+A= s[i-1] + s[i+1]                              16bit
+3*A + 4                                         18bit
+A + (A>>1) + 2                                  17bit
+
+3. lifting step
+s[i-1] + s[i+1]                                 17bit
+
+4. lifting step
+3*(s[i-1] + s[i+1])                             17bit
+
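The rewrites that keep intermediate terms narrow can be checked with a small C sketch. The function names are hypothetical. The sketch only compares the wide and narrow forms of two terms and does not implement the full IDWT; for negative inputs it relies on the compiler using an arithmetic right shift.

```c
#include <assert.h>

/* first lifting step term, direct form: needs room for 3*A + 4 */
static int lift1_wide(int A)
{
    return (3 * A + 4) >> 3;
}

/* same value using (3a)>>1 == a + (a>>1), so 3*A is never formed */
static int lift1_narrow(int A)
{
    return (A + (A >> 1) + 2) >> 2;
}

/* third lifting step term, direct form: needs room for a + 4*b + 8 */
static int lift3_wide(int a, int b)
{
    return (a + 4 * b + 8) >> 4;
}

/* same value using (a + 4b + 8)>>4 == ((a>>2) + b + 2)>>2 */
static int lift3_narrow(int a, int b)
{
    return ((a >> 2) + b + 2) >> 2;
}
```

Both pairs agree because a right shift by 3 or 4 can be split into two successive floor divisions, which lets the largest intermediate value shrink by one or two bits.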
+
TODO:
=====
Important:
finetune initial contexts
-spatial_decomposition_count per frame?
flip wavelet?
try to use the wavelet transformed predicted image (motion compensated image) as context for coding the residual coefficients
try the MV length as context for coding the residual coefficients
use extradata for stuff which is in the keyframes now?
the MV median predictor is patented IIRC
-change MC so per picture halfpel interpolation can be done and finish the implementation of it
-compare the 6 tap and 8 tap hpel filters (psnr/bitrate and subjective quality)
+implement per picture halfpel interpolation
try different range coder state transition tables for different contexts
Not Important:
+compare the 6 tap and 8 tap hpel filters (psnr/bitrate and subjective quality)
spatial_scalability b vs u (!= 0 breaks syntax anyway so we can add a u later)