1 <chapter> <title> How to write a decoder </title>
<sect1> <title> What exactly is a decoder in the VLC scheme? </title>
6 The decoder does the mathematical part of the process of playing a
7 stream. It is separated from the demultiplexers (in the input module),
8 which manage packets to rebuild a continuous elementary stream, and from
9 the output thread, which takes samples reconstituted by the decoder
10 and plays them. Basically, a decoder has no interaction with devices,
11 it is purely algorithmic.
15 In the next section we will describe how the decoder retrieves the
16 stream from the input. The output API (how to say "this sample is
decoded and can be played at xx") is covered in the next
23 <sect1> <title> Decoder configuration </title>
The input thread spawns the appropriate decoder modules from <filename>
src/input/input_dec.c</filename>. The <function>Dec_CreateThread</function>
function selects the most suitable decoder module: each decoder module
looks at <parameter>decoder_config.i_type</parameter> and returns a score
(see the modules section). It then launches <function> module.pf_run()</function>,
with a <type>decoder_config_t</type>, described in <filename>
include/input_ext-dec.h</filename>.
The generic <type>decoder_config_t</type> structure gives the decoder
the ES ID and type, and pointers to a <type> stream_control_t </type>
structure (which gives information on the play status), a <type> decoder_fifo_t
</type> and <parameter> pf_init_bit_stream</parameter>, which will be
described in the next two sections.
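The selection step described above can be pictured with the following minimal sketch. It is not the code from <filename>src/input/input_dec.c</filename>: the structure layouts, the <function>pf_probe</function> member, the ES type values and the example modules are all hypothetical, and it assumes that the module returning the highest score wins.

```c
#include <stddef.h>

/* Hypothetical ES type values, for illustration only. */
#define ES_TYPE_MPEG_AUDIO 0x03
#define ES_TYPE_AC3        0x81

/* Simplified stand-in for the real decoder_config_t. */
typedef struct decoder_config_s
{
    int i_id;    /* ES ID */
    int i_type;  /* ES type */
} decoder_config_t;

/* Hypothetical module descriptor. */
typedef struct decoder_module_s
{
    const char *psz_name;
    /* Returns 0 if the ES type is not supported, otherwise a score. */
    int (*pf_probe)( const decoder_config_t * );
    /* Entry point of the decoder thread. */
    int (*pf_run)( decoder_config_t * );
} decoder_module_t;

static int ProbeGeneric( const decoder_config_t *p_config )
{
    return ( p_config->i_type == ES_TYPE_MPEG_AUDIO ) ? 10 : 0;
}

static int ProbeMpegAudio( const decoder_config_t *p_config )
{
    return ( p_config->i_type == ES_TYPE_MPEG_AUDIO ) ? 100 : 0;
}

static int RunStub( decoder_config_t *p_config )
{
    (void)p_config;  /* a real decoder would loop on the FIFO here */
    return 0;
}

static const decoder_module_t p_example_modules[2] =
{
    { "generic",    ProbeGeneric,   RunStub },
    { "mpeg_audio", ProbeMpegAudio, RunStub }
};

/* Returns the best-scoring module, or NULL if no module accepts the ES. */
static const decoder_module_t *SelectDecoder( const decoder_module_t *p_modules,
                                              size_t i_nb_modules,
                                              const decoder_config_t *p_config )
{
    const decoder_module_t *p_best = NULL;
    int i_best_score = 0;
    size_t i;

    for( i = 0; i < i_nb_modules; i++ )
    {
        int i_score = p_modules[i].pf_probe( p_config );
        if( i_score > i_best_score )
        {
            i_best_score = i_score;
            p_best = &p_modules[i];
        }
    }
    return p_best;
}
```

The caller would then invoke <function>pf_run</function> on the returned module, passing it the same configuration structure.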
45 <sect1> <title> Packet structures </title>
48 The input module provides an advanced API for delivering stream data
49 to the decoders. First let's have a look at the packet structures.
50 They are defined in <filename> include/input_ext-dec.h</filename>.
<type>data_packet_t</type> contains a pointer to the physical location
of data. Decoders should only read it from <parameter>
p_payload_start </parameter> up to <parameter> p_payload_end</parameter>.
After that, the decoder switches to the next packet, <parameter> p_next
</parameter>, if it is not <constant>NULL</constant>. If the
59 <parameter> b_discard_payload
60 </parameter> flag is up, the content of the packet is messed up and it
<type>data_packet_t</type> structures are contained in a <type>pes_packet_t</type>.
<type>pes_packet_t</type> features a linked list
(<parameter>p_first</parameter>) of <type>data_packet_t
</type> representing (in the MPEG paradigm) a complete PES packet. For
69 PS streams, a <type> pes_packet_t </type> usually only contains one
70 <type>data_packet_t</type>. In TS streams though, one PES can be split
71 among dozens of TS packets. A PES packet has PTS dates (see your
72 MPEG specification for more information) and the current pace of reading
73 that should be applied for interpolating dates (<parameter>i_rate</parameter>).
74 <parameter> b_data_alignment </parameter> (if available in the system
75 layer) indicates if the packet is a random access point, and <parameter>
76 b_discontinuity </parameter> tells whether previous packets have been
82 <imagedata fileref="ps.eps" format="EPS" scalefit="1" scale="95" />
85 <imagedata fileref="ps.gif" format="GIF" />
88 <phrase> A PES packet in a Program Stream </phrase>
91 <para> In a Program Stream, a PES packet features only one
92 data packet, whose buffer contains the PS header, the PES
93 header, and the data payload.
100 <imagedata fileref="ts.eps" format="EPS" scalefit="1" scale="95" />
103 <imagedata fileref="ts.gif" format="GIF" />
106 <phrase> A PES packet in a Transport Stream </phrase>
<para> In a Transport Stream, a PES packet can feature an
unlimited number of data packets (three in the figure)
whose buffers contain the TS header, the PES
header, and the data payload.
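To make the packet layout more concrete, here is a minimal sketch that walks the list of data packets of one PES packet and adds up the payload sizes. The structures are simplified stand-ins that only keep the members named in this section (the real ones live in <filename>include/input_ext-dec.h</filename>), <function>PesPayloadSize</function> is a hypothetical helper, and skipping packets flagged with <parameter>b_discard_payload</parameter> is an assumption about how that flag should be handled.

```c
#include <stddef.h>

typedef unsigned char byte_t;

/* Simplified stand-in: only the members discussed in this section. */
typedef struct data_packet_s
{
    struct data_packet_s *p_next;            /* next packet, or NULL */
    byte_t               *p_payload_start;   /* first byte of payload */
    byte_t               *p_payload_end;     /* one past the last byte */
    int                   b_discard_payload; /* payload is unusable */
} data_packet_t;

typedef struct pes_packet_s
{
    data_packet_t *p_first;  /* head of the list of data packets */
} pes_packet_t;

/* Hypothetical helper: total number of usable payload bytes in a PES. */
static size_t PesPayloadSize( const pes_packet_t *p_pes )
{
    const data_packet_t *p_data;
    size_t i_size = 0;

    for( p_data = p_pes->p_first; p_data != NULL; p_data = p_data->p_next )
    {
        if( p_data->b_discard_payload )
        {
            continue;  /* assumption: flagged packets are simply skipped */
        }
        /* Headers sit before p_payload_start and are never counted. */
        i_size += (size_t)( p_data->p_payload_end - p_data->p_payload_start );
    }
    return i_size;
}
```

For a Program Stream the loop usually runs once; for a Transport Stream it runs once per TS packet carrying a part of the PES.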
The structure shared by both the input and the decoder is <type>
decoder_fifo_t</type>. It features a circular FIFO of PES packets to
be decoded. The input provides macros to manipulate it: <function>
DECODER_FIFO_ISEMPTY, DECODER_FIFO_ISFULL, DECODER_FIFO_START,
DECODER_FIFO_INCSTART, DECODER_FIFO_END, DECODER_FIFO_INCEND</function>.
Remember to take <parameter>p_decoder_fifo->data_lock
</parameter> before any operation on the FIFO.
The next packet to be decoded is <function>DECODER_FIFO_START( *p_decoder_fifo )</function>.
When you have finished with it, you need to call <function>
130 p_decoder_fifo->pf_delete_pes( p_decoder_fifo->p_packets_mgt,
131 DECODER_FIFO_START( *p_decoder_fifo ) ) </function> and then
132 <function> DECODER_FIFO_INCSTART( *p_decoder_fifo )</function> to
133 return the PES to the <link linkend="input_buff">buffer manager</link>.
If the FIFO is empty (<function>DECODER_FIFO_ISEMPTY</function>), you
can block until a new packet is received by waiting on a condition variable:
<function> vlc_cond_wait( &amp;p_fifo->data_wait,
&amp;p_fifo->data_lock )</function>. You must hold the lock before
calling this function. If the file is over or the user quits,
<parameter>p_fifo->b_die</parameter> will be set to 1. It indicates
that you must free all your data structures and call <function>
vlc_thread_exit() </function> as soon as possible.
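The locking protocol described above can be sketched as follows. This is only a model, not the VLC code: it uses plain POSIX threads instead of the <function>vlc_*</function> wrappers, the structure layout, <constant>FIFO_SIZE</constant> and the macro bodies are assumptions, and <function>FifoInit</function>, <function>FifoPush</function>, <function>FifoWait</function> and <function>FifoRelease</function> are hypothetical helpers.

```c
#include <pthread.h>
#include <stddef.h>

#define FIFO_SIZE 8   /* hypothetical capacity of the circular FIFO */

typedef struct pes_packet_s { int i_id; } pes_packet_t;  /* stand-in */

typedef struct decoder_fifo_s
{
    pthread_mutex_t data_lock;  /* protects every field below */
    pthread_cond_t  data_wait;  /* signalled when a packet arrives */
    pes_packet_t   *buffer[FIFO_SIZE];
    int             i_start;    /* index of the next packet to decode */
    int             i_end;      /* index of the next free slot */
    int             b_die;      /* set when the decoder must exit */
} decoder_fifo_t;

/* Simplified stand-ins for the macros provided by the input. */
#define DECODER_FIFO_ISEMPTY( fifo )  ( (fifo).i_start == (fifo).i_end )
#define DECODER_FIFO_START( fifo )    ( (fifo).buffer[ (fifo).i_start ] )
#define DECODER_FIFO_INCSTART( fifo ) \
    ( (fifo).i_start = ( (fifo).i_start + 1 ) % FIFO_SIZE )

static void FifoInit( decoder_fifo_t *p_fifo )
{
    pthread_mutex_init( &p_fifo->data_lock, NULL );
    pthread_cond_init( &p_fifo->data_wait, NULL );
    p_fifo->i_start = p_fifo->i_end = 0;
    p_fifo->b_die = 0;
}

/* Input side: queue a packet and wake the decoder (no overflow check). */
static void FifoPush( decoder_fifo_t *p_fifo, pes_packet_t *p_pes )
{
    pthread_mutex_lock( &p_fifo->data_lock );
    p_fifo->buffer[ p_fifo->i_end ] = p_pes;
    p_fifo->i_end = ( p_fifo->i_end + 1 ) % FIFO_SIZE;
    pthread_cond_signal( &p_fifo->data_wait );
    pthread_mutex_unlock( &p_fifo->data_lock );
}

/* Decoder side: block until a packet is available and return it without
 * removing it. Returns NULL when b_die is set. */
static pes_packet_t *FifoWait( decoder_fifo_t *p_fifo )
{
    pes_packet_t *p_pes = NULL;

    pthread_mutex_lock( &p_fifo->data_lock );
    while( DECODER_FIFO_ISEMPTY( *p_fifo ) && !p_fifo->b_die )
    {
        /* Releases the lock while sleeping, takes it again on wake-up. */
        pthread_cond_wait( &p_fifo->data_wait, &p_fifo->data_lock );
    }
    if( !p_fifo->b_die )
    {
        p_pes = DECODER_FIFO_START( *p_fifo );
    }
    pthread_mutex_unlock( &p_fifo->data_lock );
    return p_pes;
}

/* Decoder side: the current packet is finished, move on to the next one.
 * The real code would call pf_delete_pes() before advancing. */
static void FifoRelease( decoder_fifo_t *p_fifo )
{
    pthread_mutex_lock( &p_fifo->data_lock );
    DECODER_FIFO_INCSTART( *p_fifo );
    pthread_mutex_unlock( &p_fifo->data_lock );
}
```

The wait is written as a loop because a condition variable may wake up spuriously; a decoder thread would call <function>FifoWait</function> repeatedly and exit as soon as it returns <constant>NULL</constant>.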
149 <sect1> <title> The bit stream (input module) </title>
This classical way of reading packets is not convenient, though, since
the elementary stream can be split up arbitrarily. The input module
provides primitives which make reading a bit stream much easier.
Using them is optional, but once you do, you should no longer
access the packet buffers directly.
The bit stream allows you to just call <function> GetBits()</function>,
and this function will transparently read the packet buffers and change
data packets and PES packets when necessary, without any intervention
from you. This makes it much more convenient to read a continuous
Elementary Stream: you don't have to deal with packet boundaries
or the FIFO, because the bit stream does it for you.
The central idea is to introduce a 32-bit buffer, <type>
bit_fifo_t</type> (normally of <type> WORD_TYPE</type>, but the 64-bit
version doesn't work yet). It contains the word buffer and the number of
significant bits (kept in the higher part of the word). The input module
provides five inline functions to manage it:
177 <listitem> <para> <type> u32 </type> <function> GetBits </function>
178 <parameter>( bit_stream_t * p_bit_stream, unsigned int i_bits )
Returns the next <parameter> i_bits </parameter> bits from the
bit buffer. If there are not enough bits, it fetches the following
word from the <type>decoder_fifo_t</type>. This function is only
guaranteed to work with up to 24 bits. For the moment it works up to
31 bits, but only as a side effect. We had to write a separate
function, <function>GetBits32</function>, for 32-bit reading,
because of the &lt;&lt; operator (in C, shifting a 32-bit word by
32 bits is undefined).
189 <listitem> <para> <function> RemoveBits </function> <parameter>
190 ( bit_stream_t * p_bit_stream, unsigned int i_bits ) </parameter> :
191 The same as <function> GetBits()</function>, except that the bits
192 aren't returned (we spare a few CPU cycles). It has the same
193 limitations, and we also wrote <function> RemoveBits32</function>.
196 <listitem> <para> <type> u32 </type> <function> ShowBits </function>
197 <parameter>( bit_stream_t * p_bit_stream, unsigned int i_bits )
199 The same as <function> GetBits()</function>, except that the bits
200 don't get flushed after reading, so that you need to call
201 <function> RemoveBits() </function> by hand afterwards. Beware,
202 this function won't work above 24 bits, except if you're aligned
203 on a byte boundary (see next function).
206 <listitem> <para> <function> RealignBits </function> <parameter>
207 ( bit_stream_t * p_bit_stream ) </parameter> :
Drops the n higher bits (n &lt; 8), so that the first bit of
the buffer is aligned on a byte boundary. It is useful when
looking for an aligned startcode (MPEG for instance).
213 <listitem> <para> <function> GetChunk </function> <parameter>
214 ( bit_stream_t * p_bit_stream, byte_t * p_buffer, size_t i_buf_len )
216 It is an analog of <function> memcpy()</function>, but taking
217 a bit stream as first argument. <parameter> p_buffer </parameter>
218 must be allocated and at least <parameter> i_buf_len </parameter>
219 long. It is useful to copy data you want to keep track of.
All these functions recreate a continuous elementary stream paradigm.
When the bit buffer is empty, they take the following word in the
current packet. When the packet is empty, they switch to the next
<type>data_packet_t</type>, or if there is none, to the next <type>
pes_packet_t</type> (see <function>
p_bit_stream->pf_next_data_packet</function>). All this is
completely transparent.
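The following sketch illustrates the principle of the bit buffer on a single flat byte array. It is a simplified model, not the code of the input module: it refills the buffer byte by byte instead of word by word, it knows nothing about packets or the FIFO, and <type>bit_reader_t</type>, <function>InitBits</function> and <function>FillBits</function> are hypothetical names. The other functions only borrow the names of the real primitives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct bit_reader_s
{
    const uint8_t *p_byte;      /* next byte to load */
    const uint8_t *p_end;       /* one past the last byte */
    uint32_t       i_buffer;    /* significant bits sit in the higher part */
    unsigned int   i_available; /* number of significant bits in i_buffer */
} bit_reader_t;

static void InitBits( bit_reader_t *p_bits, const uint8_t *p_data,
                      size_t i_len )
{
    p_bits->p_byte = p_data;
    p_bits->p_end = p_data + i_len;
    p_bits->i_buffer = 0;
    p_bits->i_available = 0;
}

/* Loads whole bytes until at least i_bits (1 to 24) bits are buffered,
 * or until the data runs out. */
static void FillBits( bit_reader_t *p_bits, unsigned int i_bits )
{
    assert( i_bits >= 1 && i_bits <= 24 );
    while( p_bits->i_available < i_bits && p_bits->p_byte < p_bits->p_end )
    {
        p_bits->i_buffer |= (uint32_t)*p_bits->p_byte++
                             << ( 24 - p_bits->i_available );
        p_bits->i_available += 8;
    }
}

/* Returns the next i_bits bits without consuming them. */
static uint32_t ShowBits( bit_reader_t *p_bits, unsigned int i_bits )
{
    FillBits( p_bits, i_bits );
    return p_bits->i_buffer >> ( 32 - i_bits );
}

/* Consumes i_bits bits (fewer if the data runs out). */
static void RemoveBits( bit_reader_t *p_bits, unsigned int i_bits )
{
    FillBits( p_bits, i_bits );
    if( i_bits > p_bits->i_available )
    {
        i_bits = p_bits->i_available;
    }
    p_bits->i_buffer <<= i_bits;
    p_bits->i_available -= i_bits;
}

/* Returns the next i_bits bits and consumes them. */
static uint32_t GetBits( bit_reader_t *p_bits, unsigned int i_bits )
{
    uint32_t i_result = ShowBits( p_bits, i_bits );
    RemoveBits( p_bits, i_bits );
    return i_result;
}

/* Drops bits until the read position is on a byte boundary. Whole bytes
 * are loaded, so the bits to drop are exactly i_available modulo 8. */
static void RealignBits( bit_reader_t *p_bits )
{
    unsigned int i_drop = p_bits->i_available % 8;
    p_bits->i_buffer <<= i_drop;
    p_bits->i_available -= i_drop;
}
```

Even in this reduced form, the right shift by 32 - i_bits shows why a 32-bit read cannot share the same code path as shorter reads.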
233 <note> <title> Packet changes and alignment issues </title>
We have to study the conjunction of two problems. First, a
<type> data_packet_t </type> can have an odd number of bytes,
for instance 177, so the last word will be truncated. Second,
many CPUs (SPARC, Alpha...) can only read words aligned on a
word boundary (that is, a 4-byte boundary for a 32-bit word). So packet
changes are a lot more complicated than you might imagine, because
we have to read truncated words and get realigned.
245 For instance <function> GetBits() </function> will call
246 <function> UnalignedGetBits() </function> from <filename>
247 src/input/input_ext-dec.c</filename>. Basically it will
248 read byte after byte until the stream gets realigned. <function>
249 UnalignedShowBits() </function> is a bit more complicated
250 and may require a temporary packet
251 (<parameter>p_bit_stream->showbits_data</parameter>).
To use the bit stream, you have to call <parameter>
p_decoder_config->pf_init_bit_stream( bit_stream_t * p_bit_stream,
decoder_fifo_t * p_fifo )</parameter> to set up all variables. You will
probably need to regularly fetch specific information from the packet,
for instance the PTS. If <parameter> p_bit_stream->pf_bitstream_callback
</parameter> is not <constant> NULL</constant>, it will be called
on a packet change. See <filename> src/video_parser/video_parser.c
</filename> for an example. The second argument of the callback
indicates whether it is just a new <type>data_packet_t</type> or
also a new <type>pes_packet_t</type>. You can store your own structure in
<parameter> p_bit_stream->p_callback_arg</parameter>.
When you call <function>pf_init_bit_stream</function>, the
<function>pf_bitstream_callback</function> is not defined yet,
even though the bit stream already jumps to the first packet. You will
probably want to call your bitstream callback by hand just after
<function> pf_init_bit_stream</function>.
278 <sect1> <title> Built-in decoders </title>
281 VLC already features an MPEG layer 1 and 2 audio decoder, an MPEG MP@ML
282 video decoder, an AC3 decoder (borrowed from LiViD), a DVD SPU decoder,
283 and an LPCM decoder. You can write your own decoder, just mimic the
287 <note> <title> Limitations in the current design </title>
To add a new decoder, you'll still have to add the stream type, as there's
still a hard-wired piece of code in <filename> src/input/input_programs.c
295 The MPEG audio decoder is native, but doesn't support layer 3 decoding
296 [too much trouble], the AC3 decoder is a port from Aaron
297 Holtzman's libac3 (the original libac3 isn't reentrant), and the
298 SPU decoder is native. You may want to have a look at <function>
BitstreamCallback </function> in the AC3 decoder: there, we have
to skip the first 3 bytes of each PES packet, which are not part of the
301 elementary stream. The video decoder is a bit special and will
302 be described in the following section.
307 <sect1> <title> The MPEG video decoder </title>
VideoLAN Client provides an MPEG-1 and MPEG-2 Main Profile @
Main Level decoder. It has been written natively for VLC, and is quite
mature. Its status is a bit special, since it is split between two
logical entities: the video parser and the video decoder.
314 The initial goal is to separate bit stream parsing functions from
315 highly parallelizable mathematical algorithms. In theory, there can be
316 one video parser thread (and only one, otherwise we would have race
317 conditions reading the bit stream), along with a pool of video decoder
318 threads, which do IDCT and motion compensation on several blocks
323 It doesn't (and won't) support MPEG-4 or DivX decoding. It is not an
324 encoder. It should support the whole MPEG-2 MP@ML specification, though
325 some features are still left untested, like Differential Motion Vectors.
326 Please bear in mind before complaining that the input elementary stream
327 must be valid (for instance this is not the case when you directly read
328 a DVD multi-angle .vob file).
The most interesting file is <filename> vpar_synchro.c</filename>, which is
well worth reading: it explains the whole frame dropping algorithm.
In a nutshell, if the machine is powerful enough, we decode all I, P and
B pictures; otherwise we decode all I and P pictures, plus B pictures when
we have enough time (this is based on on-the-fly decoding time statistics).
Another interesting file is <filename>vpar_blocks.c</filename>, which
contains all block parsing algorithms (including coefficients and motion
vectors). Look at the bottom of the file: we generate one optimized function
for every common picture type, and one slow generic function. There
are also several levels of optimization (which make compilation slower
but certain types of files faster to decode), selected by <constant>
VPAR_OPTIM_LEVEL</constant>: level 0 means no optimization, level 1
means optimizations for MPEG-1 and MPEG-2 frame pictures, and level 2
means optimizations for MPEG-1 and MPEG-2 field and frame pictures.
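The frame dropping idea summarized above can be illustrated with a deliberately naive sketch. It is not the algorithm of <filename>vpar_synchro.c</filename>, which is certainly more elaborate: the structure, the function names, the weighting of the running mean and the use of microseconds are all assumptions made for the example.

```c
/* Picture coding types, numbered as in the MPEG video specification. */
#define I_PICTURE 1
#define P_PICTURE 2
#define B_PICTURE 3

/* Hypothetical statistics: running mean of the decoding time of each
 * picture type, in microseconds (index 0 is unused). */
typedef struct synchro_stats_s
{
    long i_mean_time[4];
} synchro_stats_t;

/* Folds a newly measured decoding time into the running mean. The 3:1
 * weighting is arbitrary. */
static void UpdateStats( synchro_stats_t *p_stats, int i_type,
                         long i_decode_time )
{
    p_stats->i_mean_time[i_type] =
        ( 3 * p_stats->i_mean_time[i_type] + i_decode_time ) / 4;
}

/* Returns 1 if the picture should be decoded, 0 if it should be dropped.
 * i_time_left is the time remaining before the picture must be ready. */
static int ShouldDecode( const synchro_stats_t *p_stats, int i_type,
                         long i_time_left )
{
    if( i_type != B_PICTURE )
    {
        return 1;  /* I and P pictures are always decoded in this sketch */
    }
    /* B pictures are only decoded if we expect to finish in time. */
    return p_stats->i_mean_time[B_PICTURE] <= i_time_left;
}
```

B pictures are the natural candidates for dropping because no other picture uses them as a reference, so skipping one does not corrupt the pictures that follow.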
348 <sect2> <title> Motion compensation plug-ins </title>
Motion compensation (i.e. copying regions from a reference picture) is
very platform-dependent (for instance with MMX or AltiVec versions), so
we moved it to the <filename> plugins/motion </filename> directory. It
is more convenient for the video decoder, and the resulting plug-ins may
be used by other video decoders (MPEG-4?). A motion plug-in must
define 6 functions, coming straight from the specification:
<function> vdec_MotionFieldField420, vdec_MotionField16x8420,
vdec_MotionFieldDMV420, vdec_MotionFrameFrame420, vdec_MotionFrameField420,
vdec_MotionFrameDMV420</function>. The equivalent 4:2:2 and 4:4:4
functions are unused, since these formats are forbidden in MP@ML
(compiling them would only lengthen the build).
365 Look at the C version of the algorithms if you want more information.
366 Note also that the DMV algorithm is untested and is probably buggy.
371 <sect2> <title> IDCT plug-ins </title>
374 Just like motion compensation, IDCT is platform-specific. So we moved it
375 to <filename> plugins/idct</filename>. This module does the IDCT
376 calculation, and copies the data to the final picture. You need to define
381 <listitem> <para> <function> vdec_IDCT </function> <parameter>
382 ( decoder_config_t * p_config, dctelem_t * p_block, int )
384 Does the complete 2-D IDCT. 64 coefficients are in <parameter>
388 <listitem> <para> <function> vdec_SparseIDCT </function>
389 <parameter> ( vdec_thread_t * p_vdec, dctelem_t * p_block,
390 int i_sparse_pos ) </parameter> :
Does an IDCT on a block with only one non-zero coefficient
392 (designated by <parameter> i_sparse_pos</parameter>). You can
393 use the function defined in <filename> plugins/idct/idct_common.c
394 </filename> which precalculates these 64 matrices at
398 <listitem> <para> <function> vdec_InitIDCT </function>
399 <parameter> ( vdec_thread_t * p_vdec ) </parameter> :
Performs the initialization needed by <function>
vdec_SparseIDCT</function>.
404 <listitem> <para> <function> vdec_NormScan </function>
405 <parameter> ( u8 ppi_scan[2][64] ) </parameter> :
Normally, this function does nothing. For minor optimizations,
some IDCT implementations (MMX) need to invert certain coefficients in the
MPEG scan matrices (see ISO/IEC 13818-2).
411 <listitem> <para> <function> vdec_InitDecode </function>
412 <parameter> ( struct vdec_thread_s * p_vdec ) </parameter> :
413 Initializes the IDCT and optional crop tables.
416 <listitem> <para> <function> vdec_DecodeMacroblockC </function>
417 <parameter> ( struct vdec_thread_s *p_vdec,
418 struct macroblock_s * p_mb ); </parameter> :
419 Decodes an entire macroblock and copies its data to the final
420 picture, including chromatic information.
423 <listitem> <para> <function> vdec_DecodeMacroblockBW </function>
424 <parameter> ( struct vdec_thread_s *p_vdec,
425 struct macroblock_s * p_mb ); </parameter> :
426 Decodes an entire macroblock and copies its data to the final
427 picture, except chromatic information (used in grayscale mode).
Currently we have implemented optimized versions for MMX, MMXEXT, and
AltiVec (the latter doesn't work). We have two plain C versions: the normal
(supposedly optimized) Berkeley version (<filename>idct.c</filename>),
and the simple 1-D separation IDCT from the ISO reference decoder
(<filename>idctclassic.c</filename>).
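As an illustration of the sparse IDCT idea (a block with a single non-zero coefficient, served by 64 precalculated matrices), here is a minimal floating-point sketch. It is not the code of <filename>plugins/idct</filename>: the function names and signatures are hypothetical, <type>dctelem_t</type> is assumed to be a <type>short</type>, and <parameter>i_sparse_pos</parameter> is assumed to index the block in natural (row-major) order.

```c
typedef short dctelem_t;   /* assumption: 16-bit coefficients */

/* cos( k * pi / 16 ) for k = 0 to 8. */
static const double pf_cos[9] =
{
    1.0, 0.98078528040323, 0.92387953251129, 0.83146961230255,
    0.70710678118655, 0.55557023301960, 0.38268343236509,
    0.19509032201613, 0.0
};

/* cos( k * pi / 16 ) for any k >= 0, using the symmetries of cosine. */
static double Cos16( int k )
{
    k %= 32;                         /* period of 2 * pi */
    if( k > 16 ) k = 32 - k;         /* cos( 2 * pi - t ) = cos( t ) */
    return ( k > 8 ) ? -pf_cos[16 - k] : pf_cos[k];
}

/* ppf_basis[pos][y * 8 + x] is the spatial block produced by a unit
 * coefficient at frequency position pos = v * 8 + u. */
static double ppf_basis[64][64];

/* Precalculates the 64 basis matrices of the 8x8 IDCT. */
static void InitSparseIDCT( void )
{
    int u, v, x, y;

    for( v = 0; v < 8; v++ )
    {
        for( u = 0; u < 8; u++ )
        {
            double f_cu = ( u == 0 ) ? pf_cos[4] : 1.0;  /* 1 / sqrt( 2 ) */
            double f_cv = ( v == 0 ) ? pf_cos[4] : 1.0;

            for( y = 0; y < 8; y++ )
            {
                for( x = 0; x < 8; x++ )
                {
                    ppf_basis[v * 8 + u][y * 8 + x] = 0.25 * f_cu * f_cv
                        * Cos16( ( 2 * x + 1 ) * u )
                        * Cos16( ( 2 * y + 1 ) * v );
                }
            }
        }
    }
}

/* In-place IDCT of a block whose only non-zero coefficient is at
 * i_sparse_pos: the result is one scaled basis matrix. */
static void SparseIDCT( dctelem_t *p_block, int i_sparse_pos )
{
    double f_coeff = p_block[i_sparse_pos];
    int i;

    for( i = 0; i < 64; i++ )
    {
        double f_value = f_coeff * ppf_basis[i_sparse_pos][i];
        /* Round to the nearest integer. */
        p_block[i] = (dctelem_t)( f_value >= 0 ? f_value + 0.5
                                               : f_value - 0.5 );
    }
}
```

A block holding only a DC coefficient of 80, for instance, comes out as a flat block of 10, since the DC basis matrix is constant and equal to 1/8.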
<sect2> <title> Symmetric Multiprocessing </title>
444 The MPEG video decoder of VLC can take advantage of several processors if
445 necessary. The idea is to launch a pool of decoders, which will do
446 IDCT/motion compensation on several macroblocks at once.
The functions managing the pool are in <filename>
src/video_decoder/vpar_pool.c</filename>. Using the pool on non-SMP machines is
not recommended, since it is actually slower than the single-threaded version,
and sometimes it is slower even on SMP machines.