<chapter> <title> How to write a decoder </title>

  <sect1> <title> What exactly is a decoder in the VLC scheme? </title>

    <para>
The decoder does the mathematical part of the process of playing a
stream. It is separate from the demultiplexers (in the input module),
which manage packets to rebuild a continuous elementary stream, and from
the output thread, which takes the samples reconstituted by the decoder
and plays them. Basically, a decoder has no interaction with devices:
it is purely algorithmic.
    </para>

    <para>
In the next sections we describe how the decoder retrieves the
stream from the input. The output API (how to say "this sample is
decoded and can be played at xx") is covered in the following
chapters.
    </para>

  </sect1>

  <sect1> <title> Decoder configuration </title>

    <para>
The input thread spawns the appropriate decoder modules from <filename>
src/input/input_dec.c</filename>. The <function>Dec_CreateThread</function>
function selects the most suitable decoder module: each decoder module
looks at <parameter>decoder_config.i_type</parameter> and returns a score
[ see the modules section ]. It then launches <function>
module.pf_run()</function> with a <type>decoder_config_t</type>, described
in <filename> include/input_ext-dec.h</filename>.
    </para>
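
    <para>
The selection step can be pictured with the minimal sketch below. It is
not the code of <function>Dec_CreateThread</function>: the type
<type>demo_module_t</type>, the probe functions, the helper
<function>demo_select_decoder</function> and the stream type values are
all hypothetical names introduced for illustration. The only idea taken
from the text is that each candidate inspects the stream type and returns
a score, and the best score wins.
    </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a decoder module: a name plus a probe
 * function that scores how well the module handles a stream type. */
typedef struct {
    const char *psz_name;
    int (*pf_probe)(int i_type);        /* 0 means "cannot decode this" */
} demo_module_t;

/* Illustrative stream types, NOT the real VLC constants. */
enum { DEMO_TYPE_MPEG_AUDIO = 1, DEMO_TYPE_MPEG_VIDEO = 2 };

static int demo_probe_audio(int i_type)
{
    return i_type == DEMO_TYPE_MPEG_AUDIO ? 50 : 0;
}

static int demo_probe_video(int i_type)
{
    return i_type == DEMO_TYPE_MPEG_VIDEO ? 50 : 0;
}

/* A second video module that outbids the generic one. */
static int demo_probe_video_fast(int i_type)
{
    return i_type == DEMO_TYPE_MPEG_VIDEO ? 100 : 0;
}

static const demo_module_t demo_modules[] = {
    { "audio",      demo_probe_audio },
    { "video",      demo_probe_video },
    { "video_fast", demo_probe_video_fast },
};

/* Returns the module with the highest non-zero score, or NULL if no
 * module accepts the stream type. */
static const demo_module_t *demo_select_decoder(
    const demo_module_t *p_modules, size_t i_count, int i_type)
{
    const demo_module_t *p_best = NULL;
    int i_best_score = 0;

    for (size_t i = 0; i < i_count; i++) {
        int i_score = p_modules[i].pf_probe(i_type);
        if (i_score > i_best_score) {
            i_best_score = i_score;
            p_best = &p_modules[i];
        }
    }
    return p_best;
}
```
]]></programlisting>

    <para>
In this sketch an unknown stream type yields <constant>NULL</constant>,
which corresponds to the case where no decoder can be spawned.
    </para>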

    <para>
The generic <type>decoder_config_t</type> structure gives the decoder
the ES ID and type, and pointers to a <type> stream_control_t </type>
structure (which gives information on the play status), a <type>
decoder_fifo_t </type> and <parameter> pf_init_bit_stream</parameter>,
which are described in the next two sections.
    </para>

  </sect1>

  <sect1> <title> Packet structures </title>

    <para>
The input module provides an advanced API for delivering stream data
to the decoders. First let's have a look at the packet structures.
They are defined in <filename> include/input_ext-dec.h</filename>.
    </para>

    <para>
<type>data_packet_t</type> contains a pointer to the physical location
of the data. Decoders should only read it from <parameter>
p_payload_start </parameter> up to <parameter> p_payload_end</parameter>.
The decoder then switches to the next packet, <parameter> p_next
</parameter>, if it is not <constant>NULL</constant>. If the
<parameter> b_discard_payload </parameter> flag is set, the content of
the packet is corrupted and should be discarded.
    </para>

    <para>
<type>data_packet_t</type> structures are contained in a
<type>pes_packet_t</type>. A <type>pes_packet_t</type> features a
linked list (<parameter>p_first</parameter>) of <type>data_packet_t
</type> representing (in the MPEG paradigm) a complete PES packet. For
PS streams, a <type> pes_packet_t </type> usually contains only one
<type>data_packet_t</type>. In TS streams, though, one PES can be split
among dozens of TS packets. A PES packet carries PTS dates (see your
MPEG specification for more information) and the current reading rate,
which should be applied when interpolating dates
(<parameter>i_rate</parameter>).
<parameter> b_data_alignment </parameter> (if available in the system
layer) indicates whether the packet is a random access point, and
<parameter> b_discontinuity </parameter> tells whether previous packets
have been dropped.
    </para>

    <mediaobject>
      <imageobject>
        <imagedata fileref="ps.png" format="PNG" scalefit="1" scale="95" />
      </imageobject>
      <imageobject>
        <imagedata fileref="ps.gif" format="GIF" />
      </imageobject>
      <textobject>
        <phrase> A PES packet in a Program Stream </phrase>
      </textobject>
      <caption>
        <para> In a Program Stream, a PES packet features only one
        data packet, whose buffer contains the PS header, the PES
        header, and the data payload.
        </para>
      </caption>
    </mediaobject>

    <mediaobject>
      <imageobject>
        <imagedata fileref="ts.png" format="PNG" scalefit="1" scale="95" />
      </imageobject>
      <imageobject>
        <imagedata fileref="ts.gif" format="GIF" />
      </imageobject>
      <textobject>
        <phrase> A PES packet in a Transport Stream </phrase>
      </textobject>
      <caption>
        <para> In a Transport Stream, a PES packet can feature an
        unlimited number of data packets (three in the figure)
        whose buffers contain the TS header, the PES
        header, and the data payload.
        </para>
      </caption>
    </mediaobject>
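
    <para>
As an illustration of the two structures, here is a minimal sketch that
gathers the payload of a whole PES packet by walking its chain of data
packets. The <type>demo_*</type> types are simplified stand-ins that only
keep the fields named above (the real definitions in <filename>
include/input_ext-dec.h</filename> have more members), and the sketch
assumes that <parameter>p_payload_end</parameter> points one byte past
the last payload byte.
    </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for data_packet_t and pes_packet_t. */
typedef struct demo_data_packet_s {
    struct demo_data_packet_s *p_next;   /* next packet, or NULL */
    const uint8_t *p_payload_start;      /* first payload byte */
    const uint8_t *p_payload_end;        /* assumed: one past the last byte */
    int b_discard_payload;               /* set when content is corrupted */
} demo_data_packet_t;

typedef struct {
    demo_data_packet_t *p_first;         /* head of the linked list */
} demo_pes_packet_t;

/* Copies the payload of every valid data packet of a PES into p_out and
 * returns the number of bytes copied (at most i_out_len). Headers that
 * precede p_payload_start are never touched. */
static size_t demo_gather_pes(const demo_pes_packet_t *p_pes,
                              uint8_t *p_out, size_t i_out_len)
{
    size_t i_copied = 0;

    for (const demo_data_packet_t *p_data = p_pes->p_first;
         p_data != NULL; p_data = p_data->p_next) {
        if (p_data->b_discard_payload)
            continue;                    /* corrupted packet: skip it */

        for (const uint8_t *p = p_data->p_payload_start;
             p < p_data->p_payload_end && i_copied < i_out_len; p++)
            p_out[i_copied++] = *p;
    }
    return i_copied;
}
```
]]></programlisting>

    <para>
A real decoder would rarely copy data like this; the point is only to
show the traversal order and the role of each field.
    </para>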

    <para>
The structure shared by both the input and the decoder is <type>
decoder_fifo_t</type>. It features a circular FIFO of PES packets to
be decoded. The input provides macros to manipulate it: <function>
DECODER_FIFO_ISEMPTY, DECODER_FIFO_ISFULL, DECODER_FIFO_START,
DECODER_FIFO_INCSTART, DECODER_FIFO_END, DECODER_FIFO_INCEND</function>.
Remember to take <parameter>p_decoder_fifo-&gt;data_lock
</parameter> before any operation on the FIFO.
    </para>

    <para>
The next packet to be decoded is <function>DECODER_FIFO_START(
*p_decoder_fifo )</function>. When you are finished with it, you need to
call <function>
p_decoder_fifo-&gt;pf_delete_pes( p_decoder_fifo-&gt;p_packets_mgt,
DECODER_FIFO_START( *p_decoder_fifo ) ) </function> to
return the PES to the <link linkend="input_buff">buffer manager</link>,
and then <function> DECODER_FIFO_INCSTART( *p_decoder_fifo )</function>
to advance to the next slot.
    </para>
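
    <para>
The behaviour of such a circular FIFO can be sketched as follows. The
<function>DEMO_FIFO_*</function> macros only mirror the names of the
real ones; their bodies, the <type>demo_fifo_t</type> layout and the
slot count are assumptions made for this example, and locking is left
out (the caller is supposed to hold the data lock).
    </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FIFO_SLOTS 8   /* illustrative size; one slot stays unused */

typedef struct { int i_id; } demo_pes_t;   /* placeholder PES packet */

typedef struct {
    demo_pes_t *buffer[DEMO_FIFO_SLOTS];
    unsigned i_start;       /* index of the next packet to decode */
    unsigned i_end;         /* index of the next free slot */
} demo_fifo_t;

#define DEMO_FIFO_ISEMPTY(f)  ((f).i_start == (f).i_end)
#define DEMO_FIFO_ISFULL(f)   ((((f).i_end + 1) % DEMO_FIFO_SLOTS) == (f).i_start)
#define DEMO_FIFO_START(f)    ((f).buffer[(f).i_start])
#define DEMO_FIFO_END(f)      ((f).buffer[(f).i_end])
#define DEMO_FIFO_INCSTART(f) ((f).i_start = ((f).i_start + 1) % DEMO_FIFO_SLOTS)
#define DEMO_FIFO_INCEND(f)   ((f).i_end = ((f).i_end + 1) % DEMO_FIFO_SLOTS)

/* Input side: queue a packet. Returns -1 when the FIFO is full. */
static int demo_fifo_put(demo_fifo_t *p_fifo, demo_pes_t *p_pes)
{
    if (DEMO_FIFO_ISFULL(*p_fifo))
        return -1;
    DEMO_FIFO_END(*p_fifo) = p_pes;
    DEMO_FIFO_INCEND(*p_fifo);
    return 0;
}

/* Decoder side: take the oldest packet and free its slot. In VLC the
 * packet would be handed back through pf_delete_pes before the start
 * index is advanced. */
static demo_pes_t *demo_fifo_take(demo_fifo_t *p_fifo)
{
    if (DEMO_FIFO_ISEMPTY(*p_fifo))
        return NULL;
    demo_pes_t *p_pes = DEMO_FIFO_START(*p_fifo);
    DEMO_FIFO_INCSTART(*p_fifo);
    return p_pes;
}
```
]]></programlisting>

    <para>
Keeping one slot unused is a common way to tell a full ring from an
empty one using only the two indexes.
    </para>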

    <para>
If the FIFO is empty (<function>DECODER_FIFO_ISEMPTY</function>), you
can block until a new packet is received by waiting on a condition
variable: <function> vlc_cond_wait( &amp;p_fifo-&gt;data_wait,
&amp;p_fifo-&gt;data_lock )</function>. You have to hold the lock before
entering this function. If the file is over or the user quits,
<parameter>p_fifo-&gt;b_die</parameter> will be set to 1. This indicates
that you must free all your data structures and call <function>
vlc_thread_exit() </function> as soon as possible.
    </para>
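
    <para>
The wait-and-exit protocol can be sketched with plain POSIX threads,
used here as a stand-in for the <function>vlc_mutex</function> and
<function>vlc_cond</function> wrappers. The <type>demo_wait_fifo_t</type>
type and all <function>demo_*</function> helpers are hypothetical, and a
simple packet counter replaces the real FIFO.
    </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, reduced FIFO: a counter stands in for the packet ring. */
typedef struct {
    pthread_mutex_t data_lock;
    pthread_cond_t  data_wait;
    int i_packets;          /* number of packets waiting to be decoded */
    int b_die;              /* set when the decoder must exit */
} demo_wait_fifo_t;

static void demo_fifo_init(demo_wait_fifo_t *p_fifo)
{
    pthread_mutex_init(&p_fifo->data_lock, NULL);
    pthread_cond_init(&p_fifo->data_wait, NULL);
    p_fifo->i_packets = 0;
    p_fifo->b_die = 0;
}

/* Decoder side. The caller must hold data_lock. Returns 1 when a packet
 * is available and 0 when the decoder has been asked to die. The loop
 * protects against spurious wake-ups. */
static int demo_wait_for_packet(demo_wait_fifo_t *p_fifo)
{
    while (p_fifo->i_packets == 0 && !p_fifo->b_die)
        pthread_cond_wait(&p_fifo->data_wait, &p_fifo->data_lock);
    return !p_fifo->b_die;
}

/* Input side: queue one packet and wake the decoder up. */
static void demo_post_packet(demo_wait_fifo_t *p_fifo)
{
    pthread_mutex_lock(&p_fifo->data_lock);
    p_fifo->i_packets++;
    pthread_cond_signal(&p_fifo->data_wait);
    pthread_mutex_unlock(&p_fifo->data_lock);
}

/* Input side: end of file or user quit. */
static void demo_request_exit(demo_wait_fifo_t *p_fifo)
{
    pthread_mutex_lock(&p_fifo->data_lock);
    p_fifo->b_die = 1;
    pthread_cond_signal(&p_fifo->data_wait);
    pthread_mutex_unlock(&p_fifo->data_lock);
}

/* Decoder main loop: consumes packets until b_die is seen, then returns
 * the number of packets it handled so the thread can clean up and exit. */
static int demo_decoder_loop(demo_wait_fifo_t *p_fifo)
{
    int i_decoded = 0;

    pthread_mutex_lock(&p_fifo->data_lock);
    while (demo_wait_for_packet(p_fifo)) {
        p_fifo->i_packets--;    /* "decode" one packet */
        i_decoded++;
    }
    pthread_mutex_unlock(&p_fifo->data_lock);
    return i_decoded;
}
```
]]></programlisting>

    <para>
Note that in this sketch a pending exit request takes priority over
packets still in the queue, in keeping with "as soon as possible".
    </para>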

  </sect1>

  <sect1> <title> The bit stream (input module) </title>

    <para>
Reading packets this way is not convenient, though, since
the elementary stream can be split up arbitrarily. The input module
provides primitives which make reading a bit stream much easier.
Using them is optional, but if you do, you should no longer access
the packet buffers directly.
    </para>

    <para>
The bit stream allows you to just call <function> GetBits()</function>,
and this function will transparently read the packet buffers and change
data packets and PES packets when necessary, without any intervention
from you. Reading a continuous Elementary Stream thus becomes much more
convenient: you don't have to deal with packet boundaries
and the FIFO, the bit stream does it for you.
    </para>

    <para>
The central idea is to introduce a buffer of 32 bits [normally
<type> WORD_TYPE</type>, but the 64-bit version doesn't work yet], <type>
bit_fifo_t</type>. It contains the word buffer and the number of
significant bits (in the higher part). The input module provides five
inline functions to manage it:
    </para>

    <itemizedlist>
      <listitem> <para> <type> u32 </type> <function> GetBits </function>
      <parameter>( bit_stream_t * p_bit_stream, unsigned int i_bits )
      </parameter> :
      Returns the next <parameter> i_bits </parameter> bits from the
      bit buffer. If there are not enough bits, it fetches the following
      word from the <type>decoder_fifo_t</type>. This function is only
      guaranteed to work with up to 24 bits. For the moment it works up
      to 31 bits, but that is a side effect. We had to write a different
      function, <function>GetBits32</function>, for 32-bit reading,
      because of the &lt;&lt; operator (in C, shifting a 32-bit word by
      32 bits is undefined).
      </para> </listitem>

      <listitem> <para> <function> RemoveBits </function> <parameter>
      ( bit_stream_t * p_bit_stream, unsigned int i_bits ) </parameter> :
      The same as <function> GetBits()</function>, except that the bits
      aren't returned (this spares a few CPU cycles). It has the same
      limitations, and we also wrote <function> RemoveBits32</function>.
      </para> </listitem>

      <listitem> <para> <type> u32 </type> <function> ShowBits </function>
      <parameter>( bit_stream_t * p_bit_stream, unsigned int i_bits )
      </parameter> :
      The same as <function> GetBits()</function>, except that the bits
      don't get flushed after reading, so you need to call
      <function> RemoveBits() </function> by hand afterwards. Beware,
      this function won't work above 24 bits, unless you're aligned
      on a byte boundary (see next function).
      </para> </listitem>

      <listitem> <para> <function> RealignBits </function> <parameter>
      ( bit_stream_t * p_bit_stream ) </parameter> :
      Drops the n higher bits (n &lt; 8), so that the first bit of
      the buffer is aligned on a byte boundary. It is useful when
      looking for an aligned startcode (MPEG for instance).
      </para> </listitem>

      <listitem> <para> <function> GetChunk </function> <parameter>
      ( bit_stream_t * p_bit_stream, byte_t * p_buffer, size_t i_buf_len )
      </parameter> :
      It is an analog of <function> memcpy()</function>, but takes
      a bit stream as its first argument. <parameter> p_buffer </parameter>
      must be allocated and at least <parameter> i_buf_len </parameter>
      bytes long. It is useful for copying data you want to keep track of.
      </para> </listitem>
    </itemizedlist>
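
    <para>
To make the semantics of these primitives concrete, here is a small
self-contained bit reader over a flat byte array. It is only a sketch of
the idea: the <type>demo_bit_stream_t</type> type and the
<function>demo_*</function> functions are hypothetical, the buffer is
refilled byte by byte rather than word by word, and there is no packet
switching. As in the text, the significant bits sit in the higher part
of a 32-bit word and reads are limited to 24 bits.
    </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader; NOT the real bit_stream_t. */
typedef struct {
    const uint8_t *p_byte;      /* next unread byte */
    const uint8_t *p_end;       /* one past the last byte */
    uint32_t buffer;            /* valid bits are left-aligned (MSB first) */
    unsigned i_available;       /* number of valid bits in buffer */
} demo_bit_stream_t;

static void demo_init_bits(demo_bit_stream_t *bs, const uint8_t *p, size_t n)
{
    bs->p_byte = p;
    bs->p_end = p + n;
    bs->buffer = 0;
    bs->i_available = 0;
}

/* Tops the buffer up with whole bytes while there is room for one. */
static void demo_fill(demo_bit_stream_t *bs)
{
    while (bs->i_available <= 24 && bs->p_byte < bs->p_end) {
        bs->buffer |= (uint32_t)*bs->p_byte++ << (24 - bs->i_available);
        bs->i_available += 8;
    }
}

/* Returns the next i_bits bits without consuming them. */
static uint32_t demo_show_bits(demo_bit_stream_t *bs, unsigned i_bits)
{
    assert(i_bits >= 1 && i_bits <= 24);
    demo_fill(bs);
    return bs->buffer >> (32 - i_bits);
}

/* Consumes i_bits bits without returning them. */
static void demo_remove_bits(demo_bit_stream_t *bs, unsigned i_bits)
{
    assert(i_bits >= 1 && i_bits <= 24);
    demo_fill(bs);
    bs->buffer <<= i_bits;
    bs->i_available = bs->i_available > i_bits ? bs->i_available - i_bits : 0;
}

/* Returns and consumes the next i_bits bits. */
static uint32_t demo_get_bits(demo_bit_stream_t *bs, unsigned i_bits)
{
    uint32_t i_result = demo_show_bits(bs, i_bits);
    demo_remove_bits(bs, i_bits);
    return i_result;
}

/* Drops the n (n < 8) bits left before the next byte boundary. Since
 * the buffer is refilled with whole bytes, n is i_available modulo 8. */
static void demo_realign_bits(demo_bit_stream_t *bs)
{
    unsigned i_drop = bs->i_available & 7;
    bs->buffer <<= i_drop;
    bs->i_available -= i_drop;
}
```
]]></programlisting>

    <para>
For example, on the bytes AB CD EF 01 (hexadecimal), reading 4 bits and
then 3 bits leaves the reader one bit short of a byte boundary; after
realigning, the next 8 bits are the second byte, CD.
    </para>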

    <para>
All these functions recreate a continuous elementary stream paradigm.
When the bit buffer is empty, they take the following word from the
current packet. When the packet is empty, they switch to the next
<type>data_packet_t</type> or, if there is none, to the next <type>
pes_packet_t</type> (see <function>
p_bit_stream-&gt;pf_next_data_packet</function>). All this is
completely transparent.
    </para>

    <note> <title> Packet changes and alignment issues </title>
    <para>
      We have to study the conjunction of two problems. First, a
      <type> data_packet_t </type> can have a number of bytes that is
      not a multiple of the word size, for instance 177, so the last
      word will be truncated. Second,
      many CPUs (sparc, alpha...) can only read words aligned on a
      word boundary (that is, 32 bits for a 32-bit word). So packet
      changes are a lot more complicated than you might expect, because
      we have to read truncated words and get realigned.
    </para>

    <para>
      For instance <function> GetBits() </function> will call
      <function> UnalignedGetBits() </function> from <filename>
      src/input/input_ext-dec.c</filename>. Basically it
      reads byte after byte until the stream gets realigned. <function>
      UnalignedShowBits() </function> is a bit more complicated
      and may require a temporary packet
      (<parameter>p_bit_stream-&gt;showbits_data</parameter>).
    </para> </note>
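
    <para>
The two problems described in the note can be illustrated by the sketch
below. Both helpers are hypothetical and are not taken from <filename>
src/input/input_ext-dec.c</filename>: the first tells how many bytes
have to be read one at a time before a pointer is word-aligned, the
second builds a (possibly truncated) word from individual bytes. The
alignment test assumes the usual flat address model, where converting a
pointer to an integer yields its address.
    </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes to read individually before p_data sits on a 32-bit
 * word boundary (0 if it is already aligned). */
static size_t demo_bytes_to_alignment(const uint8_t *p_data)
{
    size_t i_misalign = (size_t)((uintptr_t)p_data % sizeof(uint32_t));
    return i_misalign ? sizeof(uint32_t) - i_misalign : 0;
}

/* Builds a word from up to 4 bytes, first byte in the most significant
 * position. *pi_bits receives the number of significant bits, which is
 * less than 32 for a truncated word at the end of a packet. */
static uint32_t demo_load_bytes(const uint8_t *p_data, size_t i_count,
                                unsigned *pi_bits)
{
    uint32_t i_word = 0;

    if (i_count > 4)
        i_count = 4;
    for (size_t i = 0; i < i_count; i++)
        i_word |= (uint32_t)p_data[i] << (24 - 8 * i);

    *pi_bits = 8 * (unsigned)i_count;
    return i_word;
}
```
]]></programlisting>

    <para>
A 177-byte packet, for instance, ends with a single leftover byte:
<function>demo_load_bytes</function> would return it with only 8
significant bits, and the remaining bits have to come from the next
packet.
    </para>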

    <para>
To use the bit stream, you have to call <parameter>
p_decoder_config-&gt;pf_init_bit_stream( bit_stream_t * p_bit_stream,
decoder_fifo_t * p_fifo )</parameter> to set up all variables. You will
probably need to regularly fetch specific information from the packet,
for instance the PTS. If <parameter> p_bit_stream-&gt;pf_bitstream_callback
</parameter> is not <constant> NULL</constant>, it is called
on every packet change. See <filename> src/video_parser/video_parser.c
</filename> for an example. The second argument of the callback
indicates whether it is just a new <type>data_packet_t</type> or
also a new <type>pes_packet_t</type>. You can store your own structure in
<parameter> p_bit_stream-&gt;p_callback_arg</parameter>.
    </para>

    <warning> <para>
      When you call <function>pf_init_bit_stream</function>, the
      <function>pf_bitstream_callback</function> is not defined yet,
      but the bit stream already jumps to the first packet. You will
      probably want to call your bitstream callback by hand just after
      <function> pf_init_bit_stream</function>.
    </para> </warning>
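
    <para>
The callback mechanism, including the pitfall mentioned in the warning,
can be sketched as follows. Everything here is a simplified model: the
<type>demo_*</type> types and functions are hypothetical, and only the
member names <parameter>pf_bitstream_callback</parameter> and
<parameter>p_callback_arg</parameter> are borrowed from the text. The
callback records the PTS each time a new PES packet starts.
    </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet: a PTS and a flag marking the start of a PES. */
typedef struct { long i_pts; int b_pes_start; } demo_packet_t;

typedef struct demo_bit_stream_s {
    const demo_packet_t *p_packets;
    size_t i_count;
    size_t i_current;
    void (*pf_bitstream_callback)(struct demo_bit_stream_s *, int b_new_pes);
    void *p_callback_arg;
} demo_bit_stream_t;

/* Decoder-private data reachable from the callback. */
typedef struct { long i_last_pts; int i_pes_seen; } demo_decoder_t;

/* Sets up the stream and jumps to the first packet. No callback is
 * installed yet, so that first packet change goes unnoticed. */
static void demo_init_bit_stream(demo_bit_stream_t *bs,
                                 const demo_packet_t *p_packets, size_t n)
{
    bs->p_packets = p_packets;
    bs->i_count = n;
    bs->i_current = 0;
    bs->pf_bitstream_callback = NULL;
    bs->p_callback_arg = NULL;
}

/* Called when the current packet is exhausted. Returns 0 at the end. */
static int demo_next_packet(demo_bit_stream_t *bs)
{
    if (bs->i_current + 1 >= bs->i_count)
        return 0;
    bs->i_current++;
    if (bs->pf_bitstream_callback != NULL)
        bs->pf_bitstream_callback(bs,
            bs->p_packets[bs->i_current].b_pes_start);
    return 1;
}

static void demo_callback(demo_bit_stream_t *bs, int b_new_pes)
{
    demo_decoder_t *p_dec = bs->p_callback_arg;
    if (b_new_pes) {
        p_dec->i_last_pts = bs->p_packets[bs->i_current].i_pts;
        p_dec->i_pes_seen++;
    }
}

static void demo_open(demo_decoder_t *p_dec, demo_bit_stream_t *bs,
                      const demo_packet_t *p_packets, size_t n)
{
    assert(n > 0);
    p_dec->i_last_pts = 0;
    p_dec->i_pes_seen = 0;
    demo_init_bit_stream(bs, p_packets, n);
    bs->pf_bitstream_callback = demo_callback;
    bs->p_callback_arg = p_dec;
    /* The first packet was entered before the callback existed:
     * call it by hand so that its PTS is not lost. */
    demo_callback(bs, 1);
}
```
]]></programlisting>

    <para>
Without the manual call at the end of <function>demo_open</function>,
the PTS of the very first PES packet would never reach the decoder.
    </para>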

  </sect1>

  <sect1> <title> Built-in decoders </title>

    <para>
VLC already features an MPEG layer 1 and 2 audio decoder, an MPEG MP@ML
video decoder, an AC3 decoder (borrowed from LiViD), a DVD SPU decoder,
and an LPCM decoder. To write your own decoder, just mimic the
video parser.
    </para>

    <note> <title> Limitations in the current design </title>
    <para>
To add a new decoder, you'll still have to add the stream type, as there is
still a hard-wired piece of code in <filename> src/input/input_programs.c
</filename>.
    </para> </note>

    <para>
The MPEG audio decoder is native, but doesn't support layer 3 decoding
[too much trouble]. The AC3 decoder is a port of Aaron
Holtzman's libac3 (the original libac3 isn't reentrant), and the
SPU decoder is native. You may want to have a look at <function>
BitstreamCallback </function> in the AC3 decoder: in that case we have
to skip the first 3 bytes of a PES packet, which are not part of the
elementary stream. The video decoder is a bit special and is
described in the following section.
    </para>

    </sect1>

    <sect1> <title> The MPEG video decoder </title>

      <para>
VLC media player provides an MPEG-1 and an MPEG-2 Main Profile @
Main Level decoder. It was written natively for VLC, and is quite
mature. Its status is a bit special, since it is split between two
logical entities: the video parser and the video decoder.
The initial goal was to separate bit stream parsing functions from
highly parallelizable mathematical algorithms. In theory, there can be
one video parser thread (and only one, otherwise we would have race
conditions reading the bit stream), along with a pool of video decoder
threads, which do IDCT and motion compensation on several blocks
at once.
      </para>

      <para>
It doesn't (and won't) support MPEG-4 or DivX decoding. It is not an
encoder. It should support the whole MPEG-2 MP@ML specification, though
some features are still untested, like Differential Motion Vectors.
Before complaining, please bear in mind that the input elementary stream
must be valid (this is not the case, for instance, when you directly read
a DVD multi-angle .vob file).
      </para>

      <para>
The most interesting file is <filename> vpar_synchro.c</filename>; it is
well worth a look. It explains the whole frame dropping algorithm.
In a nutshell, if the machine is powerful enough, we decode all I, P and
B pictures; otherwise we decode all I and P pictures, and B pictures only
if we have enough time (this is based on on-the-fly decoding time
statistics). Another interesting file
is <filename>vpar_blocks.c</filename>, which describes all block
parsing algorithms (including coefficients and motion vectors). Look
at the bottom of the file: we generate one optimized function
for every common picture type, and one slow generic function. There
are also several levels of optimization (which make compilation slower
but certain types of files faster to decode), selected by <constant>
VPAR_OPTIM_LEVEL</constant>: level 0 means no optimization, level 1
means optimizations for MPEG-1 and MPEG-2 frame pictures, and level 2
means optimizations for MPEG-1 and MPEG-2 field and frame pictures.
      </para>
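
      <para>
The frame dropping policy summarized above can be illustrated by a
deliberately simplified sketch. It is not the algorithm of <filename>
vpar_synchro.c</filename>, which is more elaborate: the
<type>demo_synchro_t</type> type, the function names and the averaging
formula are assumptions chosen for the example. I and P pictures are
always decoded, and a B picture is decoded only if the time left before
its display date is at least the average time B pictures have taken so
far.
      </para>

<programlisting><![CDATA[
```c
#include <assert.h>

typedef enum {
    DEMO_I_PICTURE,
    DEMO_P_PICTURE,
    DEMO_B_PICTURE
} demo_picture_type_t;

/* Hypothetical statistics kept by the parser. */
typedef struct {
    long i_avg_b_time;  /* running average B decoding time, microseconds */
} demo_synchro_t;

/* Returns 1 if the picture should be decoded, 0 if it should be dropped.
 * i_time_left is the time remaining before the picture must be shown,
 * in microseconds. */
static int demo_should_decode(const demo_synchro_t *p_synchro,
                              demo_picture_type_t i_type, long i_time_left)
{
    if (i_type != DEMO_B_PICTURE)
        return 1;       /* I and P pictures are always decoded */
    return i_time_left >= p_synchro->i_avg_b_time;
}

/* Folds a measured decoding time into the running average, giving the
 * newest sample a weight of one quarter. */
static void demo_update_stats(demo_synchro_t *p_synchro, long i_decode_time)
{
    p_synchro->i_avg_b_time =
        (3 * p_synchro->i_avg_b_time + i_decode_time) / 4;
}
```
]]></programlisting>

      <para>
Dropping only B pictures is safe because no other picture is predicted
from them, whereas a dropped I or P picture would corrupt every picture
that references it.
      </para>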

      <sect2> <title> Motion compensation plug-ins </title>

        <para>
Motion compensation (i.e. copying regions from a reference picture) is
very platform-dependent (for instance with MMX or AltiVec versions), so
we moved it to the <filename> plugins/motion </filename> directory. This
is more convenient for the video decoder, and the resulting plug-ins may
be used by other video decoders (MPEG-4 ?). A motion plug-in must
define 6 functions, coming straight from the specification:
<function> vdec_MotionFieldField420, vdec_MotionField16x8420,
vdec_MotionFieldDMV420, vdec_MotionFrameFrame420, vdec_MotionFrameField420,
vdec_MotionFrameDMV420</function>. The equivalent 4:2:2 and 4:4:4
functions are unused, since these formats are forbidden in MP@ML (they
would only lengthen compilation).
        </para>

        <para>
Look at the C version of the algorithms if you want more information.
Note also that the DMV algorithm is untested and probably buggy.
        </para>
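
        <para>
At its simplest, "copying regions from a reference picture" looks like
the sketch below. The function <function>demo_motion_copy</function> is
hypothetical and much simpler than the six plug-in functions: it handles
a single plane and whole-pixel vectors only, and ignores the half-pixel
interpolation, field/frame modes and prediction averaging that MPEG-2
requires.
        </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies an i_size x i_size block into p_dst at position (i_x, i_y),
 * taking it from p_ref at the position displaced by the motion vector
 * (i_mv_x, i_mv_y). Both pictures share the same stride. The caller
 * must make sure the displaced block lies inside the reference. */
static void demo_motion_copy(uint8_t *p_dst, const uint8_t *p_ref,
                             int i_stride, int i_x, int i_y,
                             int i_mv_x, int i_mv_y, int i_size)
{
    const uint8_t *p_src = p_ref + (i_y + i_mv_y) * i_stride
                                 + (i_x + i_mv_x);
    uint8_t *p_out = p_dst + i_y * i_stride + i_x;

    for (int i_line = 0; i_line < i_size; i_line++) {
        memcpy(p_out, p_src, (size_t)i_size);   /* one line of the block */
        p_src += i_stride;
        p_out += i_stride;
    }
}
```
]]></programlisting>

        <para>
The inner copy of each line is the kind of loop that the MMX and
AltiVec versions replace with vector instructions.
        </para>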

      </sect2>

      <sect2> <title> IDCT plug-ins </title>

        <para>
Just like motion compensation, IDCT is platform-specific, so we moved it
to <filename> plugins/idct</filename>. This module does the IDCT
calculation and copies the data to the final picture. You need to define
seven methods:
        </para>

        <itemizedlist>
          <listitem> <para> <function> vdec_IDCT </function> <parameter>
          ( decoder_config_t * p_config, dctelem_t * p_block, int )
          </parameter> :
          Does the complete 2-D IDCT. The 64 coefficients are in <parameter>
          p_block</parameter>.
          </para> </listitem>

          <listitem> <para> <function> vdec_SparseIDCT </function>
          <parameter> ( vdec_thread_t * p_vdec, dctelem_t * p_block,
          int i_sparse_pos ) </parameter> :
          Does an IDCT on a block with only one non-zero coefficient
          (designated by <parameter> i_sparse_pos</parameter>). You can
          use the function defined in <filename> plugins/idct/idct_common.c
          </filename>, which precalculates these 64 matrices at
          initialization time.
          </para> </listitem>

          <listitem> <para> <function> vdec_InitIDCT </function>
          <parameter> ( vdec_thread_t * p_vdec ) </parameter> :
          Does the initialization needed by <function>
          vdec_SparseIDCT</function>.
          </para> </listitem>

          <listitem> <para> <function> vdec_NormScan </function>
          <parameter> ( u8 ppi_scan[2][64] ) </parameter> :
          Normally, this function does nothing. For minor optimizations,
          some IDCTs (MMX) need to invert certain coefficients in the
          MPEG scan matrices (see ISO/IEC 13818-2).
          </para> </listitem>

          <listitem> <para> <function> vdec_InitDecode </function>
          <parameter> ( struct vdec_thread_s * p_vdec ) </parameter> :
          Initializes the IDCT and optional crop tables.
          </para> </listitem>

          <listitem> <para> <function> vdec_DecodeMacroblockC </function>
          <parameter> ( struct vdec_thread_s *p_vdec,
          struct macroblock_s * p_mb ) </parameter> :
          Decodes an entire macroblock and copies its data to the final
          picture, including chromatic information.
          </para> </listitem>

          <listitem> <para> <function> vdec_DecodeMacroblockBW </function>
          <parameter> ( struct vdec_thread_s *p_vdec,
          struct macroblock_s * p_mb ) </parameter> :
          Decodes an entire macroblock and copies its data to the final
          picture, except chromatic information (used in grayscale mode).
          </para> </listitem>
        </itemizedlist>
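
        <para>
The idea behind the sparse IDCT can be shown with a floating-point
sketch. The real plug-ins work on integer <type>dctelem_t</type> blocks
and are far more optimized; the <function>demo_*</function> functions
below are hypothetical and only demonstrate the principle. When a block
has a single non-zero coefficient, its IDCT is that coefficient times a
fixed 8x8 basis pattern, so the 64 patterns can be precomputed once. To
stay self-contained the sketch uses a small cosine table instead of the
math library.
        </para>

<programlisting><![CDATA[
```c
#include <assert.h>

/* cos(k * pi / 16) for k = 0..8; other angles follow by symmetry. */
static const double demo_cos_table[9] = {
    1.0, 0.9807852804032304, 0.9238795325112867, 0.8314696123025452,
    0.7071067811865476, 0.5555702330196022, 0.3826834323650898,
    0.1950903220161283, 0.0
};

/* cos(k * pi / 16) for any k >= 0. */
static double demo_cos16(int k)
{
    k %= 32;                        /* period of 2*pi */
    if (k > 16)
        k = 32 - k;                 /* cos(2*pi - t) = cos(t) */
    return k > 8 ? -demo_cos_table[16 - k]   /* cos(pi - t) = -cos(t) */
                 : demo_cos_table[k];
}

/* Contribution of coefficient (u, v) to pixel (x, y) in the standard
 * 8x8 IDCT, for a coefficient value of 1. */
static double demo_basis(int u, int v, int x, int y)
{
    double cu = (u == 0) ? demo_cos_table[4] : 1.0;  /* 1 / sqrt(2) */
    double cv = (v == 0) ? demo_cos_table[4] : 1.0;
    return 0.25 * cu * cv * demo_cos16((2 * x + 1) * u)
                          * demo_cos16((2 * y + 1) * v);
}

/* Complete (slow) 2-D IDCT, coefficients and pixels in raster order. */
static void demo_idct(const double coef[64], double out[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            double sum = 0.0;
            for (int v = 0; v < 8; v++)
                for (int u = 0; u < 8; u++)
                    sum += coef[v * 8 + u] * demo_basis(u, v, x, y);
            out[y * 8 + x] = sum;
        }
}

/* One precomputed pattern per coefficient position. */
static double demo_sparse_tables[64][64];

static void demo_init_idct(void)
{
    for (int pos = 0; pos < 64; pos++)
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++)
                demo_sparse_tables[pos][y * 8 + x] =
                    demo_basis(pos % 8, pos / 8, x, y);
}

/* IDCT of a block whose only non-zero coefficient is at i_sparse_pos. */
static void demo_sparse_idct(double value, int i_sparse_pos, double out[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = value * demo_sparse_tables[i_sparse_pos][i];
}
```
]]></programlisting>

        <para>
The sparse version costs 64 multiplications per block instead of the
4096 of the naive full transform, which is why it is worth a dedicated
entry point.
        </para>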

        <para>
Currently we have implemented optimized versions for MMX, MMXEXT, and
AltiVec [doesn't work]. We have two plain C versions: the normal
(supposedly optimized) Berkeley version (<filename>idct.c</filename>),
and the simple 1-D separation IDCT from the ISO reference decoder
(<filename>idctclassic.c</filename>).
        </para>

      </sect2>

      <sect2> <title> Symmetrical Multiprocessing </title>

        <para>
The MPEG video decoder of VLC can take advantage of several processors
when they are available. The idea is to launch a pool of decoders, which do
IDCT/motion compensation on several macroblocks at once.
        </para>

        <para>
The functions managing the pool are in <filename>
src/video_decoder/vpar_pool.c</filename>. Its use on non-SMP machines is
not recommended, since it is actually slower than the single-threaded
version. Even on SMP machines sometimes...
        </para>

      </sect2>

  </sect1>

</chapter>