<chapter> <title> The complex multi-layer input </title>

  <para>
The idea behind the input module is to process packets without knowing
anything about their contents. It only takes a packet,
reads its ID, and delivers it to the decoder at the right time,
as indicated in the packet header (SCR and PCR fields in MPEG).
All the basic browsing operations are implemented without peeking at the
content of the elementary stream.
  </para>

  <para>
Thus it remains very generic. This also means you can't do things like
"play 3 frames now", "move forward 10 frames" or "play as fast as you
can but play all frames". It doesn't even know what a "frame" is. No
elementary stream is privileged, not even the video one, for
the simple reason that, according to MPEG, a stream may contain
several video ES.
  </para>

  <sect1> <title> What happens to a file </title>

    <para>
An input thread is spawned for every file read. Input structures
and decoders need to be reinitialized each time because the characteristics
of the stream may differ. <function> input_CreateThread </function>
is called by the interface thread (playlist module).
    </para>

    <para>
At first, an input plug-in capable of reading the playlist item is looked
for [this is inappropriate : we should first open the socket,
and then probe the beginning of the stream to see which plug-in can read
it]. The socket is opened by either <function> input_FileOpen</function>,
<function> input_NetworkOpen</function>, or <function>
input_DvdOpen</function>. This function sets two very important parameters :
<parameter> b_pace_control </parameter> and <parameter> b_seekable
</parameter> (see next section).
    </para>

    <note> <para>
      We could use so-called "access" plugins for this whole mechanism
      of opening the input socket. We do not, because we
      thought only those three methods would be needed at present,
      and if we need others we can still build them in.
    </para> </note>

    <para>
Now we can launch the input plugin's <function> pf_init </function>
function, followed by an endless loop calling <function> pf_read </function>
and <function> pf_demux</function>. The plugin is responsible
for initializing the stream structures
(<parameter>p_input-&gt;stream</parameter>), managing packet buffers,
reading packets and demultiplexing them. For most of these tasks it will
be assisted by functions from the advanced input API (c). That is
what we will study in the coming sections !
    </para>
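    <para>
The control flow described above can be sketched roughly as follows. This
is only an illustration : <type>input_sketch_t</type>, its fields and the
<function>Toy*</function> callbacks are hypothetical stand-ins invented for
this example, and the real input thread does considerably more work
(locking, error reporting, clock management).
    </para>

    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for input_thread_t: only the three
 * plugin callbacks named in the text, plus toy state for the example. */
typedef struct input_sketch_s input_sketch_t;
struct input_sketch_s
{
    int  (*pf_init)( input_sketch_t * );        /* set up stream structures */
    int  (*pf_read)( input_sketch_t * );        /* packets read, 0 at end   */
    void (*pf_demux)( input_sketch_t *, int );  /* parse and dispatch       */
    int  b_die;            /* set to 1 to make the loop stop         */
    int  i_packets_left;   /* toy data source                        */
    int  i_demuxed;        /* packets handed to the demultiplexer    */
};

/* Initialize once, then alternate read and demux until the source is
 * exhausted, an error occurs, or the thread is asked to die.
 * Returns the number of demultiplexed packets, or -1 if init failed. */
static int RunInputSketch( input_sketch_t *p_input )
{
    assert( p_input != NULL );
    if( p_input->pf_init( p_input ) < 0 )
        return -1;

    while( !p_input->b_die )
    {
        int i_count = p_input->pf_read( p_input );
        if( i_count <= 0 )
            break;
        p_input->pf_demux( p_input, i_count );
    }
    return p_input->i_demuxed;
}

/* Toy plugin callbacks, only here to exercise the loop. */
static int ToyInit( input_sketch_t *p_input )
{
    p_input->i_demuxed = 0;
    return 0;
}

static int ToyRead( input_sketch_t *p_input )
{
    if( p_input->i_packets_left == 0 )
        return 0;
    p_input->i_packets_left--;
    return 1;
}

static void ToyDemux( input_sketch_t *p_input, int i_count )
{
    p_input->i_demuxed += i_count;
}
```
]]></programlisting>

    <para>
A caller would fill in the three callbacks and run the loop; in this toy
setup a source holding three packets yields three demultiplexed packets.
    </para>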

  </sect1>

  <sect1> <title> Stream Management </title>

    <para>
The function which has opened the input socket must specify two
properties about it :
    </para>

    <orderedlist>
      <listitem> <para> <emphasis> p_input-&gt;stream.b_pace_control
      </emphasis> : Whether or not the stream can be read at our own
      pace (determined by the stream's frequency and
      the host computer's system clock). For instance, a file or a pipe
      (including TCP/IP connections) can be read at our pace : if we don't
      read fast enough, the other end of the pipe will just block on a
      <function> write() </function> operation. On the contrary, UDP
      streaming (such as the one used by VideoLAN Server) is done at
      the server's pace, and if we don't read fast enough, packets will
      simply be lost when the kernel's buffer is full. In that case the drift
      introduced by the server's clock must be regularly compensated.
      This property controls the clock management, and whether
      or not fast forward and slow motion can be done.</para>

      <note> <title> Subtleties in the clock management </title> <para>
      With a UDP socket and a distant server, the drift is not
      negligible because over a whole movie it can amount to
      seconds if one of the clocks is slightly off. That means
      that presentation dates given by the input thread may be
      out of sync, to some extent, with the frequencies given in
      every Elementary Stream. Output threads (and, to a lesser extent,
      decoder threads) must deal with it. </para>

      <para> The same kind of problem may happen when reading from
      a device (like video4linux's <filename> /dev/video </filename>)
      connected for instance to a video encoding board.
      There is no way we could differentiate
      it from a simple <command> cat foo.mpg | vlc - </command>, which
      doesn't imply any clock problem. So the Right Thing (c) would be
      to ask the user about the value of <parameter> b_pace_control
      </parameter>, but nobody would understand what it means (you are
      not the dumbest person on Earth, and obviously you have read this
      paragraph several times to understand it :-). Anyway,
      the drift should be negligible since the board would share the
      same clock as the CPU, so we chose to neglect it. </para> </note>
      </listitem>

      <listitem> <para> <emphasis> p_input-&gt;stream.b_seekable
      </emphasis> : Whether we can do <function> lseek() </function>
      calls on the file descriptor or not. Basically, whether we can
      jump anywhere in the stream (and thus display a scrollbar) or
      can only read one byte after the other. This has less impact
      on the stream management than the previous item, but it
      is not redundant, because for instance
      <command> cat foo.mpg | vlc - </command> is b_pace_control = 1
      but b_seekable = 0. On the contrary, you cannot have
      b_pace_control = 0 along with b_seekable = 1. If a stream is seekable,
      <parameter> p_input-&gt;stream.p_selected_area-&gt;i_size </parameter>
      must be set (in an arbitrary unit, for instance bytes, but it
      must be the same unit as <parameter>
      p_input-&gt;stream.p_selected_area-&gt;i_tell</parameter>, which
      indicates the byte we are currently reading from the stream).</para>

      <note> <title> Offset to time conversions </title> <para>
      Functions managing clocks are located in <filename>
      src/input/input_clock.c</filename>. All we know about a file
      is its start offset and its end offset
      (<parameter>p_input-&gt;stream.p_selected_area-&gt;i_size</parameter>),
      currently in bytes, but it could be plugin-dependent. So
      how on earth can we display a time in seconds in the interface ?
      Well, we cheat. PS streams have a <parameter> mux_rate </parameter>
      property which indicates how many bytes we should read in
      a second. This is subject to change at any time, but in practice
      it is constant for all streams we know. So we use it to
      determine time offsets. </para> </note> </listitem>
    </orderedlist>
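    <para>
The offset to time "cheat" described in the note above amounts to a single
division. The sketch below assumes a rate that is constant and already
expressed in bytes per second, as the note describes it; both function names
are hypothetical and this is not the code found in <filename>
src/input/input_clock.c</filename>.
    </para>

    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: seconds elapsed at a given byte offset, assuming
 * the stream is read at a constant i_bytes_per_second. */
static long long OffsetToSecondsSketch( long long i_offset,
                                        long long i_bytes_per_second )
{
    assert( i_offset >= 0 && i_bytes_per_second > 0 );
    return i_offset / i_bytes_per_second;
}

/* Hypothetical helper: writes the same value as H:MM:SS into psz_buffer
 * (truncated if the buffer is too small) and returns psz_buffer. */
static char *OffsetToTimeSketch( char *psz_buffer, size_t i_buffer_size,
                                 long long i_offset,
                                 long long i_bytes_per_second )
{
    long long i_seconds = OffsetToSecondsSketch( i_offset,
                                                 i_bytes_per_second );

    snprintf( psz_buffer, i_buffer_size, "%lld:%02lld:%02lld",
              i_seconds / 3600, ( i_seconds / 60 ) % 60, i_seconds % 60 );
    return psz_buffer;
}
```
]]></programlisting>

    <para>
Because the rate is only assumed to be constant, the displayed time is an
estimate; a stream whose rate really changes would make it drift.
    </para>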

  </sect1>

  <sect1> <title> Structures exported to the interface </title>

    <para>
Let's focus on the communication API between the input module and the
interface. The most important file is <filename>
include/input_ext-intf.h</filename>, which you should know almost by heart.
This file defines the input_thread_t structure, the stream_descriptor_t
structure, and all the program and ES descriptors it contains (you can
view it as a tree).
    </para>

    <para>
First, note that the input_thread_t structure features two <type> void *
</type> pointers, <parameter> p_method_data </parameter> and <parameter>
p_plugin_data</parameter>, which you can respectively use for buffer
management data and plugin data.
    </para>

    <para>
Second, a stream description is stored in a tree featuring program
descriptors, which themselves contain several elementary stream
descriptors. For those of you who don't know all MPEG concepts, an
elementary stream, aka ES, is a continuous stream of either video or
audio data (never both), directly readable by a decoder, without
decapsulation.
    </para>

    <para>
This tree structure is illustrated by the following
figure, where one stream holds two programs.
In most cases there will only be one program (to my
knowledge only TS streams can carry several programs, for instance
a movie and a football game at the same time, which is useful
for satellite and cable broadcasting).
    </para>

    <mediaobject>
      <imageobject>
        <imagedata fileref="stream.png" format="PNG" scalefit="1" scale="80"/>
      </imageobject>
      <imageobject>
        <imagedata fileref="stream.gif" format="GIF" />
      </imageobject>
      <textobject>
        <phrase> The program tree </phrase>
      </textobject>
      <caption>
        <para> <emphasis> p_input-&gt;stream </emphasis> :
        The stream, programs and elementary streams can be viewed as a tree.
        </para>
      </caption>
    </mediaobject>
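    <para>
For readers who prefer code to pictures, the tree can be pictured with a few
nested structures. The sketch below is deliberately simplified : every type
and function name in it is hypothetical, the fixed-size arrays are an
assumption made to keep the example short, and the real descriptors in
<filename> include/input_ext-intf.h </filename> carry many more fields.
    </para>

    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PROGRAMS 4   /* arbitrary limits for the example */
#define SKETCH_MAX_ES       8

/* Leaf: one elementary stream. */
typedef struct
{
    int i_id;     /* ID the demultiplexer looks for */
    int i_type;   /* stream type                    */
} es_sketch_t;

/* Intermediate node: one program and its elementary streams. */
typedef struct
{
    int         i_number;
    int         i_es_number;
    es_sketch_t es[SKETCH_MAX_ES];
} pgrm_sketch_t;

/* Root: the stream and its programs. */
typedef struct
{
    int           i_pgrm_number;
    pgrm_sketch_t pgrm[SKETCH_MAX_PROGRAMS];
} stream_sketch_t;

/* Appends an empty program to the stream, or returns NULL when full. */
static pgrm_sketch_t *AddProgramSketch( stream_sketch_t *p_stream,
                                        int i_number )
{
    pgrm_sketch_t *p_pgrm;

    if( p_stream->i_pgrm_number >= SKETCH_MAX_PROGRAMS )
        return NULL;
    p_pgrm = &p_stream->pgrm[p_stream->i_pgrm_number++];
    p_pgrm->i_number = i_number;
    p_pgrm->i_es_number = 0;
    return p_pgrm;
}

/* Appends an ES to a program, or returns NULL when full. */
static es_sketch_t *AddESSketch( pgrm_sketch_t *p_pgrm, int i_id, int i_type )
{
    es_sketch_t *p_es;

    if( p_pgrm->i_es_number >= SKETCH_MAX_ES )
        return NULL;
    p_es = &p_pgrm->es[p_pgrm->i_es_number++];
    p_es->i_id = i_id;
    p_es->i_type = i_type;
    return p_es;
}

/* Walks the whole tree looking for an ES with the given ID. */
static es_sketch_t *FindESSketch( stream_sketch_t *p_stream, int i_id )
{
    int i, j;

    for( i = 0; i < p_stream->i_pgrm_number; i++ )
        for( j = 0; j < p_stream->pgrm[i].i_es_number; j++ )
            if( p_stream->pgrm[i].es[j].i_id == i_id )
                return &p_stream->pgrm[i].es[j];
    return NULL;
}
```
]]></programlisting>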

    <warning> <para>
    For all modifications and accesses to the <parameter>p_input-&gt;stream
    </parameter> structure, you <emphasis>must</emphasis> hold
    the p_input-&gt;stream.stream_lock.
    </para> </warning>

    <para>
An ES is described by an ID (the ID the appropriate demultiplexer will
look for), a <parameter> stream_id </parameter> (the real MPEG stream
ID), a type (defined
in ISO/IEC 13818-1 table 2-29) and a literal description. The descriptor
also contains context information for the demultiplexer, and decoder
information (<parameter> p_decoder_fifo </parameter>) which we will talk
about in the next chapter. If the stream you want to read is not an
MPEG system layer (for instance AVI or RTP), a specific demultiplexer
will have to be written. In that case, if you need to carry additional
information, you can use <type> void * </type> <parameter> p_demux_data
</parameter> at your convenience. It will be automatically freed on
shutdown.
    </para>

    <note> <title> Why use an ID and not the plain MPEG <parameter>
    stream_id </parameter> ? </title> <para>
    When a packet (be it a TS packet, PS packet, or whatever) is read,
    the appropriate demultiplexer will look for an ID in the packet, find the
    relevant elementary stream, and demultiplex it if the user selected it.
    In the case of TS packets, the only information we have is the
    ES PID, so the reference ID we keep is the PID. PIDs don't exist
    in PS streams, so we have to invent one. It is of course based on
    the <parameter> stream_id </parameter> found in all PS packets,
    but that is not enough, since private streams (i.e. AC3, SPU and
    LPCM) all share the same <parameter> stream_id </parameter>
    (<constant>0xBD</constant>). In that case the first byte of the
    PES payload is a stream private ID, so we combine this with
    the stream_id to get our ID (if you did not understand everything,
    it isn't very important - just remember we used our brains
    before writing the code :-).
    </para> </note>
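    <para>
One way to picture the combination described in the note is shown below.
The bit layout (private ID in the high byte, <parameter>stream_id</parameter>
in the low byte) is an assumption made for this example, and
<function>MakeEsIdSketch</function> is a hypothetical name; the note only
states that the two values are combined, not how.
    </para>

    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* stream_id shared by all private streams, as stated in the note. */
#define SKETCH_PRIVATE_STREAM_ID 0xBD

/* Hypothetical helper: builds the ID kept for an ES found in a PS stream.
 * Ordinary streams keep their stream_id; private streams get the private
 * ID (first byte of the PES payload) packed into the high byte.
 * ASSUMPTION: this layout is illustrative, not taken from the VLC source. */
static uint16_t MakeEsIdSketch( uint8_t i_stream_id, uint8_t i_private_id )
{
    if( i_stream_id != SKETCH_PRIVATE_STREAM_ID )
        return i_stream_id;
    return (uint16_t)( ( i_private_id << 8 ) | i_stream_id );
}
```
]]></programlisting>

    <para>
With such a scheme, two private streams carrying different private IDs get
different IDs even though their <parameter>stream_id</parameter> is the same.
    </para>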

    <para>
The stream, program and ES structures are filled in by the plugin's
<function> pf_init() </function> using functions in <filename>
src/input/input_programs.c</filename>, but are subject to change at
any time. The DVD plugin
parses .ifo files to know which ES are in the stream; the TS plugin
reads the PAT and PMT structures in the stream; the PS plugin can
either parse the PSM structure (but it is rarely present), or build
the tree "on the fly" by pre-parsing the first megabyte of data.
    </para>

    <warning> <para>
In most cases we need to pre-parse a PS stream (that is, read the first MB
of data and go back to the beginning), because the PSM (Program
Stream Map) structure is almost never present. This is not ideal,
but we have no choice. A few problems arise. First,
non-seekable streams cannot be pre-parsed, so the ES tree will be
built on the fly. Second, if a new elementary stream starts after the
first MB of data (for instance a subtitle track that doesn't show up
during the credits), it won't appear in the menu before we encounter
its first packet. We cannot pre-parse the entire stream because it
would take hours (even without decoding it).
    </para> </warning>

    <para>
It is currently the responsibility of the input plugin to spawn the necessary
decoder threads. It must call <function> input_SelectES </function>
<parameter>( input_thread_t * p_input, es_descriptor_t * p_es )
</parameter> on the selected ES.
    </para>

    <para>
The stream descriptor also contains a list of areas. Areas are logical
discontinuities in the stream, for instance chapters and titles in a
DVD. There is only one area in TS and PS streams, though we could
use them when the PSM (or PAT/PMT) version changes. The goal is that
when you seek to another area, the input plugin loads the new stream
descriptor tree (otherwise the selected ID may be wrong).
    </para>

  </sect1>

  <sect1> <title> Methods used by the interface </title>

    <para>
In addition, <filename> input_ext-intf.c </filename> provides a few functions
to control the reading of the stream :
    </para>

    <itemizedlist>
      <listitem> <para> <function> input_SetStatus </function>
      <parameter> ( input_thread_t * p_input, int i_mode ) </parameter> :
      Changes the pace of reading. <parameter> i_mode </parameter> can
      be one of <constant> INPUT_STATUS_END, INPUT_STATUS_PLAY,
      INPUT_STATUS_PAUSE, INPUT_STATUS_FASTER, INPUT_STATUS_SLOWER</constant>.
      </para>

        <note> <para> Internally, the pace of reading is determined
        by the variable <parameter>
        p_input-&gt;stream.control.i_rate</parameter>. The default
        value is <constant> DEFAULT_RATE</constant>. The lower the
        value, the faster the pace is. Rate changes are taken into account
        in <function> input_ClockManageRef</function>. Pause is
        accomplished by simply stopping the input thread (it is
        then awakened by a pthread signal). In that case, decoders
        will be stopped too. Please remember this if you do statistics
        on decoding times (like <filename> src/video_parser/vpar_synchro.c
        </filename> does). Don't call this function if <parameter>
        p_input-&gt;stream.b_pace_control </parameter> == 0.</para> </note>
      </listitem>

      <listitem> <para> <function> input_Seek </function> <parameter>
      ( input_thread_t * p_input, off_t i_position ) </parameter> :
      Changes the offset of reading. Used to jump to another place in a
      file. You <emphasis>mustn't</emphasis> call this function if
      <parameter> p_input-&gt;stream.b_seekable </parameter> == 0.
      The position is a number (usually long long, depending on your
      libc) between <parameter>p_input-&gt;stream.p_selected_area-&gt;i_start
      </parameter> and <parameter>p_input-&gt;stream.p_selected_area-&gt;i_size
      </parameter> (the current value is in <parameter>
      p_input-&gt;stream.p_selected_area-&gt;i_tell</parameter>). </para>

        <note> <para> Multimedia files can be very large, especially
        when we read a device like <filename> /dev/dvd</filename>, so
        offsets must be 64 bits wide. On many systems, like
        FreeBSD, off_t is 64 bits by default, but that is not the
        case with GNU libc 2.x. That is why we need to compile VLC
        with -D_FILE_OFFSET_BITS=64 -D__USE_UNIX98. </para> </note>

        <note> <title> Escaping stream discontinuities </title>
        <para>
          Changing the reading position at random can result in a
          messed up stream, and the decoder which reads it may
          segfault. To avoid this, we send several NULL packets
          (i.e. packets containing nothing but zeros) before changing
          the reading position. Indeed, in most video and audio
          formats, a long enough stream of zeros is an escape sequence
          and the decoder can exit cleanly.
        </para> </note>
      </listitem>

      <listitem> <para> <function> input_OffsetToTime </function>
      <parameter> ( input_thread_t * p_input, char * psz_buffer,
      off_t i_offset ) </parameter> : Converts an offset value to
      a time coordinate (used for interface display).
      [currently it is broken with MPEG-2 files]
      </para> </listitem>

      <listitem> <para> <function> input_ChangeES </function>
      <parameter> ( input_thread_t * p_input, es_descriptor_t * p_es,
      u8 i_cat ) </parameter> : Unselects all elementary streams of
      type <parameter> i_cat </parameter> and selects <parameter>
      p_es</parameter>. Used for instance to change language or
      subtitle track.
      </para> </listitem>

      <listitem> <para> <function> input_ToggleES </function>
      <parameter> ( input_thread_t * p_input, es_descriptor_t * p_es,
      boolean_t b_select ) </parameter> : This is the clean way to
      select or unselect a particular elementary stream from the
      interface.
      </para> </listitem>
    </itemizedlist>
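    <para>
The preconditions attached to these functions (no pace change without
<parameter>b_pace_control</parameter>, no seek without
<parameter>b_seekable</parameter>, position within the selected area) can be
checked by the caller before doing anything. The sketch below illustrates
such a guard; the structures and both functions are hypothetical stand-ins,
not the real <function>input_Seek</function> or
<function>input_SetStatus</function>.
    </para>

    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the selected area: bounds and current position. */
typedef struct
{
    long long i_start;
    long long i_size;
    long long i_tell;
} area_sketch_t;

/* Hypothetical mirror of the two stream properties plus the area. */
typedef struct
{
    int           b_pace_control;
    int           b_seekable;
    area_sketch_t area;
} stream_caps_sketch_t;

/* Pace changes (faster, slower, pause) only make sense when we control
 * the reading pace. */
static int CanChangePaceSketch( const stream_caps_sketch_t *p_stream )
{
    assert( p_stream != NULL );
    return p_stream->b_pace_control != 0;
}

/* Refuses the seek when the stream is not seekable or the position lies
 * outside [i_start, i_size]; otherwise records the new position.
 * Returns 0 on success, -1 on refusal. */
static int TrySeekSketch( stream_caps_sketch_t *p_stream,
                          long long i_position )
{
    assert( p_stream != NULL );
    if( !p_stream->b_seekable )
        return -1;
    if( i_position < p_stream->area.i_start
         || i_position > p_stream->area.i_size )
        return -1;
    p_stream->area.i_tell = i_position;
    return 0;
}
```
]]></programlisting>

    <para>
An interface would typically use such checks to grey out its scrollbar or
its fast forward button rather than to report an error after the fact.
    </para>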

  </sect1>

  <sect1 id="input_buff"> <title> Buffers management </title>

    <para>
Input plugins must implement a way to allocate and deallocate packets
(whose structures will be described in the next chapter). We
basically need four functions :
    </para>

    <itemizedlist>
      <listitem> <para> <function> pf_new_packet </function>
      <parameter> ( void * p_private_data, size_t i_buffer_size )
      </parameter> :
      Allocates a new <type> data_packet_t </type> and an associated
      buffer of i_buffer_size bytes.
      </para> </listitem>

      <listitem> <para> <function> pf_new_pes </function>
      <parameter> ( void * p_private_data ) </parameter> :
      Allocates a new <type> pes_packet_t</type>.
      </para> </listitem>

      <listitem> <para> <function> pf_delete_packet </function>
      <parameter> ( void * p_private_data, data_packet_t * p_data )
      </parameter> :
      Deallocates <parameter> p_data</parameter>.
      </para> </listitem>

      <listitem> <para> <function> pf_delete_pes </function>
      <parameter> ( void * p_private_data, pes_packet_t * p_pes )
      </parameter> :
      Deallocates <parameter> p_pes</parameter>.
      </para> </listitem>
    </itemizedlist>

    <para>
All functions are given <parameter> p_input-&gt;p_method_data </parameter>
as first parameter, so that you can keep records of allocated and freed
packets.
    </para>
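    <para>
As an illustration of how the private data pointer can be used for such
record keeping, here is a minimal <function>malloc()</function>-based sketch
of the packet allocation pair. The types
<type>data_packet_sketch_t</type> and <type>alloc_stats_sketch_t</type> are
simplified, hypothetical stand-ins; the PES pair would follow the same
pattern.
    </para>

    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced version of a data packet: a buffer and its size. */
typedef struct
{
    unsigned char *p_buffer;
    size_t         i_size;
} data_packet_sketch_t;

/* Hypothetical private data: simple allocation bookkeeping. */
typedef struct
{
    size_t i_allocated;
    size_t i_freed;
} alloc_stats_sketch_t;

/* Same shape as pf_new_packet: private data first, then the buffer size.
 * Returns NULL if either allocation fails. */
static data_packet_sketch_t *NewPacketSketch( void *p_private_data,
                                              size_t i_buffer_size )
{
    alloc_stats_sketch_t *p_stats = p_private_data;
    data_packet_sketch_t *p_data = malloc( sizeof( *p_data ) );

    if( p_data == NULL )
        return NULL;
    p_data->p_buffer = malloc( i_buffer_size );
    if( p_data->p_buffer == NULL )
    {
        free( p_data );
        return NULL;
    }
    p_data->i_size = i_buffer_size;
    p_stats->i_allocated++;
    return p_data;
}

/* Same shape as pf_delete_packet: releases the buffer, then the packet. */
static void DeletePacketSketch( void *p_private_data,
                                data_packet_sketch_t *p_data )
{
    alloc_stats_sketch_t *p_stats = p_private_data;

    assert( p_data != NULL );
    free( p_data->p_buffer );
    free( p_data );
    p_stats->i_freed++;
}
```
]]></programlisting>

    <para>
Comparing the two counters at shutdown is a cheap way to spot packets that
were allocated but never given back.
    </para>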

    <note> <title> Buffers management strategies </title>
      <para> Buffers management can be done in three ways : </para>

      <orderedlist>
        <listitem> <para> <emphasis> Traditional libc allocation </emphasis> :
          For a long time, the PS plugin called
          <function> malloc()
          </function> and <function> free() </function> every time
          it needed to allocate or deallocate a packet. Contrary
          to popular belief, it is not <emphasis>that</emphasis>
          slow.
        </para> </listitem>

        <listitem> <para> <emphasis> Netlist </emphasis> :
          In this method we allocate a very big buffer at the
          beginning of the program, and then manage a list of pointers
          to free packets (the "netlist"). This only works well if
          all packets have the same size. It has long been used for
          the TS input. The DVD plugin also uses it, but adds a
          <emphasis> refcount </emphasis> counter because buffers (2048
          bytes) can be shared among several packets. It is now
          deprecated and won't be documented.
        </para> </listitem>

        <listitem> <para> <emphasis> Buffer cache </emphasis> :
          We are currently developing a new method. It is
          already in use in the PS plugin. The idea is to call
          <function> malloc() </function> and <function> free()
          </function> to absorb stream irregularities, but re-use
          all allocated buffers via a cache system. We are
          extending it so that it can be used in any plugin without
          performance hit, but it is currently left undocumented.
        </para> </listitem>
      </orderedlist>
    </note>
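    <para>
Since the buffer cache is left undocumented, the following is only a guess
at the general idea, not a description of the PS plugin's code : released
buffers are kept on a small free list and handed out again when they are
large enough, and <function>malloc()</function> is only called when the
cache cannot help. All names and the cache limit are hypothetical.
    </para>

    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_SKETCH_MAX 4   /* arbitrary limit on cached buffers */

typedef struct buffer_sketch_s
{
    struct buffer_sketch_s *p_next;   /* next free buffer in the cache */
    size_t                  i_size;   /* capacity of p_data in bytes   */
    unsigned char          *p_data;
} buffer_sketch_t;

typedef struct
{
    buffer_sketch_t *p_free;      /* head of the free list            */
    size_t           i_cached;    /* buffers currently in the cache   */
    size_t           i_mallocs;   /* buffers really allocated so far  */
} cache_sketch_t;

/* Reuses the cached head buffer if it is large enough, else allocates. */
static buffer_sketch_t *CacheGetSketch( cache_sketch_t *p_cache,
                                        size_t i_size )
{
    buffer_sketch_t *p_buf = p_cache->p_free;

    if( p_buf != NULL && p_buf->i_size >= i_size )
    {
        p_cache->p_free = p_buf->p_next;
        p_cache->i_cached--;
        p_buf->p_next = NULL;
        return p_buf;
    }

    p_buf = malloc( sizeof( *p_buf ) );
    if( p_buf == NULL )
        return NULL;
    p_buf->p_data = malloc( i_size );
    if( p_buf->p_data == NULL )
    {
        free( p_buf );
        return NULL;
    }
    p_buf->i_size = i_size;
    p_buf->p_next = NULL;
    p_cache->i_mallocs++;
    return p_buf;
}

/* Puts the buffer back in the cache, or frees it if the cache is full. */
static void CacheReleaseSketch( cache_sketch_t *p_cache,
                                buffer_sketch_t *p_buf )
{
    if( p_cache->i_cached < CACHE_SKETCH_MAX )
    {
        p_buf->p_next = p_cache->p_free;
        p_cache->p_free = p_buf;
        p_cache->i_cached++;
    }
    else
    {
        free( p_buf->p_data );
        free( p_buf );
    }
}

/* Frees every cached buffer, e.g. when the input thread ends. */
static void CacheFlushSketch( cache_sketch_t *p_cache )
{
    while( p_cache->p_free != NULL )
    {
        buffer_sketch_t *p_buf = p_cache->p_free;
        p_cache->p_free = p_buf->p_next;
        free( p_buf->p_data );
        free( p_buf );
    }
    p_cache->i_cached = 0;
}
```
]]></programlisting>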
  </sect1>

  <sect1> <title> Demultiplexing the stream </title>

    <para>
Once packets have been read by <function> pf_read</function>, they are
handed to the demultiplexer function, for which your plugin must
provide a function pointer. The demultiplexer
is responsible for parsing the packet, gathering PES, and feeding decoders.
    </para>

    <para>
Demultiplexers for standard MPEG structures (PS and TS) have already
been written. You just need to indicate <function> input_DemuxPS
</function> or <function> input_DemuxTS </function> for <function>
pf_demux</function>. You can also write your own demultiplexer.
    </para>
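    <para>
If you do write your own demultiplexer, its first job is to identify each
packet. The sketch below shows that step alone for PS-style packets, which
begin with the start code prefix 0x000001 followed by a one-byte stream ID;
it then merely counts packets per ID. The function and type names are
hypothetical, and the real <function>input_DemuxPS</function> goes on to
gather PES and feed decoders, which is not shown here.
    </para>

    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statistics filled in by the toy demultiplexer. */
typedef struct
{
    unsigned int i_packets[256];   /* packets seen per stream ID     */
    unsigned int i_garbage;        /* buffers without a start code   */
} demux_stats_sketch_t;

/* Returns the stream ID found after the 0x000001 start code prefix,
 * or -1 if the buffer is too short or does not start with the prefix. */
static int PeekStreamIdSketch( const uint8_t *p_data, size_t i_size )
{
    if( p_data == NULL || i_size < 4 )
        return -1;
    if( p_data[0] != 0x00 || p_data[1] != 0x00 || p_data[2] != 0x01 )
        return -1;
    return p_data[3];
}

/* Toy demultiplexer: identifies the packet and updates the counters.
 * A real one would look up the matching ES and pass the data on. */
static void DemuxOneSketch( demux_stats_sketch_t *p_stats,
                            const uint8_t *p_data, size_t i_size )
{
    int i_stream_id = PeekStreamIdSketch( p_data, i_size );

    if( i_stream_id < 0 )
        p_stats->i_garbage++;
    else
        p_stats->i_packets[i_stream_id]++;
}
```
]]></programlisting>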

    <para>
It is not the purpose of this document to describe the different levels
of encapsulation in an MPEG stream. Please refer to your MPEG specification
for that.
    </para>

  </sect1>

</chapter>