Seemingly holding queued_data_mutex over add_data_raw(), which does writev(),
could be slow on systems where /tmp is not on tmpfs, causing the queued_data_mutex
to be held for so long (up to a second has been observed) that the input thread
couldn't keep up.
To fix this, we move queued_data_mutex into a per-stream variable (not sure
if it's ideal, but it was the simplest way to avoid ugliness), and then hold it
for as short as possible in process_queued_data().
While we're at it, document that queued_data_last_starting_point has the same
locking rules as queued_data.