Display a copy of the Y'CbCr images instead of an RGB565 copy.
This is higher quality (the 16-bit artifacts were getting rather
annoying), truer to what's actually being output, _and_ higher performance
(well, at least lower memory bandwidth; I haven't benchmarked in practice),
since we can use multi-output to make the extra copies on the fly when writing
instead of making them explicitly. Sample calculation for a 1280x720 image;
let's say it is one megapixel for ease of calculation:

  GL_565: 2 MB written (565 texture), 2 MB read during display = 4 MB used
  Y'CbCr: 1.0 + 0.5 MB written (Y' texture plus half-res dual-channel CbCr
          texture), same amount read during display = 3 MB used

We could have reused the full-resolution CbCr texture, saving the 0.5 MB
write, but that would make the readback 3 MB instead of 1.5 MB, so it's a
net loss (1 MB written + 3 MB read = 4 MB used).
Ideally, we'd avoid the copies altogether, cutting the writes away
and getting to 1.5 MB, but interactions with VA-API zerocopy seemingly
made that impossible.
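
As a rough sanity check of the numbers above, the arithmetic can be sketched
in a few lines of Python. This is only an illustration, not code from the
tree: it assumes 8 bits per channel, takes "half-res" to mean half resolution
in each dimension (which is what makes the dual-channel CbCr texture come out
at 0.5 MB), and every name in it is made up for the example.

```python
# Back-of-the-envelope memory traffic per displayed frame, in MB.
# Assumptions (not from the actual code): 8 bits per channel, and
# "half-res" meaning half resolution in both dimensions.

MPIX = 1.0  # 1280x720, rounded up to one megapixel for easy numbers

rgb565 = 2 * MPIX            # 16 bits per pixel
y_full = 1 * MPIX            # single 8-bit Y' channel
cbcr_half = 2 * (MPIX / 4)   # two 8-bit channels, a quarter of the pixels
cbcr_full = 2 * MPIX         # two 8-bit channels at full resolution

# Each entry is (MB written when making the copy, MB read during display).
scenarios = {
    "RGB565 copy": (rgb565, rgb565),
    "Y'CbCr copy": (y_full + cbcr_half, y_full + cbcr_half),
    "reuse full-res CbCr": (y_full, y_full + cbcr_full),
    "no copies at all": (0.0, y_full + cbcr_half),
}

totals = {name: written + read for name, (written, read) in scenarios.items()}

for name, total in totals.items():
    print(f"{name}: {total:.1f} MB used")
```

Reusing the full-resolution CbCr texture lands on the same total as the old
RGB565 copy, which is why the extra half-res CbCr write is worth it.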