Besides the obvious benefit of spending less time copying, this has two
positive effects:
- The VA-API thread is no longer a choke point; uploading can happen from
multiple cores.
- With one copy fewer, we seem to reduce L3 cache pressure a bit;
at some point between five and six 1080p sources, we “fall off a cliff”
with respect to the L3 and start thrashing. This doesn't fix the issue,
but alleviates it somewhat.
All in all, we seem to go down from ~2.6 to ~2.1–2.2 cores used with one
720p channel and five 1080p channels. I haven't yet tried adding channels
until we saturate, to see how many we can actually encode.
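
To illustrate the first point, here is a minimal sketch of uploading from
several threads at once instead of funneling every copy through one
thread. It is not the actual code: Surface, upload_frame() and
upload_all() are hypothetical names, a plain std::vector stands in for the
mapped upload buffer, and no VA-API calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a mapped upload buffer; not a VA-API type.
struct Surface {
    std::vector<uint8_t> data;
};

// Copies one frame straight into its destination. Each source calls this
// from its own thread, so no single thread has to do all the copying.
void upload_frame(const std::vector<uint8_t> &frame, Surface *dst)
{
    assert(dst != nullptr);
    dst->data.assign(frame.begin(), frame.end());
}

// Uploads all frames concurrently, one worker thread per source.
// Every worker writes to a distinct Surface, so no locking is needed.
std::vector<Surface> upload_all(const std::vector<std::vector<uint8_t>> &frames)
{
    std::vector<Surface> surfaces(frames.size());
    std::vector<std::thread> workers;
    workers.reserve(frames.size());
    for (size_t i = 0; i < frames.size(); ++i) {
        workers.emplace_back([&frames, &surfaces, i] {
            upload_frame(frames[i], &surfaces[i]);
        });
    }
    for (std::thread &t : workers) {
        t.join();
    }
    return surfaces;
}
```

Calling upload_all() with one buffer per source returns one filled Surface
per source. In the real encoder the worker threads would presumably be the
per-source threads that already exist rather than freshly spawned ones.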