It turns out I've been misunderstanding parts of Fons' paper;
my estimation is different, and although it works surprisingly well
for something that's hardly supposed to work at all, it has some
significant problems with edge cases like frame rates being _nearly_
off (e.g. 59.94 Hz input on a 60 Hz output); the estimated delay
under the old algorithm will be a very slow sawtooth, which isn't
nice even after being passed through the filter.
The new algorithm probably still isn't 100% identical to zita-ajbridge,
but it should be much closer to how the algorithm is intended to work.
In particular, it makes a real try to understand that an output frame
can arrive between two input frames in time; this makes it dependent
on the system clock, but that's really the core that was missing from
the algorithm, so it's really more a feature than a bug.
I've made some real attempts at making all the received timestamps
more stable; FakeCapture is a bit odd still (especially at startup)
since it has its thing of just doing frames late instead of dropping
them, but it generally seems to work OK. For cases of frame rate
mismatch (even pretty benign ones), the correction rate seems to be
two orders of magnitude more stable, i.e., the maximum difference
from 1.0 during normal operation is greatly reduced.