Discussion:
Precision Hardware Clocks
Josef 'Jeff' Sipek
2024-05-09 23:31:34 UTC
Hello all,

I've been playing with the idea of extending the kernel to expose various
clock sources to userspace via character devices. (Yesterday's thread
about the OCP TAP Time Card nudged me to send this out sooner than I
planned. :) )


The code is *very* hacky and full of TODOs & FIXMEs, but I thought I'd share
it now.

What I'm calling a 'precision hardware clock' (PHC for short) is
conceptually some piece of hardware which can provide the consumer a sense
of time passing. Roughly speaking, there are two types of precision
hardware clocks - those that return the current time using some defined
timescale (e.g., kvmclock) and those that are simple oscillators with
counters (e.g., many e1000e devices). My aim is to support both.

My initial goal is to provide *read-only* access to PHCs, as this is
sufficient to make use of them for stabilizing the system clock. That is,
an application can only query them for the current time. Eventually, I
think it'd make sense to allow *setting* PHCs as well.

The devices that return the current time are fairly straightforward to work
with. The ioctl simply calls a device-specific method and forwards the
result to the caller.
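A minimal sketch of how a read-only "get time" handler might dispatch between the two PHC types described above. All names here (`struct phc`, `phc_gettime`, `ticks_to_timespec`, `fake_counter`) are hypothetical and are not the API in fbsd-phc.patch; for brevity, the counter path scales the raw count without the software extension a real driver needs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical: the two PHC flavours, absolute-time and counter-only. */
enum phc_kind { PHC_ABSOLUTE, PHC_COUNTER };

struct phc {
	enum phc_kind kind;
	/* Absolute-time PHCs: the device reports time on its own timescale. */
	int (*gettime)(const struct phc *, struct timespec *);
	/* Counter PHCs: a free-running counter ticking at freq_hz. */
	uint64_t (*read_counter)(const struct phc *);
	uint64_t freq_hz;
};

/* Convert a tick count to a timespec without overflowing 64 bits. */
static struct timespec
ticks_to_timespec(uint64_t ticks, uint64_t freq_hz)
{
	struct timespec ts;

	assert(freq_hz > 0 && freq_hz <= 1000000000u);
	ts.tv_sec = (time_t)(ticks / freq_hz);
	/* The remainder is < freq_hz <= 1e9, so remainder * 1e9 < 1e18 fits. */
	ts.tv_nsec = (long)((ticks % freq_hz) * 1000000000u / freq_hz);
	return (ts);
}

/* What a read-only "get time" ioctl handler might dispatch to. */
static int
phc_gettime(const struct phc *phc, struct timespec *ts)
{
	switch (phc->kind) {
	case PHC_ABSOLUTE:
		/* Forward straight to the device-specific method. */
		return (phc->gettime(phc, ts));
	case PHC_COUNTER:
		/* Scale raw ticks; a real driver would extend them first. */
		*ts = ticks_to_timespec(phc->read_counter(phc), phc->freq_hz);
		return (0);
	}
	return (-1);
}

/* Stand-in for hardware: a fixed reading of 2.5 s worth of 25 MHz ticks. */
static uint64_t
fake_counter(const struct phc *phc)
{
	(void)phc;
	return (62500000u);
}
```

For example, a `struct phc` with `kind = PHC_COUNTER`, `read_counter = fake_counter` and `freq_hz = 25000000` yields 2 s and 500000000 ns, since 62,500,000 ticks at 25 MHz is 2.5 s.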

The counter-type devices are more complicated to support. In my code I took
an approach very similar to the timecounter code in the kernel. My
first attempt actually tried to extend timecounters but that resulted in a
lot of additional computation being done in hardclock regardless of whether
or not the additional clocks were in use. That didn't feel right. [1]

My current code borrows the timecounter idea (and some code) of extending
the hardware counter in software. The overflow check is done via a
per-device callout that's scheduled at an interval based on the
oscillator's frequency and the counter's width. (For debugging, I cap the
interval at 10s.)
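The timecounter-style extension can be sketched as follows: remember the last raw reading, add the masked difference to a 64-bit accumulator, and make sure an update runs before the counter can wrap. This is a minimal sketch under stated assumptions: the callout period is assumed to be half the wrap period (a safety margin that the actual patch may choose differently), capped at 10 s like the debug cap above. In the kernel the result would drive a callout(9); here it is only computed. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for one counter-type PHC. */
struct phc_ext {
	uint64_t mask;     /* (1 << width) - 1 for a width-bit counter */
	uint64_t last_raw; /* raw counter value at the previous update */
	uint64_t extended; /* 64-bit software-extended tick count */
};

static void
phc_ext_init(struct phc_ext *e, unsigned width, uint64_t raw)
{
	assert(width >= 1 && width <= 64);
	e->mask = (width == 64) ? UINT64_MAX : (((uint64_t)1 << width) - 1);
	e->last_raw = raw & e->mask;
	e->extended = e->last_raw;
}

/*
 * Fold a new raw reading into the extended count. As with timecounters,
 * the masked subtraction is only correct if fewer than 2^width ticks
 * elapsed since the last update, hence the periodic callout.
 */
static uint64_t
phc_ext_update(struct phc_ext *e, uint64_t raw)
{
	uint64_t delta = ((raw & e->mask) - e->last_raw) & e->mask;

	e->last_raw = raw & e->mask;
	e->extended += delta;
	return (e->extended);
}

/* Callout period in microseconds: half the wrap period, capped at 10 s. */
static uint64_t
phc_ext_interval_us(unsigned width, uint64_t freq_hz)
{
	const uint64_t cap_s = 10;
	uint64_t half_wrap_ticks;

	assert(width >= 1 && width <= 64 && freq_hz > 0);
	half_wrap_ticks = (uint64_t)1 << (width - 1);
	if (half_wrap_ticks / freq_hz >= cap_s)
		return (cap_s * 1000000u);
	/* Here half_wrap_ticks < 10 * freq_hz, so this does not overflow
	 * for any realistic oscillator frequency. */
	return (half_wrap_ticks * 1000000u / freq_hz);
}
```

For a hypothetical 24-bit counter at 25 MHz, the counter wraps every 2^24 / 25e6 (about 0.67 s), so `phc_ext_interval_us(24, 25000000)` returns 335544 us; a 32-bit counter wraps only about every 172 s, so the 10 s cap applies.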

Regardless of which type of PHC it is, the ioctl caller gets what amounts to
a <system clock, PHC clock> reading. Ideally, the two correspond to the
same instant, but there may be some error due to hardware limitations. [2]

Because there is a lot of hardware that doesn't provide a way to capture
these correlated timestamps, a "capture many readings" ioctl is a useful
addition. This ioctl returns a set of interleaved PHC and system clock
readings, which lets the application (e.g., chrony) do the appropriate
filtering to remove noise.
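One simple filter an application could apply to the "capture many readings" result is sketched below. It assumes PHC readings bracketed by system-clock readings, in the same interleaved sys, phc, ..., sys layout that Linux's PTP_SYS_OFFSET ioctl uses. The filter picks the PHC reading whose bracketing pair is tightest and assumes it was taken at the midpoint. The structure and function names are hypothetical, and chrony's actual filtering may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: n PHC readings bracketed by n + 1 system-clock
 * readings, i.e. sys[0], phc[0], sys[1], ..., phc[n-1], sys[n], all in ns.
 */
struct phc_sample_set {
	size_t n;
	const int64_t *sys; /* n + 1 entries */
	const int64_t *phc; /* n entries */
};

/*
 * Pick the PHC reading with the tightest bracketing system-clock pair and
 * return the PHC-minus-system offset, assuming the PHC was read at the
 * midpoint of that pair. *delay receives the bracket width, which bounds
 * the error of the estimate.
 */
static int64_t
best_offset(const struct phc_sample_set *s, int64_t *delay)
{
	size_t best = 0;
	int64_t best_delay;

	assert(s->n > 0);
	best_delay = s->sys[1] - s->sys[0];
	for (size_t i = 1; i < s->n; i++) {
		int64_t d = s->sys[i + 1] - s->sys[i];
		if (d < best_delay) {
			best_delay = d;
			best = i;
		}
	}
	*delay = best_delay;
	/* Midpoint computed as start + width/2 to avoid summing two timestamps. */
	return (s->phc[best] - (s->sys[best] + best_delay / 2));
}
```

With `sys = {1000, 1100, 1130, 1300}` and `phc = {5050, 5115, 5210}`, the second bracket (30 ns wide) wins. The estimated offset is 5115 - 1115 = 4000 ns, within +/-15 ns if the PHC read happened somewhere inside that bracket.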


In addition to adding the PHC code to the core kernel, I hacked up the if_em
driver to start the 25MHz timekeeping counters on 82574 devices and register
with the PHC code. Finally, I hacked up chrony's PHC refclock driver to
make use of the "get timestamp pair" ioctl.

I ran this code on my test box with two 82574 NICs with both registered as
chrony refclocks [3] for a while. Unsurprisingly, the 82574 oscillators are
not that accurate, but they are reasonably stable. (I posted histograms and
Allan deviation plots on Mastodon [4]. Since the system's oscillator is in
no way special, it is a bit silly to read too much into the graphs.
However, I'd argue that it still shows that the 82574 refclocks were
reasonably good and would likely help in real world scenarios [5].)

My patches can be found at:

https://www.josefsipek.net/freebsd/phc-v1/

There are 3 patches:

1. chrony.patch modifies chronyd to use the PHC ioctls
2. fbsd-phc.patch adds the generic PHC code
3. fbsd-em.patch modifies if_em to register the 82574 timekeeping counter with the PHC code

In addition to cleaning up and generally improving the existing patches, I
hope to implement the bit of code that wires up KVM's KVM_HC_CLOCK_PAIRING
hypercall as a PHC. While the 82574 provides a counter-type PHC, this KVM PHC
would be an absolute time-type PHC. Support for the KVM PHC would allow
FreeBSD guests to sync *very* accurately to the host's system clock.

I also have an incomplete patch that adds support for clock_gettime(2) using
PHC fds as clockid_t values, but since it isn't complete, I'll keep it to
myself for now :)


So, that's what I've been up to. As I said in the beginning, I wanted to
get more of this done, but I think it makes sense for me to let others know
about my code now.

I plan to continue hacking away on this, but if people have opinions about
any of this, I'd love to hear them. It really pains me that there is so
much duplication between the PHC and timecounter code, but the current
tc_windup code runs in a rather special context (hardclock) and having it
process *all* devices regardless of use would increase its runtime quite a
bit. I've been thinking about trying to move some of the timecounter and
PHC code into a generic set of helpers or try to reorganize kern_tc.c to
fold the PHC logic into it sanely, but that's currently very far down the
todo list.


To summarize, the goals/non-goals for this work are:

Goals:
* read-only interface to various precision hardware clocks (PHCs)
* support for both absolute time and counter-only PHCs
* ability to use software like chrony to stabilize system clocks

Non-goals/future work:
* adjusting PHCs
* support for cross-timestamping techniques (like Intel's ART)
* support for if_em PTP packet timestamping
* external pin timestamping support

Thanks for reading this far. Let me know if you have any questions,
suggestions, etc.

Jeff.

[1] I actually ran for about a week with an e1000e card in my box providing
timekeeping by selecting it via the kern.timecounter sysctls. It worked
and was quite amusing to see, but the additional complexity in tc_windup
made it unworkable.
[2] At some point, Intel added the Always Running Timer (ART) which can be
used by devices to get timestamps that are easily convertible to TSC
readings. Support for this is part of future work.
[3] The chrony config was the following. I ran chronyd with the -x flag to
prevent it from trying to set the clock. The system clock was
disciplined with ptp2d, which was syncing to ptp2d running on the same
server that chrony used for NTP. Note that the refclocks are marked as
'pps local', meaning that they are to be used only as a frequency
source. ('pps' means that the refclock isn't reporting UTC, and 'local'
means that the clock isn't aligned to UTC seconds.)

server <server> iburst minpoll 0 maxpoll 4 xleave

refclock PHC /dev/phc-em0 refid EM0 pps local
refclock PHC /dev/phc-em1 refid EM1 pps local

logdir /tmp log measurements statistics tracking refclocks selection rtc
logbanner 0
[4] https://mastodon.radio/@jeffpc/112230743393202103
[5] A huge problem with NTP is that it suffers greatly from any network
latency jitter and asymmetrical routing. Having a stable reference
clock (even if the stability is short-term only) helps NTP software
quite a bit.


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Poul-Henning Kamp
2024-05-10 05:37:50 UTC
Post by Josef 'Jeff' Sipek
* read-only interface to various precision hardware clocks (PHCs)
* support for both absolute time and counter-only PHCs
* ability to use software like chrony to stabilize system clocks
I should and will be the last to discourage anybody from having fun
with timekeeping.

But I do feel a responsibility to point out that you are trying
to solve already solved problems, in a way that does not work nearly
as well as those solutions.

Chesterton's fence: Before you throw it out or bypass it, you
should find out why the current timekeeping infrastructure is built
like it is.

Back in the mists of time, before even I got involved in it, NTPD
did more or less exactly what you propose, because there was no
kernel support for timekeeping, only for adding device drivers, and
it did not work then, and it won't work much better today, for
fundamental and inescapable reasons.

For starters, exposing the hardware count through a char-dev is going
to be very jittery (= time-noise). The "userland->kernel->userland"
context switches are very unpredictable timewise, because it is
anyone's guess how many memory operations they will take, even in the
best case. Worse, there is a high risk that you lose the CPU to
another (kernel) thread, which is going to /really/ introduce jitter.

That is why the PPS-API, timecounters and kernel_pll exist: To
keep the "real-time" aspect of the timekeeping firmly inside the
kernel and undisturbed by userland and lower priority kernel
activities.

Unless you can expose the hardware directly to userland, via mmap(2),
timekeeping in userland is simply not going to perform.

With that said, a lot of our timekeeping is stuff I wrote 25 years
ago, and it is absolutely due for both a rethink and a refresh, but
if you decide to throw it all out and start afresh, you will
not get to the interesting parts for years.

So before you continue, at the very least, read this:

https://papers.freebsd.org/2002/phk-timecounters/

And you should think a LOT about page 91 in this one too:

https://www.am1.us/wp-content/uploads/Documents/U11625_VIG-TUTORIAL.pdf

(The other 307 pages are also interesting :-)

Poul-Henning
FreeBSD TimeLord (retired)
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Bob Bishop
2024-05-10 14:35:28 UTC
Hi,
Post by Poul-Henning Kamp
For starters, exposing the hardware count through a char-dev is going
to be very jittery (= time-noise). The "userland->kernel->userland"
context switches are very unpredictable timewise, because it is
anyone's guess how many memory operations they will take, even in the
best case. Worse, there is a high risk that you lose the CPU to
another (kernel) thread, which is going to /really/ introduce jitter.
I can second this. When I tried to do time-sensitive machine control that way in the past, the jitter was too bad to stay within a few tens of milliseconds. You might do an order of magnitude better today, but it's still nowhere near good enough for modern timekeeping.
--
Bob Bishop
***@gid.co.uk

Josef 'Jeff' Sipek
2024-05-10 15:00:18 UTC
Post by Poul-Henning Kamp
Post by Josef 'Jeff' Sipek
* read-only interface to various precision hardware clocks (PHCs)
* support for both absolute time and counter-only PHCs
* ability to use software like chrony to stabilize system clocks
I should and will be the last to discourage anybody from having fun
with timekeeping.
:)
Post by Poul-Henning Kamp
But I do feel a responsibility to point out that you are trying
to solve already solved problems, in a way that does not work nearly
as well as those solutions.
I disagree. I'm sure there are ways of improving my approach, but I don't
think this is a completely solved problem.

There are a *lot* of systems out there that currently have only NTP for sync.
They would benefit from a PPS, but adding an actual PPS source is not
possible. Many of the same systems have additional oscillators with counters
that can be used as reference clocks.

(I'm using 82574 e1000e cards for my experiments since I have a number of
them and they have a 25MHz clock & counter. There are other sources which
are better, but I don't have any on hand.)

Thorough benchmarking is necessary, of course.
Post by Poul-Henning Kamp
Chesterton's fence: Before you throw it out or bypass it, you
should find out why the current timekeeping infrastructure is built
like it is.
100% agreed. What I have shared is a very work-in-progress
proof-of-concept. As I said before, I don't like the code duplication, etc.

If you have any pointers to docs with that context, please let me know.
I've read what I could find (including timecounters & nanokernel), but there
is probably more.

..
Post by Poul-Henning Kamp
For starters, exposing the hardware count through a char-dev is going
to be very jittery (= time-noise). The "userland->kernel->userland"
context switches are very unpredictable timewise, because it is
anyone's guess how many memory operations they will take, even in the
best case. Worse, there is a high risk that you lose the CPU to
another (kernel) thread, which is going to /really/ introduce jitter.
I'm not sure that matters. The PHC ioctl used by my hacked-up chrony
returns <PHC, system> time pair. So, from userspace's perspective, it
doesn't matter how long the ioctl took to execute since the result tells the
application the CLOCK_REALTIME timestamp of the PHC reading. Regardless of
how long it took to get the timestamp pair, the application knows exactly
how long ago (according to the system clock) the PHC reading was made and it
can apply the various filtering and statistics to derive the parameters for
ntp_adjtime(2) or however it decides to tweak the system clock.
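To illustrate why the ioctl's own latency drops out: each <PHC, system> pair is internally consistent, so the rate difference between the two clocks can be computed from any two pairs, no matter how long either ioctl took. This is a minimal sketch under that assumption. `struct ts_pair` and both functions are hypothetical, and the only external fact used is that the `freq` field of `struct timex` (as passed to ntp_adjtime(2)) is expressed in ppm scaled by 2^16.

```c
#include <assert.h>
#include <stdint.h>

/* One <system clock, PHC> reading as returned by a timestamp-pair ioctl. */
struct ts_pair {
	int64_t sys_ns; /* CLOCK_REALTIME at the moment of the PHC read */
	int64_t phc_ns; /* PHC time, already scaled to ns */
};

/*
 * Rate of the system clock relative to the PHC, in ppm (positive means the
 * system clock advanced more than the PHC between the two pairs). Only the
 * contents of each pair matter; how long each ioctl took does not.
 */
static double
sys_vs_phc_ppm(const struct ts_pair *a, const struct ts_pair *b)
{
	int64_t dsys = b->sys_ns - a->sys_ns;
	int64_t dphc = b->phc_ns - a->phc_ns;

	assert(dphc > 0);
	return ((double)(dsys - dphc) * 1e6 / (double)dphc);
}

/* struct timex's freq field uses ppm scaled by 2^16 ("scaled ppm"). */
static long
ppm_to_scaled_ppm(double ppm)
{
	/* Round half away from zero before truncating toward zero. */
	return ((long)(ppm * 65536.0 + (ppm >= 0 ? 0.5 : -0.5)));
}
```

For example, suppose the PHC advances exactly 10 s (10,000,000,000 ns) between two pairs while the system clock advances 10,000,100,000 ns. The system clock is then running 10 ppm fast relative to the PHC, which is 655360 in scaled-ppm units. How to turn that into an actual ntp_adjtime(2) call is left to the application's filtering.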

This (unpredictable) ioctl latency is different from the interrupt latency
that affects software-timestamped PPS, which is a real problem.
Post by Poul-Henning Kamp
That is why the PPS-API, timecounters and kernel_pll exist: To
keep the "real-time" aspect of the timekeeping firmly inside the
kernel and undisturbed by userland and lower priority kernel
activities.
Sure, but having the code in the kernel has its limitations too. A huge one
is that it is much harder to fuse multiple time sources into an estimate
that can be used to steer the system clock. Exposing additional time
sources to userspace and having it figure out what to do seems more
practical than trying to implement time source combining in the kernel.

To be clear, I think a hybrid approach is the way to go - userspace does
fancy filtering and feeds the kernel pre-processed information so it can
steer the system clock more effectively. This is already what happens, I
just want to give userspace more time sources to work with.
Post by Poul-Henning Kamp
Unless you can expose the hardware directly to userland, via mmap(2),
timekeeping in userland is simply not going to perform.
Counterexample: Linux's PTP kernel API exposes time via char dev ioctls and
it performs very well. With good PTP-capable hardware, you can get to tens
of ns.
Post by Poul-Henning Kamp
With that said, a lot of our timekeeping is stuff I wrote 25 years
ago, and it is absolutely due for both a rethink and a refresh,
Let me know if you have any ideas how the timekeeping code should change,
I'm more than happy to work on it.
Post by Poul-Henning Kamp
but
if you decide to throw it all out and start afresh, you will
not get to the interesting parts for years.
I definitely don't want to throw it all out. My current code ignores it
because that's the quick & dirty way to experiment with an idea. A
production-ready version must be more cohesive, of course.
Post by Poul-Henning Kamp
https://papers.freebsd.org/2002/phk-timecounters/
I've read it a few times over the past couple of months while I was playing
around with the PHC idea. :)
Post by Poul-Henning Kamp
https://www.am1.us/wp-content/uploads/Documents/U11625_VIG-TUTORIAL.pdf
I hadn't seen this presentation before, but that slide is pretty much the one
figure in every timekeeping presentation. So, I have been thinking about it
already ;)
Post by Poul-Henning Kamp
(The other 307 pages are also interesting :-)
Indeed! It looks like a really information-dense set of slides. Thanks for
the link & feedback.

Jeff.

