Direct dumped kernel cores

Warner Losh

2024-10-31 22:32:51 UTC

Post by Justin Hibbits
Hi everyone,
At Juniper we've been using a so-called 'rescue' kernel for dumping
vmcores directly to the filesystem after a panic. We're now
contributing this feature, implemented by Klara Systems, to FreeBSD, and
looking for feedback. I posted a review
at https://reviews.freebsd.org/D47358 for anyone interested.
Interesting bits to keep in mind:
* It requires a 2-stage build process, one to build the rescue kernel,
the other to build the main kernel, which embeds the rescue kernel
inside its image. This might need some further work.
* Thus far it's been implemented for amd64 and arm64, once proven out,
other architectures (powerpc64/le, riscv64) can follow suit.
* Kernel environment bits to pass down to the rescue kernel are
prefixed `debug.rescue.`, for instance
`debug.rescue.vfs.root.mountfrom`.

First off, this is kinda cool. I've wanted this occasionally when my swap
partition is too small (though in my case, it was easy enough to add another
drive to the system that was panicking and dump to that).

I do have a question: I'm curious why you didn't follow the Linux lead of
having a kexec_load(2) system call to load the 'rescue kernel' to make this
more generic. That would make the leap to having full kexec support (e.g.
reboot(CMD_KEXEC)) a lot easier to implement.

Warner

Post by Justin Hibbits
There are many more details in the review summary.
We'd love to get feedback from anyone interested.
Thanks,
Justin Hibbits

Ravi Pokala

2024-10-31 22:33:39 UTC

Hi Justin,

So, this is like the 'crashkernel' thing Linux has, where it kexec()s an alternate kernel when the main one panics?

I haven't looked at the patch -- most of it will be way out of my expertise -- but what, if anything, is done to make sure the on-disk state of the target filesystem is okay, before the "rescue" kernel starts writing to it?

Thanks,

Ravi (rpokala@)

-----Original Message-----
From: <owner-freebsd-***@FreeBSD.org> on behalf of Justin Hibbits <***@FreeBSD.org>
Date: Thursday, October 31, 2024 at 15:23
To: <freebsd-***@FreeBSD.org>, <freebsd-***@freebsd.org>
Subject: Direct dumped kernel cores

Hi everyone,

At Juniper we've been using a so-called 'rescue' kernel for dumping
vmcores directly to the filesystem after a panic. We're now
contributing this feature, implemented by Klara Systems, to FreeBSD, and
looking for feedback. I posted a review
at https://reviews.freebsd.org/D47358 for anyone interested.

Interesting bits to keep in mind:
* It requires a 2-stage build process, one to build the rescue kernel,
the other to build the main kernel, which embeds the rescue kernel
inside its image. This might need some further work.
* Thus far it's been implemented for amd64 and arm64, once proven out,
other architectures (powerpc64/le, riscv64) can follow suit.
* Kernel environment bits to pass down to the rescue kernel are
prefixed `debug.rescue.`, for instance
`debug.rescue.vfs.root.mountfrom`.
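
To make the `debug.rescue.` convention concrete, here is a minimal Python model of selecting the variables meant for the rescue kernel. It assumes the prefix is stripped before the variables are handed over, which the announcement does not spell out; the review has the actual behaviour. The function name and the device names are hypothetical.

```python
RESCUE_PREFIX = "debug.rescue."


def rescue_environment(kenv):
    """Pick the variables meant for the rescue kernel.

    Assumption (not confirmed by the thread): the prefix is removed, so
    `debug.rescue.vfs.root.mountfrom` becomes `vfs.root.mountfrom`.
    """
    return {
        name[len(RESCUE_PREFIX):]: value
        for name, value in kenv.items()
        if name.startswith(RESCUE_PREFIX) and len(name) > len(RESCUE_PREFIX)
    }


# Hypothetical environment of the main kernel; device names are made up.
main_kenv = {
    "vfs.root.mountfrom": "ufs:/dev/ada0p2",
    "debug.rescue.vfs.root.mountfrom": "ufs:/dev/ada0p4",
    "kern.hz": "1000",
}
rescue_kenv = rescue_environment(main_kenv)
```

In this model the rescue kernel sees only its own `vfs.root.mountfrom`, and the main kernel's setting of the same name is left behind.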

There are many more details in the review summary.

We'd love to get feedback from anyone interested.

Thanks,
Justin Hibbits

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Justin Hibbits

2024-11-01 01:07:34 UTC

On Thu, 31 Oct 2024 15:33:39 -0700

Post by Ravi Pokala
Hi Justin,
So, this is like the 'crashkernel' thing Linux has, where it kexec()s
an alternate kernel when the main one panics?

If your description is accurate, then probably. I don't know if the
crashkernel thing from Linux existed when this was started, and
actually hadn't even heard of it until now.

Post by Ravi Pokala
I haven't looked at the patch -- most of it will be way out of my
expertise -- but what, if anything, is done to make sure the on-disk
state of the target filesystem is okay, before the "rescue" kernel
starts writing to it?

Good question. The rescue kernel embeds its own small rootfs, and it
fscks the target fs before writing to it.

- Justin


Justin Hibbits

2024-11-01 01:11:51 UTC

On Thu, 31 Oct 2024 16:32:51 -0600

Post by Warner Losh

First off, this is kinda cool. I've wanted this occasionally when my
swap partition is too small (though in my case, it was easy enough to
add another drive to the system that was panicking and dump to that).
I do have a question: I'm curious why you didn't follow the Linux
lead of having a kexec_load(2) system call to load the 'rescue kernel'
to make this more generic. That would make the leap to having full
kexec support (e.g. reboot(CMD_KEXEC)) a lot easier to implement.
Warner

One problem with trying to kexec_load() a rescue kernel is that the
rescue kernel needs its own memory to work with, a contiguous block, so
it needs to be loaded early, or at least reserved early. Without its
reserved memory it would be stomping over the 'host' kernel's
memory. That said, I do like that direction, and it's definitely worth
exploring.
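
The reservation point can be illustrated with a small Python model of a physical memory map. It is only a sketch: the function name, the 2 MiB alignment, the 64 MiB size and the address ranges are all assumptions for illustration, not values from the patch.

```python
def reserve_contiguous(free_ranges, size, align=1 << 21):
    """Carve `size` bytes out of the first free range that can hold it.

    `free_ranges` is a list of (start, end) physical ranges, end exclusive.
    Returns ((start, end), remaining_ranges); raises MemoryError if no
    single range is large enough.
    """
    for i, (start, end) in enumerate(free_ranges):
        base = (start + align - 1) // align * align  # round up to alignment
        if base + size <= end:
            remaining = list(free_ranges[:i])
            if start < base:
                remaining.append((start, base))
            if base + size < end:
                remaining.append((base + size, end))
            remaining.extend(free_ranges[i + 1:])
            return (base, base + size), remaining
    raise MemoryError("no contiguous range large enough")


RESCUE_SIZE = 64 << 20  # hypothetical 64 MiB for the rescue kernel

# Early in boot the map is still a few large ranges, so the carve-out works.
early_map = [(0x1000, 0x9F000), (0x100000, 0x80000000)]
reserved, remaining = reserve_contiguous(early_map, RESCUE_SIZE)

# Later on, free memory may only exist as small scattered pieces.
late_map = [(0x200000, 0x600000), (0x1000000, 0x1400000)]
try:
    reserve_contiguous(late_map, RESCUE_SIZE)
    late_ok = True
except MemoryError:
    late_ok = False
```

The second call fails even though free memory exists, because no single piece is big enough. That is the case an early reservation is meant to avoid.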

- Justin


Warner Losh

2024-11-01 01:48:53 UTC

That's exactly what kexec_load does. When the crash happens, the current
kernel constructs a new memory map and passes that to the preloaded crash
kernel so it knows what memory can safely be used plus info needed to do
the crash dump.
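
As a rough Python model of that hand-off: the map given to the crash kernel marks only the reserved block as usable and lists everything else as memory to be read for the dump. The dictionary layout and names are hypothetical, and representing the "info needed to do the crash dump" as a plain list of ranges is an assumption.

```python
def crash_memory_map(physical_ranges, reserved):
    """Split RAM into what the crash kernel may use and what it only reads.

    `physical_ranges` and `reserved` use (start, end) with end exclusive.
    """
    r_start, r_end = reserved
    dump = []
    for start, end in physical_ranges:
        if start < r_start:
            # Part (or all) of this range lies below the reserved block.
            dump.append((start, min(end, r_start)))
        if end > r_end:
            # Part (or all) of this range lies above the reserved block.
            dump.append((max(start, r_end), end))
    return {"usable": [reserved], "dump": dump}


# Hypothetical machine: 2 GiB of RAM with a 64 MiB block reserved at 2 MiB.
ram = [(0x1000, 0x9F000), (0x100000, 0x80000000)]
crash_map = crash_memory_map(ram, (0x200000, 0x4200000))
```

Because the crash kernel allocates only from `usable`, the panicked kernel's memory stays untouched until it has been written out.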

For the replacement kernel, the reboot copies a miniloader that copies the
kernel to the load address, tears the CPU down to the warm reset state and
jumps to the trampoline used to start the kernel.

Loader.kboot writes that trampoline, creates the EFI-like metadata and
a memory map, and then calls reboot to boot into the new kernel.

Warner
