Quick call for info from Bhyve users

Discussion:

Ravi Pokala

2024-11-12 01:45:04 UTC

Hi folks,

Could someone with a handy amd64 Bhyve VM run

kenv | grep smbios

in the VM and shoot me the results? I don't have such a system readily available at the moment, but could really use the info.

Thanks,

Ravi (rpokala@ ; wearing my Vdura (formerly known as Panasas) hat)

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

void

2024-11-12 11:00:11 UTC

Hello,

On Mon, Nov 11, 2024 at 07:45:04PM -0600, Ravi Pokala wrote:
>Hi folks,
>
>Could someone with a handy amd64 Bhyve VM run
>
> kenv | grep smbios
>
>in the VM and shoot me the results? I don't have such a system readily available at the moment, but could really use the info.

% kenv | grep smbios
smbios.bios.vendor="BHYVE"

(14.1-RELEASE-p5 GENERIC amd64 1401000 1401000)

Aryeh Friedman

2024-11-12 11:38:49 UTC

On Mon, Nov 11, 2024 at 8:45 PM Ravi Pokala <***@freebsd.org> wrote:
>
> Hi folks,
>
> Could someone with a handy amd64 Bhyve VM run
>
> kenv | grep smbios
>
> in the VM and shoot me the results? I don't have such a system readily available at the moment, but could really use the info.
>
> Thanks,

For guest:

***@sarek2048% kenv | grep smbios
hint.smbios.0.mem="0xbfbcb000"
smbios.bios.reldate="03/14/2014"
smbios.bios.revision="0.0"
smbios.bios.vendor="BHYVE"
smbios.bios.version="1.00"
smbios.chassis.maker=" "
smbios.chassis.serial="None"
smbios.chassis.tag="None"
smbios.chassis.type="Unknown"
smbios.chassis.version="1.0"
smbios.memory.enabled="67106816"
smbios.socket.enabled="1"
smbios.socket.populated="1"
smbios.system.family=" "
smbios.system.maker=" "
smbios.system.product="BHYVE"
smbios.system.serial="None"
smbios.system.sku="None"
smbios.system.uuid="80574068-2cc6-6d34-853b-e8ade09ca5f6"
smbios.system.version="1.0"
smbios.version="2.8"

And the host:

smbios.bios.reldate="08/11/2022"
smbios.bios.vendor="American Megatrends International, LLC."
smbios.bios.version="P.20"
smbios.chassis.maker="Micro-Star International Co., Ltd."
smbios.chassis.serial="To be filled by O.E.M."
smbios.chassis.tag="To be filled by O.E.M."
smbios.chassis.type="Desktop"
smbios.chassis.version="5.0"
smbios.memory.enabled="33554432"
smbios.planar.location="To be filled by O.E.M."
smbios.planar.maker="Micro-Star International Co., Ltd."
smbios.planar.product="B550 GAMING GEN3 (MS-7B86)"
smbios.planar.serial="07B8651_N11E812185"
smbios.planar.tag="To be filled by O.E.M."
smbios.planar.version="5.0"
smbios.socket.enabled="1"
smbios.socket.populated="1"
smbios.system.family="To be filled by O.E.M."
smbios.system.maker="Micro-Star International Co., Ltd."
smbios.system.product="MS-7B86"
smbios.system.serial="To be filled by O.E.M."
smbios.system.sku="To be filled by O.E.M."
smbios.system.uuid="86ac8a04-ab86-5911-a547-047c167e3610"
smbios.system.version="5.0"
smbios.version="2.8"
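
A minimal sketch of how the two listings above could be compared programmatically. It assumes the `name="value"` line format shown in the kenv output; the helper names and the "BHYVE" heuristic are illustrative assumptions, not an established detection method.

```python
import re

def parse_kenv(text):
    """Parse kenv-style lines of the form name="value" into a dict."""
    env = {}
    for line in text.splitlines():
        match = re.match(r'^([\w.]+)="(.*)"$', line.strip())
        if match:
            env[match.group(1)] = match.group(2)
    return env

def looks_like_bhyve_guest(env):
    """Heuristic (assumption): a BHYVE BIOS vendor or product means a bhyve guest."""
    return "BHYVE" in (env.get("smbios.bios.vendor"),
                       env.get("smbios.system.product"))

# Excerpts copied from the guest and host listings above.
guest = parse_kenv('''smbios.bios.vendor="BHYVE"
smbios.system.product="BHYVE"
smbios.version="2.8"''')
host = parse_kenv('''smbios.bios.vendor="American Megatrends International, LLC."
smbios.system.product="MS-7B86"''')

print(looks_like_bhyve_guest(guest), looks_like_bhyve_guest(host))
```

On a live system the text would come from running kenv itself; a guest that only reports smbios.bios.vendor would still be caught by the first key.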

--
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org

void

2024-11-12 13:26:26 UTC

On Tue, Nov 12, 2024 at 06:38:49AM -0500, Aryeh Friedman wrote:
>For guest:
>
>***@sarek2048% kenv | grep smbios
>hint.smbios.0.mem="0xbfbcb000"
>smbios.bios.reldate="03/14/2014"
>smbios.bios.revision="0.0"
>smbios.bios.vendor="BHYVE"
>smbios.bios.version="1.00"
>smbios.chassis.maker=" "
>smbios.chassis.serial="None"
>smbios.chassis.tag="None"
>smbios.chassis.type="Unknown"
>smbios.chassis.version="1.0"
>smbios.memory.enabled="67106816"
>smbios.socket.enabled="1"
>smbios.socket.populated="1"
>smbios.system.family=" "
>smbios.system.maker=" "
>smbios.system.product="BHYVE"
>smbios.system.serial="None"
>smbios.system.sku="None"
>smbios.system.uuid="80574068-2cc6-6d34-853b-e8ade09ca5f6"
>smbios.system.version="1.0"
>smbios.version="2.8"

Huh. This vm shows none of that, apart from smbios.bios.vendor.
How are you launching the vm? In this instance, I'm using the
supplied vmrun.sh.

On this vm, there's this:

% doas sysctl -a | grep smbios
Password:

smbios0: <System Management BIOS> at iomem 0xf1000-0xf101e
smbios0: Version: 2.6, BCD Revision: 2.6
device smbios
dev.smbios.0.%parent: nexus0
dev.smbios.0.%pnpinfo:
dev.smbios.0.%location:
dev.smbios.0.%driver: smbios
dev.smbios.0.%desc: System Management BIOS
dev.smbios.%parent:

Ravi Pokala

2024-11-12 16:55:11 UTC

Many thanks to all the people who replied back to me so quickly!

I got the information I needed.

Thanks again!

-Ravi (rpokala@ ; wearing my Vdura (formerly known as Panasas) hat)

Tomek CEDRO

2024-11-13 22:35:15 UTC

On Wed, Nov 13, 2024 at 9:30 PM George Mitchell <george+***@m5p.com> wrote:
> On 11/12/24 17:46, George Mitchell wrote:
> > Has anyone ever used the MCP2221 chip from Microchip Technology (or any
> > device incorporating it) on FreeBSD? If so, does it attach as both a
> > serial port (cuaUn) AND human interface (uhidn), or just one? Does it
> > work well? Thanks for any information you can give me. -- George
>
> Does FreeBSD have the concept of one hardware device attaching as two
> device nodes? -- George

Hey there George :-)

Yes, a USB device may expose several interfaces, and each of them may
use a different kernel module / subsystem, all over a single USB cable.
That works well on FreeBSD. For instance, debug probes (e.g. DAPLink or
STLink) offer a serial port for console access, a JTAG/SWD/CMSIS HID
interface for debugging, and UMS (mass storage) emulation for
drag-and-drop firmware flashing of bin files :-)

I cannot tell if the MCP2221 works well because I have not played with
that chip, so you will have to verify in practice, sorry.. but it seems
to be something like the FT2232 or CH3421, right? I can see 3 options
here, depending on what you need :-)

1. The serial port emulation uses the CDC standard, and the same goes
for anything that runs over HID, UMS, and the other standard classes
supported here. If this is a generic implementation it should work out
of the box.. the only thing that _may_ need an update is the VID:PID
pair: the new chip's identification numbers may have to be added to the
specific kernel driver so that it attaches on device connect (e.g. CDC,
so that /dev/cuaU* shows up).

2. If this is some sort of custom implementation (like the FT2232), then
an additional driver would be necessary to translate between the
device's internal registers and something that e.g. CDC can talk to.
This is the case when you want a native OS driver for some reason.

3. FreeBSD provides a native LibUSB interface that can be used by any
application that can already talk to the chip directly over LibUSB
(e.g. PyMCP2221A [1]). No system drivers are necessary in that case.
For example, if the chip is not supported by the system at all but you
have an application that can talk to the chip on its own over libusb,
all should work fine :-)

[1] https://github.com/nonNoise/PyMCP2221A

There are standard tools on FreeBSD, such as usbconfig, that let you
work with the USB subsystem and devices (e.g. view USB device
descriptors, reset devices, configure interfaces), and devd, which lets
you tell the system what to do on device attach/detach events. The USB
stack is rock solid :-)
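
To make the "several interfaces, several drivers" idea concrete, here is a small sketch that maps the interface class codes of a composite device to a class name and a likely FreeBSD driver. The class codes are the standard USB base classes; the driver column and the example device are assumptions for illustration, and a real descriptor dump from usbconfig should be checked instead.

```python
# Standard USB base class codes (bInterfaceClass).
# The driver names are the FreeBSD drivers that typically claim each class;
# treat that column as an assumption, since device quirks can differ.
USB_CLASSES = {
    0x02: ("CDC control", "umodem"),
    0x03: ("HID", "usbhid or uhid"),
    0x08: ("mass storage", "umass"),
    0x0A: ("CDC data", "umodem"),
}

def describe(interface_classes):
    """Return one (interface number, class name, likely driver) tuple per interface."""
    rows = []
    for number, code in enumerate(interface_classes):
        name, driver = USB_CLASSES.get(
            code, ("unknown/vendor specific", "none (use libusb)"))
        rows.append((number, name, driver))
    return rows

# Hypothetical composite device: a CDC serial port plus a HID interface.
composite = [0x02, 0x0A, 0x03]
for row in describe(composite):
    print(row)
```

An interface whose class is not in the table corresponds to options 2 and 3 above: either a dedicated driver or a userland program over libusb.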

https://man.freebsd.org/cgi/man.cgi?usbconfig

I hope that helps a bit :-)
Tomek

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Tomek CEDRO

2024-11-13 22:59:10 UTC

On Wed, Nov 13, 2024 at 11:45 PM George Mitchell <george+***@m5p.com> wrote:
> (..)
> > I cannot tell if MCP2221 works well because I did not play with that
> > chip you have to verify in practice sorry.. but it seems something
> > like FT2232 or CH3421 right? I can see 3 options here depending on
> > what you need :-)
> > (..)
> > I hope that helps a bit :-)
> > Tomek
>
> It does! Thank you. -- George

Okay, I just ordered the MCP2221A Breakout Stemma QT/Qwiic for ~10EUR.
It should arrive on Friday/Monday, we will see what happens ;-)

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

George Mitchell

2024-11-13 23:08:38 UTC

On 11/13/24 17:59, Tomek CEDRO wrote:
> On Wed, Nov 13, 2024 at 11:45 PM George Mitchell <george+***@m5p.com> wrote:
>> (..)
>>> I cannot tell if MCP2221 works well because I did not play with that
>>> chip you have to verify in practice sorry.. but it seems something
>>> like FT2232 or CH3421 right? I can see 3 options here depending on
>>> what you need :-)
>>> (..)
>>> I hope that helps a bit :-)
>>> Tomek
>>
>> It does! Thank you. -- George
>
> Okay I just ordered the MCP2221A Breakout Stemma QT/Qwiic for ~10EUR
> it should arrive on Friday/Monday will see what happens ;-)
>
That just happens to be the board I was hoping to get! -- George

Tomek CEDRO

2024-11-13 23:21:13 UTC

On Thu, Nov 14, 2024 at 12:08 AM George Mitchell <george+***@m5p.com> wrote:
> > Okay I just ordered the MCP2221A Breakout Stemma QT/Qwiic for ~10EUR
> > it should arrive on Friday/Monday will see what happens ;-)
> >
> That just happens to be the board I was hoping to get! -- George

Cool, I will let you know how it works :-) I also ordered an ESP32S3
module with LCD and touch for NuttX RTOS + LVGL testing, and an rPI-Zero
because I did not have one yet, and it's cheap and seems powerful enough
to run not only NuttX RTOS but also FreeBSD!! :D

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Bakul Shah

2024-11-13 23:28:46 UTC

On Nov 13, 2024, at 3:21 PM, Tomek CEDRO <***@cedro.info> wrote:
>
> rPI-Zero
> because I did not have one yet and its cheap and seems powerful enough
> not only to run NuttX RTOS but also FreeBSD!! :D

piZero models older than Zero 2 W are not 64 bit capable. If it matters.

Tomek CEDRO

2024-11-14 00:02:02 UTC

On Thu, Nov 14, 2024 at 12:29 AM Bakul Shah <***@iitbombay.org> wrote:
> On Nov 13, 2024, at 3:21 PM, Tomek CEDRO <***@cedro.info> wrote:
> > rPI-Zero
> > because I did not have one yet and its cheap and seems powerful enough
> > not only to run NuttX RTOS but also FreeBSD!! :D
>
> piZero models older than Zero 2 W are not 64 bit capable. If it matters.

Thanks Bakul!! I just updated my order to rPI02W :-) :-)

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Tomek CEDRO

2024-11-16 01:27:02 UTC

On Thu, Nov 14, 2024 at 12:21 AM Tomek CEDRO <***@cedro.info> wrote:
> On Thu, Nov 14, 2024 at 12:08 AM George Mitchell <george+***@m5p.com> wrote:
> > > Okay I just ordered the MCP2221A Breakout Stemma QT/Qwiic for ~10EUR
> > > it should arrive on Friday/Monday will see what happens ;-)
> > That just happens to be the board I was hoping to get! -- George
> Cool I will let you know how it works :-)

This MCP2221A works like a charm out of the box on FreeBSD 13.3 :-)

Nov 16 02:08:30 octagon kernel: ugen0.6: <Microchip Technology Inc. MCP2221 USB-I2C/UART Combo> at usbus0
Nov 16 02:08:30 octagon kernel: umodem0 on uhub21
Nov 16 02:08:30 octagon kernel: umodem0: <Microchip Technology Inc. MCP2221 USB-I2C/UART Combo, class 239/2, rev 2.00/1.00, addr 5> on usbus0
Nov 16 02:08:30 octagon kernel: umodem0: data interface 1, has no CM over data, has no break
Nov 16 02:08:30 octagon kernel: usbhid5 on uhub21
Nov 16 02:08:30 octagon kernel: usbhid5: <Microchip Technology Inc. MCP2221 USB-I2C/UART Combo, class 239/2, rev 2.00/1.00, addr 5> on usbus0
Nov 16 02:08:30 octagon kernel: hidbus5: <HID bus> on usbhid5

Then /dev/cuaU0 shows up and I can connect to some board over UART:

nsh> uname -a
NuttX 12.7.0 5d8cdeaea8-dirty Nov 16 2024 02:02:28 arm nucleo-f4x1re

I used minicom to change the baud rate and it seems to work. Note that
this chip does not have control lines, so you will probably need to use
GPIO to reset / select boot if necessary.
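
The kernel attach messages quoted above already answer the "one device, two drivers" question. As a rough sketch, they can be grouped per USB bus and address; the regular expression is an assumption about this particular message format, not a general dmesg parser.

```python
import re
from collections import defaultdict

# Attach lines copied from the kernel log above (wrapped lines rejoined).
LOG = """\
Nov 16 02:08:30 octagon kernel: umodem0 on uhub21
Nov 16 02:08:30 octagon kernel: umodem0: <Microchip Technology Inc. MCP2221 USB-I2C/UART Combo, class 239/2, rev 2.00/1.00, addr 5> on usbus0
Nov 16 02:08:30 octagon kernel: usbhid5 on uhub21
Nov 16 02:08:30 octagon kernel: usbhid5: <Microchip Technology Inc. MCP2221 USB-I2C/UART Combo, class 239/2, rev 2.00/1.00, addr 5> on usbus0
Nov 16 02:08:30 octagon kernel: hidbus5: <HID bus> on usbhid5
"""

ATTACH = re.compile(
    r"kernel: (\w+): <([^,>]+), class \S+, rev \S+, addr (\d+)> on (\w+)$")

def drivers_by_device(log):
    """Map (bus, address, description) to the driver instances attached to it."""
    devices = defaultdict(list)
    for line in log.splitlines():
        match = ATTACH.search(line)
        if match:
            driver, description, addr, bus = match.groups()
            devices[(bus, int(addr), description)].append(driver)
    return dict(devices)

for device, drivers in drivers_by_device(LOG).items():
    print(device, drivers)
```

Both umodem0 and usbhid5 end up under the same bus and address, which is the single MCP2221A board.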

Regarding the additional functions, they seem to work in Python over HID,
but I did not play any further (after disconnecting the cable I got
errors on commands, so there is some communication).

(venv3.11embedded) python
Python 3.11.10 (main, Sep 14 2024, 01:19:37) [Clang 17.0.6 (https://github.com/llvm/llvm-project.git llvmorg-17.0.6-0-g600970 on freebsd13
Type "help", "copyright", "credits" or "license" for more information.
>>> from PyMCP2221A import PyMCP2221A
>>> mcp2221 = PyMCP2221A.PyMCP2221A()
>>> mcp2221.GPIO_Init()
mcp2221.ADC_1_Init()         mcp2221.ADC_2_Init()
mcp2221.ADC_3_Init()         mcp2221.ADC_DataRead()
mcp2221.CLKDIV_128           mcp2221.CLKDIV_16
mcp2221.CLKDIV_2             mcp2221.CLKDIV_32
mcp2221.CLKDIV_4             mcp2221.CLKDIV_64
mcp2221.CLKDIV_8             mcp2221.CLKDUTY_0
mcp2221.CLKDUTY_25           mcp2221.CLKDUTY_50
mcp2221.CLKDUTY_75           mcp2221.ClockOut(
mcp2221.Command_Structure(   mcp2221.DAC_1_Init()
mcp2221.DAC_2_Init()         mcp2221.DAC_Datawrite(
mcp2221.DeviceDriverInfo()   mcp2221.GPIO_0_BIT
mcp2221.GPIO_0_DIR           mcp2221.GPIO_0_Input()
mcp2221.GPIO_0_InputMode()   mcp2221.GPIO_0_MODE
mcp2221.GPIO_0_Output(       mcp2221.GPIO_0_OutputMode()
mcp2221.GPIO_1_BIT           mcp2221.GPIO_1_DIR
mcp2221.GPIO_1_Input()       mcp2221.GPIO_1_InputMode()
mcp2221.GPIO_1_MODE          mcp2221.GPIO_1_Output(
mcp2221.GPIO_1_OutputMode()  mcp2221.GPIO_2_BIT
mcp2221.GPIO_2_DIR           mcp2221.GPIO_2_Input()
mcp2221.GPIO_2_InputMode()   mcp2221.GPIO_2_MODE
mcp2221.GPIO_2_Output(       mcp2221.GPIO_2_OutputMode()
mcp2221.GPIO_3_BIT           mcp2221.GPIO_3_DIR
mcp2221.GPIO_3_Input()       mcp2221.GPIO_3_InputMode()
mcp2221.GPIO_3_MODE          mcp2221.GPIO_3_Output(
mcp2221.GPIO_3_OutputMode()  mcp2221.GPIO_Init()
mcp2221.GPIO_Read()          mcp2221.GPIO_Write()
mcp2221.I2C_Cancel()         mcp2221.I2C_Init(
mcp2221.I2C_Read(            mcp2221.I2C_Read_Repeated(
mcp2221.I2C_State_Check()    mcp2221.I2C_Write(
mcp2221.I2C_Write_No_Stop(   mcp2221.I2C_Write_Repeated(
mcp2221.Read_Flash_Data(     mcp2221.Reset()
mcp2221.Write_Flash_Data(    mcp2221.mcp2221a

Regarding the rPI Zero 2W: I dd'ed the 14.1 SD image for RPI and got
the rainbow box on HDMI, but some updates are needed in the firmware
files on the first partition. I quickly updated them to git master but
that did not help, and I do not really have time to play with it now,
but it was fun to try :-)

Have fun George :-)

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

George Mitchell

2024-11-14 18:31:13 UTC

On 11/14/24 02:38, Andriy Gapon wrote:
> On 13/11/2024 22:29, George Mitchell wrote:
>> On 11/12/24 17:46, George Mitchell wrote:
>>> Has anyone ever used the MCP2221 chip from Microchip Technology (or any
>>> device incorporating it) on FreeBSD? If so, does it attach as both a
>>> serial port (cuaUn) AND human interface (uhidn), or just one? Does it
>>> work well? Thanks for any information you can give me. -- George
>>
>> Does FreeBSD have the concept of one hardware device attaching as two
>> device nodes? -- George
>
> Take a look at cp2112 driver, for instance.
>
Thanks, Andriy! -- George

Alan Somers

2024-11-22 14:47:07 UTC

On Fri, Nov 22, 2024 at 7:07 AM Mark Johnston <***@freebsd.org> wrote:
>
> On Thu, Nov 21, 2024 at 04:06:53PM -0700, Alan Somers wrote:
> > Would it be possible to make dtrace work with KMSAN? It would
> > certainly make my life easier. As it is, every time I try to kldload
> > dtraceall, whether from the command line or in the loader, my VM
> > infinite loops printing alerts like this:
> >
> > MSan: Uninitialized stack memory from isize64+0x2e
> > #0 0xffffffff833d8f10 at __msan_warning+0x140
> > #1 0xffffffff86ec9c03 at dtrace_disp_opnd+0xd3
> > #2 0xffffffff86ebe552 at dtrace_disx86+0xc602
> > #3 0xffffffff86eca98e at dtrace_instr_size+0xee
> > #4 0xffffffff86d5ec27 at fbt_provide_module_function+0x957
> > #5 0xffffffff83303eff at link_elf_each_function_nameval+0x56f
> > #6 0xffffffff86d56cb3 at fbt_provide_module+0x423
> > #7 0xffffffff86d56871 at fbt_linker_file_cb+0x41
> > #8 0xffffffff830d9f5c at linker_file_foreach+0xdc
> > #9 0xffffffff830d63dd at linker_load_module+0x413d
> > #10 0xffffffff830e6116 at linker_load_dependencies+0x1136
> > #11 0xffffffff853b70b8 at link_elf_load_file+0x65c8
> > #12 0xffffffff830d542e at linker_load_module+0x318e
> > #13 0xffffffff830e0611 at kern_kldload+0x5d1
> > #14 0xffffffff830e0d74 at sys_kldload+0x1a4
> > #15 0xffffffff84fe56f6 at amd64_syscall+0x706
> > #16 0xffffffff84f0ef4b at fast_syscall_common+0xf8
> >
>
> Here you go:
>
> https://cgit.freebsd.org/src/commit/?id=418d8f0dc269b314bba51de63869b20da1d9a76e
> https://cgit.freebsd.org/src/commit/?id=cc3da1955c16df5eb0019e0fef810696b035b7cf
>
> This one might also be important if your test system is low on RAM,
> since the KMSAN shadow map gobbles up quite a lot of memory:
>
> https://cgit.freebsd.org/src/commit/?id=5d12db2dafece9f6a0453c4a45c4abed6b1e15ec
>
> I haven't yet tried running through the full test suite, but this was
> enough to load dtraceall (quite slow under KMSAN) and run some simple
> scripts.

Wow, thanks! I'll try it out later today.

alan somers

2024-11-22 20:58:39 UTC

On Fri, Nov 22, 2024 at 7:47 AM Alan Somers <***@freebsd.org> wrote:
>
> On Fri, Nov 22, 2024 at 7:07 AM Mark Johnston <***@freebsd.org> wrote:
> >
> > On Thu, Nov 21, 2024 at 04:06:53PM -0700, Alan Somers wrote:
> > > Would it be possible to make dtrace work with KMSAN? It would
> > > certainly make my life easier. As it is, every time I try to kldload
> > > dtraceall, whether from the command line or in the loader, my VM
> > > infinite loops printing alerts like this:
> > >
> > > MSan: Uninitialized stack memory from isize64+0x2e
> > > #0 0xffffffff833d8f10 at __msan_warning+0x140
> > > #1 0xffffffff86ec9c03 at dtrace_disp_opnd+0xd3
> > > #2 0xffffffff86ebe552 at dtrace_disx86+0xc602
> > > #3 0xffffffff86eca98e at dtrace_instr_size+0xee
> > > #4 0xffffffff86d5ec27 at fbt_provide_module_function+0x957
> > > #5 0xffffffff83303eff at link_elf_each_function_nameval+0x56f
> > > #6 0xffffffff86d56cb3 at fbt_provide_module+0x423
> > > #7 0xffffffff86d56871 at fbt_linker_file_cb+0x41
> > > #8 0xffffffff830d9f5c at linker_file_foreach+0xdc
> > > #9 0xffffffff830d63dd at linker_load_module+0x413d
> > > #10 0xffffffff830e6116 at linker_load_dependencies+0x1136
> > > #11 0xffffffff853b70b8 at link_elf_load_file+0x65c8
> > > #12 0xffffffff830d542e at linker_load_module+0x318e
> > > #13 0xffffffff830e0611 at kern_kldload+0x5d1
> > > #14 0xffffffff830e0d74 at sys_kldload+0x1a4
> > > #15 0xffffffff84fe56f6 at amd64_syscall+0x706
> > > #16 0xffffffff84f0ef4b at fast_syscall_common+0xf8
> > >
> >
> > Here you go:
> >
> > https://cgit.freebsd.org/src/commit/?id=418d8f0dc269b314bba51de63869b20da1d9a76e
> > https://cgit.freebsd.org/src/commit/?id=cc3da1955c16df5eb0019e0fef810696b035b7cf
> >
> > This one might also be important if your test system is low on RAM,
> > since the KMSAN shadow map gobbles up quite a lot of memory:
> >
> > https://cgit.freebsd.org/src/commit/?id=5d12db2dafece9f6a0453c4a45c4abed6b1e15ec
> >
> > I haven't yet tried running through the full test suite, but this was
> > enough to load dtraceall (quite slow under KMSAN) and run some simple
> > scripts.
>
> Wow, thanks! I'll try it out later today.

It works now. I can load the module and do some probes. Other
probes, though, still trigger warnings. For instance,
'fbt:zfs:zio_data_buf_alloc:entry {@z[stack()] = count();}' triggers
warnings like this:

MSan: Uninitialized stack memory in copyout():arg1, offset 24/368, addr 0xfffffe00b68ae018, from w_stillcold+0x28
#0 0xffffffff8340009c at kmsan_report_hook+0x15c
#1 0xffffffff833dbc61 at kmsan_copyout+0x1f1
#2 0xffffffff87e800f0 at dtrace_ioctl+0x4420
#3 0xffffffff8297795f at devfs_ioctl+0x3ef
#4 0xffffffff8547c277 at VOP_IOCTL_APV+0x107
#5 0xffffffff8381f974 at vn_ioctl+0x7a4
#6 0xffffffff8297a0f6 at devfs_ioctl_f+0x186
#7 0xffffffff834f3c2b at kern_ioctl+0xc5b
#8 0xffffffff834f2dc0 at sys_ioctl+0x580
#9 0xffffffff84fe7836 at amd64_syscall+0x706
#10 0xffffffff84f1128b at fast_syscall_common+0xf8
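
When comparing many of these reports it can help to pull the frames apart mechanically. The following is only a sketch: the regular expression assumes the `#N address at symbol+offset` layout shown above, and the idea of skipping frames whose names start with `kmsan_` or `__msan_` is an assumption about which frames belong to the sanitizer runtime.

```python
import re

# First frames copied from the report above.
REPORT = """\
#0 0xffffffff8340009c at kmsan_report_hook+0x15c
#1 0xffffffff833dbc61 at kmsan_copyout+0x1f1
#2 0xffffffff87e800f0 at dtrace_ioctl+0x4420
#3 0xffffffff8297795f at devfs_ioctl+0x3ef
"""

FRAME = re.compile(
    r"^#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)$", re.MULTILINE)

def parse_frames(report):
    """Return (index, return address, symbol, offset) for each backtrace line."""
    return [
        (int(index), int(addr, 16), symbol, int(offset, 16))
        for index, addr, symbol, offset in FRAME.findall(report)
    ]

def first_non_sanitizer_frame(frames):
    """Skip KMSAN runtime frames; the name prefixes are an assumption."""
    for frame in frames:
        if not frame[2].startswith(("kmsan_", "__msan_")):
            return frame
    return None

frames = parse_frames(REPORT)
culprit = first_non_sanitizer_frame(frames)
# Address minus offset gives the start of the function in this kernel image.
print(culprit[2], hex(culprit[1] - culprit[3]))
```

For the report above this points at dtrace_ioctl as the first frame outside the sanitizer, which matches the copyout() mentioned in the first line of the warning.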

Daniel O'Connor

2024-12-16 04:13:45 UTC

Hi Mark,

> On 16 Dec 2024, at 10:33, Mark Millard <***@yahoo.com> wrote:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028 is for a crash problem
> someone has been having over more than 2 years. There are boot time crashes
> involved.
>
> It appears that 0xFFFFF80000000007 is showing up in use and stored in data
> structures as a pointer value in fields/arguments that are pointers, where such
> a special value would not be expected. Later dereferencing does not go well, at
> least when the dereferenced data is then in turn put to use.
>
> The small offset from 0xFFFFF80000000000 suggests to me that the special value likely
> is inappropriately left around and somehow picked up and used. 0xFFFFF80000000000 (or
> near it) might be odd enough to have only a few known likely possible usages. Such
> notes in the bugzilla report would be good if such is the case. Thus my question.

That value (0xffffffff80000000) is kernbase (see sysctl kern.base_address).

However it is hard to think of why that value (or a small offset to it) is getting put in places it shouldn't be..

--
Daniel O'Connor
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum

Mark Millard

2024-12-16 05:48:45 UTC

On Dec 15, 2024, at 20:13, Daniel O'Connor <***@dons.net.au> wrote:

> Hi Mark,

Hello Daniel,

>> On 16 Dec 2024, at 10:33, Mark Millard <***@yahoo.com> wrote:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028 is for a crash problem
>> someone has been having over more than 2 years. There are boot time crashes
>> involved.
>>
>> It appears that 0xFFFFF80000000007 is showing up in use and stored in data
>> structures as a pointer value in fields/arguments that are pointers, where such
>> a special value would not be expected. Later dereferencing does not go well, at
>> least when the dereferenced data is then in turn put to use.
>>
>> The small offset from 0xFFFFF80000000000 suggests to me that the special value likely
>> is inappropriately left around and somehow picked up and used. 0xFFFFF80000000000 (or
>> near it) might be odd enough to have only a few known likely possible usages. Such
>> notes in the bugzilla report would be good if such is the case. Thus my question.
>
> That value (0xffffffff80000000) is kernbase (see sysctl kern.base_address).

On an amd64 system that I have access to:

# sysctl -x kern.base_address
kern.base_address: 0xffffffff80000000

But, while looking similar, it is not the same base number:

0xfffff80000000007 (copied and pasted from the kgdb session on the vmcore.*)
0xffffffff80000000

However, kern.base_address might be something that varies from
system to system in some way.

The closest examples I see in sysctl -ax output, start with
0xfffff801. . ., such as shown by:

kern.geom.confdot: digraph geom {
z0xfffff80105633a00 [shape=box,label="ZFS::VDEV\nzfs::vdev\nr#4"];
z0xfffff827c9e7dc80 [label="r1w1e1"];
z0xfffff827c9e7dc80 -> z0xfffff827c9e6d800;
z0xfffff80105633a00 -> z0xfffff827c9e7dc80;
z0xfffff8255c020300 [shape=box,label="SWAP\nswap\nr#4"];
z0xfffff80e3c0bed00 [label="r1w1e0"];
z0xfffff80e3c0bed00 -> z0xfffff80105633e00;
z0xfffff8255c020300 -> z0xfffff80e3c0bed00;
z0xfffff8010553c300 [shape=box,label="PART\nda0\nr#2"];
z0xfffff80105531700 [label="r0w0e0"];
z0xfffff80105531700 -> z0xfffff80105337c00;
z0xfffff8010553c300 -> z0xfffff80105531700;
. . .
z0xfffff80105afa080 [label="r0w0e0"];
z0xfffff80105afa080 -> z0xfffff80105631000;
z0xfffff827c9f56400 -> z0xfffff80105afa080;
z0xfffff8013806b800 [shape=box,label="DEV\nnda0\nr#2"];
z0xfffff827c992f580 [label="r0w0e0"];
z0xfffff827c992f580 -> z0xfffff80105931200;
z0xfffff8013806b800 -> z0xfffff827c992f580;
. . .
kern.geom.confxml: <mesh>
. . .
<class id="0xffffffff82461c78">
<name>ZFS::VDEV</name>
<geom id="0xfffff80105633a00">
<class ref="0xffffffff82461c78"/>
<name>zfs::vdev</name>
<rank>4</rank>
<consumer id="0xfffff827c9e7dc80">
<geom ref="0xfffff80105633a00"/>
<provider ref="0xfffff827c9e6d800"/>
<mode>r1w1e1</mode>
</consumer>
</geom>
</class>
<class id="0xffffffff819375f8">
<name>SWAP</name>
<geom id="0xfffff8255c020300">
<class ref="0xffffffff819375f8"/>
<name>swap</name>
<rank>4</rank>
<consumer id="0xfffff80e3c0bed00">
<geom ref="0xfffff8255c020300"/>
<provider ref="0xfffff80105633e00"/>
<mode>r1w1e0</mode>
</consumer>
</geom>
</class>
. . .
<geom id="0xfffff8013806b800">
<class ref="0xffffffff818b7560"/>
<name>nda0</name>
<rank>2</rank>
<consumer id="0xfffff827c992f580">
<geom ref="0xfffff8013806b800"/>
<provider ref="0xfffff80105931200"/>
<mode>r0w0e0</mode>
</consumer>
</geom>

So: Only seen in such kern.geom.* related sysctl -ax output.
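
A rough way to repeat that search over a saved copy of the sysctl output is to pull out every 64-bit hex value and bucket it by its upper bits. This is only a sketch: the sample text is a few lines copied from above, and grouping on the top 24 bits is an arbitrary choice made for illustration.

```python
import re
from collections import Counter

# A few lines copied from the kern.geom.* output above.
SAMPLE = r'''z0xfffff80105633a00 [shape=box,label="ZFS::VDEV\nzfs::vdev\nr#4"];
<class id="0xffffffff82461c78">
<geom id="0xfffff80105633a00">
<consumer id="0xfffff827c9e7dc80">'''

SUSPECT = 0xfffff80000000007  # value reported from the kgdb session

def pointers(text):
    """Extract every 16-digit hexadecimal value from the text."""
    return [int(value, 16) for value in re.findall(r"0x[0-9a-f]{16}", text)]

def prefix(address):
    """Top 24 bits of a 64-bit address, as a hex string."""
    return hex(address >> 40)

counts = Counter(prefix(p) for p in pointers(SAMPLE))
print(counts)
print(prefix(SUSPECT), prefix(SUSPECT) in counts)
```

The GEOM object addresses share the 0xfffff8 prefix with the suspect value, while the class addresses sit near kern.base_address instead.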

Thanks: I'd not considered looking at sysctl output.

> However it is hard to think of why that value (or a small offset to it) is getting put in places it shouldn't be..

Certainly does seem odd.

===
Mark Millard
marklmi at yahoo.com

Daniel O'Connor

2024-12-16 06:35:55 UTC

> On 16 Dec 2024, at 16:18, Mark Millard <***@yahoo.com> wrote:
>>> On 16 Dec 2024, at 10:33, Mark Millard <***@yahoo.com> wrote:
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028 is for a crash problem
>>> someone has been having over more than 2 years. There are boot time crashes
>>> involved.
>>>
>>> It appears that 0xFFFFF80000000007 is showing up in use and stored in data
>>> structures as a pointer value in fields/arguments that are pointers, where such
>>> a special value would not be expected. Later dereferencing does not go well, at
>>> least when the dereferenced data is then in turn put to use.
>>>
>>> The small offset from 0xFFFFF80000000000 suggests to me that the special value likely
>>> is inappropriately left around and somehow picked up and used. 0xFFFFF80000000000 (or
>>> near it) might be odd enough to have only a few known likely possible usages. Such
>>> notes in the bugzilla report would be good if such is the case. Thus my question.
>>
>> That value (0xffffffff80000000) is kernbase (see sysctl kern.base_address).
>
> On an amd64 system that I have access to:
>
> # sysctl -x kern.base_address
> kern.base_address: 0xffffffff80000000
>
> But, while looking similar, it is not the same base number:
>
> 0xfffff80000000007 (copied and pasted from the kgdb session on the vmcore.*)
> 0xffffffff80000000

Oops, my mistake!

> However, kern.base_address might be something that varies from
> system to system in some way.

Your value is the same as mine on this amd64 system. I don't think it varies (for a given architecture anyway).

> The closest examples I see in sysctl -ax output, start with
> 0xfffff801. . ., such as shown by:
>
> kern.geom.confdot: digraph geom {
> z0xfffff80105633a00 [shape=box,label="ZFS::VDEV\nzfs::vdev\nr#4"];

I assume these addresses are pointers to the internal GEOM objects (because they must be unique), i.e. they are actual memory locations.

Hmm, perhaps 0xfffff80000000000 is where kernel RAM starts?

--
Daniel O'Connor
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum

Philipp

2024-12-16 07:01:59 UTC

Hi Mark

[2024-12-15 16:03] Mark Millard <***@yahoo.com>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028 is for a crash problem
> someone has been having over more than 2 years. There are boot time crashes
> involved.
>
> It appears that 0xFFFFF80000000007 is showing up in use and stored in data
> structures as a pointer value in fields/arguments that are pointers, where such
> a special value would not be expected. Later dereferencing does not go well, at
> least when the dereferenced data is then in turn put to use.
>
> The small offset from 0xFFFFF80000000000 suggests to me that the special value likely
> is inappropriately left around and somehow picked up and used. 0xFFFFF80000000000 (or
> near it) might be odd enough to have only a few known likely possible usages. Such
> notes in the bugzilla report would be good if such is the case. Thus my question.

With a simple grep through sys/ I found the following comment in sys/amd64/include/vmparam.h:

> /*
> * Virtual addresses of things. Derived from the page directory and
> * page table indexes from pmap.h for precision.
> [...]
> * 0xfffff80000000000 - 0xfffffbffffffffff 4TB direct map

The direct map is 4TB of virtual address space mapping the physical
address space 1:1 (minus the base). So I would guess this is caused by
a NULL pointer converted by PHYS_TO_DMAP.
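
To make the arithmetic concrete, here is a minimal Python sketch of the direct-map translation. It assumes the translation is simply "direct-map base plus physical address", using the range from the vmparam.h comment above; the helper names are made up for illustration and this is not the kernel macro itself.

```python
# Model of the amd64 direct map described in the vmparam.h comment.
# Assumption: the translation is "direct-map base + physical address".
DMAP_MIN_ADDRESS = 0xFFFFF80000000000  # start of the 4TB direct map
DMAP_MAX_ADDRESS = 0xFFFFFBFFFFFFFFFF  # last address listed in the comment


def phys_to_dmap(paddr: int) -> int:
    """Hypothetical stand-in for PHYS_TO_DMAP: physical -> direct-map VA."""
    return DMAP_MIN_ADDRESS + paddr


def dmap_to_phys(vaddr: int) -> int:
    """Inverse mapping, valid only inside the direct-map window."""
    if not DMAP_MIN_ADDRESS <= vaddr <= DMAP_MAX_ADDRESS:
        raise ValueError(f"{vaddr:#x} is outside the direct map")
    return vaddr - DMAP_MIN_ADDRESS


suspect = 0xFFFFF80000000007  # value reported in the bug
print(hex(phys_to_dmap(0)))        # a zero physical address maps to the base
print(hex(dmap_to_phys(suspect)))  # offset of the suspect value from the base
```

Under this model the suspect value sits 7 bytes above the direct-map base, which is what one would expect if a zero (or near-zero) physical address had been pushed through the translation.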

Philipp

> The context has amdgpu raven support in use normally. Reportedly the problem has
> never been seen with that disabled. (However, I'm not aware of experiments with
> alternate card types, for example.)
>
> Where, when, and if a boot crash occurs is variable, not stable. But use of the
> list found_modules->tqh_first->. . . tends to be involved.
>
>
>
> Some other modern 13.4-RELEASE related context notes
> ( comments #231 and #233 ):
>
> The person with the problem reports . . .
>
> I am not using a stock distribution of the kernel:
>
> diff -u sys/amd64/conf/{GENERIC,M5P}
> --- sys/amd64/conf/GENERIC 2024-07-03 16:23:56.252550000 -0400
> +++ sys/amd64/conf/M5P 2024-07-03 16:25:05.287604000 -0400
> @@ -18,12 +18,13 @@
> #
>
> cpu HAMMER
> -ident GENERIC
> +ident M5P
>
> makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
> makeoptions WITH_CTF=1 # Run ctfconvert(1) for DTrace support
>
> -options SCHED_ULE # ULE scheduler
> +#options SCHED_ULE # ULE scheduler
> +options SCHED_4BSD # 4BSD scheduler
> options NUMA # Non-Uniform Memory Architecture support
> options PREEMPTION # Enable kernel thread preemption
> options VIMAGE # Subsystem virtualization, e.g. VNET
>
>
> I also noted (for modern 13.4-RELEASE times):
>
> Also: the build is based on the -p2 source code (hash 3f40d5821):
>
> # strings boot/kernel/kernel | grep "\-RELEASE"
> @(#)FreeBSD 13.4-RELEASE-p2 3f40d5821 M5P
> FreeBSD 13.4-RELEASE-p2 3f40d5821 M5P
> 13.4-RELEASE-p2
>
> Because it is a rebuild, the kernel ends up with -p2 instead
> of the official -p1 ( from -p2 not updating boot/kernel/kernel
> in the official distributions ).
>
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
>
>

Mark Johnston

2024-12-21 18:47:40 UTC

On Sat, Dec 21, 2024 at 08:17:12PM +0200, Konstantin Belousov wrote:
> On Sat, Dec 21, 2024 at 05:03:45PM +0000, Bjoern A. Zeeb wrote:
> > Hi,
> >
> > upon boot we display our physical memory chunks nicely.
> >
> > Physical memory chunk(s):
> > 0x0000000000001000 - 0x000000000009ffff, 651264 bytes (159 pages)
> > 0x0000000000101000 - 0x0000000012df5fff, 315576320 bytes (77045 pages)
> > 0x0000000013c00000 - 0x0000000013d87fff, 1605632 bytes (392 pages)
> > 0x0000000016401000 - 0x000000001e920fff, 139591680 bytes (34080 pages)
> > 0x000000001eb1b000 - 0x000000001f73afff, 12713984 bytes (3104 pages)
> >
> >
> > Do we have any way on a running system to export some statistics of
> > how much each of them is used up? Something like [Use, Requests]?
> Look at vm.phys_segs. These are segments of physical memory as seen
> by the phys allocator (vm_phys.c). It is mostly the same chunks as were
> reported e.g. by UEFI parser at boot, but after the kernel initial memory
> and other stuff took some pages before vm was initialized.
>
> >
> >
> > Say one wanted to debug the good old lower 4GB contigmalloc failing
> > problem (an example, but also something I am just facing again).
> > How would one do that? The immediate questions are:
> > (a) how much of the available lower physical 4G chunks are used?
> > (b) how much fragmentation is there roughly?
> > (c) what's the largest contig. chunk avail?
> >
> Then look at vm.phys_free. First, there are free lists. On amd64, freelist
> 0 is the normal freelist (all pages except freelist 1), and freelist 1 is
> the pages below 4G (ie. DMA32). See sys/amd64/include/vmparam.h for
> definitions and comments.

This isn't always true: we only create a dedicated DMA32 freelist on
systems with >= VM_DMA32_NPAGES_THRESHOLD pages of physical memory,
which is 64GB by default. So on smaller machines one will only see two
freelists, default and ISA DMA, instead of three.
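
As a quick sanity check of the 64GB figure, this Python sketch converts between a page-count threshold and bytes. It assumes 4 KiB pages and derives the page count from the 64GB stated above rather than reading it from the kernel; see sys/amd64/include/vmparam.h for the actual definition. The function name is illustrative and the rule is modeled only as described in this message.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages on amd64
GIB = 1 << 30

# Assumed value: the page count that corresponds to 64GB with 4 KiB pages.
DMA32_NPAGES_THRESHOLD = 64 * GIB // PAGE_SIZE


def has_dma32_freelist(physmem_bytes: int) -> bool:
    """Model of the rule above: dedicated DMA32 freelist only at/above threshold."""
    return physmem_bytes // PAGE_SIZE >= DMA32_NPAGES_THRESHOLD


print(DMA32_NPAGES_THRESHOLD)         # threshold expressed in pages
print(has_dma32_freelist(16 * GIB))   # smaller machine
print(has_dma32_freelist(128 * GIB))  # larger machine
```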

I think the reason is that static allocations during boot (e.g., the VM
page array) can possibly deplete all physical RAM below 4GB, so the
explicit freelist helps reserve memory needed for DMA. There is some
overhead in searching multiple freelists when free pages are scarce, so
there is perhaps some performance penalty in creating the DMA32 freelist
unconditionally. Maybe the threshold should be overridable by a
tunable.

> The sysctl reports the number of free pages clustered by the contiguous
> order. Pools are mostly internal to vm, they allow to distinguish allocs
> from direct map (e.g. for UMA page allocs) vs. generic contig allocs.

The intent of the direct freepool, I believe, is to cluster allocations
of pages accessed solely via the direct map, i.e., small UMA allocs and
page table pages, in order to improve TLB efficiency.

> People actively working with allocators could correct me.
>
> > Given (c) is likely harder and expensive (a) and (b) could at least
> > give an idea. Did we ever solve that?
>
> vm.phys_free should give the answer up to the supported order size.
>
> That said, a radical solution for the problem of memory below 4G is
> to use IOMMU. It might be enabled just for specific PCI device.
>
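
When reading vm.phys_free it helps to remember that a free block of order k covers 2^k contiguous pages. The Python sketch below assumes 4 KiB pages and also re-derives the page counts of the first two boot-time chunks quoted above; the function names are illustrative, the range of orders is only an example, and the code does not parse the sysctl output.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def order_to_bytes(order: int) -> int:
    """Size of one free block of the given buddy-allocator order."""
    return PAGE_SIZE << order


def chunk_pages(start: int, end: int) -> int:
    """Number of pages in an inclusive physical range [start, end]."""
    return (end - start + 1) // PAGE_SIZE


# Orders 0..12 shown for illustration; the top order depends on the kernel.
for order in range(13):
    print(order, order_to_bytes(order) // 1024, "KiB")

# First two chunks from the boot messages quoted above.
print(chunk_pages(0x1000, 0x9FFFF))
print(chunk_pages(0x101000, 0x12DF5FFF))
```

The largest contiguous allocation that the free lists can show directly is therefore one block of the highest order that has a non-zero count.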

Mohammad Noureldin

2024-12-25 15:14:19 UTC

Hi Tomoaki,

First of all, thanks a lot for sharing your notes and insights, which help
enrich the discussion.

On Sun, Dec 22, 2024 at 7:25 PM Tomoaki AOKI <***@dec.sakura.ne.jp>
wrote:

> On Fri, 20 Dec 2024 17:04:20 +0100
> Mohammad Noureldin <***@thelightbird.com> wrote:
>

<snip>

>
> Hi.
> Thanks for the pointer.
>
> Some notes:
> *hselasky@ (RIP, Hans...) suffered from the lack of official and open
> hardware specs for implementing his driver.
>
> *hselasky@ wasn't going to implement TB3's security features
> as it seemed NOT to be sufficient for actual security in spec itself.
>
> *I have Titan Ridge TB3 controller (falls back to Intel Cannon Lake
> USB 3.1 controller when TB3 device is not connected) in my ThinkPad
> P52, but hselasky@'s last driver didn't work for it, as his driver
> only supports older generation of controller chipset (with the lack
> of information).
>
> *P52 recognizes TB3-only device (Samsung X5 external SSD I've
> purchased for test) as internal PCIe drive if it was attached
> before powering on the computer, but hot unplugging causes panic,
> with and without hselasky@'s driver installed.
>
> *My related info are in Bug 237666 [1].
>
> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666

Believe it or not, I was in touch with HPS (may he RIP) over this specific PR
given my general interest in USB, and I even purchased 3 second-hand
machines to help with testing and get to the root cause:
- Dell Precision Tower 5810
  - Which proved not to be useful for this case
- Lenovo ThinkPad P52s
- Lenovo ThinkPad T480
  - Because Bjoern Zeeb (bz@) could reproduce the originally reported issue on
    this machine, and he wrote down some notes about his findings in [1]

But around EuroBSDCon 2023, when I was planning to bring one or both
of these Lenovos with me to tinker on them with HPS, I came to know that he
had passed away before the conference's starting date ... RIP HPS.

I've added the *"Problem Reports to be Possibly Revisited"* section [2]
because, in addition to [3], I've noticed other similar PR(s) which we can
revisit after a working driver is in place to either:
- Indicate that this now works
- Or, if it is still not working and we won't support it, write this down in a
clear way for end users

Also to hopefully make the scope more clear, I've added a Scope sub-section
[4]

>
> --
> Tomoaki AOKI <***@dec.sakura.ne.jp>
>

Thanks again Tomoaki and happy holidays

[1] https://wiki.freebsd.org/BjoernZeeb/USB
[2] https://wiki.freebsd.org/MohammadNoureldin/FreeBSDUSB4TBT3Support#Problem_Reports_To_Be_Possibly_Revisited
[3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666
[4] https://wiki.freebsd.org/MohammadNoureldin/FreeBSDUSB4TBT3Support#Scope

--
Thanks
- Mohammad Noureldin
--
"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein

Paul Floyd

2024-12-26 14:43:54 UTC

On 26-12-24 13:57, Daniel Engberg wrote:
> You asked about this ~a year ago, not much has changed since last time.

I failed to overcome my inertia then, maybe this time.

> Is there anything new you're wondering about?

The main thing was having reasonably working graphics. Has that improved
at all since then?

> The only difference is that Zen 5 is a bit more efficient and have a few
> new AVX512 instructions compared to the older Zen 4 CPUs.

Having the latest instruction set would be a plus.

A+
Paul

Paul Floyd

2024-12-26 15:14:41 UTC

On 26-12-24 15:43, Paul Floyd wrote:
> On 26-12-24 13:57, Daniel Engberg wrote:

> The main thing was having reasonably working graphics. Has that improved
> at all since then?

To be a bit more specific, I'm currently thinking of a Ryzen 9 9900X and
an Asus ProArt X870E.

A+
Paul

Daniel Engberg

2024-12-26 16:30:32 UTC

On 2024-12-26T16:15:24.000+01:00, Paul Floyd <***@gmail.com>
wrote:

> On 26-12-24 15:43, Paul Floyd wrote:
>> On 26-12-24 13:57, Daniel Engberg wrote:
>
>> The main thing was having reasonably working graphics. Has that improved
>> at all since then?
>
> To be a bit more specific, I'm currently thinking of a Ryzen 9 9900X and
> an Asus ProArt X870E.
>
> A+
> Paul

Hi,

Good board; however, there's very little that justifies the difference
in price between the X670E and X870E variants (spend that on a cooler
instead). You can find a short summary here:
https://www.reddit.com/r/buildapc/comments/1fpk3cq/proart_x870ecreator_wifi_vs_proart_x670ecreator/
Graphics is integrated in the CPU, so the choice of motherboard doesn't
matter in that regard. Without looking too much, it's probably going to
be a bit rough using 6.1 as a base. (My box is headless.)

Best regards,

Daniel

Warner Losh

2024-12-27 13:59:53 UTC

On Fri, Dec 27, 2024, 1:53 AM Emmanuel Vadot <***@bidouilliste.com> wrote:

> On Fri, 27 Dec 2024 09:43:22 +0100
> Emmanuel Vadot <***@bidouilliste.com> wrote:
>
> >
> > Hi,
> >
> > On Thu, 26 Dec 2024 23:55:24 +0100
> > Fernando Apesteguía <***@gmail.com> wrote:
> >
> > > I updated to today's current from a version from Nov 18th.
> > >
> > > I had these lines in loader.conf:
> > >
> > > iwm7265Dfw_load="YES"
> > > if_iwm_load="YES"
> > >
> > > And with those the kernel panics. Then I saw the entry 20241216 in
> > > UPDATING. Can't reproduce it here since I'm writing in my phone, but it
> > > says the iwm firmwares are now shipped as raw files.
> > > So, how can I get the firmware loaded to make my nic work?
> > >
> > > Cheers
> >
> > Without panic trace it's hard to really know what's going on but I
> > think that the problem is that the firmware wasn't loaded by loader and
> > iwm panics since root fs isn't there yet.
> > Just removing those lines will make iwm works again or you can use :
> > iwm7265Dfw_load="YES"
> > iwm7265Dfw_name="/boot/firmware/iwm7265Dfw"
> > iwm7265Dfw_type="firmware"
> >
> > Cheers,
> >
> > --
> > Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
> >
>
> https://reviews.freebsd.org/D48211 will help too.
>
> P.S.: Note that I don't understand why anyone wants to load wifi
> driver in loader, was it suggested somewhere at some point ?
>

Many people want to run minimal + their drivers and load them all from the
loader. With the firmware shift, we may need to defer drivers that need
firmware until after mountroot generally.

Warner

--
> Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
>
>

Fernando Apesteguía

2024-12-27 16:22:16 UTC

On Fri, Dec 27, 2024 at 3:00 PM Warner Losh <***@bsdimp.com> wrote:

>
>
> On Fri, Dec 27, 2024, 1:53 AM Emmanuel Vadot <***@bidouilliste.com>
> wrote:
>
>> On Fri, 27 Dec 2024 09:43:22 +0100
>> Emmanuel Vadot <***@bidouilliste.com> wrote:
>>
>> >
>> > Hi,
>> >
>> > On Thu, 26 Dec 2024 23:55:24 +0100
>> > Fernando Apesteguía <***@gmail.com> wrote:
>> >
>> > > I updated to today's current from a version from Nov 18th.
>> > >
>> > > I had these lines in loader.conf:
>> > >
>> > > iwm7265Dfw_load="YES"
>> > > if_iwm_load="YES"
>> > >
>> > > And with those the kernel panics. Then I saw the entry 20241216 in
>> > > UPDATING. Can't reproduce it here since I'm writing in my phone, but
>> it
>> > > says the iwm firmwares are now shipped as raw files.
>> > > So, how can I get the firmware loaded to make my nic work?
>> > >
>> > > Cheers
>> >
>> > Without panic trace it's hard to really know what's going on but I
>> > think that the problem is that the firmware wasn't loaded by loader and
>> > iwm panics since root fs isn't there yet.
>> > Just removing those lines will make iwm works again or you can use :
>> > iwm7265Dfw_load="YES"
>> > iwm7265Dfw_name="/boot/firmware/iwm7265Dfw"
>> > iwm7265Dfw_type="firmware"
>>
>
I just removed the lines from /boot/loader.conf and the NIC is up again,
thanks.
I can see these messages in dmesg though:

$ dmesg | grep iwm
iwm0: <Intel(R) Dual Band Wireless AC 7265> mem 0xdf200000-0xdf201fff irq
16 at device 0.0 on pci2
iwm0: hw rev 0x210, fw ver 22.361476.0, address 10:02:b5:5c:f3:d7
iwm0: <Intel(R) Dual Band Wireless AC 7265> mem 0xdf200000-0xdf201fff irq
16 at device 0.0 on pci2
iwm7265Dfw: could not load firmware image, error 2
iwm0: hw rev 0x210, fw ver 22.361476.0, address 10:02:b5:5c:f3:d7

> >
>> > Cheers,
>> >
>> > --
>> > Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
>> >
>>
>> https://reviews.freebsd.org/D48211 will help too.
>>
>> P.S.: Note that I don't understand why anyone wants to load wifi
>> driver in loader, was it suggested somewhere at some point ?
>>
>
> Many people want to run minimal + their drivers and load them all from the
> loader. With the firmware shift, we may need to defer drivers that need
> firmware until after mountroot generally.
>

In my case it's always been in loader.conf. I don't remember when I put it
there.

Would suggesting the removal of the module load lines for iwm(4) be worth
mentioning in UPDATING?

Cheers

>
> Warner
>
> --
>> Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
>>
>>

Fernando Apesteguía

2024-12-27 16:25:15 UTC

On Fri, Dec 27, 2024 at 5:22 PM Fernando Apesteguía <
***@gmail.com> wrote:

>
>
> On Fri, Dec 27, 2024 at 3:00 PM Warner Losh <***@bsdimp.com> wrote:
>
>>
>>
>> On Fri, Dec 27, 2024, 1:53 AM Emmanuel Vadot <***@bidouilliste.com>
>> wrote:
>>
>>> On Fri, 27 Dec 2024 09:43:22 +0100
>>> Emmanuel Vadot <***@bidouilliste.com> wrote:
>>>
>>> >
>>> > Hi,
>>> >
>>> > On Thu, 26 Dec 2024 23:55:24 +0100
>>> > Fernando Apesteguía <***@gmail.com> wrote:
>>> >
>>> > > I updated to today's current from a version from Nov 18th.
>>> > >
>>> > > I had these lines in loader.conf:
>>> > >
>>> > > iwm7265Dfw_load="YES"
>>> > > if_iwm_load="YES"
>>> > >
>>> > > And with those the kernel panics. Then I saw the entry 20241216 in
>>> > > UPDATING. Can't reproduce it here since I'm writing in my phone, but
>>> it
>>> > > says the iwm firmwares are now shipped as raw files.
>>> > > So, how can I get the firmware loaded to make my nic work?
>>> > >
>>> > > Cheers
>>> >
>>> > Without panic trace it's hard to really know what's going on but I
>>> > think that the problem is that the firmware wasn't loaded by loader and
>>> > iwm panics since root fs isn't there yet.
>>>
>>
The stack trace:
https://people.freebsd.org/~fernape/iwm.jpg

> > Just removing those lines will make iwm works again or you can use :
>>> > iwm7265Dfw_load="YES"
>>> > iwm7265Dfw_name="/boot/firmware/iwm7265Dfw"
>>> > iwm7265Dfw_type="firmware"
>>>
>>
> I just removed the lines from /boot/loader.conf and the NIC is up again,
> thanks.
> I can see these messages in dmesg though:
>
> $ dmesg | grep iwm
> iwm0: <Intel(R) Dual Band Wireless AC 7265> mem 0xdf200000-0xdf201fff irq
> 16 at device 0.0 on pci2
> iwm0: hw rev 0x210, fw ver 22.361476.0, address 10:02:b5:5c:f3:d7
> iwm0: <Intel(R) Dual Band Wireless AC 7265> mem 0xdf200000-0xdf201fff irq
> 16 at device 0.0 on pci2
> iwm7265Dfw: could not load firmware image, error 2
> iwm0: hw rev 0x210, fw ver 22.361476.0, address 10:02:b5:5c:f3:d7
>
>
>> >
>>> > Cheers,
>>> >
>>> > --
>>> > Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
>>> >
>>>
>>> https://reviews.freebsd.org/D48211 will help too.
>>>
>>> P.S.: Note that I don't understand why anyone wants to load wifi
>>> driver in loader, was it suggested somewhere at some point ?
>>>
>>
>> Many people want to run minimal + their drivers and load them all from
>> the loader. With the firmware shift, we may need to defer drivers that need
>> firmware until after mountroot generally.
>>
>
> In my case it's always been in loader.conf. I don't remember when I put it
> there.
>
> Would suggesting removing the load of the modules for iwm(4) worth
> mentioning in UPDATING?
>
> Cheers
>
>
>>
>> Warner
>>
>> --
>>> Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
>>>
>>>

Warner Losh

2024-12-27 16:27:11 UTC

On Fri, Dec 27, 2024, 9:22 AM Fernando Apesteguía <
***@gmail.com> wrote:

>
>
> On Fri, Dec 27, 2024 at 3:00 PM Warner Losh <***@bsdimp.com> wrote:
>
>>
>>
>> On Fri, Dec 27, 2024, 1:53 AM Emmanuel Vadot <***@bidouilliste.com>
>> wrote:
>>
>>> On Fri, 27 Dec 2024 09:43:22 +0100
>>> Emmanuel Vadot <***@bidouilliste.com> wrote:
>>>
>>> >
>>> > Hi,
>>> >
>>> > On Thu, 26 Dec 2024 23:55:24 +0100
>>> > Fernando Apesteguía <***@gmail.com> wrote:
>>> >
>>> > > I updated to today's current from a version from Nov 18th.
>>> > >
>>> > > I had these lines in loader.conf:
>>> > >
>>> > > iwm7265Dfw_load="YES"
>>> > > if_iwm_load="YES"
>>> > >
>>> > > And with those the kernel panics. Then I saw the entry 20241216 in
>>> > > UPDATING. Can't reproduce it here since I'm writing in my phone, but
>>> it
>>> > > says the iwm firmwares are now shipped as raw files.
>>> > > So, how can I get the firmware loaded to make my nic work?
>>> > >
>>> > > Cheers
>>> >
>>> > Without panic trace it's hard to really know what's going on but I
>>> > think that the problem is that the firmware wasn't loaded by loader and
>>> > iwm panics since root fs isn't there yet.
>>> > Just removing those lines will make iwm works again or you can use :
>>> > iwm7265Dfw_load="YES"
>>> > iwm7265Dfw_name="/boot/firmware/iwm7265Dfw"
>>> > iwm7265Dfw_type="firmware"
>>>
>>
> I just removed the lines from /boot/loader.conf and the NIC is up again,
> thanks.
> I can see these messages in dmesg though:
>
> $ dmesg | grep iwm
> iwm0: <Intel(R) Dual Band Wireless AC 7265> mem 0xdf200000-0xdf201fff irq
> 16 at device 0.0 on pci2
> iwm0: hw rev 0x210, fw ver 22.361476.0, address 10:02:b5:5c:f3:d7
> iwm0: <Intel(R) Dual Band Wireless AC 7265> mem 0xdf200000-0xdf201fff irq
> 16 at device 0.0 on pci2
> iwm7265Dfw: could not load firmware image, error 2
> iwm0: hw rev 0x210, fw ver 22.361476.0, address 10:02:b5:5c:f3:d7
>
>
>> >
>>> > Cheers,
>>> >
>>> > --
>>> > Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
>>> >
>>>
>>> https://reviews.freebsd.org/D48211 will help too.
>>>
>>> P.S.: Note that I don't understand why anyone wants to load wifi
>>> driver in loader, was it suggested somewhere at some point ?
>>>
>>
>> Many people want to run minimal + their drivers and load them all from
>> the loader. With the firmware shift, we may need to defer drivers that need
>> firmware until after mountroot generally.
>>
>
> In my case it's always been in loader.conf. I don't remember when I put it
> there.
>
> Would suggesting removing the load of the modules for iwm(4) worth
> mentioning in UPDATING?
>

Likely mention it as a workaround for this bug... loading from the loader
should be supported, but our firmware interface needs to defer loading and
such until after mountroot. Sure, a lot can be done with devmatch, but that
doesn't mean we have to drop support for loading from the loader...
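
For anyone wanting to check whether a machine is affected by the workaround discussed here, the following Python sketch scans loader.conf-style text for iwm-related `*_load="YES"` entries. It is only an illustration: the helper name, the regular expression, and the `zfs_load` line in the sample are assumptions, and it does not replicate how the loader itself parses the file.

```python
import re

# Matches loader.conf entries such as: if_iwm_load="YES"
LOAD_RE = re.compile(
    r'^\s*(?P<module>\w+)_load\s*=\s*"?YES"?\s*(#.*)?$', re.IGNORECASE
)


def iwm_load_entries(loader_conf_text: str) -> list[str]:
    """Return module names loaded from the loader that look iwm-related."""
    found = []
    for line in loader_conf_text.splitlines():
        match = LOAD_RE.match(line)
        if match and "iwm" in match.group("module"):
            found.append(match.group("module"))
    return found


# Sample text: the two iwm lines come from the report; zfs_load is illustrative.
sample = '''
zfs_load="YES"
iwm7265Dfw_load="YES"
if_iwm_load="YES"
'''
print(iwm_load_entries(sample))
```

Lines that are commented out are ignored, so a file where the two entries have been disabled with `#` yields an empty list.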

Warner

> Cheers
>
>
>>
>> Warner
>>
>> --
>>> Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
>>>
>>>

Warner Losh

2024-12-27 17:51:32 UTC

On Fri, Dec 27, 2024, 10:41 AM Tomoaki AOKI <***@dec.sakura.ne.jp>
wrote:

> On Fri, 27 Dec 2024 06:59:53 -0700
> Warner Losh <***@bsdimp.com> wrote:
>
> > On Fri, Dec 27, 2024, 1:53 AM Emmanuel Vadot <***@bidouilliste.com>
> wrote:
> >
> > > On Fri, 27 Dec 2024 09:43:22 +0100
> > > Emmanuel Vadot <***@bidouilliste.com> wrote:
> > >
> > > >
> > > > Hi,
> > > >
> > > > On Thu, 26 Dec 2024 23:55:24 +0100
> > > > Fernando Apesteguía <***@gmail.com> wrote:
> > > >
> > > > > I updated to today's current from a version from Nov 18th.
> > > > >
> > > > > I had these lines in loader.conf:
> > > > >
> > > > > iwm7265Dfw_load="YES"
> > > > > if_iwm_load="YES"
> > > > >
> > > > > And with those the kernel panics. Then I saw the entry 20241216 in
> > > > > UPDATING. Can't reproduce it here since I'm writing in my phone,
> but it
> > > > > says the iwm firmwares are now shipped as raw files.
> > > > > So, how can I get the firmware loaded to make my nic work?
> > > > >
> > > > > Cheers
> > > >
> > > > Without panic trace it's hard to really know what's going on but I
> > > > think that the problem is that the firmware wasn't loaded by loader
> and
> > > > iwm panics since root fs isn't there yet.
> > > > Just removing those lines will make iwm works again or you can use :
> > > > iwm7265Dfw_load="YES"
> > > > iwm7265Dfw_name="/boot/firmware/iwm7265Dfw"
> > > > iwm7265Dfw_type="firmware"
> > > >
> > > > Cheers,
> > > >
> > > > --
> > > > Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
> > > >
> > >
> > > https://reviews.freebsd.org/D48211 will help too.
> > >
> > > P.S.: Note that I don't understand why anyone wants to load wifi
> > > driver in loader, was it suggested somewhere at some point ?
> > >
>
> What I can only imagine is that the computer is network-booted and
> doesn't have wired network, thus, need WiFi driver and its firmware
> to mount actual remote root fs (rootfs provided by, i.e, pxe is quite
> minimalistic and need remounting / with actual one).
> But I don't want to configure in such a way.
>

That is one scenario...

> Many people want to run minimal + their drivers and load them all from the
> > loader. With the firmware shift, we may need to defer drivers that need
> > firmware until after mountroot generally.
>
> I'm recommending (mostly on forums.freebsd.org) keeping
> /boot/loader.conf as lean as possible (let modules only essential
> to boot kernel and/or mountroot in it) and let other modules to be
> loaded via /etc/rc.conf[.local].
>
> This is because tooooooooooo many panics [on computers and/or users]
> trying to load everything including huge monsters like GPU drivers,
> graphics/drm-*-kmod, x11/nvidia-driver* and graphics/nvidia-drm-*-kmod
> which can easily make staging area overflow especially with zfs.ko.
>

I thought we'd fixed all the overflow issues... I'm generally planning a
more aggressive loading of modules as work progresses on loader devmatch.

> And another idea for loader only firmwares (in cases loading via
> /etc/rc.conf[.local] is too late), introducing new variable for loader
> such as module_fs could help.
> It shall be any file system that loader can read and specified not
> only with partition/pool but also with directory relative to the fs
> root that contains firmware/ directory.
>
> For example, if
> Root on ZFS in local disk with pool name zroot,
> module filesystem dataset is MODULE/default,
> in MODULE/default, firmwares are in boot/firmware,
> module_fs is set to "zfs:zroot/MODULE/default/boot/", where "firmware/"
> directory exists.
>
> Treating ESP specially would ease UEFI boot, if possible.
> If ESP can be specified with "ESP:" and firmware/ directory is in
> EFI/freebsd/firmware/, module_fs could be "ESP:EFI/freebsd/".
>

I'm having trouble understanding what this solves beyond what adding boot/firmware
to the boot path (and changing the default type based on path) already gives you.

ESP can be ambiguous in some cases. loaddev isn't, though. It might be
more helpful to allow a limited set of env vars to be expanded in both the
load path and files to load. Lua lets us have the right kind of filtering
if we want it.

Warner

> Not sure if there really is such a need or not.
> And could be difficult for BIOS boots (does size of loader allow?).
> But historically, I haven't heard of such an actual need in the pre-UEFI
> era. So I assume BIOS boots don't need such extensions.
>

The BIOS loader gets no new features after the stable/14 feature set.

>
> >
> > Warner
> >
> > --
> > > Emmanuel Vadot <***@bidouilliste.com> <***@freebsd.org>
>
>
> --
> Tomoaki AOKI <***@dec.sakura.ne.jp>
>

Chris Torek

2025-01-04 16:27:06 UTC

On Sat, Jan 4, 2025 at 7:01 AM Peter 'PMc' Much
<***@citylink.dinoex.sub.org> wrote:
>> I'm swapping to a zfs mirror
>
> Well, You shouldn't do that.

Why not? Swapping to a *file* on zfs has obvious issues, but swapping
to a mirrored swap partition seems like it should be entirely safe. A
bit slow (double writes) but I spent $ on RAM rather than M.2 drives
on the theory that I can add those later as needed.

Chris

Joe Schaefer

2025-01-04 17:27:36 UTC

Intel and AMD cores are now heterogeneous compute engines. The idea that it
is useful to beat the crap out of every core simultaneously for an extended
period of time is a really dumb idea for capacity testing now.

On Sat, Jan 4, 2025 at 11:27 AM Chris Torek <***@gmail.com> wrote:

> On Sat, Jan 4, 2025 at 7:01 AM Peter 'PMc' Much
> <***@citylink.dinoex.sub.org> wrote:
> >> I'm swapping to a zfs mirror
> >
> > Well, You shouldn't do that.
>
> Why not? Swapping to a *file* on zfs has obvious issues, but swapping
> to a mirrored swap partition seems like it should be entirely safe. A
> bit slow (double writes) but I spent $ on RAM rather than M.2 drives
> on the theory that I can add those later as needed.
>
> Chris
>
>

Peter 'PMc' Much

2025-01-04 17:53:02 UTC

On Sat, Jan 04, 2025 at 12:27:36PM -0500, Joe Schaefer wrote:
! Intel and AMD cores are now heterogeneous compute engines. The idea that it
! is useful to beat the crap out of every core simultaneously for an extended
! period of time is a really dumb idea for capacity testing now.

Hm. Not sure what exactly You mean. Also not sure /since when/ this
would apply, or to /which series/ of these cores. But I would think they
are indeed for beating the crap out of every core simultaneously, over
days. And in my perception the main issue with this is where to move
the heat.
But then also, I am using Xeon EP (and rather old ones, for that) -
and if these are not designed to do exactly that, then I really don't
know...

BTW, I am missing the 2025Q1 - they tend to get later and later. And
that is an issue here, because rebuilding yields me some 15 kWh
thermal in my hall, and I need some planning to cater for these.

cheerio,
PMc

Peter 'PMc' Much

2025-01-04 17:29:31 UTC

On Sat, Jan 04, 2025 at 08:27:06AM -0800, Chris Torek wrote:
! On Sat, Jan 4, 2025 at 7:01 AM Peter 'PMc' Much
! <***@citylink.dinoex.sub.org> wrote:
! >> I'm swapping to a zfs mirror
! >
! > Well, You shouldn't do that.
!
! Why not? Swapping to a *file* on zfs has obvious issues, but swapping
! to a mirrored swap partition seems like it should be entirely safe. A

A "mirrored swap partition" - that would be a zfs volume inside a zfs
pool which runs on some vdevs which happen to be mirrored, right?
I don't know of zfs itself having any notion of "partitions". It
supports volumes, and these have almost all the same features as
filesystems: checksumming, compression, txg buffering, logging,
snapshotting, etc.

So I tend to doubt such being safe. I can't give You logical proof
(it's more than ten years since I looked deeper into the zfs source),
but my gut feeling says there are so many creepy things going on
in the zfs layer nowadays (and very likely a bunch of undiscovered
bugs also), that one should avoid such a stack.
Also, the idea of paging into zfs got popular about the same time when
it got popular to normally not use swap at all, as lots of memory got
available. And while running a system with serious paging (into tens
of GB) is practical, it is probably not the use case where we would
page into zfs.

A zfs vdev is logically just a fixed-length file - aka a raw partition.
Then above that thing is the zfs logic, with lots of caches. There
is not only the ARC where data must go thru, there is other dbuf
handling, there is more handling on the vdev layer, and all of that
needs some memory. (I looked into these various buffers when I patched
things so zfs gets a bit more NUMA-friendly - many of them use the
UMA allocator scheme, which again has its own mechanics.)
Then above all this memory consuming stuff comes finally the kernel
that wants to pageout, and would expect the pageout going directly
onto a fixed-length file, aka a raw partition.
That doesn't look very sane to me, so what I am saying is: before
you spend time hunting this bug, give it a try with direct
raw-partition paging. At least then we know if it happens there also,
or not - and that helps narrowing the search.

! bit slow (double writes) but I spent $ on RAM rather than M.2 drives
! on the theory that I can add those later as needed.

It doesn't need superfast SSD, at least not for testing. Pageout
happens async, and while pagein stalls the concerned process, it is
read, and read should be faster.

cheerio,
PMc

Karl Denninger

2025-01-04 17:36:20 UTC

On 1/4/2025 12:30, Miroslav Lachman wrote:
> On 04/01/2025 17:27, Chris Torek wrote:
>> On Sat, Jan 4, 2025 at 7:01 AM Peter 'PMc' Much
>> <***@citylink.dinoex.sub.org> wrote:
>>>> I'm swapping to a zfs mirror
>>>
>>> Well, You shouldn't do that.
>>
>> Why not? Swapping to a *file* on zfs has obvious issues, but swapping
>> to a mirrored swap partition seems like it should be entirely safe. A
>> bit slow (double writes) but I spent $ on RAM rather than M.2 drives
>> on the theory that I can add those later as needed.
>
> Do you have swap on ZFS or not (in the first post you said "swapping
> to a zfs mirror")? If yes then it can cause the problems in memory
> pressure because system has no free memory but ZFS needs memory.
> If you want the swap on a mirror, then use separate partition on 2
> disks and add gmirror on top of them.
>
> Kind regards
> Miroslav Lachman
>
What's the argument for swapping to a mirror in the first place? If the
issue is throughput IMHO the answer is to swap to multiple devices.

Putting swap through a filesystem abstraction as noted above runs the
risk of requiring an allocation of RAM for the operation to complete
when you're desperately short on it in the first place resulting in
serious trouble.

--
Karl Denninger
***@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

Craig Leres

2025-01-04 18:08:33 UTC

On 1/4/25 09:36, Karl Denninger wrote:
> What's the argument for swapping to a mirror in the first place? If the
> issue is throughput IMHO the answer is to swap to multiple devices.

I thought the idea here is if your swap is not redundant and it goes
away the system crashes (or otherwise misbehaves)?

I use a beefy poudriere build server (48 cores, 192G, 6 x 480GB nvme)
and I build > 1000 packages daily. I see lots of build failures (usually
2-3 a week) but instead of core dumps I see "Killed" in the poudriere
logs. Appended is one from today. (Not sure what's up with the symlink
failure at the very end...)

Craig

=>> Building www/mod_wsgi4
build started at Wed Jan 1 01:53:31 PST 2025
port directory: /usr/ports/www/mod_wsgi4
package name: ap24-py311-mod_wsgi-4.9.2
building for: FreeBSD zinc.ee.lbl.gov 14.2-RELEASE FreeBSD 14.2-RELEASE
amd64
maintained by: ***@douglasthrift.net
Makefile datestamp: -rw-r--r-- 1 root wheel 1139 Jun 28 2023
/usr/ports/www/mod_wsgi4/Makefile
Ports top last git commit: eda5eaff66d
Ports top unclean checkout: yes
Port dir last git commit: 3d9a815d9c5
Port dir unclean checkout: no
Poudriere version: poudriere-git-3.4.2
Host OSVERSION: 1402000
Jail OSVERSION: 1402000
Job Id: 39
[...]
===> License APACHE20 accepted by the user
Killed
ln:
/usr/local/poudriere/data/logs/bulk/14release-current-patched/2025-01-01_00h00m12s/logs/errors/ap24-py311-mod_wsgi-4.9.2.log:
File exists
build of www/***@py311 | ap24-py311-mod_wsgi-4.9.2 ended at Wed
Jan 1 01:54:28 PST 2025
!!! build failure encountered !!!


Craig Leres

2025-01-04 21:11:05 UTC

On 1/4/25 10:40, Peter 'PMc' Much wrote:
> There might be a corresponding message in the system log for the
> "Killed", that would at least tell us the signal, maybe more.

I wish. One minute before I see:

Jan 4 01:52:10 zinc.ee.lbl.gov kernel: : pid 21007 (conftest), jid
3845, uid 0: exited on signal 11 (core dumped)

but I see a lot of those and I believe they're normal and autoconf related.

On 1/4/25 10:47, Miroslav Lachman wrote:
> Redundancy in case of disk failure. I've seen many disk failures over
> the years, and when you have a swap concatenated from partitions on 2
> disks, the whole system mirrored on 2 disks, and 1 disk fails, the
> system crashes. That's why we've always used swap over gmirrored
> partitions. Then we can pull out one disk when system is running and
> replace it with a new disk.

This. And in my case the system is located in a data center that I do
not have casual access to so I'm happy to make the tradeoff in favor of
system resilience.

Craig


Karl Denninger

2025-01-04 21:49:26 UTC

On 1/4/2025 16:11, Craig Leres wrote:
> On 1/4/25 10:40, Peter 'PMc' Much wrote:
>> There might be a corresponding message in the system log for the
>> "Killed", that would at least tell us the signal, maybe more.
>
> I wish. One minute before I see:
>
>    Jan 4 01:52:10 zinc.ee.lbl.gov kernel: : pid 21007 (conftest),
> jid 3845, uid 0: exited on signal 11 (core dumped)
>
> but I see a lot of those and I believe they're normal and autoconf
> related.
>
> On 1/4/25 10:47, Miroslav Lachman wrote:
> > Redundancy in case of disk failure. I've seen many disk failures over
> > the years, and when you have a swap concatenated from partitions on 2
> > disks, the whole system mirrored on 2 disks, and 1 disk fails, the
> > system crashes. That's why we've always used swap over gmirrored
> > partitions. Then we can pull out one disk when system is running and
> > replace it with a new disk.
>
> This. And in my case the system is located in a data center that I do
> not have casual access to so I'm happy to make the tradeoff in favor
> of system resilience.
>
>        Craig
>
That's a reasonable justification for mirroring the swap (despite the
fact that you are then doing two I/Os on a write to it instead of one,
albeit on different devices) but then I'd still get it out of ZFS'
domain and run the swap to a bare GPT partition under gmirror instead
simply because ZFS can demand a RAM allocation to do things and swap
space is by definition something used under significant memory pressure
which implies you can need it within ZFS and not be able to get it.

--
Karl Denninger
***@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

Craig Leres

2025-01-04 22:03:53 UTC

On 1/4/25 13:49, Karl Denninger wrote:
> That's a reasonable justification for mirroring the swap (despite the
> fact that you are then doing two I/Os on a write to it instead of one,
> albeit on different devices) but then I'd still get it out of ZFS'
> domain and run the swap to a bare GPT partition under gmirror instead
> simply because ZFS can demand a RAM allocation to do things and swap
> space is by definition something used under significant memory pressure
> which implies you can need it within ZFS and not be able to get it.

Chris said he was "swapping to a zfs mirror" but I'm swapping to a gmirror.

Craig


Chris Torek

2025-01-05 05:33:04 UTC

On Sat, Jan 4, 2025 at 2:04 PM Craig Leres <***@freebsd.org> wrote:
> Chris said he was "swapping to a zfs mirror" but I'm swapping to a gmirror.

I thought I was, but now I'm not sure any more :-) (I set this up many
months ago). Poking around, it looks like I have a three way geom
mirror (why 3? I have no idea, well, some vague idea, but, well...).

I have `/dev/mirror/swap` as the swapon device in `/etc/fstab`, and
`gmirror list` says:

Geom name: swap
State: COMPLETE
Components: 3
Balance: load
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 777312945
Type: AUTOMATIC
Providers:
1. Name: mirror/swap
Mediasize: 137438952960 (128G)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e0
Consumers:
1. Name: ada2p3
Mediasize: 137438953472 (128G)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e1
State: ACTIVE
Priority: 2
Flags: NONE
GenID: 0
SyncID: 1
ID: 466179938
[repeat for ada1p3 and ada0p3 but with priorities counting down to 0
and with different IDs.]

So I guess I'm swapping to a geom mirror. I suspect it has three
drives in it because I was originally thinking I had to swap to a zfs
vdev mirror.

You can stop here, but if not: MORE DETAILS

I originally put four drives in the machine (3 8TB + 1 old 2 TB), and
only three were working (ada0 through 2), so I set up a three-way zfs
file system on it. The fourth one worked on Linux but not on FreeBSD.
Eventually I moved it to a different channel and it started working
fine and I still don't know what that's all about, though there were
several kernel updates along the way (patch releases of FreeBSD 14) as
well.

At some point I discovered you can't crash-dump to a mirror, so I set
up a 64 GB partition on ada3 and made it the dump device. It might not
be working, I haven't forcibly tested it yet. Worse, `gpart show -l`
shows it as `swap2`, a duplicate of ada2's `swap2` except half the
size (??). It shows up fine as /dev/label/crash though.

`geom swap list -a` says:

Geom name: swap
Consumers:
1. Name: mirror/swap
Mediasize: 137438952960 (128G)
Sectorsize: 512
Stripesize: 4096
Stripeoffset: 0
Mode: r1w1e0

Stripping `geom -t` output a bit (egrep disk\|swap):

ada3p2 LABEL gpt/swap2
gpt/swap2 DEV
swap MIRROR mirror/swap
mirror/swap DEV
swap SWAP
swap MIRROR mirror/swap
mirror/swap DEV
swap SWAP
swap MIRROR mirror/swap
mirror/swap DEV
swap SWAP
swap.sync MIRROR

Here's `gpart show -p | grep swap`:

2048 134217728 ada3p2 freebsd-swap (64G)
534528 268435456 ada2p3 freebsd-swap (128G)
534528 268435456 ada1p3 freebsd-swap (128G)
534528 268435456 ada0p3 freebsd-swap (128G)

and `gpart show -l | grep swap`:

2048 134217728 2 swap2 (64G)
534528 268435456 3 swap2 (128G)
534528 268435456 3 swap1 (128G)
534528 268435456 3 swap0 (128G)

I went straight from an original FreeBSD 9 setup with an ancient MBR
(persisting unchanged through several upgrades) to FreeBSD 14 (now 15)
at home, so I get a bit lost between gpart, geom, and zfs/zpool
differences at times.

Chris


Manas Ghandat

2025-01-05 08:55:45 UTC

Hi,

I was interested in the syzkaller (
https://wiki.freebsd.org/SummerOfCodeIdeas#syzkaller_improvements) project.
I wanted to know if there is stuff yet to be implemented so that I can add
to it.

Thanks,
Manas

On Sun, Jan 5, 2025 at 11:17 AM Ð¯ÑÐŸÑÐ»Ð°Ð² ÐÐ°ÑÐºÐŸ <***@gmail.com> wrote:

> hi, Manas,
>
> if there has been a project, on GSoC, and you plan working on that ideas,
> then what are those ideas you talk about?
>
> all wishes
>
> Sat, 4 Jan 2025, 21:54 Manas Ghandat <***@gmail.com>:
>
>> Hi,
>>
>> I am Manas and I am interested in working on the syzkaller project ideas
>> mentioned at
>> https://wiki.freebsd.org/SummerOfCodeIdeas#syzkaller_improvements
>>
>> There has been a GSoC project in 2021 regarding the same. I wanted to
>> know if this idea is implemented or if some parts are yet to be implemented
>>
>> Thanks,
>> Manas
>>
>

George Mitchell

2025-01-06 14:18:26 UTC

On 1/6/25 09:00, Gleb Popov wrote:
> On Mon, Jan 6, 2025 at 4:05 PM George Mitchell <george+***@m5p.com> wrote:
>>[...]
>> Now in fact I have net/asio installed, and /usr/local/include/asio.hpp
>> exists. But attempting to build abiword stops and complains:
>> Does not build with asio from system
>> As far as I can tell, no such thing as "asio from system" even exists.
>
> The "system" term here means "installed into system-wide prefix" or in
> other words "coming from pkg install".
> The antonym to "system" in this context is "bundled". I presume that
> Abiword has an option to build with asio that comes together with
> Abiword itself.
>
So ... do I understand that the way to solve this problem is,
counterintuitively, to UNINSTALL asio?? According to the pkg database,
asio was only installed because abiword depends on it!! I'll try it,
though.

Guess what happened. There was no "BROKEN" complaint to begin with,
but the first thing that happened was that building editors/abiword
caused net/asio to be built and reinstalled. At that point, the build
returned to editors/abiword, and now it said:

abiword-3.0.5_11 is marked as broken: Does not build with asio from
system Does not build with asio from system.
*** Error code 1

So something is wrong with the Makefile as it stands, and I have no
clue on how to fix it. Thanks for your attention. -- George

Gleb Popov

2025-01-06 14:38:49 UTC

On Mon, Jan 6, 2025 at 5:18 PM George Mitchell <george+***@m5p.com> wrote:
>
> On 1/6/25 09:00, Gleb Popov wrote:
> > On Mon, Jan 6, 2025 at 4:05 PM George Mitchell <george+***@m5p.com> wrote:
> >>[...]
> >> Now in fact I have net/asio installed, and /usr/local/include/asio.hpp
> >> exists. But attempting to build abiword stops and complains:
> >> Does not build with asio from system
> >> As far as I can tell, no such thing as "asio from system" even exists.
> >
> > The "system" term here means "installed into system-wide prefix" or in
> > other words "coming from pkg install".
> > The antonym to "system" in this context is "bundled". I presume that
> > Abiword has an option to build with asio that comes together with
> > Abiword itself.
> >
> So ... do I understand that the way to solve this problem is,
> counterintuitively, to UNINSTALL asio?? According to the pkg database,
> asio was only installed because abiword depends on it!! I'll try it,
> though.

The

COLSERVICE_BUILD_DEPENDS= ${LOCALBASE}/include/asio.hpp:net/asio

makes Abiword depend on asio if the COLSERVICE option is turned on.

> Guess what happened. There was no "BROKEN" complaint to begin with,
> but the first thing that happened was that building editors/abiword
> caused net/asio to be built and reinstalled. At that point, the build
> returned to editors/abiword, and now it said:
>
> abiword-3.0.5_11 is marked as broken: Does not build with asio from
> system Does not build with asio from system.
> *** Error code 1
>
> So something is wrong with the Makefile as it stands, and I have no
> clue on how to fix it. Thanks for your attention. -- George

In the same vein, the

COLSERVICE_BROKEN= Does not build with asio from system

works just like plain BROKEN= knob, but only when COLSERVICE option is enabled.

You need to turn this option off (or just remove COLSERVICE_BROKEN and
see what exactly breaks in this case).


George Mitchell

2025-01-06 15:13:21 UTC

On 1/6/25 09:38, Gleb Popov wrote:
> [...]
> In the same vein, the
>
> COLSERVICE_BROKEN= Does not build with asio from system
>
> works just like plain BROKEN= knob, but only when COLSERVICE option is enabled.
>
> You need to turn this option off (or just remove COLSERVICE_BROKEN and
> see what exactly breaks in this case).
>

Ah! It is so obvious in the light of understanding. I thank you for
explaining it to me! I don't remember having set any collaboration
options when I first installed the package, and I'm surprised that my
options are any different from the defaults. But it was a long time
back that I first installed the package.

I greatly appreciate your gracious assistance. -- George

Warner Losh

2025-01-09 00:03:23 UTC

On Wed, Jan 8, 2025 at 3:20 PM Mark Johnston <***@freebsd.org> wrote:

> On Wed, Jan 08, 2025 at 02:51:31PM -0700, Warner Losh wrote:
> > On Wed, Jan 8, 2025 at 2:31 PM Mark Johnston <***@freebsd.org> wrote:
> >
> > > The global "ticks" variable counts hardclock ticks, it's widely used in
> > > the kernel for low-precision timekeeping. The linuxkpi provides a very
> > > similar variable, "jiffies", but there's an incompatibility: the former
> > > is a signed int and the latter is an unsigned long. It's not
> > > particularly easy to paper over this difference, which has been
> > > responsible for some nasty bugs, and modifying drivers to store the
> > > jiffies value in a signed int is error-prone and a maintenance burden
> > > that the linuxkpi is supposed to avoid.
> > >
> > > It would be nice to provide a compatible implementation of jiffies. I
> > > can see a few approaches:
> > > - Define a 64-bit ticks variable, say ticks64, and make hardclock()
> > >   update both ticks and ticks64. Then #define jiffies ticks64 on 64-bit
> > >   platforms. This is the simplest to implement, but it adds extra work
> > >   to hardclock() and is somewhat ugly.
> > > - Make ticks an int64_t or a long and convert our native code
> > >   accordingly. This is cleaner but requires a lot of auditing to avoid
> > >   introducing bugs, though perhaps some code could be left unmodified,
> > >   implicitly truncating the value to an int. For example I think
> > >   sched_pctcpu_update() is fine. I've gotten an amd64 kernel to compile
> > >   and boot with this change, but it's hard to be confident in it. This
> > >   approach also has the potential downside of bloating structures that
> > >   store a ticks value, and it can't be MFCed.
> > > - Introduce a 64-bit ticks variable, ticks64, and
> > >   #define ticks ((int)ticks64). This requires renaming any struct
> > >   fields and local vars named "ticks", of which there's a decent number,
> > >   but that can be done fairly mechanically.
> > >
> > > Is there another solution which avoids these pitfalls? If not, should
> > > we go ahead with one of these approaches? If so, which one?
> > >
> >
> > So solution (1) is MFC-able, I think, so I like it.
> > (2) Isn't, but is likely a better long-term solution.
> > (3) is a non-starter since ticks is too common a name to #define.
>
> Why is that a non-starter? This is just in the kernel, and as you note
> below, shadowing ticks isn't a great idea anyway. (I don't really want
> to go down this path in any case, but I'm wondering if I misunderstood
> something.)
>

I worry about it leaking to userland. And I worry about the third party
drivers that can't tolerate it as a #define. That's all...

> > I could easily see a situation where we do (1) and then convert all current
> > users of ticks to be ticks64. This could proceed one at a time with as much
> > haste or caution as we need. Once we convert all of them over, we could
> > delete ticks and then there'd be no extra work in hardclock. This too would
> > be MFC-able.
> >
> > sys/net/iflib.c: uint64_t this_tick = ticks;
> > sys/netinet/tcp_subr.c: < (u_int)ticks))) {
> >
> > look fun! We also shadow it in a lot of places. The TCP stack uses it a lot
> > with a bunch of different variables, struct entries, etc, including RACK
> > and BBR.
> > The 802.11 stack uses it a bunch. As to a bunch of drivers, sometimes
> > shadowing other times not.
> >
> > It would be a lot to audit all this, so I think having the new API in place
> > might be better, and incrementally converting / removing the shadowing (even
> > if it isn't completely in scope, using ticks as a local variable is begging
> > for trouble).
>
> Yeah, looking some more, I think having a flag day will make this too
> painful.
>
> So then I guess the question is, do we provide an int64_t ticks64 or a
> long ticksl? Do we have any 32-bit platforms where a 64-bit cmpset in
> hardclock() would be a problem?
>

I don't think so. I kinda like kib's notion too...

> > Warner
> >
> > Also I see both jiffies and jiffies_64 defined. Does that matter at all?
>
> They differ only on 32-bit systems I believe. On such systems there is
> a 64-bit tick counter, jiffies_64, but it might not be atomic.
>

Ah!

Warner

Rick Macklem

2025-01-10 14:57:34 UTC

On Thu, Jan 9, 2025 at 8:14 PM Dan Shelton <***@gmail.com> wrote:
>
> On Fri, 10 Jan 2025 at 02:59, Rick Macklem <***@gmail.com> wrote:
> >
> > On Thu, Jan 9, 2025 at 4:31 PM Dan Shelton <***@gmail.com> wrote:
> > >
> > > Hello!
> > >
> > > Does FreeBSD nfsd support the WRITE_SAME request in NFSv4.1 mode?
> > Not at this time, I'm afraid.
> >
> > Maybe in a future release, rick
>
> How fast could it be implemented?
For the simplest version, not too long.
The simplest version would be synchronous only and use
VOP_WRITE() calls. Doing a new VOP_xxx() call to try and optimize
it per-fs would take quite a bit longer. (I know very little about ZFS, but just
maybe, block cloning would be useful for this?)

Btw, it is a NFSv4.2 call, so it would only be supported for NFSv4.2 and not
NFSv4.1.

rick

>
> Dan
> --
> Dan Shelton - Cluster Specialist Win/Lin/Bsd
>


Warner Losh

2025-01-10 16:17:19 UTC

On Fri, Jan 10, 2025 at 7:58 AM Rick Macklem <***@gmail.com> wrote:

> On Thu, Jan 9, 2025 at 8:14 PM Dan Shelton <***@gmail.com> wrote:
> >
> > On Fri, 10 Jan 2025 at 02:59, Rick Macklem <***@gmail.com> wrote:
> > >
> > > On Thu, Jan 9, 2025 at 4:31 PM Dan Shelton <***@gmail.com> wrote:
> > > >
> > > > Hello!
> > > >
> > > > Does FreeBSD nfsd support the WRITE_SAME request in NFSv4.1 mode?
> > > Not at this time, I'm afraid.
> > >
> > > Maybe in a future release, rick
> >
> > How fast could it be implemented?
> For the simplest version, not too long.
> The simplest version would be synchronous only and use
> VOP_WRITE() calls. Doing a new VOP_xxx() call to try and optimize
> it per-fs would take quite a bit longer. (I know very little about ZFS, but
> just maybe, block cloning would be useful for this?)
>

I had discussions years ago about adding a BIO_WRITE_SAME and whether or
not it made sense. It's mildly helpful to just send one write command that
writes all the LBAs using the SCSI WRITE SAME command. But it got hairy in
a hurry and WRITE SAME is mostly only used to unmap / trim blocks anyway.

But only SCSI drives support this, and it's been too long to recall if they
all support writing multiple blocks from the same OUT BUFFER or not. NVME
doesn't have a similar concept (just write zeros), so I gave up on it. I
didn't have a good use case for it.

Warner

> Btw, it is a NFSv4.2 call, so it would only be supported for NFSv4.2 and
> not NFSv4.1.
>
> rick
>
> >
> > Dan
> > --
> > Dan Shelton - Cluster Specialist Win/Lin/Bsd
> >
>
>

Rick Macklem

2025-01-10 15:31:32 UTC

On Fri, Jan 10, 2025 at 7:02 AM Alan Somers <***@freebsd.org> wrote:
>
> On Thu, Jan 9, 2025 at 5:31 PM Dan Shelton <***@gmail.com> wrote:
> >
> > Hello!
> >
> > Does FreeBSD nfsd support the WRITE_SAME request in NFSv4.1 mode?
Just fyi for other readers...
A normal NFSv4.2 write pushes the data down the wire to the server. A
WRITE_SAME pushes a description of a data block and a repetition count down
the wire to the server, allowing a lot of repetitive data to be written
without sending it all down the wire (at least that is my understanding;-).

I am not aware of any system call for this at this time, although there is
active discussion of this on the Linux NFS mailing list.

rick

> >
> > Dan
> > --
> > Dan Shelton - Cluster Specialist Win/Lin/Bsd
>
> Out of curiosity, what is your use case?
>


Rick Macklem

2025-01-10 23:24:42 UTC

On Thu, Jan 9, 2025 at 8:14 PM Dan Shelton <***@gmail.com> wrote:
>
> On Fri, 10 Jan 2025 at 02:59, Rick Macklem <***@gmail.com> wrote:
> >
> > On Thu, Jan 9, 2025 at 4:31 PM Dan Shelton <***@gmail.com> wrote:
> > >
> > > Hello!
> > >
> > > Does FreeBSD nfsd support the WRITE_SAME request in NFSv4.1 mode?
> > Not at this time, I'm afraid.
> >
> > Maybe in a future release, rick
>
> How fast could it be implemented?
I will probably try to come up with at least a proof of concept patch soon.
Once I have that much, I can start to think of what a VOP_WRITESAME()
might look like. If possible, it would be nice to get this in 15, since it would
have to wait for 16 otherwise.

Dan, if you could create a bugzilla PR (bugs.freebsd.org) asking for this as a
new feature, at least it won't get completely forgotten and I can hang patches
there.

You could look at the release schedule for FreeBSD 15 to see what the
optimistic bound on an implementation of this is.

rick
ps: If you can discuss it, I would also like to hear something about the
use case. This can help justify the work.

>
> Dan
> --
> Dan Shelton - Cluster Specialist Win/Lin/Bsd
>


Tomoaki AOKI

2025-01-11 19:35:43 UTC

On Sat, 11 Jan 2025 11:34:06 -0500
Mark Johnston <***@freebsd.org> wrote:

> On Sat, Jan 11, 2025 at 01:11:06PM +0900, Tomoaki AOKI wrote:
> > On Wed, 8 Jan 2025 18:07:47 -0500
> > Mark Johnston <***@freebsd.org> wrote:
> >
> > > On Thu, Jan 09, 2025 at 12:18:48AM +0200, Konstantin Belousov wrote:
> > > > On Wed, Jan 08, 2025 at 04:31:16PM -0500, Mark Johnston wrote:
> > > > > The global "ticks" variable counts hardclock ticks, it's widely used in
> > > > > the kernel for low-precision timekeeping. The linuxkpi provides a very
> > > > > similar variable, "jiffies", but there's an incompatibility: the former
> > > > > is a signed int and the latter is an unsigned long. It's not
> > > > > particularly easy to paper over this difference, which has been
> > > > > responsible for some nasty bugs, and modifying drivers to store the
> > > > > jiffies value in a signed int is error-prone and a maintenance burden
> > > > > that the linuxkpi is supposed to avoid.
> > > > >
> > > > > It would be nice to provide a compatible implementation of jiffies. I
> > > > > can see a few approaches:
> > > > > - Define a 64-bit ticks variable, say ticks64, and make hardclock()
> > > > > update both ticks and ticks64. Then #define jiffies ticks64 on 64-bit
> > > > > platforms. This is the simplest to implement, but it adds extra work
> > > > > to hardclock() and is somewhat ugly.
> > > > > - Make ticks an int64_t or a long and convert our native code
> > > > > accordingly. This is cleaner but requires a lot of auditing to avoid
> > > > > introducing bugs, though perhaps some code could be left unmodified,
> > > > > implicitly truncating the value to an int. For example I think
> > > > > sched_pctcpu_update() is fine. I've gotten an amd64 kernel to compile
> > > > > and boot with this change, but it's hard to be confident in it. This
> > > > > approach also has the potential downside of bloating structures that
> > > > > store a ticks value, and it can't be MFCed.
> > > > > - Introduce a 64-bit ticks variable, ticks64, and
> > > > > #define ticks ((int)ticks64). This requires renaming any struct
> > > > > fields and local vars named "ticks", of which there's a decent number,
> > > > > but that can be done fairly mechanically.
> > > > >
> > > > > Is there another solution which avoids these pitfalls? If not, should
> > > > > we go ahead with one of these approaches? If so, which one?
> > > >
> > > > You cannot do this in C, but can in asm:
> > > > .data
> > > > .globl ticksl, ticks
> > > > .type ticksl, @object
> > > > .type ticks, @object
> > > > ticksl: .quad
> > > > .size ticksl, 8
> > > > ticks =ticksl /* for little-endian */
> > > > /* ticks =ticksl + 4 for big-endian */
> > > > .size ticks, 4
> > > >
> > > >
> > > > Then update only ticksl in the hardclock().
> > >
> > > I implemented your suggestion here: https://reviews.freebsd.org/D48383
> >
> > As this is already committed to main, commenting here instead of review
> > D48383.
> >
> > Maybe I'm too paranoid and overlooking something, but...
> >
> > *If "jiffies" in LinuxKPI is really unsigned, isn't there any
> > possibilities that relies on its value to be larger than
> > 0x7fffffffffffffff as a threshold?
> > (Yes, it should be silly and non-realistic, but theoretically
> > possible.)
>
> Ideally we would have
>
> #define jiffies ((unsigned long)ticksl)
>
> in the linuxkpi, but some Linux code uses "jiffies" as a struct field or
> local variable name, so this doesn't quite work.
>
> In practice, the value is usually assigned to an unsigned long or used
> as an operand where it would be implicitly promoted to an unsigned type,
> so we don't see any incompatibilities.
>
> When jiffies is an int, code like the following can misbehave:
>
> unsigned long remain, timeout = jiffies + const;
> ...
> remain = timeout - jiffies;
> if ((long)remain < 0)
> /* timed out */
>
> If (int)timeout and jiffies have different signs, as might happen close
> to a rollover, the comparison won't work as expected.
>
> Linux has some macros (time_after() etc.) which are supposed to be used
> instead of direct comparisons, but they're not always used.

So ticksl should better be unsigned long if there's no reason to keep
it signed, isn't it?

> > *Is anywhere checking carry (sign) bit for int on LP32?
> > Maybe it would be the reason if "jiffies" in LinuxKPI is really
> > unsigned.
>
> Could you provide an example of what you mean?

Not an example of code, but for example, when ticksl is at
0x7fffffffffffffff (positive value), ticks should be 0xffffffff
(negative value), if I read the diff correctly.
The same thing starts happening when ticksl is at 0x0000000080000000 through
0x00000000ffffffff and values alike. So signs (carry bits, usually the
leftmost bit of each) should be checked separately for ticksl and ticks.

Am I (hopefully) overlooking something?

--
Tomoaki AOKI <***@dec.sakura.ne.jp>


Warner Losh

2025-01-11 20:12:24 UTC

Why not have jiffies just be an alias for ticksl at the assembler level, then
just have extern unsigned long jiffies; so the types match and we don't
have fragile macros? At the assembler level, long and unsigned long are the
same for object definition.

Warner

On Sat, Jan 11, 2025, 12:36 PM Tomoaki AOKI <***@dec.sakura.ne.jp> wrote:

> On Sat, 11 Jan 2025 11:34:06 -0500
> Mark Johnston <***@freebsd.org> wrote:
>
> > On Sat, Jan 11, 2025 at 01:11:06PM +0900, Tomoaki AOKI wrote:
> > > On Wed, 8 Jan 2025 18:07:47 -0500
> > > Mark Johnston <***@freebsd.org> wrote:
> > >
> > > > On Thu, Jan 09, 2025 at 12:18:48AM +0200, Konstantin Belousov wrote:
> > > > > On Wed, Jan 08, 2025 at 04:31:16PM -0500, Mark Johnston wrote:
> > > > > > The global "ticks" variable counts hardclock ticks, it's widely used in
> > > > > > the kernel for low-precision timekeeping. The linuxkpi provides a very
> > > > > > similar variable, "jiffies", but there's an incompatibility: the former
> > > > > > is a signed int and the latter is an unsigned long. It's not
> > > > > > particularly easy to paper over this difference, which has been
> > > > > > responsible for some nasty bugs, and modifying drivers to store the
> > > > > > jiffies value in a signed int is error-prone and a maintenance burden
> > > > > > that the linuxkpi is supposed to avoid.
> > > > > >
> > > > > > It would be nice to provide a compatible implementation of jiffies. I
> > > > > > can see a few approaches:
> > > > > > - Define a 64-bit ticks variable, say ticks64, and make hardclock()
> > > > > >   update both ticks and ticks64. Then #define jiffies ticks64 on 64-bit
> > > > > >   platforms. This is the simplest to implement, but it adds extra work
> > > > > >   to hardclock() and is somewhat ugly.
> > > > > > - Make ticks an int64_t or a long and convert our native code
> > > > > >   accordingly. This is cleaner but requires a lot of auditing to avoid
> > > > > >   introducing bugs, though perhaps some code could be left unmodified,
> > > > > >   implicitly truncating the value to an int. For example I think
> > > > > >   sched_pctcpu_update() is fine. I've gotten an amd64 kernel to compile
> > > > > >   and boot with this change, but it's hard to be confident in it. This
> > > > > >   approach also has the potential downside of bloating structures that
> > > > > >   store a ticks value, and it can't be MFCed.
> > > > > > - Introduce a 64-bit ticks variable, ticks64, and
> > > > > >   #define ticks ((int)ticks64). This requires renaming any struct
> > > > > >   fields and local vars named "ticks", of which there's a decent number,
> > > > > >   but that can be done fairly mechanically.
> > > > > >
> > > > > > Is there another solution which avoids these pitfalls? If not, should
> > > > > > we go ahead with one of these approaches? If so, which one?
> > > > >
> > > > > You cannot do this in C, but can in asm:
> > > > > .data
> > > > > .globl ticksl, ticks
> > > > > .type ticksl, @object
> > > > > .type ticks, @object
> > > > > ticksl: .quad
> > > > > .size ticksl, 8
> > > > > ticks =ticksl /* for little-endian */
> > > > > /* ticks =ticksl + 4 for big-endian */
> > > > > .size ticks, 4
> > > > >
> > > > >
> > > > > Then update only ticksl in the hardclock().
> > > >
> > > > I implemented your suggestion here: https://reviews.freebsd.org/D48383
> > >
> > > As this is already committed to main, commenting here instead of review
> > > D48383.
> > >
> > > Maybe I'm too paranoid and overlooking something, but...
> > >
> > > *If "jiffies" in LinuxKPI is really unsigned, isn't there any
> > > possibilities that relies on its value to be larger than
> > > 0x7fffffffffffffff as a threshold?
> > > (Yes, it should be silly and non-realistic, but theoretically
> > > possible.)
> >
> > Ideally we would have
> >
> > #define jiffies ((unsigned long)ticksl)
> >
> > in the linuxkpi, but some Linux code uses "jiffies" as a struct field or
> > local variable name, so this doesn't quite work.
> >
> > In practice, the value is usually assigned to an unsigned long or used
> > as an operand where it would be implicitly promoted to an unsigned type,
> > so we don't see any incompatibilities.
> >
> > When jiffies is an int, code like the following can misbehave:
> >
> > unsigned long remain, timeout = jiffies + const;
> > ...
> > remain = timeout - jiffies;
> > if ((long)remain < 0)
> > /* timed out */
> >
> > If (int)timeout and jiffies have different signs, as might happen close
> > to a rollover, the comparison won't work as expected.
> >
> > Linux has some macros (time_after() etc.) which are supposed to be used
> > instead of direct comparisons, but they're not always used.
>
> So ticksl should better be unsigned long if there's no reason to keep
> it signed, isn't it?
>
>
> > > *Is anywhere checking carry (sign) bit for int on LP32?
> > > Maybe it would be the reason if "jiffies" in LinuxKPI is really
> > > unsigned.
> >
> > Could you provide an example of what you mean?
>
> Not an example of code, but for example, when ticksl is at
> 0x7fffffffffffffff (positive value), ticks shoule be 0xffffffff
> (negative value), if I read the diff correctly.
> The same thing starts happening ticksl is at 0x0000000080000000 throug
> 0x00000000ffffffff and values alike. So signs (carry bits, usually the
> leftmost bit of each) should be checked separately for ticksl and ticks.
>
> Am I (hopefully) overlooking something?
>
> --
> Tomoaki AOKI <***@dec.sakura.ne.jp>
>
>

Mark Johnston

2025-01-11 22:40:39 UTC

Permalink

On Sat, Jan 11, 2025 at 01:12:24PM -0700, Warner Losh wrote:
> Why not have jiffies just be an alias for ticksl at the assembler level, then
> just have extern unsigned long jiffies; so the types match and we don't
> have fragile macros? At the assembler level, long and unsigned long are the
> same for object definition.

We certainly could. I guess Linux code which does something like

print("%lu\n", jiffies);

will be incompatible otherwise. Aside from that, I'm not sure if any
code would be affected by the difference in practice, but it's easy to
add an alias.


Mark Johnston

2025-01-11 22:35:36 UTC

Permalink

On Sun, Jan 12, 2025 at 04:35:43AM +0900, Tomoaki AOKI wrote:
> On Sat, 11 Jan 2025 11:34:06 -0500
> Mark Johnston <***@freebsd.org> wrote:
>
> > On Sat, Jan 11, 2025 at 01:11:06PM +0900, Tomoaki AOKI wrote:
> > > On Wed, 8 Jan 2025 18:07:47 -0500
> > > Mark Johnston <***@freebsd.org> wrote:
> > >
> > > > On Thu, Jan 09, 2025 at 12:18:48AM +0200, Konstantin Belousov wrote:
> > > > > On Wed, Jan 08, 2025 at 04:31:16PM -0500, Mark Johnston wrote:
> > > > > > The global "ticks" variable counts hardclock ticks, it's widely used in
> > > > > > the kernel for low-precision timekeeping. The linuxkpi provides a very
> > > > > > similar variable, "jiffies", but there's an incompatibility: the former
> > > > > > is a signed int and the latter is an unsigned long. It's not
> > > > > > particularly easy to paper over this difference, which has been
> > > > > > responsible for some nasty bugs, and modifying drivers to store the
> > > > > > jiffies value in a signed int is error-prone and a maintenance burden
> > > > > > that the linuxkpi is supposed to avoid.
> > > > > >
> > > > > > It would be nice to provide a compatible implementation of jiffies. I
> > > > > > can see a few approaches:
> > > > > > - Define a 64-bit ticks variable, say ticks64, and make hardclock()
> > > > > > update both ticks and ticks64. Then #define jiffies ticks64 on 64-bit
> > > > > > platforms. This is the simplest to implement, but it adds extra work
> > > > > > to hardclock() and is somewhat ugly.
> > > > > > - Make ticks an int64_t or a long and convert our native code
> > > > > > accordingly. This is cleaner but requires a lot of auditing to avoid
> > > > > > introducing bugs, though perhaps some code could be left unmodified,
> > > > > > implicitly truncating the value to an int. For example I think
> > > > > > sched_pctcpu_update() is fine. I've gotten an amd64 kernel to compile
> > > > > > and boot with this change, but it's hard to be confident in it. This
> > > > > > approach also has the potential downside of bloating structures that
> > > > > > store a ticks value, and it can't be MFCed.
> > > > > > - Introduce a 64-bit ticks variable, ticks64, and
> > > > > > #define ticks ((int)ticks64). This requires renaming any struct
> > > > > > fields and local vars named "ticks", of which there's a decent number,
> > > > > > but that can be done fairly mechanically.
> > > > > >
> > > > > > Is there another solution which avoids these pitfalls? If not, should
> > > > > > we go ahead with one of these approaches? If so, which one?
> > > > >
> > > > > You cannot do this in C, but can in asm:
> > > > > .data
> > > > > .globl ticksl, ticks
> > > > > .type ticksl, @object
> > > > > .type ticks, @object
> > > > > ticksl: .quad
> > > > > .size ticksl, 8
> > > > > ticks =ticksl /* for little-endian */
> > > > > /* ticks =ticksl + 4 for big-endian */
> > > > > .size ticks, 4
> > > > >
> > > > >
> > > > > Then update only ticksl in the hardclock().
> > > >
> > > > I implemented your suggestion here: https://reviews.freebsd.org/D48383
> > >
> > > As this is already committed to main, commenting here instead of review
> > > D48383.
> > >
> > > Maybe I'm too paranoid and overlooking something, but...
> > >
> > > *If "jiffies" in LinuxKPI is really unsigned, isn't there any
> > > possibilities that relies on its value to be larger than
> > > 0x7fffffffffffffff as a threshold?
> > > (Yes, it should be silly and non-realistic, but theoretically
> > > possible.)
> >
> > Ideally we would have
> >
> > #define jiffies ((unsigned long)ticksl)
> >
> > in the linuxkpi, but some Linux code uses "jiffies" as a struct field or
> > local variable name, so this doesn't quite work.
> >
> > In practice, the value is usually assigned to an unsigned long or used
> > as an operand where it would be implicitly promoted to an unsigned type,
> > so we don't see any incompatibilities.
> >
> > When jiffies is an int, code like the following can misbehave:
> >
> > unsigned long remain, timeout = jiffies + const;
> > ...
> > remain = timeout - jiffies;
> > if ((long)remain < 0)
> > /* timed out */
> >
> > If (int)timeout and jiffies have different signs, as might happen close
> > to a rollover, the comparison won't work as expected.
> >
> > Linux has some macros (time_after() etc.) which are supposed to be used
> > instead of direct comparisons, but they're not always used.
>
> So ticksl should better be unsigned long if there's no reason to keep
> it signed, isn't it?

Well, I kept it signed since it's meant to be similar in usage to ticks.
With a signed counter, you can test whether a value has passed by
looking at the sign of the difference between ticks(l) and that value
(modulo rollover). With an unsigned counter, you need some casting, as
in the example above.
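
A minimal sketch of the two styles of check described above. It is not taken from the kernel sources: the helper names deadline_passed() and deadline_passed_ul() are made up for illustration, and the signed variant subtracts via unsigned arithmetic so that the sketch does not rely on signed overflow wrapping (kernel code can presumably subtract directly if it is built with wrapping signed arithmetic).

```c
#include <limits.h>

/*
 * Hypothetical helper: has the signed counter "now" reached "deadline"?
 * The subtraction is done on unsigned values so that this sketch does not
 * depend on signed overflow wrapping; the sign of the difference is then
 * read back as an int.
 */
static int
deadline_passed(int now, int deadline)
{
	return ((int)((unsigned int)now - (unsigned int)deadline) >= 0);
}

/*
 * The same test for an unsigned long counter, in the style Linux code
 * uses with jiffies: the difference has to be cast to a signed type
 * before it can be compared against zero.
 */
static int
deadline_passed_ul(unsigned long now, unsigned long deadline)
{
	return ((long)(now - deadline) >= 0);
}
```

For instance, a deadline of INT_MAX - 5 followed by ten more ticks leaves the counter at INT_MIN + 4, and deadline_passed(INT_MIN + 4, INT_MAX - 5) still reports that the deadline has passed, even though the counter is now negative.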

> > > *Is anywhere checking carry (sign) bit for int on LP32?
> > > Maybe it would be the reason if "jiffies" in LinuxKPI is really
> > > unsigned.
> >
> > Could you provide an example of what you mean?
>
> Not an example of code, but for example, when ticksl is at
> 0x7fffffffffffffff (positive value), ticks should be 0xffffffff
> (negative value), if I read the diff correctly.
> The same thing starts happening when ticksl is at 0x0000000080000000 through
> 0x00000000ffffffff and values alike. So signs (carry bits, usually the
> leftmost bit of each) should be checked separately for ticksl and ticks.

That's true, but I can't see why any code would care about this?

> Am I (hopefully) overlooking something?
>
> --
> Tomoaki AOKI <***@dec.sakura.ne.jp>


Tomoaki AOKI

2025-01-11 22:50:38 UTC

Permalink

On Sat, 11 Jan 2025 17:35:36 -0500
Mark Johnston <***@freebsd.org> wrote:

> On Sun, Jan 12, 2025 at 04:35:43AM +0900, Tomoaki AOKI wrote:
> > On Sat, 11 Jan 2025 11:34:06 -0500
> > Mark Johnston <***@freebsd.org> wrote:
> >
> > > On Sat, Jan 11, 2025 at 01:11:06PM +0900, Tomoaki AOKI wrote:
> > > > On Wed, 8 Jan 2025 18:07:47 -0500
> > > > Mark Johnston <***@freebsd.org> wrote:
> > > >
> > > > > On Thu, Jan 09, 2025 at 12:18:48AM +0200, Konstantin Belousov wrote:
> > > > > > On Wed, Jan 08, 2025 at 04:31:16PM -0500, Mark Johnston wrote:
> > > > > > > The global "ticks" variable counts hardclock ticks, it's widely used in
> > > > > > > the kernel for low-precision timekeeping. The linuxkpi provides a very
> > > > > > > similar variable, "jiffies", but there's an incompatibility: the former
> > > > > > > is a signed int and the latter is an unsigned long. It's not
> > > > > > > particularly easy to paper over this difference, which has been
> > > > > > > responsible for some nasty bugs, and modifying drivers to store the
> > > > > > > jiffies value in a signed int is error-prone and a maintenance burden
> > > > > > > that the linuxkpi is supposed to avoid.
> > > > > > >
> > > > > > > It would be nice to provide a compatible implementation of jiffies. I
> > > > > > > can see a few approaches:
> > > > > > > - Define a 64-bit ticks variable, say ticks64, and make hardclock()
> > > > > > > update both ticks and ticks64. Then #define jiffies ticks64 on 64-bit
> > > > > > > platforms. This is the simplest to implement, but it adds extra work
> > > > > > > to hardclock() and is somewhat ugly.
> > > > > > > - Make ticks an int64_t or a long and convert our native code
> > > > > > > accordingly. This is cleaner but requires a lot of auditing to avoid
> > > > > > > introducing bugs, though perhaps some code could be left unmodified,
> > > > > > > implicitly truncating the value to an int. For example I think
> > > > > > > sched_pctcpu_update() is fine. I've gotten an amd64 kernel to compile
> > > > > > > and boot with this change, but it's hard to be confident in it. This
> > > > > > > approach also has the potential downside of bloating structures that
> > > > > > > store a ticks value, and it can't be MFCed.
> > > > > > > - Introduce a 64-bit ticks variable, ticks64, and
> > > > > > > #define ticks ((int)ticks64). This requires renaming any struct
> > > > > > > fields and local vars named "ticks", of which there's a decent number,
> > > > > > > but that can be done fairly mechanically.
> > > > > > >
> > > > > > > Is there another solution which avoids these pitfalls? If not, should
> > > > > > > we go ahead with one of these approaches? If so, which one?
> > > > > >
> > > > > > You cannot do this in C, but can in asm:
> > > > > > .data
> > > > > > .globl ticksl, ticks
> > > > > > .type ticksl, @object
> > > > > > .type ticks, @object
> > > > > > ticksl: .quad
> > > > > > .size ticksl, 8
> > > > > > ticks =ticksl /* for little-endian */
> > > > > > /* ticks =ticksl + 4 for big-endian */
> > > > > > .size ticks, 4
> > > > > >
> > > > > >
> > > > > > Then update only ticksl in the hardclock().
> > > > >
> > > > > I implemented your suggestion here: https://reviews.freebsd.org/D48383
> > > >
> > > > As this is already committed to main, commenting here instead of review
> > > > D48383.
> > > >
> > > > Maybe I'm too paranoid and overlooking something, but...
> > > >
> > > > *If "jiffies" in LinuxKPI is really unsigned, isn't there any
> > > > possibilities that relies on its value to be larger than
> > > > 0x7fffffffffffffff as a threshold?
> > > > (Yes, it should be silly and non-realistic, but theoretically
> > > > possible.)
> > >
> > > Ideally we would have
> > >
> > > #define jiffies ((unsigned long)ticksl)
> > >
> > > in the linuxkpi, but some Linux code uses "jiffies" as a struct field or
> > > local variable name, so this doesn't quite work.
> > >
> > > In practice, the value is usually assigned to an unsigned long or used
> > > as an operand where it would be implicitly promoted to an unsigned type,
> > > so we don't see any incompatibilities.
> > >
> > > When jiffies is an int, code like the following can misbehave:
> > >
> > > unsigned long remain, timeout = jiffies + const;
> > > ...
> > > remain = timeout - jiffies;
> > > if ((long)remain < 0)
> > > /* timed out */
> > >
> > > If (int)timeout and jiffies have different signs, as might happen close
> > > to a rollover, the comparison won't work as expected.
> > >
> > > Linux has some macros (time_after() etc.) which are supposed to be used
> > > instead of direct comparisons, but they're not always used.
> >
> > So ticksl should better be unsigned long if there's no reason to keep
> > it signed, isn't it?
>
> Well, I kept it signed since it's meant to be similar in usage to ticks.
> With a signed counter, you can check test whether a value has passed by
> looking at the sign of the difference between ticks(l) and that value
> (modulo rollover). With an unsigned counter, you need some casting, as
> in the example above.
>
> > > > *Is anywhere checking carry (sign) bit for int on LP32?
> > > > Maybe it would be the reason if "jiffies" in LinuxKPI is really
> > > > unsigned.
> > >
> > > Could you provide an example of what you mean?
> >
> > Not an example of code, but for example, when ticksl is at
> > 0x7fffffffffffffff (positive value), ticks shoule be 0xffffffff
> > (negative value), if I read the diff correctly.
> > The same thing starts happening ticksl is at 0x0000000080000000 throug
> > 0x00000000ffffffff and values alike. So signs (carry bits, usually the
> > leftmost bit of each) should be checked separately for ticksl and ticks.
>
> That's true, but I can't see why any code would care about this?

While ticks is defined as a (signed) int, it should wrap around when it
reaches 0x7fffffff (as incrementing it causes overflow).
Is ticks allowed to be a negative value? My guess is that it is a monotonic
counter.

> > Am I (hopefully) overlooking something?
> >
> > --
> > Tomoaki AOKI <***@dec.sakura.ne.jp>

--
Tomoaki AOKI <***@dec.sakura.ne.jp>


Mark Johnston

2025-01-11 23:00:12 UTC

Permalink

On Sun, Jan 12, 2025 at 07:50:38AM +0900, Tomoaki AOKI wrote:
> On Sat, 11 Jan 2025 17:35:36 -0500
> Mark Johnston <***@freebsd.org> wrote:
>
> > On Sun, Jan 12, 2025 at 04:35:43AM +0900, Tomoaki AOKI wrote:
> > > On Sat, 11 Jan 2025 11:34:06 -0500
> > > Mark Johnston <***@freebsd.org> wrote:
> > >
> > > > On Sat, Jan 11, 2025 at 01:11:06PM +0900, Tomoaki AOKI wrote:
> > > > > On Wed, 8 Jan 2025 18:07:47 -0500
> > > > > Mark Johnston <***@freebsd.org> wrote:
> > > > >
> > > > > > On Thu, Jan 09, 2025 at 12:18:48AM +0200, Konstantin Belousov wrote:
> > > > > > > On Wed, Jan 08, 2025 at 04:31:16PM -0500, Mark Johnston wrote:
> > > > > > > > The global "ticks" variable counts hardclock ticks, it's widely used in
> > > > > > > > the kernel for low-precision timekeeping. The linuxkpi provides a very
> > > > > > > > similar variable, "jiffies", but there's an incompatibility: the former
> > > > > > > > is a signed int and the latter is an unsigned long. It's not
> > > > > > > > particularly easy to paper over this difference, which has been
> > > > > > > > responsible for some nasty bugs, and modifying drivers to store the
> > > > > > > > jiffies value in a signed int is error-prone and a maintenance burden
> > > > > > > > that the linuxkpi is supposed to avoid.
> > > > > > > >
> > > > > > > > It would be nice to provide a compatible implementation of jiffies. I
> > > > > > > > can see a few approaches:
> > > > > > > > - Define a 64-bit ticks variable, say ticks64, and make hardclock()
> > > > > > > > update both ticks and ticks64. Then #define jiffies ticks64 on 64-bit
> > > > > > > > platforms. This is the simplest to implement, but it adds extra work
> > > > > > > > to hardclock() and is somewhat ugly.
> > > > > > > > - Make ticks an int64_t or a long and convert our native code
> > > > > > > > accordingly. This is cleaner but requires a lot of auditing to avoid
> > > > > > > > introducing bugs, though perhaps some code could be left unmodified,
> > > > > > > > implicitly truncating the value to an int. For example I think
> > > > > > > > sched_pctcpu_update() is fine. I've gotten an amd64 kernel to compile
> > > > > > > > and boot with this change, but it's hard to be confident in it. This
> > > > > > > > approach also has the potential downside of bloating structures that
> > > > > > > > store a ticks value, and it can't be MFCed.
> > > > > > > > - Introduce a 64-bit ticks variable, ticks64, and
> > > > > > > > #define ticks ((int)ticks64). This requires renaming any struct
> > > > > > > > fields and local vars named "ticks", of which there's a decent number,
> > > > > > > > but that can be done fairly mechanically.
> > > > > > > >
> > > > > > > > Is there another solution which avoids these pitfalls? If not, should
> > > > > > > > we go ahead with one of these approaches? If so, which one?
> > > > > > >
> > > > > > > You cannot do this in C, but can in asm:
> > > > > > > .data
> > > > > > > .globl ticksl, ticks
> > > > > > > .type ticksl, @object
> > > > > > > .type ticks, @object
> > > > > > > ticksl: .quad
> > > > > > > .size ticksl, 8
> > > > > > > ticks =ticksl /* for little-endian */
> > > > > > > /* ticks =ticksl + 4 for big-endian */
> > > > > > > .size ticks, 4
> > > > > > >
> > > > > > >
> > > > > > > Then update only ticksl in the hardclock().
> > > > > >
> > > > > > I implemented your suggestion here: https://reviews.freebsd.org/D48383
> > > > >
> > > > > As this is already committed to main, commenting here instead of review
> > > > > D48383.
> > > > >
> > > > > Maybe I'm too paranoid and overlooking something, but...
> > > > >
> > > > > *If "jiffies" in LinuxKPI is really unsigned, isn't there any
> > > > > possibilities that relies on its value to be larger than
> > > > > 0x7fffffffffffffff as a threshold?
> > > > > (Yes, it should be silly and non-realistic, but theoretically
> > > > > possible.)
> > > >
> > > > Ideally we would have
> > > >
> > > > #define jiffies ((unsigned long)ticksl)
> > > >
> > > > in the linuxkpi, but some Linux code uses "jiffies" as a struct field or
> > > > local variable name, so this doesn't quite work.
> > > >
> > > > In practice, the value is usually assigned to an unsigned long or used
> > > > as an operand where it would be implicitly promoted to an unsigned type,
> > > > so we don't see any incompatibilities.
> > > >
> > > > When jiffies is an int, code like the following can misbehave:
> > > >
> > > > unsigned long remain, timeout = jiffies + const;
> > > > ...
> > > > remain = timeout - jiffies;
> > > > if ((long)remain < 0)
> > > > /* timed out */
> > > >
> > > > If (int)timeout and jiffies have different signs, as might happen close
> > > > to a rollover, the comparison won't work as expected.
> > > >
> > > > Linux has some macros (time_after() etc.) which are supposed to be used
> > > > instead of direct comparisons, but they're not always used.
> > >
> > > So ticksl should better be unsigned long if there's no reason to keep
> > > it signed, isn't it?
> >
> > Well, I kept it signed since it's meant to be similar in usage to ticks.
> > With a signed counter, you can check test whether a value has passed by
> > looking at the sign of the difference between ticks(l) and that value
> > (modulo rollover). With an unsigned counter, you need some casting, as
> > in the example above.
> >
> > > > > *Is anywhere checking carry (sign) bit for int on LP32?
> > > > > Maybe it would be the reason if "jiffies" in LinuxKPI is really
> > > > > unsigned.
> > > >
> > > > Could you provide an example of what you mean?
> > >
> > > Not an example of code, but for example, when ticksl is at
> > > 0x7fffffffffffffff (positive value), ticks shoule be 0xffffffff
> > > (negative value), if I read the diff correctly.
> > > The same thing starts happening ticksl is at 0x0000000080000000 throug
> > > 0x00000000ffffffff and values alike. So signs (carry bits, usually the
> > > leftmost bit of each) should be checked separately for ticksl and ticks.
> >
> > That's true, but I can't see why any code would care about this?
>
> While ticks is defined as (signed) int, it shoule be turnaround when it
> reaches at 0x7fffffff (as incrementing it causes overflow).
> Is ticks allowed to be minus value? My guess is that it is monotonic
> counter.

Yes, INT_MAX ticks elapse in approximately 25 days at 1000Hz. In fact,
ticks is initialized to INT_MAX - <small number> in subr_param.c so that
it wraps around shortly after boot, after which it is negative.

Kernel code should not care about the sign of ticks.
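
The 25-day figure can be checked with a short calculation. This is only a sketch of the arithmetic; ticks_to_days() is a hypothetical helper, not a kernel function.

```c
/* Seconds in a day, used to convert a tick count into days. */
#define	SECONDS_PER_DAY	86400.0

/*
 * Hypothetical helper: days needed for a counter that advances hz times
 * per second to count max_ticks ticks.
 */
static double
ticks_to_days(double max_ticks, double hz)
{
	return (max_ticks / hz / SECONDS_PER_DAY);
}
```

With max_ticks = INT_MAX (2147483647) and hz = 1000 this gives roughly 24.86 days. The same calculation for a 64-bit LONG_MAX gives about 1.07e11 days, on the order of 292 million years.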


Mark Johnston

2025-01-12 15:27:37 UTC

Permalink

On Sun, Jan 12, 2025 at 11:16:51AM +0900, Tomoaki AOKI wrote:
> Replying to ML only, as Mark's gmail address seems to block previous
> one.
>
> On Sat, 11 Jan 2025 18:00:12 -0500
> Mark Johnston <***@freebsd.org> wrote:
>
> > On Sun, Jan 12, 2025 at 07:50:38AM +0900, Tomoaki AOKI wrote:
> > > On Sat, 11 Jan 2025 17:35:36 -0500
> > > Mark Johnston <***@freebsd.org> wrote:
> > >
> > > > On Sun, Jan 12, 2025 at 04:35:43AM +0900, Tomoaki AOKI wrote:
> > > > > Not an example of code, but for example, when ticksl is at
> > > > > 0x7fffffffffffffff (positive value), ticks shoule be 0xffffffff
> > > > > (negative value), if I read the diff correctly.
> > > > > The same thing starts happening ticksl is at 0x0000000080000000 throug
> > > > > 0x00000000ffffffff and values alike. So signs (carry bits, usually the
> > > > > leftmost bit of each) should be checked separately for ticksl and ticks.
> > > >
> > > > That's true, but I can't see why any code would care about this?
> > >
> > > While ticks is defined as (signed) int, it shoule be turnaround when it
> > > reaches at 0x7fffffff (as incrementing it causes overflow).
> > > Is ticks allowed to be minus value? My guess is that it is monotonic
> > > counter.
> >
> > Yes, INT_MAX ticks elapse in approximately 25 days at 1000Hz. In fact,
> > ticks is initialized to INT_MAX - <small number> in subr_param.c so that
> > it wraps around shortly after boot, after which it is negative.
> >
> > Kernel code should not care about the sign of ticks.
>
> Thanks! I've overlooked it.
>
> BTW, is ticksl restricted to INT_MAX, too? (In detail, although ticksl
> has the type long, is the range of the values actually used
> restricted to INT_MAX?)

No, that's the point of the change: the kernel now increments a counter
of type long, so it will eventually reach LONG_MAX. Existing code which
references ticks will still get a 32-bit value that behaves the same as
before.
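
As an illustration of that relationship on a 64-bit platform, the sketch below models ticks as the low 32 bits of ticksl. It is a plain C model, not the committed implementation (which aliases the two symbols at the assembler level), and ticks_from_ticksl() is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model: return the value that a 32-bit "ticks" aliasing the
 * low half of a 64-bit "ticksl" would hold.
 */
static int32_t
ticks_from_ticksl(int64_t ticksl)
{
	uint32_t low = (uint32_t)(uint64_t)ticksl;	/* keep the low 32 bits */
	int32_t ticks;

	memcpy(&ticks, &low, sizeof(ticks));		/* reinterpret as signed */
	return (ticks);
}
```

Under this model a ticksl of 0x7fffffffffffffff maps to a ticks of -1 (0xffffffff), and 0x0000000080000000 maps to INT32_MIN, which matches the values discussed earlier in the thread: the two counters have independent signs, and the 32-bit view keeps wrapping exactly as it did before.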


Tomoaki AOKI

2025-01-12 16:20:39 UTC

Permalink

On Sun, 12 Jan 2025 10:27:37 -0500
Mark Johnston <***@freebsd.org> wrote:

> On Sun, Jan 12, 2025 at 11:16:51AM +0900, Tomoaki AOKI wrote:
> > Replying to ML only, as Mark's gmail address seems to block previous
> > one.
> >
> > On Sat, 11 Jan 2025 18:00:12 -0500
> > Mark Johnston <***@freebsd.org> wrote:
> >
> > > On Sun, Jan 12, 2025 at 07:50:38AM +0900, Tomoaki AOKI wrote:
> > > > On Sat, 11 Jan 2025 17:35:36 -0500
> > > > Mark Johnston <***@freebsd.org> wrote:
> > > >
> > > > > On Sun, Jan 12, 2025 at 04:35:43AM +0900, Tomoaki AOKI wrote:
> > > > > > Not an example of code, but for example, when ticksl is at
> > > > > > 0x7fffffffffffffff (positive value), ticks shoule be 0xffffffff
> > > > > > (negative value), if I read the diff correctly.
> > > > > > The same thing starts happening ticksl is at 0x0000000080000000 throug
> > > > > > 0x00000000ffffffff and values alike. So signs (carry bits, usually the
> > > > > > leftmost bit of each) should be checked separately for ticksl and ticks.
> > > > >
> > > > > That's true, but I can't see why any code would care about this?
> > > >
> > > > While ticks is defined as (signed) int, it shoule be turnaround when it
> > > > reaches at 0x7fffffff (as incrementing it causes overflow).
> > > > Is ticks allowed to be minus value? My guess is that it is monotonic
> > > > counter.
> > >
> > > Yes, INT_MAX ticks elapse in approximately 25 days at 1000Hz. In fact,
> > > ticks is initialized to INT_MAX - <small number> in subr_param.c so that
> > > it wraps around shortly after boot, after which it is negative.
> > >
> > > Kernel code should not care about the sign of ticks.
> >
> > Thanks! I've overlooked it.
> >
> > BTW, does tickl restricted with INT_MAX, too? (In detail, although tickl
> > has the type long, but actually the range of the values used are
> > restricted with INT_MAX?)
>
> No, that's the point of the change: the kernel now increments a counter
> of type long, so it will eventually reach LONG_MAX. Existing code which
> references ticks will still get a 32-bit value that behaves the same as
> before.

Thanks. I will read the related code more closely to understand it once I
have enough time.

--
Tomoaki AOKI <***@dec.sakura.ne.jp>


BERTRAND Joël

2025-01-12 19:44:46 UTC

Permalink

Chris Moerz wrote:
> On Sun, 12 Jan 2025, Ð¯ÑÐŸÑÐ»Ð°Ð² ÐÐ°ÑÐºÐŸ wrote:
>
>>
>> Somehow building drm-61-kmod from ports fixed the issue.
>>
>
> That's exactly it - you will need to build graphics/drm-61-kmod directly
> from ports. Clone the ports repo into /usr/ports, i.e. run as root:
>
> $ cd /usr/ports
> $ git clone https://github.com/freebsd/freebsd-ports /usr/ports
> $ cd graphics/drm-61-kmod
> $ make -j${NCPU}
> $ make reinstall
>
> replace ${NCPU} with number of cores of your system.
>
> Right now, the binary packages do not line up properly with minor
> releases; there is work underway to improve this, but it's not there yet.
>
> You will have to redo this potentially every time you do a release
> upgrade, until the fix is in place.
>
> Let me know if you have any further questions.
> chris

OK. I will try to do a portsnap auto and rebuild the package from source.

Thanks,

JB

Warner Losh

2025-01-16 02:09:49 UTC

Permalink

Most of that work would be to adapt qemu to use vmm, and then fix the
structural issue in vmm that makes using it from qemu awkward.

Warner

On Wed, Jan 15, 2025, 7:06 PM Mario Marietto <***@gmail.com> wrote:

> You could try to patch or rewrite bhyve so that it can accelerate qemu. We
> need this.
>
> On Wed, Jan 15, 2025 at 10:06 PM Abhinav Chavali <
> ***@gmail.com> wrote:
>
>> Hello,
>> I am a student at the Pennsylvania State University looking to contribute
>> to FreeBSD in the summer, in some form or another. I've been a desktop user
>> of FreeBSD for several years now, and would now like to learn a specific
>> subsystem and undergo a potential project for this summer.
>>
>> My experience with FreeBSD development is limited to writing simple
>> kernel modules, although I have a strong understanding of C and Unix
>> programming (and am willing to spend as much time as required before summer
>> to fill in the gaps in my knowledge). I have already taken a look at the
>> GSoC projects page, but I'm not sure how many of these are still relevant.
>> Could anyone suggest a project that someone with my experience level could
>> reasonably learn and complete? Also contacts to potential mentors would be
>> very helpful.
>>
>> Thank you,
>> Abhinav Chavali
>>
>
>
> --
> Mario.
>

Abhinav Chavali

2025-01-16 12:58:04 UTC

Permalink

In that case, do you think a project like that would be worth undertaking
over the summer? I'm actually quite interested in this issue if it's
feasible.

On Wed, Jan 15, 2025 at 9:10 PM Warner Losh <***@bsdimp.com> wrote:

> Most of that work would be to adopt qemu to use vmm, and then fix the
> structural issue in vmm that makes using it from qemu awkward.
>
> Warner
>
> On Wed, Jan 15, 2025, 7:06 PM Mario Marietto <***@gmail.com>
> wrote:
>
>> You could try to patch or rewrite bhyve so that it can accelerate qemu.
>> We need this.
>>
>> On Wed, Jan 15, 2025 at 10:06 PM Abhinav Chavali <
>> ***@gmail.com> wrote:
>>
>>> Hello,
>>> I am a student at the Pennsylvania State University looking to
>>> contribute to FreeBSD in the summer, in some form or another. I've been a
>>> desktop user of FreeBSD for several years now, and would now like to learn
>>> a specific subsystem and undergo a potential project for this summer.
>>>
>>> My experience with FreeBSD development is limited to writing simple
>>> kernel modules, although I have a strong understanding of C and Unix
>>> programming (and am willing to spend as much time as required before summer
>>> to fill in the gaps in my knowledge). I have already taken a look at the
>>> GSoC projects page, but I'm not sure how many of these are still relevant.
>>> Could anyone suggest a project that someone with my experience level could
>>> reasonably learn and complete? Also contacts to potential mentors would be
>>> very helpful.
>>>
>>> Thank you,
>>> Abhinav Chavali
>>>
>>
>>
>> --
>> Mario.
>>
>

David Chisnall

2025-01-16 21:09:52 UTC

Permalink

Bertrand Petit

2025-01-23 09:06:03 UTC

Permalink

On Thu, Jan 23, 2025 at 08:24:08AM +0000, Poul-Henning Kamp wrote:
>
> Isn't that program already horrible and complex enough, in terms
> of source code, manual page and command line options ?

And buggy, see [1]. Reported Oct. 2021 and still present.

I'm not against ifconfig per se, I even like the interface it exposes
to the user, a unified interface. Having dozens of commands, each configuring
a protocol or even parts of a protocol, each with its own command line
paradigm and idiosyncrasies is daunting and taxing on human memory. I like
ifconfig; however, its source code is a mess---a mess similar to what a cat
makes when playing with balls of knitting yarn.

[1] <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259003>

--
%!PS -- Bertrand Petit
/D{def}def/E{exch}D/G{get}D/I{2 div}D/U{dup}D/L{roll}D/Y{setgray}D/N{newpath}D
/O{N 0 0 moveto}D/P{pop}D/T{translate}D currentpagedevice/PageSize G U 0 G/w E
D 1 G /h E D w I h I T 0 Y 1 setlinewidth 0 1 2 { P 120 rotate 2 4 w U mul h U
mul add sqrt I 50 add {N 50 0 3 2 L 0 360 arc stroke}for}for/s{O true charpath
pathbbox exch 4 -1 L E sub I 3 1 L sub I} D /l(bp)D 0.94 Y /Helvetica findfont
22 scalefont setfont l s P(x)s exch P T O l show showpage


Dimitry Andric

2025-01-27 15:34:22 UTC

Permalink

On 27 Jan 2025, at 07:27, Steve Kargl <***@troutmask.apl.washington.edu> wrote:
>
> On Mon, Jan 27, 2025 at 06:54:19AM +0200, Konstantin Belousov wrote:
>> On Sun, Jan 26, 2025 at 12:16:53PM -0800, Steve Kargl wrote:
>>> In replacing an ancient system with new I re-installed all ports
>>> including lang/gcc14 of FreeBSD-current. -current is 2 day old
>>> sources.
>>>
>>> Consider,
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include "mpfr.h"
>>>
>>> int
>>> main(void)
>>> {
>>> mpfr_t pi;
>>> mpfr_inits2(512, pi, NULL);
>>> mpfr_const_pi(pi, MPFR_RNDN);
>>> mpfr_printf("pi = %25.20Rf\n", pi);
>>> // A conscientious programmer cleans up after themself,
>>> // but on exit the system should take care of memory.
>>> // mpfr_clears(pi, NULL);
>>> return (0);
>>> }
>>>
>>> % gcc14 -o z -O2 -I/usr/local/include a.c -L/usr/local/lib -lmpfr -lgmp
>>> % ./z
>>> pi = 3.14159265358979323846
>>>
>>> All seems to work with shared linking.
>>>
>>> The following used to work.
>>>
>>> % gcc14 -o z -O2 -I/usr/local/include a.c -L/usr/local/lib -lmpfr -lgmp \
>>> -static
>>> % ./z
>>> pi = 3.14159265358979323846
>>> Segmentation fault (core dumped)
>>>
>>> % gdb151 ./z z.core
>>> ...
>>> #0 0x0000000000427ba5 in __gmpn_mul_1 ()
>>> (gdb) bt
>>> #0 0x0000000000427ba5 in __gmpn_mul_1 ()
>>> #1 0x000000000040051f in __do_global_dtors_aux ()
>>> at /usr/src/lib/csu/common/crtbegin.c:83
>>> #2 0x00000000004bc165 in _fini ()
>>> #3 0x0000000000458a7f in __cxa_finalize (dso=***@entry=0x0)
>>> at /usr/src/lib/libc/stdlib/atexit.c:234
>>> #4 0x0000000000458b70 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:89
>>> #5 0x00000000004483d9 in __libc_start1 (argc=1, argv=0x820a52900,
>>> env=0x820a52910, cleanup=<optimized out>, mainX=0x400480 <main>)
>>> at /usr/src/lib/libc/csu/libc_start1.c:172
>>> #6 0x00000000004004f0 in _start () at /usr/src/lib/csu/amd64/crt1_s.S:83
>>>
>>> So, did someone break the startup files?
>> Why do you think that startup (crt) files are broken?
>> Note that they are involved in the trace above, but the lowest frame is
>> from gmp destructor, i.e. the problem formally happens in the gmp code.
>>
>> Perhaps try to rebuild gmp with debug info to get more information.
>
> You are likely correct that it's a gmp problem unmasked by the
> new hardware that I have. Rebuilding gmp with debugging
> did not help :(
>
> (gdb) run
> ...
> Program received signal SIGSEGV, Segmentation fault.
> Address not mapped to object.
> __gmpn_sqr_basecase () at tmp-sqr_basecase.s:222
> warning: 222 tmp-sqr_basecase.s: No such file or directory
> (gdb) bt
> #0 __gmpn_sqr_basecase () at tmp-sqr_basecase.s:222
> #1 0x00000000004004df in __do_global_dtors_aux ()
> at /usr/src/lib/csu/common/crtbegin.c:83
> #2 0x00000000004c8875 in _fini ()
> #3 0x000000000046518f in __cxa_finalize (dso=***@entry=0x0)
> at /usr/src/lib/libc/stdlib/atexit.c:234
> #4 0x0000000000465280 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:89
> #5 0x0000000000454ae9 in __libc_start1 (argc=1, argv=0x7fffffffe738,
> env=0x7fffffffe748, cleanup=<optimized out>, mainX=0x400515 <main>)
> at /usr/src/lib/libc/csu/libc_start1.c:172
> #6 0x00000000004004b0 in _start () at /usr/src/lib/csu/amd64/crt1_s.S:83
>
> It seems gmp's build infrastructure removes tmp files.

The sqr_basecase.s thing is a red herring. In fact, the whole mpfr/gmp
thing is a red herring. :) The actual problem is in the way gcc emits
the .dtors section:

$ readelf --hex-dump=.dtors static-test-clang

Hex dump of section '.dtors':
0x004f3d30 ffffffff ffffffff 00000000 00000000 ................

$ readelf --hex-dump=.dtors static-test-gcc

Hex dump of section '.dtors':
0x004efca8 ffffffff ffffffff ........

Our lib/csu/common/crtbegin.c's dtors handler starts at index 1,
however:

69 static void
70 __do_global_dtors_aux(void)
71 {
72         crt_func fn;
73         int n;
74
75 #ifdef SHARED
76         run_cxa_finalize();
77 #endif
78
79         for (n = 1;; n++) {
80                 fn = __DTOR_LIST__[n];
81                 if (fn == (crt_func)0 || fn == (crt_func)-1)
82                         break;
83                 fn();
84         }
85 }

Because it doesn't check the section length, and expects the table to be
terminated with a 0 or -1, it goes off the rails and ends up calling
random function pointers after it!
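
To make the failure mode concrete, here is a small user-space sketch (not the actual crtbegin.c code) that walks an in-memory table the same way, but takes an explicit slot count so it cannot run past the end. The names walk_dtors_bounded, terminated_table and truncated_table, and the two dummy destructors, are made up for illustration; the second table mimics the truncated .dtors contents of the gcc-linked binary.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*crt_func)(void);

static int dtor_calls;	/* how many dummy destructors have run */

static void dummy_dtor_a(void) { dtor_calls++; }
static void dummy_dtor_b(void) { dtor_calls++; }

/* Properly terminated: start marker, two entries, end marker. */
static crt_func terminated_table[] = {
	(crt_func)-1, dummy_dtor_a, dummy_dtor_b, (crt_func)0
};

/* Truncated, like the gcc-linked binary: only the start marker. */
static crt_func truncated_table[] = {
	(crt_func)-1
};

/*
 * Walk the table from index 1, stopping at a 0 or -1 marker or when
 * the (hypothetical) slot count is exhausted, whichever comes first.
 * Returns the number of functions called.
 */
static int
walk_dtors_bounded(const crt_func *table, size_t nslots)
{
	size_t n;
	int before;

	assert(table != NULL);
	before = dtor_calls;
	for (n = 1; n < nslots; n++) {
		crt_func fn = table[n];

		if (fn == (crt_func)0 || fn == (crt_func)-1)
			break;
		fn();
	}
	return (dtor_calls - before);
}
```

With the slot count removed from the loop condition, the walk over truncated_table would read table[1], which lies outside the array; in the real binary that slot is whatever the linker placed after .dtors.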

In my static binary, compiled with gcc and linked with BFD ld, the
.dtors section is followed by an empty .jcr section (so it doesn't
matter), and then .data.rel.ro:

Section Headers:
[Nr] Name
Type Address Off Size ES Lk Inf Al
Flags
...
[14] .dtors
PROGBITS 00000000004efca8 0eeca8 000008 00 0 0 8
[0000000000000003]: WRITE, ALLOC
[15] .jcr
PROGBITS 00000000004efcb0 0eecb0 000000 00 0 0 8
[0000000000000003]: WRITE, ALLOC
[16] .data.rel.ro
PROGBITS 00000000004efcb0 0eecb0 000088 00 0 0 8
[0000000000000003]: WRITE, ALLOC

the latter of which contains:

Hex dump of section '.data.rel.ro':
0x004efcb0 0c9d4200 00000000 299c4200 00000000 ..B.....).B.....
0x004efcc0 3b9c4200 00000000 869c4200 00000000 ;.B.......B.....
0x004efcd0 e79d4200 00000000 9b9e4200 00000000 ..B.......B.....
0x004efce0 9e9f4200 00000000 4ba04200 00000000 ..B.....K.B.....
0x004efcf0 39354400 00000000 fb344400 00000000 95D......4D.....
0x004efd00 2a354400 00000000 c8344400 00000000 *5D......4D.....
0x004efd10 e8344400 00000000 17354400 00000000 .4D......5D.....
0x004efd20 b5344400 00000000 d5344400 00000000 .4D......4D.....
0x004efd30 04354400 00000000 .5D.....

and indeed it is calling 0x429d0c, which happens to be a relocation into
the guts of sqr_basecase():

Program received signal SIGSEGV, Segmentation fault.
Address not mapped to object.
0x0000000000429d0c in __gmpn_sqr_basecase ()
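
For readers unfamiliar with reading such hex dumps: amd64 is little-endian, so the first eight bytes of .data.rel.ro ("0c9d4200 00000000") are stored least significant byte first. A minimal sketch of the decoding follows; the helper name le64_decode and the array first_slot are illustrative only.

```c
#include <stdint.h>

/* Assemble 8 bytes, least significant first, into a 64-bit value. */
static uint64_t
le64_decode(const unsigned char bytes[8])
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | bytes[i];
	return (v);
}

/* First eight bytes of the .data.rel.ro dump: 0c9d4200 00000000 */
static const unsigned char first_slot[8] = {
	0x0c, 0x9d, 0x42, 0x00, 0x00, 0x00, 0x00, 0x00
};
```

Decoding first_slot this way gives 0x429d0c, the address in the gdb output above.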

Going back to the bad .dtors table, we can see that in FreeBSD's
crtbegin.c we have:

static crt_func __CTOR_LIST__[] __section(".ctors") __used = {
        (crt_func)-1
};
static crt_func __DTOR_LIST__[] __section(".dtors") __used = {
        (crt_func)-1
};

with corresponding entries in crtend.c:

static crt_func __CTOR_END__[] __section(".ctors") __used = {
        (crt_func)0
};
static crt_func __DTOR_END__[] __section(".dtors") __used = {
        (crt_func)0
};

The linker merges these together, effectively forming the "ffffffff
ffffffff 00000000 00000000" block mentioned earlier.

But for some reason, when gcc links a static executable, it uses
FreeBSD's crtbeginT.o (which is byte-identical to crtbegin.o), but
_libgcc_'s crtend.o:

gcc -v -static static-test.c -o static-test -lmpfr -lgmp
...
/usr/local/libexec/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/collect2 \
-plugin /usr/local/libexec/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/liblto_plugin.so \
-plugin-opt=/usr/local/libexec/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/lto-wrapper \
-plugin-opt=-fresolution=/tmp/cc00aXNF.res \
-plugin-opt=-pass-through=-lgcc \
-plugin-opt=-pass-through=-lgcc_eh \
-plugin-opt=-pass-through=-lc \
-plugin-opt=-pass-through=-lgcc \
-plugin-opt=-pass-through=-lgcc_eh \
-m elf_x86_64_fbsd \
-V \
-Bstatic \
-o static-test \
/usr/lib/crt1.o \
/usr/lib/crti.o \
/usr/lib/crtbeginT.o \
-L/usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0 \
-L/usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/../../../../../x86_64-portbld-freebsd15.0/lib \
-L/usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/../../.. \
/tmp/ccT2nihc.o \
-lmpfr \
-lgmp \
-lgcc \
-lgcc_eh \
-lc \
-lgcc \
-lgcc_eh \
/usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/crtend.o \
/usr/lib/crtn.o

The problem is that libgcc's crtend.o does _not_ contain .ctors or
.dtors sections at all, resulting in the "ffffffff ffffffff" block.

During gcc's configure phase, I see:

checking for .preinit_array/.init_array/.fini_array support... yes

so initfini_array support is then enabled.

In libgcc's crtstuff.c, which is used to generate crtbegin.o and
crtend.o, the definitions of .ctors and .dtors are all conditional on
#ifndef USE_INITFINI_ARRAY. This is why gcc's crtbegin.o and crtend.o
only have .init and .fini sections, but no .ctors or .dtors.
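
To check which mechanism a given compiler uses for user code, a tiny translation unit with GCC/Clang-style constructor and destructor attributes can help. The sketch below assumes a compiler that supports __attribute__((constructor)) and __attribute__((destructor)); the function names are invented. Whether the resulting pointers end up in .init_array/.fini_array or in .ctors/.dtors depends on how the compiler was configured, which is the point of the exercise, and inspecting the object file's section headers with readelf should show which one was chosen.

```c
#include <stdio.h>

static int ctor_ran;	/* set before main() if the constructor fired */

/* Registered by the compiler to run before main(). */
__attribute__((constructor))
static void
demo_ctor(void)
{
	ctor_ran = 1;
}

/* Registered by the compiler to run during exit(). */
__attribute__((destructor))
static void
demo_dtor(void)
{
	fputs("demo_dtor called\n", stderr);
}
```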

Summarizing, I think that it is weird that gcc doesn't use its own
crtbegin object, at least for static linking. There may be some
historical reason for it, but it should then also not use its own crtend
object!

Another issue is how our __do_global_[cd]tors_aux() functions handle
improperly terminated sections. We could try to be more robust and cope
with it, or at least abort with a proper error message.

-Dimitry

Steve Kargl

2025-01-27 17:52:34 UTC

On Mon, Jan 27, 2025 at 04:34:22PM +0100, Dimitry Andric wrote:
> On 27 Jan 2025, at 07:27, Steve Kargl <***@troutmask.apl.washington.edu> wrote:
> >
> > On Mon, Jan 27, 2025 at 06:54:19AM +0200, Konstantin Belousov wrote:
> >> On Sun, Jan 26, 2025 at 12:16:53PM -0800, Steve Kargl wrote:
> >>> In replacing an ancient system with new I re-installed all ports
> >>> including lang/gcc14 of FreeBSD-current. -current is 2 day old
> >>> sources.
> >>>
> >>> Consider,
> >>>
> >>> #include <stdio.h>
> >>> #include <stdlib.h>
> >>> #include "mpfr.h"
> >>>
> >>> int
> >>> main(void)
> >>> {
> >>> mpfr_t pi;
> >>> mpfr_inits2(512, pi, NULL);
> >>> mpfr_const_pi(pi, MPFR_RNDN);
> >>> mpfr_printf("pi = %25.20Rf\n", pi);
> >>> // A conscientious programmer cleans up after themself,
> >>> // but on exit the system should take care of memory.
> >>> // mpfr_clears(pi, NULL);
> >>> return (0);
> >>> }

(remove some unimportant info)

> >>> So, did someone break the startup files?
> >> Why do you think that startup (crt) files are broken?
> >> Note that they are involved in the trace above, but the lowest frame is
> >> from gmp destructor, i.e. the problem formally happens in the gmp code.
> >>
> >> Perhaps try to rebuild gmp with debug info to get more information.
> >
> > You are likely correct that it's a gmp problem unmasked by the
> > new hardware that I have. Rebuilding gmp with debugging
> > did not help :(
> >
> > (gdb) run
> > ...
> > Program received signal SIGSEGV, Segmentation fault.
> > Address not mapped to object.
> > __gmpn_sqr_basecase () at tmp-sqr_basecase.s:222
> > warning: 222 tmp-sqr_basecase.s: No such file or directory
> > (gdb) bt
> > #0 __gmpn_sqr_basecase () at tmp-sqr_basecase.s:222
> > #1 0x00000000004004df in __do_global_dtors_aux ()
> > at /usr/src/lib/csu/common/crtbegin.c:83
> > #2 0x00000000004c8875 in _fini ()
> > #3 0x000000000046518f in __cxa_finalize (dso=***@entry=0x0)
> > at /usr/src/lib/libc/stdlib/atexit.c:234
> > #4 0x0000000000465280 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:89
> > #5 0x0000000000454ae9 in __libc_start1 (argc=1, argv=0x7fffffffe738,
> > env=0x7fffffffe748, cleanup=<optimized out>, mainX=0x400515 <main>)
> > at /usr/src/lib/libc/csu/libc_start1.c:172
> > #6 0x00000000004004b0 in _start () at /usr/src/lib/csu/amd64/crt1_s.S:83
> >
> > It seems gmp's build infrastructure removes tmp files.
>
> The sqr_basecase.s thing is a red herring. In fact, the whole mpfr/gmp
> thing is a red herring. :) The actual problem is in the way gcc emits
> the .dtors section:
>
> $ readelf --hex-dump=.dtors static-test-clang
>
> Hex dump of section '.dtors':
> 0x004f3d30 ffffffff ffffffff 00000000 00000000 ................
>
> $ readelf --hex-dump=.dtors static-test-gcc
>
> Hex dump of section '.dtors':
> 0x004efca8 ffffffff ffffffff ........
>

(Debugging info removed to keep this short)

>
> During gcc's configure phase, I see:
>
> checking for .preinit_array/.init_array/.fini_array support... yes
>
> so initfini_array support is then enabled.

I tried building gcc14 with the --enable-initfini-array configure option.
Maybe I should try --disable-initfini-array.

> In libgcc's crtstuff.c, which is used to generate crtbegin.o and
> crtend.o, the definitions of .ctors and .dtors are all conditional on
> #ifndef USE_INITFINI_ARRAY. This is why gcc's crtbegin.o and crtend.o
> only have .init and .fini sections, but no .ctors or .dtors.

Thanks for the thorough explanation!

You've given me someplace to poke around in crtstuff.c.
I do see in gcc/config/freebsd-spec.h

/* Provide a STARTFILE_SPEC appropriate for FreeBSD. Here we add
the magical crtbegin.o file (see crtstuff.c) which provides part
of the support for getting C++ file-scope static object constructed
before entering `main'. */

#define FBSD_STARTFILE_SPEC \
"%{!shared: \
%{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \
%{!p:%{profile:gcrt1.o%s} \
%{!profile: \
%{pie: Scrt1.o%s;:crt1.o%s}}}}} \
crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"

/* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on
the magical crtend.o file (see crtstuff.c) which provides part of
the support for getting C++ file-scope static object constructed
before entering `main', followed by a normal "finalizer" file,
`crtn.o'. */

#define FBSD_ENDFILE_SPEC \
"%{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s"

So, the paths must be dealt with by crtstuff.c.

--
Steve

Steve Kargl

2025-01-27 18:42:25 UTC

On Mon, Jan 27, 2025 at 04:34:22PM +0100, Dimitry Andric wrote:
> /usr/lib/crt1.o \
> /usr/lib/crti.o \
> /usr/lib/crtbeginT.o \
(...)
> /usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/crtend.o \
> /usr/lib/crtn.o
>

>
> Summarizing, I think that it is weird that gcc doesn't use its own
> crtbegin object, at least for static linking. There may be some
> historical reason for it, but it should then also not use its own crtend
> object!

Not sure about a historical reason, but gcc14 seems to not
supply crt1.o, crti.o, crtbeginT.o, or crtn.o. Thus, the
linker is picking up those files from /usr/lib.

% find /usr/local/lib/gcc14/ -name crt\*.o
/usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtbegin.o
/usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtend.o
/usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtbeginS.o
/usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtendS.o

If I move gcc14/.../crtend.o out of the way, then the segfault goes away
as the linker shows

-V -Bstatic -o z
/usr/lib/crt1.o
/usr/lib/crti.o
/usr/lib/crtbeginT.o
...
/usr/lib/crtend.o
/usr/lib/crtn.o

--
Steve

Dimitry Andric

2025-01-27 19:15:46 UTC

On 27 Jan 2025, at 19:42, Steve Kargl <***@troutmask.apl.washington.edu> wrote:
>
> On Mon, Jan 27, 2025 at 04:34:22PM +0100, Dimitry Andric wrote:
...
>> Summarizing, I think that it is weird that gcc doesn't use its own
>> crtbegin object, at least for static linking. There may be some
>> historical reason for it, but it should then also not use its own crtend
>> object!
>
...
> Not sure about a historical reason, but gcc14 seems to not
> supply crt1.o, crti.o, crtbeginT.o, or crtn.o. Thus, the
> linker is picking up those files from /usr/lib.
>
> % find /usr/local/lib/gcc14/ -name crt\*.o
> /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtbegin.o
> /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtend.o
> /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtbeginS.o
> /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtendS.o
>
> If I move gcc14/.../crtend.o out of the way, then the segfault goes away

Yes, that's one way to do it. Another way is to change the spec file, or
a patch like this, which I'm now trying out:

--- libgcc/config.host.orig 2024-08-01 08:17:17 UTC
+++ libgcc/config.host
@@ -286,7 +286,7 @@ case ${host} in
# machine-specific sections may refine and add to this
# configuration.
tmake_file="$tmake_file t-freebsd t-crtstuff-pic t-libgcc-pic t-eh-dw2-dip t-slibgcc t-slibgcc-gld t-slibgcc-elf-ver"
- extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+ extra_parts="crtbegin.o crtend.o crtbeginS.o crtbeginT.o crtendS.o"
case ${target_thread_file} in
posix)
tmake_file="${tmake_file} t-freebsd-thread"

Though I'm not sure if the resulting crtbeginT.o will work as expected.

-Dimitry

Konstantin Belousov

2025-01-27 19:24:34 UTC

On Mon, Jan 27, 2025 at 10:42:25AM -0800, Steve Kargl wrote:
> On Mon, Jan 27, 2025 at 04:34:22PM +0100, Dimitry Andric wrote:
> > /usr/lib/crt1.o \
> > /usr/lib/crti.o \
> > /usr/lib/crtbeginT.o \
> (...)
> > /usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/crtend.o \
> > /usr/lib/crtn.o
> >
>
>
> >
> > Summarizing, I think that it is weird that gcc doesn't use its own
> > crtbegin object, at least for static linking. There may be some
> > historical reason for it, but it should then also not use its own crtend
> > object!
>
> Not sure about a historical reason, but gcc14 seems to not
> supply crt1.o, crti.o, crtbeginT.o, or crtn.o. Thus, the
> linker is picking up those files from /usr/lib.
>
> % find /usr/local/lib/gcc14/ -name crt\*.o
> /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtbegin.o
> /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtend.o
> /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtbeginS.o
> /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtendS.o
>
> If I move gcc14/.../crtend.o out of the way, then the segfault goes away
> as the linker shows
>
> -V -Bstatic -o z
> /usr/lib/crt1.o
> /usr/lib/crti.o
> /usr/lib/crtbeginT.o
> ...
> /usr/lib/crtend.o
> /usr/lib/crtn.o

The following patch worked for me without changing anything in gcc.

From 976aa780b8ad212127d84a47a5a05f1bd6aef60c Mon Sep 17 00:00:00 2001
From: Konstantin Belousov <***@FreeBSD.org>
Date: Mon, 27 Jan 2025 21:21:20 +0200
Subject: [PATCH] crtbegin: accurately check for the end of .dtors

not relying only on the end section marker, but also checking for the
section size when iterating.

Reported by: kargl
Analyzed by: dim
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
---
lib/csu/common/crtbegin.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/csu/common/crtbegin.c b/lib/csu/common/crtbegin.c
index d6978859af4a..06fe990052f7 100644
--- a/lib/csu/common/crtbegin.c
+++ b/lib/csu/common/crtbegin.c
@@ -66,19 +66,27 @@ static crt_func __DTOR_LIST__[] __section(".dtors") __used = {
(crt_func)-1
};

+extern const char startof_dtors[] __asm(".startof..dtors")
+ __weak_symbol __hidden;
+extern const char sizeof_dtors[] __asm(".sizeof..dtors")
+ __weak_symbol __hidden;
+
static void
__do_global_dtors_aux(void)
{
crt_func fn;
+ uintptr_t dtors_end;
int n;

#ifdef SHARED
run_cxa_finalize();
#endif

+ dtors_end = (uintptr_t)&startof_dtors + (uintptr_t)&sizeof_dtors;
for (n = 1;; n++) {
fn = __DTOR_LIST__[n];
- if (fn == (crt_func)0 || fn == (crt_func)-1)
+ if (fn == (crt_func)0 || fn == (crt_func)-1 || (dtors_end > 0 &&
+ (uintptr_t)&__DTOR_LIST__[n] >= dtors_end))
break;
fn();
}
--
2.48.1

Steve Kargl

2025-01-27 20:19:44 UTC

On Mon, Jan 27, 2025 at 09:24:34PM +0200, Konstantin Belousov wrote:
> On Mon, Jan 27, 2025 at 10:42:25AM -0800, Steve Kargl wrote:
> > On Mon, Jan 27, 2025 at 04:34:22PM +0100, Dimitry Andric wrote:
> > > /usr/lib/crt1.o \
> > > /usr/lib/crti.o \
> > > /usr/lib/crtbeginT.o \
> > (...)
> > > /usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/crtend.o \
> > > /usr/lib/crtn.o
> > >
> > > Summarizing, I think that it is weird that gcc doesn't use its own
> > > crtbegin object, at least for static linking. There may be some
> > > historical reason for it, but it should then also not use its own crtend
> > > object!
> >
> > Not sure about a historical reason, but gcc14 seems to not
> > supply crt1.o, crti.o, crtbeginT.o, or crtn.o. Thus, the
> > linker is picking up those files from /usr/lib.
> >
> > % find /usr/local/lib/gcc14/ -name crt\*.o
> > /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtbegin.o
> > /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtend.o
> > /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtbeginS.o
> > /usr/local/lib/gcc14/gcc/x86_64-portbld-freebsd15.0/14.2.0/crtendS.o
> >
> > If I move gcc14/.../crtend.o out of the way, then the segfault goes away
> > as the linker shows
> >
> > -V -Bstatic -o z
> > /usr/lib/crt1.o
> > /usr/lib/crti.o
> > /usr/lib/crtbeginT.o
> > ...
> > /usr/lib/crtend.o
> > /usr/lib/crtn.o
>
> The following patch worked for me without changing anything in gcc.
>
> From 976aa780b8ad212127d84a47a5a05f1bd6aef60c Mon Sep 17 00:00:00 2001
> From: Konstantin Belousov <***@FreeBSD.org>
> Date: Mon, 27 Jan 2025 21:21:20 +0200
> Subject: [PATCH] crtbegin: accurately check for the end of .dtors
>
> not relying only on the end section marker, but also checking for the
> section size when iterating.
>
> Reported by: kargl
> Analyzed by: dim
> Sponsored by: The FreeBSD Foundation
> MFC after: 1 week

Thanks, kib! If dim's patch also fixes/avoids the issue,
I'll pursue upstreaming it to gcc. I know that it affects
gcc's mainline.

--
Steve

Dimitry Andric

2025-01-27 20:23:36 UTC

On 27 Jan 2025, at 20:24, Konstantin Belousov <***@gmail.com> wrote:
>
> On Mon, Jan 27, 2025 at 10:42:25AM -0800, Steve Kargl wrote:
...
>> If I move gcc14/.../crtend.o out of the way, then the segfault goes away
>> as the linker shows
>>
>> -V -Bstatic -o z
>> /usr/lib/crt1.o
>> /usr/lib/crti.o
>> /usr/lib/crtbeginT.o
>> ...
>> /usr/lib/crtend.o
>> /usr/lib/crtn.o
>
> The following patch worked for me without changing anything in gcc.
>
> From 976aa780b8ad212127d84a47a5a05f1bd6aef60c Mon Sep 17 00:00:00 2001
> From: Konstantin Belousov <***@FreeBSD.org>
> Date: Mon, 27 Jan 2025 21:21:20 +0200
> Subject: [PATCH] crtbegin: accurately check for the end of .dtors
>
> not relying only on the end section marker, but also checking for the
> section size when iterating.
>
> Reported by: kargl
> Analyzed by: dim
> Sponsored by: The FreeBSD Foundation
> MFC after: 1 week
> ---
> lib/csu/common/crtbegin.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/lib/csu/common/crtbegin.c b/lib/csu/common/crtbegin.c
> index d6978859af4a..06fe990052f7 100644
> --- a/lib/csu/common/crtbegin.c
> +++ b/lib/csu/common/crtbegin.c
> @@ -66,19 +66,27 @@ static crt_func __DTOR_LIST__[] __section(".dtors") __used = {
> (crt_func)-1
> };
>
> +extern const char startof_dtors[] __asm(".startof..dtors")
> + __weak_symbol __hidden;
> +extern const char sizeof_dtors[] __asm(".sizeof..dtors")
> + __weak_symbol __hidden;
> +
> static void
> __do_global_dtors_aux(void)
> {
> crt_func fn;
> + uintptr_t dtors_end;
> int n;
>
> #ifdef SHARED
> run_cxa_finalize();
> #endif
>
> + dtors_end = (uintptr_t)&startof_dtors + (uintptr_t)&sizeof_dtors;
> for (n = 1;; n++) {
> fn = __DTOR_LIST__[n];
> - if (fn == (crt_func)0 || fn == (crt_func)-1)
> + if (fn == (crt_func)0 || fn == (crt_func)-1 || (dtors_end > 0 &&
> + (uintptr_t)&__DTOR_LIST__[n] >= dtors_end))
> break;
> fn();
> }

Yes, this looks good to me, thanks! Note that even with this safety belt
applied to crtbegin, it's still a good idea to fix the gcc side... :)

Btw, it would also be nice to add a similar safety belt for ctors?
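
As an illustration of what such a safety belt for constructors could look like, here is a user-space sketch. It is not taken from any FreeBSD change. It assumes, purely for the example, that constructors are walked backwards starting just below the end marker, and that the start address of the section is known. All names (walk_ctors_bounded, headless_table, the dummy constructors) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*crt_func)(void);

static int ctor_order[4];	/* records which dummy constructor ran when */
static int ctor_count;

static void dummy_ctor_a(void) { ctor_order[ctor_count++] = 'a'; }
static void dummy_ctor_b(void) { ctor_order[ctor_count++] = 'b'; }

/* Start marker is missing: the table begins directly with an entry. */
static crt_func headless_table[] = {
	dummy_ctor_a, dummy_ctor_b, (crt_func)0
};

/*
 * Walk backwards from `end` (which points at the end marker), stopping
 * at a 0 or -1 marker or as soon as the next slot would lie before
 * `start`.  Returns the number of functions called.
 */
static int
walk_ctors_bounded(const crt_func *start, const crt_func *end)
{
	const crt_func *p;
	int called = 0;

	assert(start != NULL && end != NULL && start <= end);
	for (p = end; p > start; ) {
		crt_func fn = *--p;

		if (fn == (crt_func)0 || fn == (crt_func)-1)
			break;
		fn();
		called++;
	}
	return (called);
}
```

Called on headless_table with start at its first slot and end at its end marker, this runs dummy_ctor_b and then dummy_ctor_a and stops at the lower bound, instead of reading below the table in search of a -1 that is not there.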

-Dimitry

Konstantin Belousov

2025-01-27 20:43:54 UTC

On Mon, Jan 27, 2025 at 09:23:36PM +0100, Dimitry Andric wrote:
> On 27 Jan 2025, at 20:24, Konstantin Belousov <***@gmail.com> wrote:
> >
> > On Mon, Jan 27, 2025 at 10:42:25AM -0800, Steve Kargl wrote:
> ...
> >> If I move gcc14/.../crtend.o out of the way, then the segfault goes away
> >> as the linker shows
> >>
> >> -V -Bstatic -o z
> >> /usr/lib/crt1.o
> >> /usr/lib/crti.o
> >> /usr/lib/crtbeginT.o
> >> ...
> >> /usr/lib/crtend.o
> >> /usr/lib/crtn.o
> >
> > The following patch worked for me without changing anything in gcc.
> >
> > From 976aa780b8ad212127d84a47a5a05f1bd6aef60c Mon Sep 17 00:00:00 2001
> > From: Konstantin Belousov <***@FreeBSD.org>
> > Date: Mon, 27 Jan 2025 21:21:20 +0200
> > Subject: [PATCH] crtbegin: accurately check for the end of .dtors
> >
> > not relying only on the end section marker, but also checking for the
> > section size when iterating.
> >
> > Reported by: kargl
> > Analyzed by: dim
> > Sponsored by: The FreeBSD Foundation
> > MFC after: 1 week
> > ---
> > lib/csu/common/crtbegin.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/csu/common/crtbegin.c b/lib/csu/common/crtbegin.c
> > index d6978859af4a..06fe990052f7 100644
> > --- a/lib/csu/common/crtbegin.c
> > +++ b/lib/csu/common/crtbegin.c
> > @@ -66,19 +66,27 @@ static crt_func __DTOR_LIST__[] __section(".dtors") __used = {
> > (crt_func)-1
> > };
> >
> > +extern const char startof_dtors[] __asm(".startof..dtors")
> > + __weak_symbol __hidden;
> > +extern const char sizeof_dtors[] __asm(".sizeof..dtors")
> > + __weak_symbol __hidden;
> > +
> > static void
> > __do_global_dtors_aux(void)
> > {
> > crt_func fn;
> > + uintptr_t dtors_end;
> > int n;
> >
> > #ifdef SHARED
> > run_cxa_finalize();
> > #endif
> >
> > + dtors_end = (uintptr_t)&startof_dtors + (uintptr_t)&sizeof_dtors;
> > for (n = 1;; n++) {
> > fn = __DTOR_LIST__[n];
> > - if (fn == (crt_func)0 || fn == (crt_func)-1)
> > + if (fn == (crt_func)0 || fn == (crt_func)-1 || (dtors_end > 0 &&
> > + (uintptr_t)&__DTOR_LIST__[n] >= dtors_end))
> > break;
> > fn();
> > }
>
> Yes, this looks good to me, thanks! Note that even with this safety belt
> applied to crtbegin, it's still a good idea to fix the gcc side... :)
Definitely, I do not think that one fix should exclude another.

>
> Btw, it would also be nice to add a similar safety belt for ctors?
D48700

Tomek CEDRO

2025-01-28 16:05:46 UTC

On Sun, Jan 26, 2025 at 2:54 PM Paul Floyd wrote:
> On 26/12/2024 17:30, Daniel Engberg wrote:
> Good board however there's very little that justifies the difference in price between X670E and X870E variants (spend that on a cooler instead). You can find a short summary here, https://www.reddit.com/r/buildapc/comments/1fpk3cq/proart_x870ecreator_wifi_vs_proart_x670ecreator/ . Graphics is integrated in the CPU so choice of motherboard doesn't matter in that regard. Without looking too much it's probably going to be a bit rough using 6.1 as a base. (My box is headless)
>
> I went with a Asus Rog Strix in the end.
> 6.1 does seem a bit rough, it gave me a kernel panic when I loaded it.

Can you please report back on how stable FreeBSD is on that hardware?
I am thinking about moving 14.2-RELEASE to new hardware very soon, with
an AMD Ryzen 9 9950X CPU (Zen5) and also some ASUS mobo (either X870E or
X670E, not sure yet).

I plan to buy 6000MHz RAM in 64..96..128..192GB size for a ramdisk to
speed up build times.

This will be my development workstation and I also need good stable 3D
with a multimonitor+rotation setup for CAD and multimedia, so the scfb
fallback is not an option because it does not support
multimonitor+rotation without GPU drivers, even with no 3D
acceleration.

I will definitely buy a second-hand NVIDIA GPU because I am sick and tired
of AMDGPU stability problems with my RX580, and I don't believe a modern
built-in GPU will even work. After the move to 14.2 (and drm-61-kmod) it
does not panic as on 14.0/5.15, but it gets hiccups that require a
reboot.. far from 13.3/5.10 stability.. so I am considering just
replacing the GPU and using the nvidia drivers :-(

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Daniel Engberg

2025-01-28 18:34:25 UTC

On 2025-01-28T17:06:42.000+01:00, Tomek CEDRO <***@cedro.info>
wrote:

> On Sun, Jan 26, 2025 at 2:54 PM Paul Floyd wrote:
>> On 26/12/2024 17:30, Daniel Engberg wrote:
>>
>> Good board however there's very little that justifies the
>> difference in price between X670E and X870E variants (spend that
>> on a cooler instead). You can find a short summary here,
>> https://www.reddit.com/r/buildapc/comments/1fpk3cq/proart_x870ecreator_wifi_vs_proart_x670ecreator/
>> . Graphics is integrated in the CPU so choice of motherboard
>> doesn't matter in that regard. Without looking too much it's
>> probably going to be a bit rough using 6.1 as a base. (My box is
>> headless)
>>
>> I went with a Asus Rog Strix in the end.
>>
>> 6.1 does seem a bit rough, it gave me a kernel panic when I
>> loaded it.
>
> Can you please report back on how stable is FreeBSD on that
> hardware?
> I am thinking about moving 14.2-RELEASE to new hardware very soon
> with AMD Ryzen 9 9950X CPU (Zen5) and also some ASUS mobo (either
> X870E or X670E not sure yet).
>
> I plan to buy 6000MHz RAM in 64..96..128..192GB size for a ramdisk
> to speed up build times.
>
> This will be my development workstation and I also need good stable
> 3D with multimonitor+rotation setup for CAD and multimedia so scfb
> fallback is not an option because it does not support
> multimonitor+rotation without gpu drivers even with no 3d
> acceleration.
>
> I will definitely buy second hand NVIDIA GPU because I am sick tired
> of AMDGPU stability problems with my RX580 and I dont believe modern
> built-in GPU will even work. After move to 14.2 (and drm-61-kmod) it
> does not panic as on 14.0/5.15 but it gets hiccups that require
> reboot.. far from 13.3/5.10 stability.. so I am considering just
> replacing GPU and use nvidia drivers :-(
>
> --
> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Stability is excellent, at least with an Asus ProArt X670E-CREATOR WIFI
and a Ryzen 7900. Some platform support bits are missing, such as a way
to monitor CPU boost. You can, however, roughly monitor CPU frequency.
Anything faster than 5600 MT/s for memory is overclocking, and in most
cases of very little benefit. If you use 4 DIMMs, it goes down to
3600 with 2R sticks and possibly a bit faster with 1R (the official
specs state 3600, however). Intel ARC is probably your best bet in the
long run; however, 6.1 is too old even for the A-series.

Best regards,

Daniel

Tomek CEDRO

2025-01-28 19:35:20 UTC

On Tue, Jan 28, 2025 at 7:34 PM Daniel Engberg
<***@pyret.net> wrote:
> On 2025-01-28T17:06:42.000+01:00, Tomek CEDRO <***@cedro.info> wrote:
> (..)
> Can you please report back on how stable is FreeBSD on that hardware?
> I am thinking about moving 14.2-RELEASE to new hardware very soon with
> AMD Ryzen 9 9950X CPU (Zen5) and also some ASUS mobo (either X870E or
> X670E not sure yet).
>
> I plan to buy 6000MHz RAM in 64..96..128..192GB size for a ramdisk to
> speed up build times.
>
> This will be my development workstation and I also need good stable 3D
> with multimonitor+rotation setup for CAD and multimedia so scfb
> fallback is not an option because it does not support
> multimonitor+rotation without gpu drivers even with no 3d
> acceleration.
>
> I will definitely buy second hand NVIDIA GPU because I am sick tired
> of AMDGPU stability problems with my RX580 and I dont believe modern
> built-in GPU will even work. After move to 14.2 (and drm-61-kmod) it
> does not panic as on 14.0/5.15 but it gets hiccups that require
> reboot.. far from 13.3/5.10 stability.. so I am considering just
> replacing GPU and use nvidia drivers :-(
>
> --
> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>
> Stability is excellent, at least with an Asus ProArt X670E-CREATOR WIFI and a Ryzen 7900. Some platform support bits are missing, such as a way to monitor CPU boost. You can, however, roughly monitor CPU frequency. Anything faster than 5600 MT/s for memory is overclocking, and in most cases of very little benefit. If you use 4 DIMMs, it goes down to 3600 with 2R sticks and possibly a bit faster with 1R (the official specs state 3600, however). Intel ARC is probably your best bet in the long run; however, 6.1 is too old even for the A-series.

Thank you Daniel :-) Stability is what I need so I can focus on work,
or else just switch to a Mac Studio and forget about an Open Source OS on the
desktop.. no problems with the M.2 controller, sound, Ethernet, etc.? :-)

So it's better to buy only 2 big RAM modules, as that will be faster than 4 modules?? o_O
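
For a rough sense of what the speed drop from 5600 to 3600 means, the theoretical peak memory bandwidth can be worked out from the transfer rate. This is only a back-of-the-envelope sketch: it assumes a dual-channel desktop platform with a 64-bit (8-byte) data path per channel, the function name is made up for illustration, and the rates plugged in are simply the figures mentioned in this thread. Real-world throughput will be lower.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes).

    Assumption: each channel moves `bytes_per_transfer` bytes per transfer
    (8 bytes for a 64-bit channel) and all channels are used in parallel.
    """
    bytes_per_second = mt_per_s * 1_000_000 * bytes_per_transfer * channels
    return bytes_per_second / 1e9


# Rates taken from the discussion above: planned 6000, non-overclocked 5600,
# and the 3600 quoted for four dual-rank DIMMs.
for rate in (6000, 5600, 3600):
    print(f"DDR5-{rate}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```

Under these assumptions, the gap between two modules at 5600 and four modules at 3600 is about a third of the theoretical peak. Whether that matters for build times depends on the workload, so treat it as an upper bound on the difference rather than a prediction.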

I will try an NVIDIA 1030 or 1060 GPU, which should be old enough to
work fine. I don't need anything more :-)

All modern mobos seem to be gaming only and have one or two PCI-E
slots, which is a bit scary as I have some extra cards and these will
be lost :-( But hey, these all now have WIFI with big antennas lol ;-)

These ASUS CREATOR boards are not available right now.. the price was ~$600..
similarly powerful ASUS mainboards cost ~$800 (STRIX or CROSSHAIR with
X870E), which is an acceptable difference for me to spend once per 10
years.. and there are much cheaper ones with B650/B850 and X670/X870
chipsets for just ~$200..300. What about the B650/B850/X670/X870 chipsets?
:-)

Thanks :-)
Tomek

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Manas Ghandat

2025-01-28 16:30:04 UTC

Hi. I was going through the GSoC project done in 2021 (
https://gist.github.com/simran-kathpalia/3491aef1cf4a9a0401da8eafda3e702a).
The main work done was on adding syscall descriptions and related pieces.
According to the status report you suggested, work related to KMSAN and
Linuxulator fuzzing remains. I wanted to know about the progress of these
tasks. Also, is there any updated source available for this project?

Thanks,
Manas

On Sun, Jan 5, 2025 at 3:59 PM Lorenzo Salvadore <
***@lorenzosalvadore.it> wrote:

> Hello,
>
> You might be interested in this:
> https://www.freebsd.org/status/report-2021-07-2021-09/syzkaller/
>
> There might be more recent news, but I could not find it. Maybe try
> reaching the people in the contact section and offer your help.
>
> Cheers,
>
> Lorenzo Salvadore
>
> Sent from Proton Mail Android
>
>
> -------- Original message --------
> 05/01/25 09:55, Manas Ghandat wrote:
>
> Hi,
>
> I was interested in the syzkaller (
> https://wiki.freebsd.org/SummerOfCodeIdeas#syzkaller_improvements) project.
> I wanted to know if there is stuff yet to be implemented so that I can add
> to it.
>
> Thanks,
> Manas
>
> On Sun, Jan 5, 2025 at 11:17 AM Ярослав <***@gmail.com>
> wrote:
>
>> hi, Manas,
>>
>> if there has already been a GSoC project and you plan to work on those ideas,
>> then which ideas are you talking about?
>>
>> all wishes
>>
>> Sat, 4 Jan 2025, 21:54 Manas Ghandat <***@gmail.com>:
>>
>>> Hi,
>>>
>>> I am Manas and I am interested in working on the syzkaller project ideas
>>> mentioned at
>>> https://wiki.freebsd.org/SummerOfCodeIdeas#syzkaller_improvements
>>>
>>> There has been a GSoC project in 2021 regarding the same. I wanted to
>>> know if this idea is implemented or if some parts are yet to be implemented
>>>
>>> Thanks,
>>> Manas
>>>
>>

Lorenzo Salvadore

2025-01-28 19:17:11 UTC

On Tuesday, January 28th, 2025 at 17:30, Manas Ghandat <***@gmail.com> wrote:

> Hi. I was going through the GSoC project done in 2021 (https://gist.github.com/simran-kathpalia/3491aef1cf4a9a0401da8eafda3e702a). The main work done was on adding syscall descriptions and related pieces. According to the status report you suggested, work related to KMSAN and Linuxulator fuzzing remains. I wanted to know about the progress of these tasks. Also, is there any updated source available for this project?
> Thanks,
> Manas

Hello,

Although the 2024Q4 status report has not been published yet, you can
already read the latest news about syzkaller in FreeBSD in the
2024Q4 syzkaller status report:
https://cgit.freebsd.org/doc/tree/website/content/en/status/report-2024-10-2024-12/syzkaller.adoc

It seems they are working on fuzzing wifi at the moment. I do not
know about syzkaller progress related to KMSAN or Linuxulator.

Cheers,

Lorenzo Salvadore

Mohammad Noureldin

2025-01-28 20:03:18 UTC

Hi Paul!

On Fri, Dec 27, 2024 at 6:47 PM Paul Floyd <***@gmail.com> wrote:
--snip--

>
> And after much hesitation I went back to just about my first choice, an
> Asus Rog Strix X870-A.
>
> We'll see how well it works in a few weeks.
>
> A+
> Paul
>
>
>
Out of curiosity, have you faced any issues similar to the ones reported in
[1] ?

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283186

--
Thanks
- Mohammad Noureldin
--
"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein

Paul Floyd

2025-01-28 20:55:09 UTC

On 28-01-25 21:03, Mohammad Noureldin wrote:
> Hi Paul!

> Out of curiosity, have you faced any issues similar to the ones reported
> in [1] ?
>
> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283186

Hi

I had no USB issues during installation. That was with a Logitech mouse,
Keychron Q6 Pro keyboard and some kind of Kingston USB stick for the install.

The only issue for the moment is graphics - I couldn't get amdgpu to
load so I fell back to using scfb which seems to work OK. I need to
figure out setting the mode automatically.

I'm running KDE and mainly using it for development.

A+
Paul

Sulev-Madis Silber

2025-01-30 12:03:46 UTC

what happens if you take the word llm out and put a human in there?

there are a ton of fbsd contributors and i often wonder if some of them bring something in. apparently there's no "code-id" where we can submit code for checks. i especially worry about all those linuxkpi things. where's the voluntary, no-consequences drug test that proves you didn't smoke any gpl before you opened your code editor?

it's like llm is right out but humans are all ok?
