Discussion:
Diagnosing virtual machine network issues
Alex Arslan
2024-06-28 15:53:36 UTC
Hello,

I originally posted the following to freebsd-questions but was encouraged
to repost here instead.

I work on the Julia language (https://julialang.org) and am the de facto
maintainer of its FreeBSD support. Our continuous integration runs jobs in
FreeBSD 13.2 AMD64 virtual machines with KVM on Linux. This same Linux
machine also runs Windows jobs in VMs with KVM as well as Linux jobs using
a custom sandboxing setup.

We've noticed a number of network-related issues that only occur on the
FreeBSD VMs and cause tests to fail. Currently we reliably see a test
failure that expects a host resolution failure via libcurl from
https://domain.invalid but on the FreeBSD VMs we instead get a timeout.
Previously we've also seen timeouts when making requests to httpbingo
and GitHub. However, I've never been able to reproduce any of these test
failures, which makes me suspect there's an issue with how we've set up
networking for the VMs.

Can anybody provide guidance for how to determine what, if anything, could
be misconfigured? I apologize for the vagueness of this question; I'm not
really familiar with anything networking- or virtualization-related, so
I'm not sure what information would be helpful to include here. The
complete setup lives in https://github.com/JuliaCI/sandboxed-buildkite-agent
in the freebsd-kvm directory. In base-image/freebsd13.pkr.hcl [1], which
uses Packer to build a base qcow2 image, we set net_device = "virtio-net".
In buildkite-worker/kvm_machine.xml.template [2], we set the target device
to vnet0 with bridge virbr0.

Thank you very much for your time!

Best,
Alex

[1]: https://github.com/JuliaCI/sandboxed-buildkite-agent/blob/main/freebsd-kvm/base-image/freebsd13.pkr.hcl
[2]: https://github.com/JuliaCI/sandboxed-buildkite-agent/blob/main/freebsd-kvm/buildkite-worker/kvm_machine.xml.template

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Alex Arslan
2024-07-19 16:08:54 UTC
I would start a pcap inside and outside of the VM for all UDP port 53 traffic as a start, to see if it's a network issue going out of the box. If it happens frequently and you think it might be the network, perhaps try the Intel em driver instead of the virtio network driver?
Thanks so much for your help!

I implemented your pcap suggestion with tcpdump; hopefully that's
correct. I ran tcpdump simultaneously on the host and in the VM, then ran
the code where libcurl gives a timeout rather than the expected domain
resolution failure. The output is below. I'm well out of my depth here:
what should I be looking for that would indicate a network issue going
out of the VM?

Linux host:
$ sudo /usr/sbin/tcpdump -v -i any 'host 192.168.122.35 and port 53'
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
21:06:03.320754 IP (tos 0x0, ttl 64, id 29048, offset 0, flags [none], proto UDP (17), length 60)
192.168.122.35.24119 > amdci6.domain: 23532+ A? domain.invalid. (32)
21:06:03.320754 IP (tos 0x0, ttl 64, id 29048, offset 0, flags [none], proto UDP (17), length 60)
192.168.122.35.24119 > amdci6.domain: 23532+ A? domain.invalid. (32)
21:06:03.321633 IP (tos 0x0, ttl 64, id 27798, offset 0, flags [none], proto UDP (17), length 73)
192.168.122.35.18137 > amdci6.domain: 61699+ PTR? 35.122.168.192.in-addr.arpa. (45)
21:06:03.321633 IP (tos 0x0, ttl 64, id 27798, offset 0, flags [none], proto UDP (17), length 73)
192.168.122.35.18137 > amdci6.domain: 61699+ PTR? 35.122.168.192.in-addr.arpa. (45)
21:06:03.321701 IP (tos 0x0, ttl 64, id 44762, offset 0, flags [DF], proto UDP (17), length 113)
amdci6.domain > 192.168.122.35.18137: 61699* 1/0/0 35.122.168.192.in-addr.arpa. PTR freebsd-debugging-amdci6-0. (85)
21:06:03.321707 IP (tos 0x0, ttl 64, id 44762, offset 0, flags [DF], proto UDP (17), length 113)
amdci6.domain > 192.168.122.35.18137: 61699* 1/0/0 35.122.168.192.in-addr.arpa. PTR freebsd-debugging-amdci6-0. (85)
21:06:03.322188 IP (tos 0x0, ttl 64, id 27799, offset 0, flags [none], proto UDP (17), length 72)
192.168.122.35.37631 > amdci6.domain: 23871+ PTR? 1.122.168.192.in-addr.arpa. (44)
21:06:03.322188 IP (tos 0x0, ttl 64, id 27799, offset 0, flags [none], proto UDP (17), length 72)
192.168.122.35.37631 > amdci6.domain: 23871+ PTR? 1.122.168.192.in-addr.arpa. (44)
21:06:08.446737 IP (tos 0x0, ttl 64, id 29049, offset 0, flags [none], proto UDP (17), length 60)
192.168.122.35.24119 > amdci6.domain: 23532+ A? domain.invalid. (32)
21:06:08.446737 IP (tos 0x0, ttl 64, id 29049, offset 0, flags [none], proto UDP (17), length 60)
192.168.122.35.24119 > amdci6.domain: 23532+ A? domain.invalid. (32)
21:06:18.567376 IP (tos 0x0, ttl 64, id 29050, offset 0, flags [none], proto UDP (17), length 60)
192.168.122.35.37009 > amdci6.domain: 36459+ AAAA? domain.invalid. (32)
21:06:18.567376 IP (tos 0x0, ttl 64, id 29050, offset 0, flags [none], proto UDP (17), length 60)
192.168.122.35.37009 > amdci6.domain: 36459+ AAAA? domain.invalid. (32)
21:06:23.671046 IP (tos 0x0, ttl 64, id 29051, offset 0, flags [none], proto UDP (17), length 60)
192.168.122.35.37009 > amdci6.domain: 36459+ AAAA? domain.invalid. (32)
21:06:23.671046 IP (tos 0x0, ttl 64, id 29051, offset 0, flags [none], proto UDP (17), length 60)
192.168.122.35.37009 > amdci6.domain: 36459+ AAAA? domain.invalid. (32)
^C
14 packets captured
20 packets received by filter
2 packets dropped by kernel

FreeBSD VM:
$ sudo tcpdump -v port 53
tcpdump: listening on vtnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:06:06.179751 IP (tos 0x0, ttl 64, id 29048, offset 0, flags [none], proto UDP (17), length 60)
freebsd-debugging-amdci6-0.24119 > amdci6.domain: 23532+ A? domain.invalid. (32)
21:06:06.180634 IP (tos 0x0, ttl 64, id 27798, offset 0, flags [none], proto UDP (17), length 73)
freebsd-debugging-amdci6-0.18137 > amdci6.domain: 61699+ PTR? 35.122.168.192.in-addr.arpa. (45)
21:06:06.180826 IP (tos 0x0, ttl 64, id 44762, offset 0, flags [DF], proto UDP (17), length 113)
amdci6.domain > freebsd-debugging-amdci6-0.18137: 61699* 1/0/0 35.122.168.192.in-addr.arpa. PTR freebsd-debugging-amdci6-0. (85)
21:06:06.181193 IP (tos 0x0, ttl 64, id 27799, offset 0, flags [none], proto UDP (17), length 72)
freebsd-debugging-amdci6-0.37631 > amdci6.domain: 23871+ PTR? 1.122.168.192.in-addr.arpa. (44)
21:06:06.194107 IP (tos 0x0, ttl 64, id 44764, offset 0, flags [DF], proto UDP (17), length 118)
amdci6.domain > freebsd-debugging-amdci6-0.37631: 23871 2/0/0 1.122.168.192.in-addr.arpa. PTR amdci6., 1.122.168.192.in-addr.arpa. PTR amdci6.local. (90)
21:06:11.305743 IP (tos 0x0, ttl 64, id 29049, offset 0, flags [none], proto UDP (17), length 60)
freebsd-debugging-amdci6-0.24119 > amdci6.domain: 23532+ A? domain.invalid. (32)
21:06:21.426439 IP (tos 0x0, ttl 64, id 29050, offset 0, flags [none], proto UDP (17), length 60)
freebsd-debugging-amdci6-0.37009 > amdci6.domain: 36459+ AAAA? domain.invalid. (32)
21:06:26.530138 IP (tos 0x0, ttl 64, id 29051, offset 0, flags [none], proto UDP (17), length 60)
freebsd-debugging-amdci6-0.37009 > amdci6.domain: 36459+ AAAA? domain.invalid. (32)
^C
8 packets captured
427 packets received by filter
0 packets dropped by kernel
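
One way to read the two captures above is to match each DNS query with a
reply that carries the same transaction ID (the number right after the
destination, e.g. 23532). The queries for domain.invalid show up on both the
VM and the host, so they do appear to leave the VM; what to look for is
whether an answer ever comes back. Below is a minimal Python sketch of that
pairing. The function name `unanswered_queries` and the regular expressions
are illustrative assumptions about tcpdump's `-v` text layout as it appears
in this thread, not a general parser.

```python
import re

# Matches the summary line tcpdump prints for a DNS packet, e.g.
#   "192.168.122.35.24119 > amdci6.domain: 23532+ A? domain.invalid. (32)"
# Group 1 is the DNS transaction ID; group 2 is the rest of the line.
DNS_LINE = re.compile(r"> \S+: (\d+)[+*%-]* (.*)$")
QUERY = re.compile(r"^([A-Z]+)\? (\S+)")      # e.g. "A? domain.invalid."
RESPONSE = re.compile(r"^\d+/\d+/\d+ ")       # e.g. "1/0/0 ..." record counts


def unanswered_queries(capture: str) -> dict:
    """Return {id: (qtype, name, times_sent)} for queries with no reply."""
    sent = {}
    answered = set()
    for line in capture.splitlines():
        m = DNS_LINE.search(line)
        if not m:
            continue  # packet header lines and tcpdump banners
        txid, rest = int(m.group(1)), m.group(2)
        q = QUERY.match(rest)
        if q:
            count = sent.get(txid, (None, None, 0))[2]
            sent[txid] = (q.group(1), q.group(2), count + 1)
        elif RESPONSE.match(rest):
            answered.add(txid)
    return {t: v for t, v in sent.items() if t not in answered}


# Excerpt of the FreeBSD VM capture from this post (summary lines only).
SAMPLE = """\
freebsd-debugging-amdci6-0.24119 > amdci6.domain: 23532+ A? domain.invalid. (32)
freebsd-debugging-amdci6-0.18137 > amdci6.domain: 61699+ PTR? 35.122.168.192.in-addr.arpa. (45)
amdci6.domain > freebsd-debugging-amdci6-0.18137: 61699* 1/0/0 35.122.168.192.in-addr.arpa. PTR freebsd-debugging-amdci6-0. (85)
freebsd-debugging-amdci6-0.24119 > amdci6.domain: 23532+ A? domain.invalid. (32)
freebsd-debugging-amdci6-0.37009 > amdci6.domain: 36459+ AAAA? domain.invalid. (32)
"""

if __name__ == "__main__":
    for txid, (qtype, name, n) in unanswered_queries(SAMPLE).items():
        print(f"id {txid}: {qtype} {name} sent {n}x, no reply")
```

Run over the full VM capture, this flags IDs 23532 (A) and 36459 (AAAA) for
domain.invalid as unanswered, while both PTR lookups (probably tcpdump's own
reverse lookups) received replies. With `-i any` on the host each packet is
printed twice, so the send counts there are doubled.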



Alex Arslan
2024-07-30 21:11:55 UTC
Can you provide more context? I'm not seeing earlier messages anywhere in my email folders. Is this a Qemu issue?
The original message is from just over a month ago, archived here:
https://lists.freebsd.org/archives/freebsd-hackers/2024-June/003378.html
Basically, we have FreeBSD 13.2 VMs running under KVM on a Linux machine.
Some code is using libcurl to make a request to an invalid domain and is
testing that the error is a resolution failure. This test passes on all
platforms except specifically in these FreeBSD VMs; I can't reproduce
locally on FreeBSD. That made me think that there's an issue with how the
VM was set up, prompting the original message and discussion. Then what
I recently found was that we set a 30-second timeout for the libcurl
request, which FreeBSD hits in the VM, as it evidently spends a full
30 seconds attempting to resolve the host, while e.g. Linux reports a
resolution failure immediately.
Coincidentally, I'm experimenting with FreeBSD under Qemu on my Mac Mini M1 and seeing about 93 Mbits/sec in iperf, regardless of the NIC configured (VM to bare metal host). Bare metal to bare metal shows 930 Mbits/sec.
That's interesting, can you show how you did that? I'm not familiar with
iperf (or most things in the realm of networking). Do you know why it's
so much slower?

Bakul Shah
2024-07-30 21:22:54 UTC
Post by Alex Arslan
Can you provide more context? I'm not seeing earlier messages anywhere in my email folders. Is this a Qemu issue?
https://lists.freebsd.org/archives/freebsd-hackers/2024-June/003378.html
Basically, we have FreeBSD 13.2 VMs running under KVM on a Linux machine.
Some code is using libcurl to make a request to an invalid domain and is
testing that the error is a resolution failure. This test passes on all
platforms except specifically in these FreeBSD VMs; I can't reproduce
locally on FreeBSD. That made me think that there's an issue with how the
VM was set up, prompting the original message and discussion. Then what
I recently found was that we set a 30-second timeout for the libcurl
request, which FreeBSD hits in the VM, as it evidently spends a full
30 seconds attempting to resolve the host, while e.g. Linux reports a
resolution failure immediately.
What does /etc/resolv.conf look like on the FreeBSD VM?

Alex Arslan
2024-07-30 21:53:28 UTC
Post by Bakul Shah
What does /etc/resolv.conf look like on the FreeBSD VM?
Just a comment and a name server line:

$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 192.168.122.1
Alex Arslan
2024-08-03 01:51:30 UTC
Post by Bakul Shah
Post by Alex Arslan
Post by Alex Arslan
$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 192.168.122.1
I believe that is the host IP, so I guess the VM is using the host for DNS
resolution? Interestingly, if I add `nameserver 8.8.8.8` below the line
with the host IP, it takes 10 seconds rather than 30 to reach the expected
domain resolution failure. If I put 8.8.8.8 above the host IP, the domain
resolution failure is instantaneous.
What does your host use as a nameserver?
The nameserver is 127.0.0.53. It sets options edns0 and trust-ad, and
includes a search entry as well.
Post by Bakul Shah
Post by Alex Arslan
Not a particularly satisfying conclusion to this saga as I don't understand
why it's happening but at least I have a workaround that should hopefully
do the job. I really appreciate everyone's help and input thus far!
What's the best way to add `nameserver 8.8.8.8` to /etc/resolv.conf as
part of the VM's configuration?
You should diagnose the problem of the nameserver at 192.168.122.1
and fix it to act properly. I don't use vm (just bhyve) so can't help
you with its config.
I do still plan to try to figure out what the actual issue is, but I also
now have a path forward in the meantime. :)

Rodney W. Grimes
2024-08-10 16:11:22 UTC
Post by Alex Arslan
Post by Bakul Shah
Post by Alex Arslan
Post by Alex Arslan
$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 192.168.122.1
I believe that is the host IP, so I guess the VM is using the host for DNS
resolution? Interestingly, if I add `nameserver 8.8.8.8` below the line
with the host IP, it takes 10 seconds rather than 30 to reach the expected
domain resolution failure. If I put 8.8.8.8 above the host IP, the domain
resolution failure is instantaneous.
What does your host use as a nameserver?
The nameserver is 127.0.0.53. It sets options edns0 and trust-ad, and
includes a search entry as well.
First, is that a typo and you mean 127.0.0.1:53?
Second, is that name server locked to 127.0.0.1, or is it
actually listening on *:53? If it is LOCKED, you have no name server
running on 192.168.122.1 to be reached by the VM. If it is NOT locked,
can the guest ping 192.168.122.1, and can it reach DNS at that IP on
port 53? Can the host send a packet BACK to the guest?

Third, you can "fix" the "nameserver 192.168.122.1" entry in /etc/resolv.conf
by configuring the DHCP server that handed out the lease to the VM to send
a nameserver entry of 8.8.8.8.
--
Rod Grimes ***@freebsd.org


Alex Arslan
2024-08-13 15:45:38 UTC
Hi Rodney,
Post by Rodney W. Grimes
Post by Alex Arslan
Post by Bakul Shah
Post by Alex Arslan
Post by Alex Arslan
$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 192.168.122.1
I believe that is the host IP, so I guess the VM is using the host for DNS
resolution? Interestingly, if I add `nameserver 8.8.8.8` below the line
with the host IP, it takes 10 seconds rather than 30 to reach the expected
domain resolution failure. If I put 8.8.8.8 above the host IP, the domain
resolution failure is instantaneous.
What does your host use as a nameserver?
The nameserver is 127.0.0.53. It sets options edns0 and trust-ad, and
includes a search entry as well.
First, is that a typo and you mean 127.0.0.1:53?
No, the host's /etc/resolv.conf has `nameserver 127.0.0.53`, I just went
back and rechecked to be sure.
Post by Rodney W. Grimes
Second, is that name server locked to 127.0.0.1, or is it
actually listening on *:53? If it is LOCKED, you have no name server
running on 192.168.122.1 to be reached by the VM. If it is NOT locked,
can the guest ping 192.168.122.1, and can it reach DNS at that IP on
port 53? Can the host send a packet BACK to the guest?
I apologize but I don't really know enough about these things to know how
to answer your question. I did post the output of tcpdump on the VM and
the host a while back but that was for the invalid request, so that
probably doesn't capture what you're describing.
Post by Rodney W. Grimes
Third, you can "fix" the "nameserver 192.168.122.1" entry in /etc/resolv.conf
by configuring the DHCP server that handed out the lease to the VM to send
a nameserver entry of 8.8.8.8.
If I understand correctly, that is indeed what we've done as a Band-Aid fix
for the time being: I added the line `prepend_nameservers=8.8.8.8` to
/etc/resolvconf.conf.
Bakul Shah
2024-08-13 16:15:05 UTC
Alex Arslan
2024-08-14 16:15:22 UTC
This weird 127. address seems like a systemd feature/bug thing: https://unix.stackexchange.com/questions/612416/why-does-etc-resolv-conf-point-at-127-0-0-53
This behavior seems like some strange interaction between systemd assumptions and freebsd’s, or something not being set up quite right on the linux side when the vm is running freebsd.
Could libvirt be a factor here, do you think? For example, perhaps the
network should be configured differently than the default when the host
is using systemd-resolved and/or when the guest is FreeBSD. In the network
XML format for libvirt (https://libvirt.org/formatnetwork.html), there is
a `domain` element with a `localOnly` attribute that I have seen set by
some virtualization projects. As far as I can tell, our setup isn't using
the `domain` element at all.
Rodney W. Grimes
2024-08-14 18:29:38 UTC
Post by Alex Arslan
This weird 127. address seems like a systemd feature/bug thing: https://unix.stackexchange.com/questions/612416/why-does-etc-resolv-conf-point-at-127-0-0-53
This behavior seems like some strange interaction between systemd assumptions and freebsd’s, or something not being set up quite right on the linux side when the vm is running freebsd.
Could libvirt be a factor here, do you think? For example, perhaps the
network should be configured differently than the default when the host
is using systemd-resolved and/or when the guest is FreeBSD. In the network
XML format for libvirt (https://libvirt.org/formatnetwork.html), there is
a `domain` element with a `localOnly` attribute that I have seen set by
some virtualization projects. As far as I can tell, our setup isn't using
the `domain` element at all.
Having a /etc/resolv.conf entry of 127.0.0.53 is indeed something
out of the normal on a freebsd box. You need to find where that
is coming from and why that value is used.
--
Rod Grimes ***@freebsd.org


Alex Arslan
2024-08-14 23:38:04 UTC
Post by Rodney W. Grimes
Post by Alex Arslan
This weird 127. address seems like a systemd feature/bug thing: https://unix.stackexchange.com/questions/612416/why-does-etc-resolv-conf-point-at-127-0-0-53
This behavior seems like some strange interaction between systemd assumptions and freebsd’s, or something not being set up quite right on the linux side when the vm is running freebsd.
Could libvirt be a factor here, do you think? For example, perhaps the
network should be configured differently than the default when the host
is using systemd-resolved and/or when the guest is FreeBSD. In the network
XML format for libvirt (https://libvirt.org/formatnetwork.html), there is
a `domain` element with a `localOnly` attribute that I have seen set by
some virtualization projects. As far as I can tell, our setup isn't using
the `domain` element at all.
Having a /etc/resolv.conf entry of 127.0.0.53 is indeed something
out of the normal on a freebsd box. You need to find where that
is coming from and why that value is used.
The 127.0.0.53 entry in /etc/resolv.conf is on the Linux host machine,
not on the FreeBSD VM. The host is using `systemd-resolved` for managing
its /etc/resolv.conf. In the VM, /etc/resolv.conf has the host IP by
default, and we added 8.8.8.8 so that it wouldn't take a full 30 seconds
to report a domain resolution failure.
Alex Arslan
2024-08-15 05:50:55 UTC
Post by Bakul Shah
Post by Alex Arslan
Post by Bakul Shah
You may want to run tcpdump on the host and at the same time
on a linux VM and see what happens. You can do the same thing
for a freebsd VM to try to narrow down where the problem lies.
I actually did that exactly a while back at the suggestion of someone on
this mailing list, and I posted the results to the thread:
https://lists.freebsd.org/archives/freebsd-hackers/2024-July/003409.html
That doesn't show Linux VM and Linux Host, only FreeBSD VM and Linux Host.
Oh, my apologies, I misread. Unfortunately I'm not able to install
additional VMs on this machine.

Bakul Shah
2024-08-14 23:53:19 UTC
In the VM, /etc/resolv.conf has the host IP by default
/etc/resolv.conf should always point to a dns server. Is the
host running a DNS service? If it is, it should respond pretty
quickly for a nonexistent hostname query. Why doesn't it?
If it is not running a DNS service, how did you arrive at this
decision to point to the host?

You may want to run tcpdump on the host and at the same time
on a linux VM and see what happens. You can do the same thing
for a freebsd VM to try to narrow down where the problem lies.



Alex Arslan
2024-08-15 04:41:29 UTC
Post by Bakul Shah
In the VM, /etc/resolv.conf has the host IP by default
/etc/resolv.conf should always point to a dns server. Is the
host running a DNS service? If it is, it should respond pretty
quickly for a nonexistent hostname query. Why doesn't it?
If it is not running a DNS service, how did you arrive at this
decision to point to the host?
I didn't set it explicitly, it's what got configured automatically with
`sysrc ifconfig_DEFAULT=SYNCDHCP`. I'm unsure whether the host is running
a DNS service, and to be honest I don't know how to tell. `resolvectl`
on the host says the current DNS server is 8.8.8.8. It also lists 1.1.1.1
as an available server, as well as 4.4.4.4 as a fallback.
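
One way to tell whether something at 192.168.122.1 is answering DNS is to
send it a single query from the guest and time the reply. The sketch below
hand-builds a minimal query per RFC 1035 with Python's standard library, so
it needs no extra tools. `build_query`, `rcode_of`, and `probe` are
hypothetical helper names, and the 2-second timeout is an arbitrary choice.

```python
import socket
import struct
import time


def build_query(name, qtype=1, txid=0x1234):
    """Encode a minimal DNS query (RFC 1035) for `name`; qtype 1 is A."""
    # Header: ID, flags (only RD set), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Question: length-prefixed labels, root label, QTYPE, QCLASS=IN.
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def rcode_of(reply):
    """Return the RCODE (low 4 bits of the flags word), or None if too short."""
    if len(reply) < 4:
        return None
    return struct.unpack("!H", reply[2:4])[0] & 0x000F


def probe(server, name, timeout=2.0, port=53):
    """Send one UDP query; return (rcode, seconds), rcode None on timeout."""
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return None, time.monotonic() - start
    return rcode_of(reply), time.monotonic() - start


if __name__ == "__main__":
    # Addresses taken from this thread: the libvirt-side resolver and 8.8.8.8.
    for server in ("192.168.122.1", "8.8.8.8"):
        print(server, probe(server, "domain.invalid"))
```

Run from the VM, an RCODE of 3 (NXDOMAIN) within milliseconds means the
server answers as expected for domain.invalid, while `None` after the full
timeout means the query went unanswered.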
Post by Bakul Shah
You may want to run tcpdump on the host and at the same time
on a linux VM and see what happens. You can do the same thing
for a freebsd VM to try to narrow down where the problem lies.
I actually did that exactly a while back at the suggestion of someone on
this mailing list, and I posted the results to the thread:
https://lists.freebsd.org/archives/freebsd-hackers/2024-July/003409.html

Bakul Shah
2024-08-15 05:13:56 UTC
Post by Alex Arslan
Post by Bakul Shah
You may want to run tcpdump on the host and at the same time
on a linux VM and see what happens. You can do the same thing
for a freebsd VM to try to narrow down where the problem lies.
I actually did that exactly a while back at the suggestion of someone on
this mailing list, and I posted the results to the thread:
https://lists.freebsd.org/archives/freebsd-hackers/2024-July/003409.html
That doesn't show Linux VM and Linux Host, only FreeBSD VM and Linux Host.

Alex Arslan
2024-07-30 16:15:40 UTC
Some progress on this front...
If it happens frequently and you think it might be the network, perhaps try the Intel em driver instead of the virtio network driver?
I forgot to mention, we swapped the virtio network device to the Intel
e1000 device but haven't observed any difference.

I did manage to narrow down the issue to a more minimal reproducer, which
is just a simple request to the invalid domain using libcurl, but with a
30-second timeout. (The Julia code sets that timeout internally.) I tried
compiling the reproducer and running the executable wrapped with `time`
separately in the FreeBSD VM and on the Linux host. In the VM, it actually
sits and waits out the 30 seconds then errors with a timeout, and `time`
reports a real time of 30.289 seconds. On the host, it's instantaneous;
the resolution failure is reported immediately. I then tried setting the
timeout in the VM to 31 seconds, and it produces the expected resolution
failure with a real time of 30.420 seconds. I don't understand what's
going on here...
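One way to separate the name lookup from the rest of the libcurl request is to time the lookup on its own. The following is only a minimal sketch, not the reproducer described above: it is written in Python rather than C, the function name `time_lookup` is hypothetical, and the `resolver` parameter exists only so the function can be exercised without a network.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, error_or_None) for a single name lookup.

    By default the system resolver is used via socket.getaddrinfo.
    `resolver` is a hypothetical hook for substituting a fake in tests.
    """
    start = time.monotonic()
    try:
        resolver(host, 443)
        error = None
    except socket.gaierror as exc:
        # A resolution failure is the expected outcome for an invalid
        # domain; keep the exception so the caller can inspect it.
        error = exc
    return time.monotonic() - start, error
```

Calling `time_lookup("domain.invalid")` in the FreeBSD VM and again on the Linux host should show whether the roughly 30 seconds are spent in name resolution itself, independent of libcurl's timeout handling.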



Jason Bacon
2024-07-31 12:45:55 UTC
Permalink
Post by Jason Bacon
Coincidentally, I'm experimenting with FreeBSD under Qemu on my Mac Mini
M1 and seeing about 93 Mbits/sec in iperf, regardless of the NIC
configured. (VM to bare metal host.) Bare metal to bare metal shows
930 Mbits/sec.
Never mind. There was an intermittent issue of some sort, possibly a
loose Ethernet cable. macOS may have been falling back on WiFi during
my VM tests. I'm now seeing native NIC performance under Qemu with
`-nic user,model=virtio`:

<<<***@tarponvm.acadix>>> ~ 16 # iperf -c barracuda
------------------------------------------------------------
Client connecting to barracuda, TCP port 5001
TCP window size: 32.8 KByte (default)
------------------------------------------------------------
[ 1] local 10.0.2.15 port 60623 connected with 192.168.0.48 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.01 sec 1.09 GBytes 939 Mbits/sec

So Qemu is totally viable for running a high-performance FreeBSD VM on
Apple Silicon.

--
Life is a game. Play hard. Play fair. Have fun.
Alex Arslan
2024-08-02 22:52:55 UTC
Permalink
Post by Alex Arslan
Post by Bakul Shah
Post by Alex Arslan
Can you provide more context? I'm not seeing earlier messages anywhere in my email folders. Is this a Qemu issue?
https://lists.freebsd.org/archives/freebsd-hackers/2024-June/003378.html
Basically, we have FreeBSD 13.2 VMs running under KVM on a Linux machine.
Some code is using libcurl to make a request to an invalid domain and is
testing that the error is a resolution failure. This test passes on all
platforms except specifically in these FreeBSD VMs; I can't reproduce
locally on FreeBSD. That made me think that there's an issue with how the
VM was set up, prompting the original message and discussion. Then what
I recently found was that we set a 30-second timeout for the libcurl
request, which FreeBSD hits in the VM, as it evidently spends a full
30 seconds attempting to resolve the host, while e.g. Linux reports a
resolution failure immediately.
What does /etc/resolv.conf look like on the FreeBSD VM?
$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 192.168.122.1
I believe that is the host IP, so I guess the VM is using the host for DNS
resolution? Interestingly, if I add `nameserver 8.8.8.8` below the line
with the host IP, it takes 10 seconds rather than 30 to reach the expected
domain resolution failure. If I put 8.8.8.8 above the host IP, the domain
resolution failure is instantaneous.

Not a particularly satisfying conclusion to this saga as I don't understand
why it's happening but at least I have a workaround that should hopefully
do the job. I really appreciate everyone's help and input thus far!

What's the best way to add `nameserver 8.8.8.8` to /etc/resolv.conf as
part of the VM's configuration?
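The ordering effect described above (8.8.8.8 listed first resolves instantly, listed second takes 10 seconds) can be expressed as a small transformation on the file's contents. This is only a sketch of that reordering, with a hypothetical function name `prefer_nameserver`. It works on text and does not write to /etc/resolv.conf, which according to its header comment is generated by resolvconf and so would likely be overwritten on the next DHCP update.

```python
def prefer_nameserver(resolv_conf_text, server):
    """Return resolv.conf text with `server` as the first nameserver entry."""
    lines = resolv_conf_text.splitlines()
    # Drop any existing entry for this server so it is not listed twice.
    kept = [ln for ln in lines if ln.split() != ["nameserver", server]]
    # Insert before the first remaining nameserver line, or at the end
    # if the file has no nameserver lines at all.
    index = next(
        (i for i, ln in enumerate(kept) if ln.split()[:1] == ["nameserver"]),
        len(kept),
    )
    kept.insert(index, f"nameserver {server}")
    return "\n".join(kept) + "\n"
```

Applied to the file shown above with `server="8.8.8.8"`, this yields the arrangement that produced the instantaneous resolution failure: the comment line, then 8.8.8.8, then 192.168.122.1.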
Bakul Shah
2024-08-03 00:58:44 UTC
Permalink
Post by Alex Arslan
Post by Alex Arslan
$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 192.168.122.1
I believe that is the host IP, so I guess the VM is using the host for DNS
resolution? Interestingly, if I add `nameserver 8.8.8.8` below the line
with the host IP, it takes 10 seconds rather than 30 to reach the expected
domain resolution failure. If I put 8.8.8.8 above the host IP, the domain
resolution failure is instantaneous.
What does your host use as a nameserver?
Post by Alex Arslan
Not a particularly satisfying conclusion to this saga as I don't understand
why it's happening but at least I have a workaround that should hopefully
do the job. I really appreciate everyone's help and input thus far!
What's the best way to add `nameserver 8.8.8.8` to /etc/resolv.conf as
part of the VM's configuration?
You should diagnose the problem of the nameserver at 192.168.122.1
and fix it to act properly. I don't use vm (just bhyve) so can't help
you with its config.
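One possible starting point for diagnosing the nameserver at 192.168.122.1 is to query it directly, bypassing /etc/resolv.conf, and see whether it answers NXDOMAIN promptly, answers with something else, or does not answer at all. The sketch below is an assumption-laden illustration, not a tool mentioned in this thread: the function names, the 5-second timeout, and the choice of a plain UDP query for an A record on port 53 are all arbitrary, and it does not check that the response ID matches the query.

```python
import socket
import struct
import time

# A few common response codes from the DNS header (RFC 1035).
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record in wire format."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Question: encoded name, root label, QTYPE=A (1), QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def response_code(packet):
    """Return the RCODE name from the header of a DNS response."""
    flags = struct.unpack("!H", packet[2:4])[0]
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode))


def probe(server, name="domain.invalid", timeout=5.0):
    """Send one UDP query to `server`; return (result, elapsed_seconds)."""
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            packet, _ = sock.recvfrom(512)
            result = response_code(packet)
        except socket.timeout:
            result = "timeout"
        except OSError as exc:
            result = f"error: {exc}"
    return result, time.monotonic() - start
```

Running `probe("192.168.122.1")` and `probe("8.8.8.8")` from inside the FreeBSD VM, and comparing both the result and the elapsed time, should indicate whether the server on the host side of the bridge is slow or silent for nonexistent names.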




Jason Bacon
2024-07-30 20:27:53 UTC
Permalink
Post by Alex Arslan
Some progress on this front...
If it happens frequently and you think it might be the network, perhaps try with the Intel em driver instead of the virtio network driver?
I forgot to mention, we swapped the virtio network device to the Intel
e1000 device but haven't observed any difference.
I did manage to narrow down the issue to a more minimal reproducer, which
is just a simple request to the invalid domain using libcurl, but with a
30-second timeout. (The Julia code sets that timeout internally.) I tried
compiling the reproducer and running the executable wrapped with `time`
separately in the FreeBSD VM and on the Linux host. In the VM, it actually
sits and waits out the 30 seconds then errors with a timeout, and `time`
reports a real time of 30.289 seconds. On the host, it's instantaneous;
the resolution failure is reported immediately. I then tried setting the
timeout in the VM to 31 seconds, and it produces the expected resolution
failure with a real time of 30.420 seconds. I don't understand what's
going on here...
Can you provide more context? I'm not seeing earlier messages anywhere
in my email folders. Is this a Qemu issue?

Coincidentally, I'm experimenting with FreeBSD under Qemu on my Mac Mini
M1 and seeing about 93 Mbits/sec in iperf, regardless of the NIC
configured. (VM to bare metal host.) Bare metal to bare metal shows
930 Mbits/sec.

--
Life is a game. Play hard. Play fair. Have fun.
Jason Bacon
2024-07-30 22:54:17 UTC
Permalink
Post by Alex Arslan
Can you provide more context? I'm not seeing earlier messages anywhere in my email folders. Is this a Qemu issue?
https://lists.freebsd.org/archives/freebsd-hackers/2024-June/003378.html
Basically, we have FreeBSD 13.2 VMs running under KVM on a Linux machine.
Some code is using libcurl to make a request to an invalid domain and is
testing that the error is a resolution failure. This test passes on all
platforms except specifically in these FreeBSD VMs; I can't reproduce
locally on FreeBSD. That made me think that there's an issue with how the
VM was set up, prompting the original message and discussion. Then what
I recently found was that we set a 30-second timeout for the libcurl
request, which FreeBSD hits in the VM, as it evidently spends a full
30 seconds attempting to resolve the host, while e.g. Linux reports a
resolution failure immediately.
Coincidentally, I'm experimenting with FreeBSD under Qemu on my Mac Mini M1 and seeing about 93 Mbits/sec in iperf, regardless of the NIC configured. (VM to bare metal host.) Bare metal to bare metal shows 930 Mbits/sec.
That's interesting, can you show how you did that? I'm not familiar with
iperf (or most things in the realm of networking). Do you know why it's
so much slower?
Typical use is very simple:

On one machine: iperf -s
On the other: iperf -c hostname-of-1st-machine

The iperf man page shows other options if you want to fine-tune.
--
Life is a game. Play hard. Play fair. Have fun.



Jason Bacon
2024-08-03 11:53:45 UTC
Permalink
Post by Alex Arslan
What's the best way to add `nameserver 8.8.8.8` to /etc/resolv.conf as
part of the VM's configuration?
You might be looking for resolvconf.conf. Not enough information posted
to be sure, but that's where you override name servers auto-configured
by DHCP.
--
Life is a game. Play hard. Play fair. Have fun.


