Mark Millard
2023-11-10 01:26:57 UTC
Reading some benchmark results for compilation activity that showed some
SMT vs. non-SMT examples, and also using my C++ variant of the old HINT
benchmark, I ended up curious how a non-SMT, from-scratch bulk -a would
turn out (ZFS context) compared to my prior SMT-based run.
I use a high load average style of bulk -a activity that has USE_TMPFS=all
involved. The system has 96 GiBytes of RAM (total across the 2 DIMMs).
The original, under-1.5-day run definitely had significant swap space use
(RAM+SWAP = 96 GiBytes + 364 GiBytes == 460 GiBytes == 471040 MiBytes).
The media was (and is) a PCIe based Optane 905P 1.5T. ZFS is on a single
partition on the single drive, with ZFS used just for bectl reasons, not
the other typical use-ZFS reasons. I've not controlled the ARC size range
explicitly. So less swap partition use is part of what contributes to the results.
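(For reference: if one did want to bound the ARC explicitly, the usual
tunable would be something like the sketch below. This is a knob I did
not use for these runs, and the size shown is only illustrative.)

# /boot/loader.conf sketch, not used here; value is in bytes (24 GiB shown):
vfs.zfs.arc_max="25769803824"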
The original bulk -a spent a couple of hours at the end just fetching
and building textproc/stardict-quick . Since then I have not cleared
out /usr/ports/distfiles or updated anything.
So fetch time is also a difference here.
SMT (32 hardware threads, original bulk -a):
[33:10:00] [32] [04:37:23] Finished emulators/libretro-mame | libretro-mame-20220124_1: Success
[35:36:51] [23] [03:44:04] Finished textproc/stardict-quick | stardict-quick-2.4.2_9: Success
. . .
[main-amd64-bulk_a-default] [2023-11-01_07h14m50s] [committing:] Queued: 34683 Built: 33826 Failed: 179 Skipped: 358 Ignored: 320 Fetched: 0 Tobuild: 0 Time: 35:37:55
Swap-involved MaxObs (Max Observed) figures:
173310Mi MaxObsUsed
256332Mi MaxObs(Act+Lndry+SwapUsed)
265551Mi MaxObs(Act+Wir+Lndry+SwapUsed)
(So 265551Mi of 471040Mi RAM+SWAP.)
Just-RAM MaxObs figures:
81066Mi MaxObsActive
(Given the complications of getting usefully comparable wired figures for ZFS (ARC), I omit that figure.)
94493Mi MaxObs(Act+Wir+Lndry)
Note: MaxObs(A+B+C) <= MaxObs(A)+MaxObs(B)+MaxObs(C), since the individual maxima need not occur at the same point in time.
ALLOW_MAKE_JOBS=yes was used, with no explicit restriction on PARALLEL_JOBS
or MAKE_JOBS_NUMBER (or analogous). So 32 builders were allowed, each allowed
32 make jobs. This explains the high load averages of the bulk -a (a
configuration sketch follows the load-average figures):
load averages . . . MaxObs: 360.70, 267.63, 210.84
(Those need not all be from the same time frame during the bulk -a .)
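For context, the settings involved amount to something like the following
sketch (assumed default poudriere file locations; the commented-out lines
only illustrate what capping would look like and were not used):

# /usr/local/etc/poudriere.conf (assumed default location)
USE_TMPFS=all
# PARALLEL_JOBS defaults to the hardware thread count (32 with SMT here);
# something like the following would cap the number of builders:
#PARALLEL_JOBS=16

# /usr/local/etc/poudriere.d/make.conf (assumed default location)
ALLOW_MAKE_JOBS=yes
# with MAKE_JOBS_NUMBER unset, each builder may use up to 32 make jobs:
#MAKE_JOBS_NUMBER=8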
As for the ports vintage:
# ~/fbsd-based-on-what-commit.sh -C /usr/ports/
6ec8e3450b29 (HEAD -> main, freebsd/main, freebsd/HEAD) devel/sdts++: Mark DEPRECATED
Author: Muhammad Moinur Rahman <***@FreeBSD.org>
Commit: Muhammad Moinur Rahman <***@FreeBSD.org>
CommitDate: 2023-10-21 19:01:38 +0000
branch: main
merge-base: 6ec8e3450b29462a590d09fb0b07ed214d456bd5
merge-base: CommitDate: 2023-10-21 19:01:38 +0000
n637598 (--first-parent --count for merge-base)
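For anyone without that script, a rough stock-git approximation of the
same sort of information (a sketch, not what the script actually does):

# run against the ports tree checkout:
git -C /usr/ports log -1 --format='%h%d %s%nCommitDate: %cI'
git -C /usr/ports rev-list --first-parent --count HEAD  # the n###### style count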
I do have an environment that avoids various LLVM builds taking
as long to build (a make.conf sketch follows the list below):
llvm1[3-7] : no MLIR, no FLANG
llvm1[4-7] : use BE_NATIVE
other llvm* : use defaults (so, no avoidance)
I also prevent the builds from using strip on most of the install
materials built (not just toolchain materials).
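A make.conf port-options sketch of that sort of setup (the knob names are
assumptions based on the usual devel/llvm* OPTIONS and ports mechanisms,
not a copy of my actual files):

# /usr/local/etc/poudriere.d/make.conf (illustrative only)
devel_llvm15_UNSET+= MLIR FLANG
devel_llvm15_SET+=   BE_NATIVE
# ...and similarly for the other llvm1[3-7] / llvm1[4-7] ports.
# Leaving STRIP empty is one way to keep installs from stripping
# (an assumption about mechanism, not necessarily what I use):
STRIP=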
non-SMT (16 hardware threads):
Note that one builder (math/fricas), the last still present, was
stuck, and I had to kill processes to make it stop, since I was not
willing to wait out my large timeout figures. The last builder to
finish normally was:
[39:48:10] [09] [00:16:23] Finished devel/gcc-msp430-ti-toolchain | gcc-msp430-ti-toolchain-9.3.1.2.20210722_1: Success
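(For reference, one common way to get the non-SMT configuration on
FreeBSD/amd64, an assumption about mechanism rather than a record of
what was done here, is the loader tunable below; disabling SMT in the
firmware setup is another.)

# /boot/loader.conf sketch: do not use the 2nd hardware thread per core
machdep.hyperthreading_allowed="0"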
So, trying to place some bounds for comparing SMT (32 hw threads)
and non-SMT (16 hw threads):
33:10:00 SMT -> 39:48:10 non-SMT would be over 6.5 hrs longer for non-SMT (6:38:10 longer)
35:36:51 SMT -> 39:48:10 non-SMT would be over 4 hrs longer for non-SMT (4:11:19 longer)
As for SMT vs. non-SMT Maximum Observed figures:
SMT load averages . . . MaxObs: 360.70, 267.63, 210.84
non-SMT load averages . . . MaxObs: 152.89, 100.94, 76.28
Swap-involved MaxObs figures for SMT (32 hw threads) vs not (16):
173310Mi vs. 33003Mi MaxObsUsed
256332Mi vs. 117221Mi MaxObs(Act+Lndry+SwapUsed)
265551Mi vs. 124776Mi MaxObs(Act+Wir+Lndry+SwapUsed)
Just-RAM MaxObs figures for SMT (32 hw threads) vs not (16):
81066Mi vs. 69763Mi MaxObsActive
(Given the complications of getting usefully comparable wired figures for ZFS (ARC), I omit that figure.)
94493Mi vs. 94303Mi MaxObs(Act+Wir+Lndry)
===
Mark Millard
marklmi at yahoo.com