Shrikanth Kamath
2023-08-03 07:32:22 UTC
Some background on the query:
I am trying to catch the trigger of a memory "spike" using DTrace. For reference, here are two "top"
snapshots captured within a 2 minute window:
last pid: 89900;  load averages: 0.75, 0.91, 0.94  up 39+00:37:30  20:03:14
Mem: 5575M Active, 2152M Inact, 4731M Laundry, 3044M Wired, 1151M Buf, 382M Free
Swap: 8192M Total, 1058M Used, 7134M Free, 12% Inuse
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C    TIME    WCPU COMMAND
12043 root        5  35    0    11G  9747M kqread  3  128.8H  23.34% app1
12051 root        1  20    0  3089M  2274M select  1   22:51   0.00% app2

last pid: 90442;  load averages: 1.50, 1.12, 1.02  up 39+00:39:37  20:05:21
Mem: 8549M Active, 631M Inact, 3340M Laundry, 3159M Wired, 1252M Buf, 359M Free
Swap: 8192M Total, 1894M Used, 6298M Free, 23% Inuse
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C    TIME    WCPU COMMAND
12043 root        5  24    0    11G  9445M kqread  2  128.8H  10.45% app1
12051 root        1  20    0  3089M  2173M select  3   22:51   0.00% app2
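
To make the movement between the two readings easier to see, the per-queue deltas can be tallied with a few lines of Python. This is only a sketch: the numbers are copied by hand from the two "Mem:" lines above, Buf is left out, and the helper name queue_deltas is made up for illustration.

```python
# Memory-queue sizes in MB, copied by hand from the two "Mem:" lines above.
before = {"Active": 5575, "Inact": 2152, "Laundry": 4731, "Wired": 3044, "Free": 382}
after = {"Active": 8549, "Inact": 631, "Laundry": 3340, "Wired": 3159, "Free": 359}


def queue_deltas(first, second):
    """Return second - first for every queue named in the first snapshot."""
    return {name: second[name] - first[name] for name in first}


deltas = queue_deltas(before, after)
for name, change in deltas.items():
    # Signed change in MB for each queue.
    print(f"{name:8s} {change:+6d}M")
```

By this tally Active grows by 2974M while Inact and Laundry together shrink by 2912M. That fits a reshuffle between queues, although the totals alone do not show which pages moved or on whose behalf.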
The spike is ~3G in Active pages; the two large applications have a
combined resident size of ~12G. The resident size of the applications
hasn't changed between these two readings. However, a tar archive and
gzip of a large directory ran during that window, likely causing a reshuffle. If
I count the page allocs and dequeues by execname with DTrace, I see
tar/vmstat, which probably alloc and quickly dequeue, along with a large
number of dequeues performed by bufdaemon and pagedaemon.
fbt::vm_page_alloc*:entry
{
        @cnt[execname] = count();
}

fbt::vm_page_dequeue:entry
{
        @dcnt[execname] = count();
}
Page Alloc
  vmstat          20222
  tar             21284
Page Dequeue
  vmstat          20114
  bufdaemon       21402
  tar             21635
  pagedaemon     360387
Since tar / vmstat will not hold their pages in Active, I need to find
out which application had its pages queued in the Active page queue.
Is it possible that the system is just moving the LRU pages of these two
large applications into the inactive queue prior to addressing memory
pressure? Do these applications then need those pages again later, which
brings them back into the Active queue? How do I watch this in
action using DTrace? Will the following probe catch this trigger?
fbt::vm_page_activate:entry
{
        @cnt[execname, pid] = count();
}

tick-10sec
{
        printa(@cnt);
        printf("ACTIVE[%d] pages\n", `vm_dom[0].vmd_pagequeues[1].pq_cnt);
}
*** This system is running only one vmdomain (# sysctl vm.ndomains ->
vm.ndomains: 1).
*** It is running release 12.1 on an amd64 kernel. The physical memory installed
is 16G.
Regards,
--
Shrikanth R K