Collecting metrics¶
This page describes the methods of collecting Procfs [1] process (and thread) metrics implemented in Procpath. Analysing an issue with Procpath is usually a 2-step process:
collect data on relevant processes
analyse the collected data visually and/or with SQL
Procpath can collect process metrics from any Linux system that can run Python, which includes Android (e.g. via Termux [2]), arm64 NAS devices, GitLab pipeline jobs, containers, and usual server and desktop machines.
Snapshot¶
procpath query provides JSON point-in-time slices of the process tree running on the target Linux system. It’s useful for answering:
specific questions about a process or the fields in its JSON document (how many open file descriptors does this process have?),
$ procpath query -f stat,fd --indent 2 \
  '$..children[?(@.stat.pid == 42 and @.pop\("children", 1\))]..fd'
[
  {
    "anon": 12,
    "blk": 0,
    "chr": 7,
    "dir": 0,
    "fifo": 4,
    "lnk": 0,
    "reg": 118,
    "sock": 36
  }
]
Note
@.pop("children", 1) can be used to get rid of descendants of the matched process unless they match themselves.
process hierarchy questions (what are the PIDs of all descendants of this process?),
$ procpath query -d, "..children[?(@.stat.pid == 42)]..pid"
7342,7733,7931,78880,78884
counting processes (how many Celery workers are running on the server?),
$ procpath query -d $'\n' \
  '$..children[?("celery worker" in @.cmdline)].stat.comm' | wc -l
97
calculating aggregates (how much main memory does this docker-compose stack consume?),
$ L=$(docker ps -f status=running -f name='^project_name' -q | xargs -I{} -- \
  docker inspect -f '{{.State.Pid}}' {} | tr '\n' ,)
$ procpath query "$..children[?(@.stat.pid in [$L])]" \
  'SELECT SUM(stat_rss) / 1024.0 * 4 "RSS MiB" FROM record'
[{"RSS MiB": 390.515625}]
It also comes in handy for crafting JSONPath queries for procpath record (see below).
As demonstrated by the examples above, procpath query accepts two positional arguments: a JSONPath query and an SQL query (see Design for details on the dialects). Both are optional.
To use only SQL, pass an empty string for the JSONPath (what is the sum of proportional set sizes of all processes on the system?).
$ sudo procpath query -f stat,smaps_rollup \
'' 'SELECT SUM(smaps_rollup_pss) / 1024.0 "PSS MiB" FROM record'
[{"PSS MiB": 4007.9482421875}]
Note
To read smaps_rollup and some other procfiles you may need to be the owner of the process (or root):
$ ls -l /proc/1/smaps_rollup
-r--r--r-- 1 root root 0 Sep 3 19:54 /proc/1/smaps_rollup
When a SQL query is specified the tree is flattened to a table (see Data model for details).
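The flattening can be pictured with a short Python sketch. It is not Procpath's implementation: the function name flatten and the sample tree are hypothetical. The only thing taken from the examples above is the column naming, where a nested field such as stat.rss becomes stat_rss. See Data model for the actual schema.

```python
from typing import Iterator


def flatten(node: dict) -> Iterator[dict]:
    """Yield one flat row per process node, walking the tree depth-first."""
    row = {}
    for key, value in node.items():
        if key == "children":
            continue  # children become rows of their own, not columns
        if isinstance(value, dict):
            # A procfile with named fields, e.g. stat.rss -> stat_rss
            for field, field_value in value.items():
                row[f"{key}_{field}"] = field_value
        else:
            # A scalar procfile such as cmdline keeps its own name
            row[key] = value
    yield row
    for child in node.get("children", []):
        yield from flatten(child)


# Hypothetical two-process tree (assumed shape: procfile keys plus "children")
tree = {
    "cmdline": "init",
    "stat": {"pid": 1, "rss": 100},
    "children": [{"cmdline": "worker", "stat": {"pid": 42, "rss": 250}}],
}
rows = list(flatten(tree))
```

Each element of rows is a plain dictionary with one key per column, which is the shape an SQL table needs.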
Timeline¶
procpath record essentially does the same as procpath query "..." "SELECT * FROM record", but instead of an ephemeral SQLite database it creates a persistent one and saves snapshots there at the specified interval. A JSONPath query can also be specified to narrow down the process tree, and SQL queries can be run on the resulting database (including while it’s being recorded).
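As an illustration of querying such a database from a script, here is a minimal Python sketch using the standard sqlite3 module. The helper name peak_rss_mib is hypothetical, and the in-memory table is a stand-in that contains only two columns, stat_pid and stat_rss, named after the convention seen in the SQL examples above. A 4 KiB page size is assumed. The real schema is described in Data model.

```python
import sqlite3


def peak_rss_mib(conn: sqlite3.Connection, page_size: int = 4096) -> dict:
    """Return {pid: peak RSS in MiB} over all snapshots in the record table."""
    query = """
        SELECT stat_pid, MAX(stat_rss) * ? / 1048576.0
        FROM record
        GROUP BY stat_pid
        ORDER BY stat_pid
    """
    return dict(conn.execute(query, (page_size,)).fetchall())


# Stand-in for a recorded database: only the two columns used here are created
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (stat_pid INTEGER, stat_rss INTEGER)")
conn.executemany(
    "INSERT INTO record VALUES (?, ?)",
    [(2610, 256), (2610, 512), (2611, 128)],  # (PID, RSS in pages) per snapshot
)
result = peak_rss_mib(conn)
```

For a real recording, connect to the database file written by procpath record instead of ":memory:".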
The most basic form of JSONPath for procpath record is selecting a subtree by a PID, i.e. the process with that PID and all its descendants (record snapshots of the process subtree of PID 2610 every second until it exits).
procpath record -i 1 --stop-without-result -d subtree.sqlite \
'$..children[?(@.stat.pid == 2610)]'
Note
The JSONPath query used for procpath record must yield full process documents. I.e. $..children[?(@.stat.pid == 2610)], not $..children[?(@.stat.pid == 2610)]..pid.
Tip
To spawn a process in the background and get its PID, the special shell parameter $! [3] can be used:
$ xz -9 /some/big/database.sqlite &
[1] 1451603
$ PID=$!
$ procpath record -i 0.1 -f stat -d xz_analysis.sqlite \
--stop-without-result -p $PID "$..children[?(@.stat.pid == $PID)]"
[1]+ Done xz -9 /some/big/database.sqlite
Alternatively, procpath watch can cover the full measurement life-cycle of a target process.
Additionally, procpath record supports the --pid-list argument. It is a pre-filter that specifies the PIDs of branches to keep in the tree before procfiles other than stat are read and before a JSONPath query is run against the tree. It minimises the resources Procpath needs, which is relevant when it records multiple procfiles at sub-second intervals. For instance, given this process tree on a system:
PID 1
├─ PID 2
├─ PID 3
│  └─ PID 4
└─ PID 5
   └─ PID 6
      ├─ PID 7
      ├─ PID 8
      └─ PID 9
procpath record -f stat,io,status,fd,smaps_rollup --pid-list 2,3 ... will read only the easy-to-parse stat procfiles for all processes, and the remaining procfiles only for the processes below (a JSONPath query, if specified, also runs against this smaller tree):
PID 1
├─ PID 2
└─ PID 3
   └─ PID 4
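The pruning shown in the two trees above can be sketched in Python as follows. This is an illustration of the idea only, not Procpath's code: the helpers prune, pids and proc are hypothetical, and the nodes carry nothing but stat.pid and children.

```python
from typing import Optional, Set


def prune(node: dict, keep_pids: Set[int]) -> Optional[dict]:
    """Keep listed PIDs with their descendants and the ancestors leading to them."""
    if node["stat"]["pid"] in keep_pids:
        return node  # a listed PID keeps its whole subtree
    kept = []
    for child in node.get("children", []):
        pruned = prune(child, keep_pids)
        if pruned is not None:
            kept.append(pruned)
    if not kept:
        return None  # nothing below this node was requested
    return {**node, "children": kept}


def pids(node: dict) -> list:
    """Collect PIDs depth-first, for inspecting a tree."""
    result = [node["stat"]["pid"]]
    for child in node.get("children", []):
        result.extend(pids(child))
    return result


def proc(pid: int, *children: dict) -> dict:
    """Build a hypothetical process node that carries only stat.pid."""
    return {"stat": {"pid": pid}, "children": list(children)}


# The nine-process tree from the example above
tree = proc(1, proc(2), proc(3, proc(4)), proc(5, proc(6, proc(7), proc(8), proc(9))))
small = prune(tree, {2, 3})
```

Here pids(small) lists PIDs 1, 2, 3 and 4: PID 1 survives only as the ancestor of the requested branches, and the branch under PID 5 is dropped before any expensive procfile would have to be read for it.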
Besides PID hierarchy JSONPath queries, other types of filters can be formulated (record once a second for a minute all processes that have resident set size bigger than 512 MiB).
procpath record -i 1 -r 60 -d hog.sqlite \
'$..children[?(@.stat.rss > 512 * 1024 / 4 and @.pop\("children", 1\))]'
Note
stat.rss is usually measured in 4 KiB memory pages, see meta.page_size in Data model for more details.
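The page arithmetic behind the 512 * 1024 / 4 threshold, and behind the / 1024.0 * 4 conversion in the earlier SQL example, can be checked with a small Python sketch. The helper names are hypothetical, and the 4 KiB default is an assumption that does not hold on every system.

```python
import os


def mib_to_pages(mib: float, page_size: int = 4096) -> int:
    """Convert a size in MiB to a whole number of memory pages."""
    return int(mib * 1024 * 1024 // page_size)


def pages_to_mib(pages: int, page_size: int = 4096) -> float:
    """Convert a number of memory pages to MiB."""
    return pages * page_size / (1024 * 1024)


# 512 MiB expressed in 4 KiB pages, the same value as 512 * 1024 / 4
threshold = mib_to_pages(512)

# Page size of the machine running this script, in bytes
local_page_size = os.sysconf("SC_PAGE_SIZE")
```

When the page size differs from 4096 bytes, pass it explicitly, for example mib_to_pages(512, page_size=local_page_size).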
Tasks¶
Procfs exposes information on both processes and their threads. The Linux kernel represents both internally as tasks [4]. In Procfs, top-level files like /proc/{pid}/stat represent aggregate task metrics (process), and bottom-level files like /proc/{tgid}/task/{pid}/stat represent individual task metrics (thread).
Procpath has two Procfs target objects (specified with the --procfs-target argument to applicable sub-commands):
process means collecting PIDs from directories in /proc
thread means collecting PIDs from directories in /proc/*/task
The following fields are thread-specific with the thread target (non-exhaustive):
Status.pid (identifier of the thread)
Status.tgid (identifier of the process and main thread)
all Io counters
To discriminate between processes and threads, the Status.pid and Status.tgid fields can be compared. For instance, the following command takes a snapshot of all threads in the system other than the main thread of each process.
procpath query -f stat,status --procfs-target thread \
'$..children[?(@.status.pid != @.status.tgid)]'
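The same comparison can be made directly on the text of a status procfile, where the kernel reports the two identifiers on lines starting with Tgid: and Pid:. The Python sketch below is independent of Procpath. The helper names and the abridged sample contents are hypothetical.

```python
def parse_ids(status_text: str) -> dict:
    """Extract the Tgid and Pid fields from the text of a status procfile."""
    ids = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in ("Tgid", "Pid"):
            ids[name.lower()] = int(value)
    return ids


def is_main_thread(ids: dict) -> bool:
    """A task whose PID equals its thread group ID is the main thread."""
    return ids["pid"] == ids["tgid"]


# Abridged, hypothetical status contents for two tasks of the same process
main_status = "Name:\tpython3\nTgid:\t2610\nPid:\t2610\n"
worker_status = "Name:\tpython3\nTgid:\t2610\nPid:\t2615\n"

main_ids = parse_ids(main_status)
worker_ids = parse_ids(worker_status)
```

With these samples, is_main_thread(main_ids) is true and is_main_thread(worker_ids) is false, which mirrors the pid != tgid filter in the JSONPath query above.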
Warning
Recording with the thread target can easily produce an order of magnitude more data than with the process target (as there can be that many more threads than processes on a system). However, these data may carry no additional information. E.g. smaps_rollup and fd would be slow to collect for all threads, but the data would be the same for every thread of a process. I.e. these procfiles are only practical for the process target. See more in Procfs target files.
Remote¶
Procpath is designed to collect metrics from a local Linux system. However, because the path at which Procfs is read is configurable, Procfs snapshot analysis and remote recording are possible (albeit with limited frequency and features).
procpath record -f stat -i 60 --procfs /mnt/remote_proc -d remote.sqlite
Tip
To mount a remote Procfs over SSH, use sshfs (apt-installable) like:
mkdir remote_proc
sshfs -o direct_io user@example.com:/proc remote_proc