Collecting metrics¶
This page describes the methods Procpath implements for collecting process (and thread) metrics from Procfs [1]. Analysing an issue with Procpath is usually a two-step process:
collect data on relevant processes
analyse the collected data visually and/or with SQL
Procpath can collect process metrics from any Linux system that can run
Python, which includes Android (e.g. via Termux [2]), arm64 NAS devices,
GitLab pipeline jobs, containers, and ordinary server and desktop machines.
Snapshot¶
procpath query provides JSON point-in-time slices of the process tree running on the target Linux system. It’s useful for answering:
specific questions about a process or the fields in its JSON document (how many open file descriptors does this process have?),
$ procpath query -f stat,fd --indent 2 \
  '$..children[?(@.stat.pid == 42 and @.pop\("children", 1\))]..fd'
[
  {
    "anon": 12,
    "blk": 0,
    "chr": 7,
    "dir": 0,
    "fifo": 4,
    "lnk": 0,
    "reg": 118,
    "sock": 36
  }
]
Note
@.pop("children", 1) can be used to get rid of descendants of the matched process unless they match themselves.
process hierarchy questions (what are the PIDs of all descendants of this process?),
$ procpath query -d, "..children[?(@.stat.pid == 42)]..pid"
7342,7733,7931,78880,78884
counting processes (how many Celery workers are running on the server?),
$ procpath query -d $'\n' \
  '$..children[?("celery worker" in @.cmdline)].stat.comm' | wc -l
97
calculating aggregates (how much main memory does this docker-compose stack consume?).
$ L=$(docker ps -f status=running -f name='^project_name' -q | xargs -I{} -- \
  docker inspect -f '{{.State.Pid}}' {} | tr '\n' ,)
$ procpath query "$..children[?(@.stat.pid in [$L])]" \
  'SELECT SUM(stat_rss) / 1024.0 * 4 "RSS MiB" FROM record'
[{"RSS MiB": 390.515625}]
It also comes in handy for crafting JSONPath queries for procpath record (see below).
As demonstrated by the examples above, procpath query accepts two positional
arguments, the JSONPath query and the SQL query (see Design for details on the
dialects). Both are optional.
To use only SQL, pass an empty string for the JSONPath (what is the sum of proportional set sizes of all processes on the system?).
$ sudo procpath query -f stat,smaps_rollup \
'' 'SELECT SUM(smaps_rollup_pss) / 1024.0 "PSS MiB" FROM record'
[{"PSS MiB": 4007.9482421875}]
Note
To read smaps_rollup and some other procfiles you may need to be
the owner of the process (or root):
$ ls -l /proc/1/smaps_rollup
-r--r--r-- 1 root root 0 Sep 3 19:54 /proc/1/smaps_rollup
When a SQL query is specified, the tree is flattened into a table (see Data model for details).
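To illustrate the flattening, here is a minimal Python sketch. It is not Procpath's actual implementation: it walks a hand-written process tree, writes one row per process into an in-memory SQLite table named record, and runs the same kind of aggregate as in the examples above. The sample tree, the helper name flatten, the reduced column set (stat_pid, stat_rss) and the 4 KiB page size are all assumptions made for the example.

```python
import sqlite3

# Hypothetical sample tree: each node has a "stat" dict and optional "children"
tree = {
    "stat": {"pid": 1, "rss": 2048},
    "children": [
        {"stat": {"pid": 2, "rss": 1024}},
        {
            "stat": {"pid": 3, "rss": 512},
            "children": [{"stat": {"pid": 4, "rss": 512}}],
        },
    ],
}


def flatten(node):
    """Yield one (pid, rss) row per process in depth-first order."""
    yield node["stat"]["pid"], node["stat"]["rss"]
    for child in node.get("children", []):
        yield from flatten(child)


conn = sqlite3.connect(":memory:")
# Reduced stand-in for the flattened table (column set is an assumption)
conn.execute("CREATE TABLE record (stat_pid INTEGER, stat_rss INTEGER)")
conn.executemany("INSERT INTO record VALUES (?, ?)", flatten(tree))

# RSS is counted in pages; with an assumed 4 KiB page, pages / 1024 * 4 = MiB
(rss_mib,) = conn.execute(
    "SELECT SUM(stat_rss) / 1024.0 * 4 FROM record"
).fetchone()
print(rss_mib)
```

The nesting disappears in the table, so hierarchy questions are better asked with JSONPath before the flattening, and aggregates with SQL after it.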
Timeline¶
procpath record essentially does the same as
procpath query "..." "SELECT * FROM record", but instead of an ephemeral
SQLite database it creates a persistent one and saves snapshots there at
specified intervals. A JSONPath query can be specified too, to narrow down the
process tree, and SQL queries can be run on the resulting database (also while
it’s being recorded).
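As a rough illustration of querying recorded snapshots, the Python sketch below builds a small in-memory stand-in for a recorded database and sums resident set size per snapshot. The ts timestamp column, the sample rows and the 4 KiB page size are assumptions made for the example; see Data model for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for the recorded table; "ts" is an assumed timestamp column
conn.execute("CREATE TABLE record (ts REAL, stat_pid INTEGER, stat_rss INTEGER)")
conn.executemany(
    "INSERT INTO record VALUES (?, ?, ?)",
    [
        (1.0, 2610, 256),  # first snapshot, two processes
        (1.0, 2611, 256),
        (2.0, 2610, 512),  # second snapshot, same processes
        (2.0, 2611, 512),
    ],
)

# One output row per snapshot: total RSS in MiB, assuming 4 KiB pages
timeline = conn.execute(
    """
    SELECT ts, SUM(stat_rss) / 1024.0 * 4 AS rss_mib
    FROM record
    GROUP BY ts
    ORDER BY ts
    """
).fetchall()
print(timeline)
```

Against a real recording, the same query could be run by connecting to the file passed to -d instead of ":memory:".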
The most basic form of JSONPath for procpath record is selecting a subtree
by a PID, i.e. the process with that PID and all its descendants
(record snapshots of the process subtree of PID 2610 every second until it
exits).
procpath record -i 1 --stop-without-result -d subtree.sqlite \
'$..children[?(@.stat.pid == 2610)]'
Note
The JSONPath query used for procpath record must yield full process
documents, i.e. $..children[?(@.stat.pid == 2610)], not
$..children[?(@.stat.pid == 2610)]..pid.
Tip
To spawn a process in the background and get its PID, the special shell
parameter $! [3] can be used:
$ xz -9 /some/big/database.sqlite &
[1] 1451603
$ PID=$!
$ procpath record -i 0.1 -f stat -d xz_analysis.sqlite \
--stop-without-result -p $PID "$..children[?(@.stat.pid == $PID)]"
[1]+ Done xz -9 /some/big/database.sqlite
Alternatively, procpath watch can cover the full measurement life cycle of a
target process.
Additionally, procpath record supports the --pid-list argument, a pre-filter
that specifies the PIDs of branches to keep in the tree before reading procfiles
other than stat and before running a JSONPath query against it. It minimises the
resources Procpath needs, which is relevant when recording multiple procfiles
at sub-second intervals. For instance, given this process tree on a system:
PID 1
├─ PID 2
├─ PID 3
│  └─ PID 4
└─ PID 5
   └─ PID 6
      ├─ PID 7
      ├─ PID 8
      └─ PID 9
procpath record -f stat,io,status,fd,smaps_rollup --pid-list 2,3 ... will
read the easy-to-parse stat procfiles for all processes, but the remaining
procfiles only for the processes below (a JSONPath query, if specified, also
runs against this smaller tree):
PID 1
├─ PID 2
└─ PID 3
   └─ PID 4
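The outcome of this pre-filter can be modelled with a short Python sketch. It reproduces only the resulting tree shape, not how Procpath performs the filtering. The function names prune and pids and the simplified node layout (pid at the top level of each node) are assumptions made for the example.

```python
def prune(node, keep_pids):
    """Limit the tree rooted at node to branches containing a PID from keep_pids.

    A listed node is kept with all its descendants. An unlisted node is kept
    only if it is an ancestor of a listed one. Otherwise None is returned.
    """
    if node["pid"] in keep_pids:
        return node
    kept = []
    for child in node.get("children", []):
        pruned = prune(child, keep_pids)
        if pruned is not None:
            kept.append(pruned)
    if kept:
        return {"pid": node["pid"], "children": kept}
    return None


def pids(node):
    """List the PIDs of a tree in depth-first order."""
    result = [node["pid"]]
    for child in node.get("children", []):
        result.extend(pids(child))
    return result


# The example tree from the text above
tree = {
    "pid": 1,
    "children": [
        {"pid": 2},
        {"pid": 3, "children": [{"pid": 4}]},
        {
            "pid": 5,
            "children": [
                {"pid": 6, "children": [{"pid": 7}, {"pid": 8}, {"pid": 9}]}
            ],
        },
    ],
}

print(pids(prune(tree, {2, 3})))  # [1, 2, 3, 4]
```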
Besides PID hierarchy JSONPath queries, other types of filters can be formulated (record, once a second for a minute, all processes whose resident set size is bigger than 512 MiB).
procpath record -i 1 -r 60 -d hog.sqlite \
'$..children[?(@.stat.rss > 512 * 1024 / 4 and @.pop\("children", 1\))]'
Note
stat.rss is usually measured in 4 KiB memory pages, see
meta.page_size in Data model for more details.
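The threshold arithmetic can be checked with a small Python sketch. The helper name mib_to_pages is hypothetical; os.sysconf("SC_PAGE_SIZE") reports the page size of the machine running the code, which is not necessarily that of the machine being recorded.

```python
import os


def mib_to_pages(mib, page_size=None):
    """Convert a size in MiB to a number of memory pages."""
    if page_size is None:
        # Page size in bytes of the machine running this code
        page_size = os.sysconf("SC_PAGE_SIZE")
    return mib * 1024 * 1024 // page_size


# With 4 KiB pages this matches 512 * 1024 / 4 from the query above
print(mib_to_pages(512, page_size=4096))  # 131072
```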
Tasks¶
Procfs exposes information on both processes and their threads. The Linux
kernel represents both internally as tasks [4], but in Procfs top-level
files, like /proc/{pid}/stat, represent aggregate task metrics (process),
and bottom-level files, like /proc/{tgid}/task/{pid}/stat, represent
individual task metrics (thread).
Procpath has two Procfs target objects (specified with the --procfs-target
argument to applicable sub-commands):
process means collecting PIDs from directories in /proc
thread means collecting PIDs from directories in /proc/*/task
The following fields are thread-specific with the thread target
(non-exhaustive):
Status.pid (identifier of the thread)
Status.tgid (identifier of the process and main thread)
all Io counters
To discriminate between processes and threads, the Status.pid and Status.tgid fields can be compared. For instance, the
following command takes a snapshot of all threads in the system other than the main thread of each process (for which the two fields are equal).
procpath query -f stat,status --procfs-target thread \
'$..children[?(@.status.pid != @.status.tgid)]'
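The same comparison can be sketched in Python directly on the text of a Procfs status file, which contains Pid: and Tgid: lines. The helper names and the abridged sample contents below are made up for the illustration.

```python
def parse_ids(status_text):
    """Extract the Pid and Tgid values from the text of a Procfs status file."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("Pid", "Tgid"):
            fields[key] = int(value.strip())
    return fields["Pid"], fields["Tgid"]


def is_main_thread(status_text):
    """A task is the main thread of its process when Pid equals Tgid."""
    pid, tgid = parse_ids(status_text)
    return pid == tgid


# Abridged, made-up samples in the format of /proc/{tgid}/task/{pid}/status
main = "Name:\tpython3\nTgid:\t2610\nPid:\t2610\nPPid:\t1\n"
worker = "Name:\tpython3\nTgid:\t2610\nPid:\t2615\nPPid:\t1\n"
print(is_main_thread(main), is_main_thread(worker))  # True False
```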
Warning
Thread target recording can easily produce an order of magnitude more data
than process target recording (as there can be that many more threads than
processes on a system). However, these data may carry no additional
information. E.g. smaps_rollup and fd would be slow to collect for all
threads, but the data would be the same for every thread of a process.
I.e. these procfiles are only practical for the process target. See more in
Procfs target files.
Remote¶
Procpath is designed to collect metrics from a local Linux system. However, because the path at which Procfs is read is configurable, Procfs snapshot analysis and remote recording are possible (albeit with limited frequency and features).
procpath record -f stat -i 60 --procfs /mnt/remote_proc -d remote.sqlite
Tip
To mount a remote Procfs over SSH, use sshfs (apt-installable), for example:
mkdir remote_proc
sshfs -o direct_io user@example.com:/proc remote_proc