Collecting metrics

This page describes the methods of collecting Procfs [1] process (and thread) metrics implemented in Procpath. Analysing an issue with Procpath is usually a 2-step process:

  1. collect data on relevant processes

  2. analyse the collected data visually and/or with SQL

Procpath can collect process metrics from any Linux system that can run Python, which includes Android (e.g. via Termux [2]), arm64 NAS devices, GitLab pipeline jobs, containers, and regular server and desktop machines.

Snapshot

procpath query provides JSON point-in-time slices of the process tree running on the target Linux system. It’s useful for answering:

  • specific questions about a process and the fields in its JSON document (how many open file descriptors does this process have?),

    $ procpath query -f stat,fd --indent 2 \
      '$..children[?(@.stat.pid == 42 and @.pop\("children", 1\))]..fd'
    [
      {
        "anon": 12,
        "blk": 0,
        "chr": 7,
        "dir": 0,
        "fifo": 4,
        "lnk": 0,
        "reg": 118,
        "sock": 36
      }
    ]
    

    Note

    @.pop("children", 1) can be used to get rid of descendants of the matched process unless they match themselves

  • process hierarchy questions (what are the PIDs of all descendants of this process?),

    $ procpath query -d, '$..children[?(@.stat.pid == 42)]..pid'
    7342,7733,7931,78880,78884
    
  • counting processes (how many Celery workers are running on the server?),

    $ procpath query -d $'\n' \
        '$..children[?("celery worker" in @.cmdline)].stat.comm' | wc -l
    97
    
  • calculating aggregates (how much main memory does this docker-compose stack consume?),

    $ L=$(docker ps -f status=running -f name='^project_name' -q | xargs -I{} -- \
        docker inspect -f '{{.State.Pid}}' {} | tr '\n' ,)
    $ procpath query "$..children[?(@.stat.pid in [$L])]" \
        'SELECT SUM(stat_rss) / 1024.0 * 4 "RSS MiB" FROM record'
    [{"RSS MiB": 390.515625}]
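
The fd document returned by the first query above is a breakdown of open file descriptors by type, so the total asked for in that bullet is the sum of its values. A minimal sketch in Python, assuming the JSON output was captured as a string (the total_fds helper is a hypothetical name, not part of Procpath):

```python
import json

# Output of the `procpath query -f stat,fd ...` example, captured as text
output = """
[
  {"anon": 12, "blk": 0, "chr": 7, "dir": 0,
   "fifo": 4, "lnk": 0, "reg": 118, "sock": 36}
]
"""


def total_fds(query_output: str) -> int:
    """Sum the per-type descriptor counts over every matched fd document."""
    return sum(sum(doc.values()) for doc in json.loads(query_output))


print(total_fds(output))  # 177
```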
    

It also comes in handy for crafting JSONPath queries for procpath record (see below).

As the examples above demonstrate, procpath query accepts two positional arguments: a JSONPath query and a SQL query (see Design for details on the dialects). Both are optional.

To use only SQL, pass an empty string for the JSONPath (what is the sum of proportional set sizes of all processes on the system?).

$ sudo procpath query -f stat,smaps_rollup \
  '' 'SELECT SUM(smaps_rollup_pss) / 1024.0 "PSS MiB" FROM record'
[{"PSS MiB": 4007.9482421875}]

Note

To read smaps_rollup and some other procfiles you may need to be the owner of the process (or root):

$ ls -l /proc/1/smaps_rollup
-r--r--r-- 1 root root 0 Sep  3 19:54 /proc/1/smaps_rollup

When a SQL query is specified the tree is flattened to a table (see Data model for details).
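
The flattening can be pictured with a short Python sketch. It is an illustration under assumptions, not Procpath's implementation: the flatten helper and the sample tree are hypothetical, and the only naming rule taken from the examples above is that stat.rss becomes the stat_rss column of the record table.

```python
import sqlite3


def flatten(node, rows):
    """Collect one flat row per process, walking the tree depth-first."""
    row = {}
    for procfile, fields in node.items():
        if procfile == "children":
            continue
        for name, value in fields.items():
            row[f"{procfile}_{name}"] = value  # e.g. stat.rss -> stat_rss
    rows.append(row)
    for child in node.get("children", []):
        flatten(child, rows)
    return rows


# Hypothetical tree with only two fields of the stat procfile collected
tree = {
    "stat": {"pid": 1, "rss": 2000},
    "children": [
        {"stat": {"pid": 2, "rss": 500}},
        {
            "stat": {"pid": 3, "rss": 1500},
            "children": [{"stat": {"pid": 4, "rss": 96}}],
        },
    ],
}

rows = flatten(tree, [])
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (stat_pid INTEGER, stat_rss INTEGER)")
conn.executemany("INSERT INTO record VALUES (:stat_pid, :stat_rss)", rows)
(rss_mib,) = conn.execute(
    "SELECT SUM(stat_rss) / 1024.0 * 4 FROM record"
).fetchone()
print(len(rows), rss_mib)  # 4 16.0
```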

Timeline

procpath record essentially does the same as procpath query "..." "SELECT * FROM record", but instead of an ephemeral SQLite database it creates a persistent one and saves snapshots there at the specified interval. A JSONPath query can be specified too, to narrow down the process tree, and SQL queries can be run on the resulting database (also while it’s being recorded).

The most basic form of JSONPath for procpath record selects a subtree by PID, i.e. the process with that PID and all its descendants (record snapshots of the process subtree of PID 2610 every second until it exits).

procpath record -i 1 --stop-without-result -d subtree.sqlite \
  '$..children[?(@.stat.pid == 2610)]'
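
Because the result is a regular SQLite file, it can be inspected with any SQLite client. The Python sketch below aggregates RSS per snapshot. It makes two assumptions that should be checked against Data model: that each row of the record table carries a snapshot timestamp in a column named ts, and that memory pages are 4 KiB. An in-memory database with made-up rows stands in for subtree.sqlite so that the example is self-contained.

```python
import sqlite3

# Stand-in for sqlite3.connect("subtree.sqlite"); the schema is an
# assumption limited to the columns this sketch needs
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (ts REAL, stat_pid INTEGER, stat_rss INTEGER)")
conn.executemany(
    "INSERT INTO record VALUES (?, ?, ?)",
    [
        (1.0, 2610, 1024), (1.0, 2611, 256),
        (2.0, 2610, 2048), (2.0, 2611, 512),
    ],
)


def rss_mib_per_snapshot(conn):
    """Return (ts, total RSS in MiB) per snapshot, assuming 4 KiB pages."""
    sql = """
        SELECT ts, SUM(stat_rss) / 1024.0 * 4
        FROM record
        GROUP BY ts
        ORDER BY ts
    """
    return conn.execute(sql).fetchall()


print(rss_mib_per_snapshot(conn))  # [(1.0, 5.0), (2.0, 10.0)]
```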

Note

The JSONPath query used for procpath record must yield full process documents, i.e. $..children[?(@.stat.pid == 2610)], not $..children[?(@.stat.pid == 2610)]..pid.

Tip

To spawn a process in the background and get its PID, the special shell parameter $! [3] can be used:

$ xz -9 /some/big/database.sqlite &
[1] 1451603
$ PID=$!
$ procpath record -i 0.1 -f stat -d xz_analysis.sqlite  \
    --stop-without-result -p $PID "$..children[?(@.stat.pid == $PID)]"
[1]+  Done    xz -9 /some/big/database.sqlite

Alternatively, procpath watch can cover the full measurement life-cycle of a target process.
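
The same spawn-then-record pattern can be scripted. The Python sketch below only assembles the command line from the tip, using the flags shown there. The build_record_command helper is a hypothetical name, a short-lived Python child stands in for the xz job, and the actual procpath call is left commented out so the sketch runs where Procpath is not installed.

```python
import subprocess
import sys


def build_record_command(pid, database, interval=0.1):
    """Build the argv of the `procpath record` call from the tip above."""
    return [
        "procpath", "record",
        "-i", str(interval),
        "-f", "stat",
        "-d", database,
        "--stop-without-result",
        "-p", str(pid),
        f"$..children[?(@.stat.pid == {pid})]",
    ]


# Short-lived stand-in for the process to be measured
target = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
command = build_record_command(target.pid, "xz_analysis.sqlite")
print(command)
# subprocess.run(command, check=True)  # uncomment where procpath is installed
target.wait()
```

Passing the arguments as a list avoids shell quoting of the JSONPath expression altogether.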

Additionally, procpath record supports the --pid-list argument, a pre-filter that specifies the PIDs of the branches to keep in the tree before procfiles other than stat are read and before a JSONPath query is run against it. It minimises the resources Procpath needs, which is relevant when it records multiple procfiles at sub-second intervals. For instance, given this tree on a system:

PID 1
├─ PID 2
├─ PID 3
│  └─ PID 4
└─ PID 5
   └─ PID 6
      ├─ PID 7
      ├─ PID 8
      └─ PID 9

procpath record -f stat,io,status,fd,smaps_rollup --pid-list 2,3 ... will read the easy-to-parse stat procfiles for all processes, but the remaining procfiles only for the processes below (a JSONPath query, if specified, also runs against this smaller tree):

PID 1
├─ PID 2
└─ PID 3
   └─ PID 4
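
The effect of the pre-filter on the example tree can be reproduced with a few lines of Python. This is a sketch of the semantics described above, not Procpath's actual implementation. The prune and pids helpers are hypothetical, and nodes are reduced to a bare pid key.

```python
def prune(node, keep):
    """Reduce the tree to the branches that contain a PID from `keep`.

    A node listed in `keep` is retained with its whole subtree; any other
    node is retained only as an ancestor of such a node.
    """
    if node["pid"] in keep:
        return node
    children = [
        pruned
        for child in node.get("children", [])
        if (pruned := prune(child, keep)) is not None
    ]
    if children:
        return {"pid": node["pid"], "children": children}
    return None


def pids(node):
    """List the PIDs of the tree in depth-first order."""
    result = [node["pid"]]
    for child in node.get("children", []):
        result.extend(pids(child))
    return result


# The example tree from above
tree = {"pid": 1, "children": [
    {"pid": 2},
    {"pid": 3, "children": [{"pid": 4}]},
    {"pid": 5, "children": [{"pid": 6, "children": [
        {"pid": 7}, {"pid": 8}, {"pid": 9},
    ]}]},
]}

print(pids(prune(tree, {2, 3})))  # [1, 2, 3, 4]
```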

Besides PID hierarchy JSONPath queries, other types of filters can be formulated (record once a second for a minute all processes whose resident set size is bigger than 512 MiB).

procpath record -i 1 -r 60 -d hog.sqlite \
  '$..children[?(@.stat.rss > 512 * 1024 / 4 and @.pop\("children", 1\))]'

Note

stat.rss is usually measured in 4 KiB memory pages, see meta.page_size in Data model for more details.
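
The 512 * 1024 / 4 threshold in the query and the / 1024.0 * 4 factor in the earlier SQL example are the same conversion in opposite directions. A small Python sketch of the arithmetic follows. The helper names are hypothetical, and the page size is read from the running system rather than assumed to be 4 KiB.

```python
import os

# Page size of the current system in bytes, commonly 4096
PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def mib_to_pages(mib, page_size=PAGE_SIZE):
    """Convert MiB to a number of memory pages, rounding down."""
    return mib * 1024 * 1024 // page_size


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count, such as stat.rss, to MiB."""
    return pages * page_size / (1024 * 1024)


print(mib_to_pages(512, page_size=4096))    # 131072, i.e. 512 * 1024 / 4
print(pages_to_mib(99972, page_size=4096))  # 390.515625
```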

Tasks

Procfs exposes information on both processes and their threads. The Linux kernel represents both internally as tasks [4], but in Procfs top-level files, like /proc/{pid}/stat, represent aggregate task metrics (process), whereas bottom-level files, like /proc/{tgid}/task/{pid}/stat, represent individual task metrics (thread).

Procpath has two Procfs target objects (specified with the --procfs-target argument to applicable sub-commands):

  • process means collecting PIDs from directories in /proc

  • thread means collecting PIDs from directories in /proc/*/task

The following fields are thread-specific with the thread target (non-exhaustive):

To discriminate between processes and threads, the Status.pid and Status.tgid fields can be compared. For instance, the following command takes a snapshot of all threads in the system.

procpath query -f stat,status --procfs-target thread \
  '$..children[?(@.status.pid != @.status.tgid)]'
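
The directory layout behind the two targets, and the pid versus tgid comparison, can be sketched in Python as follows. This is an illustration rather than Procpath code. The list_tasks and is_main_thread helpers are hypothetical, and a temporary directory imitates a Procfs root so that the example does not depend on the processes running on the machine.

```python
import os
import tempfile


def list_tasks(procfs="/proc"):
    """Return sorted (tgid, tid) pairs found under {procfs}/*/task."""
    tasks = []
    for tgid in os.listdir(procfs):
        if not tgid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        task_dir = os.path.join(procfs, tgid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # the process exited, or the entry is not readable
        tasks.extend((int(tgid), int(tid)) for tid in tids if tid.isdigit())
    return sorted(tasks)


def is_main_thread(tgid, tid):
    """The main thread of a process has a thread ID equal to its TGID."""
    return tgid == tid


# Fake Procfs layout: process 100 with threads 100 and 101, process 200
with tempfile.TemporaryDirectory() as root:
    for tgid, tid in [(100, 100), (100, 101), (200, 200)]:
        os.makedirs(os.path.join(root, str(tgid), "task", str(tid)))
    tasks = list_tasks(root)

print([t for t in tasks if not is_main_thread(*t)])  # [(100, 101)]
```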

Warning

Thread target recording can easily produce an order of magnitude more data than process target recording (as there can be that many more threads than processes on a system). However, these data may carry no additional information. E.g. smaps_rollup and fd would be slow to collect for all threads, but the data would be the same for each thread of a process. I.e. these procfiles are only practical for the process target. See more in Procfs target files.

Remote

Procpath is designed to collect metrics from a local Linux system. However, because the path at which Procfs is read is configurable, Procfs snapshot analysis and remote recording are possible (albeit with limited frequency and features).

procpath record -f stat -i 60 --procfs /mnt/remote_proc -d remote.sqlite

Tip

To mount a remote Procfs over SSH, use sshfs (apt-installable), for example:

mkdir remote_proc
sshfs -o direct_io user@example.com:/proc remote_proc