Mapping iostat to the node exporter’s node_disk_* metrics
The node exporter and tools like iostat and sar use the same core data, but how do they relate to each other?
Prometheus metric names tend to tie pretty directly to a raw data source, so
node_disk_reads_completed_total is the number of reads completed for a given disk device, taken from the 1st field of /proc/diskstats. It’s reasonably obvious what it means just from the name.
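As a quick sketch of where that number comes from, here is how you could pull the reads-completed counter out of a /proc/diskstats line yourself; the sample line and its values are illustrative, not real output:

```python
def reads_completed(diskstats_line: str) -> int:
    """Extract the reads-completed counter from one /proc/diskstats line."""
    fields = diskstats_line.split()
    # Layout: major minor device-name, then the stats fields.
    # The kernel's "field 1" (reads completed) is fields[3] here.
    return int(fields[3])

# Illustrative sample line, not from a real machine.
sample = "   8       0 sda 123456 789 2468024 54321 0 0 0 0 0 0 0"
print(reads_completed(sample))  # 123456
```

The node exporter does essentially this for every device line, exposing the counter as node_disk_reads_completed_total with a device label.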
iostat -x has output like:

```
Device: rrqm/s wrqm/s  r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda       0.00   8.00 0.00 15.00   0.00 342.00    45.60     0.04  2.40    0.00    2.40  0.40  0.60
```
r/s is the number of reads per second, calculated from the previous measurement iostat made (or since boot for the first one). The equivalent is
rate(node_disk_reads_completed_total[5m]) in PromQL. Similarly, w/s corresponds to rate(node_disk_writes_completed_total[5m]).
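What rate() does can be shown in miniature: take two samples of a cumulative counter and divide the increase by the elapsed time. The sample values here are made up:

```python
# Two (unix time, counter value) samples of a cumulative reads counter.
t0, reads0 = 100.0, 5000
t1, reads1 = 160.0, 5900

# Per-second rate over the interval, which is what rate() computes
# (it also handles counter resets, which this sketch ignores).
reads_per_second = (reads1 - reads0) / (t1 - t0)
print(reads_per_second)  # 15.0
```

iostat does the same arithmetic against its own previous sample, which is why its columns and rate() over the same window agree.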
iostat will report in kilobytes by default. The kernel reports these values in 512-byte sectors (regardless of the sector size of the underlying device), and Prometheus uses bytes as standard. So
wkB/s would be
rate(node_disk_written_bytes_total[5m]), only the PromQL result is 1024 times bigger as it is in bytes.
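The unit bookkeeping across the three layers looks like this; the sector count is made up to match the iostat sample above:

```python
SECTOR_SIZE = 512  # /proc/diskstats always counts 512-byte sectors,
                   # whatever the device's real sector size is

sectors_written = 684                            # raw kernel counter delta
bytes_written = sectors_written * SECTOR_SIZE    # what the node exporter exports
kb_written = bytes_written / 1024                # what iostat's wkB/s is based on

print(bytes_written, kb_written)  # 350208 342.0
```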
avgrq-sz is the average size of each request, combining both reads and writes. iostat calculates it by dividing the sectors transferred by the operations completed, so the equivalent is
(rate(node_disk_read_bytes_total[5m]) + rate(node_disk_written_bytes_total[5m])) / (rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m]))
which is once again in bytes, where iostat reports in 512-byte sectors. Personally I’d rather view reads and writes separately.
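The same arithmetic the PromQL expression performs, on numbers chosen to match the iostat sample above (15 writes moving 342 kB over one second):

```python
# Deltas over a one-second interval (illustrative).
read_bytes, written_bytes = 0, 342 * 1024
reads, writes = 0, 15

avg_request_bytes = (read_bytes + written_bytes) / (reads + writes)
print(avg_request_bytes)        # 23347.2 bytes per request
print(avg_request_bytes / 512)  # 45.6, iostat's avgrq-sz in sectors
```

Dividing by 512 recovers exactly the 45.60 avgrq-sz shown in the sample output.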
avgqu-sz is simpler, the average queue length. It is based on field 11, the weighted time spent doing I/Os, which gives us rate(node_disk_io_time_weighted_seconds_total[5m]).
r_await and w_await are how long read and write requests took on average, so for reads that’s
rate(node_disk_read_time_seconds_total[5m]) / rate(node_disk_reads_completed_total[5m]), and similarly for writes.
await is both combined, so you can add and then divide if you want it.
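Spelling that out with counter deltas chosen to match the sample output (15 writes taking 36 ms in total over one second, no reads):

```python
# Deltas of node_disk_{read,write}_time_seconds_total and the
# completed-operation counters over the interval (illustrative).
read_time_s, write_time_s = 0.0, 0.036
reads, writes = 0, 15

w_await_ms = write_time_s / writes * 1000
# Combined await: add numerators and denominators, then divide.
await_ms = (read_time_s + write_time_s) / (reads + writes) * 1000
print(round(w_await_ms, 2), round(await_ms, 2))  # 2.4 2.4
```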
%util is utilisation as a percentage;
rate(node_disk_io_time_seconds_total[5m]) will produce the same value as a ratio between 0 and 1, which is more standard in Prometheus.
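Since io_time counts seconds the device was busy, its per-second rate is already a busy fraction; iostat just multiplies by 100. With an illustrative one-second delta matching the sample's 0.60 %util:

```python
io_time_delta_s = 0.006  # delta of node_disk_io_time_seconds_total
interval_s = 1.0

ratio = io_time_delta_s / interval_s   # the Prometheus-style 0-1 ratio
print(ratio, round(ratio * 100, 2))    # 0.006 0.6
```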
svctm is deprecated, but it’d be the IO time divided by the sum of the reads and writes completed.
There’s one other notable metric which
iostat doesn’t expose: field 9, exported as
node_disk_io_now, the number of I/Os currently in progress. Newer kernels will also expose discard stats, which are useful for SSDs.