Container Metrics
Memory
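The built-in container metrics in K8s come from the cAdvisor module that runs inside the kubelet.
cAdvisor's official metric documentation is here: Monitoring cAdvisor with Prometheus. That official page is far too terse, so terse that it is of little help when you are trying to track down a problem.
Fortunately, some of the best explanations come from the community: Out-of-memory (OOM) in Kubernetes – Part 3: Memory metrics sources and tools to collect them: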
cAdvisor metric | Source OS metric(s) | Explanation of source OS metric(s) | What does the metric mean?
---|---|---|---
container_memory_cache | total_cache (memory.stat) | number of bytes of page cache memory | Size of memory used by the cache that’s automatically populated when reading/writing files
container_memory_rss | total_rss (memory.stat) | number of bytes of anonymous and swap cache memory (includes transparent hugepages). […] This should not be confused with the true ‘resident set size’ or the amount of physical memory used by the cgroup. ‘rss + mapped_file’ will give you resident set size of cgroup | Size of memory not used for mapping files from the disk
container_memory_mapped_file | total_mapped_file (memory.stat) | number of bytes of mapped file (includes tmpfs/shmem) | Size of memory that’s used for mapping files
container_memory_swap | total_swap (memory.stat) | number of bytes of swap usage |
container_memory_failcnt | The value inside memory.failcnt | shows the number of times that a usage counter hit its limit |
container_memory_usage_bytes | The value inside memory.usage_in_bytes | doesn’t show ‘exact’ value of memory (and swap) usage, it’s a fuzz value for efficient access. (Of course, when necessary, it’s synchronized.) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat | Size of overall memory used, regardless if it’s for mapping from disk or just allocating
container_memory_max_usage_bytes | The value inside memory.max_usage_in_bytes | max memory usage recorded |
container_memory_working_set_bytes | Deduct total_inactive_file (memory.stat) from memory.usage_in_bytes | | A heuristic for the minimum size of memory required for the app to work.
Table: cAdvisor metrics and their sources
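To make the first four rows of the table concrete, here is a minimal sketch that parses cgroup v1 memory.stat text and picks out the field behind each cAdvisor metric. It assumes cgroup v1 with the hierarchical total_* fields (the variant this table uses); the helper names and the sample numbers are hypothetical, and this is an illustration of the mapping, not cAdvisor's own code.

```python
"""Map cgroup v1 memory.stat fields to the cAdvisor metrics in the table.

Hypothetical helpers for illustration; they mirror the table, not cAdvisor itself.
"""

# cAdvisor metric -> memory.stat field (cgroup v1, hierarchical "total_*" variant)
STAT_FIELDS = {
    "container_memory_cache": "total_cache",
    "container_memory_rss": "total_rss",
    "container_memory_mapped_file": "total_mapped_file",
    "container_memory_swap": "total_swap",
}


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines found in /sys/fs/cgroup/memory/memory.stat."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            key, value = parts
            stats[key] = int(value)
    return stats


def cadvisor_view(stats: dict[str, int]) -> dict[str, int]:
    """Return the memory.stat value backing each cAdvisor metric (0 if missing)."""
    return {metric: stats.get(field, 0) for metric, field in STAT_FIELDS.items()}


# Made-up sample: the per-cgroup "cache"/"rss" lines are ignored in favor of total_*.
SAMPLE = """\
cache 1000
rss 2000
total_cache 104857600
total_rss 52428800
total_mapped_file 10485760
total_swap 0
"""

if __name__ == "__main__":
    for metric, value in cadvisor_view(parse_memory_stat(SAMPLE)).items():
        print(metric, value)
```

Note that container_memory_usage_bytes, container_memory_failcnt and container_memory_max_usage_bytes are not in memory.stat at all; they come from their own files in the cgroup directory, as the table shows.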
If the descriptions above still don't satisfy your curiosity, there is more here:
https://jpetazzo.github.io/2013/10/08/docker-containers-metrics/
Commonly Misunderstood K8s Metrics
container_memory_usage_bytes
A Deep Dive into Kubernetes Metrics — Part 3 Container Resource Metrics
You might think that memory utilization is easily tracked with container_memory_usage_bytes; however, this metric also includes cached (think filesystem cache) items that can be evicted under memory pressure. The better metric is container_memory_working_set_bytes, as this is what the OOM killer is watching for.
container_memory_working_set_bytes
Memory usage discrepancy: cgroup memory.usage_in_bytes vs. RSS inside docker container
container_memory_working_set_bytes = container_memory_usage_bytes - total_inactive_file
(total_inactive_file comes from /sys/fs/cgroup/memory/memory.stat.) This is calculated in cAdvisor, and the result is <= container_memory_usage_bytes.
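The formula above can be sketched in a few lines. This is a minimal illustration, not cAdvisor's code: the function name is hypothetical, and the zero floor reflects our understanding that cAdvisor also clamps the result when total_inactive_file exceeds usage (the two values are read separately, so they can disagree slightly).

```python
MIB = 1024 * 1024


def working_set_bytes(usage_in_bytes: int, total_inactive_file: int) -> int:
    """Usage minus inactive file-backed pages, floored at zero.

    Subtracting a non-negative count keeps the result <= usage; the floor keeps
    it from going negative if inactive_file was sampled larger than usage.
    """
    if total_inactive_file >= usage_in_bytes:
        return 0
    return usage_in_bytes - total_inactive_file


if __name__ == "__main__":
    usage = 150 * MIB     # container_memory_usage_bytes (hypothetical value)
    inactive = 80 * MIB   # total_inactive_file from memory.stat (hypothetical)
    print(working_set_bytes(usage, inactive) // MIB)  # 70
```

With these made-up numbers, usage_bytes reports 150 MiB while the working set is only 70 MiB: the other 80 MiB is inactive page cache the kernel can reclaim before it has to OOM-kill anything.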
kubectl top
Memory usage discrepancy: cgroup memory.usage_in_bytes vs. RSS inside docker container
When you use the kubectl top pods command, you get the value of container_memory_working_set_bytes, not the container_memory_usage_bytes metric.
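As a rough model of what the MEMORY(bytes) column shows, the sketch below sums per-container working sets into a pod value and prints it in whole MiB. It assumes, based on our reading of kubectl's printer, that kubectl top pods (without --containers) shows the sum over containers and truncates memory to an integer number of MiB with an "Mi" suffix; the function name and the pod are hypothetical.

```python
MIB = 1024 * 1024


def kubectl_top_memory(container_working_sets: list[int]) -> str:
    """Render a pod's MEMORY(bytes) cell from per-container working sets.

    Assumes the pod value is the sum over containers, shown in whole MiB
    (integer division, so fractions are truncated) with an "Mi" suffix.
    """
    total = sum(container_working_sets)
    return f"{total // MIB}Mi"


if __name__ == "__main__":
    # Hypothetical pod: an app container (70 MiB) plus a sidecar (1 MiB).
    print(kubectl_top_memory([70 * MIB, 1 * MIB]))  # 71Mi
```

Because the input is the working set, a container whose usage_bytes is inflated by reclaimable page cache will look smaller in kubectl top than in a dashboard built on container_memory_usage_bytes.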
How container_memory_cache relates to container_memory_mapped_file
Out-of-memory (OOM) in Kubernetes – Part 3: Memory metrics sources and tools to collect them:
Notice the “page cache” term in the definition of the container_memory_cache metric. In Linux the page cache is “used to cache the content of files as IO is performed upon them” as per the “Linux Kernel Programming” book by Kaiwan N Billimoria (author's note: I have read this book, and it is the easiest-to-understand kernel book I have come across recently). You might be tempted to think that container_memory_mapped_file pretty much refers to the same thing, but that's actually just a subset: e.g. a file can be mapped in memory (whole or parts of it) or it can be read in blocks, but the page cache will include data coming from either way of accessing that file. See https://stackoverflow.com/questions/258091/when-should-i-use-mmap-for-file-access for more info.
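The two access paths mentioned above can be shown side by side. This sketch reads the same file once with ordinary read() calls and once through an mmap() mapping using Python's standard mmap module. Both paths go through the page cache (and so count toward container_memory_cache inside a container); only the pages touched through the mapping would also count as mapped_file, and only while the mapping exists. The sketch does not measure cgroup statistics itself; the function name is hypothetical.

```python
import mmap
import os
import tempfile


def read_both_ways(path: str) -> tuple[bytes, bytes]:
    """Read the same (non-empty) file with read() and through an mmap mapping."""
    with open(path, "rb") as f:
        # Block reads: the pages end up in the page cache, but are not mapped.
        via_read = f.read()
    with open(path, "rb") as f:
        # Mapping the file and touching its pages makes them mapped-file pages
        # for as long as the mapping is alive (closed by the with-block).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            via_mmap = mm[:]
    return via_read, via_mmap


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello page cache\n" * 1000)
        path = tmp.name
    try:
        a, b = read_both_ways(path)
        print(a == b, len(a))
    finally:
        os.remove(path)
```

The file contents are identical either way; the difference is only in how the kernel accounts for the pages, which is why mapped_file can be much smaller than cache for workloads that mostly use plain reads and writes.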
Which metrics actually matter for OOM kills
Memory usage discrepancy: cgroup memory.usage_in_bytes vs. RSS inside docker container
It is also worth mentioning that when the value of container_memory_usage_bytes reaches the limit, your pod will NOT get OOM-killed. BUT if container_memory_working_set_bytes or container_memory_rss reaches the limit, the pod will be killed.
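Following the claim above, an alert should look at the working set and RSS rather than usage_bytes. The sketch below is a hypothetical helper that flags which of those two metrics is at or above a fraction of the container's memory limit; the 0.9 threshold is an arbitrary example, not a Kubernetes default, and usage_bytes is deliberately not an input.

```python
def oom_risk(limit_bytes: int, working_set_bytes: int, rss_bytes: int,
             threshold: float = 0.9) -> list[str]:
    """Return the OOM-relevant metrics at or above `threshold` of the limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive (is a limit set?)")
    flagged = []
    for name, value in (("container_memory_working_set_bytes", working_set_bytes),
                        ("container_memory_rss", rss_bytes)):
        if value >= threshold * limit_bytes:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Hypothetical container: 256 MiB limit, 240 MiB working set, 100 MiB RSS.
    print(oom_risk(256 * MIB, 240 * MIB, 100 * MIB))
```

In this made-up case only the working set is flagged: RSS is well under the limit, but page cache that has not yet moved to the inactive list still counts toward the working set.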