Histograms 与 Summary 比较#




Histograms and summaries are more complex metric types. Not only does a single histogram or summary create a multitude of time series, it is also more difficult to use these metric types correctly. This section helps you to pick and configure the appropriate metric type for your use case.

Library support#

First of all, check the library support for histograms and summaries.

Some libraries support only one of the two types, or they support summaries only in a limited fashion (lacking quantile calculation).

Count and sum of observations#

Histograms and summaries both sample observations, typically request durations or response sizes. They track the number of observations and the sum of the observed values, allowing you to calculate the average of the observed values. Note that:

  • the number of observations (showing up in Prometheus as a time series with a _count suffix) is inherently a counter (as described above, it only goes up).

  • The sum of observations (showing up as a time series with a _sum suffix) behaves like a counter, too, as long as there are no negative observations. Obviously, request durations or response sizes are never negative. In principle, however, you can use summaries and histograms to observe negative values (e.g. temperatures in centigrade). In that case, the sum of observations can go down, so you cannot apply rate() to it anymore. In those rare cases where you need to apply rate() and cannot avoid negative observations, you can use two separate summaries, one for positive and one for negative observations (the latter with inverted sign), and combine the results later with suitable PromQL expressions.

To calculate the average request duration during the last 5 minutes from a histogram or summary called http_request_duration_seconds, use the following expression:


Apdex score#

A straight-forward use of histograms (but not summaries) is to count observations falling into particular buckets of observation values.

You might have an SLO to serve 95% of requests within 300ms. In that case, configure a histogram to have a bucket with an upper limit of 0.3 seconds. You can then directly express the relative amount of requests served within 300ms and easily alert if the value drops below 0.95. The following expression calculates it by job for the requests served in the last 5 minutes. The request durations were collected with a histogram called http_request_duration_seconds.

  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job) #5m内小于 300ms 的请求数
  sum(rate(http_request_duration_seconds_count[5m])) by (job) #5m内总请求数

You can approximate the well-known Apdex score in a similar way. Configure a bucket with the target request duration as the upper bound and another bucket with the tolerated request duration (usually 4 times the target request duration) as the upper bound. Example: The target request duration is 300ms. The tolerable request duration is 1.2s. The following expression yields the Apdex score for each job over the last 5 minutes:

  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
  sum(rate(http_request_duration_seconds_bucket{le="1.2"}[5m])) by (job)
) / 2 / sum(rate(http_request_duration_seconds_count[5m])) by (job)

Note that we divide the sum of both buckets. The reason is that the histogram buckets are cumulative. The le="0.3" bucket is also contained in the le="1.2" bucket; dividing it by 2 corrects for that.

The calculation does not exactly match the traditional Apdex score, as it includes errors in the satisfied and tolerable parts of the calculation.


You can use both summaries and histograms to calculate so-called φ-quantiles, where 0 ≤ φ ≤ 1. The φ-quantile is the observation value that ranks at number φ*N among the N observations. Examples for φ-quantiles: The 0.5-quantile is known as the median. The 0.95-quantile is the 95th percentile.

The essential difference between summaries and histograms is that:

  • summaries calculate streaming φ-quantiles on the client side(这里指标准数据源,即被监控的应用) and expose them directly,

  • while histograms expose bucketed observation counts and the calculation of quantiles from the buckets of a histogram happens on the server side using the histogram_quantile() function.

The two approaches have a number of different implications:



Required configuration

Pick buckets suitable for the expected range of observed values.

Pick desired φ-quantiles and sliding window. Other φ-quantiles and sliding windows cannot be calculated later.

Client(这里指标准数据源,即被监控的应用) performance

Observations are very cheap as they only need to increment counters.

Observations are expensive due to the streaming quantile calculation.(应用需要流式计算百分位)

Server performance

The server has to calculate quantiles. You can use recording rules should the ad-hoc calculation take too long (e.g. in a large dashboard).

Low server-side cost.

Number of time series (in addition to the _sum and _count series)

One time series per configured bucket.

One time series per configured quantile.

Quantile error (see below for details)

Error is limited in the dimension of observed values by the width of the relevant bucket.

Error is limited in the dimension of φ by a configurable value.

Specification of φ-quantile and sliding time-window

Ad-hoc with Prometheus expressions.

Preconfigured by the client.


Ad-hoc with Prometheus expressions.

In general not aggregatable.

Note the importance of the last item in the table. Let us return to the SLO of serving 95% of requests within 300ms. This time, you do not want to display the percentage of requests served within 300ms, but instead the 95th percentile, i.e. the request duration within which you have served 95% of requests. To do that, you can either configure a summary with a 0.95-quantile and (for example) a 5-minute decay(衰变) time, or you configure a histogram with a few buckets around the 300ms mark, e.g. {le="0.1"}, {le="0.2"}, {le="0.3"}, and {le="0.45"}. If your service runs replicated with a number of instances, you will collect request durations from every single one of them, and then you want to aggregate everything into an overall 95th percentile. However, aggregating the precomputed quantiles from a summary rarely makes sense. In this particular case, averaging the quantiles yields statistically nonsensical values.

avg(http_request_duration_seconds{quantile="0.95"}) // BAD! 原因之一是:在跨 pod 计算百分位时,每个应用的 pod 的 0.95 百分位的响应时间都是不一样的;且每个 pod 的负载也可能不同。

Using histograms, the aggregation is perfectly possible with the histogram_quantile() function.

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) // GOOD.

Furthermore, should your SLO change and you now want to plot the 90th percentile, or you want to take into account the last 10 minutes instead of the last 5 minutes, you only have to adjust the expression above and you do not need to reconfigure the clients.

Two rules of thumb(总结):

  1. If you need to aggregate, choose histograms.

  2. Otherwise,

    Choose a histogram if you have an idea of the range and distribution of values that will be observed.

    Choose a summary if you need an accurate quantile, no matter what the range and distribution of the values is.

histogram_quantile() - histogram 转百分位#

histogram_quantile(φ scalar, b instant-vector) calculates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a histogram. (See histograms and summaries for a detailed explanation of φ-quantiles and the usage of the histogram metric type in general.) The samples in b are the counts of observations in each bucket. Each sample must have a label le where the label value denotes the inclusive upper bound of the bucket. (Samples without such a label are silently ignored.) The histogram metric type automatically provides time series with the _bucket suffix and the appropriate labels.

Use the rate() function to specify the time window for the quantile calculation.

Example: A histogram metric is called http_request_duration_seconds. To calculate the 90th percentile of request durations over the last 10m, use the following expression:

histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))

The quantile is calculated for each label combination in http_request_duration_seconds. To aggregate, use the sum() aggregator around the rate() function. Since the le label is required by histogram_quantile(), it has to be included in the by clause. The following expression aggregates the 90th percentile by job:

histogram_quantile(0.9, sum by (job, le) (rate(http_request_duration_seconds_bucket[10m])))

To aggregate everything, specify only the le label:

histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[10m])))

The histogram_quantile() function interpolates quantile values by assuming a linear distribution within a bucket. The highest bucket must have an upper bound of +Inf. (Otherwise, NaN is returned.) If a quantile is located in the highest bucket, the upper bound of the second highest bucket is returned. A lower limit of the lowest bucket is assumed to be 0 if the upper bound of that bucket is greater than 0. In that case, the usual linear interpolation is applied within that bucket. Otherwise, the upper bound of the lowest bucket is returned for quantiles located in the lowest bucket.

If b has 0 observations, NaN is returned. If b contains fewer than two buckets, NaN is returned. For φ < 0, -Inf is returned. For φ > 1, +Inf is returned. For φ = NaN, NaN is returned.