This topic was created 2159 days ago; the information in it may have evolved or changed since then.

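Lately I have come to see Prometheus in a new light. Adding a single line of code to a service is enough to monitor its CPU usage, memory, goroutine count, thread count, open file descriptors and their soft limit, restart count, and other important basic metrics. Combined with Grafana for intuitive charts, this helps a great deal when troubleshooting, so I want to write down the practices and insights worth sharing from my tinkering with Prometheus and Grafana.

Dashboard of several important basic metrics for a Go service

Introduction

Prometheus is one of the CNCF projects (P.S. the code of every CNCF project is worth studying), and a Graduated project at that. It mainly uses a convenient and flexible pull model: a service only has to expose an HTTP endpoint for the Prometheus server to scrape. With a push model the client has to do more work, and changing the push address is a hassle. That is why many well-known projects use it, such as k8s, TiDB, etcd, and even the time series database InfluxDB.

In my experience, many scenarios are well suited to adding metrics with the Prometheus SDK. In a logger package, the number of Error-level messages is a very useful metric. In a message queue SDK, Prometheus can collect client-side metrics such as send latency, consume latency, consumer processing time, and consumer processing errors. In an SDK that wraps DB operations, the number of open connections in the pool and the maximum number of connections are important metrics. In an HTTP middleware, the call count, processing time, and response code of each HTTP handler are the metrics of interest.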

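Installation

Prometheus is written in Go, so it is easy to deploy and cross-platform: a single binary plus a configuration file is enough to run it.

The GitHub releases page has prebuilt binaries for each platform. They are usually run as a service under a process manager such as supervisor; Docker also works.

The documentation has a basic example configuration file; just copy it to prometheus.yml.

./prometheus --config.file=prometheus.yml

prometheus.yml mainly defines global parameters such as the scrape interval, plus the scrape jobs. For each job you can specify its name, scrape interval, the list of target IP:port addresses, the metrics path on the targets, extra labels, and other parameters.

Scraped metrics automatically get two labels, job="<job_name>" and instance="<target ip port>". To attach extra fixed labels to a job, add them in the configuration file with the following syntax.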

scrape_configs:
  - job_name: foo
    metrics_path: "/prometheus/metrics"
    static_configs:
      - targets: ['localhost:10056']
        labels:
          service_name: "bar"

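Service discovery

As mentioned above, the Prometheus configuration file mainly defines the jobs to scrape, and having to edit it every time a service is added is clearly a hassle. One important feature of Prometheus is native support for several service discovery mechanisms. It integrates with service discovery components such as Consul, and also supports a very general file-based service discovery: you specify the path of a file that lists target IP:port addresses and related settings, an external program updates that file periodically, and Prometheus reloads it to update its scrape targets. This is very flexible.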

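Data model

A Prometheus time series sample consists of a timestamp, a metric name, labels, and a value:

timestamp is a millisecond-precision timestamp.
metric name is a string matching the regex [a-zA-Z_:][a-zA-Z0-9_:]*, that is, it contains only ASCII letters, digits, and the two special characters _ and :, cannot start with a digit, and cannot contain other special characters such as the hyphen -.
label is a map whose keys and values are both strings.
value is a float64.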
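To make the four parts concrete, here is a small Go sketch that models one sample and checks metric names against the regex quoted above. The `sample` type and `isValidMetricName` are illustrative names for this post, not types from the Prometheus SDK.

```go
package main

import (
	"fmt"
	"regexp"
)

// sample models the four parts of one data point described above.
type sample struct {
	TimestampMs int64             // milliseconds since the Unix epoch
	Name        string            // metric name
	Labels      map[string]string // label names to label values
	Value       float64
}

// metricNameRE is the pattern quoted above, anchored to the whole string.
var metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// isValidMetricName reports whether name is an acceptable metric name.
func isValidMetricName(name string) bool {
	return metricNameRE.MatchString(name)
}

func main() {
	s := sample{
		TimestampMs: 1570000000000,
		Name:        "http_requests_total",
		Labels:      map[string]string{"job": "foo", "instance": "localhost:10056"},
		Value:       42,
	}
	fmt.Println(s.Name, isValidMetricName(s.Name))
	// A hyphen and a leading digit are both rejected.
	for _, name := range []string{"job:request_rate", "http-requests", "2xx_total"} {
		fmt.Println(name, isValidMetricName(name))
	}
}
```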

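Metric types

Prometheus metric types consist of the basic types Counter and Gauge and the advanced types Histogram and Summary.

All metrics are stored in memory on the client SDK side and are scraped by the Prometheus scraper.

Counter

A Counter is a monotonically increasing counter that is reset to zero only when the service restarts, for example the number of HTTP requests or the number of Error-level log messages. It is worth noting that Prometheus's built-in functions such as rate() and increase() automatically handle resets caused by restarts when they are evaluated.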
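The reset handling can be pictured with a small Go sketch. This is a simplified model rather than Prometheus's actual implementation (which also extrapolates towards the window boundaries), and the name `increaseWithResets` is made up for this example.

```go
package main

import "fmt"

// increaseWithResets returns the total increase of a counter over a series of
// samples. Whenever a sample is lower than the previous one, the counter is
// assumed to have restarted from zero, so the whole new value counts as
// increase.
func increaseWithResets(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur < prev {
			// Reset detected: the counter went back to zero and rose to cur.
			total += cur
		} else {
			total += cur - prev
		}
	}
	return total
}

func main() {
	// The service restarts between the third and fourth sample.
	samples := []float64{10, 15, 20, 3, 8}
	fmt.Println(increaseWithResets(samples)) // prints 18
}
```

Without the reset check, a naive "last minus first" would report 8 - 10 = -2 for the same samples.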

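A counter's value is a float64, so how can a float64 be updated without locks? The answer is to store its bit pattern in a uint64, converting with the math package and updating it with an atomic compare-and-swap.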

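// Add adds val to the float64 stored in valBits without taking a lock.
func (v *value) Add(val float64) {
	for {
		oldBits := atomic.LoadUint64(&v.valBits)
		newBits := math.Float64bits(math.Float64frombits(oldBits) + val)
		// Retry if another goroutine changed valBits in the meantime.
		if atomic.CompareAndSwapUint64(&v.valBits, oldBits, newBits) {
			return
		}
	}
}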

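Gauge

A Gauge is a numeric metric that can go up and down, for example CPU usage, memory usage, or goroutine count.

Histogram

A Histogram is suited to cases where you need the distribution of values, for example HTTP response times or HTTP response body sizes.

The bucket widths of the histogram do not have to be uniform; you can define whatever boundaries suit your data. Each interval is called a bucket, and every observed value falls into the bucket that matches its magnitude.

A Histogram actually consists of several time series:

<basename>_bucket{le="<upper inclusive bound>"}: the count of observations less than or equal to the given bound.
<basename>_sum: the sum of all observed values.
<basename>_count: the total number of observations, which of course equals <basename>_bucket{le="+Inf"}.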
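The relationship between these series can be sketched with a toy histogram in Go. This is a non-concurrent model for illustration only, not the SDK's implementation; the type `histogram`, the metric name `demo`, and the bucket bounds are assumptions of this example.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram is a minimal, non-concurrent model of a histogram metric.
type histogram struct {
	upperBounds []float64 // sorted finite bounds; the +Inf bucket is implicit
	counts      []uint64  // per-bucket (non-cumulative) counts, last is +Inf
	sum         float64
	count       uint64
}

func newHistogram(upperBounds []float64) *histogram {
	return &histogram{
		upperBounds: upperBounds,
		counts:      make([]uint64, len(upperBounds)+1),
	}
}

// Observe records v in the first bucket whose upper bound is >= v.
func (h *histogram) Observe(v float64) {
	i := sort.SearchFloat64s(h.upperBounds, v)
	h.counts[i]++
	h.sum += v
	h.count++
}

// cumulative returns the counts as exposed in the _bucket series, where each
// bucket also includes everything below it.
func (h *histogram) cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var running uint64
	for i, c := range h.counts {
		running += c
		out[i] = running
	}
	return out
}

func main() {
	h := newHistogram([]float64{0.001, 0.01, 0.1})
	for _, v := range []float64{0.0005, 0.004, 0.02, 0.3, 0.01} {
		h.Observe(v)
	}
	cum := h.cumulative()
	for i, ub := range h.upperBounds {
		fmt.Printf("demo_bucket{le=\"%g\"} %d\n", ub, cum[i])
	}
	fmt.Printf("demo_bucket{le=\"+Inf\"} %d\n", cum[len(cum)-1])
	fmt.Printf("demo_sum %g\n", h.sum)
	fmt.Printf("demo_count %d\n", h.count)
}
```

Note that the value 0.01 lands in the le="0.01" bucket because the upper bound is inclusive, and that the +Inf bucket always equals the total count.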

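Summary

Compared with a Histogram, a Summary is already aggregated by percentile, which suits cases where you need the distribution by percentage. Take HTTP response times: a Histogram focuses on how many requests took less than 1ms, how many took 1ms~10ms, and how many took more than 10ms, whereas a Summary reports the response time that 20% of requests stay within, the one that 50% stay within, and the one that 99% stay within. A Histogram only counts raw observations, so it is cheap, and p50 or p99 are computed by a function at query time. A Summary computes the configured percentiles on the client SDK side, which costs more.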

Using the SDK

The Prometheus Go SDK is idiomatic and makes full use of Go's language features.

In the SDK, all metric types implement the prometheus.Collector interface.

// Collector is the interface implemented by anything that can be used by
// Prometheus to collect metrics. A Collector has to be registered for
// collection. See Registerer.Register.
//
// The stock metrics provided by this package (Gauge, Counter, Summary,
// Histogram, Untyped) are also Collectors (which only ever collect one metric,
// namely itself). An implementer of Collector may, however, collect multiple
// metrics in a coordinated fashion and/or create metrics on the fly. Examples
// for collectors already implemented in this library are the metric vectors
// (i.e. collection of multiple instances of the same Metric but with different
// label values) like GaugeVec or SummaryVec, and the ExpvarCollector.
type Collector interface {
	// Describe sends the super-set of all possible descriptors of metrics
	// collected by this Collector to the provided channel and returns once
	// the last descriptor has been sent. The sent descriptors fulfill the
	// consistency and uniqueness requirements described in the Desc
	// documentation. (It is valid if one and the same Collector sends
	// duplicate descriptors. Those duplicates are simply ignored. However,
	// two different Collectors must not send duplicate descriptors.) This
	// method idempotently sends the same descriptors throughout the
	// lifetime of the Collector. If a Collector encounters an error while
	// executing this method, it must send an invalid descriptor (created
	// with NewInvalidDesc) to signal the error to the registry.
	Describe(chan<- *Desc)
	// Collect is called by the Prometheus registry when collecting
	// metrics. The implementation sends each collected metric via the
	// provided channel and returns once the last metric has been sent. The
	// descriptor of each sent metric is one of those returned by
	// Describe. Returned metrics that share the same descriptor must differ
	// in their variable label values. This method may be called
	// concurrently and must therefore be implemented in a concurrency safe
	// way. Blocking occurs at the expense of total performance of rendering
	// all registered metrics. Ideally, Collector implementations support
	// concurrent readers.
	Collect(chan<- Metric)
}

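The methods of the prometheus.Collector interface take send-only channels as parameters, so an implementation is free to produce its values synchronously or concurrently. Describe(chan<- *Desc) is called when a Collector is registered or unregistered, and Collect(chan<- Metric) is called when metrics are gathered for a scrape.

Basic usage

Metrics without labels are created with prometheus.NewCounter, prometheus.NewGauge, prometheus.NewHistogram, or prometheus.NewSummary and registered with prometheus.MustRegister. They are usually initialized as package-level variables and registered in an init function.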

var (
	sentBytes = prometheus.NewCounter(prometheus.CounterOpts{
		Namespace: "etcd",
		Subsystem: "network",
		Name:      "client_grpc_sent_bytes_total",
		Help:      "The total number of bytes sent to grpc clients.",
	})

	receivedBytes = prometheus.NewCounter(prometheus.CounterOpts{
		Namespace: "etcd",
		Subsystem: "network",
		Name:      "client_grpc_received_bytes_total",
		Help:      "The total number of bytes received from grpc clients.",
	})
)

func init() {
	prometheus.MustRegister(sentBytes)
	prometheus.MustRegister(receivedBytes)
}

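A counter's Add method must not be given a negative number; otherwise it panics.

Metrics with labels are created with prometheus.NewCounterVec, prometheus.NewGaugeVec, prometheus.NewHistogramVec, or prometheus.NewSummaryVec. Different label values are like vectors pointing in different directions from the origin of a Cartesian coordinate system.

When calling a Vec type's WithLabelValues method, the number of values passed in must match the number of labels defined when the metric was created; otherwise it panics.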

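Advanced usage

By default, a Collector is updated actively by your code, but some metrics cannot be counted that way. The number of DB connections a service currently has open, for example, is better read at scrape time. For this you can use prometheus.NewCounterFunc or prometheus.NewGaugeFunc and pass in a func() float64 that returns the metric value; the function is called on every scrape. Metrics defined this way carry no labels, though. To run your own function at scrape time and also attach labels, you have to write your own collector that implements the prometheus.Collector interface. The SDK is flexible enough for this: it exposes the low-level function MustNewConstMetric, which makes such a custom Collector easy to implement, as in the code below.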

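import (
	"fmt"
	"log"
	"runtime"
	"strings"

	"github.com/prometheus/client_golang/prometheus"
)

// gaugeVecFuncWithLabelValues pairs a value callback with its label values.
type gaugeVecFuncWithLabelValues struct {
	gaugeVecFunc func() float64
	labelValues  []string
}

type gaugeVecFuncCollector struct {
	desc                        *prometheus.Desc
	gaugeVecFuncWithLabelValues []gaugeVecFuncWithLabelValues
	labelsDeduplicatedMap       map[string]bool
}

// NewGaugeVecFunc creates a collector whose gauge values are computed by
// callbacks at scrape time, one callback per set of label values.
func NewGaugeVecFunc(opts prometheus.GaugeOpts, labelNames []string) *gaugeVecFuncCollector {
	return &gaugeVecFuncCollector{
		desc: prometheus.NewDesc(
			prometheus.BuildFQName(opts.Namespace, opts.Subsystem, opts.Name),
			opts.Help,
			labelNames,
			opts.ConstLabels,
		),
		labelsDeduplicatedMap: make(map[string]bool),
	}
}

// Describe implements prometheus.Collector.
func (dc *gaugeVecFuncCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- dc.desc
}

// Collect implements prometheus.Collector. It calls every registered callback.
func (dc *gaugeVecFuncCollector) Collect(ch chan<- prometheus.Metric) {
	for _, v := range dc.gaugeVecFuncWithLabelValues {
		ch <- prometheus.MustNewConstMetric(dc.desc, prometheus.GaugeValue, v.gaugeVecFunc(), v.labelValues...)
	}
}

// RegisterGaugeVecFunc registers a callback for one set of label values.
// Each set of label values can be registered only once.
// It is not safe to call concurrently with Collect; register during initialization.
func (dc *gaugeVecFuncCollector) RegisterGaugeVecFunc(labelValues []string, gaugeVecFunc func() float64) (err error) {
	// Prometheus allows a metric with the same label values to be collected
	// only once per scrape. Joining with a separator keeps ["ab", "c"] and
	// ["a", "bc"] from producing the same key.
	deduplicateKey := strings.Join(labelValues, "\xff")
	if dc.labelsDeduplicatedMap[deduplicateKey] {
		return fmt.Errorf("labelValues func already registered, labelValues:%v", labelValues)
	}
	dc.labelsDeduplicatedMap[deduplicateKey] = true
	handlePanicGaugeVecFunc := func() (v float64) {
		// recover only takes effect inside a deferred function. If the
		// callback panics, the gauge reports 0 for this scrape.
		defer func() {
			if rec := recover(); rec != nil {
				const size = 10 * 1024
				buf := make([]byte, size)
				buf = buf[:runtime.Stack(buf, false)]
				// The original code used its own logger package; the
				// standard log package stands in for it here.
				log.Printf("gaugeVecFunc panic:%v\nstack:%s", rec, buf)
			}
		}()
		return gaugeVecFunc()
	}
	dc.gaugeVecFuncWithLabelValues = append(dc.gaugeVecFuncWithLabelValues, gaugeVecFuncWithLabelValues{
		gaugeVecFunc: handlePanicGaugeVecFunc,
		labelValues:  labelValues,
	})
	return nil
}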

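Best practices

The metric type is not shown when you write a query while editing a chart, so the metric name alone should tell you the type. By convention, counter names end with the _total suffix.
The unit is not shown either, so the metric name alone should tell you the unit. For durations, state whether it is nanoseconds, milliseconds, or seconds, as in http_request_duration_seconds; for data sizes, state whether it is MB or bytes, as in client_grpc_sent_bytes_total.
Every metric should have a single-word namespace prefix, as in process_cpu_seconds_total and http_request_duration_seconds.
Internally, an unlabeled Counter or Gauge is a lock-free atomic uint64 and an unlabeled Histogram is several lock-free atomic uint64 values, while an unlabeled Summary takes a lock because it has to aggregate internally. Under high concurrency, prefer Histogram over Summary.
A labeled metric hashes the label values on every update to find the matching vector element and then updates it. So keep the number of labels small, keep label values short, and make sure label values are enumerable and not too numerous; otherwise queries are slow, dashboards load slowly, and storage is wasted. When the label values are known in advance, call GetMetricWithLabelValues once up front to obtain a plain counter, which saves one hash of the label values per update and improves performance.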
```
// GetMetricWithLabelValues replaces the method of the same name in
// MetricVec. The difference is that this method returns a Counter and not a
// Metric so that no type conversion is required.
func (m *CounterVec) GetMetricWithLabelValues(lvs ...string) (Counter, error) {
	metric, err := m.MetricVec.GetMetricWithLabelValues(lvs...)
	if metric != nil {
		return metric.(Counter), err
	}
	return nil, err
}
```
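When collecting a time.Duration value, time.Since is optimized: it goes straight to runtimeNano instead of fetching the current wall-clock time, so it performs better than calling time.Now again and subtracting. Frequent time.Now calls can also add up to a noticeable cost in performance-sensitive programs.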

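Query language: PromQL

The Prometheus query language (PromQL) is a special-purpose query language that is simpler than SQL yet quite expressive. It is best learned from the documentation and its examples.

The Graph page built into Prometheus is fairly basic, so in most cases you can simply use the more powerful Grafana. When building a dashboard, typing in PromQL directly produces a time series chart.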

Label matchers (Instant vector selectors)

http_requests_total{job="prometheus",group="canary"}

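In matchers, besides = and !=, =~ means the value matches a regex and !~ means it does not match.

Matchers can also be applied to the metric name. The syntax is a bit like Python's double-underscore magic names: {__name__=~"job:.*"} selects the metrics whose names match the regex job:.*.

Range selectors (Range vector selectors)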

http_requests_total{job="prometheus"}[5m]

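In range selectors, the duration syntax resembles Go's: s is seconds, m is minutes, and h is hours. PromQL additionally accepts d for days, w for weeks, and y for years, which Go's time.ParseDuration does not.

Common functions

changes(v range-vector): the number of times the value changed within the range
delta(v range-vector): the difference between the first and last value in the range, for gauges only
idelta(v range-vector): the difference between the last two samples, for gauges only
histogram_quantile(φ float, b instant-vector): a histogram-specific function that computes percentiles such as p99 and p90, similar to what a summary reports. Example: histogram_quantile(0.9, avg(rate(http_request_duration_seconds_bucket[10m])) by (job, le))
increase(v range-vector): the increase within the range, for counters only
rate(v range-vector): the per-second average rate of increase (average QPS), for counters only
irate(v range-vector): the instantaneous per-second rate (instant QPS); if the raw data changes quickly, the more sensitive irate can be used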

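Snippets

Here are some PromQL snippets, found by searching and by my own experimenting, for the metrics that the Prometheus Go SDK collects by default.

CPU usage (%): rate(process_cpu_seconds_total[1m]) * 100
Memory obtained from the system, in bytes: go_memstats_sys_bytes
Restart count: changes(process_start_time_seconds[5m])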

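Grafana panels

A few tips for editing Grafana panels:

In the Query tab you can set the format of the Legend shown below the chart; it supports double-curly-brace template syntax such as {{labelName}}.
In the Visualization tab you can set the axis unit, such as percent, data size, or duration, which makes the Y-axis values more readable.
In the Visualization tab you can set more Legend options: whether to show it as a table, whether to place the table below or to the right, which extra aggregates to show (max, min, avg, current, total), and how many decimal places those aggregates use.

Alerting

Alerts are fairly simple to set up in Grafana's visual interface: you can trigger an alert when the value returned by a given PromQL query stays outside a given range for a number of consecutive evaluations. The best companion for alert notifications is, of course, a Slack channel.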

Original post on my blog: https://imhanjm.com/2019/10/06/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3Prometheus(GO%20SDK)/

9 replies • 2019-10-17 20:47:08 +08:00

pmispig

2019-10-08 09:43:13 +08:00

Personally I really dislike this pull-model stuff. The agent even has to implement an HTTP endpoint; isn't that a pointless extra step?

poplar50

2019-10-08 09:54:00 +08:00 via Android

@pmispig The official FAQ answers this: https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push

defunct9

2019-10-08 09:58:13 +08:00

Having the client report actively works just as well; push and pull are really the same.

zpfhbyx

2019-10-08 12:34:33 +08:00

Grafana's alerts don't support template variables, so template variables can only be used for display. That is rather painful.

freestyle

2019-10-08 12:54:59 +08:00 via iPhone

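@pmispig Pull does have its advantages; even TiDB has switched to pull.
https://asktug.com/t/grafana-tidb/1061
Since TiDB 2.1.3, monitoring uses pull instead of the earlier push. This is a very good change that solves several problems:

Previously, migrating Prometheus required restarting the whole cluster, because the components had to change their push target address.
Now two Prometheus instances can be deployed to avoid a monitoring single point of failure, because a pull source can be scraped by multiple servers.
The PushGateway single-point component is removed.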

pmispig

2019-10-09 09:50:03 +08:00

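@freestyle Does it provide an agent SDK for different languages? Before, a push was a single shell command, and now I would have to write a web endpoint as well, which is a bit painful.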

freestyle

2019-10-09 13:04:31 +08:00 via iPhone

@pmispig Go, Java, Python, and Ruby all have officially supported SDKs. In Python, pushing to the Prometheus Pushgateway can even be written in one line, haha; see https://github.com/prometheus/client_python/blob/master/README.md#exporting-to-a-pushgateway

https://prometheus.io/docs/instrumenting/pushing/

pmispig

2019-10-09 13:24:08 +08:00

@freestyle Thanks~ I'll take a look at the SDK.

freestyle

2019-10-17 20:47:08 +08:00 via iPhone

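@zpfhbyx The alert message box cannot contain template variables, but the alert that gets sent includes the legend from the query editor, and the legend can contain template variables.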
