A Prometheus-Based Monitoring Solution

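Background

While moving from bare-metal deployments to containers managed by Kubernetes (k8s), today's mainstream container orchestrator, we found that traditional monitoring systems could not cover containers, container clusters, and the services running inside the containers.

During our research the team became interested in Prometheus, which in May 2016 became the second project, after Kubernetes, to formally join the CNCF. A Prometheus-based monitoring setup addresses the pain points we currently have.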

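I. Introduction

1. What is Prometheus

First, the broad definition: Prometheus is a monitoring system that does the same job as Nagios, open-falcon, Nightingale (夜莺), and similar tools.
Second, Prometheus is also a front end to a time-series database: it reads and writes samples in its local tsdb, and it supports remote persistent storage such as m3db, in which case it acts as a proxy in front of that database.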

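2. Why choose Prometheus for monitoring containers

Prometheus was the second project, after Kubernetes, to formally join the CNCF, and it integrates better with container and cloud platforms; most other systems still focus on host monitoring.
Prometheus is a one-stop monitoring and alerting platform with few dependencies and a complete feature set.
Prometheus is flexible to configure: it supports many service discovery mechanisms for finding your services and infrastructure, and you can write your own exporter (the probe side of Prometheus) for custom monitoring.

PS: one small drawback up front: Prometheus is not very friendly to distributed deployment. This is explained later.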

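II. Deployment

1. Installation

Method	Steps
Package download	https://prometheus.io/download/ (expect a slow download...)
wget tarball	1. Create a download directory: mkdir ~/Download; cd ~/Download 2. Download: wget https://github.com/prometheus/prometheus/releases/download/v1.6.2/prometheus-1.6.2.linux-amd64.tar.gz 3. Create a directory for Prometheus: mkdir ~/Prometheus; cd ~/Prometheus 4. Unpack, ready to use out of the box: tar -xvzf ~/Download/prometheus-1.6.2.linux-amd64.tar.gz; cd prometheus-1.6.2.linux-amd64 (expect a slow download...)
Build from source	git clone https://github.com/prometheus/prometheus.git, then either make or go build works (enjoy...)

Whichever way you choose, the install is usable once ./prometheus --version prints version information.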

2. Start

./prometheus

3. Access

Open http://127.0.0.1:9090 in a browser and a new world opens up.

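III. Prometheus architecture

According to the official architecture diagram, Prometheus has many "plugins" and many features. Broadly there are four commonly used components: Exporter, PushGateway, Prometheus Server, and AlertManager.

·Exporter: exporters fall into two categories:

1) Built-in collection: the service or instance has Prometheus support built in. cAdvisor, Kubernetes, and Etcd, for example, directly expose an endpoint that serves monitoring data to Prometheus. A business service can do the same by adding a /metrics HTTP endpoint and registering it with the routing helper from promhttp.

2) Indirect collection: the monitored target does not support Prometheus itself, so a separate collector program is written for it using the Client Library provided by Prometheus. Examples: MySQL Exporter, JMX Exporter, Consul Exporter.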

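·PushGateway:

Jobs can push metrics directly to the PushGateway, which keeps the pushed values and exposes them on an endpoint for Prometheus to scrape.

Its main use case is short-lived, non-intrusive instrumentation. Because such jobs exist only briefly, they may be gone before Prometheus comes to pull.

Such jobs can therefore push their metrics straight to the PushGateway, which acts somewhat like a proxy for Prometheus. This approach is mainly used for service-level metrics.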

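·Prometheus Server:

The core component of the monitoring system. It handles service discovery of monitored instances, and the scraping, storage, and querying of monitoring data. Prometheus Server can manage its targets through static configuration or dynamically through service discovery, and it pulls data from those targets.

Prometheus Server also has to store what it collects. It ships its own high-performance time-series database (tsdb) and stores the collected data on local disk as time series. Its query module exposes PromQL, a purpose-built query language for querying and analyzing the data.

In addition, the pyramid model described later relies on the federation capability of Prometheus Server, which lets one server pull data from other Prometheus Server instances.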

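·AlertManager:

Prometheus Server evaluates the alerting rules defined in its configuration files. When a rule's condition is met it generates an alert and pushes it to AlertManager as an alerts structure. AlertManager deduplicates and groups the alerts, then routes them to the configured receivers. Common receivers include email, DingTalk (no longer supported), WeChat Work, and webhook.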

IV. Fetching a metric

1. Add configuration

As described above, the server first needs a source to pull data from. This is normally set in the configuration file. The default prometheus.yml looks like this:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

Look at scrape_configs. It means: scrape data from the target address localhost:9090/metrics, in a job named prometheus, using HTTP.

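2. Notify the server that the configuration changed

I know of three ways to notify the server:

•Restart the server

•Send a signal to the running prometheus process

•Send an HTTP request to prometheus

The last two are hot reloads and are the recommended ones.

Sending a signal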

/prometheus # ps -ef | grep prometheus
    1 root       13:33 /bin/prometheus --config.file=/etc/prometheus/prometheus.yml ......
/prometheus # kill -HUP 1
# prometheus has a dedicated goroutine that receives the HUP signal and reloads the configuration

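HTTP request (on Prometheus 2.x this endpoint is enabled only when the server is started with --web.enable-lifecycle)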

curl -XPOST http://prometheusaddr/-/reload

That completes loading the configuration.

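3. Query metrics

Use the built-in UI. Open in a browser:

http://prometheusIp:port/graph

Typing "prometheus" into the input box brings up many related metrics, and their names make the meaning easy to tell apart. Pick one and click Execute to query its values over a time range (x: time, y: value). For example, a value can be read as the number of times an instrumented event had occurred at that moment.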

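V. Configuration file

When Prometheus starts, the --config.file flag specifies the configuration file to load; the default is prometheus.yml. The configuration file can set global, alerting, rule_files, scrape_configs, remote_write, remote_read, and other sections.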

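A brief overview. GlobalConfig, the global configuration, includes:

scrape_interval: default interval at which targets are scraped.
scrape_timeout: timeout for scraping a single target.
evaluation_interval: interval at which rules are evaluated.
external_labels: extra labels attached to time series and alerts when Prometheus talks to external systems (federation, remote storage, Alertmanager).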

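AlertingConfig, the alerting configuration, includes:

alert_relabel_configs: relabeling rules that dynamically rewrite alert labels.
alertmanagers: configuration for discovering Alertmanager instances.

RuleFiles, the alerting rule configuration:

RuleFiles lists the rules files to load. It accepts multiple files as well as glob patterns over a directory.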

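ScrapeConfigs, the scrape configuration, includes:

job_name: name of the job
honor_labels: resolves conflicts between scraped labels and server-side labels; when true the scraped labels win, otherwise the server-side configuration wins
params: request parameters sent with each scrape
scrape_interval: scrape interval
scrape_timeout: scrape timeout
metrics_path: metrics path on the target
scheme: protocol used for the scrape
sample_limit: per-scrape limit on the number of samples accepted; if a scrape exceeds it, the whole scrape is treated as failed and nothing is stored; the default 0 means no limit
ServiceDiscoveryConfig: service discovery configuration (see the service discovery configuration below)
HTTPClientConfig: HTTP client settings
relabel_configs: relabeling applied to targets before the scrape
metric_relabel_configs: relabeling applied to scraped metrics before storage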

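RemoteWriteConfigs, the remote write configuration (this is where m3db is configured), includes:

url: endpoint address
remote_timeout: request timeout
write_relabel_configs: relabeling applied to scraped samples before they are sent to remote storage

RemoteReadConfigs, the remote read configuration (this is where m3db is configured), includes:

url: endpoint address
remote_timeout: request timeout

Service discovery configuration (there are a lot of them), which fills in the ServiceDiscoveryConfig of ScrapeConfigs:

static_configs: static targets
dns_sd_configs: DNS service discovery
file_sd_configs: file-based service discovery
consul_sd_configs: Consul service discovery
serverset_sd_configs: Serverset service discovery
nerve_sd_configs: Nerve service discovery
marathon_sd_configs: Marathon service discovery
kubernetes_sd_configs: Kubernetes service discovery
gce_sd_configs: GCE service discovery
ec2_sd_configs: EC2 service discovery
openstack_sd_configs: OpenStack service discovery
azure_sd_configs: Azure service discovery
triton_sd_configs: Triton service discovery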

Finally, a complete configuration. The first job uses a static target list and the second uses Kubernetes service discovery:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
 
rule_files:
  - "rules/node.rules"
 
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  # Service-level discovery: discovers every endpoint behind each service
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
    - role: endpoints
    relabel_configs:
    - action: keep
      regex: true
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scrape
    - action: replace
      regex: (https?)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scheme
      target_label: __scheme__
    - action: replace
      regex: (.+)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_path
      target_label: __metrics_path__
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
      - __address__
      - __meta_kubernetes_service_annotation_prometheus_io_port
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    - action: replace
      source_labels:
      - __meta_kubernetes_service_name
      target_label: kubernetes_name
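The __address__ rule in the configuration above is the least obvious one, so here is a small Go sketch of what it does. It assumes the usual relabeling behavior: the source label values are joined with ";", the regex is anchored on both ends, and a non-matching rule leaves the target label unchanged. The function name rewriteAddress and the sample addresses are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// addressRule is the relabel regex for __address__, anchored the way
// relabeling anchors every regex.
var addressRule = regexp.MustCompile(`^(?:([^:]+)(?::\d+)?;(\d+))$`)

// rewriteAddress mimics the "replace" rule that swaps the discovered port for
// the port given in the prometheus.io/port service annotation.
func rewriteAddress(address, portAnnotation string) string {
	// Source label values are concatenated with ";" before matching.
	joined := strings.Join([]string{address, portAnnotation}, ";")
	if !addressRule.MatchString(joined) {
		return address // rule does not apply, __address__ stays as it was
	}
	return addressRule.ReplaceAllString(joined, "${1}:${2}")
}

func main() {
	fmt.Println(rewriteAddress("10.0.0.5:8080", "9102")) // annotation port replaces 8080
	fmt.Println(rewriteAddress("10.0.0.5", "9102"))      // port is appended
	fmt.Println(rewriteAddress("10.0.0.5:8080", ""))     // no annotation, address unchanged
}
```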

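VI. Data model

A metric is what Prometheus aggregates monitoring data around. Each metric carries one or more labels, which act as dimensions for aggregating and explaining it. For example: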

log_metric{ip="xxx.xxx.xxx.xxx",instance="xxx.xxx.xxx.xxx:xxx",code="access1",metrics_storage="m3db_remote",server="test"}

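This is a log metric. In this expression log_metric is the metric name, and the labels describe it: which service (server), which log file (filterCode), the machine address (instance), the matched keyword (matchCode), and the data source (metrics_storage). The sample line above shows a single code label rather than separate filterCode and matchCode labels.

This resembles the relationship between a table and its columns in a relational database. Samples whose metric name and label values are all identical are treated as one and the same time series.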
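One way to picture this identity rule is a key built from the metric name plus the sorted label pairs. The Go sketch below is purely illustrative and is not how Prometheus stores series internally; seriesKey is a made-up helper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey renders a metric name and its labels in a canonical form, with
// labels sorted by name, so two samples get the same key exactly when their
// name and all label values match.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	b.WriteByte('{')
	for i, k := range keys {
		if i > 0 {
			b.WriteByte(',')
		}
		fmt.Fprintf(&b, "%s=%q", k, labels[k])
	}
	b.WriteByte('}')
	return b.String()
}

func main() {
	a := seriesKey("log_metric", map[string]string{"server": "test", "code": "access1"})
	b := seriesKey("log_metric", map[string]string{"code": "access1", "server": "test"})
	c := seriesKey("log_metric", map[string]string{"code": "access2", "server": "test"})
	fmt.Println(a == b) // same name and label values: same series
	fmt.Println(a == c) // one label value differs: different series
}
```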

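VII. How Prometheus scrapes data

Prometheus obtains metrics by pulling, so anything that satisfies the pull contract can serve as a data source. Let's follow the source code to see how a sample gets pulled:

In cmd → main.go, the service discovery manager, once initialized, is the first to start working: it fetches all target addresses. Set that aside for now and focus on scraping.

Right after that the scrape manager starts working too (the Run function). It needs the service discovery side to pass it the ball, that is, hand over the addresses.

The reloader starts and prepares for scraping.

Here a so-called connection pool is built from the data coming out of service discovery and kept hot-updated; it is actually created in sync, inside the closure below.

The real IPs are resolved from the group data structure and the IP pool is maintained.

Following the code to the end, it turns out to be a method on the targetScraper type in the scrape package that issues a single GET request. That makes the official contract very flexible: an exporter only has to implement this one endpoint to hand its data out.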

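VIII. Writing your own exporter

An exporter has to return data in a fixed format. To guarantee that contract, the official client requires two methods to be implemented before the data can be scraped. The metrics themselves should be produced by the developer or, for a single-purpose exporter, declared in advance.

The basics are mostly covered at this point, and this is the last topic of the quick tour; other topics are left out for now or will be added later. This section follows a reference article and adds my own practice with exporters:

1. Create the metrics container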

// Metrics struct: holds the metric descriptors
type Metrics struct {
           metrics map[string]*prometheus.Desc
           mutex   sync.Mutex
}

2. Create a metric descriptor

func newGlobalMetric(namespace string, metricName string, docString string, labels []string) *prometheus.Desc {
           return prometheus.NewDesc(namespace+"_"+metricName, docString, labels, nil)
}

3. Initialize the metric information, i.e. the Metrics struct

func NewMetrics(namespace string) *Metrics {
           return &Metrics{
                      metrics: map[string]*prometheus.Desc{
                                 "log_metric": newGlobalMetric(namespace,
                                            "log_metric",
                                            "The description of log_metric",
                                            []string{"server", "code"},
                                 ),
                      },
           }
}

4. Implement the two methods

func (c *Metrics) Describe(ch chan<- *prometheus.Desc) {
           for _, m := range c.metrics {
                      ch <- m
           }
}

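func (c *Metrics) Collect(ch chan<- prometheus.Metric) {
           // Take the lock first, then defer the unlock, using the mutex declared on Metrics.
           c.mutex.Lock()
           defer c.mutex.Unlock()
           // Placeholder sample: testVal and the label values stand in for real data.
           // Label values must follow the order declared in NewMetrics ("server", "code").
           testVal := float64(1)
           ch <- prometheus.MustNewConstMetric(c.metrics["log_metric"], prometheus.CounterValue, testVal, "test", "access1")
}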

5. Register the collector

metrics := collector.NewMetrics("your_Metrics")
registry := prometheus.NewRegistry()
registry.MustRegister(metrics)
router.Get("/metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))

Add whatever imports are missing.

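IX. Distributed solutions

Common distributed Prometheus architectures:

As mentioned earlier, Prometheus does not support distributed deployment well. Let's first look at the two usually recommended mechanisms, federation and splitting:

1) Federation lets Prometheus scale to tens of data centers and millions of nodes. In this setup the federation topology resembles a tree, where upper-level Prometheus servers collect aggregated time series from a large number of lower-level Prometheus servers.

2) Splitting is always a good way to approach a problem, especially when a global view is not required. We can split by environment, by business type, or even by department, but this brings the following problems:

1. There is no global view: we still cannot query services across clusters, and truly global queries are impossible.

2. Unified alerting becomes harder.

3. If clusters are dynamic or change often, you usually need a way to add a data source to Grafana automatically every time Prometheus is deployed in a cluster.

Neither approach fundamentally solves the single-node capacity limit of Prometheus. One more pain point: the business services must be reachable at scrape time, which raises the requirements on how services are deployed.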

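Industry example:

360 Doraemon:

Doraemon is 360's Prometheus-based monitoring system. It follows the first approach, a pyramid-style build: in the lower-level clusters each Prometheus instance scrapes a fixed set of cluster targets and computes on the data locally, upper-level Prometheus nodes then aggregate it, and a single node does the final aggregation. This yields a pyramid structure for distributed scraping. The benefit is that the final master Prometheus instance holds all metric data, which is convenient for UI queries. The downsides are that only one machine reads and writes the database, the path is long so the chance of losing data increases, and upper-level Prometheus instances may span clusters. Worth mentioning:

360 kept the ELK-style log monitoring design of filebeat -> kafka (Qbus) -> logstash -> exporter and relies on its customized Qbus for high availability. In general, Kafka jitter should not be ignored.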

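Architecture design overview:

1. Log collection side and SDK

For log monitoring, the log data produced by services is mounted onto the host. Because the Kafka rebalance problem could not be solved well enough, log-exporter takes over the functions of filebeat, logstash, and the exporter: it counts log events and exposes the endpoint that Prometheus pulls from.

The SDK works the same way. Seen from a distance, every service that integrates the SDK becomes an exporter exposed to the server.

2. Resident instances

Every node runs a node-exporter and a cadvisor, which monitor the host and the containers respectively.

3. Changes to Prometheus

Plan: spread the scrape work across Prometheus instances inside the k8s cluster. A single Prometheus instance can only scrape so much, so we split the work at the data source. There are two places to do this:

1) at service discovery
2) at the final scrape of the targets

1) To split the work at service discovery we need to understand this struct from the source code: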

type ServiceDiscoveryConfig struct {
           // List of labeled target groups for this job.
           StaticConfigs []*targetgroup.Group `yaml:"static_configs,omitempty"`
           // List of DNS service discovery configurations.
           DNSSDConfigs []*dns.SDConfig `yaml:"dns_sd_configs,omitempty"`
           // List of file service discovery configurations.
           FileSDConfigs []*file.SDConfig `yaml:"file_sd_configs,omitempty"`
           // List of Consul service discovery configurations.
           ConsulSDConfigs []*consul.SDConfig `yaml:"consul_sd_configs,omitempty"`
           // List of DigitalOcean service discovery configurations.
           DigitalOceanSDConfigs []*digitalocean.SDConfig `yaml:"digitalocean_sd_configs,omitempty"`
           // List of Docker Swarm service discovery configurations.
           DockerSwarmSDConfigs []*dockerswarm.SDConfig `yaml:"dockerswarm_sd_configs,omitempty"`
           // List of Serverset service discovery configurations.
           ServersetSDConfigs []*zookeeper.ServersetSDConfig `yaml:"serverset_sd_configs,omitempty"`
           // NerveSDConfigs is a list of Nerve service discovery configurations.
           NerveSDConfigs []*zookeeper.NerveSDConfig `yaml:"nerve_sd_configs,omitempty"`
           // MarathonSDConfigs is a list of Marathon service discovery configurations.
           MarathonSDConfigs []*marathon.SDConfig `yaml:"marathon_sd_configs,omitempty"`
           // List of Kubernetes service discovery configurations.
           KubernetesSDConfigs []*kubernetes.SDConfig `yaml:"kubernetes_sd_configs,omitempty"`
           // List of GCE service discovery configurations.
           GCESDConfigs []*gce.SDConfig `yaml:"gce_sd_configs,omitempty"`
           // List of EC2 service discovery configurations.
           EC2SDConfigs []*ec2.SDConfig `yaml:"ec2_sd_configs,omitempty"`
           // List of OpenStack service discovery configurations.
           OpenstackSDConfigs []*openstack.SDConfig `yaml:"openstack_sd_configs,omitempty"`
           // List of Azure service discovery configurations.
           AzureSDConfigs []*azure.SDConfig `yaml:"azure_sd_configs,omitempty"`
           // List of Triton service discovery configurations.
           TritonSDConfigs []*triton.SDConfig `yaml:"triton_sd_configs,omitempty"`
}

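The struct lists every supported service discovery type. You can also add your own type by implementing the required methods. The benefit of splitting at service discovery is that the data is cut off at the source, which avoids a lot of in-memory work later and keeps the impact on the UI to a minimum (the jobs simply are not discovered). The downside: look at my diagram and consider how many places have to change.

2) Splitting at scrape time: everything upstream already works, but leaving aside the memory wasted on targets that will be discarded, scraping then has to deal with UI display problems such as null and zero values, and with zero values being written to remote storage (0 is a valid monitoring value in Prometheus). We will optimize this away later.

We had to pick one and plan to take the first. Our earlier Prometheus work already added a custom service discovery, so that code has been modified before, and the k8s-based service discovery currently goes through this one path only, so it is the lower-cost option.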
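As a rough illustration of splitting work at service discovery, the Go sketch below assigns each discovered target to exactly one Prometheus instance by hashing its address. This is an assumed scheme for illustration, not the implementation described above. The names ownsTarget and filterTargets, the FNV hash, and the sample addresses are all made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownsTarget reports whether the instance with index shard (0-based) out of
// total instances should scrape the given target address.
func ownsTarget(address string, shard, total uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(address))
	return h.Sum32()%total == shard
}

// filterTargets keeps only the discovered targets owned by this shard, so the
// rest never reach the scrape stage.
func filterTargets(all []string, shard, total uint32) []string {
	var mine []string
	for _, addr := range all {
		if ownsTarget(addr, shard, total) {
			mine = append(mine, addr)
		}
	}
	return mine
}

func main() {
	// Hypothetical targets handed over by service discovery.
	discovered := []string{"10.0.0.1:9100", "10.0.0.2:9100", "10.0.0.3:9100", "10.0.0.4:9100"}
	for shard := uint32(0); shard < 2; shard++ {
		fmt.Println("instance", shard, "scrapes", filterTargets(discovered, shard, 2))
	}
}
```

Because ownership depends only on the address and the instance count, every instance can apply the filter independently without coordinating with the others.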

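4. alertmanager (this component may be removed in the future)

After receiving a request, it forwards the alert directly through the single receiver in its configuration file, a webhook.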