Prometheus Learning Series (6)

Prometheus mail + WeChat alerting

First, change the alertmanager-main Service to type NodePort.

➜  manifests git:(master) ✗ k get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       NodePort    10.96.120.87    <none>        9093:31175/TCP               29h
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   29h
grafana                 NodePort    10.96.163.124   <none>        3000:31263/TCP               29h
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP            29h
node-exporter           ClusterIP   None            <none>        9100/TCP                     29h
prometheus-adapter      ClusterIP   10.96.121.50    <none>        443/TCP                      29h
prometheus-k8s          NodePort    10.96.129.132   <none>        9090:30525/TCP               29h
prometheus-operated     ClusterIP   None            <none>        9090/TCP                     29h
prometheus-operator     ClusterIP   None            <none>        8080/TCP                     29h
➜  manifests git:(master) ✗
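
For reference, one way to make the Service type change above without editing the manifest is a patch. This is only a sketch: it assumes `k` in the transcripts is an alias for `kubectl`, and the function name `patch_alertmanager_svc` is made up for this example.

```shell
# Patch body that switches the Service type to NodePort.
PATCH='{"spec":{"type":"NodePort"}}'

# patch_alertmanager_svc: apply the patch to the alertmanager-main Service.
# Assumes kubectl is configured for the cluster running kube-prometheus.
patch_alertmanager_svc() {
  kubectl patch svc alertmanager-main -n monitoring -p "$PATCH"
}
```

Call `patch_alertmanager_svc` once, then list the Services again to see which node port was assigned.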

You can use the NodePort of the alertmanager-main Service to open the Alertmanager Status page. To get the URLs for Prometheus and Alertmanager:

➜  alertmanager git:(master) minikube service list

The Alertmanager configuration lives in the alertmanager-secret.yaml file under manifests.

➜  manifests git:(master) ✗ pwd
/Users/steven/code/prometheus/kube-prometheus/manifests
➜  manifests git:(master) ✗ cat alertmanager-secret.yaml
apiVersion: v1
data:
  alertmanager.yaml: Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAibnVsbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gIm5hbWVzcGFjZSIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJudWxsIgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJudWxsIg==
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque
➜  manifests git:(master) ✗

Base64-decode the value stored in alertmanager-secret.yaml; the result matches what the Alertmanager web page shows.

➜  manifests git:(master) ✗ echo "Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAibnVsbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gIm5hbWVzcGFjZSIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJudWxsIgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJudWxsIg==" | base64 -d
"global":
  "resolve_timeout": "5m"
"receivers":
- "name": "null"
"route":
  "group_by":
  - "namespace"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "null"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "Watchdog"
    "receiver": "null"
➜  manifests git:(master) ✗
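
The same check can be run against the live Secret instead of the manifest file. A minimal sketch, assuming `kubectl` access to the cluster and a `base64` that accepts `-d`; the function name `show_alertmanager_config` is hypothetical.

```shell
# show_alertmanager_config: print the decoded alertmanager.yaml from the Secret.
# The dot in the key name must be escaped inside the jsonpath expression.
show_alertmanager_config() {
  kubectl get secret alertmanager-main -n monitoring \
    -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
}
```

Running it before and after replacing the Secret is a quick way to confirm which configuration the cluster actually holds.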

Create the YAML file

➜  manifests git:(master) ✗ pwd
/Users/steven/code/prometheus/kube-prometheus/manifests
➜  manifests git:(master) ✗ cat alertmanager.yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '431054426@qq.com'
  smtp_auth_username: '431054426@qq.com'
  smtp_auth_password: '****' # the generated authorization token
  smtp_require_tls: false   # defaults to true; must be set to false

templates:
  - "/etc/alertmanager/config/*.tmpl"

route:
  group_by: ['job','cluster','service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'wechat'
  routes:
  - receiver: 'email'
    group_wait: 10s
    match:
      alertname: KubeCPUOvercommit # the alert to route to email

receivers:
- name: 'email'
  email_configs:
  - to: 'zky.linux@gmail.com'
    send_resolved: true
- name: 'wechat'
  wechat_configs:
  - corp_id: '*****' # found in the enterprise account information
    to_party: '8'
    to_user: 'steven'
    agent_id: '1000028'
    api_secret: '****' # found in the newly created WeChat application
    send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
➜  manifests git:(master) ✗
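
Typographic quotes (‘ ’ “ ”) are easy to pick up when a config is pasted from a web page. YAML treats them as part of the value, so a route pointing at ‘wechat’ would not match a receiver named 'wechat'. The check below is a small sketch for catching them before the file goes into a Secret; the function name `has_smart_quotes` is made up for this example.

```shell
# has_smart_quotes FILE: print matching lines (with line numbers) and
# succeed if FILE contains any typographic single or double quote.
has_smart_quotes() {
  grep -nE '‘|’|“|”' "$1"
}
```

A typical use would be `has_smart_quotes alertmanager.yaml && echo "fix the quotes first"`. If the `amtool` binary shipped with Alertmanager is available, `amtool check-config alertmanager.yaml` gives a fuller validation.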

View the wechat.tmpl file

➜  manifests git:(master) ✗ cat wechat.tmpl
{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}
{{- if eq $index 0 -}}
Alert type: {{ $alert.Labels.alertname }}
Alert severity: {{ $alert.Labels.severity }}

=====================
{{- end }}
===Alert details===
Alert details: {{ $alert.Annotations.message }}
Failure time: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
===Reference information===
{{ if gt (len $alert.Labels.instance) 0 -}}Failed instance IP: {{ $alert.Labels.instance }};{{- end -}}
{{- if gt (len $alert.Labels.namespace) 0 -}}Namespace of failed instance: {{ $alert.Labels.namespace }};{{- end -}}
{{- if gt (len $alert.Labels.node) 0 -}}Failed node IP: {{ $alert.Labels.node }};{{- end -}}
{{- if gt (len $alert.Labels.pod_name) 0 -}}Failed pod name: {{ $alert.Labels.pod_name }};{{- end }}
=====================
{{- end }}
{{- end }}

{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}
{{- if eq $index 0 -}}
Alert type: {{ $alert.Labels.alertname }}
Alert severity: {{ $alert.Labels.severity }}

=====================
{{- end }}
===Alert details===
Alert details: {{ $alert.Annotations.message }}
Failure time: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
Recovery time: {{ $alert.EndsAt.Format "2006-01-02 15:04:05" }}
===Reference information===
{{ if gt (len $alert.Labels.instance) 0 -}}Failed instance IP: {{ $alert.Labels.instance }};{{- end -}}
{{- if gt (len $alert.Labels.namespace) 0 -}}Namespace of failed instance: {{ $alert.Labels.namespace }};{{- end -}}
{{- if gt (len $alert.Labels.node) 0 -}}Failed node IP: {{ $alert.Labels.node }};{{- end -}}
{{- if gt (len $alert.Labels.pod_name) 0 -}}Failed pod name: {{ $alert.Labels.pod_name }};{{- end }}
=====================
{{- end }}
{{- end }}
{{- end }}
➜  manifests git:(master) ✗

Delete the original Secret object, then create a new Secret from the two files we just created.

➜  manifests git:(master) ✗ k get secrets -n monitoring | grep aler
alertmanager-main                 Opaque                                2      18h
alertmanager-main-token-lqf7k     kubernetes.io/service-account-token   3      2d3h
➜  manifests git:(master) ✗
➜  manifests git:(master) ✗ k delete secrets alertmanager-main -n monitoring
secret "alertmanager-main" deleted
➜  manifests git:(master) ✗ k create secret generic alertmanager-main --from-file=./alertmanager.yaml --from-file=./wechat.tmpl -n monitoring
secret/alertmanager-main created
➜  manifests git:(master) ✗
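
Deleting the Secret before recreating it leaves a short window in which it does not exist. A possible alternative, sketched here on the assumption that your kubectl supports `--dry-run=client` (1.18 or later), renders the Secret locally and applies it in place. The function name `replace_alertmanager_secret` is hypothetical.

```shell
# replace_alertmanager_secret: render the Secret from the two local files
# and apply it over the existing object instead of delete + create.
# Run from the directory that holds alertmanager.yaml and wechat.tmpl.
replace_alertmanager_secret() {
  kubectl create secret generic alertmanager-main \
    --from-file=./alertmanager.yaml \
    --from-file=./wechat.tmpl \
    -n monitoring --dry-run=client -o yaml | kubectl apply -f -
}
```

Either approach should end with a Secret containing two keys, alertmanager.yaml and wechat.tmpl.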

Verify

➜  manifests git:(master) ✗ k delete secrets alertmanager-main -n monitoring
secret "alertmanager-main" deleted
➜  manifests git:(master) ✗ k create secret generic alertmanager-main --from-file=./alertmanager.yaml --from-file=./wechat.tmpl -n monitoring
secret/alertmanager-main created
➜  manifests git:(master) ✗ k get secrets -n monitoring | grep aler
alertmanager-main                 Opaque                                2      6s
alertmanager-main-token-lqf7k     kubernetes.io/service-account-token   3      2d3h
➜  manifests git:(master) ✗

Check whether the email arrived. Note that it may have been filed as spam, so look there too.
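
Rather than waiting for a real alert, a test alert can be pushed to Alertmanager by hand. This is a sketch under several assumptions: the Alertmanager version exposes the v2 API at `/api/v2/alerts`, `ALERTMANAGER_URL` is a placeholder to be replaced with your node IP and the alertmanager-main NodePort (31175 in the listing above), and the function names are made up for this example.

```shell
# Placeholder address; substitute the real node IP and NodePort.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://127.0.0.1:31175}"

# test_alert_payload NAME: build a one-alert JSON array for the v2 API.
# The "message" annotation is the field that wechat.tmpl prints.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"message":"manual test alert"}}]' "$1"
}

# send_test_alert NAME: POST the alert to Alertmanager.
send_test_alert() {
  curl -sS -X POST -H 'Content-Type: application/json' \
    -d "$(test_alert_payload "$1")" \
    "$ALERTMANAGER_URL/api/v2/alerts"
}
```

With the routing shown in alertmanager.yaml, `send_test_alert KubeCPUOvercommit` should go to the email receiver, while any other alert name falls through to the default wechat receiver.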

The Alertmanager config file has now changed.

You can exec into the pod and see the two files we created.

➜  manifests git:(master) ✗ k exec -it alertmanager-main-0 /bin/sh -n monitoring
Defaulting container name to alertmanager.
Use 'kubectl describe pod/alertmanager-main-0 -n monitoring' to see all of the containers in this pod.
/alertmanager $ ls /etc/alertmanager/config/
alertmanager.yaml  wechat.tmpl
/alertmanager $
