Monitoring processes and alerting with process-exporter

This article assumes Prometheus and Grafana are deployed on Kubernetes (k8s).


  • Introduction to process-exporter:

With Prometheus, process-exporter can be used to check whether selected processes are alive.

Usage:

process-exporter [options] -config.path filename.yml

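To select which processes to monitor and how to group them, you can either pass command-line arguments or use a YAML configuration file. Specifying a configuration file with -config.path is recommended.

The general format of the -config.path YAML file is a top-level process_names section containing a list of name matchers: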

process_names:
  - matcher1
  - matcher2
  ...
  - matcherN

The default configuration shipped with the deb/rpm packages is:

process_names:
  - name: "{{.Comm}}"
    cmdline:
    - '.+'

A process can belong to only one group: even if it matches several matchers, it is assigned to the groupname of the first one that matches.
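The "first matcher wins" rule can be illustrated with a small shell sketch. This is only an approximation: it uses grep -E instead of the exporter's own regular expression engine, and the function name first_group and the sample command line are hypothetical, not part of process-exporter.

```shell
#!/bin/sh
# Illustration only: mimics "the first matching matcher decides the group".
# first_group CMDLINE PATTERN...
# Prints the first pattern that matches CMDLINE, or returns 1 if none match.
first_group() {
  cmdline=$1
  shift
  for pattern in "$@"; do
    if printf '%s\n' "$cmdline" | grep -Eq -- "$pattern"; then
      printf '%s\n' "$pattern"
      return 0
    fi
  done
  return 1
}

# Both patterns match this command line, but 'java' is listed first,
# so the process would never reach the more specific matcher.
first_group "java -jar /opt/sys#abut-exec.jar" 'java' 'sys#abut-exec.jar'
```

In practice this means specific matchers should be listed before broad ones in process_names.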

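Each item in process_names provides a way to identify and name processes. The optional name key defines a template used to name matching processes; if it is not specified, name defaults to {{.ExeBase}}.

Available template variables:

{{.Comm}} contains the basename of the original executable, i.e. the 2nd field in /proc/<pid>/stat
{{.ExeBase}} contains the basename of the executable
{{.ExeFull}} contains the fully qualified path of the executable
{{.Username}} contains the username of the effective user
{{.Matches}} contains all the matches resulting from applying the cmdline regular expressions
{{.PID}} contains the PID of the process. Note that using PID means the group will contain only a single process
{{.StartTime}} contains the start time of the process. This is useful in combination with PID, because PIDs are reused over time

Using PID or StartTime is discouraged: it is unlikely to give you the result you want, and it can cause trouble for Prometheus because the metrics end up with very high cardinality.

For the full configuration reference, see the process-exporter project.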

  • Install process-exporter:

vim process.sh

#!/bin/bash
# Installs process_exporter
PROCESS_VER=0.7.5
PROCESS_DIR=/usr/local/process-exporter
[ ! -d /software/ ] && mkdir /software

install_process() {
    cd /software
    yum install -y wget
    if [ $? -eq 0 ]
    then
        echo -e "\\033[36mDependencies installed with yum\\033[0m"
    else
        echo -e "\\033[31mFailed to install dependencies with yum, please check\\033[0m"
        exit 1
    fi
    # Download and unpack only if not already present
    [ ! -f process-exporter-$PROCESS_VER.linux-amd64.tar.gz ] && wget https://github.com/ncabatoff/process-exporter/releases/download/v$PROCESS_VER/process-exporter-$PROCESS_VER.linux-amd64.tar.gz
    [ ! -d process-exporter-$PROCESS_VER.linux-amd64 ] && tar xf process-exporter-$PROCESS_VER.linux-amd64.tar.gz
    [ ! -d $PROCESS_DIR ] && mv process-exporter-$PROCESS_VER.linux-amd64 $PROCESS_DIR
    # Write the matcher configuration
    cat > $PROCESS_DIR/process-exporter.yaml << EOF
process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'redis-server'
  - name: "{{.Matches}}"
    cmdline:
    - 'mysqld'
  - name: "{{.Matches}}"
    cmdline:
    - 'org.apache.zookeeper.server.quorum.QuorumPeerMain'
  - name: "{{.Matches}}"
    cmdline:
    - 'org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer'
  - name: "{{.Matches}}"
    cmdline:
    - 'org.apache.hadoop.hdfs.qjournal.server.JournalNode'
EOF
    # Run the exporter as an unprivileged prometheus user
    id prometheus || useradd -M -s /sbin/nologin prometheus
    chown -R prometheus:prometheus $PROCESS_DIR
    cat > /usr/lib/systemd/system/process_exporter.service << EOF
[Unit]
Description=process_exporter
Documentation=https://github.com/ncabatoff/process-exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
WorkingDirectory=$PROCESS_DIR
ExecStart=$PROCESS_DIR/process-exporter -config.path=$PROCESS_DIR/process-exporter.yaml
Restart=always

[Install]
WantedBy=multi-user.target
EOF
    systemctl daemon-reload && systemctl enable process_exporter
    systemctl start process_exporter
    if [ $? -eq 0 ]
    then
        echo -e "\\033[36mprocess_exporter installed\\033[0m"
    else
        echo -e "\\033[31mprocess_exporter installation failed\\033[0m"
        exit 1
    fi
}

install_process

bash process.sh

  • Modify the configuration:

Customize this configuration file according to the names of the processes you want to monitor.

vim /usr/local/process-exporter/process-exporter.yaml

process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'sys#abut-exec.jar'
  - name: "{{.Matches}}"
    cmdline:
    - 'sys#open-exec.jar'
  - name: "{{.Matches}}"
    cmdline:
    - 'sys#activity-exec.jar'

systemctl restart process_exporter
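After restarting the exporter, it helps to check which groupname labels it actually exposes, since the alert rules must match them exactly. The sketch below is one way to do that; the helper name list_groups is hypothetical, and the sample metric lines are made up for illustration in the shape that name: "{{.Matches}}" produces.

```shell
#!/bin/sh
# list_groups: read Prometheus text-format metrics on stdin and print
# "<groupname> <count>" for each namedprocess_namegroup_num_procs sample.
list_groups() {
  sed -n 's/^namedprocess_namegroup_num_procs{groupname="\(.*\)"} \(.*\)$/\1 \2/p'
}

# Hypothetical sample input; comment lines are ignored by the sed filter.
list_groups <<'EOF'
# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="map[:sys#abut-exec.jar]"} 1
namedprocess_namegroup_num_procs{groupname="map[:sys#open-exec.jar]"} 0
EOF
```

Against a live exporter, assuming it listens on the port 9256 used elsewhere in this article, the same function could be fed with curl -s http://localhost:9256/metrics | list_groups.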
  • Add the monitoring job to Prometheus:

vim prometheus/config.yaml
# add:
    - job_name: 'yty-process' # process monitoring
      static_configs:
      - targets: ['xxx.xxx.xxx.xxx:9256']

vim prometheus/rules.yaml
# add:
    - name: process
      rules:
      - alert: ProcessAbutDown
        expr: (namedprocess_namegroup_num_procs{groupname="map[:sys#abut-exec.jar]"}) == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: Process Abut-exec Down"
          description: "{{ $labels.instance }}: Process Abut-exec has been down for more than 1m"
          value: "{{ $value }}"
      - alert: ProcessOpenDown
        expr: (namedprocess_namegroup_num_procs{groupname="map[:sys#open-exec.jar]"}) == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: Process Open-exec Down"
          description: "{{ $labels.instance }}: Process Open-exec has been down for more than 1m"
          value: "{{ $value }}"
      - alert: ProcessActivityDown
        expr: (namedprocess_namegroup_num_procs{groupname="map[:sys#activity-exec.jar]"}) == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: Process Activity-exec Down"
          description: "{{ $labels.instance }}: Process Activity-exec has been down for more than 1m"
          value: "{{ $value }}"

# Apply the changes, then delete the Prometheus pod so it is recreated with the new configuration
# (the pod name below is from the author's cluster; yours will differ)
kubectl apply -f prometheus/
kubectl delete pod -n monitoring prometheus-b58f6d4c7-v8m7x
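The three alert rules differ only in their names and in the groupname they match, so they can be generated rather than copied by hand. The following is a minimal sketch under two assumptions: the helper gen_rule is hypothetical, and the cmdline pattern is a literal string, so that the text matched by {{.Matches}} equals the pattern itself.

```shell
#!/bin/sh
# gen_rule ALERT_NAME DISPLAY_NAME CMDLINE_PATTERN
# Prints one alert rule in the same shape as the hand-written rules.
gen_rule() {
  alert=$1
  display=$2
  pattern=$3
  # \$ keeps Prometheus template variables from being expanded by the shell.
  cat <<EOF
- alert: $alert
  expr: (namedprocess_namegroup_num_procs{groupname="map[:$pattern]"}) == 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "{{ \$labels.instance }}: Process $display Down"
    description: "{{ \$labels.instance }}: Process $display has been down for more than 1m"
    value: "{{ \$value }}"
EOF
}

gen_rule ProcessAbutDown Abut-exec 'sys#abut-exec.jar'
gen_rule ProcessOpenDown Open-exec 'sys#open-exec.jar'
gen_rule ProcessActivityDown Activity-exec 'sys#activity-exec.jar'
```

The output still has to be indented to fit under the rules: key of the group in prometheus/rules.yaml.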

  • Test the alert:

Kill any one of the monitored processes:

ps aux |grep abut |grep java |awk '{print $2}' | xargs kill

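Wait 1m, and a DingTalk alert arrives.

Restart the process, and a recovery notification arrives.

At this point, process monitoring and alerting with process-exporter is fully configured.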

