Getting Started with Prometheus
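Overview
Prometheus is a modern monitoring system that integrates closely with Kubernetes and is a common choice for container monitoring. Its main components are:
- Prometheus Server: the main program, which is also a time-series database
- AlertManager: the alerting component
- Pushgateway: an intermediate gateway component
- Data visualization and export: the data display components
- Service discovery: the service discovery component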

Deployment
Server
    # Deploy the Prometheus server
    podman pull prom/prometheus:v3.1.0
    mkdir ~/prometheus

    # Create the configuration file
    cat > ~/prometheus/prometheus.yml <<EOF
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 15s

    alerting:
      alertmanagers:
        - follow_redirects: true
          scheme: http
          timeout: 10s
          static_configs:
            - targets: []

    scrape_configs:
      - job_name: "prometheus"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - localhost:9090

      - job_name: "node"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - 192.168.24.10:9100
    EOF

    podman run --name prometheus -d -p 9090:9090 -v ~/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml docker.io/prom/prometheus:v3.1.0

    # Open the firewall port
    firewall-cmd --permanent --add-service=prometheus
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name prometheus > ~/prometheus/prometheus.service
    cp ~/prometheus/prometheus.service /etc/systemd/system/
    systemctl daemon-reload
    systemctl enable --now prometheus.service
Node-exporter
    # Install the node collector (node-exporter)
    podman pull prom/node-exporter:v1.8.2
    # Name the container explicitly so the systemd step below can refer to it
    podman run --name node-exporter -d -p 9100:9100 prom/node-exporter:v1.8.2

    # Open the firewall port
    firewall-cmd --permanent --add-service=prometheus-node-exporter
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name node-exporter > ~/prometheus/node-exporter.service
    cp ~/prometheus/node-exporter.service /etc/systemd/system/
    systemctl daemon-reload
    systemctl enable --now node-exporter.service
Grafana
    # Install the web UI
    podman pull grafana/grafana:11.4.0
    mkdir ~/grafana_data
    # Note: the official image keeps its own data under /var/lib/grafana
    podman run --name grafana -d -p 3000:3000 -v ~/grafana_data/:/grafana/db:Z grafana/grafana:11.4.0

    # Open the firewall port
    firewall-cmd --permanent --add-service=grafana
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name grafana > /etc/systemd/system/grafana.service
    systemctl daemon-reload
    systemctl enable --now grafana.service
Visualization
Server status

Log in

Add a data source

Add a dashboard
Import dashboard 21559

Retrieve information

Monitoring
Monitoring hosts
Install the exporter
    # Add a new host, 192.168.24.100
    # Deploy node-exporter with the package manager
    dnf install -y node-exporter
    systemctl enable --now prometheus-node-exporter.service

    # Open the port
    firewall-cmd --permanent --add-port=9100/tcp
    firewall-cmd --reload
Configuration file
    # Edit prometheus.yml and add the new node to the "node" job
      - job_name: "node"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - 192.168.24.10:9100
              - 192.168.24.100:9100

    # Restart the server container
    podman restart prometheus
Result

Monitoring Podman
Install the exporter
    # Container deployment
    # Pull the Podman exporter image
    podman pull quay.io/navidys/prometheus-podman-exporter:v1.14.0
    systemctl enable --now podman.socket
    # -d runs the exporter in the background so the remaining steps can continue
    podman run -d -e CONTAINER_HOST=unix:///run/podman/podman.sock \
        -v /run/podman/podman.sock:/run/podman/podman.sock \
        -u root -p 9882:9882 --security-opt label=disable \
        quay.io/navidys/prometheus-podman-exporter:v1.14.0

    # Alternatively, install from the package
    dnf -y install prometheus-podman-exporter
    systemctl enable --now prometheus-podman-exporter.service

    # Open the port
    firewall-cmd --permanent --add-port=9882/tcp
    firewall-cmd --reload
Configuration file
    # Edit prometheus.yml and add a job for Podman
      - job_name: "podman"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - 192.168.24.10:9882
              - 192.168.24.100:9882

    # Restart the server container
    podman restart prometheus
Result
Import dashboard 21559

Monitoring Nginx
Configuration file
    # Install Nginx and the VTS (virtual host traffic status) module
    dnf install -y nginx nginx-mod-vts

    # Edit the Nginx configuration
    http {
        ...
        vhost_traffic_status_zone;    # add this directive

        server {
            ...
            # add the following location
            location /status {
                vhost_traffic_status_display;
                vhost_traffic_status_display_format html;
            }
        }
    }

    # Start the service
    systemctl enable --now nginx
    firewall-cmd --permanent --add-service={http,https}
    firewall-cmd --reload
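The steps above enable the VTS status page but do not show a matching Prometheus scrape job. The sketch below is one possible job, not the author's configuration. It assumes Nginx runs on 192.168.24.100 port 80 (a hypothetical target; the post does not say which host serves Nginx) and relies on the `/status/format/prometheus` path that the VTS module serves under the `/status` location. The snippet file name `nginx-job.yml` and the `PROM_DIR` variable are made up for illustration; the generated fragment is meant to be merged by hand under `scrape_configs` in prometheus.yml.

```shell
# Directory used earlier in this post; set PROM_DIR to write somewhere else.
PROM_DIR="${PROM_DIR:-$HOME/prometheus}"
mkdir -p "$PROM_DIR"

# Assumption: Nginx with the VTS module listens on 192.168.24.100:80.
cat > "$PROM_DIR/nginx-job.yml" <<'EOF'
  - job_name: "nginx"
    scrape_interval: 15s
    scrape_timeout: 10s
    # Prometheus-format metrics served by the VTS display location
    metrics_path: /status/format/prometheus
    scheme: http
    static_configs:
      - targets:
          - 192.168.24.100:80
EOF

echo "Wrote $PROM_DIR/nginx-job.yml"
```

After merging the fragment, restart the server container as in the earlier sections so the new job is picked up.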

Result
Import dashboard 9785

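Alerting
Alerting in the Prometheus architecture is split into two independent parts. Alert rules are defined in Prometheus, which evaluates them periodically and sends an alert to Alertmanager whenever a rule's trigger condition is met.

An alert rule in Prometheus consists mainly of the following parts:
- Alert name: every rule needs a name, and the name should convey at a glance what the alert is about
- Alert rule: defined mainly by a PromQL expression; the alert fires once the expression's query result has persisted for a set duration (the "for" field)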
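To make the two parts of a rule concrete, here is a minimal sketch of a rule file, written with the same heredoc style as the deployment steps. The group name, the alert name `InstanceDown`, the five-minute duration, the `rules/node_alerts.yml` path, and the `PROM_DIR` variable are all illustrative assumptions, not taken from the original setup.

```shell
# Directory used earlier in this post; set PROM_DIR to write somewhere else.
PROM_DIR="${PROM_DIR:-$HOME/prometheus}"
mkdir -p "$PROM_DIR/rules"

# The quoted 'EOF' stops the shell from expanding {{ $labels.instance }}.
cat > "$PROM_DIR/rules/node_alerts.yml" <<'EOF'
groups:
  - name: node-alerts                 # hypothetical group name
    rules:
      - alert: InstanceDown           # the alert name
        expr: up{job="node"} == 0     # the PromQL condition
        for: 5m                       # how long the condition must hold before firing
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} has been unreachable for 5 minutes"
EOF

echo "Wrote $PROM_DIR/rules/node_alerts.yml"
```

For Prometheus to load the file, prometheus.yml would need a `rule_files` entry pointing at it, and with the containerized server from this post the rules directory would also have to be mounted into the container with an additional `-v` option.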
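Alertmanager is a standalone component that receives and processes alerts from Prometheus Server (or from other client programs). It can process these alerts further: for example, it deduplicates alerts when many identical ones arrive, groups them, and routes them to the right recipient. Alertmanager has built-in support for notification channels such as email and Slack, and it also integrates with webhooks to cover more customized scenarios.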

Alertmanager
Create the configuration file
Alertmanager configuration
    # Default configuration file
    global:
      resolve_timeout: 5m

    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
      - name: 'web.hook'
        webhook_configs:
          - url: 'http://127.0.0.1:5001/'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']

    # Configuration file for email notifications
    global:
      smtp_smarthost: smtp.gmail.com:587
      smtp_from: <smtp mail from>
      smtp_auth_username: <username>
      smtp_auth_identity: <username>
      smtp_auth_password: <password>

    route:
      group_by: ['alertname']
      receiver: 'default-receiver'

    receivers:
      - name: default-receiver
        email_configs:
          - to: <mail to address>
            send_resolved: true
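Connect Prometheus to Alertmanager
    alerting:
      alertmanagers:
        - static_configs:
            # Inside a container with its own network namespace, localhost is the
            # container itself; use the host address (e.g. 192.168.24.10:9093) there
            - targets: ['localhost:9093']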
Start the service
    # Pull the image and start the container
    podman pull prom/alertmanager:v0.28.0
    podman run --name alertmanager -d -p 9093:9093 -v ~/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml prom/alertmanager:v0.28.0

    # Open the firewall port
    firewall-cmd --permanent --add-port=9093/tcp
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name alertmanager > /etc/systemd/system/alertmanager.service

    systemctl daemon-reload
    systemctl enable --now alertmanager.service
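Once Alertmanager is running, one way to check the notification path end to end is to post a hand-made alert to its v2 HTTP API. The sketch below is an illustration under assumptions: the alert name `TestAlert`, its labels, the payload path, and the `AM_URL` and `SEND_ALERT` variables are all invented for this example. It only writes the payload by default and contacts Alertmanager when `SEND_ALERT=1` is set.

```shell
# Assumed address: the Alertmanager container published on port 9093 above.
AM_URL="${AM_URL:-http://localhost:9093}"
PAYLOAD="${TMPDIR:-/tmp}/test-alert.json"

# A single hand-made alert; the labels decide how it is grouped and routed.
cat > "$PAYLOAD" <<'EOF'
[
  {
    "labels": {
      "alertname": "TestAlert",
      "severity": "warning",
      "instance": "192.168.24.10:9100"
    },
    "annotations": {
      "summary": "Manual test alert"
    }
  }
]
EOF

# Contact Alertmanager only when explicitly requested, so a dry run is harmless.
if [ "${SEND_ALERT:-0}" = "1" ]; then
    curl --silent --show-error --fail \
         -H 'Content-Type: application/json' \
         --data @"$PAYLOAD" \
         "$AM_URL/api/v2/alerts"
else
    echo "Payload written to $PAYLOAD (set SEND_ALERT=1 to send it)"
fi
```

If the alert is accepted it should appear in the Alertmanager web UI and be delivered to whichever receiver the route selects.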