Getting Started with Prometheus
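Overview
Prometheus is a modern monitoring system that integrates closely with Kubernetes, which makes it a common choice for container monitoring. Its main components are:
- Prometheus Server: the main program, which also acts as a time series database
- Alertmanager: the alerting component
- Pushgateway: an intermediary gateway component
- Data visualization and export: the data display components
- Service discovery: the service discovery component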
Deployment
Server
    # Deploy the Prometheus server
    podman pull docker.io/prom/prometheus:v3.1.0
    mkdir ~/prometheus

    # Create the configuration file
    cat > ~/prometheus/prometheus.yml <<EOF
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 15s

    alerting:
      alertmanagers:
        - follow_redirects: true
          scheme: http
          timeout: 10s
          static_configs:
            - targets: []

    scrape_configs:
      - job_name: "prometheus"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - localhost:9090

      - job_name: "node"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - 192.168.24.10:9100
    EOF

    # Start the container (run as root, so ~/prometheus is /root/prometheus)
    podman run --name prometheus -d -p 9090:9090 -v /root/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml docker.io/prom/prometheus:v3.1.0

    # Open the firewall port
    firewall-cmd --permanent --add-service=prometheus
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name prometheus > ~/prometheus/prometheus.service
    cp ~/prometheus/prometheus.service /etc/systemd/system/
    systemctl daemon-reload
    systemctl enable --now prometheus.service
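Before moving on, it can help to confirm that the server actually came up. The sketch below polls Prometheus's `/-/healthy` and `/-/ready` management endpoints. The function name `wait_for_prometheus`, the default URL, and the retry count are assumptions made for illustration, not part of the original setup.

```shell
# Sketch: poll Prometheus's health endpoints until the server answers.
# The default base URL and retry budget are assumptions, not values from the post.
wait_for_prometheus() (
  base_url="${1:-http://localhost:9090}"
  attempts="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Both endpoints return HTTP 200 once the server is up and ready to serve
    if curl -fsS "${base_url}/-/healthy" >/dev/null 2>&1 &&
       curl -fsS "${base_url}/-/ready" >/dev/null 2>&1; then
      echo "ready"
      exit 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready"
  exit 1
)
```

Calling `wait_for_prometheus http://localhost:9090 30` right after `podman run` would wait up to roughly 30 seconds before giving up.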
Node-exporter
    # Install the node metrics collector
    podman pull docker.io/prom/node-exporter:v1.8.2
    # Name the container explicitly so the systemd step below can refer to it
    podman run --name node-exporter -d -p 9100:9100 docker.io/prom/node-exporter:v1.8.2

    # Open the firewall port
    firewall-cmd --permanent --add-service=prometheus-node-exporter
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name node-exporter > ~/prometheus/node-exporter.service
    cp ~/prometheus/node-exporter.service /etc/systemd/system/
    systemctl daemon-reload
    systemctl enable --now node-exporter.service
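To check that the exporter is serving data, one option is to pull a single value out of its `/metrics` output. This is a minimal sketch: the helper name `metric_value` is hypothetical, and it only handles metrics without labels, such as `node_load1`.

```shell
# Sketch: print the value of one unlabelled metric from exposition text on stdin.
metric_value() {
  # $1 is the metric name; "# HELP" and "# TYPE" lines never match because
  # their first field is "#"
  awk -v name="$1" '$1 == name { print $2; exit }'
}
```

For example, `curl -s http://192.168.24.10:9100/metrics | metric_value node_load1` would print the one-minute load average reported by the exporter.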
Grafana
    # Install the web UI
    podman pull docker.io/grafana/grafana:11.4.0
    mkdir ~/grafana_data
    # Note: Grafana's default data directory is /var/lib/grafana; mount the
    # volume there (writable by the container user) if dashboards must persist
    podman run --name grafana -d -p 3000:3000 -v ~/grafana_data/:/grafana/db:Z docker.io/grafana/grafana:11.4.0

    # Open the firewall port
    firewall-cmd --permanent --add-service=grafana
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name grafana > /etc/systemd/system/grafana.service
    systemctl daemon-reload
    systemctl enable --now grafana.service
Display
Server status
[Screenshot: Prometheus server status]
Login
[Screenshot: login page]
Add a data source
[Screenshot: adding the data source]
Add a dashboard
Import dashboard 21559
[Screenshot: importing dashboard 21559]
Retrieve information
[Screenshot: dashboard with collected data]
Monitoring
Monitoring hosts
Install the exporter
    # Add a new host, 192.168.24.100
    # Deploy node-exporter with the package manager
    dnf install -y node-exporter
    systemctl enable --now prometheus-node-exporter.service

    # Open the port
    firewall-cmd --permanent --add-port=9100/tcp
    firewall-cmd --reload
Configuration
    # Edit prometheus.yml and add the new node to the "node" job
      - job_name: "node"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - 192.168.24.10:9100
              - 192.168.24.100:9100

    # Restart the server container
    podman restart prometheus
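After the restart, the new target should show up as healthy. As a rough check, the sketch below queries the `/api/v1/targets` endpoint of the Prometheus HTTP API and counts targets by health. The helper name `count_targets` is hypothetical, and the `grep`-based counting is a simplification that assumes the compact JSON Prometheus returns; a JSON-aware tool would be more robust.

```shell
# Sketch: summarise target health from the Prometheus HTTP API.
# The default base URL is an assumption; adjust it to the server's address.
count_targets() (
  base_url="${1:-http://localhost:9090}"
  body=$(curl -fsS "${base_url}/api/v1/targets") || exit 1
  # Each active target carries a "health" field: "up", "down" or "unknown"
  up=$(printf '%s' "$body" | grep -o '"health":"up"' | wc -l | tr -d ' ')
  down=$(printf '%s' "$body" | grep -o '"health":"down"' | wc -l | tr -d ' ')
  echo "up=${up} down=${down}"
)
```

Running `count_targets http://192.168.24.10:9090` a few scrape intervals after the restart gives a quick view of whether both node targets are being scraped.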
Result
[Screenshot: host monitoring dashboard]
Monitoring Podman
Install the exporter
    # Container deployment
    # Pull the Podman exporter image
    podman pull quay.io/navidys/prometheus-podman-exporter:v1.14.0
    systemctl enable --now podman.socket
    # -d and --name added so the exporter runs in the background under a known name
    podman run -d --name podman-exporter -e CONTAINER_HOST=unix:///run/podman/podman.sock -v /run/podman/podman.sock:/run/podman/podman.sock -u root -p 9882:9882 --security-opt label=disable quay.io/navidys/prometheus-podman-exporter:v1.14.0

    # Alternatively, deploy from a package
    dnf -y install prometheus-podman-exporter
    systemctl enable --now prometheus-podman-exporter.service

    # Open the port
    firewall-cmd --permanent --add-port=9882/tcp
    firewall-cmd --reload
Configuration
    # Edit prometheus.yml and add a "podman" job
      - job_name: "podman"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - 192.168.24.10:9882
              - 192.168.24.100:9882

    # Restart the server container
    podman restart prometheus
Result
Import dashboard 21559
[Screenshot: Podman monitoring dashboard]
Monitoring Nginx
Configuration
    # Install Nginx and the VTS module
    dnf install -y nginx nginx-mod-vts

    # Edit the Nginx configuration
    http {
        ……
        vhost_traffic_status_zone;  # add this directive

        server {
            ……
            # add the following location
            location /status {
                vhost_traffic_status_display;
                vhost_traffic_status_display_format html;
            }
        }
    }

    # Start the service
    systemctl enable --now nginx
    firewall-cmd --permanent --add-service={http,https}
    firewall-cmd --reload
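The steps above expose the VTS status page, but the post does not show the matching Prometheus scrape job. The sketch below writes one possible job to a scratch file for review. It relies on the VTS module serving Prometheus text format under `/status/format/prometheus`; the target `192.168.24.10:80` and the output path are assumptions, so substitute the host that actually runs Nginx.

```shell
# Sketch: write a scrape job for the VTS endpoint to a scratch file.
# The target host:port and the output path are assumptions, not from the post.
out_file="${OUT_FILE:-/tmp/nginx-scrape-job.yml}"
cat > "$out_file" <<'EOF'
  - job_name: "nginx"
    scrape_interval: 15s
    # The VTS module serves Prometheus text format at <status location>/format/prometheus
    metrics_path: /status/format/prometheus
    scheme: http
    static_configs:
      - targets:
          - 192.168.24.10:80
EOF
cat "$out_file"
```

The fragment would go under `scrape_configs` in `prometheus.yml`, followed by `podman restart prometheus` as in the earlier sections.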
[Screenshot: Nginx status page]
Result
Import dashboard 9785
[Screenshot: Nginx monitoring dashboard]
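Alerting
Alerting in the Prometheus architecture is split into two independent parts: alert rules evaluated by Prometheus, and Alertmanager. Prometheus periodically evaluates the alert rules (AlertRule) defined in it and, when a rule's trigger condition is met, sends the alert to Alertmanager.
[Figure: Prometheus alerting flow]
An alert rule in Prometheus consists mainly of the following parts:
- Alert name: the name given to the rule, which should state directly what the alert is about
- Alert condition: defined mainly in PromQL; the alert fires once the expression's query result has held for a given duration (the rule's for field)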
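As an illustration of those two parts, the sketch below writes a small rule file containing one rule. The rule itself (`InstanceDown`, firing when `up == 0` has held for five minutes) is a common example, not something taken from this post, and the group name, labels, and output path are assumptions.

```shell
# Sketch: write an example alert rule file to a scratch location.
# Rule contents and the output path are illustrative assumptions.
rules_file="${RULES_FILE:-/tmp/alert-rules.yml}"
cat > "$rules_file" <<'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown     # alert name
        expr: up == 0           # PromQL condition
        for: 5m                 # how long the condition must hold before firing
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF
cat "$rules_file"
```

For Prometheus to load such a file, it would have to be listed under `rule_files` in `prometheus.yml` and, in a containerized setup, mounted into the container next to the main configuration.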
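Alertmanager is a separate component that receives and processes alerts from the Prometheus server (or from other client programs). It can process them further, for example by deduplicating large numbers of repeated alerts, grouping alerts, and routing them to the right receiver. Alertmanager has built-in support for notification channels such as email and Slack, and it also supports webhook integration for more customized scenarios.
[Figure: Alertmanager processing flow]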
Alertmanager
Create the configuration file
Alertmanager configuration
    # Default configuration file
    global:
      resolve_timeout: 5m

    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
      - name: 'web.hook'
        webhook_configs:
          - url: 'http://127.0.0.1:5001/'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']

    # Configuration file for email notifications
    global:
      smtp_smarthost: smtp.gmail.com:587
      smtp_from: <smtp mail from>
      smtp_auth_username: <username>
      smtp_auth_identity: <username>
      smtp_auth_password: <password>

    route:
      group_by: ['alertname']
      receiver: 'default-receiver'

    receivers:
      - name: default-receiver
        email_configs:
          - to: <mail to address>
            send_resolved: true
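Link to Prometheus
    # In prometheus.yml. If Prometheus runs in a container, localhost refers to
    # the container itself; use the host address (e.g. 192.168.24.10:9093) instead
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['localhost:9093']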
Start the service
    # Pull the image and start the container
    podman pull docker.io/prom/alertmanager:v0.28.0
    podman run --name alertmanager -d -p 9093:9093 -v ~/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml docker.io/prom/alertmanager:v0.28.0

    # Open the firewall port
    firewall-cmd --permanent --add-port=9093/tcp
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name alertmanager > /etc/systemd/system/alertmanager.service
    systemctl daemon-reload
    systemctl enable --now alertmanager.service
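Once Alertmanager is running, a manual test alert is a quick way to check routing and notifications without waiting for a real rule to fire. The sketch below only builds the JSON body for Alertmanager's v2 alerts API. The helper name `build_test_alert` and the example label values are hypothetical.

```shell
# Sketch: build a JSON body for POST /api/v2/alerts.
# $1: alert name, $2: severity (both are example values chosen by the caller)
build_test_alert() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"manual test alert"}}]\n' "$1" "$2"
}
```

A possible use, assuming Alertmanager listens on `localhost:9093`, is `build_test_alert TestAlert warning | curl -fsS -X POST -H 'Content-Type: application/json' --data-binary @- http://localhost:9093/api/v2/alerts`. The alert should then appear in the Alertmanager web UI and be sent to the configured receiver.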