2024-11-10

Prometheus + Grafana Monitoring Guide

1. Environment Setup

1.1 Installing Prometheus with Docker

docker run -d --name prometheus \
    -p 9090:9090 \
    -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus

1.2 Installing Grafana with Docker

docker run -d --name grafana \
    -p 3000:3000 \
    grafana/grafana

2. Configuration

2.1 Application Configuration

Add the following configuration to application.yml:

management:
  endpoints:
    web:
      exposure:
        include: prometheus,metrics,health
  metrics:
    tags:
      application: ${spring.application.name}
    enable:
      jvm: true
      process: true
      system: true
      thread-pool: true

2.2 Prometheus Configuration

Create prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'spring-threadpool'
    metrics_path: '/actuator/prometheus'
    static_configs:
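      # If Prometheus runs in Docker as in 1.1, 'localhost' is the Prometheus container itself;
      # point this at an address the container can reach (for example the host's IP).
      - targets: ['localhost:8080']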

3. Metrics

3.1 JVM Metrics

jvm_memory_used_bytes: JVM memory in use
jvm_threads_states_threads: number of JVM threads per thread state
system_cpu_usage: system-wide CPU usage
process_cpu_usage: CPU usage of the JVM process

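3.2 Thread Pool Metrics

thread_pool_size: current thread pool size
thread_pool_active_threads: number of active threads
thread_pool_queue_size: number of tasks currently in the queue
thread_pool_completed_tasks: number of completed tasks
thread_pool_rejected_tasks: number of rejected tasks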
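
The guide does not show where these values come from. As a minimal sketch, assuming the pool is a plain java.util.concurrent.ThreadPoolExecutor, the first four can be read from its getters, while rejections have to be counted in a custom handler because the executor keeps no such counter. The class name ThreadPoolSnapshot, the handler, and the demo are hypothetical; exporting the values under the metric names above (for example through Micrometer gauges) is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not from the guide): reads the values behind the thread_pool_* metrics.
class ThreadPoolSnapshot {

    // ThreadPoolExecutor keeps no rejection counter, so count in the handler.
    static final class CountingRejectionHandler implements RejectedExecutionHandler {
        private final AtomicLong rejected = new AtomicLong();

        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
            rejected.incrementAndGet(); // the task is discarded after being counted
        }

        long count() {
            return rejected.get();
        }
    }

    // Maps each metric name from section 3.2 to the executor value it would report.
    static Map<String, Long> snapshot(ThreadPoolExecutor pool, CountingRejectionHandler handler) {
        Map<String, Long> values = new LinkedHashMap<>();
        values.put("thread_pool_size", (long) pool.getPoolSize());
        values.put("thread_pool_active_threads", (long) pool.getActiveCount());
        values.put("thread_pool_queue_size", (long) pool.getQueue().size());
        values.put("thread_pool_completed_tasks", pool.getCompletedTaskCount());
        values.put("thread_pool_rejected_tasks", handler.count());
        return values;
    }

    // Demo: one worker and a queue of capacity 1, so the third task is rejected.
    static Map<String, Long> demo() {
        CountingRejectionHandler handler = new CountingRejectionHandler();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1), handler);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await(); // keep the single worker busy
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool.execute(() -> { }); // waits in the queue
            pool.execute(() -> { }); // rejected: worker busy, queue full
            started.await();
            return snapshot(pool, handler);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo the snapshot is taken while the first task is still running, so it reports one active thread, one queued task, and one rejection.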

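4. Grafana Configuration

4.1 Adding the Data Source

Open Grafana (http://localhost:3000)
Configuration -> Data sources -> Add data source
Select Prometheus
Set the URL to http://localhost:9090 (if Grafana runs in Docker as in 1.2, localhost is the Grafana container itself, so use an address that container can reach, such as the host's IP or the Prometheus container name on a shared Docker network)
Save and test

4.2 Creating the Monitoring Dashboard

Create a new dashboard
Add the following panels:
- JVM memory usage
- Thread pool monitoring
- CPU usage
- GC activity

4.3 Example PromQL Queries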

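# Active threads in the thread pool
thread_pool_active_threads{application="spring-threadpool"}

# CPU usage (process_cpu_usage is a gauge, so average it rather than applying rate())
avg_over_time(process_cpu_usage[1m])

# JVM heap memory in use
jvm_memory_used_bytes{area="heap"}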
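
To check a query outside Grafana, it can be sent to the Prometheus HTTP API instant-query endpoint (/api/v1/query with a query parameter). The following is a small sketch of building such a URL; the class and method names are hypothetical, and the base URL assumes Prometheus is reachable at http://localhost:9090 as in section 1.1.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds an instant-query URL for the Prometheus HTTP API.
class PromQueryUrl {

    static URI instantQuery(String baseUrl, String promql) {
        // PromQL contains braces, quotes and brackets, so it has to be URL-encoded.
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    public static void main(String[] args) {
        // Assumed address; adjust to wherever Prometheus is actually reachable.
        System.out.println(instantQuery(
                "http://localhost:9090", "jvm_memory_used_bytes{area=\"heap\"}"));
    }
}
```

The resulting URL can be opened in a browser or passed to any HTTP client; the response is JSON.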

5. Dynamic Thread Pool Configuration

5.1 Nacos Configuration

Create the following configuration in the Nacos console:

thread-pool:
  core-size: 10
  max-size: 20
  queue-capacity: 100
  keep-alive-time: 60

5.2 Dynamic Refresh

Use the @RefreshScope annotation so that configuration changes are picked up at runtime:

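import org.springframework.beans.factory.annotation.Value;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.context.annotation.Configuration;

// Re-created after a refresh event so the @Value fields pick up the new Nacos values.
@RefreshScope
@Configuration
public class ThreadPoolConfig {
    @Value("${thread-pool.core-size}")
    private int coreSize;
    // ... other settings
}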
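
Re-reading the properties does not by itself resize an executor that is already running. If the executor is kept alive across refreshes, the new values can be applied through its setters. The sketch below assumes a plain ThreadPoolExecutor; the class ThreadPoolResizer and its method are hypothetical, and wiring it to the refresh event is left out. The order of the two size setters matters because the executor rejects a maximum smaller than the core size (and, on Java 9 and later, a core size larger than the maximum).

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: applies refreshed thread-pool.* values to a live executor.
class ThreadPoolResizer {

    static void resize(ThreadPoolExecutor pool, int coreSize, int maxSize, long keepAliveSeconds) {
        if (coreSize < 0 || maxSize <= 0 || maxSize < coreSize) {
            throw new IllegalArgumentException(
                    "invalid sizes: core=" + coreSize + ", max=" + maxSize);
        }
        if (maxSize >= pool.getMaximumPoolSize()) {
            // Growing: raise the maximum first so the new core size fits under it.
            pool.setMaximumPoolSize(maxSize);
            pool.setCorePoolSize(coreSize);
        } else {
            // Shrinking: lower the core size first so it never exceeds the maximum.
            pool.setCorePoolSize(coreSize);
            pool.setMaximumPoolSize(maxSize);
        }
        pool.setKeepAliveTime(keepAliveSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        // Initial values taken from the Nacos example in 5.1.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 20, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(100));
        resize(pool, 30, 40, 120);
        System.out.println(pool.getCorePoolSize() + "/" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

Note that queue-capacity cannot be changed this way: the capacity of LinkedBlockingQueue and ArrayBlockingQueue is fixed at construction, so changing it requires a new queue and therefore a new executor, or a custom resizable queue.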

6. Alerting

6.1 Prometheus Alert Rules

groups:
- name: thread-pool-alerts
  rules:
  - alert: ThreadPoolQueueFull
    # Assumes the application also exports a thread_pool_queue_capacity gauge (not listed in 3.2).
    expr: thread_pool_queue_size > thread_pool_queue_capacity * 0.8
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Thread pool queue is almost full"
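
The effect of "for: 1m" is easy to misread: the alert does not fire on the first breaching evaluation but stays pending until the condition has held for the whole duration, and a single non-breaching evaluation resets it. The following is a simplified model of that behavior, not Prometheus code; the class QueueAlertModel is hypothetical and the 15s step mirrors the evaluation_interval from section 2.2.

```java
// Simplified, hypothetical model of how a rule with "for: 1m" moves between states.
class QueueAlertModel {

    enum State { INACTIVE, PENDING, FIRING }

    private final long holdSeconds; // the "for" duration
    private long activeSince = -1;  // time of the first breaching evaluation, -1 if none

    QueueAlertModel(long holdSeconds) {
        this.holdSeconds = holdSeconds;
    }

    // One rule evaluation; nowSeconds is the evaluation timestamp.
    State evaluate(long nowSeconds, double queueSize, double queueCapacity) {
        boolean breached = queueSize > queueCapacity * 0.8; // same condition as expr
        if (!breached) {
            activeSince = -1; // a single non-breaching evaluation resets the timer
            return State.INACTIVE;
        }
        if (activeSince < 0) {
            activeSince = nowSeconds;
        }
        return nowSeconds - activeSince >= holdSeconds ? State.FIRING : State.PENDING;
    }

    public static void main(String[] args) {
        QueueAlertModel alert = new QueueAlertModel(60);
        // Evaluations every 15s, queue at 85 of 100 throughout.
        for (long t = 0; t <= 60; t += 15) {
            System.out.println(t + "s -> " + alert.evaluate(t, 85, 100));
        }
    }
}
```

With the queue held at 85 of 100, the model reports PENDING at 0s through 45s and FIRING at 60s.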

6.2 Grafana Alerts

The following alert thresholds can be set on the Grafana panels:

Thread pool usage > 80%
Task rejection rate > 0
CPU usage > 80%