@zero1036 2018-08-22T06:52:18.000000Z

Spring actuator prometheus

Java-Spring

spring actuator

Actuator endpoints in detail

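Endpoint	Description	Requires authentication
actuator	Provides a "discovery page" for the other endpoints. Requires Spring HATEOAS on the classpath.	Yes
auditevents	Exposes audit event information for the current application.	Yes
autoconfig	Displays an auto-configuration report showing all auto-configuration candidates and the reason why each was or was not applied.	Yes
beans	Displays a complete list of all Spring beans in the application.	Yes
configprops	Displays a collated list of all @ConfigurationProperties.	Yes
dump	Performs a thread dump.	Yes
env	Exposes properties from Spring's ConfigurableEnvironment.	Yes
flyway	Shows any Flyway database migrations that have been applied.	Yes
health	Shows application health information.	No
info	Displays application information.	No
loggers	Shows and modifies the configuration of loggers in the application.	Yes
liquibase	Shows any Liquibase database migrations that have been applied.	Yes
metrics	Shows "metrics" information for the current application.	Yes
mappings	Displays a collated list of all @RequestMapping paths.	Yes
shutdown	Shuts the application down (not enabled by default).	Yes
trace	Displays trace information (by default, the last 100 HTTP requests).	Yes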

Introduction to Prometheus

Official site: https://prometheus.io/docs/prometheus/latest/querying/functions/
Adapted from: https://blog.ruhm.me/post/prometheus-intro/

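Prometheus architecture and how it works

Roughly, it works like this:
1. The Prometheus server periodically pulls (scrapes) data from targets that are either statically configured or found through service discovery.
2. When newly scraped data exceeds the configured in-memory buffer, Prometheus persists the data to disk (or to the remote store, if remote storage is used).
3. Prometheus can be configured with rules that it evaluates on a schedule; when a rule's condition is met, it pushes an alert to the configured Alertmanager.
4. When Alertmanager receives alerts, it can aggregate, deduplicate and reduce noise according to its configuration before finally sending out notifications.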
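To make the rule-evaluation and deduplication steps more concrete, here is a toy model in plain Python. It is only a sketch of the idea, not how Prometheus or Alertmanager are implemented; the alert name `HighErrorCount`, the instance names, the threshold and both helper functions are hypothetical.

```python
from collections import defaultdict


def evaluate_rule(samples, threshold):
    """Return one alert per target whose latest scraped value exceeds threshold."""
    return [
        {"alertname": "HighErrorCount", "instance": instance, "value": value}
        for instance, value in sorted(samples.items())
        if value > threshold
    ]


def group_and_dedupe(alerts):
    """Group alerts by alert name and keep one alert per (name, instance) pair."""
    groups = defaultdict(dict)
    for alert in alerts:
        key = (alert["alertname"], alert["instance"])
        # The same key seen again overwrites the earlier alert instead of
        # producing a second notification.
        groups[alert["alertname"]][key] = alert
    return {name: list(unique.values()) for name, unique in groups.items()}


# Two evaluation cycles over hypothetical scrape results (instance -> value).
cycle_1 = evaluate_rule({"app-1:8080": 12.0, "app-2:8080": 3.0}, threshold=10.0)
cycle_2 = evaluate_rule({"app-1:8080": 15.0, "app-2:8080": 11.0}, threshold=10.0)

notifications = group_and_dedupe(cycle_1 + cycle_2)
firing_instances = [a["instance"] for a in notifications["HighErrorCount"]]
```

Although `app-1:8080` fires in both cycles, it appears only once in the grouped result, which is the effect deduplication is meant to have.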

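Micrometer API

Counter: a cumulative metric representing a single monotonically increasing counter. Its value can only go up, or be reset to zero on restart.
Gauge: an instantaneous metric representing a single numeric value that can go up and down arbitrarily.
Timer/Summary: records durations, response sizes and the like. It is a mix of Counter and Gauge and exposes three values: the total count, the total time, and the max value within a time window.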

Official site: http://micrometer.io/docs
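The behaviour of the three meter types can be illustrated with a small plain-Python model. This is not the Micrometer API; the class and method names are chosen for illustration only, and the `max` here covers the whole lifetime of the timer rather than a time window.

```python
class Counter:
    """Cumulative value that can only increase."""

    def __init__(self):
        self.value = 0.0

    def increment(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only go up")
        self.value += amount


class Gauge:
    """Instantaneous value that may go up or down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Timer:
    """Tracks the count, total time and max of the recorded durations."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.max = 0.0

    def record(self, seconds):
        self.count += 1
        self.total += seconds
        self.max = max(self.max, seconds)


requests = Counter()
queue_size = Gauge()
latency = Timer()

for duration in (0.25, 0.5, 0.125):  # hypothetical request durations in seconds
    requests.increment()
    latency.record(duration)

queue_size.set(5)
queue_size.set(2)  # a gauge may drop again
```

After these calls the counter holds 3, the gauge holds the last value set (2), and the timer reports a count of 3, a total of 0.875 seconds and a max of 0.5 seconds.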

Expression functions

Official API documentation: https://prometheus.io/docs/prometheus/latest/querying/functions/

increase()

increase(): calculates the increase in the time series in the range vector, i.e. how much the counter grew over the given time range.

Total number of new requests in the past 5 minutes: increase(loan_total{appName="grafana-design-competition-server"}[5m])
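As a rough illustration of what increase() computes, the sketch below adds up the growth of a counter between consecutive samples and treats any drop as a counter reset. It is a simplification under stated assumptions: the sample data is made up, and Prometheus additionally extrapolates to the edges of the time range, which this sketch omits.

```python
def increase(samples, start, end):
    """Growth of a counter across samples with start <= timestamp <= end.

    samples: list of (timestamp_seconds, value) pairs sorted by timestamp.
    A drop in value is treated as a counter reset (for example a restart).
    """
    window = [value for ts, value in samples if start <= ts <= end]
    total = 0.0
    for previous, current in zip(window, window[1:]):
        if current >= previous:
            total += current - previous
        else:
            # After a reset the counter restarts from zero, so the whole
            # current value counts as new growth.
            total += current
    return total


# Hypothetical counter scraped every 60 seconds; it resets before t=180.
samples = [(0, 100.0), (60, 130.0), (120, 160.0), (180, 20.0), (240, 50.0), (300, 90.0)]
five_minute_increase = increase(samples, start=0, end=300)
```

Here the counter grows by 30 + 30 before the reset, 20 immediately after it, and 30 + 40 afterwards, so the result is 150.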

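sum(), count():

sum(): sums values per group, similar to select sum() group by. In the example below, result is the code of the red packet that was claimed, so the query sums the successful-claim counters for each red packet.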

sum(action_draw_total{appName="red-packet-service",group="lc-yxx",instance="192.168.13.42:8024",type="winning"})by(result)

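**Note:** count() differs from sum() in that it works like select count() group by: it counts the number of series in each group instead of adding up their values. In the example below there is one row per red packet: result = A with a count of 1, result = b with a count of 1, and so on.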

count(action_draw_total{appName="red-packet-service",group="lc-yxx",instance="192.168.13.42:8024",type="winning"})by(result)

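Combining the two gives the total number of red packets that exist. Note that this is not the sum of the successful-claim records of all red packets: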

sum(count(action_draw_total{appName="red-packet-service",group="lc-yxx",instance="192.168.13.42:8024",type="winning"})by(result))
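The difference between the two aggregations can be sketched in plain Python. The series and values below are hypothetical, with exactly one series per result label (as in the queries above, where the instance is pinned). The sum of the per-group counts equals the number of distinct results only under that assumption.

```python
from collections import defaultdict

# Hypothetical instant vector: (labels, value), one series per red packet code.
series = [
    ({"result": "A"}, 5.0),
    ({"result": "B"}, 2.0),
    ({"result": "C"}, 9.0),
]


def aggregate_by(series, label, fn):
    """Group series values by one label and apply fn to each group."""
    groups = defaultdict(list)
    for labels, value in series:
        groups[labels[label]].append(value)
    return {key: fn(values) for key, values in groups.items()}


sum_by_result = aggregate_by(series, "result", sum)    # like sum(...) by (result)
count_by_result = aggregate_by(series, "result", len)  # like count(...) by (result)

# Like sum(count(...) by (result)): adds up the per-group series counts.
total_red_packets = sum(count_by_result.values())
```

With this data, sum by result yields the claim totals 5, 2 and 9, count by result yields 1 for each code, and the combined expression yields 3, the number of red packets rather than the 16 claims.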

rate():

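rate(): calculates the per-second average rate of increase of the time series in the range vector. In other words, it is the increase over the time range divided by the length of that range in seconds.

1. QPS: the average number of requests per second over the past 5 minutes (series with a rate of zero are filtered out):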

rate(lz_http_requests_total{job="02_lzmh_microservice_base_service_docker"}[5m]) > 0

2. QPS: the average number of requests per second over the past 5 minutes, grouped by handler:

sum(rate(lz_http_requests_total{job="lzmh_microservice_weixin_applet_api"}[5m])) by (handler) > 0

3. Average response time: the per-second rate of the total response time divided by the per-second rate of the request count:

(rate(lz_http_response_time_milliseconds_sum{job="02_lzmh_microservice_base_service_docker"}[5m]) /
rate(lz_http_response_time_milliseconds_count{job="02_lzmh_microservice_base_service_docker"}[5m])
) > 0
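The arithmetic behind these three queries can be sketched as follows. The sample values are hypothetical, and the helper ignores counter resets and the boundary extrapolation that Prometheus applies, so treat it as an approximation of rate() rather than a reimplementation.

```python
def rate(samples):
    """Per-second average rate of increase between the first and last sample.

    samples: list of (timestamp_seconds, counter_value) pairs sorted by
    timestamp, assumed to contain no counter resets.
    """
    (first_ts, first_value), (last_ts, last_value) = samples[0], samples[-1]
    return (last_value - first_value) / (last_ts - first_ts)


# Hypothetical 5-minute window of the *_count and *_sum series.
request_count = [(0, 1000.0), (300, 1600.0)]
response_time_ms_sum = [(0, 50000.0), (300, 80000.0)]

qps = rate(request_count)                         # 600 requests / 300 s
avg_response_ms = rate(response_time_ms_sum) / qps  # (30000 ms / 300 s) / qps
```

With these numbers the service handled 2 requests per second, and dividing the two rates gives an average response time of 50 ms per request.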

offset

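offset: shifts the query time range into the past by the given duration.

Increase in the total number of successful loan requests over the 5-minute window ending one day ago: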

increase(loan_total{appName="grafana-design-competition-server",type="success"}[5m] offset 1d)

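Period-over-period calculation: for example, comparing the request increase of consecutive 10-minute periods. The expression below divides the increase in the previous 10-minute period by the increase in the current one; swap the operands to express the current period relative to the previous one.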

sum(increase(http_requests_total{appName="grafana-design-competition-server"}[10m] offset 10m)) by (appName)/
sum(increase(http_requests_total{appName="grafana-design-competition-server"}[10m])) by (appName)
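A small worked example may help with the offset arithmetic. The counter values are made up, the helper name is hypothetical, and counter resets are ignored.

```python
def increase_in_window(samples, end, width):
    """Counter growth between the samples at end - width and end (no resets)."""
    by_time = dict(samples)
    return by_time[end] - by_time[end - width]


# Hypothetical counter sampled every 10 minutes (timestamps in seconds).
samples = [(0, 100.0), (600, 180.0), (1200, 220.0)]
now, width = 1200, 600

current = increase_in_window(samples, end=now, width=width)
# Equivalent of "offset 10m": the same window, shifted back by one period.
previous = increase_in_window(samples, end=now - width, width=width)

previous_over_current = previous / current  # operand order used in the query above
current_over_previous = current / previous  # the reverse comparison
```

The previous period grew by 80 requests and the current one by 40, so the query as written evaluates to 2.0, while the reverse ratio is 0.5.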

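Appendix

Period-over-period: the ratio of change between two consecutive periods (for example, two consecutive months).
Year-over-year: usually compares month n of this year with month n of last year.

For example, March 2012 compared with March 2011 is year-over-year; March 2012 compared with February 2012 is period-over-period.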