Installing Prometheus
This tutorial was tested on CentOS 7.9.
About Prometheus
Prometheus is a Cloud Native Computing Foundation project: a systems and service monitoring system.
It collects metrics from configured targets at given intervals, displays the results, and can fire alerts.
Prometheus is a good fit for
- monitoring machine state or service state
- quick troubleshooting, since a single-node deployment has no dependency on external network storage or services
Prometheus is not a good fit for
- use cases that need complete, per-event accuracy, such as billing data, because it samples at fixed intervals
Downloads are published on GitHub Releases; pick the build that matches your platform.
Create a service user and group
groupadd prometheus
useradd -g prometheus -m -d /home/prometheus -s /sbin/nologin prometheus
Installation
This tutorial installs version 2.46.0 (2023-07-25).
cd ~/Downloads
wget https://github.com/prometheus/prometheus/releases/download/v2.46.0/prometheus-2.46.0.linux-amd64.tar.gz
# Check the file hash
sha256sum prometheus-2.46.0.linux-amd64.tar.gz
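To actually verify the hash rather than just print it, compare it against the checksum file published with the release (recent Prometheus releases ship a sha256sums.txt asset; check the release page if the name differs):
wget https://github.com/prometheus/prometheus/releases/download/v2.46.0/sha256sums.txt
grep prometheus-2.46.0.linux-amd64.tar.gz sha256sums.txt | sha256sum -c -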
tar -zxvf prometheus-2.46.0.linux-amd64.tar.gz -C /home/prometheus/
cd /home/prometheus/prometheus-2.46.0.linux-amd64
ln -rs prometheus /usr/local/bin/prometheus
ln -rs promtool /usr/local/bin/promtool
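A quick sanity check that the symlinks resolve:
prometheus --version
promtool --version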
Create a systemd service
cat <<EOF | tee /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=/home/prometheus/prometheus-2.46.0.linux-amd64/prometheus --config.file=/home/prometheus/prometheus-2.46.0.linux-amd64/prometheus.yml --web.listen-address=0.0.0.0:19080 --storage.tsdb.path=/data/prometheus
[Install]
WantedBy=multi-user.target
EOF
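systemd only picks up new unit files after a reload, so run this once before the first start:
systemctl daemon-reload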
Set directory permissions
mkdir -p /data/prometheus
chown -R prometheus:prometheus /data/prometheus
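The service runs as the prometheus user, but the tarball was extracted as root, so also make sure that user can read the installation directory:
chown -R prometheus:prometheus /home/prometheus/prometheus-2.46.0.linux-amd64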
Start/stop/restart the service
The default listen port is 9090; the --web.listen-address flag in the start command overrides it. This tutorial uses port 19080.
# Start the service
systemctl start prometheus
# Stop the service
systemctl stop prometheus
# Restart the service
systemctl restart prometheus
# Check service status
systemctl status prometheus
# Enable start on boot
systemctl enable prometheus
# Disable start on boot
systemctl disable prometheus
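To confirm the server is actually serving requests, Prometheus exposes health and readiness endpoints (shown here with this tutorial's port):
curl http://localhost:19080/-/healthy
curl http://localhost:19080/-/ready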
Add basic authentication to the Prometheus web UI
This uses HTTP basic auth; in this tutorial both the username and the password are set to prom.
The password cannot be stored in plain text; it must be hashed with bcrypt first (an online bcrypt tool works).
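If you would rather generate the hash locally, htpasswd (from the httpd-tools package on CentOS) can produce a bcrypt hash; the $2y$ prefix it emits is also accepted by Prometheus. A sketch:
yum install -y httpd-tools
# -n print to stdout, -b take the password from the command line, -B bcrypt, -C cost factor
htpasswd -nbBC 10 prom prom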
mkdir -p /home/prometheus/prometheus-2.46.0.linux-amd64/conf
chown -R prometheus:prometheus /home/prometheus/prometheus-2.46.0.linux-amd64/conf
vi /home/prometheus/prometheus-2.46.0.linux-amd64/conf/web.yml
# paste begin
basic_auth_users:
  prom: $2a$10$Y99wbAh3XI.cq2n0tB9lAOHrRSQ3sZ/iMXRYtlP5xjU1W0JsPQtTS
# paste end
# After saving and exiting, edit the prometheus.service file:
# add the `--web.config.file=/home/prometheus/prometheus-2.46.0.linux-amd64/conf/web.yml` flag to the ExecStart command
# then restart the service, open `http://localhost:19080` in a browser, and log in with the username and password
systemctl daemon-reload && systemctl restart prometheus
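A quick check that auth is enforced: the first request should return 401, the second 200.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:19080/-/healthy
curl -s -o /dev/null -w "%{http_code}\n" -u prom:prom http://localhost:19080/-/healthy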
Tip
After enabling basic auth on Prometheus, the corresponding Grafana data source must also be configured with the same username and password.
Add node_exporter to Prometheus monitoring
For the node_exporter installation tutorial, see here.
vi /home/prometheus/prometheus-2.46.0.linux-amd64/prometheus.yml
# Edit scrape_configs and add a job_name entry called node-exporter
...
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:19100']
        labels:
          env: 'local'
...
After adding the config, use the bundled promtool to verify the configuration file
promtool check config /home/prometheus/prometheus-2.46.0.linux-amd64/prometheus.yml
Restart Prometheus
systemctl restart prometheus
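To confirm the new target is being scraped, query the targets API with the basic auth credentials configured earlier; a quick filter for the health field:
curl -s -u prom:prom http://localhost:19080/api/v1/targets | grep -o '"health":"[^"]*"'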
Store Prometheus data in InfluxDB
By default Prometheus retains data for 15 days, which makes comparisons against older data impossible, so InfluxDB is introduced to store historical data.
See this document for installing and using InfluxDB.
With InfluxDB 1.x, Prometheus could write data directly into InfluxDB. InfluxDB 2.0 changed this substantially and uses the Telegraf plugin to collect metric data into InfluxDB, so here we use Telegraf to sync Prometheus data into InfluxDB. InfluxDB 1.x is queried with InfluxQL; InfluxDB 2.0 introduced its own scripting language (Flux) for queries. If you want to keep querying InfluxDB 2.0 with InfluxQL as in 1.x, you need to configure a database and retention policy mapping in InfluxDB; see the official docs for details.
InfluxDB 2.0 uses the TICK stack (Telegraf, InfluxDB 2.0, Chronograf, Kapacitor) to handle metric collection, storage, visualization, and alerting.
Install the Telegraf plugin
This tutorial installs v1.27.3 via yum.
cat <<EOF | sudo tee /etc/yum.repos.d/influxdata.repo
[influxdata]
name = InfluxData Repository - Stable
baseurl = https://repos.influxdata.com/stable/\$basearch/main
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key
EOF
# Install
yum install -y telegraf
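Confirm the installation:
telegraf --version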
Configure Telegraf
Configuration file path: /etc/telegraf/telegraf.conf
- output: influxdb configuration
You can copy this from the InfluxDB UI under Load Data > TELEGRAF > INFLUXDB OUTPUT PLUGIN.
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://127.0.0.1:18087"]
## API token for authentication.
token = "$INFLUX_TOKEN"
## Organization is the name of the organization you wish to write to; must exist.
organization = "your-org-name"
## Destination bucket to write into.
bucket = "prometheus"
## The value of this tag will be used to determine the bucket. If this
## tag is not set the 'bucket' option is used as the default.
# bucket_tag = ""
## If true, the bucket tag will not be added to the metric.
# exclude_bucket_tag = false
## Timeout for HTTP messages.
timeout = "10s"
## Additional HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## HTTP Proxy override, if unset values the standard proxy environment
## variables are consulted to determine which proxy, if any, should be used.
# http_proxy = "http://corporate.proxy:3128"
## HTTP User-Agent
# user_agent = "telegraf"
## Content-Encoding for write request body, can be set to "gzip" to
## compress body or "identity" to apply no encoding.
# content_encoding = "gzip"
## Enable or disable uint support for writing uints influxdb 2.0.
# influx_uint_support = false
## Optional TLS Config for use on HTTP connections.
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
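Telegraf expands environment variables such as $INFLUX_TOKEN inside the config, so the token has to be present in the service's environment. One way, sketched here with a placeholder token value, is a systemd drop-in:
mkdir -p /etc/systemd/system/telegraf.service.d
cat <<EOF | tee /etc/systemd/system/telegraf.service.d/override.conf
[Service]
Environment=INFLUX_TOKEN=<your-influxdb-api-token>
EOF
systemctl daemon-reload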
- input: prometheus configuration
The full sample configuration is available on GitHub.
# Read metrics from one or many prometheus clients
[[inputs.prometheus]]
## An array of urls to scrape metrics from.
urls = ["http://localhost:19100/metrics"] # node_exporter port used in this tutorial
## Metric version controls the mapping from Prometheus metrics into Telegraf metrics.
## See "Metric Format Configuration" in plugins/inputs/prometheus/README.md for details.
## Valid options: 1, 2
# metric_version = 1
## Url tag name (tag containing scraped url. optional, default is "url")
# url_tag = "url"
## Whether the timestamp of the scraped metrics will be ignored.
## If set to true, the gather time will be used.
# ignore_timestamp = false
## An array of Kubernetes services to scrape metrics from.
# kubernetes_services = ["http://my-service-dns.my-namespace:9100/metrics"]
## Kubernetes config file to create client from.
# kube_config = "/path/to/kubernetes.config"
## Scrape Pods
## Enable scraping of k8s pods. Further settings as to which pods to scrape
## are determined by the 'method' option below. When enabled, the default is
## to use annotations to determine whether to scrape or not.
# monitor_kubernetes_pods = false
## Scrape Pods Method
## annotations: default, looks for specific pod annotations documented below
## settings: only look for pods matching the settings provided, not
## annotations
## settings+annotations: looks at pods that match annotations using the user
## defined settings
# monitor_kubernetes_pods_method = "annotations"
## Scrape Pods 'annotations' method options
## If set method is set to 'annotations' or 'settings+annotations', these
## annotation flags are looked for:
## - prometheus.io/scrape: Required to enable scraping for this pod. Can also
## use 'prometheus.io/scrape=false' annotation to opt-out entirely.
## - prometheus.io/scheme: If the metrics endpoint is secured then you will
## need to set this to 'https' & most likely set the tls config
## - prometheus.io/path: If the metrics path is not /metrics, define it with
## this annotation
## - prometheus.io/port: If port is not 9102 use this annotation
## Scrape Pods 'settings' method options
## When using 'settings' or 'settings+annotations', the default values for
## annotations can be modified using the following options:
# monitor_kubernetes_pods_scheme = "http"
# monitor_kubernetes_pods_port = "9102"
# monitor_kubernetes_pods_path = "/metrics"
## Get the list of pods to scrape with either the scope of
## - cluster: the kubernetes watch api (default, no need to specify)
## - node: the local cadvisor api; for scalability. Note that the config node_ip or the environment variable NODE_IP must be set to the host IP.
# pod_scrape_scope = "cluster"
## Only for node scrape scope: node IP of the node that telegraf is running on.
## Either this config or the environment variable NODE_IP must be set.
# node_ip = "10.180.1.1"
## Only for node scrape scope: interval in seconds for how often to get updated pod list for scraping.
## Default is 60 seconds.
# pod_scrape_interval = 60
## Restricts Kubernetes monitoring to a single namespace
## ex: monitor_kubernetes_pods_namespace = "default"
# monitor_kubernetes_pods_namespace = ""
## The name of the label for the pod that is being scraped.
## Default is 'namespace' but this can conflict with metrics that have the label 'namespace'
# pod_namespace_label_name = "namespace"
# label selector to target pods which have the label
# kubernetes_label_selector = "env=dev,app=nginx"
# field selector to target pods
# eg. To scrape pods on a specific node
# kubernetes_field_selector = "spec.nodeName=$HOSTNAME"
## Filter which pod annotations and labels will be added to metric tags
#
# pod_annotation_include = ["annotation-key-1"]
# pod_annotation_exclude = ["exclude-me"]
# pod_label_include = ["label-key-1"]
# pod_label_exclude = ["exclude-me"]
# cache refresh interval to set the interval for re-sync of pods list.
# Default is 60 minutes.
# cache_refresh_interval = 60
## Scrape Services available in Consul Catalog
# [inputs.prometheus.consul]
# enabled = true
# agent = "http://localhost:8500"
# query_interval = "5m"
# [[inputs.prometheus.consul.query]]
# name = "a service name"
# tag = "a service tag"
# url = 'http://{{if ne .ServiceAddress ""}}{{.ServiceAddress}}{{else}}{{.Address}}{{end}}:{{.ServicePort}}/{{with .ServiceMeta.metrics_path}}{{.}}{{else}}metrics{{end}}'
# [inputs.prometheus.consul.query.tags]
# host = "{{.Node}}"
## Use bearer token for authorization. ('bearer_token' takes priority)
# bearer_token = "/path/to/bearer/token"
## OR
# bearer_token_string = "abc_123"
## HTTP Basic Authentication username and password. ('bearer_token' and
## 'bearer_token_string' take priority)
# username = ""
# password = ""
## Optional custom HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## Specify timeout duration for slower prometheus clients (default is 5s)
# timeout = "5s"
## deprecated in 1.26; use the timeout option
# response_timeout = "5s"
## HTTP Proxy support
# use_system_proxy = false
# http_proxy_url = ""
## Optional TLS Config
# tls_ca = /path/to/cafile
# tls_cert = /path/to/certfile
# tls_key = /path/to/keyfile
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## Use the given name as the SNI server name on each URL
# tls_server_name = "myhost.example.org"
## TLS renegotiation method, choose from "never", "once", "freely"
# tls_renegotiation_method = "never"
## Enable/disable TLS
## Set to true/false to enforce TLS being enabled/disabled. If not set,
## enable TLS only if any of the other options are specified.
# tls_enable = true
## Control pod scraping based on pod namespace annotations
## Pass and drop here act like tagpass and tagdrop, but instead
## of filtering metrics they filters pod candidates for scraping
#[inputs.prometheus.namespace_annotation_pass]
# annotation_key = ["value1", "value2"]
#[inputs.prometheus.namespace_annotation_drop]
# some_annotation_key = ["dont-scrape"]
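Before wiring in Prometheus itself, you can exercise this input once in the foreground; --test runs the configured inputs a single time and prints the gathered metrics to stdout:
telegraf --test --config /etc/telegraf/telegraf.conf --input-filter prometheus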
- Open a port on Telegraf for Prometheus remote write to use
[[inputs.http_listener_v2]]
## Address and port to host HTTP listener on
service_address = ":18080"
## Paths to listen to.
paths = ["/receive"]
## Save path as http_listener_v2_path tag if set to true
# path_tag = false
## HTTP methods to accept.
# methods = ["POST", "PUT"]
## Optional HTTP headers
## These headers are applied to the server that is listening for HTTP
## requests and included in responses.
# http_headers = {"HTTP_HEADER" = "TAG_NAME"}
## maximum duration before timing out read of the request
# read_timeout = "10s"
## maximum duration before timing out write of the response
# write_timeout = "10s"
## Maximum allowed http request body size in bytes.
## 0 means to use the default of 524,288,000 bytes (500 mebibytes)
# max_body_size = "500MB"
## Part of the request to consume. Available options are "body" and
## "query".
# data_source = "body"
## Set one or more allowed client CA certificate file names to
## enable mutually authenticated TLS connections
# tls_allowed_cacerts = ["/etc/telegraf/clientca.pem"]
## Add service certificate and key
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Minimal TLS version accepted by the server
# tls_min_version = "TLS12"
## Optional username and password to accept for HTTP basic authentication.
## You probably want to make sure you have TLS configured above for this.
# basic_username = "foobar"
# basic_password = "barfoo"
## Optional setting to map http headers into tags
## If the http header is not present on the request, no corresponding tag will be added
## If multiple instances of the http header are present, only the first value will be used
# http_header_tags = {"HTTP_HEADER" = "TAG_NAME"}
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "prometheusremotewrite"
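With the output and inputs in place, start Telegraf and enable it at boot:
systemctl enable --now telegraf
systemctl status telegraf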
- Configure Prometheus remote write
... (other config omitted)
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # The web UI now enforces basic auth (see above), so the self-scrape needs credentials:
    basic_auth:
      username: prom
      password: prom
    static_configs:
      - targets: ["localhost:19080"]
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:19100']
        labels:
          env: 'local'
remote_write:
  # Must match Telegraf's http_listener_v2 service_address and path above
  - url: "http://localhost:18080/receive"
... (other config omitted)
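Finally, restart Prometheus and check that samples are actually flowing out over remote write; a rough check via the query API (remote-storage metric names vary somewhat across Prometheus versions, so adjust if this one returns nothing):
systemctl restart prometheus
curl -s -u prom:prom 'http://localhost:19080/api/v1/query?query=prometheus_remote_storage_samples_total'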