Installing Prometheus
This tutorial was tested on CentOS 7.9.
About Prometheus
Prometheus is a Cloud Native Computing Foundation project: a systems and service monitoring system.
It collects metrics from configured targets at given intervals, displays the results, and can fire alerts.
Prometheus is a good fit for
- monitoring machine state or service state
- quick troubleshooting, since a single-node deployment has no dependency on external network storage or services
Prometheus is not a good fit for
- use cases that need complete, per-event accuracy, such as billing data, because it samples at fixed intervals
Downloads are published on GitHub Releases; pick the build that matches your platform.
Create a service user and group
groupadd prometheus
useradd -g prometheus -m -d /home/prometheus -s /sbin/nologin prometheus
Installation
This tutorial installs version 2.46.0 (2023-07-25).
cd ~/Downloads
wget https://github.com/prometheus/prometheus/releases/download/v2.46.0/prometheus-2.46.0.linux-amd64.tar.gz
# Check the file hash
sha256sum prometheus-2.46.0.linux-amd64.tar.gz
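To actually verify the hash rather than just print it, compare it against the checksum file published with the release (recent Prometheus releases ship a sha256sums.txt asset; check the release page if the name differs):
wget https://github.com/prometheus/prometheus/releases/download/v2.46.0/sha256sums.txt
grep prometheus-2.46.0.linux-amd64.tar.gz sha256sums.txt | sha256sum -c -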
tar -zxvf prometheus-2.46.0.linux-amd64.tar.gz -C /home/prometheus/
cd /home/prometheus/prometheus-2.46.0.linux-amd64
ln -rs prometheus /usr/local/bin/prometheus
ln -rs promtool /usr/local/bin/promtool
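A quick sanity check that the symlinks resolve:
prometheus --version
promtool --version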
Create a systemd service
cat <<EOF | tee /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=/home/prometheus/prometheus-2.46.0.linux-amd64/prometheus --config.file=/home/prometheus/prometheus-2.46.0.linux-amd64/prometheus.yml --web.listen-address=0.0.0.0:19080 --storage.tsdb.path=/data/prometheus
[Install]
WantedBy=multi-user.target
EOF
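systemd only picks up new unit files after a reload, so run this once before the first start:
systemctl daemon-reload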
Set directory permissions
mkdir -p /data/prometheus
chown -R prometheus:prometheus /data/prometheus
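The service runs as the prometheus user, but the tarball was extracted as root, so also make sure that user can read the installation directory:
chown -R prometheus:prometheus /home/prometheus/prometheus-2.46.0.linux-amd64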
Start/stop/restart the service
The default listen port is 9090; the --web.listen-address flag in the start command overrides it. This tutorial uses port 19080.
# Start the service
systemctl start prometheus
# Stop the service
systemctl stop prometheus
# Restart the service
systemctl restart prometheus
# Check service status
systemctl status prometheus
# Enable start on boot
systemctl enable prometheus
# Disable start on boot
systemctl disable prometheus
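To confirm the server is actually serving requests, Prometheus exposes health and readiness endpoints (shown here with this tutorial's port):
curl http://localhost:19080/-/healthy
curl http://localhost:19080/-/ready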
Add basic authentication to the Prometheus web UI
This uses HTTP basic auth; in this tutorial both the username and the password are set to prom.
The password cannot be stored in plain text; it must be hashed with bcrypt first (an online bcrypt tool works).
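If you would rather generate the hash locally, htpasswd (from the httpd-tools package on CentOS) can produce a bcrypt hash; the $2y$ prefix it emits is also accepted by Prometheus. A sketch:
yum install -y httpd-tools
# -n print to stdout, -b take the password from the command line, -B bcrypt, -C cost factor
htpasswd -nbBC 10 prom prom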
mkdir -p /home/prometheus/prometheus-2.46.0.linux-amd64/conf
chown -R prometheus:prometheus /home/prometheus/prometheus-2.46.0.linux-amd64/conf
vi /home/prometheus/prometheus-2.46.0.linux-amd64/conf/web.yml
# paste begin
basic_auth_users:
  prom: $2a$10$Y99wbAh3XI.cq2n0tB9lAOHrRSQ3sZ/iMXRYtlP5xjU1W0JsPQtTS
# paste end
# After saving and exiting, edit the prometheus.service file:
# add the `--web.config.file=/home/prometheus/prometheus-2.46.0.linux-amd64/conf/web.yml` flag to the ExecStart command
# then restart the service, open `http://localhost:19080` in a browser, and log in with the username and password
systemctl daemon-reload && systemctl restart prometheus
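A quick check that auth is enforced: the first request should return 401, the second 200.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:19080/-/healthy
curl -s -o /dev/null -w "%{http_code}\n" -u prom:prom http://localhost:19080/-/healthy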
Tip
After enabling basic auth on Prometheus, the corresponding Grafana data source must also be configured with the same username and password.
Add node_exporter to Prometheus monitoring
For the node_exporter installation tutorial, see here.
vi /home/prometheus/prometheus-2.46.0.linux-amd64/prometheus.yml
# Edit scrape_configs and add a job_name entry called node-exporter
...
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:19100']
        labels:
          env: 'local'
...
After adding the config, use the bundled promtool to verify the configuration file
promtool check config /home/prometheus/prometheus-2.46.0.linux-amd64/prometheus.yml
Restart Prometheus
systemctl restart prometheus
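To confirm the new target is being scraped, query the targets API with the basic auth credentials configured earlier; a quick filter for the health field:
curl -s -u prom:prom http://localhost:19080/api/v1/targets | grep -o '"health":"[^"]*"'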
Store Prometheus data in InfluxDB
By default Prometheus retains data for 15 days, which makes comparisons against older data impossible, so InfluxDB is introduced to store historical data.
See this document for installing and using InfluxDB.
With InfluxDB 1.x, Prometheus could write data directly into InfluxDB. InfluxDB 2.0 changed this substantially and uses the Telegraf plugin to collect metric data into InfluxDB, so here we use Telegraf to sync Prometheus data into InfluxDB. InfluxDB 1.x is queried with InfluxQL; InfluxDB 2.0 introduced its own scripting language (Flux) for queries. If you want to keep querying InfluxDB 2.0 with InfluxQL as in 1.x, you need to configure a database and retention policy mapping in InfluxDB; see the official docs for details.
InfluxDB 2.0 uses the TICK stack (Telegraf, InfluxDB 2.0, Chronograf, Kapacitor) to handle metric collection, storage, visualization, and alerting.
Install the Telegraf plugin
This tutorial installs v1.27.3 via yum.
cat <<EOF | sudo tee /etc/yum.repos.d/influxdata.repo
[influxdata]
name = InfluxData Repository - Stable
baseurl = https://repos.influxdata.com/stable/\$basearch/main
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key
EOF
# Install
yum install -y telegraf
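Confirm the installation:
telegraf --version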
Configure Telegraf
Configuration file path: /etc/telegraf/telegraf.conf
- output: influxdb configuration
You can copy this from the InfluxDB UI under Load Data > TELEGRAF > INFLUXDB OUTPUT PLUGIN.
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://127.0.0.1:18087"]
## API token for authentication.
token = "$INFLUX_TOKEN"
## Organization is the name of the organization you wish to write to; must exist.
organization = "your-org-name"
## Destination bucket to write into.
bucket = "prometheus"
## The value of this tag will be used to determine the bucket. If this
## tag is not set the 'bucket' option is used as the default.
# bucket_tag = ""
## If true, the bucket tag will not be added to the metric.
# exclude_bucket_tag = false
## Timeout for HTTP messages.
timeout = "10s"
## Additional HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## HTTP Proxy override, if unset values the standard proxy environment
## variables are consulted to determine which proxy, if any, should be used.
# http_proxy = "http://corporate.proxy:3128"
## HTTP User-Agent
# user_agent = "telegraf"
## Content-Encoding for write request body, can be set to "gzip" to
## compress body or "identity" to apply no encoding.
# content_encoding = "gzip"
## Enable or disable uint support for writing uints influxdb 2.0.
# influx_uint_support = false
## Optional TLS Config for use on HTTP connections.
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
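Telegraf expands environment variables such as $INFLUX_TOKEN inside the config, so the token has to be present in the service's environment. One way, sketched here with a placeholder token value, is a systemd drop-in:
mkdir -p /etc/systemd/system/telegraf.service.d
cat <<EOF | tee /etc/systemd/system/telegraf.service.d/override.conf
[Service]
Environment=INFLUX_TOKEN=<your-influxdb-api-token>
EOF
systemctl daemon-reload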
- input: prometheus configuration
The full sample configuration is available on GitHub.
# Read metrics from one or many prometheus clients
[[inputs.prometheus]]
## An array of urls to scrape metrics from.
urls = ["http://localhost:19100/metrics"] # node_exporter port used in this tutorial
## Metric version controls the mapping from Prometheus metrics into Telegraf metrics.
## See "Metric Format Configuration" in plugins/inputs/prometheus/README.md for details.
## Valid options: 1, 2
# metric_version = 1
## Url tag name (tag containing scraped url. optional, default is "url")
# url_tag = "url"
## Whether the timestamp of the scraped metrics will be ignored.
## If set to true, the gather time will be used.
# ignore_timestamp = false
## An array of Kubernetes services to scrape metrics from.
# kubernetes_services = ["http://my-service-dns.my-namespace:9100/metrics"]
## Kubernetes config file to create client from.
# kube_config = "/path/to/kubernetes.config"
## Scrape Pods
## Enable scraping of k8s pods. Further settings as to which pods to scrape
## are determined by the 'method' option below. When enabled, the default is
## to use annotations to determine whether to scrape or not.
# monitor_kubernetes_pods = false
## Scrape Pods Method
## annotations: default, looks for specific pod annotations documented below
## settings: only look for pods matching the settings provided, not
## annotations
## settings+annotations: looks at pods that match annotations using the user
## defined settings
# monitor_kubernetes_pods_method = "annotations"
## Scrape Pods 'annotations' method options
## If set method is set to 'annotations' or 'settings+annotations', these
## annotation flags are looked for:
## - prometheus.io/scrape: Required to enable scraping for this pod. Can also
## use 'prometheus.io/scrape=false' annotation to opt-out entirely.
## - prometheus.io/scheme: If the metrics endpoint is secured then you will
## need to set this to 'https' & most likely set the tls config
## - prometheus.io/path: If the metrics path is not /metrics, define it with
## this annotation
## - prometheus.io/port: If port is not 9102 use this annotation
## Scrape Pods 'settings' method options
## When using 'settings' or 'settings+annotations', the default values for
## annotations can be modified using the following options:
# monitor_kubernetes_pods_scheme = "http"
# monitor_kubernetes_pods_port = "9102"
# monitor_kubernetes_pods_path = "/metrics"
## Get the list of pods to scrape with either the scope of
## - cluster: the kubernetes watch api (default, no need to specify)
## - node: the local cadvisor api; for scalability. Note that the config node_ip or the environment variable NODE_IP must be set to the host IP.
# pod_scrape_scope = "cluster"
## Only for node scrape scope: node IP of the node that telegraf is running on.
## Either this config or the environment variable NODE_IP must be set.
# node_ip = "10.180.1.1"
## Only for node scrape scope: interval in seconds for how often to get updated pod list for scraping.
## Default is 60 seconds.
# pod_scrape_interval = 60
## Restricts Kubernetes monitoring to a single namespace
## ex: monitor_kubernetes_pods_namespace = "default"
# monitor_kubernetes_pods_namespace = ""
## The name of the label for the pod that is being scraped.
## Default is 'namespace' but this can conflict with metrics that have the label 'namespace'
# pod_namespace_label_name = "namespace"
# label selector to target pods which have the label
# kubernetes_label_selector = "env=dev,app=nginx"
# field selector to target pods
# eg. To scrape pods on a specific node
# kubernetes_field_selector = "spec.nodeName=$HOSTNAME"
## Filter which pod annotations and labels will be added to metric tags
#
# pod_annotation_include = ["annotation-key-1"]
# pod_annotation_exclude = ["exclude-me"]
# pod_label_include = ["label-key-1"]
# pod_label_exclude = ["exclude-me"]
# cache refresh interval to set the interval for re-sync of pods list.
# Default is 60 minutes.
# cache_refresh_interval = 60
## Scrape Services available in Consul Catalog
# [inputs.prometheus.consul]
# enabled = true
# agent = "http://localhost:8500"
# query_interval = "5m"
# [[inputs.prometheus.consul.query]]
# name = "a service name"
# tag = "a service tag"
# url = 'http://{{if ne .ServiceAddress ""}}{{.ServiceAddress}}{{else}}{{.Address}}{{end}}:{{.ServicePort}}/{{with .ServiceMeta.metrics_path}}{{.}}{{else}}metrics{{end}}'
# [inputs.prometheus.consul.query.tags]
# host = "{{.Node}}"
## Use bearer token for authorization. ('bearer_token' takes priority)
# bearer_token = "/path/to/bearer/token"
## OR
# bearer_token_string = "abc_123"
## HTTP Basic Authentication username and password. ('bearer_token' and
## 'bearer_token_string' take priority)
# username = ""
# password = ""
## Optional custom HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## Specify timeout duration for slower prometheus clients (default is 5s)
# timeout = "5s"
## deprecated in 1.26; use the timeout option
# response_timeout = "5s"
## HTTP Proxy support
# use_system_proxy = false
# http_proxy_url = ""
## Optional TLS Config
# tls_ca = /path/to/cafile
# tls_cert = /path/to/certfile
# tls_key = /path/to/keyfile
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## Use the given name as the SNI server name on each URL
# tls_server_name = "myhost.example.org"
## TLS renegotiation method, choose from "never", "once", "freely"
# tls_renegotiation_method = "never"
## Enable/disable TLS
## Set to true/false to enforce TLS being enabled/disabled. If not set,
## enable TLS only if any of the other options are specified.
# tls_enable = true
## Control pod scraping based on pod namespace annotations
## Pass and drop here act like tagpass and tagdrop, but instead
## of filtering metrics they filters pod candidates for scraping
#[inputs.prometheus.namespace_annotation_pass]
# annotation_key = ["value1", "value2"]
#[inputs.prometheus.namespace_annotation_drop]
# some_annotation_key = ["dont-scrape"]
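Before wiring in Prometheus itself, you can exercise this input once in the foreground; --test runs the configured inputs a single time and prints the gathered metrics to stdout:
telegraf --test --config /etc/telegraf/telegraf.conf --input-filter prometheus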
- Open a port on Telegraf for Prometheus remote write to use
[[inputs.http_listener_v2]]
## Address and port to host HTTP listener on
service_address = ":18080"
## Paths to listen to.
paths = ["/receive"]
## Save path as http_listener_v2_path tag if set to true
# path_tag = false
## HTTP methods to accept.
# methods = ["POST", "PUT"]
## Optional HTTP headers
## These headers are applied to the server that is listening for HTTP
## requests and included in responses.
# http_headers = {"HTTP_HEADER" = "TAG_NAME"}
## maximum duration before timing out read of the request
# read_timeout = "10s"
## maximum duration before timing out write of the response
# write_timeout = "10s"
## Maximum allowed http request body size in bytes.
## 0 means to use the default of 524,288,000 bytes (500 mebibytes)
# max_body_size = "500MB"
## Part of the request to consume. Available options are "body" and
## "query".
# data_source = "body"
## Set one or more allowed client CA certificate file names to
## enable mutually authenticated TLS connections
# tls_allowed_cacerts = ["/etc/telegraf/clientca.pem"]
## Add service certificate and key
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Minimal TLS version accepted by the server
# tls_min_version = "TLS12"
## Optional username and password to accept for HTTP basic authentication.
## You probably want to make sure you have TLS configured above for this.
# basic_username = "foobar"
# basic_password = "barfoo"
## Optional setting to map http headers into tags
## If the http header is not present on the request, no corresponding tag will be added
## If multiple instances of the http header are present, only the first value will be used
# http_header_tags = {"HTTP_HEADER" = "TAG_NAME"}
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "prometheusremotewrite"
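With the output and inputs in place, start Telegraf and enable it at boot:
systemctl enable --now telegraf
systemctl status telegraf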
- Configure Prometheus remote write
... (other config omitted)
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # The web UI now enforces basic auth (see above), so the self-scrape needs credentials:
    basic_auth:
      username: prom
      password: prom
    static_configs:
      - targets: ["localhost:19080"]
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:19100']
        labels:
          env: 'local'
remote_write:
  # Must match Telegraf's http_listener_v2 service_address and path above
  - url: "http://localhost:18080/receive"
... (other config omitted)
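Finally, restart Prometheus and check that samples are actually flowing out over remote write; a rough check via the query API (remote-storage metric names vary somewhat across Prometheus versions, so adjust if this one returns nothing):
systemctl restart prometheus
curl -s -u prom:prom 'http://localhost:19080/api/v1/query?query=prometheus_remote_storage_samples_total'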