Promtail + Loki + Grafana: building a log reporting and alarm service

Use Promtail + Loki + Grafana/Nightingale to build a log reporting and statistical analysis system. These components are, of course, also applicable to other log reporting scenarios.

Preface

This content comes from a failed attempt; this article records the useful parts.

A game server I run has no information query interface; player entry, exit, and status information can only be seen in its printed logs. To build a status notification bot, I went looking everywhere for a log reporting and analysis solution.

During these attempts, log reporting and statistical analysis were both completed, but my requirement was to push each new log line as an instant message to a processing module for semantic analysis and alarming. The off-the-shelf software I used focuses on aggregate analysis and alarming over a time window, so it cannot be used out of the box and needs secondary development.

The root cause is that LogQL focuses on aggregate queries (e.g., the number of occurrences / proportion / grouping within X minutes) and does not offer full SQL-style query capability.
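For example, LogQL handles aggregate queries like the following comfortably (the label names here are hypothetical), but it cannot express arbitrary SQL-style joins or push individual new rows to a consumer:

```logql
# Number of log lines per container over the last 5 minutes
sum by (container_name) (count_over_time({job="docker"}[5m]))
```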

I have other things to do, so the secondary development is on hold for now.

However, the process of building the log reporting and analysis system is still useful, so I am writing it down here in the hope that it helps someone.

Software Role

This setup involves three (or four) pieces of software:

  1. Promtail: log collection and reporting agent on the client side
  2. Loki: log aggregation system; receives Promtail's log reports, aggregates and stores them, and provides a LogQL query interface
  3. Grafana/Nightingale: a unified observability platform with log analysis, alarming, and other functions

Data flow direction

This article takes reading Docker container log files as an example.

Reminder: reading Docker container log files directly is not recommended; this article uses it only as an example of reading file logs. Prefer the Docker API or tools such as logspout or ilogtail to collect Docker container logs.

Reading the Docker log file is only an example of Loki ingesting file logs; this practice is not recommended.
For file logs, configure log rotation sensibly (for Docker, set max-size and max-file appropriately).
The impact of log rotation on Loki is described at:
https://grafana.com/docs/loki/latest/send-data/promtail/logrotation/

The data flow is as follows:

Docker container log file → Promtail → Loki → Grafana/Nightingale

Several points of construction

If you found this article by searching, you are no beginner, so I will not walk through deployment from zero to one; I will only cover the parts I modified.

Again, reading the Docker container log file directly is not recommended; this article uses it only as an example of reading file logs. Prefer the Docker API or tools such as logspout or ilogtail to collect Docker container logs.

Adding intuitive tags to Docker logs

By default, the log file contains no container name or similar information. After collection, different containers can only be distinguished by file name or ID, and the log file name is a full-length container ID.

To make the results easy to retrieve and more intuitive, Docker can add some tag parameters to the log:

See: https://docs.docker.com/config/containers/logging/log_tags/

  1. When running a container (valid for that container only):
     docker run --log-driver json-file --log-opt tag="{{.Name}}" XXX
  2. Append to the Docker daemon.json (valid for containers started afterwards):
     "log-opts": {
         // ... previous options omitted
         "tag": "{{.ImageName}}"
     }
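Put together, a complete daemon.json fragment might look like this (a sketch; max-size and max-file are the rotation settings recommended earlier, and the values shown are examples, not requirements):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3",
    "tag": "{{.ImageName}}"
  }
}
```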

Reflected in the log, that is:

 {
   "log": "xxxx",
   "stream": "stdout",
   "attrs": {
     "tag": "test-container-1"
   },
   "time": "2024-01-28T13:31:05.824003795Z"
 }

Adding label(s) in Loki

Docs: https://grafana.com/docs/loki/latest/get-started/labels/

I only skimmed the docs without studying them in depth, and copied the following more or less verbatim:

 pipeline_stages:
   - json:
       expressions:
         output: log
         stream: stream
         attrs:
   - json:
       expressions:
         tag:
       source: attrs
   - regex:
       expression: (?P<container_name>(?:[^|]*[^|]))
       source: tag
   - timestamp:
       format: RFC3339Nano
       source: time
   - labels:
       stream:
       container_name:
   - output:
       source: output
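The regex stage above extracts the container name from the tag attribute. A quick way to sanity-check such a pattern outside Promtail is a small script (a minimal sketch; the sample tag value is hypothetical):

```python
import re

# The same pattern used in the Promtail regex stage above: it captures
# everything up to the last non-'|' character into the named group
# "container_name".
PATTERN = re.compile(r"(?P<container_name>(?:[^|]*[^|]))")


def extract_container_name(tag: str):
    """Return the container_name label Promtail would derive from a tag."""
    match = PATTERN.search(tag)
    return match.group("container_name") if match else None


if __name__ == "__main__":
    # Hypothetical tag value, matching the Docker log example earlier.
    print(extract_container_name("test-container-1"))  # prints: test-container-1
```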

With these labels in place, LogQL queries in Grafana or Nightingale can use them as selectors:

 {container_name="test-container-1"}
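For context, the pipeline_stages block above lives inside a Promtail scrape config. A minimal sketch might look like the following (the job name and path are assumptions; adjust them to your Docker data root):

```yaml
scrape_configs:
  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
```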

About alarms

An alarm boils down to analyzing the data and then executing some scripts to send the notification.

Both Nightingale and Grafana support graphical alarm configuration and user-defined alert templates. Combined with LogQL, creating an alarm such as "404 status code count exceeds a threshold within X minutes" is very simple.
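As a sketch, such an alert condition can be expressed as a LogQL metric query like the one below (the container name and threshold are made up for illustration):

```logql
# Alert when more than 10 lines containing "404" appear within 5 minutes
count_over_time({container_name="test-container-1"} |= "404" [5m]) > 10
```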

Finally

Every major company has its own internal metrics system, far richer in features than the open-source ones (although most of them are themselves based on modified open source).

Every industry has its specialties. Catching up with the metrics systems of commercial companies in functional richness is not a workload that one or two outsiders can clear by studying it for a couple of days; catching up on performance is even less feasible.

Zimiao haunting blog (azimiao.com). All rights reserved. Please include the source link when reprinting: https://www.azimiao.com/10264.html
Welcome to join the Zimiao haunting blog exchange group: 313732000
