docs: update the doc of slow subs #19

Open: wants to merge 2 commits into base: main
Binary file modified en_US/modules/assets/slow_subscribers_statistics_2.png
Binary file modified en_US/modules/assets/slow_subscribers_statistics_3.png
Binary file modified en_US/modules/assets/slow_subscribers_statistics_4.png
97 changes: 52 additions & 45 deletions en_US/modules/slow_subscribers_statistics.md
@@ -1,85 +1,92 @@

# Table of Contents

1. [Slow subscribers statistics](#org10c5186)
1. [Create module](#org395e425)
2. [Implementation note](#orgdd4d5d6)
1. [Latency calculation method](#org693a7e9)
2. [Average latency calculation method](#org712747a)
3. [Configuration description](#orgcd5b688)
4. [Slow subscribers record](#orgb3ce581)
1. [Slow subscribers statistics](#org0a58d32)
1. [Create module](#org7939dfc)
2. [Implementation note](#org417d240)
3. [Configuration description](#orgf0feb6e)
4. [Slow subscribers record](#orga6267c1)


<a id="org10c5186"></a>
<a id="org0a58d32"></a>

## Slow subscribers statistics
# Slow subscribers statistics

This function ranks subscribers in descending order according to the average latency of message transmission
This module ranks subscribers and topics in descending order according to the latency of message transmission

<a id="org395e425"></a>
<a id="org7939dfc"></a>

### Create module
## Create module

Open the EMQ X Dashboard and click "Module" in the menu on the left. Then select "add module":

![image](./assets/slow_subscribers_statistics_1.png)

Select the **Slow Subscribers Statistics** module, and then click *Start*

<a id="orgdd4d5d6"></a>
<a id="org417d240"></a>

### Implementation note
## Implementation note

This function tracks the time QoS1 and QoS2 messages take to complete the whole transmission process after arriving at EMQX, then uses the exponential moving average algorithm to calculate each subscriber's average message transmission latency, and ranks the subscribers by that latency.
This function tracks how long the entire transmission process takes after QoS1 and QoS2 messages arrive at EMQX, then calculates the message transmission latency according to the options in the configuration.
Afterwards, the subscribers and topics are ranked by latency.

In addition, because QoS1 and QoS2 messages may fail to complete the transmission process for various reasons, this function also tries to add subscribers to the ranking after a message times out, based on the timeout duration.
<a id="orgf0feb6e"></a>

Note: In order to avoid performance overhead, the minimum latency for statistical ranking is 100ms
## Configuration description

![image](./assets/slow_subscribers_statistics_2.png)

<a id="org693a7e9"></a>
- Stats Threshold

#### latency calculation method
*Stats Threshold* determines whether a subscriber is included in the statistics. Subscribers whose latency is lower than this value are not counted.

- QoS1
Measured from when the *publish* message arrives at EMQX until EMQX receives *puback*
- QoS2
Measured from when the *publish* message arrives at EMQX until EMQX receives *pubcomp*

<a id="org712747a"></a>
- Maximum Number of Statistics

#### Average latency calculation method
This field sets the upper limit on the number of entries in the statistics record table

The average latency uses the [Exponential Moving Average Algorithm](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average) to weight the transmission latency of each message, reducing the influence of a single jitter or a historical extreme value on the average.
- Eviction Time of Record

The number of samples is configured by *`emqx.zone.latency_samples`* in *emqx.conf*. Its minimum value is 1. When it is 1, all historical values are ignored, so the impact of a single transmission jitter on the result cannot be avoided. This is not recommended.
*expire interval* controls how long each entry in the statistics record remains valid. If an entry has not been updated within this period, it is removed. (For example, a message is added to the statistics record after being sent because of its long latency. If no further message is sent for longer than this value, the entry is cleared.)

The larger the value, the greater the influence of historical values. If it is too large, the average latency may not reflect the current real latency in time.
- Stats Type

<a id="orgcd5b688"></a>
The ways to calculate the latency are as follows:

### Configuration description
1. whole

![image](./assets/slow_subscribers_statistics_2.png)
From the time the message arrives at EMQX until the message completes transmission

- latency threshold
*latency threshold* is used to determine whether subscribers can participate in statistics. If the latency of subscribers is lower than this value, they will not be counted
- Maximum number of statistics
This field determines the upper limit of the number in the statistical record table
- Effective duration
*Effective duration* controls the effective time of each piece of data in the statistical record. If the data has not been updated within this time range, it will be removed. (For example, after a message is sent, it is added to the statistics record because of the long latency. If the message is not sent again for a long time that exceeds this value, it will be cleared)
- Push interval
Slow subscribers statistics can be pushed to the system message *`$SYS/brokers/${node}/slow_subs`*. *Push interval* controls the time interval of the push. If set to 0, no push is made
- Push QoS
QoS when pushing messages to the system
- Number of push batches
Statistics are pushed to the system message in batches. If the slow subscribers statistics are large, pushing everything at once may cause blocking. In that case, this value can be reduced as appropriate.
2. internal

From when the message arrives at EMQX until when EMQX starts delivering the message

3. response

From the time EMQX starts delivering the message, until the message completes transmission

Definition of message transmission completion:

<a id="orgb3ce581"></a>
1. QoS0

### Slow subscribers record
When EMQX starts to deliver

2. QoS1

When EMQX receives *puback* from the client

3. QoS2

When EMQX receives *pubcomp* from the client
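
The three *Stats Type* options and the completion rules above can be illustrated with a minimal Python sketch. It is not EMQX's implementation: the `MessageTimeline` class, the function names, and the millisecond timestamps are all hypothetical and exist only to show how the three intervals relate to each other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageTimeline:
    """Hypothetical per-message timestamps, in milliseconds."""
    arrived_ms: int    # the message arrives at EMQX
    delivered_ms: int  # EMQX starts delivering the message
    completed_ms: int  # the message completes transmission


def completion_time(qos: int, delivered_ms: int,
                    puback_ms: Optional[int] = None,
                    pubcomp_ms: Optional[int] = None) -> int:
    """Pick the completion timestamp for a QoS level, per the list above."""
    if qos == 0:
        return delivered_ms  # QoS0: complete as soon as delivery starts
    # QoS1 completes on puback, QoS2 on pubcomp.
    ack_ms = puback_ms if qos == 1 else pubcomp_ms if qos == 2 else None
    if ack_ms is None:
        raise ValueError(f"missing acknowledgement or invalid QoS: {qos}")
    return ack_ms


def latency(timeline: MessageTimeline, stats_type: str) -> int:
    """Return the latency in milliseconds for the given stats type."""
    if stats_type == "whole":
        return timeline.completed_ms - timeline.arrived_ms
    if stats_type == "internal":
        return timeline.delivered_ms - timeline.arrived_ms
    if stats_type == "response":
        return timeline.completed_ms - timeline.delivered_ms
    raise ValueError(f"unknown stats type: {stats_type}")


# A QoS1 message: arrives at 1000, delivery starts at 1040, puback at 1300.
timeline = MessageTimeline(
    arrived_ms=1000,
    delivered_ms=1040,
    completed_ms=completion_time(1, 1040, puback_ms=1300),
)
for stats_type in ("whole", "internal", "response"):
    print(stats_type, latency(timeline, stats_type))
```

Under these definitions, *whole* is always the sum of *internal* and *response*, and for QoS0 the *response* latency is zero because completion coincides with the start of delivery.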


<a id="orga6267c1"></a>

## Slow subscribers record

![image](./assets/slow_subscribers_statistics_3.png)

Under this tab, subscriber information is displayed in descending order of latency. Clicking *Client ID* displays the subscriber details, where you can analyze and locate the problem.

![image](./assets/slow_subscribers_statistics_4.png)
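
How the *Stats Threshold*, *Maximum Number of Statistics*, and *Eviction Time of Record* options shape this record table can be sketched in Python. This is an illustrative model, not EMQX code: the `SlowSubsTable` class and its methods are hypothetical, and dropping the lowest-latency entry when the table is full is an assumption, since the documentation only states that an upper limit exists.

```python
class SlowSubsTable:
    """Hypothetical model of the slow subscribers record table."""

    def __init__(self, threshold_ms: int, max_records: int, expire_ms: int):
        self.threshold_ms = threshold_ms
        self.max_records = max_records
        self.expire_ms = expire_ms
        # (client_id, topic) -> (latency_ms, last_update_ms)
        self.records = {}

    def update(self, client_id: str, topic: str,
               latency_ms: int, now_ms: int) -> None:
        # Latencies below the threshold are not counted.
        if latency_ms < self.threshold_ms:
            return
        self.records[(client_id, topic)] = (latency_ms, now_ms)
        # Assumption: when over the limit, drop the lowest-latency entry.
        if len(self.records) > self.max_records:
            fastest = min(self.records, key=lambda key: self.records[key][0])
            del self.records[fastest]

    def evict_expired(self, now_ms: int) -> None:
        # Remove entries that have not been updated within the expiry time.
        self.records = {
            key: value for key, value in self.records.items()
            if now_ms - value[1] <= self.expire_ms
        }

    def ranking(self):
        # Highest latency first, as shown in the record tab.
        return sorted(
            ((key, value[0]) for key, value in self.records.items()),
            key=lambda item: item[1],
            reverse=True,
        )


table = SlowSubsTable(threshold_ms=500, max_records=2, expire_ms=1000)
table.update("c1", "t/1", 400, now_ms=0)    # below threshold, ignored
table.update("c1", "t/1", 600, now_ms=0)
table.update("c2", "t/2", 900, now_ms=100)
table.update("c3", "t/3", 700, now_ms=200)  # table full, c1 is dropped
print(table.ranking())
```

Calling `evict_expired` periodically with the current time would then clear entries for clients that have stopped producing slow deliveries.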
Binary file modified zh_CN/modules/assets/slow_subscribers_statistics_2.png
Binary file modified zh_CN/modules/assets/slow_subscribers_statistics_3.png
Binary file modified zh_CN/modules/assets/slow_subscribers_statistics_4.png
93 changes: 46 additions & 47 deletions zh_CN/modules/slow_subscribers_statistics.md
@@ -1,23 +1,21 @@

# Table of Contents

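1. [Slow subscribers statistics](#org10c5186)
1. [Create module](#org395e425)
2. [Implementation note](#orgdd4d5d6)
1. [Latency calculation method](#org693a7e9)
2. [Average latency calculation method](#org712747a)
3. [Configuration description](#orgcd5b688)
4. [Slow subscribers record](#orgb3ce581)
1. [Slow subscribers statistics](#org0a58d32)
1. [Enable module](#org7939dfc)
2. [Implementation note](#org417d240)
3. [Configuration description](#orgf0feb6e)
4. [Slow subscribers record](#orga6267c1)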


<a id="org10c5186"></a>
<a id="org0a58d32"></a>

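# Slow subscribers statistics

This function ranks subscribers in descending order according to the average latency of message transmission
This function ranks subscribers in descending order according to the time taken by message transmission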


<a id="org395e425"></a>
<a id="org7939dfc"></a>

## Create module

@@ -28,70 +26,71 @@
Select the **Slow Subscribers Statistics** module, and then click *Start*


<a id="orgdd4d5d6"></a>
<a id="org417d240"></a>

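## Implementation note

This function tracks the time QoS1 and QoS2 messages take to complete the whole transmission process after arriving at EMQX, then uses the exponential moving average algorithm to calculate the subscriber's average message transmission
latency, and then ranks the subscribers by latency
This function tracks the time QoS1 and QoS2 messages take to complete the whole transmission process after arriving at EMQX, then calculates the message transmission latency according to the options in the configuration,
and then ranks the subscribers and topics by latency

In addition, because QoS1 and QoS2 messages may fail to complete the transmission process for various reasons, this function also tries, after a message transmission times out, to add the subscriber
to the ranking based on the timeout duration

Note: To avoid performance overhead, the minimum latency for statistical ranking is 100ms
<a id="orgf0feb6e"></a>

## Configuration description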

<a id="org693a7e9"></a>
![image](./assets/slow_subscribers_statistics_2.png)

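### Latency calculation method
- Latency threshold

- QoS1
Measured from when the *publish* message arrives at EMQX until EMQX receives *puback*
- QoS2
Measured from when the *publish* message arrives at EMQX until EMQX receives *pubcomp*
*Latency threshold* determines whether a subscriber is included in the statistics. If the subscriber's latency is lower than this value, it is not counted

- Maximum number of statistics

<a id="org712747a"></a>
This field sets the upper limit on the number of entries in the statistics record table

### Average latency calculation method
- Effective duration

The average latency uses the [Exponential Moving Average Algorithm](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average) to weight the transmission latency of each message, reducing the influence of a single jitter or a historical extreme value on the average.
*Effective duration* controls how long each entry in the statistics record remains valid. If an entry has not been updated within this period, it is removed
(for example, a message is added to the statistics record after being sent because of its long latency; if no further message is sent for a long time, the entry is cleared once this value is exceeded)

The number of samples is configured by *`emqx.zone.latency_samples`* in *emqx.conf*
- Latency stats type

The minimum value is 1. When it is 1, all historical values are ignored, so the impact of a single transmission jitter on the result cannot be avoided. This is not recommended
The ways to calculate the latency are:

The larger the value, the greater the influence of historical values. If it is too large, the average latency may not reflect the current real latency in time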
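1. whole

From the time the message arrives at EMQX until the message completes transmission

<a id="orgcd5b688"></a>
2. internal

## Configuration description
From the time the message arrives at EMQX until EMQX starts delivering the message

![image](./assets/slow_subscribers_statistics_2.png)
3. response

- Latency threshold
*Latency threshold* determines whether a subscriber is included in the statistics. If the subscriber's latency is lower than this value, it is not counted
- Maximum number of statistics
This field sets the upper limit on the number of entries in the statistics record table
- Effective duration
*Effective duration* controls how long each entry in the statistics record remains valid. If an entry has not been updated within this period, it is removed
(for example, a message is added to the statistics record after being sent because of its long latency; if no further message is sent for a long time, the entry is cleared once this value is exceeded)
- Push interval
Slow subscribers statistics can be pushed to the system message *`$SYS/brokers/${node}/slow_subs`*. *Push interval* controls the time interval of the push. If set
to 0, no push is made
- Push QoS
QoS used when pushing to the system message
- Number of push batches
Statistics are pushed to the system message in batches. If the slow subscribers statistics are large, pushing everything at once may cause blocking. In that case, this value can be reduced as appropriate
From the time EMQX starts delivering the message until the message completes transmission

Definition of message transmission completion:

1. QoS0

When EMQX starts to deliver

2. QoS1

<a id="orgb3ce581"></a>
When EMQX receives *puback* from the client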

3. Qos2
Suggested change
3. Qos2
3. QoS2


When EMQX receives *pubcomp* from the client


<a id="orga6267c1"></a>

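## Slow subscribers record

![image](./assets/slow_subscribers_statistics_3.png)
Under this tab, subscriber information is displayed in descending order of latency. Clicking *Client ID* displays the subscriber details, which can be used to analyze the problem

Under this tab, subscriber and topic information is displayed in descending order of latency. Clicking *Client ID* displays the subscriber details, which can be used to analyze
and locate problems.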

![image](./assets/slow_subscribers_statistics_4.png)